从零开始设计RISC-V处理器——ALU的优化

系列文章目录

(一)从零开始设计RISC-V处理器——指令系统
(二)从零开始设计RISC-V处理器——单周期处理器的设计
(三)从零开始设计RISC-V处理器——单周期处理器的仿真
(四)从零开始设计RISC-V处理器——ALU的优化


文章目录


前言

在前面的三篇文章中,我们已经完成了单周期CPU的设计,从这篇文章开始对这个单周期的CPU进行优化。
首先是对ALU部件进行优化,因为ALU是CPU最重要的部件之一,所有的运算都在这里进行。之前我们把ALU的运算部件分为算数运算单元(加法器),移位运算(移位器),小于置一(比较器),逻辑运算(逻辑门)几个部分。前面的设计中,加法器直接用verilog中的运算符“+”实现,移位器直接用verilog中的运算符“>>,<<”实现,这样相当于把加法器和移位器的设计交给综合工具自动完成,这样设计的ALU的性能是我们不可掌控的。因此,这篇文章就对加法器和移位器的优化做一个介绍。


一、加法器的设计

常见的加法器有很多种,有兴趣的读者可自行学习,此处主要介绍两种加法器:行波进位加法器和超前进位加法器。
行波进位加法器,是由n个一位全加器串联而成,结构非常简单,上一级的进位输出连接到下一级的进位输入。这种加法器结构简单,使用的逻辑资源很少,但是由于是串行计算,下一级的结果依赖于上一级的进位输出,因此对于多位的加法器,计算一次加法所需要的时间非常久。
如果能够提前把每一级的进位输出计算出来,那么就可以有效缩短延时,这便是所谓的超前进位加法器。
下面开始分析。
首先列出一位全加器的真值表

Ai Bi Ci Si Ci+1
0 0 0 0 0
0 0 1 1 0
0 1 0 1 0
0 1 1 0 1
1 0 0 1 0
1 0 1 0 1
1 1 0 0 1
1 1 1 1 1

由真值表可以得出以下关系:
Si=Ai ^ Bi ^ Ci;
Ci+1=Bi&Ci + Ai&Ci + Ai&Bi=Ai&Bi + (Ai + Bi)&Ci=Gi + Pi&Ci;
这里定义了两个新的信号:Gi=Ai&Bi, Pi=Ai | Bi;
Gi称为进位产生信号,Pi称为进位传递信号。
Gi为1时,一定产生进位,即Ci+1一定为1;
Gi为0且Pi为1时,进位传递,即把Ci的值传递给Ci+1,Ci为0时,Ci+1为0,Ci为1时,Ci+1为1。
将每一级的进位信号表示如下:
C0=Cin;
C1=A0&B0+(A0+B0)&C0=G0+P0&C0
C2=G1+P1&C1=G1+P1&(G0+P0&C0)=G1+P1&G0+P0&P1&C0
C3=G2+P2&C2=G2+P2&G1+P2&P1&G0+P2&P1&P0&C0
C4=G3+P3&G2+P3&P2&G1+P3&P2&P1&G0+P3&P2&P1&P0&C0
……
通过上式可以看到,以这样的方式展开,每一级的进位信号仅仅依赖于Pi,Gi以及C0,这样我们就可以根据加数A和被加数B,以及进位信号C0,同时求出每一级的进位信号Ci,同时可以求出每一级的Si。
通过以上分析,可以得出结论:
行波加法器,串行计算,结构简单(面积小),延时大
超前进位加法器,并行计算,结构复杂(面积大),延时小。

想要设计一个高性能的32位加法器,理论上可以同时并行生成所有的Si和Ci,但是这样会使电路结构非常复杂,极大的增大电路的扇入和扇出,这在实际设计时也是不可行的。
因此,我们采用组内并行和组间并行的方式实现一个32位加法器,即先实现一个4位超前进位加法器,再用4个4位超前进位加法器组间并行实现16位超前进位加法器,再用2个16位超前进位加法器组间并行实现32位超前进位加法器。
其结构如下:
从零开始设计RISC-V处理器——ALU的优化
实现代码如下:

//4位CLA部件
module cla_4(p,g,c_in,c,gx,px);
input[3:0] p,g;
input c_in;
output[4:1] c;
output gx,px;

assign c[1] = p[0]&c_in | g[0];
assign c[2] = p[1]&p[0]&c_in | p[1]&g[0] | g[1];
assign c[3] = p[2]&p[1]&p[0]&c_in | p[2]&p[1]&g[0] | p[2]&g[1] | g[2];
assign c[4] = gx | px&c_in;

assign px = p[3]&p[2]&p[1]&p[0];
assign gx = g[3] | p[3]&g[2] | p[3]&p[2]&g[1] | p[3]&p[2]&p[1]&g[0];

endmodule

//32位全加器
module cla_adder32(A,B,cin,result,cout);

input [31:0] A;
input [31:0] B;
input cin;
output[31:0] result;
output cout;

进位产生信号,进位传递信号
wire[31:0] TAG,TAP;
wire[32:1] TAC;
wire[15:0] TAG_0,TAP_0;
wire[3:0] TAG_1,TAP_1;
wire[8:1] TAC_1;
wire[4:1] TAC_2;
 
assign result = A ^ B ^ {TAC[31:1],cin};  

进位产生信号,进位传递信号
assign TAG = A&B;
assign TAP = A|B;

cla_4 cla_0_0( .p(TAP[3:0]),  .g(TAG[3:0]),  .c_in(cin), .c(TAC[4:1]),  .gx(TAG_0[0]),.px(TAP_0[0]));
cla_4 cla_0_1( .p(TAP[7:4]),  .g(TAG[7:4]),  .c_in(TAC_1[1]),.c(TAC[8:5]),  .gx(TAG_0[1]),.px(TAP_0[1]));
cla_4 cla_0_2( .p(TAP[11:8]), .g(TAG[11:8]), .c_in(TAC_1[2]),.c(TAC[12:9]), .gx(TAG_0[2]),.px(TAP_0[2]));
cla_4 cla_0_3( .p(TAP[15:12]),.g(TAG[15:12]),.c_in(TAC_1[3]),.c(TAC[16:13]),.gx(TAG_0[3]),.px(TAP_0[3]));
cla_4 cla_0_4( .p(TAP[19:16]),.g(TAG[19:16]),.c_in(TAC_1[4]),.c(TAC[20:17]),.gx(TAG_0[4]),.px(TAP_0[4]));
cla_4 cla_0_5( .p(TAP[23:20]),.g(TAG[23:20]),.c_in(TAC_1[5]),.c(TAC[24:21]),.gx(TAG_0[5]),.px(TAP_0[5]));
cla_4 cla_0_6( .p(TAP[27:24]),.g(TAG[27:24]),.c_in(TAC_1[6]),.c(TAC[28:25]),.gx(TAG_0[6]),.px(TAP_0[6]));
cla_4 cla_0_7( .p(TAP[31:28]),.g(TAG[31:28]),.c_in(TAC_1[7]),.c(TAC[32:29]),.gx(TAG_0[7]),.px(TAP_0[7]));


cla_4 cla_1_0(.p(TAP_0[3:0]),  .g(TAG_0[3:0]),  .c_in(cin),.c(TAC_1[4:1]),  .gx(TAG_1[0]),.px(TAP_1[0]));
cla_4 cla_1_1(.p(TAP_0[7:4]),  .g(TAG_0[7:4]),  .c_in(TAC_1[4]),.c(TAC_1[8:5]),  .gx(TAG_1[1]),.px(TAP_1[1]));

assign TAG_1[3:2] = 2'b00;
assign TAP_1[3:2] = 2'b00;

cla_4 cla_2_0(.p(TAP_1[3:0]),   .g(TAG_1[3:0]),    .c_in(1'b0), .c(TAC_2[4:1]),  .gx(),.px());

assign cout = TAC_2[2];

endmodule

二、移位器的设计

移位运算不涉及到时序,用组合逻辑电路的移位器实现。
移位的原理是使用不同的接线,错位ALU_SHIFT位就实现了移动ALU_SHIFT位,所以使用数据选择器实现。

module Shifter(input [31:0] ALU_DA,
               input [4:0] ALU_SHIFT,
			   input [1:0] Shiftctr,
			   output reg [31:0] shift_result);
			  
  			   reg[31:0] SLL_M,SRL_M,SRA_M;
  			   
    always@(*)//SRL
    begin
      case(ALU_SHIFT)
         5'b00000:SRL_M[31:0]=ALU_DA[31:0]; 
         5'b00001:SRL_M[31:0]={1'd0 ,ALU_DA[31:1]};
         5'b00010:SRL_M[31:0]={2'd0 ,ALU_DA[31:2]};
         5'b00011:SRL_M[31:0]={3'd0 ,ALU_DA[31:3]}; 		 
         5'b00100:SRL_M[31:0]={4'd0 ,ALU_DA[31:4]}; 	
		 5'b00101:SRL_M[31:0]={5'd0 ,ALU_DA[31:5]}; 	
		 5'b00110:SRL_M[31:0]={6'd0 ,ALU_DA[31:6]}; 	
		 5'b00111:SRL_M[31:0]={7'd0 ,ALU_DA[31:7]}; 	
		 5'b01000:SRL_M[31:0]={8'd0 ,ALU_DA[31:8]}; 	
		 5'b01001:SRL_M[31:0]={9'd0 ,ALU_DA[31:9]}; 	
		 5'b01010:SRL_M[31:0]={10'd0,ALU_DA[31:10]}; 	
		 5'b01011:SRL_M[31:0]={11'd0,ALU_DA[31:11]}; 	
		 5'b01100:SRL_M[31:0]={12'd0,ALU_DA[31:12]}; 	
		 5'b01101:SRL_M[31:0]={13'd0,ALU_DA[31:13]}; 	
		 5'b01110:SRL_M[31:0]={14'd0,ALU_DA[31:14]}; 	
		 5'b01111:SRL_M[31:0]={15'd0,ALU_DA[31:15]}; 	
		 5'b10000:SRL_M[31:0]={16'd0,ALU_DA[31:16]}; 	
		 5'b10001:SRL_M[31:0]={17'd0,ALU_DA[31:17]}; 	
		 5'b10010:SRL_M[31:0]={18'd0,ALU_DA[31:18]}; 	
		 5'b10011:SRL_M[31:0]={19'd0,ALU_DA[31:19]}; 	
		 5'b10100:SRL_M[31:0]={20'd0,ALU_DA[31:20]}; 	
		 5'b10101:SRL_M[31:0]={21'd0,ALU_DA[31:21]}; 	
		 5'b10110:SRL_M[31:0]={22'd0,ALU_DA[31:22]}; 	
		 5'b10111:SRL_M[31:0]={23'd0,ALU_DA[31:23]}; 	
		 5'b11000:SRL_M[31:0]={24'd0,ALU_DA[31:24]}; 	
		 5'b11001:SRL_M[31:0]={25'd0,ALU_DA[31:25]}; 
		 5'b11010:SRL_M[31:0]={26'd0,ALU_DA[31:26]}; 
		 5'b11011:SRL_M[31:0]={27'd0,ALU_DA[31:27]}; 
		 5'b11100:SRL_M[31:0]={28'd0,ALU_DA[31:28]}; 
		 5'b11101:SRL_M[31:0]={29'd0,ALU_DA[31:29]}; 
		 5'b11110:SRL_M[31:0]={30'd0,ALU_DA[31:30]};  
         5'b11111:SRL_M[31:0]={31'd0,ALU_DA[31]}; 
         default: SRL_M[31:0]=ALU_DA[31:0]; 
      endcase
    end
    
    always@(*) //SLL
    begin
      case(ALU_SHIFT)
         5'b00000:SLL_M[31:0]=ALU_DA[31:0]; 
         5'b00001:SLL_M[31:0]={ALU_DA[30:0],1'd0};
         5'b00010:SLL_M[31:0]={ALU_DA[29:0],2'd0}; 
         5'b00011:SLL_M[31:0]={ALU_DA[28:0],3'd0};
		 5'b00100:SLL_M[31:0]={ALU_DA[27:0],4'd0};
		 5'b00101:SLL_M[31:0]={ALU_DA[26:0],5'd0};
		 5'b00110:SLL_M[31:0]={ALU_DA[25:0],6'd0};
		 5'b00111:SLL_M[31:0]={ALU_DA[24:0],7'd0};
		 5'b01000:SLL_M[31:0]={ALU_DA[23:0],8'd0};
		 5'b01001:SLL_M[31:0]={ALU_DA[22:0],9'd0};
		 5'b01010:SLL_M[31:0]={ALU_DA[21:0],10'd0};
		 5'b01011:SLL_M[31:0]={ALU_DA[20:0],11'd0};
		 5'b01100:SLL_M[31:0]={ALU_DA[19:0],12'd0};
		 5'b01101:SLL_M[31:0]={ALU_DA[18:0],13'd0};
		 5'b01110:SLL_M[31:0]={ALU_DA[17:0],14'd0};
		 5'b01111:SLL_M[31:0]={ALU_DA[16:0],15'd0};
		 5'b10000:SLL_M[31:0]={ALU_DA[15:0],16'd0};
		 5'b10001:SLL_M[31:0]={ALU_DA[14:0],17'd0};
		 5'b10010:SLL_M[31:0]={ALU_DA[13:0],18'd0};
		 5'b10011:SLL_M[31:0]={ALU_DA[12:0],19'd0};
		 5'b10100:SLL_M[31:0]={ALU_DA[11:0],20'd0};
		 5'b10101:SLL_M[31:0]={ALU_DA[10:0],21'd0};
		 5'b10110:SLL_M[31:0]={ALU_DA[9:0] ,22'd0};
		 5'b10111:SLL_M[31:0]={ALU_DA[8:0] ,23'd0};
		 5'b11000:SLL_M[31:0]={ALU_DA[7:0] ,24'd0};
		 5'b11001:SLL_M[31:0]={ALU_DA[6:0] ,25'd0};
		 5'b11010:SLL_M[31:0]={ALU_DA[5:0] ,26'd0};
		 5'b11011:SLL_M[31:0]={ALU_DA[4:0] ,27'd0};
		 5'b11100:SLL_M[31:0]={ALU_DA[3:0] ,28'd0};
		 5'b11101:SLL_M[31:0]={ALU_DA[2:0] ,29'd0};
		 5'b11110:SLL_M[31:0]={ALU_DA[1:0] ,30'd0};
		 5'b11111:SLL_M[31:0]={ALU_DA[0],31'd0}; 
         default: SLL_M[31:0]=ALU_DA[31:0]; 
      endcase
    end
    
    always@(*) //SRA
    begin
      case(ALU_SHIFT)
         5'b00000:SRA_M[31:0]=ALU_DA[31:0]; 
         5'b00001:SRA_M[31:0]={{1{ALU_DA[31]}},ALU_DA[31:1]};
         5'b00010:SRA_M[31:0]={{2{ALU_DA[31]}},ALU_DA[31:2]}; 
         5'b00011:SRA_M[31:0]={{3{ALU_DA[31]}},ALU_DA[31:3]};
		 5'b00100:SRA_M[31:0]={{4{ALU_DA[31]}},ALU_DA[31:4]}; 
		 5'b00101:SRA_M[31:0]={{5{ALU_DA[31]}},ALU_DA[31:5]};
		 5'b00110:SRA_M[31:0]={{6{ALU_DA[31]}},ALU_DA[31:6]};
		 5'b00111:SRA_M[31:0]={{7{ALU_DA[31]}},ALU_DA[31:7]};
		 5'b01000:SRA_M[31:0]={{8{ALU_DA[31]}},ALU_DA[31:8]};
		 5'b01001:SRA_M[31:0]={{9{ALU_DA[31]}},ALU_DA[31:9]};
		 5'b01010:SRA_M[31:0]={{10{ALU_DA[31]}},ALU_DA[31:10]};
		 5'b01011:SRA_M[31:0]={{11{ALU_DA[31]}},ALU_DA[31:11]};
		 5'b01100:SRA_M[31:0]={{12{ALU_DA[31]}},ALU_DA[31:12]};
		 5'b01101:SRA_M[31:0]={{13{ALU_DA[31]}},ALU_DA[31:13]};
		 5'b01110:SRA_M[31:0]={{14{ALU_DA[31]}},ALU_DA[31:14]};
		 5'b01111:SRA_M[31:0]={{15{ALU_DA[31]}},ALU_DA[31:15]};
		 5'b10000:SRA_M[31:0]={{16{ALU_DA[31]}},ALU_DA[31:16]};
		 5'b10001:SRA_M[31:0]={{17{ALU_DA[31]}},ALU_DA[31:17]};
		 5'b10010:SRA_M[31:0]={{18{ALU_DA[31]}},ALU_DA[31:18]};
		 5'b10011:SRA_M[31:0]={{19{ALU_DA[31]}},ALU_DA[31:19]};
		 5'b10100:SRA_M[31:0]={{20{ALU_DA[31]}},ALU_DA[31:20]};
		 5'b10101:SRA_M[31:0]={{21{ALU_DA[31]}},ALU_DA[31:21]};
		 5'b10110:SRA_M[31:0]={{22{ALU_DA[31]}},ALU_DA[31:22]};
		 5'b10111:SRA_M[31:0]={{23{ALU_DA[31]}},ALU_DA[31:23]};
		 5'b11000:SRA_M[31:0]={{24{ALU_DA[31]}},ALU_DA[31:24]};
		 5'b11001:SRA_M[31:0]={{25{ALU_DA[31]}},ALU_DA[31:25]};
		 5'b11010:SRA_M[31:0]={{26{ALU_DA[31]}},ALU_DA[31:26]};
		 5'b11011:SRA_M[31:0]={{27{ALU_DA[31]}},ALU_DA[31:27]};
		 5'b11100:SRA_M[31:0]={{28{ALU_DA[31]}},ALU_DA[31:28]};
		 5'b11101:SRA_M[31:0]={{29{ALU_DA[31]}},ALU_DA[31:29]};
		 5'b11110:SRA_M[31:0]={{30{ALU_DA[31]}},ALU_DA[31:30]};
		 5'b11111:SRA_M[31:0]={{31{ALU_DA[31]}},ALU_DA[31]}; 
         default: SRA_M[31:0]=ALU_DA[31:0]; 
      endcase
    end
    
    always@(*) //SHIFT
    begin
      case(Shiftctr)
         2'b00:shift_result=SLL_M;
         2'b01:shift_result=SRL_M;
         2'b10:shift_result=SRA_M;
         default: shift_result=ALU_DA; 
      endcase
    end

endmodule

三、仿真测试

单独的加法器和移位器的仿真请读者自行完成,在此不再赘述。
将改进后的ALU模块的代码展示如下:

module alu(
	   ALU_DA,
       ALU_DB,
       ALU_CTL,
       ALU_ZERO,
       ALU_OverFlow,
       ALU_DC   
        );
	input [31:0]    ALU_DA;
    input [31:0]    ALU_DB;
    input [3:0]     ALU_CTL;
    output          ALU_ZERO;
    output          ALU_OverFlow;
    output reg [31:0]   ALU_DC;
		   
//********************generate ctr***********************
wire SUBctr;
wire SIGctr;
wire Ovctr;
wire [1:0] Opctr;
wire [1:0] Logicctr;
wire [1:0] Shiftctr;

assign SUBctr = (~ ALU_CTL[3]  & ~ALU_CTL[2]  & ALU_CTL[1]) | ( ALU_CTL[3]  & ~ALU_CTL[2]);
assign Opctr = ALU_CTL[3:2];
assign Ovctr = ALU_CTL[0] & ~ ALU_CTL[3]  & ~ALU_CTL[2] ;
assign SIGctr = ALU_CTL[0];
assign Logicctr = ALU_CTL[1:0]; 
assign Shiftctr = ALU_CTL[1:0]; 

//********************************************************

//*********************logic op***************************
reg [31:0] logic_result;

always@(*) begin
    case(Logicctr)
	2'b00:logic_result = ALU_DA & ALU_DB;
	2'b01:logic_result = ALU_DA | ALU_DB;
	2'b10:logic_result = ALU_DA ^ ALU_DB;
	2'b11:logic_result = ~(ALU_DA | ALU_DB);
	endcase
end 

//********************************************************
//************************shift op************************
wire [4:0]     ALU_SHIFT;
wire [31:0] shift_result;
assign ALU_SHIFT=ALU_DB[4:0];

Shifter Shifter(.ALU_DA(ALU_DA),
                .ALU_SHIFT(ALU_SHIFT),
				.Shiftctr(Shiftctr),
				.shift_result(shift_result));

//********************************************************
//************************add sub op**********************
wire [31:0] BIT_M,XOR_M;
wire ADD_carry,ADD_OverFlow;
wire [31:0] ADD_result;

assign BIT_M={32{SUBctr}};
assign XOR_M=BIT_M^ALU_DB;

Adder Adder(.A(ALU_DA),
            .B(XOR_M),
			.Cin(SUBctr),
			.ALU_CTL(ALU_CTL),
			.ADD_carry(ADD_carry),
			.ADD_OverFlow(ADD_OverFlow),
			.ADD_zero(ALU_ZERO),
			.ADD_result(ADD_result));

assign ALU_OverFlow = ADD_OverFlow & Ovctr;

//********************************************************
//**************************slt op************************
wire [31:0] SLT_result;
wire LESS_M1,LESS_M2,LESS_S,SLT_M;

assign LESS_M1 = ADD_carry ^ SUBctr;
assign LESS_M2 = ADD_OverFlow ^ ADD_result[31];
assign LESS_S = (SIGctr==1'b0)?LESS_M1:LESS_M2;
assign SLT_result = (LESS_S)?32'h00000001:32'h00000000;

//********************************************************
//**************************ALU result********************
always @(*) 
begin
  case(Opctr)
     2'b00:ALU_DC<=ADD_result;
     2'b01:ALU_DC<=logic_result;
     2'b10:ALU_DC<=SLT_result;
     2'b11:ALU_DC<=shift_result; 
  endcase
end

//********************************************************
endmodule


//********************************************************
//*************************shifter************************
//`define BEHAVOR
module Shifter(input [31:0] ALU_DA,
               input [4:0] ALU_SHIFT,
			   input [1:0] Shiftctr,
			   output reg [31:0] shift_result);
			   
`ifdef BEHAVOR
     wire [5:0] shift_n;
	 assign shift_n = 6'd32 - Shiftctr;
     always@(*) begin
	   case(Shiftctr)
	   2'b00:shift_result = ALU_DA << ALU_SHIFT;
	   2'b01:shift_result = ALU_DA >> ALU_SHIFT;
	   2'b10:shift_result = ({32{ALU_DA[31]}} << shift_n) | (ALU_DA >> ALU_SHIFT);
	   default:shift_result = ALU_DA;
	   endcase
	 end
`else
    reg[31:0] SLL_M,SRL_M,SRA_M;
    always@(*)//SRL
    begin
      case(ALU_SHIFT)
         5'b00000:SRL_M[31:0]=ALU_DA[31:0]; 
         5'b00001:SRL_M[31:0]={1'd0 ,ALU_DA[31:1]};
         5'b00010:SRL_M[31:0]={2'd0 ,ALU_DA[31:2]};
         5'b00011:SRL_M[31:0]={3'd0 ,ALU_DA[31:3]}; 		 
         5'b00100:SRL_M[31:0]={4'd0 ,ALU_DA[31:4]}; 	
		 5'b00101:SRL_M[31:0]={5'd0 ,ALU_DA[31:5]}; 	
		 5'b00110:SRL_M[31:0]={6'd0 ,ALU_DA[31:6]}; 	
		 5'b00111:SRL_M[31:0]={7'd0 ,ALU_DA[31:7]}; 	
		 5'b01000:SRL_M[31:0]={8'd0 ,ALU_DA[31:8]}; 	
		 5'b01001:SRL_M[31:0]={9'd0 ,ALU_DA[31:9]}; 	
		 5'b01010:SRL_M[31:0]={10'd0,ALU_DA[31:10]}; 	
		 5'b01011:SRL_M[31:0]={11'd0,ALU_DA[31:11]}; 	
		 5'b01100:SRL_M[31:0]={12'd0,ALU_DA[31:12]}; 	
		 5'b01101:SRL_M[31:0]={13'd0,ALU_DA[31:13]}; 	
		 5'b01110:SRL_M[31:0]={14'd0,ALU_DA[31:14]}; 	
		 5'b01111:SRL_M[31:0]={15'd0,ALU_DA[31:15]}; 	
		 5'b10000:SRL_M[31:0]={16'd0,ALU_DA[31:16]}; 	
		 5'b10001:SRL_M[31:0]={17'd0,ALU_DA[31:17]}; 	
		 5'b10010:SRL_M[31:0]={18'd0,ALU_DA[31:18]}; 	
		 5'b10011:SRL_M[31:0]={19'd0,ALU_DA[31:19]}; 	
		 5'b10100:SRL_M[31:0]={20'd0,ALU_DA[31:20]}; 	
		 5'b10101:SRL_M[31:0]={21'd0,ALU_DA[31:21]}; 	
		 5'b10110:SRL_M[31:0]={22'd0,ALU_DA[31:22]}; 	
		 5'b10111:SRL_M[31:0]={23'd0,ALU_DA[31:23]}; 	
		 5'b11000:SRL_M[31:0]={24'd0,ALU_DA[31:24]}; 	
		 5'b11001:SRL_M[31:0]={25'd0,ALU_DA[31:25]}; 
		 5'b11010:SRL_M[31:0]={26'd0,ALU_DA[31:26]}; 
		 5'b11011:SRL_M[31:0]={27'd0,ALU_DA[31:27]}; 
		 5'b11100:SRL_M[31:0]={28'd0,ALU_DA[31:28]}; 
		 5'b11101:SRL_M[31:0]={29'd0,ALU_DA[31:29]}; 
		 5'b11110:SRL_M[31:0]={30'd0,ALU_DA[31:30]};  
         5'b11111:SRL_M[31:0]={31'd0,ALU_DA[31]}; 
         default: SRL_M[31:0]=ALU_DA[31:0]; 
      endcase
    end
    
    always@(*) //SLL
    begin
      case(ALU_SHIFT)
         5'b00000:SLL_M[31:0]=ALU_DA[31:0]; 
         5'b00001:SLL_M[31:0]={ALU_DA[30:0],1'd0};
         5'b00010:SLL_M[31:0]={ALU_DA[29:0],2'd0}; 
         5'b00011:SLL_M[31:0]={ALU_DA[28:0],3'd0};
		 5'b00100:SLL_M[31:0]={ALU_DA[27:0],4'd0};
		 5'b00101:SLL_M[31:0]={ALU_DA[26:0],5'd0};
		 5'b00110:SLL_M[31:0]={ALU_DA[25:0],6'd0};
		 5'b00111:SLL_M[31:0]={ALU_DA[24:0],7'd0};
		 5'b01000:SLL_M[31:0]={ALU_DA[23:0],8'd0};
		 5'b01001:SLL_M[31:0]={ALU_DA[22:0],9'd0};
		 5'b01010:SLL_M[31:0]={ALU_DA[21:0],10'd0};
		 5'b01011:SLL_M[31:0]={ALU_DA[20:0],11'd0};
		 5'b01100:SLL_M[31:0]={ALU_DA[19:0],12'd0};
		 5'b01101:SLL_M[31:0]={ALU_DA[18:0],13'd0};
		 5'b01110:SLL_M[31:0]={ALU_DA[17:0],14'd0};
		 5'b01111:SLL_M[31:0]={ALU_DA[16:0],15'd0};
		 5'b10000:SLL_M[31:0]={ALU_DA[15:0],16'd0};
		 5'b10001:SLL_M[31:0]={ALU_DA[14:0],17'd0};
		 5'b10010:SLL_M[31:0]={ALU_DA[13:0],18'd0};
		 5'b10011:SLL_M[31:0]={ALU_DA[12:0],19'd0};
		 5'b10100:SLL_M[31:0]={ALU_DA[11:0],20'd0};
		 5'b10101:SLL_M[31:0]={ALU_DA[10:0],21'd0};
		 5'b10110:SLL_M[31:0]={ALU_DA[9:0] ,22'd0};
		 5'b10111:SLL_M[31:0]={ALU_DA[8:0] ,23'd0};
		 5'b11000:SLL_M[31:0]={ALU_DA[7:0] ,24'd0};
		 5'b11001:SLL_M[31:0]={ALU_DA[6:0] ,25'd0};
		 5'b11010:SLL_M[31:0]={ALU_DA[5:0] ,26'd0};
		 5'b11011:SLL_M[31:0]={ALU_DA[4:0] ,27'd0};
		 5'b11100:SLL_M[31:0]={ALU_DA[3:0] ,28'd0};
		 5'b11101:SLL_M[31:0]={ALU_DA[2:0] ,29'd0};
		 5'b11110:SLL_M[31:0]={ALU_DA[1:0] ,30'd0};
		 5'b11111:SLL_M[31:0]={ALU_DA[0],31'd0}; 
         default: SLL_M[31:0]=ALU_DA[31:0]; 
      endcase
    end
    
    always@(*) //SRA
    begin
      case(ALU_SHIFT)
         5'b00000:SRA_M[31:0]=ALU_DA[31:0]; 
         5'b00001:SRA_M[31:0]={{1{ALU_DA[31]}},ALU_DA[31:1]};
         5'b00010:SRA_M[31:0]={{2{ALU_DA[31]}},ALU_DA[31:2]}; 
         5'b00011:SRA_M[31:0]={{3{ALU_DA[31]}},ALU_DA[31:3]};
		 5'b00100:SRA_M[31:0]={{4{ALU_DA[31]}},ALU_DA[31:4]}; 
		 5'b00101:SRA_M[31:0]={{5{ALU_DA[31]}},ALU_DA[31:5]};
		 5'b00110:SRA_M[31:0]={{6{ALU_DA[31]}},ALU_DA[31:6]};
		 5'b00111:SRA_M[31:0]={{7{ALU_DA[31]}},ALU_DA[31:7]};
		 5'b01000:SRA_M[31:0]={{8{ALU_DA[31]}},ALU_DA[31:8]};
		 5'b01001:SRA_M[31:0]={{9{ALU_DA[31]}},ALU_DA[31:9]};
		 5'b01010:SRA_M[31:0]={{10{ALU_DA[31]}},ALU_DA[31:10]};
		 5'b01011:SRA_M[31:0]={{11{ALU_DA[31]}},ALU_DA[31:11]};
		 5'b01100:SRA_M[31:0]={{12{ALU_DA[31]}},ALU_DA[31:12]};
		 5'b01101:SRA_M[31:0]={{13{ALU_DA[31]}},ALU_DA[31:13]};
		 5'b01110:SRA_M[31:0]={{14{ALU_DA[31]}},ALU_DA[31:14]};
		 5'b01111:SRA_M[31:0]={{15{ALU_DA[31]}},ALU_DA[31:15]};
		 5'b10000:SRA_M[31:0]={{16{ALU_DA[31]}},ALU_DA[31:16]};
		 5'b10001:SRA_M[31:0]={{17{ALU_DA[31]}},ALU_DA[31:17]};
		 5'b10010:SRA_M[31:0]={{18{ALU_DA[31]}},ALU_DA[31:18]};
		 5'b10011:SRA_M[31:0]={{19{ALU_DA[31]}},ALU_DA[31:19]};
		 5'b10100:SRA_M[31:0]={{20{ALU_DA[31]}},ALU_DA[31:20]};
		 5'b10101:SRA_M[31:0]={{21{ALU_DA[31]}},ALU_DA[31:21]};
		 5'b10110:SRA_M[31:0]={{22{ALU_DA[31]}},ALU_DA[31:22]};
		 5'b10111:SRA_M[31:0]={{23{ALU_DA[31]}},ALU_DA[31:23]};
		 5'b11000:SRA_M[31:0]={{24{ALU_DA[31]}},ALU_DA[31:24]};
		 5'b11001:SRA_M[31:0]={{25{ALU_DA[31]}},ALU_DA[31:25]};
		 5'b11010:SRA_M[31:0]={{26{ALU_DA[31]}},ALU_DA[31:26]};
		 5'b11011:SRA_M[31:0]={{27{ALU_DA[31]}},ALU_DA[31:27]};
		 5'b11100:SRA_M[31:0]={{28{ALU_DA[31]}},ALU_DA[31:28]};
		 5'b11101:SRA_M[31:0]={{29{ALU_DA[31]}},ALU_DA[31:29]};
		 5'b11110:SRA_M[31:0]={{30{ALU_DA[31]}},ALU_DA[31:30]};
		 5'b11111:SRA_M[31:0]={{31{ALU_DA[31]}},ALU_DA[31]}; 
         default: SRA_M[31:0]=ALU_DA[31:0]; 
      endcase
    end
    
    always@(*) //SHIFT
    begin
      case(Shiftctr)
         2'b00:shift_result=SLL_M;
         2'b01:shift_result=SRL_M;
         2'b10:shift_result=SRA_M;
         default: shift_result=ALU_DA; 
      endcase
    end

`endif

endmodule

//*************************************************************
//***********************************adder*********************

//`define ALGORITHM
module Adder(input [31:0] A,
             input [31:0] B,
			 input Cin,
			 input [3:0] ALU_CTL,
			 output ADD_carry,
			 output ADD_OverFlow,
			 output ADD_zero,
			 output [31:0] ADD_result);

`ifdef ALGORITHM
    assign {ADD_carry,ADD_result}=A+B+Cin;
`else
    cla_adder32 cla_adder32_inst1(.A(A),
	                        .B(B),
							.cin(Cin),
							.cout(ADD_carry),
							.result(ADD_result));	
`endif

   assign ADD_zero = ~(|ADD_result);
   assign ADD_OverFlow=((ALU_CTL==4'b0001) & ~A[31] & ~B[31] & ADD_result[31]) 
                      | ((ALU_CTL==4'b0001) & A[31] & B[31] & ~ADD_result[31])
                      | ((ALU_CTL==4'b0011) & A[31] & ~B[31] & ~ADD_result[31]) 
					  | ((ALU_CTL==4'b0011) & ~A[31] & B[31] & ADD_result[31]);
endmodule

//************************************************************************************
//***************************************cla_adder************************************

4位CLA部件
module cla_4(p,g,c_in,c,gx,px);
input[3:0] p,g;
input c_in;
output[4:1] c;
output gx,px;

assign c[1] = p[0]&c_in | g[0];
assign c[2] = p[1]&p[0]&c_in | p[1]&g[0] | g[1];
assign c[3] = p[2]&p[1]&p[0]&c_in | p[2]&p[1]&g[0] | p[2]&g[1] | g[2];
assign c[4] = gx | px&c_in;

assign px = p[3]&p[2]&p[1]&p[0];
assign gx = g[3] | p[3]&g[2] | p[3]&p[2]&g[1] | p[3]&p[2]&p[1]&g[0];

endmodule


//32位全加器
module cla_adder32(A,B,cin,result,cout);

input [31:0] A;
input [31:0] B;
input cin;
output[31:0] result;
output cout;


进位产生信号,进位传递信号
wire[31:0] TAG,TAP;
wire[32:1] TAC;
wire[15:0] TAG_0,TAP_0;
wire[3:0] TAG_1,TAP_1;
wire[8:1] TAC_1;
wire[4:1] TAC_2;
 
assign result = A ^ B ^ {TAC[31:1],cin};  
assign TAG = A&B;
assign TAP = A|B;

cla_4 cla_0_0( .p(TAP[3:0]),  .g(TAG[3:0]),  .c_in(cin), .c(TAC[4:1]),  .gx(TAG_0[0]),.px(TAP_0[0]));
cla_4 cla_0_1( .p(TAP[7:4]),  .g(TAG[7:4]),  .c_in(TAC_1[1]),.c(TAC[8:5]),  .gx(TAG_0[1]),.px(TAP_0[1]));
cla_4 cla_0_2( .p(TAP[11:8]), .g(TAG[11:8]), .c_in(TAC_1[2]),.c(TAC[12:9]), .gx(TAG_0[2]),.px(TAP_0[2]));
cla_4 cla_0_3( .p(TAP[15:12]),.g(TAG[15:12]),.c_in(TAC_1[3]),.c(TAC[16:13]),.gx(TAG_0[3]),.px(TAP_0[3]));
cla_4 cla_0_4( .p(TAP[19:16]),.g(TAG[19:16]),.c_in(TAC_1[4]),.c(TAC[20:17]),.gx(TAG_0[4]),.px(TAP_0[4]));
cla_4 cla_0_5( .p(TAP[23:20]),.g(TAG[23:20]),.c_in(TAC_1[5]),.c(TAC[24:21]),.gx(TAG_0[5]),.px(TAP_0[5]));
cla_4 cla_0_6( .p(TAP[27:24]),.g(TAG[27:24]),.c_in(TAC_1[6]),.c(TAC[28:25]),.gx(TAG_0[6]),.px(TAP_0[6]));
cla_4 cla_0_7( .p(TAP[31:28]),.g(TAG[31:28]),.c_in(TAC_1[7]),.c(TAC[32:29]),.gx(TAG_0[7]),.px(TAP_0[7]));



cla_4 cla_1_0(.p(TAP_0[3:0]),  .g(TAG_0[3:0]),  .c_in(cin),.c(TAC_1[4:1]),  .gx(TAG_1[0]),.px(TAP_1[0]));


cla_4 cla_1_1(.p(TAP_0[7:4]),  .g(TAG_0[7:4]),  .c_in(TAC_1[4]),.c(TAC_1[8:5]),  .gx(TAG_1[1]),.px(TAP_1[1]));

assign TAG_1[3:2] = 2'b00;
assign TAP_1[3:2] = 2'b00;

cla_4 cla_2_0(.p(TAP_1[3:0]),   .g(TAG_1[3:0]),    .c_in(1'b0), .c(TAC_2[4:1]),  .gx(),.px());

assign cout = TAC_2[2];

endmodule

此处对改进后的CPU进行测试。
测试用例是上一篇文章中的跳转指令,移位运算指令和算数运算指令。
算数运算指令如下:

addi x1,x0,1
add x2,x1,x1
add x3,x2,x2
sub x3,x3,x1

addi x4,x0,-4
add x5,x4,x4
sub x6,x5,x3
sub x7,x6,x4
add x8,x7,x3

汇编器执行结果如下:
从零开始设计RISC-V处理器——ALU的优化

CPU执行结果如下:
从零开始设计RISC-V处理器——ALU的优化

移位运算指令如下:

addi x1,x0,0xff
addi x2,x0,4
sll x3,x1,x2
srl x4,x1,x2
sra x5,x1,x2

addi x6,x0,-0xff
sll x7,x6,x2
srl x8,x6,x2
sra x9,x6,x2

slli x11,x1,4
srli x12,x1,4
srai x13,x1,4

slli x11,x6,4
srli x12,x6,4
srai x13,x6,4

汇编器执行结果如下:
从零开始设计RISC-V处理器——ALU的优化
从零开始设计RISC-V处理器——ALU的优化

CPU执行结果如下:
从零开始设计RISC-V处理器——ALU的优化

跳转指令如下:

addi x1,x0,1
addi x2,x0,2
jal x31,label1
addi x3,x0,3
label1:
addi x4,x0,4
add x5,x2,x2
beq x4,x5,label2
addi x6,x0,6
label2:
bne x4,x5,label3
addi x7,x0,7
label3:
bne x7,x6,label4
addi x8,x0,8
label4:
addi x9,x0,0x30
jalr x10,x9,12
addi x11,x0,11
addi x12,x0,-12
addi x13,x0,-13
blt x13,x12,label5
addi x14,x0,-14
label5:
bltu x13,x12,label6
addi x15,x0,-15
label6:
bltu x12,x13,label7
addi x16,x0,-16
label7:
bge x12,x13,label8
addi x17,x0,-17
label8:
bge x1,x2,label9
addi x18,x0,-18
label9:
bgeu x12,x13,label10
addi x19,x0,-19
label10:
bgeu x13,x12,label11
addi x20,x0,-20
label11:
addi x21,x0,-20
addi x22,x0,-20
bge x21,x22,label12
addi x23,x0,-23
label12:
addi x24,x0,-24

汇编器执行结果如下:
从零开始设计RISC-V处理器——ALU的优化
从零开始设计RISC-V处理器——ALU的优化
从零开始设计RISC-V处理器——ALU的优化

CPU执行结果如下:
从零开始设计RISC-V处理器——ALU的优化
从零开始设计RISC-V处理器——ALU的优化

总结

以上就是今天要讲的内容,对ALU的加法器和移位两个运算部件进行了详细的设计。下一篇文章开始进行流水线处理器的设计。
上一篇:JAVA基本语法-switch--从键盘上输入year、month和day,要求通过程序输出该日期为该年的第几天


下一篇:2022年规划