RISC-V is an open standard instruction set architecture (ISA) based on established reduced instruction set computer (RISC) principles. As a RISC architecture, the RISC-V ISA is a load–store architecture. Its floating-point instructions use IEEE 754 floating-point. Notable features of the RISC-V ISA include instruction bit-field locations chosen to simplify the use of multiplexers in a CPU, a design that is architecturally neutral, and a fixed location for the sign bit of immediate values to speed up sign extension.
The instruction set is designed for a wide range of uses. The base instruction set has a fixed length of 32-bit naturally aligned instructions, and the ISA supports variable-length extensions where each instruction can be any number of 16-bit parcels in length. Subsets support small embedded systems, personal computers, supercomputers with vector processors, and warehouse-scale 19-inch rack-mounted parallel computers.
Introduction to RISC-V basic keywords
Why does a computer need a RISC or CISC ISA?
For any program or software to run on computer hardware, it has to communicate with the layout (the chip present in the system). This requires a sequence of steps. First, the high-level language program is converted into an assembly language program, which follows a particular architecture (RISC-V in this case). The assembly program is then converted into a machine language program that the computer can execute. Between the architecture and the layout there is a hardware description language (HDL), which is used to describe the hardware that implements the architecture.
The image below shows the whole process of program or application execution.
Applications to Hardware
To run any application on a computer system, the following process is followed.
The operating system, compiler, and assembler together are termed system software.
The assembly language program depends on the processor and its architecture: every architecture has its own assembly language. Converting an assembly language program into a machine language program follows a specific process, elaborated in the flowchart below.
Detailed description of the course content
The course is an elaborate study of the instruction types present in the RISC-V architecture. The types of instructions present in the RISC-V architecture are listed here:
- Pseudo instructions - Examples of pseudo instructions are mv, li, ret.
- Base integer instructions - The nomenclature for these instructions is RV64I, where RV stands for RISC-V and 64 indicates 64-bit integer registers. A few examples of base integer instructions are lui, addi, jalr, auipc, ld.
- Multiply extension - These instructions are used when a multiply or divide operation has to be performed on numbers. The nomenclature for these instructions is RV64M; combined with the base integer instructions, the name becomes RV64IM.
- Single- and double-precision floating-point extensions - These instruction sets (RV64F and RV64D) are used when add/subtract/multiply/divide is performed on floating-point numbers. A few examples are flw, fsd, fadd.s, fmul.s, fdiv.s, fcvt.s.d, fmv.x.d. A CPU that supports all of the above is termed RV64IMFD.
- Application binary interface - The ABI lets application programmers access processor resources such as registers through standard names. A few examples are a0, sp, s0.
- Memory allocation and stack pointer - Transfer of data between memory and registers, and use of the stack pointer. Examples: ra, 24(sp), s0, 16(sp), sp, 32.
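The naming scheme above (a base ISA plus one letter per extension) can be illustrated with a small parser. This is a minimal sketch, not part of the course material: the `IsaInfo` struct and `parse_isa` function are hypothetical names, and only the I, M, F and D letters discussed above are recognised (real ISA strings can contain many more extensions and version numbers).

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical summary of an ISA name such as "RV64IMFD". */
typedef struct {
    int xlen;     /* register width: 32 or 64 */
    bool has_i;   /* base integer instructions */
    bool has_m;   /* multiply/divide extension */
    bool has_f;   /* single-precision floating point */
    bool has_d;   /* double-precision floating point */
} IsaInfo;

/* Parses names of the form "RV<xlen><letters>", case-insensitive.
   Returns false if the "RV" prefix or the XLEN is not recognised.
   Letters other than I, M, F and D are ignored in this sketch. */
bool parse_isa(const char *name, IsaInfo *out) {
    if (strlen(name) < 4 || toupper((unsigned char)name[0]) != 'R' ||
        toupper((unsigned char)name[1]) != 'V')
        return false;
    char *rest;
    long xlen = strtol(name + 2, &rest, 10);   /* digits after "RV" */
    if (xlen != 32 && xlen != 64)
        return false;
    *out = (IsaInfo){ .xlen = (int)xlen };     /* all flags start false */
    for (; *rest; rest++) {
        switch (toupper((unsigned char)*rest)) {
        case 'I': out->has_i = true; break;
        case 'M': out->has_m = true; break;
        case 'F': out->has_f = true; break;
        case 'D': out->has_d = true; break;
        default: break;                        /* not modelled here */
        }
    }
    return true;
}
```

For example, `parse_isa("RV64IMFD", &info)` reports XLEN 64 with all four extensions set.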
Labwork for RISC-V software toolchain
C Program to compute sum from 1 to N.
Here I wrote a C program to calculate the sum of the first n natural numbers, with n taken as input from the user. The C code is as follows:
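#include <stdio.h>
int main()
{
    int n, sum = 0;
    printf("Enter n: ");
    if (scanf("%d", &n) != 1)      /* reject non-numeric input */
        return 1;
    for (int i = 1; i <= n; i++)   /* add 1, 2, ..., n */
        sum = sum + i;
    printf("sum of %d numbers is %d\n", n, sum);
    return 0;                      /* 0 signals success */
}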
To compile and run the program, I used the following commands:
gcc file_name.c
./a.out
This is the output I got when running the program on my system. The image shows the sum of the first 100 natural numbers.
RISC-V GCC compile and disassemble
Here I compared the RISC-V instructions generated at different optimization levels. First I used the command:
/home/divyam/riscv_toolchain/riscv64-unknown-elf-gcc-8.3.0-2019.08.0-x86_64-linux-ubuntu14/bin/riscv64-unknown-elf-gcc -O1 -mabi=lp64 -march=rv64i -o sum1ton.o sum1ton.c
The full disassembly listing is very long, so to view only the part of interest (the main function) I used the following command:
riscv64-unknown-elf-objdump -d sum1ton.o | less
The following instructions were obtained
After this I entered the command
/home/divyam/riscv_toolchain/riscv64-unknown-elf-gcc-8.3.0-2019.08.0-x86_64-linux-ubuntu14/bin/riscv64-unknown-elf-gcc -Ofast -mabi=lp64 -march=rv64i -o sum1ton.o sum1ton.c
Viewing the disassembly with the same objdump | less command, I got the following result:
Spike simulation and debug
To run the same program on RISC-V, I used the Spike ISA simulator with the proxy kernel (pk):
/home/divyam/riscv_toolchain/riscv64-unknown-elf-gcc-8.3.0-2019.08.0-x86_64-linux-ubuntu14/bin/spike pk sum1ton.o
To debug the program at the assembly level, I ran Spike in interactive debug mode (-d):
/home/divyam/riscv_toolchain/riscv64-unknown-elf-gcc-8.3.0-2019.08.0-x86_64-linux-ubuntu14/bin/spike -d pk sum1ton.o
Following are the debug commands I used:
until pc 0 1000b0   // run core 0 until its PC reaches address 0x1000b0
reg 0 a2            // display the contents of register a2 on core 0
lui                 // not a command: the lui (load upper immediate) instruction observed in the debug output
q                   // quit the debugger
reg 0 sp            // display the value stored in the stack pointer (sp)
addi                // not a command: the addi (add immediate) instruction observed in the debug output
The screenshot below shows these commands in use.
The image below summarizes the 64-bit (RV64I) instructions used in RISC-V.
Integer number representation
64-bit number system for unsigned numbers
First, let us get familiar with a few basic terms:
Doubleword:- A 64-bit number is called a doubleword in processor terminology.
Word:- A 32-bit number.
Byte:- A group of 8 bits.
The total number of patterns that can be formed with n bits is 2^n, so the largest unsigned value is 2^n - 1.
A RISC-V doubleword can therefore represent unsigned numbers from 0 to 2^64 - 1.
The following images show the terminology, the range, and binary-to-decimal conversion.
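Binary-to-decimal conversion can be done by reading the bits from the MSB down, doubling the running value and adding each new bit. The sketch below (hypothetical helper names `bin_to_u64` and `max_unsigned`) shows this for up to 64 bits, together with the 2^n - 1 maximum for n bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Converts a string of '0'/'1' characters (MSB first, 1 to 64 digits)
   to its unsigned value. Each step shifts the value left by one bit
   (multiply by 2) and adds the next bit. Returns false on bad input. */
bool bin_to_u64(const char *bits, uint64_t *out) {
    uint64_t value = 0;
    int n = 0;
    for (; *bits; bits++, n++) {
        if (n == 64 || (*bits != '0' && *bits != '1'))
            return false;                 /* too long or not a bit */
        value = (value << 1) | (uint64_t)(*bits - '0');
    }
    if (n == 0)
        return false;                     /* empty string */
    *out = value;
    return true;
}

/* Largest unsigned value with n bits (1 <= n <= 64): 2^n - 1.
   There are 2^n patterns and one of them is 0. */
uint64_t max_unsigned(int n) {
    return n == 64 ? UINT64_MAX : ((uint64_t)1 << n) - 1;
}
```

For example, "1010" converts to 10, and `max_unsigned(64)` is 18446744073709551615.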
64-bit number system for signed numbers
Negative numbers are represented using 2's complement, as shown in the image below.
The MSB is used for sign representation: if MSB = 1 the number is negative; if MSB = 0 the number is positive (or zero).
The image below describes the two methods used to convert negative binary numbers into decimal numbers.
The range of positive and negative numbers is shown below; for 64 bits it is -2^63 to 2^63 - 1.
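The two conversion methods for negative numbers can be written out directly. This is a sketch assuming the 64-bit pattern is held in a `uint64_t`; the function names are hypothetical. Method 1 inverts the bits and adds 1 to find the magnitude; method 2 gives the MSB a weight of -2^63 and every other bit its usual positive weight. Both cover the full range -2^63 to 2^63 - 1.

```c
#include <stdint.h>

/* Method 1: if the MSB is 1, invert all bits and add 1 to get the
   magnitude, then negate. */
int64_t decode_invert_add_one(uint64_t bits) {
    if ((bits >> 63) == 0)
        return (int64_t)bits;            /* MSB = 0: non-negative */
    uint64_t magnitude = ~bits + 1;      /* between 1 and 2^63 */
    /* Negate without overflow: -m == -(m - 1) - 1, and m - 1 fits. */
    return -(int64_t)(magnitude - 1) - 1;
}

/* Method 2: the MSB has weight -2^63; bits 62..0 keep their
   positive weights. */
int64_t decode_msb_weight(uint64_t bits) {
    int64_t low = (int64_t)(bits & INT64_MAX);   /* bits 62..0 */
    if ((bits >> 63) == 0)
        return low;
    return low - INT64_MAX - 1;                  /* low + (-2^63) */
}
```

Both functions agree for every bit pattern; for example, 0xFFFFFFFFFFFFFFFB decodes to -5 and 0x8000000000000000 to -2^63.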
Lab for signed and unsigned numbers
Here we look at the range of unsigned and signed numbers.
The following code prints the highest unsigned number:
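#include <stdio.h>
#include <limits.h>
int main ()
{
    /* pow(2,64) - 1 is not representable as a double (it rounds to 2^64), and converting
       2^64 to unsigned long long is out of range; ULLONG_MAX is 2^64 - 1 on these targets. */
    unsigned long long int max = ULLONG_MAX;
    printf("highest number represented by unsigned long long int is %llu\n", max);
    return 0;
}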
To compile and run the code, I used the following commands in the terminal:
/home/divyam/riscv_toolchain/riscv64-unknown-2019.08.0-x86_64-linux-ubuntu14/bin/riscv64-unknown-elf-gcc -Ofast -mabi=lp64 -march=rv64i -o unsigned.o unsigned.c
/home/divyam/riscv_toolchain/riscv64-unknown-2019.08.0-x86_64-linux-ubuntu14/bin/spike pk unsigned.o
The output is shown in the image below.
The following C code was used to get the lowest (most negative) number:
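#include <stdio.h>
#include <math.h>
int main ()
{
    /* The lowest long long value is -2^63 (-2^64 is out of range); -2^63 is
       exactly representable as a double, so this cast is well defined. */
    long long int min = (long long int) (pow(2,63) * -1);
    printf("lowest number represented by long long int is %lld\n", min);
    return 0;
}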
The following commands were used to compile and run the code:
/home/divyam/riscv_toolchain/riscv64-unknown-2019.08.0-x86_64-linux-ubuntu14/bin/riscv64-unknown-elf-gcc -Ofast -mabi=lp64 -march=rv64i -o signed.o signed.c
/home/divyam/riscv_toolchain/riscv64-unknown-2019.08.0-x86_64-linux-ubuntu14/bin/spike pk signed.o
The image below shows the output obtained.
Next we look at the full signed range, the lowest negative and the highest positive number. The code is given below:
#include <stdio.h>
#include <math.h>
int main() {
    /* Incorrect version: the (int) cast squeezes the values into a 32-bit int,
       which cannot hold +/-2^63, so the printed range is wrong. */
    long long int max = (int) (pow(2,63) -1);
    long long int min = (int) (pow(2,63) * -1);
    printf("highest number represented by long long int is %lld\n", max);
    printf("lowest number represented by long long int is %lld\n", min);
    return 0;
}
Here we can see that the range is not correct, because the values were cast to a 32-bit int before being stored.
The correct code and output are given below:
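#include <stdio.h>
#include <math.h>
int main() {
    /* -2^63 is exactly representable as a double and fits in long long,
       so this (long long int) cast is well defined. */
    long long int min = (long long int) (pow(2,63) * -1);
    /* pow(2,63) - 1 rounds back to 2^63 in double precision, which does not
       fit in long long, so derive the maximum from the minimum instead. */
    long long int max = -(min + 1);
    printf("highest number represented by long long int is %lld\n", max);
    printf("lowest number represented by long long int is %lld\n", min);
    return 0;
}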
Application Binary Interface
Introduction to application binary interface - The way a user program accesses architecture resources through system calls is called the application binary interface (ABI); it is also called the system call interface. When an application programmer wants to access hardware resources, this is done via registers.
The image below shows the different levels between the user and the layout.
A RISC-V processor has 32 registers, whose width is defined by XLEN: XLEN is 32 bits for RV32 and 64 bits for RV64.
Memory Allocation for Double words
RV64 RISC-V has 32 registers, each 64 bits wide. There are two ways in which data can be loaded into a register:
- Direct loading - The data is loaded directly into the register. The image below shows this method.
- Via memory - Since RISC-V has a limited number of registers, the data is first stored in memory and then transferred to registers. The image below shows this method.
Little-endian method - RISC-V uses little-endian byte order to store data in memory, i.e. the least significant byte is stored at the lowest memory address and the more significant bytes fill successively higher addresses. A pictorial representation is shown in the image below.
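Little-endian ordering can be made concrete with a byte array standing in for memory. In this sketch (hypothetical helpers `store_le64` and `load_le64`), storing 0x0102030405060708 puts 0x08 at the lowest address and 0x01 at the highest.

```c
#include <stdint.h>

/* Stores a 64-bit value little-endian: mem[0] (lowest address) gets the
   least significant byte, mem[7] the most significant byte. */
void store_le64(uint8_t mem[8], uint64_t value) {
    for (int i = 0; i < 8; i++)
        mem[i] = (uint8_t)(value >> (8 * i));
}

/* Reads the 8 bytes back, rebuilding the value from the highest
   address (most significant byte) down to the lowest. */
uint64_t load_le64(const uint8_t mem[8]) {
    uint64_t value = 0;
    for (int i = 7; i >= 0; i--)
        value = (value << 8) | mem[i];
    return value;
}
```

Writing the bytes out explicitly like this gives the same layout regardless of the byte order of the machine running the code.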
Load, Add and Store Instructions with Examples
Here I learned how data is transferred from memory to a register, how an add operation is performed on the data, and how the result is transferred from the register back to memory. The following instructions perform these operations:
ld x8, 16(x23)    // ld = load doubleword: load the 64-bit value at memory address x23 + 16 (x23 is the base register, 16 the byte offset) into the destination register x8.
add x8, x24, x8   // add the contents of x24 and x8 and store the sum in x8.
sd x8, 8(x23)     // sd = store doubleword: store the contents of x8 to memory address x23 + 8.
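The three instructions can be traced with a toy software model. This is a sketch only: the `Machine` struct, its 64-byte memory and the `exec_*` helpers are hypothetical, memory accesses are not bounds-checked, and memory is little-endian as described above. In every case the effective address is base register + offset.

```c
#include <stdint.h>

/* Toy model: 32 registers and a small byte-addressed data memory. */
typedef struct {
    uint64_t x[32];     /* x0..x31; x0 is always 0 */
    uint8_t mem[64];    /* data memory, byte addresses 0..63 */
} Machine;

/* Little-endian 64-bit memory access (no bounds check). */
static uint64_t mem_ld(const Machine *m, uint64_t addr) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | m->mem[addr + i];
    return v;
}
static void mem_sd(Machine *m, uint64_t addr, uint64_t v) {
    for (int i = 0; i < 8; i++)
        m->mem[addr + i] = (uint8_t)(v >> (8 * i));
}

/* ld rd, imm(rs1): rd = Mem[rs1 + imm] */
void exec_ld(Machine *m, int rd, int64_t imm, int rs1) {
    uint64_t v = mem_ld(m, m->x[rs1] + (uint64_t)imm);
    if (rd != 0)
        m->x[rd] = v;
}

/* add rd, rs1, rs2: rd = rs1 + rs2 */
void exec_add(Machine *m, int rd, int rs1, int rs2) {
    if (rd != 0)
        m->x[rd] = m->x[rs1] + m->x[rs2];
}

/* sd rs2, imm(rs1): Mem[rs1 + imm] = rs2 */
void exec_sd(Machine *m, int rs2, int64_t imm, int rs1) {
    mem_sd(m, m->x[rs1] + (uint64_t)imm, m->x[rs2]);
}
```

For instance, with x23 = 8, x24 = 5 and the doubleword 37 at address 24, running `exec_ld(&m, 8, 16, 23)`, `exec_add(&m, 8, 24, 8)` and `exec_sd(&m, 8, 8, 23)` leaves 42 in x8 and writes 42 to the doubleword at address 16.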
The two images below show the whole process.
The picture above also shows which bits encode which part of the assembly instruction. Every instruction in the base RISC-V ISA is 32 bits long.
Concluding 32 registers and their respective ABI names
There are the following types of instructions:
- R-type:- These instructions operate on registers only.
- I-type:- These instructions contain an immediate and operate on registers.
- S-type:- Store instructions.
As we can observe, 5 bits are dedicated to each register field in the machine code. Since 2^5 = 32, this is why the RISC-V architecture has 32 registers.
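The 5-bit register fields can be seen by packing and unpacking an R-type instruction. This sketch (hypothetical `RType`, `encode_r` and `decode_r`) uses the standard R-type layout; with opcode 0110011 and funct3 = funct7 = 0, the `add x8, x24, x8` from the example above encodes to 0x008C0433.

```c
#include <stdint.h>

/* R-type layout (bit positions in the 32-bit word):
   funct7[31:25] rs2[24:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0].
   Each register field is 5 bits, so it can name 2^5 = 32 registers. */
typedef struct {
    unsigned opcode, rd, funct3, rs1, rs2, funct7;
} RType;

uint32_t encode_r(RType f) {
    return (uint32_t)(f.funct7 & 0x7F) << 25 |
           (uint32_t)(f.rs2 & 0x1F) << 20 |
           (uint32_t)(f.rs1 & 0x1F) << 15 |
           (uint32_t)(f.funct3 & 0x7) << 12 |
           (uint32_t)(f.rd & 0x1F) << 7 |
           (uint32_t)(f.opcode & 0x7F);
}

RType decode_r(uint32_t instr) {
    return (RType){
        .opcode = instr & 0x7F,
        .rd     = (instr >> 7) & 0x1F,
        .funct3 = (instr >> 12) & 0x7,
        .rs1    = (instr >> 15) & 0x1F,
        .rs2    = (instr >> 20) & 0x1F,
        .funct7 = (instr >> 25) & 0x7F,
    };
}
```

Masking each register field with 0x1F is exactly the 5-bit limit: register numbers above 31 cannot be expressed.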
The RISC-V instructions are classified into the types shown in the table below.
Labwork using ABI function calls
Study new algorithm for sum 1 to N using ASM
Here we apply the knowledge of the instructions we became familiar with in the previous section. Part of the functionality moves from the C program into an assembly language program, and the end result is returned from the assembly program to the C program. A pictorial view of this method is shown below.
To apply this method, we follow the algorithm shown in the picture below.
Review ASM function call
Here I modified my C code to implement the method discussed in the previous section. The modified C code is given below.
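#include <stdio.h>
/* Implemented in assembly (load, below); called with x = 0 it returns 0 + 1 + ... + (y - 1). */
extern int load(int x, int y);
int main ()
{
    int result = 0;
    int count = 9;
    result = load(0x0, count + 1);   /* sum of 1..count */
    printf ("Sum of number from 1 to %d is %d\n", count, result);
    return 0;
}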
I also wrote an assembly language program to execute the algorithm; its code is given below.
.section .text
.global load
.type load, @function
load:
add a4, a0, zero //Initialize sum register a4 with 0x0
add a2, a0, a1 //store count of 10 in register a2. Register a1 is loaded with 0xa (decimal 10) from main
add a3, a0, zero //initialize intermediate sum register a3 by 0
loop: add a4, a3, a4 //Incremental addition
addi a3, a3, 1 //Increment intermediate register by 1
blt a3, a2, loop //If a3 is less than a2, branch to label named <loop>
add a0, a4, zero //Store final result to register a0 so that it can be read by main program
ret
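To check what the assembly routine computes, it can be transcribed into C line by line. This is a hypothetical reference model (`load_model` is not part of the lab files), useful for comparing against the simulator output. The `do`/`while` form mirrors the assembly: the loop body runs once before the `blt` check at the bottom.

```c
/* Line-by-line C model of the assembly routine `load`:
   a0 and a1 are the arguments; the result is returned in a0. */
int load_model(int a0, int a1) {
    int a4 = a0;          /* add a4, a0, zero : sum = 0 (a0 is 0 here) */
    int a2 = a0 + a1;     /* add a2, a0, a1   : loop limit (10)        */
    int a3 = a0;          /* add a3, a0, zero : counter = 0            */
    do {
        a4 = a3 + a4;     /* loop: add a4, a3, a4 */
        a3 = a3 + 1;      /* addi a3, a3, 1       */
    } while (a3 < a2);    /* blt a3, a2, loop     */
    return a4;            /* add a0, a4, zero ; ret */
}
```

With the arguments used in `main` (0 and 10), `load_model(0, 10)` returns 45, the sum of 1 to 9.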
Simulate new C program with function call
Here I ran the modified C and assembly language code. The commands are similar to the ones used before; they can be seen in the images below.
Lab to run C program on RISC-V CPU
Here we have a RISC-V CPU written in Verilog, and we create a testbench for it. The C program, converted to hex format, is then loaded into the RISC-V CPU and the output is displayed. The whole process is described below.
To run the program in the terminal, use the following commands:
chmod 777 rv32im.sh
./rv32im.sh
The image below shows the output displayed in ubuntu terminal.
Combinational Logic in TL-verilog using Makerchip.
Introduction to Logic gates
Logic gates are the fundamental building blocks of a circuit. Here I learned how to implement logic gates in TL-Verilog; the table below lists the code for each logic gate.
A full adder circuit made up of logic gates.
An adder circuit made using logic gates.
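The full adder's gate-level equations are sum = a XOR b XOR cin and carry = (a AND b) OR (cin AND (a XOR b)). The sketch below evaluates them in C with bitwise operators, assuming each input is 0 or 1; `full_adder` and `AdderOut` are hypothetical names.

```c
/* Outputs of a one-bit full adder. */
typedef struct {
    unsigned sum, carry;
} AdderOut;

/* Gate-level full adder; a, b and cin are assumed to be 0 or 1. */
AdderOut full_adder(unsigned a, unsigned b, unsigned cin) {
    unsigned p = a ^ b;               /* XOR gate */
    AdderOut r;
    r.sum = p ^ cin;                  /* second XOR gate */
    r.carry = (a & b) | (p & cin);    /* two AND gates into an OR gate */
    return r;
}
```

For every input combination, sum + 2 * carry equals a + b + cin.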
Basic Mux implementation & Introduction to makerchip
A basic 2x1 mux is made using the following statement. It uses the ternary operator (? :), which behaves like an if/else statement in a C program.
assign f = s ? x1 : x2;
The image below shows a 4x1 mux implemented using 2x1 muxes, together with its Verilog code.
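The same mux structure can be modelled in C with the ternary operator. In this sketch (hypothetical `mux2` and `mux4`), the 4x1 mux is built from three 2x1 muxes: select bit 0 picks within each pair of inputs and select bit 1 picks between the pairs. The exact wiring in the image may differ.

```c
/* 2x1 mux: same behaviour as the Verilog `assign f = s ? x1 : x2;` */
unsigned mux2(unsigned s, unsigned x1, unsigned x2) {
    return s ? x1 : x2;
}

/* 4x1 mux from three 2x1 muxes; sel is a 2-bit select. */
unsigned mux4(unsigned sel, unsigned in0, unsigned in1,
              unsigned in2, unsigned in3) {
    unsigned s0 = sel & 1u;
    unsigned s1 = (sel >> 1) & 1u;
    unsigned lo = mux2(s0, in1, in0);   /* sel[0] ? in1 : in0 */
    unsigned hi = mux2(s0, in3, in2);   /* sel[0] ? in3 : in2 */
    return mux2(s1, hi, lo);            /* sel[1] ? hi : lo   */
}
```

For example, `mux4(2, 10, 11, 12, 13)` selects in2 and returns 12.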
Introduction to makerchip
- Search for Makerchip in your web browser and launch the Makerchip IDE.
- Go to Learn, click on Examples and select FPGA multiplier.
Inverter Gate on makerchip
Vector of 5 bits
Mux with single bit
Mux with vector input
Combinational Calculator
Sequential Logic
Introduction to sequential logic & counter lab
A sequential circuit essentially consists of combinational logic plus clocked storage elements (flip-flops). Value transitions take place on either the positive or the negative edge of the clock. The image below describes the basic idea of a sequential circuit.
Fibonacci Series
The image below shows how the circuit for generating the Fibonacci series is implemented.
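A cycle-by-cycle software model helps show how the previous two values feed back. This sketch assumes the circuit computes num = reset ? 1 : (num one cycle ago) + (num two cycles ago), a common form of this lab that is not spelled out in the text above; `fibonacci_cycles` is a hypothetical name, and the two local variables play the role of the flip-flops.

```c
#include <stdint.h>

/* Simulates `cycles` clock cycles; reset[c] is the reset input in cycle c.
   out[c] receives the value of num in cycle c. */
void fibonacci_cycles(const int *reset, uint32_t *out, int cycles) {
    uint32_t prev1 = 0, prev2 = 0;   /* num one and two cycles ago */
    for (int c = 0; c < cycles; c++) {
        uint32_t num = reset[c] ? 1u : prev1 + prev2;   /* combinational */
        out[c] = num;
        prev2 = prev1;               /* clock edge: shift stored values */
        prev1 = num;
    }
}
```

With reset held for the first two cycles, the outputs are 1, 1, 2, 3, 5, 8, 13, 21.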
Free Running counter
The image below shows the code and working of a free-running counter designed as a sequential circuit. It shows the importance of the clock in the circuit: the output changes only on the positive clock edge.
The basic circuit block diagram is given below
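The counter behaviour can be sketched the same way, assuming the form cnt = reset ? 0 : (cnt one cycle ago) + 1. The `width` parameter and the wrap-around at 2^width are assumptions for illustration; `counter_cycles` is a hypothetical name.

```c
#include <stdint.h>

/* Simulates a free-running counter of `width` bits (1 to 32) for
   `cycles` clock cycles; out[c] is the counter value in cycle c. */
void counter_cycles(const int *reset, uint32_t *out, int cycles, int width) {
    uint32_t mask = width >= 32 ? UINT32_MAX : (1u << width) - 1u;
    uint32_t cnt = 0;                                /* flip-flop contents */
    for (int c = 0; c < cycles; c++) {
        cnt = reset[c] ? 0u : (cnt + 1u) & mask;     /* update at clock edge */
        out[c] = cnt;
    }
}
```

With a 2-bit width and reset only in the first cycle, the outputs are 0, 1, 2, 3, 0, 1.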
Sequential calculator lab
Pipeline Logic
Pipelined logic & retiming
The concept of pipelining is explained using the Pythagorean theorem.
Basics of the Pythagorean theorem on Makerchip
TL-Verilog makes it possible to model a design in a timing-abstract representation. The basic idea of pipelining is to break the whole computation into different stages. The image below compares the use of pipelining in TL-Verilog with other RTL languages.
The timing-abstract representation makes it easy to change the pipelining and its stages: staging is a physical attribute and has no impact on behaviour, as shown in the image below.
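To illustrate that staging does not change behaviour, the sketch below compares an unpipelined computation of c = sqrt(a^2 + b^2) with a 3-stage version in which a new (a, b) pair enters every cycle. The split into squares, sum and square root, the integer arithmetic and all names are assumptions for illustration. Updating the later stages first makes each stage read the register values from the previous cycle, as flip-flops would.

```c
#include <stdint.h>

/* Integer square root (floor) by simple search; enough for a demo. */
static uint32_t isqrt(uint32_t n) {
    uint32_t r = 0;
    while ((uint64_t)(r + 1) * (r + 1) <= n)
        r++;
    return r;
}

/* Unpipelined reference: everything in one step. */
uint32_t hyp_comb(uint32_t a, uint32_t b) {
    return isqrt(a * a + b * b);
}

/* 3-stage pipeline (@1 squares, @2 sum, @3 square root). out[i] receives
   the result for input i, out_cycle[i] the cycle in which it appears. */
void hyp_pipeline(const uint32_t *a, const uint32_t *b, int n,
                  uint32_t *out, int *out_cycle) {
    uint32_t r1_aa = 0, r1_bb = 0, r2_sum = 0;   /* pipeline registers */
    int r1_valid = 0, r2_valid = 0, done = 0;
    for (int cyc = 0; done < n; cyc++) {
        if (r2_valid) {                          /* @3 */
            out[done] = isqrt(r2_sum);
            out_cycle[done] = cyc;
            done++;
        }
        r2_valid = r1_valid;                     /* @2 */
        r2_sum = r1_aa + r1_bb;
        r1_valid = cyc < n;                      /* @1: new input each cycle */
        if (r1_valid) {
            r1_aa = a[cyc] * a[cyc];
            r1_bb = b[cyc] * b[cyc];
        }
    }
}
```

For inputs (3, 4), (5, 12) and (8, 15), the pipeline produces 5, 13 and 17, the same values as `hyp_comb`, in cycles 2, 3 and 4: each result appears two cycles after its inputs, while a new input is still accepted every cycle.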
The image below shows the code for pipelining in TL-Verilog.
The image shows a comparison of the code in SystemVerilog and TL-Verilog.
Pipeline logic advantages and demo in platform
- By applying pipelining, we are able to run the clock at a higher frequency.
- In diagram 2, one can observe that a new input can be introduced every clock cycle, so pipelining increases the number of inputs processed (throughput).
Next we look at the details of the pipelining concept.
In the image below there is a single-stage pipeline, so the output c is produced in the same stage as the inputs.
When we change the single-stage pipeline into a 3-stage pipeline, the output c appears 2 stages later than a and b. This can be observed in the image below.
Finally, we look at feedback: varying the number of feedback stages in the code is reflected in the pipeline diagram. Here the code uses a 4-stage feedback, which can also be observed in the diagram.
Lab on Error Conditions within Computation Pipeline
Classification
Pipe signal - All letters are lower case, e.g. $lower_case.
Pascal case/state signal - Each word starts with an upper-case letter, e.g. $CamelCase.
Keyword signal - All letters are upper case, e.g. $UPPER_CASE.
Numbers ending tokens - $base64_value is considered good practice in TL-Verilog; $bad_name_5 should be avoided, since a token should not end with a number.
Numeric identifiers - e.g. >>1 means "ahead by 1", i.e. the value of the signal from one cycle earlier.
For the error-condition pipeline, I used the following code in Makerchip:
   $reset = *reset;
   |comp
      @1
         $err1 = $bad_input || $illegal_op;
      @2
         $err2 = $err1 || $overflow;
      @3
         $err3 = $err2 || $div_by_zero;
The following picture shows the output
Lab on 2-Cycle Calculator
Value Representation in Verilog
The image below shows how numbers are represented in Verilog.
Validity
Validity indicates when the values of signals are meaningful. Validity provides:
- Easier Debug
- Cleaner Design
- Better error checking
- Automated Clock gating
Let us implement the Pythagorean theorem with validity:
Clock gating is a power-saving technique.
- Motivation:
1.1 Clock signals are distributed to EVERY flip-flop.
1.2 Clocks toggle twice per cycle.
1.3 This consumes power.
- Clock gating avoids toggling clock signals.
- TL-Verilog can produce fine-grained gating (or enables).
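The effect of a valid signal on a register can be sketched as an enable: the register captures new data only on valid cycles and holds its value otherwise, so a gated clock would only need to pulse on those cycles. `gated_register` is a hypothetical name; the sketch counts updates rather than modelling power.

```c
#include <stdint.h>

/* Register with an enable: q captures d[c] only when valid[c] is 1.
   Returns the number of cycles in which the register was updated,
   i.e. how many clock pulses a gated flip-flop would have needed. */
int gated_register(const int *valid, const uint32_t *d, int cycles,
                   uint32_t *q_out) {
    uint32_t q = 0;
    int updates = 0;
    for (int c = 0; c < cycles; c++) {
        if (valid[c]) {   /* enable asserted: clock delivered */
            q = d[c];
            updates++;
        }                 /* otherwise q holds its previous value */
    }
    *q_out = q;
    return updates;
}
```

With valid = 1, 0, 0, 1, 0 and data 7, 8, 9, 10, 11, the register ends at 10 after only 2 updates instead of 5.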
LAB - Distance accumulator with the Pythagorean theorem
LAB - Cycle calculator with validity
The pipeline structure is:
The makerchip implementation output:
LAB- Calculator with single value Memory
The pipeline structure is as follows:
Makerchip Implementation
Introduction to Simple RISC-V Microarchitecture
The micro architecture for the RISC-V implementation is shown here:
Basic terminologies
Program counter - The program counter (PC) is a pointer into the instruction memory that selects which instruction is executed next.
Decoder - The decoder interprets the instruction and sends signals that determine the action of the processor and the location of the data. Normally the PC is then incremented by 4 (one 32-bit instruction) to move to the next instruction.
Register file - Holds the CPU registers and implements the read and write operations on them.
ALU - The ALU computes the arithmetic and logic operations and writes the result back to the register file.
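The PC behaviour described above can be sketched as a next-PC function. This is a simplified, single-cycle view with hypothetical names; jumps and the pipelined details of the final design below are left out. Because instructions are 4 bytes long, the default step is +4, while a taken branch adds the sign-extended offset to the current PC.

```c
#include <stdbool.h>
#include <stdint.h>

/* Next-PC selection: reset -> 0, taken branch -> pc + imm, else pc + 4.
   imm is the sign-extended branch offset in bytes. */
uint32_t next_pc(bool reset, uint32_t pc, bool taken_branch, int32_t imm) {
    if (reset)
        return 0;
    if (taken_branch)
        return pc + (uint32_t)imm;   /* wraps modulo 2^32 like a 32-bit adder */
    return pc + 4;                   /* next sequential instruction */
}
```

For example, a taken branch with offset -8 at PC 40 goes back to PC 32, which is how the loop in the final test program returns to its first loop instruction.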
Fetch & Decode
The implementation plan of RISC-V CPU Core:
LAB - PC:
The implementation pipeline
The makerchip output
LAB - FETCH
The pipeline structure (part 1):
The pipeline structure (part 2):
The makerchip implementation output:
LAB - INSTRUCTION TYPE DECODE
The Pipeline Structure
The Makerchip output
LAB - INSTRUCTION IMMEDIATE DECODE
The implementation output:
LAB - INSTRUCTION FIELD DECODE
The implementation output:
LAB - INSTRUCTION DECODE_2
The implementation output:
RISC-V control logic
LAB - REGISTER FILE READ_1
The pipeline structure is as follows
LAB - REGISTER FILE READ_2
The pipeline structure:
The makerchip implementation output:
LAB - ALU
The pipeline structure
The implementation output
LAB - REGISTER FILE WRITE
The makerchip implementation output:
Arrays
The detailed implementation of the register file is given below:
The implementation output:
The output is shown in the image below
LAB- Test bench
The makerchip implementation output is as shown below:-
Pipelining the CPU
LAB - 3-CYCLE VALID SIGNAL
The implementation output is:
LAB - CYCLE RISC-V
The implementation output is shown below
Solutions to Pipeline Hazards
REGISTER FILE BYPASS
The pipeline structure is as follows
The implementation output is shown below in the image
LAB - BRANCHES
The implemented output is shown below
LAB-ALU
The makerchip implementation results are:
Load/Store Instructions and Completing RISC-V CPU
LOAD
The pipeline structure
The implementation output:
LOAD/STORE
The implementation output
JUMPS
The makerchip output:
RISC-V CORE CPU - FINAL IMPLEMENTATION
The RISC-V final code is shown below:
\m4_TLV_version 1d: tl-x.org
\SV
   // This code can be found in: https://github.com/stevehoover/RISC-V_MYTH_Workshop
   m4_include_lib(['https://raw.githubusercontent.com/Dsatle/Risc_V/main/risc-v_shell_lib.tlv'])
\SV
   m4_makerchip_module   // (Expanded in Nav-TLV pane.)
\TLV
   // /====================\
   // | Sum 1 to 9 Program |
   // \====================/
   //
   // Program for MYTH Workshop to test RV32I
   // Add 1,2,3,...,9 (in that order).
   //
   // Regs:
   //  r10 (a0): In: 0, Out: final sum
   //  r12 (a2): 10
   //  r13 (a3): 1..10
   //  r14 (a4): Sum
   //
   // External to function:
   m4_asm(ADD, r10, r0, r0)             // Initialize r10 (a0) to 0.
   // Function:
   m4_asm(ADD, r14, r10, r0)            // Initialize sum register a4 with 0x0
   m4_asm(ADDI, r12, r10, 1010)         // Store count of 10 in register a2.
   m4_asm(ADD, r13, r10, r0)            // Initialize intermediate sum register a3 with 0
   // Loop:
   m4_asm(ADD, r14, r13, r14)           // Incremental addition
   m4_asm(ADDI, r13, r13, 1)            // Increment intermediate register by 1
   m4_asm(BLT, r13, r12, 1111111111000) // If a3 is less than a2, branch to label named <loop>
   m4_asm(ADD, r10, r14, r0)            // Store final result to register a0 so that it can be read by main program
   m4_asm(SW, r0, r10, 10000)           // Store the final result value to byte address 16
   m4_asm(LW, r15, r0, 10000)           // Load the final result value from address 16 into r15 (x15)
   // Optional:
   // m4_asm(JAL, r7, 00000000000000000000) // Done. Jump to itself (infinite loop). (Up to 20-bit signed immediate plus implicit 0 bit (unlike JALR) provides byte address; last immediate bit should also be 0)
   m4_define_hier(['M4_IMEM'], M4_NUM_INSTRS)
   |cpu
      @0
         $reset = *reset;
         //MODIFIED NEXT PC LOGIC FOR INCLUDING BRANCH INSTRUCTIONS
         $pc[31:0] = >>1$reset ? 32'b0 :
                     >>3$valid_taken_branch ? >>3$br_target_pc :
                     >>3$valid_load ? >>3$inc_pc :
                     >>3$valid_jump && >>3$is_jal ? >>3$br_target_pc :
                     >>3$valid_jump && >>3$is_jalr ? >>3$jalr_target_pc :
                     >>1$inc_pc ;
         //START LOGIC TO PROVIDE FIRST VALID LOGIC
         //$start = (>>1$reset && $reset == 0) ? 1'b1 : 1'b0;
         //$valid = $reset ? 1'b0 :
         //$start ? 1'b1 : >>3$valid;
      @1
         //INSTRUCTION FETCH
         $inc_pc[31:0] = $pc + 32'd4;
         $imem_rd_en = !$reset;
         $imem_rd_addr[M4_IMEM_INDEX_CNT-1:0] = $pc[M4_IMEM_INDEX_CNT+1:2];
         $instr[31:0] = $imem_rd_data[31:0];
         //INSTRUCTION TYPES DECODE
         $is_u_instr = $instr[6:2] ==? 5'b0x101;
         $is_s_instr = $instr[6:2] ==? 5'b0100x;
         $is_r_instr = $instr[6:2] ==? 5'b011x0 ||
                       $instr[6:2] ==? 5'b01011 ||
                       $instr[6:2] ==? 5'b10100;
         $is_j_instr = $instr[6:2] ==? 5'b11011;
         $is_i_instr = $instr[6:2] ==? 5'b0000x ||
                       $instr[6:2] ==? 5'b001x0 ||
                       $instr[6:2] ==? 5'b11001;
         $is_b_instr = $instr[6:2] ==? 5'b11000;
         //INSTRUCTION IMMEDIATE DECODE
         $imm[31:0] = $is_i_instr ? {{21{$instr[31]}}, $instr[30:20]} :
                      $is_s_instr ? {{21{$instr[31]}}, $instr[30:25], $instr[11:7]} :
                      $is_b_instr ? {{20{$instr[31]}}, $instr[7], $instr[30:25], $instr[11:8], 1'b0} :
                      $is_u_instr ? {$instr[31:12], 12'b0} :
                      $is_j_instr ? {{12{$instr[31]}}, $instr[19:12], $instr[20], $instr[30:21], 1'b0} :
                                    32'b0;
         //INSTRUCTION DECODE
         $opcode[6:0] = $instr[6:0];
         //INSTRUCTION FIELD DECODE
         $rs2_valid = $is_r_instr || $is_s_instr || $is_b_instr;
         ?$rs2_valid
            $rs2[4:0] = $instr[24:20];
         $rs1_valid = $is_r_instr || $is_s_instr || $is_b_instr || $is_i_instr;
         ?$rs1_valid
            $rs1[4:0] = $instr[19:15];
         $funct3_valid = $is_r_instr || $is_s_instr || $is_b_instr || $is_i_instr;
         ?$funct3_valid
            $funct3[2:0] = $instr[14:12];
         $funct7_valid = $is_r_instr ;
         ?$funct7_valid
            $funct7[6:0] = $instr[31:25];
         $rd_valid = $is_r_instr || $is_u_instr || $is_j_instr || $is_i_instr;
         ?$rd_valid
            $rd[4:0] = $instr[11:7];
      @2
         //INSTRUCTION DECODE
         $dec_bits[10:0] = {$funct7[5],$funct3,$opcode};
         $is_beq = $dec_bits ==? 11'bx_000_1100011;
         $is_bne = $dec_bits ==? 11'bx_001_1100011;
         $is_blt = $dec_bits ==? 11'bx_100_1100011;
         $is_bge = $dec_bits ==? 11'bx_101_1100011;
         $is_bltu = $dec_bits ==? 11'bx_110_1100011;
         $is_bgeu = $dec_bits ==? 11'bx_111_1100011;
         $is_addi = $dec_bits ==? 11'bx_000_0010011;
         $is_add = $dec_bits ==? 11'b0_000_0110011;
         $is_lui = $dec_bits ==? 11'bx_xxx_0110111;
         $is_auipc = $dec_bits ==? 11'bx_xxx_0010111;
         $is_jal = $dec_bits ==? 11'bx_xxx_1101111;
         $is_jalr = $dec_bits ==? 11'bx_000_1100111;
         $is_load = $opcode == 7'b0000011;
         $is_sb = $dec_bits ==? 11'bx_000_0100011;
         $is_sh = $dec_bits ==? 11'bx_001_0100011;
         $is_sw = $dec_bits ==? 11'bx_010_0100011;
         // OP-IMM instructions (opcode 0010011)
         $is_slti = $dec_bits ==? 11'bx_010_0010011;
         $is_sltiu = $dec_bits ==? 11'bx_011_0010011;
         $is_xori = $dec_bits ==? 11'bx_100_0010011;
         $is_ori = $dec_bits ==? 11'bx_110_0010011;
         $is_andi = $dec_bits ==? 11'bx_111_0010011;
         $is_slli = $dec_bits ==? 11'b0_001_0010011;
         $is_srli = $dec_bits ==? 11'b0_101_0010011;
         $is_srai = $dec_bits ==? 11'b1_101_0010011;
         $is_sub = $dec_bits ==? 11'b1_000_0110011;
         $is_sll = $dec_bits ==? 11'b0_001_0110011;
         $is_slt = $dec_bits ==? 11'b0_010_0110011;
         $is_sltu = $dec_bits ==? 11'b0_011_0110011;
         $is_xor = $dec_bits ==? 11'b0_100_0110011;
         $is_srl = $dec_bits ==? 11'b0_101_0110011;
         $is_sra = $dec_bits ==? 11'b1_101_0110011;
         $is_or = $dec_bits ==? 11'b0_110_0110011;
         $is_and = $dec_bits ==? 11'b0_111_0110011;
         $jalr_target_pc[31:0] = $src1_value +$imm ;
      @3
         $is_jump = $is_jal || $is_jalr ;
         `BOGUS_USE($is_beq $is_bne $is_blt $is_bge $is_bltu $is_bgeu $is_addi $is_add
                    $is_lui $is_auipc $is_jal $is_jalr $is_load $is_sb $is_sh $is_sw $is_slti
                    $is_sltiu $is_xori $is_ori $is_andi $is_slli $is_srli $is_srai $is_sub $is_sll
                    $is_slt $is_sltu $is_xor $is_srl $is_sra $is_or $is_and)
      @2
         //REGISTER FILE READ
         //$rf_wr_en = 1'b0;
         //$rf_wr_index[4:0] = 5'b0;
         //$rf_wr_data[31:0] = 32'b0;
         $rf_rd_en1 = $rs1_valid;
         $rf_rd_index1[4:0] = $rs1;
         $rf_rd_en2 = $rs2_valid;
         $rf_rd_index2[4:0] = $rs2;
         $src1_value[31:0] = >>1$rf_wr_en && (>>1$rf_wr_index == $rf_rd_index1) ? >>1$result : $rf_rd_data1;
         $src2_value[31:0] = >>1$rf_wr_en && (>>1$rf_wr_index == $rf_rd_index2) ? >>1$result : $rf_rd_data2;
         $br_target_pc[31:0] = $pc +$imm;
      @3
         //ARITHMETIC AND LOGIC UNIT (ALU)
         $sltu_rslt[31:0] = $src1_value < $src2_value;
         $sltiu_rslt[31:0] = $src1_value < $imm;
         $result[31:0] = $is_addi ? $src1_value + $imm :
                         $is_add ? $src1_value + $src2_value :
                         $is_andi ? $src1_value & $imm :
                         $is_ori ? $src1_value | $imm :
                         $is_xori ? $src1_value ^ $imm :
                         $is_slli ? $src1_value << $imm[5:0] :
                         ($is_addi || $is_load || $is_s_instr) ? $src1_value + $imm :
                         $is_srli ? $src1_value >> $imm[5:0] :
                         $is_and ? $src1_value & $src2_value :
                         $is_or ? $src1_value | $src2_value :
                         $is_xor ? $src1_value ^ $src2_value :
                         $is_sub ? $src1_value - $src2_value :
                         $is_sll ? $src1_value << $src2_value[4:0] :
                         $is_srl ? $src1_value >> $src2_value[4:0] :
                         $is_sltu ? $sltu_rslt :
                         $is_sltiu ? $sltiu_rslt :
                         $is_lui ? {$imm[31:12],12'b0} :
                         $is_auipc ? $pc + $imm :
                         $is_jal ? $pc + 4 :
                         $is_jalr ? $pc + 4 :
                         $is_srai ? { {32{$src1_value[31]}},$src1_value} >> $imm[4:0] :
                         $is_slt ? ($src1_value[31] == $src2_value[31]) ? $sltu_rslt : {31'b0,$src1_value[31]} :
                         $is_slti ? ($src1_value[31] == $imm[31]) ? $sltiu_rslt : {31'b0,$src1_value[31]} :
                         $is_sra ? { {32{$src1_value[31]}},$src1_value} >> $src2_value[4:0] :
                         32'bx;
         //REGISTER FILE WRITE
         $rf_wr_en = ($rd_valid && $rd != 5'b0 && $valid) || >>2$valid_load;
         $rf_wr_index[4:0] = >>2$valid_load ? >>2$rd : $rd;
         $rf_wr_data[31:0] = >>2$valid_load ? >>2$ld_data : $result;
         //BRANCH INSTRUCTIONS 1
         $taken_branch = $is_beq ? ($src1_value == $src2_value):
                         $is_bne ? ($src1_value != $src2_value):
                         $is_blt ? (($src1_value < $src2_value) ^ ($src1_value[31] != $src2_value[31])):
                         $is_bge ? (($src1_value >= $src2_value) ^ ($src1_value[31] != $src2_value[31])):
                         $is_bltu ? ($src1_value < $src2_value):
                         $is_bgeu ? ($src1_value >= $src2_value):
                         1'b0;
         //CYCLE VALID INSTRUCTIONS
         $valid = !(>>1$valid_taken_branch || >>2$valid_taken_branch ||
                    >>1$valid_load || >>2$valid_load) ;
         $valid_load = $valid && $is_load ;
         //$valid = !(>>1$valid_taken_branch || >>2$valid_taken_branch);
         $valid_taken_branch = $valid && $taken_branch;
         $valid_jump = $is_jump && $valid ;
         `BOGUS_USE($taken_branch)
      @4
         //MINI 1-R/W MEMORY
         $dmem_wr_en = $is_s_instr && $valid ;
         $dmem_addr[3:0] = $result[5:2] ;
         $dmem_wr_data[31:0] = $src2_value ;
         $dmem_rd_en = $is_load ;
      @5
         //LOAD DATA
         $ld_data[31:0] = $dmem_rd_data ;
   // Note: Because of the magic we are using for visualisation, if visualisation is enabled below,
   //       be sure to avoid having unassigned signals (which you might be using for random inputs)
   //       other than those specifically expected in the labs. You'll get strange errors for these.
   // Assert these to end simulation (before Makerchip cycle limit).
   //*passed = *cyc_cnt > 40;
   *passed = |cpu/xreg[15]>>5$value == (1+2+3+4+5+6+7+8+9) ;
   *failed = 1'b0;
   // Macro instantiations for:
   //  o instruction memory
   //  o register file
   //  o data memory
   //  o CPU visualization
   |cpu
      m4+imem(@1)    // Args: (read stage)
      m4+rf(@2, @3)  // Args: (read stage, write stage) - if equal, no register bypass is required
      m4+dmem(@4)    // Args: (read/write stage)
   m4+viz(@4)        // For visualisation, argument should be at least equal to the last stage of CPU logic
   //@4 would work for all labs
\SV
   endmodule
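The immediate decode in the code above can be cross-checked in software. This sketch mirrors the `$imm` expression for the I, S, B, U and J formats; the `ImmType` enum and helper names are hypothetical, and the instruction format is passed in rather than derived from the opcode. For example, the `ADDI r13, r13, 1` of the test program encodes to 0x00168693 (immediate 1) and its `BLT r13, r12, ...` branch encodes to 0xFEC6CCE3 (immediate -8, back to the loop).

```c
#include <stdint.h>

/* Instruction formats that carry an immediate. */
typedef enum { IMM_I, IMM_S, IMM_B, IMM_U, IMM_J } ImmType;

/* Extracts instr[hi:lo] (field narrower than 32 bits). */
static uint32_t field(uint32_t instr, int hi, int lo) {
    return (instr >> lo) & ((1u << (hi - lo + 1)) - 1u);
}

/* Sign-extends a `width`-bit value to 32 bits (1 <= width <= 32). */
static int32_t sign_extend(uint32_t value, int width) {
    int64_t v = (int64_t)value;
    if (value & (1u << (width - 1)))
        v -= (int64_t)1 << width;       /* negative: subtract 2^width */
    return (int32_t)v;
}

/* Mirrors the $imm[31:0] assignment in the TL-Verilog decode stage. */
int32_t decode_imm(uint32_t instr, ImmType type) {
    switch (type) {
    case IMM_I:   /* instr[31:20] */
        return sign_extend(field(instr, 31, 20), 12);
    case IMM_S:   /* instr[31:25], instr[11:7] */
        return sign_extend(field(instr, 31, 25) << 5 | field(instr, 11, 7), 12);
    case IMM_B:   /* instr[31], instr[7], instr[30:25], instr[11:8], 0 */
        return sign_extend(field(instr, 31, 31) << 12 | field(instr, 7, 7) << 11 |
                           field(instr, 30, 25) << 5 | field(instr, 11, 8) << 1, 13);
    case IMM_U:   /* instr[31:12] followed by twelve zero bits */
        return sign_extend(instr & 0xFFFFF000u, 32);
    case IMM_J:   /* instr[31], instr[19:12], instr[20], instr[30:21], 0 */
        return sign_extend(field(instr, 31, 31) << 20 | field(instr, 19, 12) << 12 |
                           field(instr, 20, 20) << 11 | field(instr, 30, 21) << 1, 21);
    }
    return 0;
}
```

Computing the B and J offsets with an explicit trailing zero bit matches the `1'b0` at the end of the corresponding TL-Verilog concatenations.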
Word of Thanks
I am thankful to Kunal Ghosh (co-founder & CEO, VSD) for providing good-quality content and resources and for guiding me throughout the workshop. I would also like to thank Steve Hoover (founder of Redwood EDA) for helping me understand the concepts of TL-Verilog and how to implement them on Makerchip.
References
- Alwin Shahju, colleague, IIIT Bangalore
- Lasya, colleague, IIIT Bangalore
- Bhargava DV, colleague, IIIT Bangalore
- Pruthvi Parate, colleague, IIIT Bangalore