RISC-V_ISA

Introduction to RISC-V ISA and GNU compiler toolchain

RISC-V is an open standard instruction set architecture (ISA) based on established reduced instruction set computer (RISC) principles. As a RISC architecture, the RISC-V ISA is a load–store architecture. Its floating-point instructions use IEEE 754 arithmetic. Notable features of the RISC-V ISA include: instruction bit field locations chosen to simplify the use of multiplexers in a CPU, a design that is architecturally neutral, and a fixed location for the sign bit of immediate values to speed up sign extension.

The instruction set is designed for a wide range of uses. The base instruction set has a fixed length of 32-bit naturally aligned instructions, and the ISA supports variable-length extensions where each instruction can be any number of 16-bit parcels in length. Subsets support small embedded systems, personal computers, supercomputers with vector processors, and warehouse-scale 19-inch rack-mounted parallel computers.

Day 1

Day_1 Introduction to RISC-V and GNU compiler toolchain

Introduction to RISC-V basic keywords


Why does a computer need a RISC or CISC ISA?

For any program or application to run on computer hardware, it must be translated into a form that the hardware (the chip layout) can execute, and this follows a fixed process. First, the program written in a high-level language is converted into an assembly language program that targets a particular architecture (RISC-V in this case). The assembly program is then converted into a machine-level program that the computer can understand. Between the architecture and the layout there is one more interface, an HDL (Hardware Description Language), which is used to describe the hardware that implements the architecture.

The image below shows the whole process of program or application execution.

Applications to Hardware

To run any application on a computer system, the following process needs to be followed.

Together, the operating system, compiler, and assembler are termed system software.

The assembly language program depends on the processor and its architecture; every architecture has its own assembly language. Converting the assembly language program into a machine-level program follows a specific process, which is elaborated in the flowchart below.

Detailed description of course content

The course deals with an elaborate study of the instruction types present in the RISC-V architecture. The types of instructions present in the RISC-V architecture are listed below.

Pseudo instructions - Examples of pseudo instructions are mv, li, ret.
Base integer instructions - The nomenclature for these instructions is RV64I, where RV stands for RISC-V, 64 for 64-bit integer registers, and I for the base integer instruction set. A few examples of base integer instructions are lui, addi, jalr, auipc, ld.
Multiply extension - These instructions are used when numbers need to be multiplied or divided. Their nomenclature is RV64M, and a CPU that supports the base integer instructions plus multiplication and division is named RV64IM.
Single & double precision floating point extension - These instructions are used to add, subtract, multiply, or divide floating-point numbers: RV64F (single precision) and RV64D (double precision). A few examples are flw, fadd.s, fcvt.s.d, fmv.x.d, fsd, fmul.s, fdiv.s. A CPU that performs all of the above operations is termed RV64IMFD.
Application Binary Interface - The ABI lets application programmers access processor resources such as registers through standard register names. A few examples are a0, sp, s0.
Memory allocation & stack pointer - Transfer of data between memory and registers using the stack pointer. Example operands: ra,24(sp), s0,16(sp), sp,32.

Labwork for RISC-V software toolchain


C Program to compute sum from 1 to N.

Here I wrote a C program to calculate the sum of the first n natural numbers, with n taken as input from the user. The C code is as follows:

#include <stdio.h>

int main()
{
   int n, sum = 0;
   printf("Enter n: ");
   scanf("%d", &n);
   // Add 1, 2, ..., n to the running sum
   for (int i = 1; i <= n; i++)
      sum = sum + i;
   printf("sum of %d numbers is %d\n", n, sum);
   return 0;   // 0 signals successful termination
}

To get the output of the above program, I ran the following commands:

  gcc file_name.c
  ./a.out

The following output was obtained when the program was run on the system. The image shows the sum of the first 100 natural numbers.

RISC-V GCC Compile and Disassemble

Here I observed the difference in the RISC-V instructions generated at different optimization levels. First I used the command

/home/divyam/riscv_toolchain/riscv64-unknown-elf-gcc-8.3.0-2019.08.0-x86_64-linux-ubuntu14/bin/riscv64-unknown-elf-gcc -O1 -mabi=lp64 -march=rv64i -o sum1ton.o sum1ton.c

The resulting assembly listing was too long to read in full, so I filtered out the main portion of interest with the following command:

riscv64-unknown-elf-objdump -d sum1ton.o | less

The following instructions were obtained

After this I entered the command

/home/divyam/riscv_toolchain/riscv64-unknown-elf-gcc-8.3.0-2019.08.0-x86_64-linux-ubuntu14/bin/riscv64-unknown-elf-gcc -Ofast -mabi=lp64 -march=rv64i -o sum1ton.o sum1ton.c

Using the objdump | less command mentioned above, I got the following results:

Spike simulation and debug

To run the same program on the RISC-V ISA simulator Spike (using the proxy kernel pk), I used the following command:

/home/divyam/riscv_toolchain/riscv64-unknown-elf-gcc-8.3.0-2019.08.0-x86_64-linux-ubuntu14/bin/spike pk sum1ton.o

Here is the command I used to debug the program at the assembly level:

/home/divyam/riscv_toolchain/riscv64-unknown-elf-gcc-8.3.0-2019.08.0-x86_64-linux-ubuntu14/bin/spike -d pk sum1ton.o

Following are the debug commands I used:

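until pc 0 1000b0 // Run core 0 until the program counter reaches address 0x1000b0

reg 0 a2 // Display the contents of register a2 on core 0

lui // Load upper immediate: an instruction observed while stepping through the program

q // Quit the debugger

reg 0 sp // Display the value stored in the stack pointer (sp) on core 0

addi // Add immediate: another instruction observed while stepping through the program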

Below is a screenshot of the commands used.

Below is an image summarizing the RV64I instructions used in the program; note that each RISC-V base instruction is 32 bits wide, even on a 64-bit target.

Integer number representation


64-bit Number System for Unsigned Numbers

First of all, we will get familiar with a few basic terminologies.

Doubleword:- An entire 64-bit number is called a doubleword in processor terminology.

Word:- A 32-bit number.

Byte:- A group of 8 bits.

Total no. of patterns that can be formed is 2^n, where n:- number of bits; the largest unsigned value is therefore (2^n - 1).

A RISC-V doubleword can represent unsigned numbers from 0 to (2^64 - 1).

The following images show these terminologies, their ranges, and binary-to-decimal conversion.

64-bit Number System for Signed Numbers

Negative numbers are represented using the concept of two's complement, which is shown in the image below.

Here the MSB is devoted to sign representation.

If MSB = 1, the number is negative; if MSB = 0, the number is positive (or zero).

The image below describes the two methods to convert negative binary numbers into decimal numbers.

The range for positive & negative numbers, -2^63 to (2^63 - 1), is shown below.

Lab for signed & unsigned numbers

Here we will look at the range of unsigned and signed numbers.

Following is the code for the highest unsigned number:

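#include <stdio.h>
#include <limits.h>

int main ()
{
  // 2^64 - 1. Note: (unsigned long long int) (pow(2,64) - 1) is not portable, because
  // pow(2,64) - 1 rounds to 2^64 in double precision, which is out of range for the cast.
  unsigned long long int max = ULLONG_MAX;
  printf("highest number represented by unsigned long long int is %llu\n", max);
  return 0;
}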

To compile and run the program, I used the following commands in the terminal:

 /home/divyam/riscv_toolchain/riscv64-unknown-2019.08.0-x86_64-linux-ubuntu14/bin/riscv64-unknown-elf-gcc -Ofast -mabi=lp64 -march=rv64i -o unsigned.o unsigned.c

 /home/divyam/riscv_toolchain/riscv64-unknown-2019.08.0-x86_64-linux-ubuntu14/bin/spike pk unsigned.o

The output is shown in the image below.

To get the lowest negative number, the following C code was used:

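#include <stdio.h>
#include <math.h>

int main ()
{
  // -2^63 is exactly representable as a double, so this conversion is well defined
  // (-2^64 would be out of range for long long int).
  long long int min = (long long int) (pow(2,63) * -1);
  printf("lowest number represented by long long int is %lld\n", min);
  return 0;
}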

To run the above code, the following commands were used:

 /home/divyam/riscv_toolchain/riscv64-unknown-2019.08.0-x86_64-linux-ubuntu14/bin/riscv64-unknown-elf-gcc -Ofast -mabi=lp64 -march=rv64i -o signed.o signed.c

 /home/divyam/riscv_toolchain/riscv64-unknown-2019.08.0-x86_64-linux-ubuntu14/bin/spike pk signed.o

The image below shows the output obtained.

Now we will look at the range from the most negative to the highest positive number; the code is given below:

#include <stdio.h>
#include <math.h>
int main() {
// Bug: the (int) cast forces both values into the 32-bit int range first
long long int max = (int) (pow(2,63) -1);
long long int min = (int) (pow(2,63) * -1);
printf("highest number represented by long long int is %lld\n", max);
printf("lowest number represented by long long int is %lld\n", min);
return 0;
}
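Here we can see that the range is not correct, because the (int) cast converts the values to a 32-bit int before they are stored in the 64-bit long long int variables.

The corrected code and output are given below.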

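#include <stdio.h>
#include <math.h>
#include <limits.h>
int main() {
// pow(2,63) - 1 rounds to 2^63 in double precision, which is out of range for
// long long int, so the maximum is taken from <limits.h> instead.
long long int max = LLONG_MAX;
long long int min = (long long int) (pow(2,63) * -1);   // -2^63, exactly representable

printf("highest number represented by long long int is %lld\n", max);
printf("lowest number represented by long long int is %lld\n", min);
return 0;
}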

Day_2 Introduction to ABI & basic verification flow

Application Binary Interface


Introduction to the Application Binary Interface - The way a user program accesses the architecture's resources through system calls is called the application binary interface; it is also called the system call interface. When an application programmer wants to access hardware resources, this is done via registers.

The image below shows the different levels between the user and the layout.

In RISC-V there are 32 registers, and their width is defined by XLEN: XLEN is 32 bits for RV32 and 64 bits for RV64.

Memory Allocation for Double words

RV64 has 32 registers, each 64 bits wide. There are two ways in which data can be loaded into a register.

Direct loading - In this method data is loaded directly into the register, as shown in the image below.

Via memory - Since RISC-V has a limited number of registers, data is first stored in memory and then transferred to registers. The image below shows the method.

Little-endian method - RISC-V stores data in memory in little-endian order, i.e. the least significant byte is placed at the lowest memory address and the more significant bytes follow at increasing addresses. A pictorial representation is shown in the image below.

Load,Add and Store Instructions with examples

Here I learned how data is transferred from memory to a register, how an add operation is performed on it, and how the result is transferred from the register back to memory. The following instructions were used for these operations:

ld x8, 16(x23) // ld stands for load doubleword. x23 holds the base address and 16 is the byte offset, so the
                  doubleword at memory address x23 + 16 is loaded into x8. x8 is the destination register and x23 is the source (base) register.

add x8, x24, x8  // The contents of x24 and x8 are added and the sum is stored back in x8.

sd x8, 8(x23)  // sd stands for store doubleword: the contents of x8 are stored to memory starting at address x23 + 8.

The whole process discussed above is shown in the two images below.

The picture above also shows which bit fields encode which part of the assembly instruction. Every instruction in the RISC-V base ISA is 32 bits wide.

Concluding: 32 Registers and Their Respective ABI Names

There are the following types of instructions:

R-type:- These instructions operate on registers only.
I-type:- These instructions contain an immediate value and operate on registers.
S-type:- Store instructions, which contain a source register, a base register, and an immediate offset.

As we can observe, 5 bits are dedicated to each register field in the machine code. Since 2^5 = 32, this is the reason for having 32 registers in the RISC-V architecture.

The RISC-V instructions are categorized into the types shown in the table below.

Labwork using ABI function calls

Study New Algorithm for Sum 1 to N Using ASM

Here we are going to apply the knowledge of the instructions we became familiar with in the previous tutorial. Part of the functionality is moved from the C program into an assembly language program, and the end result is returned from the assembly program to the C program. A pictorial view of this method is shown below.

To apply this method we are going to follow the algorithm shown in the picture below.

Review ASM Function Call

Here I modified my C code to implement the method discussed in the previous section. The modified C code is given below.

#include <stdio.h>

// Implemented in assembly (below); called as load(0, n+1) it returns 0 + 1 + ... + n
extern int load(int x, int y);

int main ()
{
   int result = 0;
   int count = 9;
   result = load(0x0, count+1);
   printf ("Sum of number from 1 to %d is %d\n", count, result);
   return 0;
}

I also wrote an assembly language program to execute the algorithm; the code is given below:

.section .text
.global load
.type load, @function

load: 
        add      a4, a0, zero //Initialize sum register a4 with 0x0
        add      a2, a0, a1   //store count of 10 in register a2.Register a1 is loaded with 0xa (decimal 10) from main
        add      a3, a0, zero //initialize intermediate sum register a3 by 0
loop:   add      a4, a3, a4 //Incremental addition
        addi     a3, a3, 1 //Increment intermediate register by 1
        blt      a3, a2, loop //If a3 is less than a2, branch to label named <loop>
        add      a0, a4, zero //Store final result to register a0 so that it can be read by main program
        ret

Simulate New C Program with Function Call

Here I ran the modified C and assembly code. The commands are similar to the ones used before and can be seen in the images below.

Lab to run C program on RISC-V CPU

Here we have a RISC-V CPU written in Verilog, and we create a testbench for it. The C program is converted to hex format, read by the RISC-V CPU, and the output is displayed. The whole process is described below.

To run the program in the terminal, use the following commands:

chmod +x rv32im.sh
./rv32im.sh

The image below shows the output displayed in the Ubuntu terminal.

Day_3 Digital Logic with TL-verilog & makerchip

Combinational Logic in TL-verilog using Makerchip.

Introduction to Logic gates

Logic gates are the fundamental building blocks of a digital circuit.

Here I learned how to implement logic gates using TL-Verilog. The table below shows the respective code for each logic gate.

A full adder circuit made up of logic gates.

An adder circuit made using logic gates.

Basic Mux implementation & Introduction to makerchip

A basic 2x1 mux is written with the following statement. It uses the ternary operator, which behaves like an if-else statement in a C program.

assign f = s ? x1 : x2;

The image below shows a 4x1 mux implemented using 2x1 muxes, along with its Verilog code.

Introduction to makerchip

Search for Makerchip in your browser and launch the Makerchip IDE.
Go to Learn, click on Examples, and select FPGA multiplier.

Inverter Gate on makerchip

Vector of 5 bits

Mux with single bit

Mux with vector input

Combinational Calculator

Sequential Logic

Introduction to sequential logic & counter lab

A sequential circuit is essentially combinational logic combined with clocked storage elements (flip-flops), so its output depends on the stored state as well as the current inputs. Value transitions take place on either the positive or the negative edge of the clock. The image below describes the basic idea of a sequential circuit.

Fibonacci Series

The image below gives an idea of how the circuit that generates the Fibonacci series is implemented.

Free Running counter

The image below shows the code and working of a free-running counter designed as a sequential circuit. One can observe the importance of the clock in the circuit, as the output changes only on the positive clock edge.

The basic circuit block diagram is given below

Sequential calculator lab

Pipeline Logic

Pipelined logic & retiming

The concept of pipelining is explained using the Pythagorean theorem.

Basics of the Pythagorean theorem on Makerchip

TL-Verilog gives the ability to model a design in a timing-abstract representation. The basic idea of pipelining is to break the whole computation into different stages separated by flip-flops. The image below shows the use of the pipelining concept in TL-Verilog compared to other RTL languages.

The timing abstraction makes it easy to change the pipelining and its stages: staging is a physical attribute and has no impact on behaviour, as shown in the image below.

The image below shows the code for pipelining in TL-Verilog.

The image shows a comparison of code between SystemVerilog and TL-Verilog.

Pipeline logic advantages and demo in platform

By applying pipelining we are able to run the clock at a higher frequency.
In diagram 2, one can observe that a new input can be introduced every clock cycle, so a pipeline can process more inputs in the same time.

Here we will look at the finer details of the pipelining concept.

In the image below, one can observe a single-stage pipeline, so the output C is produced in the same stage as the inputs.

When the single-stage pipeline is changed to a 3-stage pipeline, the output C comes two stages later than a & b. This can be observed in the image below.

Finally, we look at the concept of feedback: varying the number of feedback stages in the code is reflected in the pipeline diagram. Here the code sets a 4-stage feedback, which can also be observed in the diagram.

Lab on Error Conditions within Computation Pipeline

Classification

Pipe signal - All letters of the signal name are written in lower case, e.g. $lower_case.

State signal (Pascal case) - The first letter of each word is written in upper case, e.g. $CamelCase.

Keyword signal - All letters of the signal name are written in upper case, e.g. $UPPER_CASE.

Numbers end tokens - $base64_value is considered good practice in TL-Verilog, while $bad_name_5 should be avoided.

Numeric identifiers - e.g. >>1 means "ahead by 1", i.e. the value of the signal from one cycle earlier.

To pipeline the error conditions, I used the following code in Makerchip:

   $reset = *reset;
   |comp
      @1
         $err1 = $bad_input || $illegal_op;
      @2
         $err2 = $err1 || $overflow;
      @3
         $err3 = $err2 || $div_by_zero;

The following picture shows the output

Lab on 2-Cycle Calculator

Value Representation in Verilog

The image below shows how numbers are represented in Verilog.

Validity

Validity indicates when the values of signals are meaningful. Validity provides:

Easier Debug
Cleaner Design
Better error checking
Automated Clock gating

Let us implement the Pythagorean theorem with validity:

Clock gating is a power-saving technique.

Motivation

1.1 Clock signals are distributed to EVERY flip-flop.

1.2 Clocks toggle twice per cycle.

1.3 This consumes power.
Clock gating avoids toggling clock signals.
TL-verilog can produce fine-grained gating (or enables).

LAB- Distance Accumulator with Pythagorean theorem.

LAB- Cycle Calculator with Validity

The pipeline structure is:

The makerchip implementation output:

LAB- Calculator with single value Memory

The pipeline structure is as follows:

Makerchip Implementation

Wrap-UP

LAB - Conway's Game of Life:

LAB - Pythagorean theorem:

The makerchip output:

Day_4 Basic RISC-V CPU Architecture

Introduction to Simple RISC-V Microarchitecture

The micro architecture for the RISC-V implementation is shown here:

Basic terminologies

Program counter - The program counter (PC) holds the address in instruction memory of the next instruction to be executed.

Decoder - The decoder interprets the instruction and generates the signals that determine what operation the processor performs and where its operands come from. Unless a branch or jump is taken, the PC then advances to the next instruction, i.e. PC + 4, since each instruction is 4 bytes long.

Register file - Holds the CPU's registers; it performs the read operations that supply source operands and the write operations that store results.

ALU - The ALU computes the arithmetic and logic operations, and the result is written back to the register file.

Fetch & Decode

The implementation plan of RISC-V CPU Core:

LAB - PC:

The implementation pipeline

The makerchip output

LAB - FETCH The pipeline structure(part-1):

The pipeline structure(part-2):

The makerchip implementation output:

LAB - INSTRUCTION TYPE DECODE

The Pipeline Structure

The Makerchip output

LAB - INSTRUCTION IMMEDIATE DECODE

The implementation output:

LAB - INSTRUCTION FIELD DECODE

The implementation output:

LAB - INSTRUCTION DECODE_2

The implementation output:

RISC-V control logic

LAB - REGISTER FILE READ_1

The pipeline structure is as follows

LAB - REGISTER FILE READ_2

The pipeline structure:

The makerchip implementation output:

LAB - ALU

The pipeline structure

The implementation output

LAB - REGISTER FILE WRITE

The makerchip implementation output:

Arrays

The detailed implementation of Register files is given below:

The implementation output:

The output is shown in the image below

LAB- Test bench

The makerchip implementation output is as shown below:-

Day_5 Complete Pipelined RISC-V CPU micro-architecture

Pipelining the CPU

LAB - 3-CYCLE VALID SIGNAL

The implementation output is:

LAB - CYCLE RISC-V

The implementation output is shown below

Solutions to Pipeline Hazards

REGISTER FILE BYPASS

The pipeline structure is as follows

The implementation output is shown below in the image

LAB - BRANCHES

The implemented output is shown below

LAB-ALU

The makerchip implementation results are:

Load/Store Instructions and Completing RISC-V CPU

LOAD

The pipeline structure

The implementation output:

LOAD/STORE

The implementation output

JUMPS

The makerchip output:

RISC-V CORE CPU - FINAL IMPLEMENTATION

The RISC-V final code is shown below:

\m4_TLV_version 1d: tl-x.org
\SV
   // This code can be found in: https://github.com/stevehoover/RISC-V_MYTH_Workshop
   
   m4_include_lib(['https://raw.githubusercontent.com/Dsatle/Risc_V/main/risc-v_shell_lib.tlv'])

\SV
   m4_makerchip_module   // (Expanded in Nav-TLV pane.)
\TLV

   // /====================\
   // | Sum 1 to 9 Program |
   // \====================/
   //
   // Program for MYTH Workshop to test RV32I
   // Add 1,2,3,...,9 (in that order).
   //
   // Regs:
   //  r10 (a0): In: 0, Out: final sum
   //  r12 (a2): 10
   //  r13 (a3): 1..10
   //  r14 (a4): Sum
   //
   // External to function:
   m4_asm(ADD, r10, r0, r0)             // Initialize r10 (a0) to 0.
   // Function:
   m4_asm(ADD, r14, r10, r0)            // Initialize sum register a4 with 0x0
   m4_asm(ADDI, r12, r10, 1010)         // Store count of 10 in register a2.
   m4_asm(ADD, r13, r10, r0)            // Initialize intermediate sum register a3 with 0
   // Loop:
   m4_asm(ADD, r14, r13, r14)           // Incremental addition
   m4_asm(ADDI, r13, r13, 1)            // Increment intermediate register by 1
   m4_asm(BLT, r13, r12, 1111111111000) // If a3 is less than a2, branch to label named <loop>
   m4_asm(ADD, r10, r14, r0)            // Store final result to register a0 so that it can be read by main program
   m4_asm(SW, r0, r10, 10000)           // Store the final result value to byte address 16
   m4_asm(LW, r15, r0, 10000)           // Load the final result value from address 16 to x15
   
   // Optional:
   // m4_asm(JAL, r7, 00000000000000000000) // Done. Jump to itself (infinite loop). (Up to 20-bit signed immediate plus implicit 0 bit (unlike JALR) provides byte address; last immediate bit should also be 0)
   m4_define_hier(['M4_IMEM'], M4_NUM_INSTRS)


   |cpu
      @0
         $reset = *reset;
         
         //MODIFIED NEXT PC LOGIC FOR INCLUDING BRANCH INSTRUCTIONS
         $pc[31:0] = >>1$reset ? 32'b0 :
                     >>3$valid_taken_branch ? >>3$br_target_pc :
                     >>3$valid_load ? >>3$inc_pc :
                     >>3$valid_jump && >>3$is_jal ? >>3$br_target_pc :
                     >>3$valid_jump && >>3$is_jalr ? >>3$jalr_target_pc :
                     >>1$inc_pc ;
         //START LOGIC TO PROVIDE FIRST VALID LOGIC
         //$start = (>>1$reset && $reset == 0) ? 1'b1 : 1'b0;
         //$valid = $reset ? 1'b0 :
                  //$start ? 1'b1 : >>3$valid;
     
      @1  
         //INSTRUCTION FETCH
         $inc_pc[31:0] = $pc + 32'd4;
         
         $imem_rd_en = !$reset;
         $imem_rd_addr[M4_IMEM_INDEX_CNT-1:0] = $pc[M4_IMEM_INDEX_CNT+1:2];
         
         $instr[31:0] = $imem_rd_data[31:0];
         
         //INSTRUCTION TYPES DECODE        
         
         $is_u_instr = $instr[6:2] ==? 5'b0x101;
         
         $is_s_instr = $instr[6:2] ==? 5'b0100x;
         
         $is_r_instr = $instr[6:2] ==? 5'b011x0 ||
                       $instr[6:2] ==? 5'b01011 ||
                       $instr[6:2] ==? 5'b10100;
         
         $is_j_instr = $instr[6:2] ==? 5'b11011;
         
         $is_i_instr = $instr[6:2] ==? 5'b0000x ||
                       $instr[6:2] ==? 5'b001x0 ||
                       $instr[6:2] ==? 5'b11001;
         
         $is_b_instr = $instr[6:2] ==? 5'b11000;
         
         //INSTRUCTION IMMEDIATE DECODE
         $imm[31:0] = $is_i_instr ? {{21{$instr[31]}}, $instr[30:20]} :
                      $is_s_instr ? {{21{$instr[31]}}, $instr[30:25], $instr[11:7]} :
                      $is_b_instr ? {{20{$instr[31]}}, $instr[7], $instr[30:25], $instr[11:8], 1'b0} :
                      $is_u_instr ? {$instr[31:12], 12'b0} :
                      $is_j_instr ? {{12{$instr[31]}}, $instr[19:12], $instr[20], $instr[30:21], 1'b0} :
                                                            32'b0;
         //INSTRUCTION DECODE
         $opcode[6:0] = $instr[6:0];
         
         
         //INSTRUCTION FIELD DECODE
         $rs2_valid = $is_r_instr || $is_s_instr || $is_b_instr;
         ?$rs2_valid
            $rs2[4:0] = $instr[24:20];
           
         $rs1_valid = $is_r_instr  || $is_s_instr || $is_b_instr || $is_i_instr;
         ?$rs1_valid
            $rs1[4:0] = $instr[19:15];
         
         $funct3_valid = $is_r_instr  || $is_s_instr || $is_b_instr || $is_i_instr;
         ?$funct3_valid
            $funct3[2:0] = $instr[14:12];
           
         $funct7_valid = $is_r_instr ;
         ?$funct7_valid
            $funct7[6:0] = $instr[31:25];
           
         $rd_valid = $is_r_instr  || $is_u_instr || $is_j_instr || $is_i_instr;
         ?$rd_valid
            $rd[4:0] = $instr[11:7];
         
         
      @2
         //INSTRUCTION DECODE
         $dec_bits[10:0] = {$funct7[5],$funct3,$opcode};
         $is_beq = $dec_bits ==? 11'bx_000_1100011;
         $is_bne = $dec_bits ==? 11'bx_001_1100011;
         $is_blt = $dec_bits ==? 11'bx_100_1100011;
         $is_bge = $dec_bits ==? 11'bx_101_1100011;
         $is_bltu = $dec_bits ==? 11'bx_110_1100011;
         $is_bgeu = $dec_bits ==? 11'bx_111_1100011;
         $is_addi = $dec_bits ==? 11'bx_000_0010011;
         $is_add = $dec_bits ==? 11'b0_000_0110011;
         $is_lui = $dec_bits ==? 11'bx_xxx_0110111;
         $is_auipc = $dec_bits ==? 11'bx_xxx_0010111;
         $is_jal = $dec_bits ==? 11'bx_xxx_1101111;
         $is_jalr = $dec_bits ==? 11'bx_000_1100111;
         $is_load = $opcode == 7'b0000011;
         $is_sb = $dec_bits ==? 11'bx_000_0100011;
         $is_sh = $dec_bits ==? 11'bx_001_0100011;
         $is_sw = $dec_bits ==? 11'bx_010_0100011;
         $is_slti = $dec_bits ==? 11'bx_010_0010011;
         // OP-IMM instructions use opcode 0010011 (0100011 is the STORE opcode)
         $is_sltiu = $dec_bits ==? 11'bx_011_0010011;
         $is_xori = $dec_bits ==? 11'bx_100_0010011;
         $is_ori = $dec_bits ==? 11'bx_110_0010011;
         $is_andi = $dec_bits ==? 11'bx_111_0010011;
         $is_slli = $dec_bits ==? 11'b0_001_0010011;
         $is_srli = $dec_bits ==? 11'b0_101_0010011;
         $is_srai = $dec_bits ==? 11'b1_101_0010011;
         $is_sub = $dec_bits ==? 11'b1_000_0110011;
         $is_sll = $dec_bits ==? 11'b0_001_0110011;
         $is_slt = $dec_bits ==? 11'b0_010_0110011;
         $is_sltu = $dec_bits ==? 11'b0_011_0110011;
         $is_xor = $dec_bits ==? 11'b0_100_0110011;
         $is_srl = $dec_bits ==? 11'b0_101_0110011;
         $is_sra = $dec_bits ==? 11'b1_101_0110011;
         $is_or = $dec_bits ==? 11'b0_110_0110011;
         $is_and = $dec_bits ==? 11'b0_111_0110011;
         
         $jalr_target_pc[31:0] = $src1_value +$imm ;
      @3
         $is_jump = $is_jal || $is_jalr ;   
         `BOGUS_USE($is_beq $is_bne $is_blt $is_bge $is_bltu $is_bgeu $is_addi $is_add
                    $is_lui $is_auipc $is_jal $is_jalr $is_load $is_sb $is_sh $is_sw $is_slti
                    $is_sltiu $is_xori $is_ori $is_andi $is_slli $is_srli $is_srai $is_sub $is_sll
                    $is_slt $is_sltu $is_xor $is_srl $is_sra $is_or $is_and)
         
      @2  
         //REGISTER FILE READ
         //$rf_wr_en = 1'b0;
         //$rf_wr_index[4:0] = 5'b0;
         //$rf_wr_data[31:0] = 32'b0;
         $rf_rd_en1 = $rs1_valid;
         $rf_rd_index1[4:0] = $rs1;
         $rf_rd_en2 = $rs2_valid;
         $rf_rd_index2[4:0] = $rs2;
         
         $src1_value[31:0] = >>1$rf_wr_en && (>>1$rf_wr_index == $rf_rd_index1) ? >>1$result : $rf_rd_data1;
         $src2_value[31:0] = >>1$rf_wr_en && (>>1$rf_wr_index == $rf_rd_index2) ? >>1$result : $rf_rd_data2;
         $br_target_pc[31:0] = $pc +$imm;
         
      @3  
         //ARITHMETIC AND LOGIC UNIT (ALU)
         
         $sltu_rslt[31:0] = $src1_value < $src2_value;
         $sltiu_rslt[31:0] = $src1_value < $imm;
         $result[31:0] = $is_addi ? $src1_value + $imm :
                         $is_add ? $src1_value + $src2_value :
                         $is_andi ? $src1_value & $imm :
                         $is_ori ? $src1_value | $imm :
                         $is_xori ? $src1_value ^ $imm :
                         $is_slli ? $src1_value << $imm[5:0] :
                         ($is_addi || $is_load || $is_s_instr) ? $src1_value + $imm :
                         $is_srli ? $src1_value >> $imm[5:0] :
                         $is_and ? $src1_value & $src2_value :
                         $is_or ? $src1_value | $src2_value :
                         $is_xor ? $src1_value ^ $src2_value :
                         $is_sub ? $src1_value - $src2_value :
                         $is_sll ? $src1_value << $src2_value[4:0] :
                         $is_srl ? $src1_value >> $src2_value[4:0] :
                         $is_sltu ? $sltu_rslt :
                         $is_sltiu ? $sltiu_rslt :
                         $is_lui ? {$imm[31:12],12'b0} :
                         $is_auipc ? $pc + $imm :
                         $is_jal ? $pc + 4 :
                         $is_jalr ? $pc + 4 :
                         $is_srai ? { {32{$src1_value[31]}},$src1_value} >> $imm[4:0] :
                         $is_slt ? ($src1_value[31] == $src2_value[31]) ? $sltu_rslt : {31'b0,$src1_value[31]} :
                         $is_slti ? ($src1_value[31] == $imm[31]) ? $sltiu_rslt : {31'b0,$src1_value[31]} :
                         $is_sra ? { {32{$src1_value[31]}},$src1_value} >> $src2_value[4:0] :
                         32'bx;
         
         
         //REGISTER FILE WRITE
         $rf_wr_en = ($rd_valid && $rd != 5'b0 && $valid) || >>2$valid_load;
         $rf_wr_index[4:0] = >>2$valid_load ? >>2$rd : $rd;
         $rf_wr_data[31:0] = >>2$valid_load ? >>2$ld_data : $result;
         
         
         //BRANCH INSTRUCTIONS 1
         $taken_branch = $is_beq ? ($src1_value == $src2_value):
                         $is_bne ? ($src1_value != $src2_value):
                         $is_blt ? (($src1_value < $src2_value) ^ ($src1_value[31] != $src2_value[31])):
                         $is_bge ? (($src1_value >= $src2_value) ^ ($src1_value[31] != $src2_value[31])):
                         $is_bltu ? ($src1_value < $src2_value):
                         $is_bgeu ? ($src1_value >= $src2_value):
                         1'b0;
          //CYCLE VALID INSTRUCTIONS
         $valid = !(>>1$valid_taken_branch || >>2$valid_taken_branch ||
                    >>1$valid_load || >>2$valid_load) ;
         
         $valid_load = $valid && $is_load ;
         //$valid = !(>>1$valid_taken_branch || >>2$valid_taken_branch);
         $valid_taken_branch = $valid && $taken_branch;
         $valid_jump = $is_jump && $valid ;
         `BOGUS_USE($taken_branch)
      @4
         //MINI 1-R/W MEMORY
         $dmem_wr_en = $is_s_instr && $valid ;
         $dmem_addr[3:0] = $result[5:2] ;
         $dmem_wr_data[31:0] = $src2_value ;
         $dmem_rd_en = $is_load ;
         
      @5
         //LOAD DATA
         $ld_data[31:0] = $dmem_rd_data ;   
         
         
         

      // Note: Because of the magic we are using for visualisation, if visualisation is enabled below,
      //       be sure to avoid having unassigned signals (which you might be using for random inputs)
      //       other than those specifically expected in the labs. You'll get strange errors for these.

   
   // Assert these to end simulation (before Makerchip cycle limit).
   //*passed = *cyc_cnt > 40;
   *passed = |cpu/xreg[15]>>5$value == (1+2+3+4+5+6+7+8+9) ;
   *failed = 1'b0;
   
   // Macro instantiations for:
   //  o instruction memory
   //  o register file
   //  o data memory
   //  o CPU visualization
   |cpu
      m4+imem(@1)    // Args: (read stage)
      m4+rf(@2, @3)  // Args: (read stage, write stage) - if equal, no register bypass is required
      m4+dmem(@4)    // Args: (read/write stage)
   
   m4+viz(@4)    // For visualisation, argument should be at least equal to the last stage of CPU logic
   //@4 would work for all lab
\SV
   endmodule

Word Of Thanks

I am thankful to Kunal Ghosh (co-founder & CEO, VSD) for providing good-quality content and resources and for guiding me throughout the workshop. I would also like to thank Steve Hoover (founder of Redwood EDA) for helping me understand the concepts of TL-Verilog and how to implement them on Makerchip.

References

https://www.vsdiat.com
https://github.com/stevehoover/RISC-V_MYTH_Workshop
http://makerchip.com/sandbox/
https://github.com/kunalg123/riscv_workshop_collaterals
Alwin Shahju, colleague, IIIT Bangalore
Lasya, colleague, IIIT Bangalore
Bhargava DV, colleague, IIIT Bangalore
Pruthvi Parate, colleague, IIIT Bangalore

DSatle / RISC-V_ISA


Table of Contents

Day_1 Introduction to RISC-V and GNU compiler toolchain

Day_2 Introduction to ABI & basic verification flow

Day_3 Digital Logic with TL-verilog & makerchip

Day_4 Basic RISC-V CPU Architecture

Day_5 Complete Pipelined RISC-V CPU micro-architecture
