DSatle / RISC-V_ISA

RISC-V_ISA

Introduction to RISC-V ISA and GNU compiler toolchain

RISC-V is an open standard instruction set architecture (ISA) based on established reduced instruction set computer (RISC) principles. As a RISC architecture, the RISC-V ISA is a load–store architecture. Its floating-point instructions use IEEE 754 floating-point. Notable features of the RISC-V ISA include: instruction bit field locations chosen to simplify the use of multiplexers in a CPU, a design that is architecturally neutral, and a fixed location for the sign bit of immediate values to speed up sign extension.

The instruction set is designed for a wide range of uses. The base instruction set has a fixed length of 32-bit naturally aligned instructions, and the ISA supports variable length extensions where each instruction can be any number of 16-bit parcels in length. Subsets support small embedded systems, personal computers, supercomputers with vector processors, and warehouse-scale 19-inch rack-mounted parallel computers.

Table of Contents

Day 1

Day 2

Day 3

Day 4

Day 5

Day_1 Introduction to RISC-V and GNU compiler toolchain

Introduction to RISC-V basic keywords

Why does a computer need a RISC or CISC ISA?

For any program to run on computer hardware, it has to communicate with the layout (the chip present in the system). This happens in steps. First, the high-level language program is converted into an assembly-level program, which follows a particular architecture (RISC-V in this case). The assembly-level program is then converted into a machine-level program that the computer can understand. An interface is also needed between the architecture and the layout; it is written in an HDL (Hardware Description Language).

The image below shows the whole process of program or application execution.

comb

Applications to Hardware

To run any application on a computer system, the process below needs to be followed.

Architure

The operating system, compiler, and assembler together are termed system software.

An assembly language program depends on the processor and its architecture: every architecture has its own assembly language. Converting an assembly language program into a machine-level program follows a specific process, which is elaborated in the flowchart below.

Assembly to Physical implementation

Detailed description of course content

The course covers the instruction types present in the RISC-V architecture in detail. The types of instructions are listed below.

  1. Pseudo instructions - Examples of pseudo instructions are mv, li, ret.

  2. Base integer instructions - These are named RV64I, where RV stands for RISC-V, 64 stands for 64-bit, and I stands for integer. A few examples of base integer instructions are lui, addi, jalr, auipc, ld.

  3. Multiply extension - These instructions are used when a multiply or divide operation needs to be performed. They are named RV64M; a CPU that implements them together with the base integer instructions is named RV64IM.

  4. Single and double precision floating point extension - These instructions are used when add/sub/divide/multiply is performed on floating point numbers. They are named RV64F and RV64D. A few examples are flw, fadd.s, fcvt.s.s, fmv.x.d, fsd, fmul.s, fdiv.s. A CPU that performs all of the above operations is termed RV64IMFD.

  5. Application binary interface - This exists so that application programmers can access processor resources such as registers. A few examples of ABI register names are a0, sp, s0.

  6. Memory allocation and stack pointer - Transfer of data between memory and registers, and use of the stack pointer. Examples: ra,24(sp); s0,16(sp); sp,32.

Labwork for RISC-V software toolchain

C Program to compute sum from 1 to N.

Here I wrote a C program to calculate the sum of the numbers from 1 to n, where n is taken from the user. The C code is as follows

#include <stdio.h>

int main()
{
   int n, sum = 0;
   printf("Enter n: ");
   scanf("%d", &n);
   for (int i = 1; i <= n; i++)
      sum = sum + i;
   printf("sum of %d numbers is %d\n", n, sum);
   return 0;   /* 0 reports success to the shell */
}

To compile and run the above program I used the following commands

  gcc file_name.c
  ./a.out

The image below shows the output when the program is run on the system: the sum of the first 100 natural numbers.

7

RISC-V GCC compile and disassemble

Here I observed the RISC-V instructions generated for the program. First I compiled it at the -O1 optimization level using the command

/home/divyam/riscv_toolchain/riscv64-unknown-elf-gcc-8.3.0-2019.08.0-x86_64-linux-ubuntu14/bin/riscv64-unknown-elf-gcc -O1 -mabi=lp64 -march=rv64i -o sum1ton.o sum1ton.c

The full assembly listing is too long to read at once, so I paged through it to find the main section, which is the portion of interest, using the following command

riscv64-unknown-elf-objdump -d sum1ton.o | less

The following instructions were obtained

-01 fast less

After this I recompiled at the -Ofast optimization level using the command

/home/divyam/riscv_toolchain/riscv64-unknown-elf-gcc-8.3.0-2019.08.0-x86_64-linux-ubuntu14/bin/riscv64-unknown-elf-gcc -Ofast -mabi=lp64 -march=rv64i -o sum1ton.o sum1ton.c

Using the same objdump and less command as above, I got the following results

ofast less

Spike simulation and debug

To get the same output on RISC-V, I ran the compiled program on the Spike simulator using the following command

/home/divyam/riscv_toolchain/riscv64-unknown-elf-gcc-8.3.0-2019.08.0-x86_64-linux-ubuntu14/bin/spike pk sum1ton.o

1

This is the command I used to start Spike in debug mode for the assembly-level program

/home/divyam/riscv_toolchain/riscv64-unknown-elf-gcc-8.3.0-2019.08.0-x86_64-linux-ubuntu14/bin/spike -d pk sum1ton.o

The following are the debug commands I used, together with the instructions observed while stepping

until pc 0 1000b0 // Run until the program counter of core 0 reaches address 1000b0.

reg 0 a2 // Show the contents of register a2 on core 0.

lui // load upper immediate (an instruction observed while stepping, not a debug command)

q // quit

reg 0 sp // Show the value stored in the stack pointer.

addi // add immediate (an instruction observed while stepping, not a debug command)

Below is the screenshot for the commands used

2

Below is a self-explanatory image of the 64-bit instructions and the instructions used in RISC-V

3

Integer number representation

64-bit Number System for Unsigned Numbers

First, a few basic terms:

Doubleword: An entire 64-bit number in processor language is called a doubleword.

Word: A 32-bit number in processor language.

Byte: A group of 8 bits.

The total number of patterns that can be formed with n bits is 2^n, so the highest unsigned value is (2^n - 1).

A RISC-V doubleword can represent unsigned numbers from 0 to (2^64 - 1).
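The relation between bit width, pattern count, and highest unsigned value can be checked with a small C sketch. The function names pattern_count and max_unsigned are hypothetical helpers introduced here for illustration; they are not part of the workshop material.

```c
#include <stdint.h>

/* Number of distinct bit patterns of an n-bit value.
   Valid for 1 <= n <= 63, because 2^64 itself does not fit in 64 bits. */
uint64_t pattern_count(unsigned n)
{
    return (uint64_t)1 << n;
}

/* Highest unsigned value that fits in n bits, valid for 1 <= n <= 64. */
uint64_t max_unsigned(unsigned n)
{
    /* Shifting a 64-bit value by 64 is undefined in C, so n == 64 is handled separately. */
    return (n == 64) ? UINT64_MAX : (((uint64_t)1 << n) - 1);
}
```

For a byte, pattern_count(8) gives 256 patterns and max_unsigned(8) gives 255; for a doubleword, max_unsigned(64) gives 2^64 - 1.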

The following images show the terminology, the range, and binary to decimal conversion

Capture

range

Screenshot (76)

64-bit Number System for Signed Numbers

Negative numbers are represented using the concept of 2's complement, which is shown in the image below.

2's complement

Here the MSB is devoted to sign representation.

If MSB = 1, the number is negative; if MSB = 0, the number is positive.

The image below describes the two methods to convert negative binary numbers into decimal numbers

Screenshot (77)
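One way to express the conversion in code is to subtract 2^bits whenever the MSB is set. The sketch below is a minimal illustration under that rule; the function name from_twos_complement is hypothetical, and the width is limited to 32 bits to keep the arithmetic simple.

```c
#include <stdint.h>

/* Interpret the low `bits` bits of `pattern` as a 2's complement number.
   Valid for 1 <= bits <= 32. */
int64_t from_twos_complement(uint64_t pattern, unsigned bits)
{
    uint64_t sign_bit = (uint64_t)1 << (bits - 1);   /* weight of the MSB */
    uint64_t mask = ((uint64_t)1 << bits) - 1;       /* keep only `bits` bits */

    pattern &= mask;
    if (pattern & sign_bit) {
        /* MSB = 1: the number is negative, value = pattern - 2^bits */
        return (int64_t)pattern - (int64_t)((uint64_t)1 << bits);
    }
    /* MSB = 0: the number is positive and the pattern is the value */
    return (int64_t)pattern;
}
```

For example, the 4-bit pattern 1111 decodes to -1 and 1000 decodes to -8, the lowest 4-bit value.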

Range for positive & negative numbers is shown below

positive number signed range

range -ve numbers

Lab for signed & unsigned numbers

Here we will look at the range of unsigned and signed numbers.

The following is the code for the highest unsigned number

#include <stdio.h>
#include <math.h>

int main()
{
   unsigned long long int max = (unsigned long long int) (pow(2,64) - 1);
   printf("highest number represented by unsigned long long int is %llu\n", max);
   return 0;
}

To compile and run the program I used the following commands in the terminal

 /home/divyam/riscv_toolchain/riscv64-unknown-2019.08.0-x86_64-linux-ubuntu14/bin/riscv64-unknown-elf-gcc -Ofast -mabi=lp64 -march=rv64i -o unsigned.o unsigned.c

 /home/divyam/riscv_toolchain/riscv64-unknown-2019.08.0-x86_64-linux-ubuntu14/bin/spike pk unsigned.o

One can observe the output in the below image

signed highest

To get the lowest negative number, the following C code was used

#include <stdio.h>
#include <math.h>

int main()
{
   long long int min = (long long int) (pow(2,64) * -1);
   printf("lowest number represented by long long int is %lld\n", min);
   return 0;
}

To run the above code following commands were used

 /home/divyam/riscv_toolchain/riscv64-unknown-2019.08.0-x86_64-linux-ubuntu14/bin/riscv64-unknown-elf-gcc -Ofast -mabi=lp64 -march=rv64i -o signed.o signed.c

 /home/divyam/riscv_toolchain/riscv64-unknown-2019.08.0-x86_64-linux-ubuntu14/bin/spike pk signed.o

The below image shows the output obtained

lowest

Now we will look at the lowest negative and highest positive numbers together. The code is given below

#include <stdio.h>
#include <math.h>
int main() {
long long int max = (int) (pow(2,63) -1);
long long int min = (int) (pow(2,63) * -1);
printf("highest number represented by long long int is %lld\n", max);
printf("lowest number represented by long long int is %lld\n", min);
return 0;
}

Here we can see that the range is not correct, because the (int) casts limit the results to the 32-bit int range.

both

The correct code and output are given below

#include <stdio.h>
#include <math.h>
int main() {
long long int max = (long long int) (pow(2,63) -1);
long long int min = (long long int) (pow(2,63) * -1);

printf("highest number represented by long long int is %lld\n", max);
printf("lowest number represented by long long int is %lld\n", min);
return 0;
}

both_modify
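The lab computes the limits with pow(), which returns a double. A double cannot hold 2^64 - 1 exactly, and converting an out-of-range double to an integer type is undefined in standard C, so the printed values depend on how the target handles the conversion. As a hedged alternative, the same limits can be read from the constants in limits.h. The three function names below are hypothetical wrappers added for illustration, and the values in the note assume a 64-bit long long.

```c
#include <limits.h>

/* Highest value of unsigned long long, taken from the standard header. */
unsigned long long highest_unsigned(void)
{
    return ULLONG_MAX;
}

/* Highest value of long long. */
long long highest_signed(void)
{
    return LLONG_MAX;
}

/* Lowest value of long long. */
long long lowest_signed(void)
{
    return LLONG_MIN;
}
```

With a 64-bit long long these return 2^64 - 1, 2^63 - 1, and -2^63 respectively, and they can be printed with %llu and %lld as in the lab code.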

Day_2 Introduction to ABI & basic verification flow

Application Binary Interface

Introduction to the application binary interface - The way a user program accesses the resources of an architecture through system calls is called the application binary interface (ABI); it is also called the system call interface. If an application programmer wants to access hardware resources, it is done via registers.

The image below shows the different levels between the user and the layout.

Screenshot (75)

3

In RISC-V there are 32 registers available to the programmer, and their width is defined by XLEN: XLEN is 32 bits for RV32 and 64 bits for RV64.

registers

Memory Allocation for Double words

RV64 has 32 registers, each 64 bits wide. There are two ways in which data can be loaded into a register.

  1. Direct loading - In this method data is loaded directly into the register. The image below shows the method.

1

  2. Via memory - Since RISC-V has a limited number of registers, the data is first stored in memory and then transferred to the registers. The image below shows the method.

2

Little endian method - RISC-V uses the little endian approach to store data in memory, i.e. the least significant byte is stored at the lowest address and the remaining bytes follow at increasing addresses. A pictorial presentation is shown in the image below.

4
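The byte order described above can be sketched in C by writing a doubleword into a byte array one byte at a time. The helper names store_doubleword_le and load_doubleword_le are hypothetical, and the 8-byte array stands in for eight consecutive memory addresses.

```c
#include <stdint.h>

/* Store a 64-bit doubleword in mem[0..7] in little endian order:
   the least significant byte goes to the lowest address. */
void store_doubleword_le(uint8_t mem[8], uint64_t value)
{
    for (int i = 0; i < 8; i++) {
        mem[i] = (uint8_t)(value >> (8 * i));
    }
}

/* Rebuild the doubleword from the same eight bytes. */
uint64_t load_doubleword_le(const uint8_t mem[8])
{
    uint64_t value = 0;
    for (int i = 0; i < 8; i++) {
        value |= (uint64_t)mem[i] << (8 * i);
    }
    return value;
}
```

Storing 0x0102030405060708 places 0x08 at mem[0] and 0x01 at mem[7]; loading the bytes back returns the original value.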

Load, Add and Store Instructions with examples

Here I learned how data is transferred from memory to a register, how an add operation is performed on the data, and how the result is transferred from a register back to memory. The following instructions were used for these operations.

ld x8, 16(x23)  // ld stands for load doubleword. The address is the base address held in x23 plus the offset 16;
                // the doubleword at that address is loaded into x8. x8 is the destination register and x23 is the source (base) register.

add x8, x24, x8 // The data in x24 and x8 is added and the result is stored in x8.

sd x8, 8(x23)   // sd stands for store doubleword. The data in x8 is stored to memory at the address x23 plus the offset 8.
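The effect of these three instructions can be modelled in C. This is only a sketch under stated assumptions: mem is a hypothetical byte-addressable memory, x is a hypothetical array standing in for the 32 registers, and the function name run_ld_add_sd is made up for this example.

```c
#include <stdint.h>
#include <string.h>

/* Model of: ld x8, 16(x23) ; add x8, x24, x8 ; sd x8, 8(x23) */
void run_ld_add_sd(uint8_t *mem, uint64_t x[32])
{
    /* ld x8, 16(x23): read 8 bytes from address x23 + 16 into x8 */
    memcpy(&x[8], mem + x[23] + 16, sizeof x[8]);

    /* add x8, x24, x8: x8 = x24 + x8 */
    x[8] = x[24] + x[8];

    /* sd x8, 8(x23): write the 8 bytes of x8 to address x23 + 8 */
    memcpy(mem + x[23] + 8, &x[8], sizeof x[8]);
}
```

If the doubleword at offset 16 holds 100 and x24 holds 5, then after the call x8 holds 105 and the same value is found in memory at offset 8.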

The whole process discussed above is shown in the below two images.

4

8

The above picture also describes which bits indicate which part of the assembly instruction. Every instruction in the RISC-V base instruction set is 32 bits wide.

Concluding 32 registers and their respective ABI names

The instructions seen so far are of the following types

  1. R-type: These instructions operate only on registers.

  2. I-type: These instructions operate on a register and an immediate value.

  3. S-type: These are the store instructions, which write a register to memory.

8

As we can observe, 5 bits are dedicated to each register field in the machine-level code. Since 2^5 = 32, this is the logic behind having 32 registers in the RISC-V architecture.
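The 5-bit register fields can be pulled out of a 32-bit R-type instruction with shifts and masks. The sketch below assumes the standard R-type field positions; the struct and function names are hypothetical, and the word 0x008C0433 used in the note is a hand encoding of add x8, x24, x8 made for this example.

```c
#include <stdint.h>

/* Fields of an R-type instruction. */
typedef struct {
    uint32_t opcode;  /* bits 6:0   */
    uint32_t rd;      /* bits 11:7  */
    uint32_t funct3;  /* bits 14:12 */
    uint32_t rs1;     /* bits 19:15 */
    uint32_t rs2;     /* bits 24:20 */
    uint32_t funct7;  /* bits 31:25 */
} rtype_fields;

/* Split a 32-bit R-type instruction word into its fields. */
rtype_fields decode_rtype(uint32_t instr)
{
    rtype_fields f;
    f.opcode = instr & 0x7F;
    f.rd     = (instr >> 7)  & 0x1F;  /* 5 bits select one of 32 registers */
    f.funct3 = (instr >> 12) & 0x7;
    f.rs1    = (instr >> 15) & 0x1F;
    f.rs2    = (instr >> 20) & 0x1F;
    f.funct7 = (instr >> 25) & 0x7F;
    return f;
}
```

Decoding 0x008C0433 yields opcode 0x33, rd 8, rs1 24, and rs2 8, which matches add x8, x24, x8.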

The RISC-V instructions are divided into the types shown in the table below.

9

Labwork using ABI function calls

Study New Algorithm For Sum 1 to N using ASM

Here we apply the knowledge of the instructions covered in the previous section. Some functionality is moved from the C program into an assembly language program, and the end result is returned from the assembly program to the C program. A pictorial view of this method is shown below.

basic working

To apply this method we are going to follow the below algorithm shown in the picture

algorithm

Review ASM Function call

Here I have modified my C code to implement the method discussed in the previous section. The modified C code is given below.

#include <stdio.h>

extern int load(int x, int y);

int main ()
{
   int result = 0;
   int count = 9;
   result = load(0x0, count+1);
   printf ("Sum of number from 1 to %d is %d\n", count, result);
   }

I have also written an assembly-level program to execute the algorithm. The code is given below

.section .text
.global load
.type load, @function

load: 
        add      a4, a0, zero //Initialize sum register a4 with 0x0
        add      a2, a0, a1   //store count of 10 in register a2.Register a1 is loaded with 0xa (decimal 10) from main
        add      a3, a0, zero //initialize intermediate sum register a3 by 0
loop:   add      a4, a3, a4 //Incremental addition
        addi     a3, a3, 1 //Increment intermediate register by 1
        blt      a3, a2, loop //If a3 is less than a2, branch to label named <loop>
        add      a0, a4, zero //Store final result to register a0 so that it can be read by main program
        ret 
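To make the register usage easier to follow, the assembly routine can be mirrored in plain C. This is a hedged sketch, not the workshop's code: the function name load_model is hypothetical, and each variable is annotated with the register it stands for.

```c
/* C model of the assembly routine `load`.
   x arrives in a0 and y arrives in a1; the result is returned in a0. */
int load_model(int x, int y)
{
    int sum = x;        /* a4: running sum, starts at x (0x0 in the lab)  */
    int limit = x + y;  /* a2: loop bound, 10 when called as load(0, 10)  */
    int i = x;          /* a3: intermediate value added on each iteration */

    do {
        sum += i;       /* add  a4, a3, a4 */
        i += 1;         /* addi a3, a3, 1  */
    } while (i < limit);  /* blt a3, a2, loop */

    return sum;         /* add a0, a4, zero ; ret */
}
```

Called as load_model(0, 10), the loop adds 0 through 9 and returns 45, the value the C program prints as the sum from 1 to 9.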

Simulate New C Program With Function Call

Here I ran the modified C code together with the assembly language code. The commands are similar to the ones used before; one can observe them in the images below.

1

2

3

4

Lab to run C program on RISC-V CPU

Here we have a RISC-V CPU written in Verilog, and we will create a testbench for it. The C program is converted to hex format and read by the RISC-V CPU, and the output is displayed. The whole process is described below.

10

To run the program, I used the following commands in the terminal.

chmod 777 rv32im.sh
./rv32im.sh

The image below shows the output displayed in the Ubuntu terminal.

11

Day_3 Digital Logic with TL-verilog & makerchip

Combinational Logic in TL-verilog using Makerchip.

Introduction to Logic gates

Logic gates are the fundamental building blocks of digital circuits.

gates

Since logic gates are the basic building blocks of a circuit, I learned how to implement them in TL-Verilog. The table below gives the respective code for each logic gate.

Logic gates verilog

A full adder circuit made up of logic gates.

adder

An adder circuit made using logic gates.
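The gate-level behaviour of a full adder can be written with C bitwise operators, which map one-to-one onto XOR, AND, and OR gates. This is a minimal sketch of the textbook circuit, and may differ in detail from the circuit in the image; the names adder_out and full_adder are hypothetical, and inputs are expected to be 0 or 1.

```c
/* Sum and carry outputs of a one-bit full adder. */
typedef struct {
    unsigned sum;
    unsigned carry;
} adder_out;

/* Full adder built from XOR, AND and OR gates; a, b and cin are 0 or 1. */
adder_out full_adder(unsigned a, unsigned b, unsigned cin)
{
    adder_out r;
    unsigned a_xor_b = a ^ b;                  /* first XOR gate */
    r.sum = a_xor_b ^ cin;                     /* second XOR gate */
    r.carry = (a & b) | (a_xor_b & cin);       /* two AND gates and one OR gate */
    return r;
}
```

For a = 1, b = 1, cin = 1 the adder produces sum 1 and carry 1, i.e. binary 11.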

carr

Basic Mux implementation & Introduction to makerchip

A basic 2x1 mux is made using the following statement. It uses the ternary operator, which is similar to an if-else statement in a C program.

assign f = s ? x1 : x2;

2x1 mux

The image below shows a 4x1 mux implemented using 2x1 muxes, along with the Verilog code for it

4x1 mux
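The same composition can be sketched in C, since C has the same ternary operator. The function names mux2 and mux4 and the input ordering in0..in3 are assumptions made for this example; the wiring in the image may order the inputs differently.

```c
/* 2x1 mux, matching `assign f = s ? x1 : x2;` (s = 1 selects x1). */
unsigned mux2(unsigned s, unsigned x1, unsigned x2)
{
    return s ? x1 : x2;
}

/* 4x1 mux built from three 2x1 muxes.
   {s1, s0} = 00 selects in0, 01 selects in1, 10 selects in2, 11 selects in3. */
unsigned mux4(unsigned s1, unsigned s0,
              unsigned in0, unsigned in1, unsigned in2, unsigned in3)
{
    unsigned low  = mux2(s0, in1, in0);  /* first stage: pick between in0 and in1 */
    unsigned high = mux2(s0, in3, in2);  /* first stage: pick between in2 and in3 */
    return mux2(s1, high, low);          /* second stage: pick between the pairs */
}
```

With inputs 10, 11, 12, 13, the select value {s1, s0} = 10 returns 12.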

Introduction to makerchip

  1. Search for Makerchip in your search engine and launch the Makerchip IDE.
  2. Go to Learn, click on Examples and select FPGA multipler.

MakerChip tutorial

Inverter Gate on makerchip

Inverter

Vector of 5 bits

vector

Mux with single bit

mux made me

Mux with vector input

mux vector

Combinational Calculator

calculator

Sequential Logic

Introduction to sequential logic & counter lab

A sequential circuit essentially consists of a combinational circuit whose state is updated by a clock. Value transitions take place on either the positive or the negative edge of the clock. The image below describes the basic idea of a sequential circuit.

basic seq  circuit

Fibonacci Series

The image below gives an idea of how the circuit for generating the Fibonacci series is implemented.

fibbonacci series ckt and waveform
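A cycle-by-cycle software model can make the role of the stored values clearer. The sketch below assumes that reset is asserted for the first cycle only, that reset forces the value to 1, and that the stored previous values start at 0; these are assumptions for the example and the lab circuit may differ. The function name fibonacci_cycles is hypothetical.

```c
#include <stdint.h>

/* Simulate `cycles` clock cycles of a Fibonacci circuit and record the
   value produced in each cycle in out[0..cycles-1]. */
void fibonacci_cycles(uint32_t *out, int cycles)
{
    uint32_t prev1 = 0;  /* value one cycle ago (first flip-flop stage)   */
    uint32_t prev2 = 0;  /* value two cycles ago (second flip-flop stage) */

    for (int cyc = 0; cyc < cycles; cyc++) {
        int reset = (cyc == 0);                      /* assumed: reset in cycle 0 only */
        uint32_t num = reset ? 1 : (prev1 + prev2);  /* combinational logic */

        out[cyc] = num;

        /* Clock edge: shift the stored values along by one cycle. */
        prev2 = prev1;
        prev1 = num;
    }
}
```

Under these assumptions the first six cycles produce 1, 1, 2, 3, 5, 8.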

Free Running counter

The image below shows the code and working of a free running counter designed as a sequential circuit. One can observe the importance of the clock in the circuit, as the output changes only on the positive clock edge.

Counter circuit

The basic circuit block diagram is given below

count ckt

Sequential calculator lab

seq  calculator

Pipeline Logic

Pipelined logic & retiming

The concept of pipelining is explained using the Pythagorean theorem.

Basics of the Pythagorean theorem on Makerchip

pytha

TL-Verilog gives the ability to model a design in a timing-abstract representation. The basic idea of pipelining is to break the whole computation into different stages. The image below compares the use of pipelining in TL-Verilog with other RTL languages.

rtl vs tl-verilog

Timing abstraction makes it easy to manipulate a pipeline and its stages, i.e. staging is a physical attribute and has no impact on behaviour, as shown in the image below

remtiming

The image below shows the code for pipelining in TL-Verilog.

tl verilog code

The image shows a comparison of code between SystemVerilog and TL-Verilog.

s vl vs tlvl

Pipeline logic advantages and demo in platform

  1. By applying pipelining we are able to run the clock at a higher speed.
  2. In diagram 2, one can observe that a new input can be introduced at every clock cycle, so the pipeline accepts more inputs in the same amount of time.

basic idea of pipelining

Here we look at the finer details of the pipelining concept.

In the image below there is a single-stage pipeline, so the output c is produced in the same stage as the inputs.

pytha single pipeline

When we change the single-stage pipeline to a 3-stage pipeline, the output c comes 2 stages later than a and b. This can be observed in the image below.

pipelining pytha 3step
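The two-stage delay can be reproduced with a small software model of a 3-stage pipeline. This is a sketch under assumptions, not the Makerchip code: stage 1 squares the inputs, stage 2 adds the squares, and stage 3 takes an integer square root (the helper isqrt is a simple stand-in for the square root used in the lab). All names are hypothetical, and the pipeline registers are assumed to start at 0.

```c
#include <stdint.h>

/* Integer square root by simple search; adequate for small values. */
static uint32_t isqrt(uint32_t n)
{
    uint32_t r = 0;
    while ((r + 1) * (r + 1) <= n) {
        r++;
    }
    return r;
}

/* Feed one (a, b) pair per cycle and record the stage-3 output of each cycle. */
void pythagoras_pipeline(const uint32_t *a, const uint32_t *b,
                         uint32_t *c_out, int cycles)
{
    uint32_t aa_q = 0, bb_q = 0;  /* registers between stage 1 and stage 2 */
    uint32_t sum_q = 0;           /* register between stage 2 and stage 3  */

    for (int cyc = 0; cyc < cycles; cyc++) {
        /* Stage 3 works on the sum captured at the previous clock edge. */
        c_out[cyc] = isqrt(sum_q);

        /* Stage 2 works on the squares captured at the previous clock edge. */
        uint32_t sum = aa_q + bb_q;

        /* Stage 1 works on this cycle's inputs. */
        uint32_t aa = a[cyc] * a[cyc];
        uint32_t bb = b[cyc] * b[cyc];

        /* Clock edge: every pipeline register updates together. */
        sum_q = sum;
        aa_q = aa;
        bb_q = bb;
    }
}
```

Feeding (3, 4) in cycle 0 and (6, 8) in cycle 1 produces 5 in cycle 2 and 10 in cycle 3: each result appears two cycles after its inputs, while a new input is still accepted every cycle.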

Finally, we look at the concept of feedback and how varying the number of feedback stages in the code is reflected in the pipeline diagram. Here the code is set for 4-stage feedback, which can be observed in the diagram as well.

feedback concept

Lab on Error Conditions within Computation Pipeline

Classification

Pipe signal - The identifier is written in lower case, e.g. $lower_case

State signal - Written in Pascal case, where the first letter of each word is in upper case, e.g. $CamelCase

Keyword signal - All the letters of the identifier are written in upper case, e.g. $UPPER_CASE

Numbers end tokens - $base64_value is considered good practice in TL-Verilog, while $bad_name_5 is a practice to avoid.

Numeric identifiers - e.g. >>1, which indicates ahead by 1.

For pipelining of errors I used the following code in Makerchip

   $reset = *reset;
   |comp
      @1
         $err1 = $bad_input || $illegal_op;
      @2
         $err2 = $err1 || $overflow;
      @3
         $err3 = $err2 || $div_by_zero;

The following picture shows the output

error ip

asked

Lab on 2-Cycle Calculator

Value Representation in Verilog

The image below shows how numbers are represented in Verilog.

value representation

Validity

Validity is a notion of when values or signals are meaningful. Validity provides

  • Easier Debug
  • Cleaner Design
  • Better error checking
  • Automated Clock gating

Let us implement the Pythagorean theorem with validity:

validity pythagoran

Clock Gating is a power-saving property.

  • Motivation

    1.1 Clock signals are distributed to EVERY flipflop.

    1.2 Clocks toggle twice per cycle.

    1.3 This consumes power.

  • Clock gating avoids toggling clock signals.

  • TL-verilog can produce fine-grained gating (or enables).

LAB - Distance accumulator with the Pythagorean theorem.

validity pythagoran

LAB - Cycle calculator with validity

The pipeline structure is

validity pythagoran

The makerchip implementation output:

cycle calculator makerchip

LAB- Calculator with single value Memory

The pipeline structure is as follows

single memory strc

Makerchip Implementation

single memory strc

Wrap-UP

LAB - Conway's Game of Life:

Conway game of life

LAB - Pythagorean theorem:

pytha str

The makerchip output:

makerchip implementation

Day_4 Basic RISC-V CPU Architecture

Introduction to Simple RISC-V Microarchitecture

The micro architecture for the RISC-V implementation is shown here:

1

Basic terminologies

Program counter - The program counter (PC) is a pointer into the instruction memory that indicates which instruction must be executed next.

Decoder - The decoder interprets the instruction and sends signals describing the action the processor should take and the location of the data. After an instruction is fetched, the PC is incremented so that it points to the next instruction.

Register file - The register file holds the CPU registers and implements the read and write operations on them.

ALU - The ALU computes the arithmetic and logic operations and writes the result back to the register file.

Fetch & Decode

The implementation plan of RISC-V CPU Core:

2

LAB - PC:

The implementation pipeline

3

The makerchip output

4

LAB - FETCH

The pipeline structure (part-1):

6

The pipeline structure(part-2):

7

The makerchip implementation output:

8

LAB - INSTRUCTION TYPE DECODE

The Pipeline Structure

9

The makerchip output

10

LAB - INSTRUCTION IMMEDIATE DECODE

11

The implementation output:

12

LAB - INSTRUCTION FIELD DECODE

13

The implementation output:

14

LAB - INSTRUCTION DECODE_2

15

The implementation output:

16

RISC-V control logic

LAB - REGISTER FILE READ_1

The pipeline structure is as follows

17

18

LAB - REGISTER FILE READ_2

The pipeline structure:

19

The makerchip implementation output:

20

LAB - ALU

The pipeline structure

22

The implementation output

22

LAB - REGISTER FILE WRITE

23

The makerchip implementation output:

24

Arrays

25

The detailed implementation of Register files is given below:

26

The implementation output:

27

28

The output is shown in the image below

30

LAB- Test bench

The makerchip implementation output is as shown below:-

31

Day_5 Complete Pipelined RISC-V CPU micro-architecture

Pipelining the CPU

1

LAB - 3-CYCLE VALID SIGNAL

The implementation output is:

2

LAB - CYCLE RISC-V

3

The implementation output is shown below

4

Solutions to Pipeline Hazards

REGISTER FILE BYPASS

The pipeline structure is as follows

5

The implementation output is shown below in the image

6

LAB - BRANCHES

7

The implemented output is shown below

8

LAB-ALU

The makerchip implementation results are:

9

Load/Store Instructions and Completing RISC-V CPU

LOAD

The pipeline structure

10

The implementation output:

11

LOAD/STORE

12

The implementation output

13

JUMPS

14

The makerchip output:

15

RISC-V CORE CPU - FINAL IMPLEMENTATION

The RISC-V final code is shown below:

\m4_TLV_version 1d: tl-x.org
\SV
   // This code can be found in: https://github.com/stevehoover/RISC-V_MYTH_Workshop
   
   m4_include_lib(['https://raw.githubusercontent.com/Dsatle/Risc_V/main/risc-v_shell_lib.tlv'])

\SV
   m4_makerchip_module   // (Expanded in Nav-TLV pane.)
\TLV

   // /====================\
   // | Sum 1 to 9 Program |
   // \====================/
   //
   // Program for MYTH Workshop to test RV32I
   // Add 1,2,3,...,9 (in that order).
   //
   // Regs:
   //  r10 (a0): In: 0, Out: final sum
   //  r12 (a2): 10
   //  r13 (a3): 1..10
   //  r14 (a4): Sum
   //
   // External to function:
   m4_asm(ADD, r10, r0, r0)             // Initialize r10 (a0) to 0.
   // Function:
   m4_asm(ADD, r14, r10, r0)            // Initialize sum register a4 with 0x0
   m4_asm(ADDI, r12, r10, 1010)         // Store count of 10 in register a2.
   m4_asm(ADD, r13, r10, r0)            // Initialize intermediate sum register a3 with 0
   // Loop:
   m4_asm(ADD, r14, r13, r14)           // Incremental addition
   m4_asm(ADDI, r13, r13, 1)            // Increment intermediate register by 1
   m4_asm(BLT, r13, r12, 1111111111000) // If a3 is less than a2, branch to label named <loop>
   m4_asm(ADD, r10, r14, r0)            // Store final result to register a0 so that it can be read by main program
   m4_asm(SW, r0, r10, 10000)           // Store the final result value to byte address 16
   m4_asm(LW, r15, r0, 10000)           // Load the final result value from address 16 to r15
   
   // Optional:
   // m4_asm(JAL, r7, 00000000000000000000) // Done. Jump to itself (infinite loop). (Up to 20-bit signed immediate plus implicit 0 bit (unlike JALR) provides byte address; last immediate bit should also be 0)
   m4_define_hier(['M4_IMEM'], M4_NUM_INSTRS)


   |cpu
      @0
         $reset = *reset;
         
         //MODIFIED NEXT PC LOGIC FOR INCLUDING BRANCH INSTRUCTIONS
         $pc[31:0] = >>1$reset ? 32'b0 :
                     >>3$valid_taken_branch ? >>3$br_target_pc :
                     >>3$valid_load ? >>3$inc_pc :
                     >>3$valid_jump && >>3$is_jal ? >>3$br_target_pc :
                     >>3$valid_jump && >>3$is_jalr ? >>3$jalr_target_pc :
                     >>1$inc_pc ;
         //START LOGIC TO PROVIDE FIRST VALID LOGIC
         //$start = (>>1$reset && $reset == 0) ? 1'b1 : 1'b0;
         //$valid = $reset ? 1'b0 :
                  //$start ? 1'b1 : >>3$valid;
     
      @1  
         //INSTRUCTION FETCH
         $inc_pc[31:0] = $pc + 32'd4;
         
         $imem_rd_en = !$reset;
         $imem_rd_addr[M4_IMEM_INDEX_CNT-1:0] = $pc[M4_IMEM_INDEX_CNT+1:2];
         
         $instr[31:0] = $imem_rd_data[31:0];
         
         //INSTRUCTION TYPES DECODE        
         
         $is_u_instr = $instr[6:2] ==? 5'b0x101;
         
         $is_s_instr = $instr[6:2] ==? 5'b0100x;
         
         $is_r_instr = $instr[6:2] ==? 5'b011x0 ||
                       $instr[6:2] ==? 5'b01011 ||
                       $instr[6:2] ==? 5'b10100;
         
         $is_j_instr = $instr[6:2] ==? 5'b11011;
         
         $is_i_instr = $instr[6:2] ==? 5'b0000x ||
                       $instr[6:2] ==? 5'b001x0 ||
                       $instr[6:2] ==? 5'b11001;
         
         $is_b_instr = $instr[6:2] ==? 5'b11000;
         
         //INSTRUCTION IMMEDIATE DECODE
         $imm[31:0] = $is_i_instr ? {{21{$instr[31]}}, $instr[30:20]} :
                      $is_s_instr ? {{21{$instr[31]}}, $instr[30:25], $instr[11:7]} :
                      $is_b_instr ? {{20{$instr[31]}}, $instr[7], $instr[30:25], $instr[11:8], 1'b0} :
                      $is_u_instr ? {$instr[31:12], 12'b0} :
                      $is_j_instr ? {{12{$instr[31]}}, $instr[19:12], $instr[20], $instr[30:21], 1'b0} :
                                                            32'b0;
         //INSTRUCTION DECODE
         $opcode[6:0] = $instr[6:0];
         
         
         //INSTRUCTION FIELD DECODE
         $rs2_valid = $is_r_instr || $is_s_instr || $is_b_instr;
         ?$rs2_valid
            $rs2[4:0] = $instr[24:20];
           
         $rs1_valid = $is_r_instr  || $is_s_instr || $is_b_instr || $is_i_instr;
         ?$rs1_valid
            $rs1[4:0] = $instr[19:15];
         
         $funct3_valid = $is_r_instr  || $is_s_instr || $is_b_instr || $is_i_instr;
         ?$funct3_valid
            $funct3[2:0] = $instr[14:12];
           
         $funct7_valid = $is_r_instr ;
         ?$funct7_valid
            $funct7[6:0] = $instr[31:25];
           
         $rd_valid = $is_r_instr  || $is_u_instr || $is_j_instr || $is_i_instr;
         ?$rd_valid
            $rd[4:0] = $instr[11:7];
         
         
      @2
         //INSTRUCTION DECODE
         $dec_bits[10:0] = {$funct7[5],$funct3,$opcode};
         $is_beq = $dec_bits ==? 11'bx_000_1100011;
         $is_bne = $dec_bits ==? 11'bx_001_1100011;
         $is_blt = $dec_bits ==? 11'bx_100_1100011;
         $is_bge = $dec_bits ==? 11'bx_101_1100011;
         $is_bltu = $dec_bits ==? 11'bx_110_1100011;
         $is_bgeu = $dec_bits ==? 11'bx_111_1100011;
         $is_addi = $dec_bits ==? 11'bx_000_0010011;
         $is_add = $dec_bits ==? 11'b0_000_0110011;
         $is_lui = $dec_bits ==? 11'bx_xxx_0110111;
         $is_auipc = $dec_bits ==? 11'bx_xxx_0010111;
         $is_jal = $dec_bits ==? 11'bx_xxx_1101111;
         $is_jalr = $dec_bits ==? 11'bx_000_1100111;
         $is_load = $opcode == 7'b0000011;
         $is_sb = $dec_bits ==? 11'bx_000_0100011;
         $is_sh = $dec_bits ==? 11'bx_001_0100011;
         $is_sw = $dec_bits ==? 11'bx_010_0100011;
         $is_slti = $dec_bits ==? 11'bx_010_0010011;
         // Immediate ALU instructions use the OP-IMM opcode 0010011 (0100011 is the store opcode)
         $is_sltiu = $dec_bits ==? 11'bx_011_0010011;
         $is_xori = $dec_bits ==? 11'bx_100_0010011;
         $is_ori = $dec_bits ==? 11'bx_110_0010011;
         $is_andi = $dec_bits ==? 11'bx_111_0010011;
         $is_slli = $dec_bits ==? 11'b0_001_0010011;
         $is_srli = $dec_bits ==? 11'b0_101_0010011;
         $is_srai = $dec_bits ==? 11'b1_101_0010011;
         $is_sub = $dec_bits ==? 11'b1_000_0110011;
         $is_sll = $dec_bits ==? 11'b0_001_0110011;
         $is_slt = $dec_bits ==? 11'b0_010_0110011;
         $is_sltu = $dec_bits ==? 11'b0_011_0110011;
         $is_xor = $dec_bits ==? 11'b0_100_0110011;
         $is_srl = $dec_bits ==? 11'b0_101_0110011;
         $is_sra = $dec_bits ==? 11'b1_101_0110011;
         $is_or = $dec_bits ==? 11'b0_110_0110011;
         $is_and = $dec_bits ==? 11'b0_111_0110011;
         
         $jalr_target_pc[31:0] = $src1_value +$imm ;
      @3
         $is_jump = $is_jal || $is_jalr ;   
         `BOGUS_USE($is_beq $is_bne $is_blt $is_bge $is_bltu $is_bgeu $is_addi $is_add
                    $is_lui $is_auipc $is_jal $is_jalr $is_load $is_sb $is_sh $is_sw $is_slti
                    $is_sltiu $is_xori $is_ori $is_andi $is_slli $is_srli $is_srai $is_sub $is_sll
                    $is_slt $is_sltu $is_xor $is_srl $is_sra $is_or $is_and)
         
      @2  
         //REGISTER FILE READ
         //$rf_wr_en = 1'b0;
         //$rf_wr_index[4:0] = 5'b0;
         //$rf_wr_data[31:0] = 32'b0;
         $rf_rd_en1 = $rs1_valid;
         $rf_rd_index1[4:0] = $rs1;
         $rf_rd_en2 = $rs2_valid;
         $rf_rd_index2[4:0] = $rs2;
         
         $src1_value[31:0] = >>1$rf_wr_en && (>>1$rf_wr_index == $rf_rd_index1) ? >>1$result : $rf_rd_data1;
         $src2_value[31:0] = >>1$rf_wr_en && (>>1$rf_wr_index == $rf_rd_index2) ? >>1$result : $rf_rd_data2;
         $br_target_pc[31:0] = $pc +$imm;
         
      @3  
         //ARITHMETIC AND LOGIC UNIT (ALU)
         
         $sltu_rslt[31:0] = $src1_value < $src2_value;
         $sltiu_rslt[31:0] = $src1_value < $imm;
         $result[31:0] = $is_addi ? $src1_value + $imm :
                         $is_add ? $src1_value + $src2_value :
                         $is_andi ? $src1_value & $imm :
                         $is_ori ? $src1_value | $imm :
                         $is_xori ? $src1_value ^ $imm :
                         $is_slli ? $src1_value << $imm[5:0] :
                         ($is_addi || $is_load || $is_s_instr) ? $src1_value + $imm :
                         $is_srli ? $src1_value >> $imm[5:0] :
                         $is_and ? $src1_value & $src2_value :
                         $is_or ? $src1_value | $src2_value :
                         $is_xor ? $src1_value ^ $src2_value :
                         $is_sub ? $src1_value - $src2_value :
                         $is_sll ? $src1_value << $src2_value[4:0] :
                         $is_srl ? $src1_value >> $src2_value[4:0] :
                         $is_sltu ? $sltu_rslt :
                         $is_sltiu ? $sltiu_rslt :
                         $is_lui ? {$imm[31:12],12'b0} :
                         $is_auipc ? $pc + $imm :
                         $is_jal ? $pc + 4 :
                         $is_jalr ? $pc + 4 :
                         $is_srai ? { {32{$src1_value[31]}},$src1_value} >> $imm[4:0] :
                         $is_slt ? ($src1_value[31] == $src2_value[31]) ? $sltu_rslt : {31'b0,$src1_value[31]} :
                         $is_slti ? ($src1_value[31] == $imm[31]) ? $sltiu_rslt : {31'b0,$src1_value[31]} :
                         $is_sra ? { {32{$src1_value[31]}},$src1_value} >> $src2_value[4:0] :
                         32'bx;
         
         
         //REGISTER FILE WRITE
         $rf_wr_en = ($rd_valid && $rd != 5'b0 && $valid) || >>2$valid_load;
         $rf_wr_index[4:0] = >>2$valid_load ? >>2$rd : $rd;
         $rf_wr_data[31:0] = >>2$valid_load ? >>2$ld_data : $result;
         
         
         //BRANCH INSTRUCTIONS 1
         $taken_branch = $is_beq ? ($src1_value == $src2_value):
                         $is_bne ? ($src1_value != $src2_value):
                         $is_blt ? (($src1_value < $src2_value) ^ ($src1_value[31] != $src2_value[31])):
                         $is_bge ? (($src1_value >= $src2_value) ^ ($src1_value[31] != $src2_value[31])):
                         $is_bltu ? ($src1_value < $src2_value):
                         $is_bgeu ? ($src1_value >= $src2_value):
                         1'b0;
          //CYCLE VALID INSTRUCTIONS
         $valid = !(>>1$valid_taken_branch || >>2$valid_taken_branch ||
                    >>1$valid_load || >>2$valid_load) ;
         
         $valid_load = $valid && $is_load ;
         //$valid = !(>>1$valid_taken_branch || >>2$valid_taken_branch);
         $valid_taken_branch = $valid && $taken_branch;
         $valid_jump = $is_jump && $valid ;
         `BOGUS_USE($taken_branch)
      @4
         //MINI 1-R/W MEMORY
         $dmem_wr_en = $is_s_instr && $valid ;
         $dmem_addr[3:0] = $result[5:2] ;
         $dmem_wr_data[31:0] = $src2_value ;
         $dmem_rd_en = $is_load ;
         
      @5
         //LOAD DATA
         $ld_data[31:0] = $dmem_rd_data ;   
         
         
         

      // Note: Because of the magic we are using for visualisation, if visualisation is enabled below,
      //       be sure to avoid having unassigned signals (which you might be using for random inputs)
      //       other than those specifically expected in the labs. You'll get strange errors for these.

   
   // Assert these to end simulation (before Makerchip cycle limit).
   //*passed = *cyc_cnt > 40;
   *passed = |cpu/xreg[15]>>5$value == (1+2+3+4+5+6+7+8+9) ;
   *failed = 1'b0;
   
   // Macro instantiations for:
   //  o instruction memory
   //  o register file
   //  o data memory
   //  o CPU visualization
   |cpu
      m4+imem(@1)    // Args: (read stage)
      m4+rf(@2, @3)  // Args: (read stage, write stage) - if equal, no register bypass is required
      m4+dmem(@4)    // Args: (read/write stage)
   
   m4+viz(@4)    // For visualisation, argument should be at least equal to the last stage of CPU logic
   //@4 would work for all lab
\SV
   endmodule

16
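The BLT instruction in the test program carries the 13-bit immediate 1111111111000, which the immediate decode logic sign-extends before it is added to the PC. The arithmetic can be checked with the C sketch below. The function names sign_extend and branch_target are hypothetical, and the byte address 24 used in the note assumes the program starts at address 0 with 4 bytes per instruction.

```c
#include <stdint.h>

/* Sign-extend the low `bits` bits of `value` to 32 bits, for 1 <= bits <= 31. */
int32_t sign_extend(uint32_t value, unsigned bits)
{
    uint32_t sign = (uint32_t)1 << (bits - 1);  /* position of the sign bit */
    uint32_t mask = (sign << 1) - 1;            /* keep only `bits` bits */
    int64_t v = value & mask;

    if (v & sign) {
        v -= (int64_t)sign * 2;                 /* negative: subtract 2^bits */
    }
    return (int32_t)v;
}

/* Branch target = PC of the branch + sign-extended 13-bit immediate. */
uint32_t branch_target(uint32_t pc, uint32_t imm13)
{
    return pc + (uint32_t)sign_extend(imm13, 13);
}
```

The pattern 1111111111000 is 0x1FF8, which sign-extends to -8. Under the stated addressing assumption BLT is the seventh instruction, at byte address 24, so the target is 16, the address of the first instruction of the loop.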

Word Of Thanks

I am thankful to Kunal Ghosh (co-founder & CEO, VSD) for providing me with good quality content and resources and for guiding me throughout the workshop. I would also like to thank Steve Hoover (founder of Redwood EDA) for helping me understand the concepts of TL-Verilog and how to implement them on Makerchip.

References

  1. https://www.vsdiat.com

  2. https://github.com/stevehoover/RISC-V_MYTH_Workshop

  3. http://makerchip.com/sandbox/

  4. https://github.com/kunalg123/riscv_workshop_collaterals

  5. Alwin Shahju, colleague IIIT Bangalore

  6. Lasya, colleague IIIT Bangalore

  7. Bhargava DV colleague IIIT Bangalore

  8. Pruthvi Parate colleague IIIT Bangalore
