decoder support for register operands to compressed instructions

Question

iamkarthikbk opened this issue a year ago

rvopcodesdecoder decodes register operands and immediate values for non-compressed instructions, but it does not do so for compressed instructions.

Here's an example of how the compressed instructions look post-decode:

{'instr': 17304, 'instr_name': 'c.lw', 'instr_addr': 2147487848, 'rd': None, 'rs1': None, 'rs2': None, 'rs3': None, 'imm': None, 'zimm': None, 'csr': None, 'shamt': None, 'succ': None, 'pred': None, 'rl': None, 'aq': None, 'rm': None, 'reg_commit': ('x', '14', '0x000000000000000a'), 'csr_commit': None, 'mnemonic': None, 'is_rvp': False, 'rs1_nregs': 1, 'rs2_nregs': 1, 'rs3_nregs': 1, 'rd_nregs': 1}

here's the commit for that instruction (in my program, based on spike):

instr: 17304 addr: 0x80001068 instr_name: c.lw reg_commit: ('x', '14', '0x000000000000000a')

here's what is actually supposed to happen (example of lw):

{'instr': 67250179, 'instr_name': 'lw', 'instr_addr': 2147488558, 'rd': (16, 'x'), 'rs1': (4, 'x'), 'rs2': None, 'rs3': None, 'imm': 64, 'zimm': None, 'csr': None, 'shamt': None, 'succ': None, 'pred': None, 'rl': None, 'aq': None, 'rm': None, 'reg_commit': ('x', '16', '0x0000000000000007'), 'csr_commit': None, 'mnemonic': None, 'is_rvp': False, 'rs1_nregs': 1, 'rs2_nregs': 1, 'rs3_nregs': 1, 'rd_nregs': 1}

isac's decoder currently does this (link):

            for arg in args[:-1]:
                if arg == 'rd':
                    treg = reg_type
                    if any([instr_name.startswith(x) for x in [
                            'fcvt.w','fcvt.l','fmv.s','fmv.d','flt','feq','fle','fclass']]):
                        treg = 'x'
                    temp_instrobj.rd = (int(get_arg_val(arg)(mcode), 2), treg)
                if arg == 'rs1':
                    treg = reg_type
                    if any([instr_name.startswith(x) for x in [
                            'fsw','fsd','fcvt.s','fcvt.d','fmv.w','fmv.l']]):
                        treg = 'x'
                    temp_instrobj.rs1 = (int(get_arg_val(arg)(mcode), 2), treg)
                if arg == 'rs2':
                    treg = reg_type
                    temp_instrobj.rs2 = (int(get_arg_val(arg)(mcode), 2), treg)
                if arg == 'rs3':

here's the actual list of variable fields from riscv-opcodes -- being looked up in the arg_lut:

c.lw :: ['rd_p', 'rs1_p', 'c_uimm7lo', 'c_uimm7hi', 'rv_c']

isac's decoder needs to be modified to take these compressed field names into account and decode them appropriately from here(link). here's an example implementation (untested) for starters:

                if 'rd' in arg:
                    treg = reg_type
                    if any([instr_name.startswith(x) for x in [
                            'fcvt.w','fcvt.l','fmv.s','fmv.d','flt','feq','fle','fclass']]):
                        treg = 'x'
                    temp_instrobj.rd = (int(get_arg_val(arg)(mcode), 2), treg)
                if 'rs1' in arg:
                    treg = reg_type
                    if any([instr_name.startswith(x) for x in [
                            'fsw','fsd','fcvt.s','fcvt.d','fmv.w','fmv.l']]):
                        treg = 'x'
                    temp_instrobj.rs1 = (int(get_arg_val(arg)(mcode), 2), treg)
                    # print(f'{instr_name} rs1: {temp_instrobj.rs1}')
                if 'rs2' in arg:
                    treg = reg_type
                    temp_instrobj.rs2 = (int(get_arg_val(arg)(mcode), 2), treg)

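the change is in the if conditions, where i replaced exact checks such as arg == 'rd' with substring checks such as 'rd' in arg, so that rd_p and rs1_p also match.
with this change the register fields show up in the decoded artifacts. note that these are the raw 3-bit rd'/rs1' field values, which the C extension maps to x8-x15, so rd field 6 is x14 (matching the reg_commit) and rs1 field 7 is x15: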

{'instr': 17304, 'instr_name': 'c.lw', 'instr_addr': 2147487848, 'rd': (6, 'x'), 'rs1': (7, 'x'), 'rs2': None, 'rs3': None, 'imm': None, 'zimm': None, 'csr': None, 'shamt': None, 'succ': None, 'pred': None, 'rl': None, 'aq': None, 'rm': None, 'reg_commit': ('x', '14', '0x000000000000000a'), 'csr_commit': None, 'mnemonic': None, 'is_rvp': False, 'rs1_nregs': 1, 'rs2_nregs': 1, 'rs3_nregs': 1, 'rd_nregs': 1}

immediates have not been fixed in this example.
there might also be a minor bug in the get_instr function in the same rvopcodesdecoder when it comes to compressed instructions. this will need to be checked as well.
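The register fields of c.lw (rd_p, rs1_p) are 3-bit fields that select x8-x15, so the architectural register number is the field value plus 8. A minimal sketch of that mapping for the instruction above, not isac's code: the helper names `bits` and `decode_c_lw_regs` are made up for illustration, and the bit positions are taken from the CL format in the RISC-V C extension.

```python
def bits(mcode: int, msb: int, lsb: int) -> int:
    """Return bits msb..lsb (inclusive) of mcode as an integer."""
    return (mcode >> lsb) & ((1 << (msb - lsb + 1)) - 1)


def decode_c_lw_regs(mcode: int) -> dict:
    """Hypothetical helper: decode the register operands of a c.lw halfword."""
    rd_p = bits(mcode, 4, 2)    # rd'  field, raw 3-bit value
    rs1_p = bits(mcode, 9, 7)   # rs1' field, raw 3-bit value
    # Primed (3-bit) register fields address x8..x15, hence the +8.
    return {'rd': (rd_p + 8, 'x'), 'rs1': (rs1_p + 8, 'x')}


regs = decode_c_lw_regs(17304)
print(regs)  # {'rd': (14, 'x'), 'rs1': (15, 'x')}
```

The decoded rd of 14 agrees with the reg_commit ('x', '14', ...) reported for this instruction.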

Edwin Joy · Answer 1 · Thu Jun 22 2023 18:47:54 GMT+0800 (China Standard Time)

The constant operand field names defined in link for compressed instructions have changed over time, and these changes have not been reflected in rvopcodesdecoder.py.

For example, c.lw instruction has arguments ['rd_p', 'rs1_p', 'c_uimm7lo', 'c_uimm7hi']. Now, we need to set rd_p as rd and rs1_p as rs1 in the instance of InstructionObject class. The immediate fields c_uimm7lo and c_uimm7hi should be combined together to form a single imm as specified by the instruction's encoding in the spec.

We can go ahead with the potential fix mentioned above in the issue statement for rs1*, rs2* and rd*. For immediate fields, it would translate to adding/changing the following in the if-elif-else ladder:

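# Assumes imm is a bit string that starts out empty ('') and that
# get_arg_val(arg)(mcode) returns the field as a bit string, MSB first.
# c.lw offset layout: uimm[6] | uimm[5:3] | uimm[2] | '00'
if arg == 'c_uimm7hi':      # holds uimm[5:3]
    imm_temp = get_arg_val(arg)(mcode)
    if imm:                 # c_uimm7lo was seen first: wrap hi with its two bits
        imm = imm[-1] + imm_temp + imm[0] + '00'
    else:                   # nothing seen yet: stash hi until lo arrives
        imm = imm_temp + imm
if arg == 'c_uimm7lo':      # holds uimm[2|6]
    imm_temp = get_arg_val(arg)(mcode)
    if imm:                 # c_uimm7hi was seen first: wrap it with lo's two bits
        imm = imm_temp[-1] + imm + imm_temp[0] + '00'
    else:                   # nothing seen yet: stash lo until hi arrives
        imm = imm + imm_temp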

Similar additions will be required for all other field names that are not yet handled in this if-elif-else ladder.
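To make the bit shuffling concrete, here is a self-contained sketch of the same concatenation for c.lw. It is an illustration under assumptions, not the decoder's code: `field_bits` and `c_lw_uimm` are hypothetical names, and the field positions (bits 12:10 for c_uimm7hi, bits 6:5 for c_uimm7lo) are taken from the CL format in the C extension spec.

```python
def field_bits(mcode: int, msb: int, lsb: int) -> str:
    """Return bits msb..lsb of mcode as a binary string, MSB first."""
    width = msb - lsb + 1
    return format((mcode >> lsb) & ((1 << width) - 1), f'0{width}b')


def c_lw_uimm(mcode: int) -> int:
    """Hypothetical helper: rebuild the zero-extended c.lw byte offset."""
    hi = field_bits(mcode, 12, 10)  # uimm[5:3]
    lo = field_bits(mcode, 6, 5)    # lo[0] is uimm[2], lo[1] is uimm[6]
    # Layout: uimm[6] | uimm[5:3] | uimm[2] | '00' (offsets are word-aligned)
    return int(lo[1] + hi + lo[0] + '00', 2)


print(c_lw_uimm(17304))   # the c.lw from the issue: offset 0
print(c_lw_uimm(0x43B8))  # same registers with uimm[6] set: offset 64
```

Because each half is placed by position rather than by arrival order, this form does not need the "which field came first" branches; whether that fits the existing ladder is a design choice for the PR.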

The disassembler class correctly parses all instruction encodings from riscv-opcodes and generates the instruction dictionary. But the get_instr() function has some bugs when it comes to decoding compressed instructions. The following function could be a suitable replacement when decoding compressed instructions only. I have not tested it for other instructions:

def get_instr(func_dict, mcode: int, xlen):
    '''
    Recursively decodes given mcode
    '''
    # Get list of functions
    tuple_flag = False
    keys = func_dict.keys()
    num_keys = len(keys)
    for key in keys:
        if isinstance(key, str) and ((num_keys == 1) or tuple_flag):
            return (key, func_dict[key])
        elif isinstance(key, tuple):
            val = get_funct(key, mcode)
            tuple_flag = True
        else:        # Multiple instructions in leaf node (for different xlen)
            for key in keys:
                args = func_dict[key]
                if xlen in args[-1]:
                    return (key, func_dict[key])
        if val in list(func_dict[key].keys()):           
            temp_func_dict = func_dict[key][val]
        else:
            continue

        if temp_func_dict.keys():
            a = get_instr(temp_func_dict, mcode, xlen)
            if a is None:
                continue
            else:
                return a
        else:
            continue
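For readers unfamiliar with the lookup, the recursion can be pictured on a toy tree. The sketch below is a simplified model, not the real data structure or the function above: the tree shape (tuple keys giving a bit range, string keys as leaves), the `get_funct` stand-in and the `lookup` helper are all assumptions made for illustration, and the xlen handling is left out.

```python
def get_funct(pos, mcode):
    """Stand-in for the decoder's helper: extract bits msb..lsb as an int."""
    msb, lsb = pos
    return (mcode >> lsb) & ((1 << (msb - lsb + 1)) - 1)


def lookup(tree, mcode):
    """Walk the toy tree; return (instr_name, args) or None if nothing matches."""
    for key, sub in tree.items():
        if isinstance(key, str):      # leaf: instruction name -> field list
            return key, sub
        val = get_funct(key, mcode)   # inner node: key is a (msb, lsb) bit range
        if val in sub:
            found = lookup(sub[val], mcode)
            if found is not None:
                return found
    return None


# Toy tree holding only c.lw: quadrant bits 1:0 == 0b00, funct3 bits 15:13 == 0b010.
TOY_TREE = {
    (1, 0): {
        0b00: {
            (15, 13): {
                0b010: {'c.lw': ['rd_p', 'rs1_p', 'c_uimm7lo', 'c_uimm7hi', 'rv_c']},
            },
        },
    },
}

print(lookup(TOY_TREE, 17304))
```

An encoding with no matching branch (for example, one in a different quadrant) falls out of the loop and returns None, which is the case the caller has to handle.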

Pawan Kumar Sanjaya · Answer 2 · Fri Jun 23 2023 12:11:35 GMT+0800 (China Standard Time)

Feel free to raise a PR after testing.

Karthik B K · Answer 3 · Fri Jul 28 2023 18:22:01 GMT+0800 (China Standard Time)

This issue was resolved in #73.