Decoder support for register operands of compressed instructions
iamkarthikbk opened this issue
`rvopcodesdecoder` decodes register operands and immediate values for all regular instructions, but it does not do so for compressed instructions.
Here's an example of how a compressed instruction looks after decoding:
{'instr': 17304, 'instr_name': 'c.lw', 'instr_addr': 2147487848, 'rd': None, 'rs1': None, 'rs2': None, 'rs3': None, 'imm': None, 'zimm': None, 'csr': None, 'shamt': None, 'succ': None, 'pred': None, 'rl': None, 'aq': None, 'rm': None, 'reg_commit': ('x', '14', '0x000000000000000a'), 'csr_commit': None, 'mnemonic': None, 'is_rvp': False, 'rs1_nregs': 1, 'rs2_nregs': 1, 'rs3_nregs': 1, 'rd_nregs': 1}
Here's the commit log entry for that instruction (from my program, which is based on Spike):
instr: 17304 addr: 0x80001068 instr_name: c.lw reg_commit: ('x', '14', '0x000000000000000a')
Here's what the output is actually supposed to look like (an example of `lw`):
{'instr': 67250179, 'instr_name': 'lw', 'instr_addr': 2147488558, 'rd': (16, 'x'), 'rs1': (4, 'x'), 'rs2': None, 'rs3': None, 'imm': 64, 'zimm': None, 'csr': None, 'shamt': None, 'succ': None, 'pred': None, 'rl': None, 'aq': None, 'rm': None, 'reg_commit': ('x', '16', '0x0000000000000007'), 'csr_commit': None, 'mnemonic': None, 'is_rvp': False, 'rs1_nregs': 1, 'rs2_nregs': 1, 'rs3_nregs': 1, 'rd_nregs': 1}
ISAC's decoder currently does this (link):
```python
for arg in args[:-1]:
    if arg == 'rd':
        treg = reg_type
        if any([instr_name.startswith(x) for x in [
                'fcvt.w','fcvt.l','fmv.s','fmv.d','flt','feq','fle','fclass']]):
            treg = 'x'
        temp_instrobj.rd = (int(get_arg_val(arg)(mcode), 2), treg)
    if arg == 'rs1':
        treg = reg_type
        if any([instr_name.startswith(x) for x in [
                'fsw','fsd','fcvt.s','fcvt.d','fmv.w','fmv.l']]):
            treg = 'x'
        temp_instrobj.rs1 = (int(get_arg_val(arg)(mcode), 2), treg)
    if arg == 'rs2':
        treg = reg_type
        temp_instrobj.rs2 = (int(get_arg_val(arg)(mcode), 2), treg)
    if arg == 'rs3':
        ...  # rest of the original excerpt omitted
```
Here's the actual list of variable fields from riscv-opcodes that gets looked up in `arg_lut`:
c.lw :: ['rd_p', 'rs1_p', 'c_uimm7lo', 'c_uimm7hi', 'rv_c']
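For reference, here is a minimal sketch of where those fields sit in the 16-bit `c.lw` encoding. The `(msb, lsb)` ranges come from the RVC encoding of `c.lw` and are assumed to match the corresponding `arg_lut` entries; `CLW_FIELDS`, `extract_field` and `decode_fields` are hypothetical helpers, not ISAC or riscv-opcodes code.

```python
# Hypothetical (msb, lsb) bit ranges for c.lw's variable fields, taken from the
# RVC encoding of c.lw; assumed to mirror riscv-opcodes' arg_lut entries.
CLW_FIELDS = {
    'rd_p': (4, 2),         # rd'  (3-bit compressed register)
    'rs1_p': (9, 7),        # rs1' (3-bit compressed register)
    'c_uimm7lo': (6, 5),    # uimm[2|6]
    'c_uimm7hi': (12, 10),  # uimm[5:3]
}


def extract_field(mcode: int, msb: int, lsb: int) -> int:
    """Return bits [msb:lsb] of mcode as an unsigned integer."""
    width = msb - lsb + 1
    return (mcode >> lsb) & ((1 << width) - 1)


def decode_fields(mcode: int, fields: dict) -> dict:
    """Extract every named field of one instruction."""
    return {name: extract_field(mcode, msb, lsb) for name, (msb, lsb) in fields.items()}


if __name__ == '__main__':
    # 17304 (0x4398) is the c.lw from the example above.
    print(decode_fields(17304, CLW_FIELDS))
    # {'rd_p': 6, 'rs1_p': 7, 'c_uimm7lo': 0, 'c_uimm7hi': 0}
```

The trailing `rv_c` entry is left out because it is the extension tag rather than a bit field, which is also why the decoder loops over `args[:-1]`. For `17304` the raw register fields come out as 6 and 7, the same values that appear in the decoded output further down, and both immediate fields are zero.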
ISAC's decoder needs to be modified to take these compressed field names into account and decode them appropriately from here (link). Here's an example implementation (untested) to start with:
```python
if 'rd' in arg:
    treg = reg_type
    if any([instr_name.startswith(x) for x in [
            'fcvt.w','fcvt.l','fmv.s','fmv.d','flt','feq','fle','fclass']]):
        treg = 'x'
    temp_instrobj.rd = (int(get_arg_val(arg)(mcode), 2), treg)
if 'rs1' in arg:
    treg = reg_type
    if any([instr_name.startswith(x) for x in [
            'fsw','fsd','fcvt.s','fcvt.d','fmv.w','fmv.l']]):
        treg = 'x'
    temp_instrobj.rs1 = (int(get_arg_val(arg)(mcode), 2), treg)
    # print(f'{instr_name} rs1: {temp_instrobj.rs1}')
if 'rs2' in arg:
    treg = reg_type
    temp_instrobj.rs2 = (int(get_arg_val(arg)(mcode), 2), treg)
```
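The change is in the `if` conditions, where I replaced the exact check `arg == 'rd'` with the substring check `'rd' in arg` (and likewise for `rs1` and `rs2`), so that compressed field names such as `rd_p` and `rs1_p` also match.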
With this change, the register operands of the compressed instruction now appear in the decoded output:
{'instr': 17304, 'instr_name': 'c.lw', 'instr_addr': 2147487848, 'rd': (6, 'x'), 'rs1': (7, 'x'), 'rs2': None, 'rs3': None, 'imm': None, 'zimm': None, 'csr': None, 'shamt': None, 'succ': None, 'pred': None, 'rl': None, 'aq': None, 'rm': None, 'reg_commit': ('x', '14', '0x000000000000000a'), 'csr_commit': None, 'mnemonic': None, 'is_rvp': False, 'rs1_nregs': 1, 'rs2_nregs': 1, 'rs3_nregs': 1, 'rd_nregs': 1}
Immediates have not been fixed in this example.
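Two further details matter for the register fields. First, `'rd' in arg` matches any field name that merely contains `rd`, so an explicit table from compressed field names to the operands they fill is less fragile. Second, in the RVC encoding the 3-bit "primed" fields (`rd_p`, `rs1_p`, ...) can only name x8 to x15, so the raw values 6 and 7 above correspond to x14 and x15 (x14 is the register in `reg_commit`), while 5-bit compressed register fields already hold the register number. The sketch below combines both, under stated assumptions: only `rd_p` and `rs1_p` appear in this issue, the other field names are assumed riscv-opcodes names to verify against `arg_lut`, and `decode_reg_field` is a hypothetical helper, not part of ISAC.

```python
# Hypothetical table: compressed register field -> (operand slots it fills,
# whether it is a 3-bit "primed" field). Only 'rd_p' and 'rs1_p' come from this
# issue; the other names are assumptions to check against riscv-opcodes' arg_lut.
COMPRESSED_REG_FIELDS = {
    'rd_p': (('rd',), True),
    'rs1_p': (('rs1',), True),
    'rs2_p': (('rs2',), True),
    'rd_rs1_p': (('rd', 'rs1'), True),   # one field used as both source and destination
    'c_rs2': (('rs2',), False),
    'rd_rs1_n0': (('rd', 'rs1'), False),
}

BASE_REG_FIELDS = ('rd', 'rs1', 'rs2', 'rs3')


def decode_reg_field(arg: str, raw: int) -> dict:
    """Return {operand_slot: register_number} for a register field.

    Primed (3-bit) fields encode x8..x15, so 8 is added to the raw value.
    Non-register fields (e.g. immediates) return an empty dict.
    """
    if arg in BASE_REG_FIELDS:
        return {arg: raw}
    entry = COMPRESSED_REG_FIELDS.get(arg)
    if entry is None:
        return {}
    slots, is_primed = entry
    reg = raw + 8 if is_primed else raw
    return {slot: reg for slot in slots}


if __name__ == '__main__':
    # Raw values from the decoded c.lw above: rd_p = 6, rs1_p = 7.
    print(decode_reg_field('rd_p', 6))       # {'rd': 14}
    print(decode_reg_field('rs1_p', 7))      # {'rs1': 15}
    print(decode_reg_field('c_uimm7lo', 0))  # {}
```

Inside ISAC's loop, each returned slot could then be written to `temp_instrobj` with the existing `treg` logic, and field names that are not in the table fall through instead of being matched by accident.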
There may also be a minor issue in the `get_instr` function of the same `rvopcodesdecoder` module when it comes to compressed instructions; this needs to be checked as well.
The operand field names for compressed instructions defined in (link) have changed over time, and these changes have not been reflected in `rvopcodesdecoder.py`.
For example, the `c.lw` instruction has the arguments `['rd_p', 'rs1_p', 'c_uimm7lo', 'c_uimm7hi']`. We need to map `rd_p` to `rd` and `rs1_p` to `rs1` in the `InstructionObject` instance. The immediate fields `c_uimm7lo` and `c_uimm7hi` should be combined into a single `imm`, as specified by the instruction's encoding in the spec.
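As a cross-check of that layout, here is a minimal integer-based sketch of the combination for `c.lw`, assuming the two field values have already been extracted as integers. Per the RVC encoding, `c_uimm7hi` (instruction bits [12:10]) holds uimm[5:3] and `c_uimm7lo` (bits [6:5]) holds uimm[2|6], i.e. instruction bit 6 is uimm[2] and bit 5 is uimm[6]; the offset is always a multiple of 4. `clw_uimm` is a hypothetical name.

```python
def clw_uimm(c_uimm7lo: int, c_uimm7hi: int) -> int:
    """Reassemble the zero-extended c.lw/c.sw offset uimm[6:2] (uimm[1:0] = 0).

    c_uimm7lo: 2-bit field from instruction bits [6:5] = uimm[2|6]
    c_uimm7hi: 3-bit field from instruction bits [12:10] = uimm[5:3]
    """
    uimm_2 = (c_uimm7lo >> 1) & 0b1  # instruction bit 6
    uimm_6 = c_uimm7lo & 0b1         # instruction bit 5
    uimm_5_3 = c_uimm7hi & 0b111
    return (uimm_6 << 6) | (uimm_5_3 << 3) | (uimm_2 << 2)


if __name__ == '__main__':
    # The c.lw from this issue (17304 = 0x4398) has both immediate fields zero.
    print(clw_uimm(0b00, 0b000))  # 0
    # A hand-assembled c.lw x8, 72(x9) (0x44A0): c_uimm7lo = 0b01, c_uimm7hi = 0b001.
    print(clw_uimm(0b01, 0b001))  # 72
```

The second example is chosen so that swapping the two `c_uimm7lo` bits would give 12 instead of 72, which makes an ordering mistake easy to spot.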
We can go ahead with the fix proposed in the issue statement above for the `rs1*`, `rs2*` and `rd*` fields. For the immediate fields, it would translate to adding or changing the following in the `if`/`elif`/`else` ladder:
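```python
# Assumes imm is initialised to '' before the loop and that get_arg_val(arg)(mcode)
# returns the field's bits as a string, MSB first.
if arg == 'c_uimm7hi':
    # c_uimm7hi = instruction bits [12:10] = uimm[5:3]
    imm_temp = get_arg_val(arg)(mcode)
    if imm:
        # imm already holds c_uimm7lo = uimm[2|6]; build uimm[6:0] with uimm[1:0] = 00
        imm = imm[-1] + imm_temp + imm[0] + '00'
    else:
        imm = imm_temp + imm
if arg == 'c_uimm7lo':
    # c_uimm7lo = instruction bits [6:5] = uimm[2|6]
    imm_temp = get_arg_val(arg)(mcode)
    if imm:
        # imm already holds c_uimm7hi = uimm[5:3]
        imm = imm_temp[-1] + imm + imm_temp[0] + '00'
    else:
        imm = imm + imm_temp
```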
Similar additions will be required for every other compressed immediate field that is not yet handled in this ladder.
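Rather than one hand-written string splice per field, the scrambles could be described as data. The sketch below lists, for each immediate field, which immediate bit each field bit (MSB first) lands in, and ORs them together, which also makes the result independent of the order the fields are visited in. The `c_uimm7lo`/`c_uimm7hi` entries follow the splice in the ladder above; the `c_uimm8lo`/`c_uimm8hi` entries are an assumption based on the RVC encoding of `c.ld` (uimm[7:6] at bits [6:5], uimm[5:3] at bits [12:10]) and should be checked against riscv-opcodes. `IMM_BIT_MAP` and `assemble_imm` are hypothetical names.

```python
# Hypothetical table: for each immediate field, the immediate bit that each
# field bit lands in, listed MSB first (same order as the field's bit string).
IMM_BIT_MAP = {
    'c_uimm7lo': [2, 6],     # c.lw/c.sw: instruction bits [6:5] = uimm[2|6]
    'c_uimm7hi': [5, 4, 3],  # c.lw/c.sw: instruction bits [12:10] = uimm[5:3]
    'c_uimm8lo': [7, 6],     # assumed c.ld/c.sd: bits [6:5] = uimm[7:6]
    'c_uimm8hi': [5, 4, 3],  # assumed c.ld/c.sd: bits [12:10] = uimm[5:3]
}


def assemble_imm(field_bits: dict) -> int:
    """OR together the immediate bits carried by each field.

    field_bits maps a field name to its bit string (MSB first), such as the
    string get_arg_val(arg)(mcode) is assumed to return. Fields missing from
    IMM_BIT_MAP are ignored, and the order of fields does not matter.
    """
    imm = 0
    for name, bits in field_bits.items():
        positions = IMM_BIT_MAP.get(name)
        if positions is None:
            continue
        if len(bits) != len(positions):
            raise ValueError(f'{name}: expected {len(positions)} bits, got {bits!r}')
        for bit_char, pos in zip(bits, positions):
            if bit_char == '1':
                imm |= 1 << pos
    return imm


if __name__ == '__main__':
    # c.lw x8, 72(x9) (0x44A0): c_uimm7lo = '01', c_uimm7hi = '001'.
    print(assemble_imm({'c_uimm7lo': '01', 'c_uimm7hi': '001'}))  # 72
    print(assemble_imm({'c_uimm7hi': '001', 'c_uimm7lo': '01'}))  # 72
```

Adding support for another compressed immediate then means adding one table row instead of a new branch; the signed immediates (e.g. for branches and `c.addi`) would additionally need sign extension, which this sketch does not do.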
The `disassembler` class correctly parses all instruction encodings from riscv-opcodes and generates the instruction dictionary, but the `get_instr()` function has some bugs when decoding compressed instructions. The following function could be a suitable replacement when decoding compressed instructions only; I have not tested it on other instructions:
```python
def get_instr(func_dict, mcode: int, xlen):
    '''
    Recursively decodes the given mcode by walking the nested
    instruction dictionary built by the disassembler.
    '''
    tuple_flag = False
    keys = func_dict.keys()
    num_keys = len(keys)
    for key in keys:
        if isinstance(key, str) and (num_keys == 1 or tuple_flag):
            # Single instruction in this node, or a string key reached
            # after a tuple branch failed to match.
            return (key, func_dict[key])
        elif isinstance(key, tuple):
            # get_funct() (defined elsewhere in rvopcodesdecoder) evaluates this key against mcode.
            val = get_funct(key, mcode)
            tuple_flag = True
        else:  # Multiple instructions in leaf node (for different xlen)
            for leaf_key in keys:
                args = func_dict[leaf_key]
                if xlen in args[-1]:
                    return (leaf_key, func_dict[leaf_key])
            # No entry matches this xlen; let the caller try another branch.
            return None
        if val in func_dict[key]:
            temp_func_dict = func_dict[key][val]
        else:
            continue
        if temp_func_dict.keys():
            a = get_instr(temp_func_dict, mcode, xlen)
            if a is None:
                continue
            else:
                return a
        else:
            continue
```
Feel free to raise a PR after testing.
This issue was resolved in #73.