simon987 / Much-Assembly-Required

Assembly programming game

Home Page:https://muchassemblyrequired.com

SET family instructions

kevinramharak opened this issue:

I was wondering if there is any more room in the instruction set opcodes to implement the SET condition instructions.

These instructions will reduce the amount of boilerplate assembly and labels.

For example:

; a comparison of `2 != 3` where the result of the expression will be stored in X
func:
  MOV A, 3
  MOV B, 2
  CMP A, B
  JNZ not_equals_true
  MOV X, 0
  JMP not_equals_end
not_equals_true:
  MOV X, 1
not_equals_end:
  ret

can be reduced to:

func:
  MOV A, 3
  MOV B, 2
  CMP A, B
  SETNE X    ; sets X to 1 when ZERO_FLAG is clear (A != B), else to 0
  ret

Or am I missing an easier implementation of a comparison?

OK, there should be enough space for those. Just make sure that the opcodes don't collide.

There are a total of 15 available instruction opcodes not in use.

Implementing the SETcc family would reserve 14 opcodes (50 to 63), leaving opcode 8 as the only opcode that is neither used nor reserved.

Opcode 8 would in turn be reserved for the INTO instruction (#78).

I suppose you could implement all the SET family instructions with one opcode. You could make it use an immediate value that determines what conditions to check.

An immediate value is 16 bits, and the condition flags (not counting the BRK flag) are 4 bits. There is more than enough bit-space for all possible combinations of condition flags.
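One way to read the "more than enough bit-space" remark: four flags give 16 possible flag states, so a 16-bit immediate can hold one result bit per state. The sketch below is only an illustration of that idea. The class name, the flag ordering inside the index, and the truth-table scheme itself are assumptions and not something MAR implements.

```java
// Sketch only: the flag ordering and the truth-table scheme are assumptions,
// not the encoding used by Much-Assembly-Required.
final class SetTruthTable {
    // Hypothetical bit positions inside a 4-bit flag index.
    static final int OVERFLOW = 1, CARRY = 2, ZERO = 4, SIGN = 8;

    /** Packs the four condition flags into an index from 0 to 15. */
    static int flagIndex(boolean sign, boolean zero, boolean carry, boolean overflow) {
        return (sign ? SIGN : 0) | (zero ? ZERO : 0)
             | (carry ? CARRY : 0) | (overflow ? OVERFLOW : 0);
    }

    /** Builds a 16-bit immediate whose bit i is the SET result for flag index i. */
    static int buildImmediate(java.util.function.IntPredicate condition) {
        int table = 0;
        for (int i = 0; i < 16; i++) {
            if (condition.test(i)) {
                table |= 1 << i;
            }
        }
        return table;
    }

    /** Evaluates the instruction by looking up the current flag index in the table. */
    static int evaluate(int immediate, int flagIndex) {
        return (immediate >>> flagIndex) & 1;
    }
}
```

With this layout, a SETE-style condition would be built as `SetTruthTable.buildImmediate(i -> (i & SetTruthTable.ZERO) != 0)`. Because the table covers every flag combination, conditions that compare two flags with each other (such as SF<>OF) fit as well.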

@DBJ314 That would be my preference, but I am not sure how to implement it.

The machine code layout is (according to the Instruction Set page and the source code):

    Bit   15 14 13 12 11 | 10  9  8  7  6 |  5  4  3  2  1  0
    Use    S  S  S  S  S |  D  D  D  D  D |  O  O  O  O  O  O

    (S = source operand selector, D = destination operand selector, O = opcode)

    Word 0               Word 1            Word 2
    Basic instruction    Source operand    Destination operand
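To make the bit layout above concrete, here is a minimal sketch that packs and unpacks word 0 with shifts and masks. The class and method names are made up for illustration and are not taken from the MAR source.

```java
// Sketch only: helper names are hypothetical; the bit positions follow the
// SSSSS DDDDD OOOOOO layout quoted above.
final class InstructionWord {
    /** Source operand selector: bits 15..11. */
    static int sourceSelector(int word) {
        return (word >>> 11) & 0x1F;
    }

    /** Destination operand selector: bits 10..6. */
    static int destinationSelector(int word) {
        return (word >>> 6) & 0x1F;
    }

    /** Opcode: bits 5..0, so at most 64 distinct opcodes. */
    static int opcode(int word) {
        return word & 0x3F;
    }

    /** Packs the three fields back into one 16-bit instruction word. */
    static int pack(int source, int destination, int opcode) {
        return ((source & 0x1F) << 11) | ((destination & 0x1F) << 6) | (opcode & 0x3F);
    }
}
```

For example, `InstructionWord.pack(0b11111, 0b00001, 50)` yields `0xF872`, and the three accessors recover the original fields from that word.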

@simon987 The Instruction Encoding page states that the bit layout is dddddsssssoooooo. That is incorrect, right?

Looking at the ousob docs:

    Instruction        SET to 1 if ... else to 0            Flags
    SETA, SETNBE       Above, Not Below or Equal            CF=0 AND ZF=0
    SETAE,SETNB,SETNC  Above or Equal, Not Below, No Carry  CF=0
    SETBE, SETNA       Below or Equal, Not Above            CF=1 OR ZF=1
    SETB, SETC,SETNAE  Below, Carry, Not Above or Equal     CF=1
    SETE, SETZ         Equal, Zero                          ZF=1
    SETNE, SETNZ       Not Equal, Not Zero                  ZF=0

    SETG, SETNLE       Greater, Not Less or Equal           SF=OF AND ZF=0
    SETGE, SETNL       Greater or Equal, Not Less           SF=OF
    SETLE, SETNG       Less or Equal, Not Greater           SF<>OF OR ZF=1
    SETL, SETNGE       Less, Not Greater or Equal           SF<>OF
    SETO               Overflow                             OF=1
    SETNO              No Overflow                          OF=0
    SETS               Sign (negative)                      SF=1
    SETNS              No Sign (positive)                   SF=0
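The flag conditions in the table above translate fairly directly into code. The following is a rough sketch of the 14 distinct conditions as a Java enum. The enum name and the `evaluate` signature are assumptions for illustration, not part of the MAR code base.

```java
// Sketch only: one constant per distinct condition in the table; aliases such
// as SETZ/SETE share a constant. Names and signature are hypothetical.
enum SetCondition {
    A, AE, BE, B, E, NE, G, GE, LE, L, O, NO, S, NS;

    /** Returns 1 when the condition holds for the given flags, else 0. */
    int evaluate(boolean cf, boolean zf, boolean sf, boolean of) {
        boolean result;
        switch (this) {
            case A:  result = !cf && !zf;        break; // above
            case AE: result = !cf;               break; // above or equal
            case BE: result = cf || zf;          break; // below or equal
            case B:  result = cf;                break; // below
            case E:  result = zf;                break; // equal
            case NE: result = !zf;               break; // not equal
            case G:  result = sf == of && !zf;   break; // greater (signed)
            case GE: result = sf == of;          break; // greater or equal (signed)
            case LE: result = sf != of || zf;    break; // less or equal (signed)
            case L:  result = sf != of;          break; // less (signed)
            case O:  result = of;                break; // overflow
            case NO: result = !of;               break; // no overflow
            case S:  result = sf;                break; // sign
            case NS: result = !sf;               break; // no sign
            default: throw new IllegalStateException("unhandled condition " + this);
        }
        return result ? 1 : 0;
    }
}
```

With all four flags clear (which is what comparing 3 against 2 should produce), `SetCondition.NE.evaluate(false, false, false, false)` gives 1 and `SetCondition.E.evaluate(false, false, false, false)` gives 0.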

We could definitely use the source selector as a bitmask against the flags, as PUSHF does. We would still need to encode whether we are querying for a 0 or a 1. I suggest the remaining selector bit could serve that purpose (the Q bit below).

EDIT: After looking at CPU#executeInstruction, this would not work without modifying that function.
It could still work with an instruction flag that states it needs special processing, but I realize that would probably add a lot of complexity just to save 1 word of instruction space.
An implementation where word 1 or word 2 of a 3-word instruction specifies which flags to check would introduce less complexity and would not require reworking the CPU.

The only problem I see is the SF<>OF (SIGN_FLAG != OVERFLOW_FLAG). I don't know if that would fit in this encoding. Another opcode could be used to encode those 2 specific instructions.

This would make it an instruction of 1 to 2 words, where word 1 would be encoded as:

    Bit   15 14 13 12 11 | 10  9  8  7  6 |  5  4  3  2  1  0
    Use    Q  S  Z  C  O |  D  D  D  D  D |  O  O  O  O  O  O

And word 2 would be an optional memory address if the destination is not a register.
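As a rough sketch of the Q S Z C O layout above: the code below assumes the result is 1 only when every flag selected by the mask equals the Q bit. That "all selected flags equal Q" rule is my assumption, since the proposal does not pin the semantics down, and opcode 50 is just a placeholder.

```java
// Sketch only: the "all selected flags equal Q" rule and the opcode value are
// assumptions; bit positions follow the Q S Z C O layout above.
final class SetccWord {
    static final int Q = 1 << 15; // value the selected flags are compared against
    static final int S = 1 << 14; // sign flag
    static final int Z = 1 << 13; // zero flag
    static final int C = 1 << 12; // carry flag
    static final int O = 1 << 11; // overflow flag
    static final int FLAG_MASK = S | Z | C | O;

    /** Builds the first instruction word from the query bit, flag mask, destination and opcode. */
    static int encode(boolean query, int flagMask, int destination, int opcode) {
        return (query ? Q : 0) | (flagMask & FLAG_MASK)
             | ((destination & 0x1F) << 6) | (opcode & 0x3F);
    }

    /**
     * Evaluates the word against the current flags, given in the same bit
     * positions as S, Z, C and O. Returns 1 when every selected flag equals Q.
     */
    static int evaluate(int word, int flags) {
        int mask = word & FLAG_MASK;
        int expected = (word & Q) != 0 ? mask : 0;
        return (flags & mask) == expected ? 1 : 0;
    }
}
```

Under this rule SETE is `encode(true, Z, dest, 50)`, SETNE is `encode(false, Z, dest, 50)` and SETA is `encode(false, C | Z, dest, 50)`. SETBE (CF=1 OR ZF=1) and the SF<>OF conditions cannot be expressed this way, which matches the concern raised above.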

Though I would have to ask @simon987 whether this conflicts with other parts of the MAR CPU architecture. It would also add another layer of complexity on top of the instruction encoding.

This would save 12 (or 13) opcodes for other uses. I know @simon987 has no interest in adding instructions in the foreseeable future, but there are still a lot of instructions I noticed at http://www.ousob.com/ng/iapx86/ng2e5.php that could be useful and would allow the programmer to express intent more clearly. A downside of more instructions would be additional complexity and perhaps a higher barrier to entry for someone who is exploring MAR.

It is also worth noting that this approach could be extended to the Jcc family of instructions. That would bring the same possible downsides and force existing code to be reassembled, so there is no reason to implement it now.

A proposed implementation would add a SetccInstruction class. If the architecture needs a class for every instruction, those classes can extend SetccInstruction and set their own mnemonic while keeping the same opcode.
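The class structure described here could look roughly like the sketch below. It is standalone and does not extend the real MAR Instruction class. The opcode value, the method names and the two example subclasses are placeholders picked for illustration.

```java
// Sketch only: a standalone hierarchy, not the real MAR Instruction API.
abstract class SetccInstruction {
    static final int OPCODE = 50; // hypothetical shared opcode

    private final String mnemonic;

    protected SetccInstruction(String mnemonic) {
        this.mnemonic = mnemonic;
    }

    String getMnemonic() {
        return mnemonic;
    }

    /** Every member of the family reports the same opcode. */
    int getOpcode() {
        return OPCODE;
    }

    /** The flag test that distinguishes one family member from another. */
    abstract boolean condition(boolean cf, boolean zf, boolean sf, boolean of);

    /** Returns the value to store in the destination operand: 1 or 0. */
    int execute(boolean cf, boolean zf, boolean sf, boolean of) {
        return condition(cf, zf, sf, of) ? 1 : 0;
    }
}

final class SeteInstruction extends SetccInstruction {
    SeteInstruction() {
        super("sete");
    }

    @Override
    boolean condition(boolean cf, boolean zf, boolean sf, boolean of) {
        return zf; // equal: ZF=1
    }
}

final class SetlInstruction extends SetccInstruction {
    SetlInstruction() {
        super("setl");
    }

    @Override
    boolean condition(boolean cf, boolean zf, boolean sf, boolean of) {
        return sf != of; // less (signed): SF<>OF
    }
}
```

Each subclass only supplies a mnemonic and a flag test, so adding the remaining family members would mostly be repetition of the same pattern.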

Yes, the instruction encoding page gets the encoding wrong.

The first word of any instruction specifies the opcode and the 2 operand modes. An operand mode specifies the type of operand and the register used, but not an immediate value. If an immediate value (or memory address, or register displacement) is needed, it takes up a word right after the instruction word. If both operands need an extra 16-bit word, the source operand's word comes first.

So the SET instruction would require the source operand specifier to be IMMEDIATE16 with the destination any operand except IMMEDIATE16.
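To illustrate the word order described above, here is a small sketch that emits the words for a single-opcode SET instruction whose source is an immediate condition code. The selector value used for IMMEDIATE16, the opcode and the class name are all placeholders I made up, so check them against the real operand table before relying on them.

```java
// Sketch only: selector and opcode values are hypothetical placeholders.
final class SetccAssembler {
    static final int IMMEDIATE16_SELECTOR = 0b11111; // assumed, not verified
    static final int SETCC_OPCODE = 50;              // assumed, not verified

    private static int firstWord(int destinationSelector) {
        if ((destinationSelector & 0x1F) == IMMEDIATE16_SELECTOR) {
            throw new IllegalArgumentException("destination cannot be IMMEDIATE16");
        }
        return (IMMEDIATE16_SELECTOR << 11) | ((destinationSelector & 0x1F) << 6) | SETCC_OPCODE;
    }

    /** Destination needs no extra word: instruction word, then the condition immediate. */
    static int[] encode(int conditionCode, int destinationSelector) {
        return new int[] { firstWord(destinationSelector), conditionCode & 0xFFFF };
    }

    /** Destination needs an extra word: the source immediate comes first, then the destination word. */
    static int[] encode(int conditionCode, int destinationSelector, int destinationWord) {
        return new int[] {
            firstWord(destinationSelector),
            conditionCode & 0xFFFF,
            destinationWord & 0xFFFF
        };
    }
}
```

So a SET with a plain register destination would take 2 words, and one with a memory destination would take 3.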

The instruction classes seem to have several execute methods with a different prototype for each allowed combination of source and destination operands. You would only care about the ones where the source is an immediate value.

After digging through the instruction set, I noticed that 2 more instruction opcodes are in use: 0 is used for BRK and 63 is used for NOP. That would leave 13 remaining opcodes.

@simon987
So there are 2 main problems with encoding the SETcc family as one opcode. The first is that the abstract class Instruction declares its void encode(...) methods without an access modifier, which makes them package-private. Making them public (or protected) would allow the other instructions to override encode. That in turn would let the SETcc instruction use its source operand word for its family opcode type.

The second is that the assembler fetches an instruction by its mnemonic. Because of how aliases currently work, this would fetch the correct 'overseeing' SETcc instruction. The problem is that this function currently does not let the assembler also pass along the mnemonic it found. Without that string, SetccInstruction#encode cannot figure out which family opcode type to encode in its source operand word.
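To show why encode needs the mnemonic string, here is a minimal sketch of an alias-to-family-code lookup. The numeric codes and the class name are invented for illustration; only the alias groupings come from the ousob table earlier in the thread.

```java
// Sketch only: family code numbers are hypothetical; aliases follow the table
// quoted earlier in this thread.
final class SetccMnemonics {
    private static final java.util.Map<String, Integer> FAMILY_CODES = new java.util.HashMap<>();

    static {
        register(0, "seta", "setnbe");
        register(1, "setae", "setnb", "setnc");
        register(2, "setbe", "setna");
        register(3, "setb", "setc", "setnae");
        register(4, "sete", "setz");
        register(5, "setne", "setnz");
        register(6, "setg", "setnle");
        register(7, "setge", "setnl");
        register(8, "setle", "setng");
        register(9, "setl", "setnge");
        register(10, "seto");
        register(11, "setno");
        register(12, "sets");
        register(13, "setns");
    }

    private static void register(int code, String... aliases) {
        for (String alias : aliases) {
            FAMILY_CODES.put(alias, code);
        }
    }

    /** Maps the mnemonic the assembler matched to the code stored in the source operand word. */
    static int familyCode(String mnemonic) {
        Integer code = FAMILY_CODES.get(mnemonic.toLowerCase(java.util.Locale.ROOT));
        if (code == null) {
            throw new IllegalArgumentException("not a SETcc mnemonic: " + mnemonic);
        }
        return code;
    }
}
```

If the assembler handed the matched mnemonic to encode, a call like `SetccMnemonics.familyCode("SETZ")` would be enough to pick the value for the source operand word.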

Implemented in #191