Takashiidobe / vm

A toy virtual machine that can run its own assembly and compile to binary

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

VM

This project implements a small VM language that supports encoding to binary and decoding from binary to run a program.

Usage

Instructions can be written in an assembly form, encoded to binary, and run from the binary form.

Take this file, which counts from 0 to 4:

$ cat asm/while-loop.asm
putreg 0 R0
putreg 1 R1
putreg 5 R2
eq R0 R2
jumptrue 3
printreg R0
add R1 R0
jump -5
putreg 0 R0
ret

We can encode this into binary:

$ cargo r -q -- -e asm/while-loop.asm out.bin

And xxd it to see its binary form:

$ xxd out.bin
00000000: 0100 0000 0101 0001 0105 0002 1300 0211  ................
00000010: 0300 0900 0501 0010 fbff 0100 0000 00    ...............

We can then run this binary in the VM:

$ cargo r -q -- -d out.bin
0
1
2
3
4

It's also possible to run the assembly directly on the VM:

$ cargo r -q -- -r asm/while-loop.asm
0
1
2
3
4

VM Internals

This project implements a VM with 16 registers (R0..R16), and a stack of 65536 two-byte words. There is also a condition flag, which is used for jumping back and forth, and an instruction pointer to keep track of which instruction is currently used.

The language currently supports a few instructions, which take either Immediates (currently, a u16), a stack pos (a u16), which indexes into the stack, an offset (an i16), to jump back and forth in the instructions, or a Register, a u8 that indicates which register to use.

#[derive(Debug, Clone, PartialEq)]
pub enum Instruction {
    Ret,                    // Return R0
    PutReg(Immediate, Reg), // Put u16 -> Reg
    CopySR(StackPos, Reg),  // Load Stack -> Reg
    CopyRR(Reg, Reg),       // Copy Reg -> Reg
    CopyRS(Reg, StackPos),  // Copy Reg -> Stack
    Add(Reg, Reg),          // Add R1, R2 -> R2
    Sub(Reg, Reg),          // Sub R1, R2 -> R2
    Mul(Reg, Reg),          // Mul R1, R2 -> R2
    Div(Reg, Reg),          // Div R1, R2 -> R2
    PrintReg(Reg),          // Print Reg
    Jump(Offset),           // Jump Forward or backward
    JumpTrue(Offset),       // Jump Forward or backwards if the condition flag is true.
    JumpFalse(Offset),      // Jump Forward or backwards if the condition flag is false.
    Eq(Reg, Reg),           // Compare R1 to R2, setting the condition flag to R1 == R2
    Neq(Reg, Reg),          // Compare R1 to R2, setting the condition flag to R1 != R2
    Lt(Reg, Reg),           // Compare R1 to R2, setting the condition flag to R1 < R2
    Lte(Reg, Reg),          // Compare R1 to R2, setting the condition flag to R1 <= R2
    Gt(Reg, Reg),           // Compare R1 to R2, setting the condition flag to R1 > R2
    Gte(Reg, Reg),          // Compare R1 to R2, setting the condition flag to R1 >= R2
    Fn(String),             // Define a function denoted by string
    Call(String),           // Call the function denoted by string
    Retfn,                  // Return from a function back to its caller
}

There's currently no way to load immediates onto the stack, so a load to the stack first requires loading an immediate to a register and then moving that to the stack:

Take for example, this program:

let x = [1, 2, 3];
print(x);

That would be converted into this code: (note loading each immediate into a register before moving that register into the stack):

PutReg(1, R0)
CopyRS(R0, 0)
PutReg(2, R0)
CopyRS(R0, 1)
PutReg(3, R0)
CopyRS(R0, 2)
CopySR(0, R0)
PrintReg(R0)
CopySR(1, R0)
PrintReg(R0)
CopySR(2, R0)
PrintReg(R0)
PutReg(0, R0)
Ret

Encoding and Decoding

Each instruction is encoded into bytes:

Take these instructions:

PutReg(20, R0),
PutReg(20, R1),
Eq(R0, R1),
JumpFalse(3),
PutReg(0, R0),
PrintReg(R0),
Jump(2),
PutReg(1, R0),
PrintReg(R0),
PutReg(0, R0),
Ret

That would be encoded into these bytes:

PutReg(20, R0): [0x01,0x14,0x00,0x00]
PutReg(20, R1): [0x01,0x14,0x00,0x01]
Eq(R0, R1): [0x13,0x00,0x01]
JumpFalse(3): [0x12,0x03,0x00]
PutReg(0, R0): [0x01,0x00,0x00,0x00]
PrintReg(R0): [0x09,0x00]
Jump(2): [0x10,0x02,0x00]
PutReg(1, R0): [0x01,0x01,0x00,0x00]
PrintReg(R0): [0x09,0x00]
PutReg(0, R0): [0x01,0x00,0x00,0x00]
Ret: [0x00]

Each instruction gets a unique byte: Ret, for example, is given the value 0x0, and PutReg is given 0x01. This is the first byte in every encoded instruction. Each instruction then either takes zero, one, or two extra arguments, where the arguments can be one or two bytes long, and be signed or unsigned.

For example, PrintReg, instruction 0x09 takes one argument, the register to print. Since there are only 16 registers, this is a one byte instruction. In the example, this prints R0, which in bytes is 0x00.

Another instruction, Jump, 0x10, will take one argument, but it takes an offset, which is signed and 2-bytes. In this example, it is denoted by the bytes 0x02, 0x00, which will be converted to 2 when run by the VM.

These instructions can thus be serialized in a compact form on disc and turned into instructions, which can then be run by the VM.

Testing

Currently, there are some property tests using quickcheck to generate arbitrary programs and then confirming that those instructions, when encoded to disk and decoded, still return the same program.

About

A toy virtual machine that can run its own assembly and compile to binary

License:MIT License


Languages

Language:Rust 98.7%Language:Assembly 1.3%