kowaalczyk / assembly-programming

Notes from assembly programming course / University of Warsaw 2020-21


Assembly programming

Notes from assembly programming course / University of Warsaw 2020-21:

Useful tricks

Lab 1

  • Use the ldd command to check for linking problems (e.g. a missing shared library or dynamic loader) if running a program fails with a "File doesn't exist" error

Lab 2

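  • Long loops (jumps) can be impossible with some instructions: e.g. on x86, loop and jrcxz only accept a short jump (an 8-bit displacement, -128..+127 bytes), so it's worth checking the reference
  • jrcxz jumps if rcx is zero; it's a very old instruction that makes loops more intuitive to write (e.g. to skip the loop body when the counter is already 0)
  • use16 and use32 (NASM aliases for BITS 16 / BITS 32) switch NASM to generating code for 16-bit or 32-bit mode, i.e. smaller default sizes than 64-bit code (read the manual before using them, plus the reference for the specific architecture)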
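The C model below makes the jrcxz/loop interplay concrete. It is a sketch of the control flow, not compiler output, and the .body/.done labels in the comment are made up. The key detail is that loop decrements rcx *before* testing it. Entering the loop with rcx = 0 would therefore wrap around and run 2^64 times, and a jrcxz in front skips that case.

```c
#include <assert.h>
#include <stdint.h>

/* C model of the x86-64 pattern:
 *         jrcxz .done      ; skip the loop entirely if rcx == 0
 *   .body:
 *         ...              ; loop body
 *         loop .body       ; rcx -= 1; jump to .body if rcx != 0
 *   .done:
 * Returns how many times the body would run. */
uint64_t loop_iterations(uint64_t rcx) {
    uint64_t runs = 0;
    if (rcx == 0) {          /* jrcxz .done */
        return runs;
    }
    do {
        runs++;              /* loop body */
        rcx--;               /* loop: decrement first... */
    } while (rcx != 0);      /* ...then jump back while rcx is non-zero */
    return runs;
}
```

Without the rcx == 0 guard, the do/while (like loop itself) would start by decrementing 0 to 2^64 - 1.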

Lab 3

  • NASM allows address calculations like add rsi, [array + rcx * 4]; note that the scale factor (4 above) is usually the element size in bytes, and the hardware only supports scale factors of 1, 2, 4 or 8
  • common mistake: passing arguments in the wrong order results in arrays being passed as int arguments (and vice versa)
  • convention: always put the integer argument before the array (as far as I remember from the previous lab); this is supposed to be a common UNIX convention
  • LD failures such as "cannot relocate .data section ..." / "nonrepresentable section on output":
    • are caused by addressing multiple sections (e.g. a buffer in .bss or .data, and .text) directly, with absolute addresses
    • modern Unix and Linux toolchains build position-independent executables (PIE) by default, assuming sections can be moved anywhere (re-ordered, etc.)
    • the code cannot simply be recompiled with -fPIC, because we're writing asm, not C
    • workaround: keep the buffer on the stack: instead of resb 1 in .bss and mov [bufor], rsi in .text, use push rsi; mov rsi, rsp
      • this solution actually didn't work when presented during the lab (XD)
      • last year's popular workaround was to add -no-pie to CFLAGS (when compiling the main program); this solved the problem (at least during the lab)
  • trick: in 64-bit mode we can store data just below the stack pointer. The System V ABI guarantees that the 128 bytes below rsp (the "red zone") won't be clobbered by signal or interrupt handlers, so they can be used for temporary data that isn't needed across function calls. This can be used as follows: mov [rsp-8], rsi; lea rsi, [rsp-8]. Again, this replaces mov [bufor], rsi and avoids both section .data, which cannot be moved, and malloc, which is expensive (the fewer syscalls, the better)
  • when we're moving bytes (chars) around, they always live in the lowest byte of the register (al, sil, ...); on little-endian systems (so pretty much everywhere) this is also the byte stored at the lowest address when the whole register is written to memory
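The red-zone trick and the low-byte rule can be modelled in C. The sketch below copies a 64-bit "register" value into an 8-byte slot, a local array standing in for [rsp-8]. It then reads the byte at the lowest address, which is what a 1-byte write() from rsi would output. The function names are hypothetical, chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Models `mov [rsp-8], rsi; lea rsi, [rsp-8]`: store a 64-bit register
 * value in memory, then read the single byte at the lowest address. */
unsigned char first_byte_in_memory(uint64_t reg) {
    unsigned char slot[8];             /* stands in for [rsp-8 .. rsp-1] */
    memcpy(slot, &reg, sizeof reg);    /* mov [rsp-8], rsi */
    return slot[0];                    /* byte at [rsi] after the lea */
}

/* 1 on little-endian machines (the low byte is stored first), 0 otherwise. */
int is_little_endian(void) {
    return first_byte_in_memory(UINT64_C(0x0102030405060708)) == 0x08;
}
```

On a little-endian machine, passing a char (e.g. 'A') gives back the same char, because it sits in the lowest byte of the register.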

Lab 4

String instructions:

Block instructions:

  • some instruction sets support operating on contiguous blocks of data (spanning multiple memory units) with a single instruction (on x86, e.g. rep movsb / rep stosb)
  • they are usually related to string instructions
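As an example of a block instruction, here is a C model of what x86 rep movsb does, assuming the direction flag is cleared (cld). It also tracks the final register state: rdi and rsi end up just past the block, and rcx ends at 0. That matters when the asm keeps using those registers afterwards. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* C model of `rep movsb` with DF = 0: copies rcx bytes from [rsi] to
 * [rdi], advancing both pointers and decrementing rcx until it is 0. */
void rep_movsb_model(unsigned char **rdi, const unsigned char **rsi, size_t *rcx) {
    while (*rcx != 0) {    /* rep: repeat while rcx != 0 */
        **rdi = **rsi;     /* movsb: copy one byte */
        (*rdi)++;          /* DF = 0, so both pointers move forward */
        (*rsi)++;
        (*rcx)--;
    }
}
```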


  • for all assignments, the expected tests (written in C) should cover all the usual edge cases (array start/end, etc.)

Lab 5


Lab 6

Compiling C:

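  • to assembly text: gcc -S -masm=intel plik.c (note: this is GNU as Intel syntax, similar to but not the same as NASM)
  • to object file: gcc -c -masm=intel plik.c, after which objdump -M intel -d plik.o prints the machine code bytes together with the disassembled asm instructions (readelf can't do this: it only shows ELF headers, sections and symbols)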
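Here is a hypothetical plik.c to try the commands above on; any small function will do. It follows the integer-before-array argument order from Lab 3. In the generated asm, the index scaling by 4 (sizeof(int32_t)) should show up somewhere, e.g. as [rsi+rax*4] or a lea with rax*4. The exact form depends on the gcc version and optimization level: optimized code may vectorize the loop or step a pointer by 4 instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: sum of n 32-bit integers, integer argument first. */
int64_t sum_array(size_t n, const int32_t *array) {
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += array[i];   /* element address = array + i * 4 */
    }
    return sum;
}
```

The empty-array case (n = 0, possibly with a NULL pointer) is exactly the kind of edge case the assignment tests should cover.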

Lab 7

Terminal I/O:

  • Always check errno when read (and related functions) return -1 (e.g. to retry if errno == EAGAIN or EINTR)
  • Never use ioctl
  • Use man termios to check the standard and local definitions of terminal settings (would've been useful for SIK / telnet commands), and tcgetattr with tcsetattr to enable / disable flags (e.g. ICANON, ECHO)
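Both tips are sketched in C below. read_retry retries read() on EINTR and EAGAIN. enter_key_mode uses tcgetattr/tcsetattr to turn off line buffering (ICANON) and echo (ECHO). The function names and the choice of VMIN = 1 / VTIME = 0 are assumptions for illustration. Note that retrying EAGAIN in a tight loop busy-waits, so a real program would wait with poll() first.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <termios.h>
#include <unistd.h>

/* read() that retries on EINTR (interrupted by a signal) and EAGAIN
 * (non-blocking fd has no data yet). Returns bytes read, 0 on EOF, or -1
 * on any other error, with errno left set. */
ssize_t read_retry(int fd, void *buf, size_t count) {
    for (;;) {
        ssize_t n = read(fd, buf, count);
        if (n >= 0) {
            return n;
        }
        if (errno != EINTR && errno != EAGAIN) {
            return -1;
        }
    }
}

/* Clear line buffering (ICANON) and echo (ECHO), so that read() returns
 * after every single key press (VMIN = 1, VTIME = 0). */
void disable_canonical_echo(struct termios *t) {
    t->c_lflag &= ~(tcflag_t)(ICANON | ECHO);
    t->c_cc[VMIN] = 1;
    t->c_cc[VTIME] = 0;
}

/* Switch fd (e.g. STDIN_FILENO) to that mode. The previous settings go to
 * *saved, to be restored with tcsetattr(fd, TCSANOW, saved) before exit. */
int enter_key_mode(int fd, struct termios *saved) {
    struct termios t;
    if (tcgetattr(fd, saved) == -1) {
        return -1;   /* e.g. ENOTTY when fd is not a terminal */
    }
    t = *saved;
    disable_canonical_echo(&t);
    return tcsetattr(fd, TCSANOW, &t);
}
```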

Classic floating point arithmetic (FPU):

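  • Instruction names starting with F belong to the x87 FPU (floating point arithmetic)
  • Floating point arguments are passed in the XMM0...XMM7 registers (System V AMD64 ABI)
  • The result is returned in XMM0
  • Arguments need to be moved onto the FPU register stack (st0...st7), because x87 instructions operate only on that stack (and memory), not on XMM registers; values from XMM registers have to go through memory:
    • FLD - push a value (from memory or another stack register) onto the stack
  • The default first operand is usually the stack top, and the second one is specified for a given instruction (similar to how MUL implicitly uses rax for integers)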
  • Sample operations:
    • FCOMI - compare st0 with another stack register, setting the CPU flags (ZF, PF, CF) directly, so unsigned-style jumps (ja, jb, je) can follow
    • FCOMIP - same, but also pops the stack afterwards
    • FLDZ - push 0.0 onto the stack
    • FADD and FSUB have a TO modifier that allows customising where the output is written (to a register), but it is still better to stick to the convention of holding everything on the stack
    • FSIN, FCOS, FSQRT - there are many useful functions here
    • FILD - load an integer from memory (converted to floating point) and push it
  • The stack needs to be cleared before leaving the function!!!
    • Trick: use any popping operation to clear it (e.g. fcomp st0)
  • The FPU is almost never used now; its only good parts are:
    • Non-standard 80-bit floating point instructions (only on x86)
    • Many useful operations (e.g. SIN, SQRT mentioned above)
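The 80-bit format is still reachable from C. With GCC on x86, long double is usually the x87 extended format (64-bit mantissa), and its arithmetic is compiled to FPU instructions. The sketch below checks this via float.h and a value, 1 + 2^-60, that double cannot hold but the 80-bit format can. On other platforms long double may be the same as double or a 128-bit format, so these hypothetical helpers only report what the compiler provides.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>

/* True if long double has more mantissa bits than double
 * (LDBL_MANT_DIG is 64 for the x87 80-bit format, DBL_MANT_DIG is 53). */
bool long_double_is_wider(void) {
    return LDBL_MANT_DIG > DBL_MANT_DIG;
}

/* 1 + 2^-60 needs 61 mantissa bits. Storing into a volatile variable
 * forces rounding to the declared type. */
bool double_keeps_2_pow_minus_60(void) {
    volatile double sum = 1.0 + 0x1p-60;
    return sum != 1.0;
}

bool long_double_keeps_2_pow_minus_60(void) {
    volatile long double sum = 1.0L + 0x1p-60L;
    return sum != 1.0L;
}
```

On a typical x86-64 Linux box with GCC, long double should keep the tiny increment while double rounds it away.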

Modern floating point arithmetic (SSE):

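  • First (SSE), only operations on 128-bit XMM registers holding 4 floats
  • Then SSE2 adds 2 doubles per XMM register and packed integer operations (256-bit YMM registers only arrive later, with AVX)
  • SSE3 only adds new operations, and is widely available (e.g. on Xeon processors, and practically all modern x86-64 CPUs)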
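Here is a minimal SSE2 example in C using the <emmintrin.h> intrinsics _mm_loadu_pd, _mm_add_pd and _mm_storeu_pd, which typically compile to movupd and addpd. Each iteration adds 2 doubles held in one XMM register. The function name is hypothetical. The scalar tail handles an odd element count, and it also serves as a fallback if SSE2 isn't enabled at compile time (it always is on x86-64).

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* out[i] = a[i] + b[i] for i < n. */
void add_doubles(double *out, const double *a, const double *b, size_t n) {
    size_t i = 0;
#if defined(__SSE2__)
    /* Two doubles per 128-bit XMM register per iteration. */
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);            /* unaligned load of 2 doubles */
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(out + i, _mm_add_pd(va, vb));  /* packed add, then store */
    }
#endif
    /* Leftover element (odd n), or everything when SSE2 is unavailable. */
    for (; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```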

Lab 8

FPU programming (to complete previous lab):

  • DEFAULT REL - makes NASM use RIP-relative addresses (relative to the instruction pointer) instead of absolute ones for memory operands that don't specify which to use (see quad_roots example)
  • Values on the FPU stack can be used directly as instruction operands:
    • st0 is the stack top
    • st1 is below it, and so on
  • Useful operations:
    • fchs: change the sign of st0 (multiply by -1)
    • ftst: compare st0 with 0.0, setting the FPU condition codes (e.g. to test if != 0)
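To make the st0/st1 numbering concrete, here is a toy C model of the FPU register stack with a few of the operations above (fld, fldz, fchs, faddp, fstp, ftst). It is a simplification for illustration only. Real x87 hardware uses a rotating TOP pointer, tag bits and condition codes, none of which are modelled, and the function names just mirror the mnemonics.

```c
#include <assert.h>

/* Toy model of the x87 register stack (8 slots); assert() stands in for
 * the stack overflow/underflow exceptions of real hardware. */
typedef struct {
    double reg[8];
    int depth;   /* how many values are currently pushed */
} fpu_stack;

/* st(i): st(0) is the most recently pushed value, st(1) is below it, ... */
double *st(fpu_stack *s, int i) {
    assert(i >= 0 && i < s->depth);
    return &s->reg[s->depth - 1 - i];
}

void fld(fpu_stack *s, double value) {   /* fld: push */
    assert(s->depth < 8);
    s->reg[s->depth++] = value;
}

void fldz(fpu_stack *s) { fld(s, 0.0); }                /* fldz: push 0.0 */

void fchs(fpu_stack *s) { *st(s, 0) = -*st(s, 0); }     /* negate st(0) */

/* faddp: st(1) = st(1) + st(0), then pop, so the sum ends up in st(0) */
void faddp(fpu_stack *s) {
    *st(s, 1) += *st(s, 0);
    s->depth--;
}

/* fstp: pop st(0) and return it (like storing it to memory) */
double fstp(fpu_stack *s) {
    double value = *st(s, 0);
    s->depth--;
    return value;
}

/* ftst: compare st(0) with 0.0; returns -1, 0 or 1 in place of the
 * FPU condition codes (the unordered/NaN case is ignored here). */
int ftst(fpu_stack *s) {
    double value = *st(s, 0);
    return (value > 0.0) - (value < 0.0);
}
```

For example, pushing 2 and 3, negating st0 and adding with faddp leaves -1 in st0. One final fstp then empties the stack, as required before returning.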

SSE programming (modern tech, will likely be used in assignment):

Lab 9

Assignment 2 Q&A:

  • the new column that is passed to step is temporary (it doesn't become part of the existing matrix for the next step)
  • we should use XMM registers, and if feeling adventurous, YMM or ZMM, but we cannot assume YMM and ZMM exist (so the program has to check for them)
  • we should make sure the solution works in lab 3045 (if YMM or ZMM are available there, we don't actually need to worry about checking)
  • minimal input is 3x3, so we don't need to handle size-related edge cases (we can reject such input from the user immediately)
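Here is a quick way to see what the lab machines support, sketched in C with the GCC/Clang builtin __builtin_cpu_supports. This assumes the course compiler is GCC or Clang, and the function name is hypothetical. In the assembly solution itself the same information comes from the cpuid instruction, plus xgetbv to confirm that the OS actually saves the YMM/ZMM registers.

```c
#include <assert.h>

/* Widest SIMD register width (in bits) the running CPU reports:
 * 512 (ZMM, AVX-512F), 256 (YMM, AVX), 128 (XMM, SSE2), or 0 when not
 * built for x86 with GCC/Clang. */
int widest_simd_bits(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f")) {
        return 512;
    }
    if (__builtin_cpu_supports("avx")) {
        return 256;
    }
    if (__builtin_cpu_supports("sse2")) {
        return 128;
    }
#endif
    return 0;
}
```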

Lab 10

Assignment 3 prep:

  • use PPM (the plain text variant, P3), submissions until 29 Jan
  • qemu should already work for emulating ARM on students (the students server)
  • brew install qemu works fine on macOS
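Here is a sketch of writing the plain-text P3 variant in C. The file starts with the magic "P3", then the width and height, then the maximum color value. One R G B triple per pixel follows. The function name and pixel layout (row by row, 3 bytes per pixel) are assumptions, not the assignment's interface. Printing one pixel per line keeps lines well under the 70-character limit the format recommends.

```c
#include <assert.h>
#include <stdio.h>

/* Writes a width x height image as plain (P3) PPM to out.
 * pixels holds width * height R, G, B triples, row by row.
 * Returns 0 on success, -1 if writing fails. */
int write_ppm_p3(FILE *out, int width, int height, int maxval,
                 const unsigned char *pixels) {
    /* Header: magic number, dimensions, maximum color value. */
    if (fprintf(out, "P3\n%d %d\n%d\n", width, height, maxval) < 0) {
        return -1;
    }
    /* One pixel per line keeps lines short. */
    for (long i = 0; i < (long)width * height; i++) {
        const unsigned char *p = pixels + 3 * i;
        if (fprintf(out, "%d %d %d\n", p[0], p[1], p[2]) < 0) {
            return -1;
        }
    }
    return 0;
}
```

Consider a 2x1 image with pixels {255, 0, 0} and {0, 0, 255} at maxval 255. It is written as the lines P3, 2 1, 255, 255 0 0 and 0 0 255.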

Running qemu:

  • -M to choose machine
  • -kernel points to the Linux kernel image
  • -initrd chooses the initial RAM disk image to be loaded
  • -hda chooses hard drive image to be loaded
  • -net customizes network options

Practical tips:

  • the runmenet2.sh script should work with the "Bonus: gotowy katalog" (ready-made directory) link on the course website
  • use the halt command as root to stop the emulator
  • we use classic ARM (architecture version 5, ARMv5), which is 32-bit
  • using conditional instructions instead of jumps lets the processor keep streaming upcoming instructions for faster execution: basically, the fewer jumps the better

Writing arm assembly:

  • keep source in .s files
  • @ starts a line comment
  • labels have to end with a :
  • there are 16 registers: r0..r15, r15 being the program counter (pc, i.e. the instruction pointer)
  • returning from a function:
    • (a) mov pc, lr: copy the return address (lr) into the program counter (r15 or pc)
    • (b) bx lr: branch to the address in lr ("exchange" means it can also switch between ARM and Thumb state); this is usually preferred
  • r0 contains the return value
  • r0..r3 contain the (first four) function arguments
  • assembling a program: as -o first.o first.s
  • most instructions have 3 arguments:
    • the first argument is always the destination (unlike x86, it can differ from both sources, e.g. add r0, r1, r2)
    • the only exception is str, whose first argument is the source (because it also has to be a register)
  • only the last operand can be an immediate (or a shifted register), which is why there are "reverse subtract" instructions (rsb, rsc), e.g. rsb r0, r1, #0 computes r0 = 0 - r1
  • unlike x86, we have to use ldr to load data from memory (and str to store it): arithmetic instructions don't take memory operands
  • in data-processing instructions the last (rightmost) operand can be shifted left or right, e.g. add r0, r1, r2, lsl #2 computes r1 + r2 * 4: https://developer.arm.com/documentation/dui0489/h/arm-and-thumb-instructions/shift-operations
  • each instruction:
    • can set flags (if suffixed with s)
    • can be executed conditionally based on flags (if suffixed with a condition name, like eq)
  • using flags:
    • cmp always sets flags
    • instructions that don't set flags leave them unchanged
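Here is a C model of how cmp sets the N, Z, C and V flags and how a few condition suffixes read them. max_signed mirrors a hypothetical two-instruction ARM sequence (cmp + movlt) that avoids a jump. All function names are made up for illustration. The flag formulas follow the standard subtraction rules: C means "no borrow", V means signed overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The four ARM condition flags. */
typedef struct {
    bool n;  /* negative: bit 31 of the result */
    bool z;  /* zero: result == 0 */
    bool c;  /* carry: for subtraction, set when there was NO borrow */
    bool v;  /* overflow: the signed result didn't fit */
} arm_flags;

/* cmp a, b: computes a - b, discards the result, sets NZCV. */
arm_flags arm_cmp(uint32_t a, uint32_t b) {
    uint32_t r = a - b;
    arm_flags f;
    f.n = (r >> 31) != 0;
    f.z = r == 0;
    f.c = a >= b;                               /* unsigned: no borrow */
    f.v = (((a ^ b) & (a ^ r)) >> 31) != 0;     /* signs differ and result sign != a's */
    return f;
}

/* A few condition suffixes and the flags they test. */
bool cond_eq(arm_flags f) { return f.z; }
bool cond_ne(arm_flags f) { return !f.z; }
bool cond_hi(arm_flags f) { return f.c && !f.z; }          /* unsigned > */
bool cond_lo(arm_flags f) { return !f.c; }                 /* unsigned < */
bool cond_lt(arm_flags f) { return f.n != f.v; }           /* signed < */
bool cond_gt(arm_flags f) { return !f.z && f.n == f.v; }   /* signed > */

/* Branch-free signed maximum, modelling:
 *     cmp   r0, r1
 *     movlt r0, r1     @ only executes if r0 < r1 (signed)   */
uint32_t max_signed(uint32_t r0, uint32_t r1) {
    arm_flags f = arm_cmp(r0, r1);
    if (cond_lt(f)) {   /* the lt suffix on mov */
        r0 = r1;
    }
    return r0;
}
```

The same cmp result can satisfy hi (unsigned) and lt (signed) at once. For example, 0xffffffff is the largest unsigned value but -1 as a signed one.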

Defining memory:

  • by default, we're writing in the .text section; to define data, use .data, and keep code in .text
  • use .balign 4 after each variable to keep memory aligned
  • see example program for details (course website)
  • use ldr twice to load a variable defined in memory into a register:
    • first, ldr r1 with a .word var1 literal (placed next to the program code) to get the address of var1
    • second, ldr r1, [r1] actually loads the variable into the register
  • use str from_register, [to_address] to store results back to memory

Lab 11

More ARM material, focused on getting the VM running locally. All the useful links seem to be on the course website already: here and here.

Tips from lecture:

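  • use LDMIA and STMDB for moving multiple registers to/from memory (e.g. stmdb sp!, {r4, lr} and ldmia sp!, {r4, pc} work like push and pop)
  • there are 4 ways to use the stack in ARM assembly (full/empty, ascending/descending, i.e. different ldm/stm variants); make sure to use the same one as GDB uses on Debian when completing the assignment (the standard ARM ABI uses a full descending stack)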

Loading constants:

  • all ARM instructions are 32 bits long, so they cannot fit arbitrary 32-bit arguments; this is especially important when dealing with constants and addresses:
    • all 8-bit constants are valid (from 0 to 0xff)
    • larger constants are valid as long as they are an 8-bit value rotated right by an even number of bits (e.g. 0x100, 0xff000000)
    • in all other cases large arguments need to be constructed using more instructions
    • in practice, we can just write down the constant in the .s source file; if it can't be encoded, the assembler will report an error
    • we can also use pseudo-instructions (that don't correspond to a single instruction):
      • ldr r3,=2137 for integer constants
      • vldr.F32 s7,=3.141591 for FPU constants
      • adr r3,end for loading addresses
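The encoding rule can be checked mechanically. The C sketch below (function name hypothetical) tests whether a constant fits an ARM data-processing immediate, i.e. an 8-bit value rotated right by an even amount, by undoing every possible rotation. For instance, 2137 = 0x859 has set bits 12 positions apart, so it doesn't fit. That is why ldr r3,=2137 is needed: for such constants the assembler stores the value in a literal pool and loads it from there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value left by r bits (0 <= r < 32). */
static uint32_t rotl32(uint32_t x, unsigned r) {
    return r == 0 ? x : (x << r) | (x >> (32 - r));
}

/* An ARM (A32) data-processing immediate is an 8-bit value rotated right
 * by an even amount (0, 2, ..., 30). Rotating the candidate left by each
 * such amount undoes the rotation; if any result fits in 8 bits, the
 * constant can be encoded in a single instruction. */
bool is_arm_immediate(uint32_t value) {
    for (unsigned rot = 0; rot < 32; rot += 2) {
        if (rotl32(value, rot) <= 0xffu) {
            return true;
        }
    }
    return false;
}
```

Note that 0x102 is rejected even though it is 0x81 shifted by one bit. The rotation amount must be even, while 0x204 (a shift by two) is accepted.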

Lab 12

Exam info:

  • counts as 1/4 of the points (the same as each lab assignment)
  • you need to get at least 1 point; 12 points in total should suffice for a passing grade
  • it will be a 10-question online test

