MakotoE / cs360-notes

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Operating systems

  • Virtualization
    • Conversion of a physical resource into a virtual form
    • The OS is a resource manager
    • Virtualization of CPU: Processes allow many programs to run, despite a limited number of CPUs
    • Virtualization of memory: Each process is assigned a virtual address space that maps to a physical space
  • Concurrency
    • Multi-threadding allows many threads to run at the same time
  • Persistence
    • System memory in DRAM is volatile and is erased when power is lost
    • Data in hard drives or SSDs are persistent
  • System calls
    • The difference between routine procedure calls and system calls is that system calls change privilege levels
    • The user mode is restricted, while kernel mode grants more privileges
    • A trap transfers control to a trap handler which raises the privilege level
    • Return-from-trap reverts back to user mode

Processes

  • A process is a running program
  • The OS virtualizes the CPU so that CPU time can be shared among running processes
  • Policy: A ruleset for making decisions
  • Scheduling policy: A policy for deciding which process gets to run on the CPU in a given moment
  • Process API
    • Create
      • Allocates resources and starts a process
      • Processes are loaded in a lazy fasion, meaning programs are loaded when needed
      • Memory space is allocated for the process
      • 3 file descriptors are opened for a process: standard in, out, error
    • Destroy: Stops and deallocates a process
    • Wait: Wait for a process to stop
    • Miscellaneous control: Other controls, such as suspending a process
    • Status: How long a process has been running, what state it is in, etc.
  • Process states
    • Running: Executing instructions
    • Ready: A process is ready to run but it is on hold
    • Blocked: The process is waiting for an event, such as file I/O
  • The list of processes are contained in a process list
    • Each entry contains a process control block which has the context for each process

UNIX process API

  • fork()
    • Runs a copy of the current program
    • All registers and memory are copied to clone program state (or a virtual space is set aside in the case of copy-on-write)
    • Starts from where fork() was called
    • In the parent side, fork() returns the child pid
    • In the child, fork() returns 0
  • wait()
    • Blocks until forked process of given pid ends
    • If NULL is passed, it waits for all child processes to end
  • exec()
    • Runs the given program to replace the current one
    • The memory is replaced for that of the new program
    • PID does not change
    • The technique is known as overlaying
    • Fork-exec: When a forked process starts a different program
  • kill(): Used to stop a process
  • signal(): Used to catch a signal

Limited direct execution

  • Time sharing is when CPU usage is shared between processes by time slices
  • Performing restricted operations
    • At boot time, a trap table is created to map exceptions to trap handlers
    • When a program wants to perform controlled operations such as I/O, a system call is made
    • A trap instruction jumps to the kernel instructions and switches from user mode to kernel mode
    • All register state is saved to a per-process kernel stack
    • When the system call is finished, a return-from-trap instruction is called
    • Register state is restored
  • The big problem in direct execution of programs is how to share the CPU and how the kernel can take back control
    • If the program is running without giving back control, how is the kernel able to regain control?
    • A program passes control back to the kernel in system calls
    • A timer interrupt periodically calls kernel code so that it can run a scheduler if needed
  • In a context switch, the hardware saves the registers to the kernel stack. The kernel switches the registers in the stack to those of the new process. It returns from trap, and the hardware restores the switched registers.

Scheduling

  • A scheduling metric can be used to quantify a scheduling policy
  • Turnaround time: Time when the job arrived in the system - time when the job was completed
  • Fairness
    • How quickly a job can be started
    • A round-robin policy is fair
  • Executing a job that takes the longest holds up all other jobs and decreases turnaround times
  • Preemptive scheduler
    • A system that can pause processes to switch to other processes
    • It makes sense to preempt a long-running job to finish shorter jobs
  • Response time
    • The period of time between each time-slice of a
    • More job slices in a time period means the process feels responsive to the user
  • Overlap: When a process is blocked, other processes can be run
  • Multi-level feedback queue
    • Optimizes turnaround time while making a system responsive
    • The rules of a MLFQ
      1. If Priority(A) > Priority(B), A runs
      2. If Priority(A) = Priority(B), A & B run in a round-robin fashion
      3. When a job enters the system, it is placed at the highest priority
      • Ensures that short jobs are completed quickly
      1. Once a job uses up a time allotment, its priority is reduced
      • Allows newer jobs to get a higher priority
      1. After some time period, move all jobs to the highest priority
      • Avoids starving the lowest priority jobs of CPU time, so that they can at least make a litle progress even if there are higher priority jobs

Memory

  • Processes need isolation of memory through virtual address spaces

Heap memory API

  • malloc()
    • Allocates memory on heap
    • Parameter is number of bytes to allocate
    • Returns a void* which must be casted
    • malloc(strlen(s) + 1)) allocates memory that can store a copy of s
  • free() deallocates memory
  • Common errors: Forgetting to allocate, not allocating enough memory, forgetting to initialize memory, forgetting to free memory, dangling pointer, double free, trying to free a pointer that is not heap memory
  • malloc() is not a system call. It is a library call which uses brk to change the location of the end of the heap
  • mmap() maps a file to memory, and can also create an anonymous mapping which is initialized to zero
  • calloc() allocates and zeros memory
  • realloc() increases the size of a block of memory, and gets moved to a new address if necessary

Segmentation

  • Segmentation: The separation of the physical address space into code, heap, and stack segments within the processor space. All code of all processes are grouped together, all stacks are grouped together, and all heap are together.
  • x86-64 does not have segmentation, and uses memory paging for memory protection instead
  • Reduces the amount of unusable fragments of memory inside each processor allocation
  • Enables memory protection
    • An address starts with 1 or 2 bits of segment information (code, stack, or heap)
    • By adding the offset address to the starting address and comparing it to the end address, out-of-bounds addresses can be detected
    • Protection bits indicate whether a program can read, execute code, and/or write data in specific memory ranges

Free-Space Management

  • External fragmentation: Having so much unused and unusable small bits of free space that there is little usable free space left
  • Internal fragmentation: Lots of unused space inside of allocated segments
  • Free list
    • A list that keeps track of free spaces in memory
    • Example: head -> addr: 0, len: 10 -> addr: 20, len 10 -> NULL
    • When a chunk of memory is freed, the surrounding free space must be coalesced so that the free space nodes can be combined
  • Allocators track chunk sizes in a header block so that the correct amount of bytes is freed when free() is called
  • Allocation strategies
    • Best fit
      • Allocate in the smallest possible chunk of free space that the requested size can fit
      • Must search the entire free list
    • Worst fit
      • Allocate in the largest chunk
    • First fit
      • Allocates in the first chunk that is big enough
      • Fastest way to find a usable chunk
      • May pollute the beginning of the free list with small objects
    • Next fit
      • Stores a pointer to the last allocated free space
      • Allocates in the next free space and moves the pointer
      • Spreads allocated chunks rather than polluting the beginning of the list
    • Segregated lists
      • Try to store chunks of similar sizes in one location in the address space
      • Avoids fragmentation because the same chunks fit perfectly in this location
    • Buddy allocator
      • Divides the address space into a binary tree
      • When a chunk is freed, the "buddy" of the binary node is combined if it is free
      • Faster in coalescing free space

Paging

  • Paging
    • Memory is divided into fixed-size pieces
    • Another approach to space management
    • One page is a unit of memory
    • Does not suffer from external fragmentation because of the fixed-size pages
  • A fixed number of prefix bits of the virtual address contains the virtual page number
  • Page table
    • A per-process data structure
    • Stores address translations between virtual pages and addresses for the corresponding pages in physical memory
    • Each entry contains the physical frame number, protection bits (read/write/execute), and a present bit which indicates if the memory is stored in RAM or on disk

Translation-lookaside buffer

  • The translation-lookaside buffer is a part of the MMU that stores commonly used page table entries
  • Address space identifier (ASID)
    • Since page tables are per-process, the TLB needs a way to translate addresses based on the process
    • An ASID is like a PID that is stored with page frame numbers in the TLB
    • It indicates which process should access which entry

Hybrid approach to paging

  • Page tables can be too big with a simple implementation
  • Bigger pages allow smaller page tables at the cost of more internal fragmentation
  • One way to save space is to use the concept of segmentation to store the bounds for each segment
  • Another is a multi-level page table
    • The page directory stores a prefix and the page table identifier with a valid bit
    • Each page table fits in a page
    • Only allocates enough page tables for used space
    • There is a trade off of requiring two reads
    • Example virtual address:
      8 7 6 5 4 3 2 1 0
      |-a--|--b--|-c--|
      a = Page directory index
      b = Page table index
      c = Offset
    

Page swapping

  • Swapping pages to disk transparently enables the illusion of having a large virtual address space
  • The page table entry contains a present bit to indicate if data is stored on RAM or in disk
  • A page fault occurs when a read access needs to retrieve data from disk
  • A page-fault handler lets the OS know that a page fault occurred and requests that the page be loaded to RAM
  • Page-replacement policy
    • A policy for choosing which old pages in RAM to replace with requested pages
    • A swap daemon is activated when RAM usage is at high watermark and evicts pages until it reaches low watermark

Threads

  • Threads enable parallelization and non-blocking programs
  • Threads can context switch like processes
    • Threads store context in thread control blocks
  • Each thread has their own stack, but the heap is shared between all threads of a process

Thread API

  • Thread creation and joining
    pthread_create();
    pthread_join(); // Always check the errors
  • Locks
    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    pthread_mutex_lock(&lock);
    pthread_mutex_unlock(&lock);

Condition variable

pthread_mutex_t mutex;
pthread_cond_t condition;

// In thread 0
pthread_mutex_lock(&mutex);
while (!condition) {
  pthread_cond_wait(&condition, &mutex); // Put in a while loop in case of of spurious wakeup
}

pthread_mutex_unlock(&mutex);

// In thread 1
pthread_mutex_lock(&mutex);

pthread_mutex_unlock(&mutex);
pthread_cond_signal(&condition); //wake up thread 1
  • Spurious wakeup can occur when thread A is signalled to wake up, but a thread B starts before the thread A wakes up
  • Producer-consumer problem
    • A shared buffer between a producer and consumer needs some synchronization mechanism to ensure the buffer is not accessed at the same time, and so the buffer doesn't become full or the consumer doesn't access an empty buffer
    • The solution is to use 3 semaphores:
      • S: This semaphore ensures only one consumer or producer can access the buffer
      • E: Indicates that the buffer is empty
      • F: Indicates that the buffer is full
    • Pseudocode for producer and consumer
      void producer() {
        while (true) {
          produce()
          wait(E)
          wait(S)
          append()
          signal(S)
          signal(F)
        }
      }
      void consumer() {
        while (true) {
          wait(F)
          wait(S)
          take()
          signal(S)
          signal(E)
          use()
        }
      }

Semaphores

sem_t m;
sem_init(&m, 0, 0);
sem_wait(&m); // Waits for value to be 1

sem_post(&m); // Increments value
  • Condition variables have a boolean state of waiting or running, and is typically used for mutually exclusive access to a variable.
  • Semaphores combines a mutex and a counter, and is used for shared access to a resource.

I/O devices

  • There are two levels of I/O connections
    • I/O bus such as PCI: fast
    • Peripheral bus such as USB: slow
  • Polling for data may waste CPU cycles
  • An interrupt can be used to return from an operation
    • This allows the CPU to do other tasks during I/O
    • But it also means a context switch is necessary
  • A hybrid approach of polling and interrupt may take advantage of both
  • To facilitate data transfer between a device and memory, a direct memory access (DMA) can do the copying
  • Device interaction
    • I/O instructions may be used to specify data to be transfered between the device and a CPU register
    • Memory-mapped I/O transfers data using shared memory spaces
  • File system stack
    • File system API
    • Generic block layer
    • Device driver

Hard disk drives

  • Disks are made of platters, and each platter has circular tracks
  • Seek time includes rotational delay and settling time for the arm
  • Buffers in the hard disk allow read and write caching
  • Disk scheduling
    • Shortest seek time first
      • Orders requests by the nearest track
      • May cause starvation to areas far from the current position
    • SCAN (elevator)
      • Services requests in order of data locations
      • Resembles an elevator that continuously moves up and down
    • Shortest positioning time first
      • Orders requests by the closest position

File and directories

  • inode: A data structure that holds file attributes and disk block locations
  • Each process maintains an array of open files in the open file table
  • Sharing a file descriptor
    • On fork, a file descriptor is shared
    • File offsets are the same between the parent and child
    • dup() creates a new file descriptor that refers to the same file as the given descriptor
  • Renames are atomic
  • A file metadata contains a lot of info about a file such as: inode number, protection mode, hard link count, owner user, file size, block size, time of last access
  • Hard link
    • Created with ln
    • A hard link is a directory entry that points to the same inode as another entry
    • Hard link count is stored in metadata so that the block can be erased when no hard link exists that points to it
    • May only point to files
  • Symbolic link/soft link
    • A file that references another file
    • Can point to files or directories
    • A soft link does not guarantee that the referenced file exists
  • File system mounting
    • mkfs initializes an empty file system on a disk
    • mount is used to mount the file system to a directory tree

File locality

  • The old unix file system contained a super block, inodes region, and data
    • The super block contains information about the file system
    • Performance was bad because the disk had to move back and forth between the inodes and data
  • Fast File System
    • Divides the disk into cylinder groups
      • FFS tries to put files in the same directory in the same cylinder group to decrease seek times
    • Each cylinder group contains a redundant super block, an inode bitmap, a data bitmap, and data
      • The bitmaps indicate which blocks are allocated

Flash-based SSDs

  • Single-level cells are faster than multi-level cells but more expensive
  • Bits are grouped into pages which are grouped into blocks
  • Flash operations
    • Read: A page is read at a time
    • Erase: An entire block is erased
    • Program: A page is written to
  • Example of a block state
    • i = invalid page; E = erased page; V = valid page
    1. iiii: All pages in a block start invalid
    2. EEEE: The block is erased
    3. VEEE: Page 0 is programmed
    4. VVEE: Page 1 is programmed
    5. EEEE: The block is erased
  • Flash cells wear out, so storage algorithms must be optimized to reduce wear and maintain performance
    • Wear leveling: making sure that wear levels are equal throughout all cells
  • Flash translation layer: Converts logical blocks into phyiscal blocks
  • Write amplification: More data is written to storage than actual amount of data to be stored
  • Direct mapping: A direct mapped SSD is bad because this would mean the same blocks will be erased and written to many times, unevenly wearing out a cell
  • Log structured file system
    • Adds every new write to incrementing pages to reduce write amplification
    • The FTL tracks where data is stored for each logical block
    • When an old state is replaced with a new one, the FTL replaces the old mapping with the new mapping
    • Unused blocks are erased through garbage collection

About