Bovojon / disk-backed-queue

Disk-backed queue

While working with big data, I wanted to explore the applications of a disk-backed queue: a queue that keeps at most a specified number of items in memory at any given time and writes all other items to disk.

Design Choices

  • Instead of using one of Python's built-in collection types (e.g. list or the classes in the collections module), I implemented the disk-backed queue as a linked list with head and tail pointers. The linked list gives O(1) insert and remove operations, since only the pointers need to be updated when inserting or removing.
  • Each node in the linked list has a value and a pointer to the next node.
  • The Queue class also tracks the number of nodes in memory and on disk. Once the in-memory queue is full, any further nodes are added to a separate linked-list queue, built with the DiskQueue class, which is stored in a file.
  • I used Python's pickle module to serialize and deserialize the DiskQueue instances, because I needed a way to store the structure of a DiskQueue instance in a file and update it later.
  • I added helper methods so that each method does the smallest unit of work possible. This made it easier to unit test.
  • As I was writing the methods, I thought of the following unit-test cases:
    • Peeking an empty queue
    • Peeking a non-empty queue
    • Enqueueing only one element
    • Enqueueing a string
    • Enqueueing when the queue is full
    • Dequeueing a non-full queue
    • Dequeueing a full queue
    • Dequeueing an empty queue
    • Dequeueing a queue with only one element
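
The in-memory part of this design can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the names LinkedQueue and size are hypothetical, and returning None when peeking an empty queue is an assumption (the list above only says that case is tested). Raising AssertionError on an empty dequeue follows the description in the complexity section.

```python
class Node:
    """A single linked-list node: a value plus a pointer to the next node."""

    def __init__(self, value):
        self.value = value
        self.next = None


class LinkedQueue:
    """Hypothetical FIFO queue built on a singly linked list."""

    def __init__(self):
        self.head = None  # next node to be dequeued
        self.tail = None  # most recently enqueued node
        self.size = 0

    def is_empty(self):
        return self.head is None

    def enqueue(self, value):
        node = Node(value)
        if self.tail is None:
            # Empty queue: head and tail both point at the new node.
            self.head = self.tail = node
        else:
            # Link the new node after the current tail, then advance the tail.
            self.tail.next = node
            self.tail = node
        self.size += 1

    def dequeue(self):
        if self.is_empty():
            raise AssertionError("cannot dequeue from an empty queue")
        node = self.head
        self.head = node.next
        if self.head is None:
            # That was the only node, so the tail must be cleared as well.
            self.tail = None
        self.size -= 1
        return node.value

    def peek(self):
        # Assumption: peeking an empty queue returns None instead of raising.
        return None if self.is_empty() else self.head.value


q = LinkedQueue()
empty_peek = q.peek()          # peeking an empty queue
q.enqueue("only")              # enqueueing a single string element
single = q.dequeue()           # dequeueing a queue with only one element
for item in (1, {"k": "v"}, None):
    q.enqueue(item)            # any Python object can be a value
first = q.peek()               # peeking a non-empty queue
```

Both enqueue and dequeue touch only the head or tail pointer, which is why neither needs to walk the list.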

Time and Space complexity

  • The time complexity of enqueueing is O(1) in terms of pointer updates. When enqueueing, I create a node from the value (which can be any Python object, such as an int, string, dictionary, or None) and check whether the queue is full. If it is, I load the DiskQueue instance from the pickle file, if one already exists, and add the node to its tail. If the pickle file is empty, I create a new DiskQueue instance and point both its head and tail to the new node. In either case, I then write the updated DiskQueue back to disk. Note that loading and rewriting the pickle file touches every node stored on disk, so the I/O cost of this step grows with the number of on-disk items even though the linked-list update itself is constant time.
  • The time complexity of dequeueing is O(1) in terms of pointer updates. When dequeueing, I first check whether the queue is empty and, if it is, raise an AssertionError. Otherwise, I take the head of the linked list and set the head to its next pointer. If the list had only one node, the new head is None, so I set the tail pointer to None as well. If the disk queue is not empty, I then remove its head and add it to the tail of the in-memory queue. Finally, I return the value of the dequeued node. Moving a node off disk requires loading and rewriting the pickle file, so that step costs time proportional to the number of on-disk items.
  • The space complexity of the queue implementation is O(n), where n is the total number of items in the queue, counting both in-memory and on-disk items.
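
The enqueue and dequeue steps above could look something like the sketch below. It is an illustration under stated assumptions rather than the repository's code: the constructor parameters path and max_in_memory, the counters in_memory and on_disk, and the helpers _load_disk, _store_disk, and _append are all hypothetical names. It also assumes the in-memory capacity is at least 1.

```python
import os
import pickle
import tempfile


class Node:
    def __init__(self, value):
        self.value = value
        self.next = None


class DiskQueue:
    """Overflow linked list; the whole instance is pickled to a file."""

    def __init__(self):
        self.head = None
        self.tail = None


class Queue:
    def __init__(self, path, max_in_memory=2):
        self.path = path                    # hypothetical: overflow pickle file
        self.max_in_memory = max_in_memory  # hypothetical: in-memory capacity
        self.head = None
        self.tail = None
        self.in_memory = 0
        self.on_disk = 0

    def _load_disk(self):
        # A missing or empty file means nothing has been spilled yet.
        if not os.path.exists(self.path) or os.path.getsize(self.path) == 0:
            return DiskQueue()
        with open(self.path, "rb") as f:
            return pickle.load(f)

    def _store_disk(self, disk):
        with open(self.path, "wb") as f:
            pickle.dump(disk, f)

    @staticmethod
    def _append(queue, node):
        # O(1) tail insert, shared by the in-memory and on-disk lists.
        if queue.tail is None:
            queue.head = queue.tail = node
        else:
            queue.tail.next = node
            queue.tail = node

    def enqueue(self, value):
        node = Node(value)
        if self.in_memory < self.max_in_memory:
            self._append(self, node)
            self.in_memory += 1
        else:
            # Queue is full: spill the node to the pickled DiskQueue.
            disk = self._load_disk()
            self._append(disk, node)
            self._store_disk(disk)
            self.on_disk += 1

    def dequeue(self):
        if self.head is None:
            raise AssertionError("cannot dequeue from an empty queue")
        node = self.head
        self.head = node.next
        if self.head is None:
            self.tail = None
        self.in_memory -= 1
        if self.on_disk > 0:
            # Refill memory with the oldest on-disk node to keep FIFO order.
            disk = self._load_disk()
            moved = disk.head
            disk.head = moved.next
            if disk.head is None:
                disk.tail = None
            moved.next = None
            self._store_disk(disk)
            self.on_disk -= 1
            self._append(self, moved)
            self.in_memory += 1
        return node.value


with tempfile.TemporaryDirectory() as tmp:
    q = Queue(os.path.join(tmp, "overflow.pkl"), max_in_memory=2)
    for i in range(5):
        q.enqueue(i)
    counts = (q.in_memory, q.on_disk)          # (2, 3)
    drained = [q.dequeue() for _ in range(5)]  # items come back in FIFO order
```

Because the oldest items always sit in memory and every dequeue pulls the oldest on-disk node forward, ordering is preserved across the memory/disk boundary. The sketch also makes the cost of the pickle round trip visible: each spill or refill reads and rewrites the entire on-disk list.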
