sathwikmatsa / ToyDBMS

Simple RAM based DBMS in C++

ToyDBMS

Dependencies

  • Bison
  • Flex
  • g++

On Linux, all the dependencies can be installed via apt-get.

On Windows, install Bison and Flex from GnuWin32, set up MinGW for g++, and add all three to your PATH.

Usage

Download the repo:

git clone https://github.com/sathwikmatsa/ToyDBMS.git
cd ToyDBMS

Linux

to build: run make

to execute queries in input.txt:

$ ./sql input.txt

Windows

to build: run make.bat

to execute queries in input.txt:

> sql.exe input.txt

Features

supported queries:

  • CREATE TABLE
  • SELECT
  • INSERT
  • DELETE
  • WHERE clauses, the MAX aggregate, and the IN operator

query syntax is compatible with MySQL

Note: this is a single-database system; all tables live in one database

How It Works

lexer (sql_lexer.l) : tokenizes the contents of input.txt

parser (sql_parser.y) : consumes the tokens from lexer and constructs an AST

interpreter (interpreter.cpp) : evaluates the nodes of the AST to produce the output

crud.cpp implements the data storage and manipulation functions that the interpreter calls.

Code Walkthrough Example

consider the following query: CREATE TABLE PERSON(ID INT, NAME VARCHAR(15));

lexer produces the following tokens:

[create] [table_t] [id] [(] [id] [int_t] [,] [id] [varchar] [(] [literal] [)] [)] [;]
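
The tokenization above can be sketched with a small hand-written scanner. This is only an illustration: the real lexer is generated by Flex from sql_lexer.l, and the function name tokenize, the keyword table, and the choice to return token names as strings are assumptions made for this example.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the Flex-generated lexer. It returns token
// names as strings, using the names from the example token list above.
std::vector<std::string> tokenize(const std::string& query) {
    // Assumed keyword table: only the keywords needed by the example query.
    static const std::unordered_map<std::string, std::string> keywords = {
        {"CREATE", "create"}, {"TABLE", "table_t"},
        {"INT", "int_t"},     {"VARCHAR", "varchar"},
    };
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < query.size()) {
        unsigned char c = static_cast<unsigned char>(query[i]);
        if (std::isspace(c)) {
            ++i;  // whitespace separates tokens and is dropped
        } else if (std::isalpha(c) || c == '_') {
            // Read a whole word, upper-casing it so keywords match
            // regardless of case.
            std::string word;
            while (i < query.size() &&
                   (std::isalnum(static_cast<unsigned char>(query[i])) ||
                    query[i] == '_')) {
                word += static_cast<char>(
                    std::toupper(static_cast<unsigned char>(query[i])));
                ++i;
            }
            auto it = keywords.find(word);
            tokens.push_back(it != keywords.end() ? it->second : "id");
        } else if (std::isdigit(c)) {
            // A run of digits becomes one literal token.
            while (i < query.size() &&
                   std::isdigit(static_cast<unsigned char>(query[i]))) {
                ++i;
            }
            tokens.push_back("literal");
        } else {
            // Punctuation such as ( ) , ; is its own token.
            tokens.push_back(std::string(1, query[i]));
            ++i;
        }
    }
    return tokens;
}
```

Calling tokenize on the CREATE TABLE query above yields the same fourteen token names, in the same order, as the list shown.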

parser recognizes it by the following rules:

CREATE_TABLE : create table_t id '(' CT_ARGS ')'
CT_ARGS : CT_ARG | CT_ARG ',' CT_ARGS
CT_ARG : ATTR_DEF
ATTR_DEF : id TYPE
TYPE : int_t | varchar '(' literal ')'
  • each nonterminal (in caps) can use an ast_node pointer to store the information it needs, such as the table name or arguments.
  • the ast_node pointers of the nonterminals on the RHS are stored in the childNodes member of the ast_node of the corresponding LHS nonterminal.
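
A minimal sketch of how such a tree might look for the example query follows. Only the names ast_node and childNodes come from the description above; the type and value fields, the make_node helper, and the use of std::unique_ptr instead of raw pointers are assumptions for illustration. The sketch also flattens the right-recursive CT_ARGS rule into a single node with two children, which the real parser may not do.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct ast_node {
    std::string type;   // assumed field: e.g. "CREATE_TABLE", "ATTR_DEF"
    std::string value;  // assumed field: e.g. table or attribute name
    std::vector<std::unique_ptr<ast_node>> childNodes;
};

// Hypothetical helper that allocates a node with the given type and value.
std::unique_ptr<ast_node> make_node(std::string type, std::string value = "") {
    auto node = std::make_unique<ast_node>();
    node->type = std::move(type);
    node->value = std::move(value);
    return node;
}

// Builds, by hand, a tree that parser actions could produce for
// CREATE TABLE PERSON(ID INT, NAME VARCHAR(15));
std::unique_ptr<ast_node> build_example_ast() {
    // ATTR_DEF : id TYPE  (the TYPE node is stored as a child of ATTR_DEF)
    auto id_attr = make_node("ATTR_DEF", "ID");
    id_attr->childNodes.push_back(make_node("TYPE", "INT"));

    auto name_attr = make_node("ATTR_DEF", "NAME");
    name_attr->childNodes.push_back(make_node("TYPE", "VARCHAR(15)"));

    // CT_ARGS collects every ATTR_DEF (flattened here for brevity).
    auto args = make_node("CT_ARGS");
    args->childNodes.push_back(std::move(id_attr));
    args->childNodes.push_back(std::move(name_attr));

    // CREATE_TABLE stores the table name and owns CT_ARGS as its child.
    auto root = make_node("CREATE_TABLE", "PERSON");
    root->childNodes.push_back(std::move(args));
    return root;
}
```

Each RHS nonterminal ends up in the childNodes of its LHS node, so the interpreter can start at CREATE_TABLE and walk down to every attribute definition.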

interpreter evaluates the query by calling processQuery.

  • processQuery : identifies NODE type -> calls processCTQ
  • processCTQ :
    • creates table in database by calling create_table_in_database
    • adds attributes to the table by evaluating the child node (CT_ARGS): it retrieves the childNodes of CT_ARGS and processes each node based on its type. In our case the type is ATTR_DEF, so it calls processAttrDef, which in turn calls add_attribute with the appropriate parameters.
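
The call chain above can be sketched as follows. The function names processQuery, processCTQ, processAttrDef, create_table_in_database and add_attribute come from the walkthrough, but their signatures, the Database and Table types, the ast_node fields other than childNodes, and the make_node helper are all assumptions for this example; the actual code in interpreter.cpp and crud.cpp may differ.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

struct ast_node {
    std::string type;   // assumed field: node kind, e.g. "CREATE_TABLE"
    std::string value;  // assumed field: table, attribute, or type name
    std::vector<std::unique_ptr<ast_node>> childNodes;
};

// Hypothetical helper for building nodes by hand.
std::unique_ptr<ast_node> make_node(std::string type, std::string value = "") {
    auto node = std::make_unique<ast_node>();
    node->type = std::move(type);
    node->value = std::move(value);
    return node;
}

// Assumed in-memory storage: table name -> list of (attribute, type) pairs.
struct Table {
    std::vector<std::pair<std::string, std::string>> attributes;
};
using Database = std::map<std::string, Table>;

// Stand-ins for the storage functions described as living in crud.cpp.
void create_table_in_database(Database& db, const std::string& name) {
    if (!db.emplace(name, Table{}).second) {
        throw std::runtime_error("table already exists: " + name);
    }
}

void add_attribute(Database& db, const std::string& table,
                   const std::string& name, const std::string& type) {
    db.at(table).attributes.emplace_back(name, type);
}

// ATTR_DEF carries the attribute name; its first child carries the type.
void processAttrDef(Database& db, const std::string& table,
                    const ast_node& attr) {
    if (attr.childNodes.empty()) {
        throw std::runtime_error("ATTR_DEF without a TYPE child");
    }
    add_attribute(db, table, attr.value, attr.childNodes.front()->value);
}

// Creates the table, then processes each child of CT_ARGS by its type.
void processCTQ(Database& db, const ast_node& query) {
    create_table_in_database(db, query.value);
    if (query.childNodes.empty()) return;
    const ast_node& args = *query.childNodes.front();
    for (const auto& child : args.childNodes) {
        if (child->type == "ATTR_DEF") {
            processAttrDef(db, query.value, *child);
        }
    }
}

// Dispatches on the node type; only CREATE TABLE is sketched here.
void processQuery(Database& db, const ast_node& root) {
    if (root.type == "CREATE_TABLE") {
        processCTQ(db, root);
    } else {
        throw std::runtime_error("unsupported query type: " + root.type);
    }
}
```

Passing a hand-built CREATE_TABLE node for PERSON to processQuery leaves the database with one table whose attributes are ID and NAME, in that order.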

TODO

  • Verify attribute constraints and integrity constraints
  • handle NULL values
  • Alias
  • code refactoring

Languages

  • C++ 55.9%
  • Yacc 37.0%
  • Lex 6.0%
  • Makefile 0.7%
  • Batchfile 0.4%