Simple Querying over (CSV) Tables in Haskell
hasqltable FILE [-d|--delimiter ARG] ((-s|--fileSpec ARG) | (-r|--rawSpec ARG))
Available options:
-d,--delimiter ARG - The column delimiter
-s,--fileSpec ARG - A file from which the schema specification is loaded
-r,--rawSpec ARG - A string of b/i/d/s characters representing bool/int/double/string column types, in the order in which the columns appear (e.g. "siddd" means the first column is a string, the second an int and the remaining three doubles)
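To make the --rawSpec format concrete, here is a minimal Haskell sketch that turns a spec string into a list of column types. The ColType type and the charToType/parseRawSpec names are hypothetical, not taken from the hasqltable sources, and the sketch accepts only the lowercase letters listed above.

```haskell
-- Hypothetical column-type tag; the constructor names are illustrative only.
data ColType = TBool | TInt | TDouble | TString
  deriving (Show, Eq)

-- Map one spec character (b/i/d/s) to a column type.
charToType :: Char -> Either String ColType
charToType 'b' = Right TBool
charToType 'i' = Right TInt
charToType 'd' = Right TDouble
charToType 's' = Right TString
charToType c   = Left ("unknown type character: " ++ [c])

-- Parse a whole raw spec such as "siddd"; the first invalid character aborts.
parseRawSpec :: String -> Either String [ColType]
parseRawSpec = traverse charToType
```

For example, parseRawSpec "siddd" evaluates to Right [TString,TInt,TDouble,TDouble,TDouble], while any other character yields a Left with an error message, in the same spirit as the tool passing errors around as strings.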
The program runs in a REPL. To quit, type quit or exit as the command.
The REPL is iterative: your queries are applied to the last valid result or, if there is none yet, to the initial table. To reset the iteration, i.e. start over, enter reload or reset as the command.
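The iterative behaviour can be pictured as a pure state-transition function. The sketch below is an assumption about the shape of the loop, not the code in Main.hs: ReplState, Outcome and step are hypothetical names, tables are simplified to rows of strings, and runQuery stands in for the real query parser and executor.

```haskell
-- Simplified table for this sketch: rows of raw string cells.
type Table = [[String]]

-- REPL state: the table loaded at start-up and the last valid result.
data ReplState = ReplState { original :: Table, current :: Table }
  deriving (Show, Eq)

-- Result of handling one input line: stop, or continue with a new state and
-- either an error message or the table to print.
data Outcome = Quit | Continue ReplState (Either String Table)
  deriving (Show, Eq)

-- Handle one command. `runQuery` is a stand-in for the real parser + executor.
step :: (String -> Table -> Either String Table) -> ReplState -> String -> Outcome
step runQuery st cmd
  | cmd `elem` ["quit", "exit"]    = Quit
  | cmd `elem` ["reload", "reset"] = Continue st { current = original st } (Right (original st))
  | otherwise = case runQuery cmd (current st) of
      Left err -> Continue st (Left err)                -- failed query: keep the last valid table
      Right t  -> Continue st { current = t } (Right t) -- success: the next query starts from t
```

A real REPL would wrap step in an IO loop that reads a line, prints the table or the error, and recurses until Quit. Keeping the transition pure makes the "last valid result" rule easy to test.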
The following query structure is supported; clauses are case insensitive:
SELECT columnList [optional ordered clauses]
columnList
- a list of column names separated by a comma (,); each name is either plain text or delimited by ` (to differentiate between a column named `where` and the where clause)
- example:
SELECT name, age, `limit`
SELECT name
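A minimal sketch of how such a columnList could be parsed, assuming a hand-written splitter rather than the project's own parser: plain names are accepted unless they clash with a clause keyword, and backticked names are accepted as-is. The helper names and the decision to reject unquoted keywords are assumptions; commas inside backticks are not handled.

```haskell
import Data.Char (isSpace, toLower)

-- Strip leading and trailing whitespace.
trim :: String -> String
trim = f . f
  where f = reverse . dropWhile isSpace

-- Split on every comma (this sketch ignores commas inside backticks).
splitOnComma :: String -> [String]
splitOnComma s = case break (== ',') s of
  (a, [])       -> [a]
  (a, _ : rest) -> a : splitOnComma rest

-- Clause keywords that, in this sketch, must be backticked to be used as names.
keywords :: [String]
keywords = ["select", "where", "groupby", "orderby", "limit"]

-- Accept `name` (any name, keywords included) or a plain name that is not a keyword.
columnName :: String -> Either String String
columnName ('`' : rest) = case reverse rest of
  '`' : revName | not (null revName) -> Right (reverse revName)
  _ -> Left ("malformed quoted column name: `" ++ rest)
columnName name
  | null name                        = Left "empty column name"
  | map toLower name `elem` keywords = Left ("quote the keyword with backticks: " ++ name)
  | otherwise                        = Right name

-- Parse a columnList such as "name, age, `limit`".
parseColumnList :: String -> Either String [String]
parseColumnList = traverse (columnName . trim) . splitOnComma
```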
Other clauses are optional and independent of each other apart from their order. The query may use any subset of the following clauses: where, groupby, orderby, limit
WHERE simplExpr
Filters out rows for which simplExpr evaluates to false.
simplExpr is either:
- a boolean literal
- l bop l
- l aop l bop l aop l, which translates into (l aop l) bop (l aop l)
- l bop l aop l, which translates into l bop (l aop l), and vice versa (l aop l bop l translates into (l aop l) bop l)
where:
- l is a numeric literal, string literal, column reference or boolean literal
- numeric literal is an integer or a double (regular notation with . as the decimal separator, e.g. 3.14159)
- boolean literal is true or false (case insensitive)
- string literal is a string enclosed in "
- column reference is a column name enclosed in ` (e.g. `myColumn`)
- aop is an arithmetic operator: + - / *
- bop is a boolean-result operator: > < <= >= == != & | ^ (^ for XOR)
- n is a numeric literal (integer/double) or a column reference
- type errors arising from incorrect usage of operators are handled during query execution
- example using the where clause:
SELECT name, surname WHERE `scorePerDay` * 5 >= `scoreGoal` / 2
SELECT age WHERE `name` != "BohdanQQ"
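The following sketch evaluates the "side bop side" shape of simplExpr against one row, returning type errors as Left values instead of crashing. Everything here is hypothetical: the Cell constructors, the Operand/Side types, the row representation, and the choice to do all arithmetic in Double. It also omits the standalone boolean literal form.

```haskell
-- Hypothetical Cell type standing in for the one in Types.hs.
data Cell = CBool Bool | CInt Int | CDouble Double | CString String
  deriving (Show, Eq)

-- An operand l: a literal or a column reference.
data Operand = Val Cell | Col String

-- One side of a bop: either a lone operand or `l aop l`.
data Side = One Operand | Arith Operand Char Operand

-- A row as column name -> cell (assumption for this sketch).
type Row = [(String, Cell)]

operand :: Row -> Operand -> Either String Cell
operand _   (Val c) = Right c
operand row (Col n) = maybe (Left ("unknown column: " ++ n)) Right (lookup n row)

toDouble :: Cell -> Either String Double
toDouble (CInt n)    = Right (fromIntegral n)
toDouble (CDouble d) = Right d
toDouble c           = Left ("not a number: " ++ show c)

-- aop; this sketch always computes in Double.
arith :: Char -> Cell -> Cell -> Either String Cell
arith op a b = do
  x <- toDouble a
  y <- toDouble b
  case op of
    '+' -> Right (CDouble (x + y))
    '-' -> Right (CDouble (x - y))
    '*' -> Right (CDouble (x * y))
    '/' -> Right (CDouble (x / y))
    _   -> Left ("unknown arithmetic operator: " ++ [op])

-- Turn an Ordering into the result of a comparison operator.
compareWith :: String -> Ordering -> Either String Bool
compareWith op o = case op of
  "==" -> Right (o == EQ)
  "!=" -> Right (o /= EQ)
  "<"  -> Right (o == LT)
  "<=" -> Right (o /= GT)
  ">"  -> Right (o == GT)
  ">=" -> Right (o /= LT)
  _    -> Left ("operator " ++ op ++ " does not apply to these operands")

-- bop: logical operators on booleans, comparisons on strings or numbers.
bop :: String -> Cell -> Cell -> Either String Bool
bop op (CBool a) (CBool b) = case op of
  "&" -> Right (a && b)
  "|" -> Right (a || b)
  "^" -> Right (a /= b)
  _   -> compareWith op (compare a b)
bop op (CString a) (CString b) = compareWith op (compare a b)
bop op a b = do
  x <- toDouble a
  y <- toDouble b
  compareWith op (compare x y)

evalSide :: Row -> Side -> Either String Cell
evalSide row (One o)       = operand row o
evalSide row (Arith a o b) = do
  x <- operand row a
  y <- operand row b
  arith o x y

-- simplExpr of the shape `side bop side`; type errors come back as Left.
evalWhere :: Row -> Side -> String -> Side -> Either String Bool
evalWhere row l op r = do
  a <- evalSide row l
  b <- evalSide row r
  bop op a b
```

With a row where scorePerDay is 3 and scoreGoal is 20, the condition of the first example query evaluates to Right True (15.0 >= 10.0); adding a number to a string column yields a Left instead.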
GROUPBY columnList
Groups rows by equality of the columns specified in columnList and keeps only the first row of each group.
SELECT department, building GROUPBY department, building
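A sketch of the "first row per group" rule, assuming rows are simplified to association lists of column name and cell text. groupKey and groupByColumns are hypothetical names; nubBy keeps the first occurrence of each key, so groups appear in the order in which they are first met.

```haskell
import Data.Function (on)
import Data.List (nubBy)

-- Simplified row for this sketch: column name -> cell text.
type Row = [(String, String)]

-- The grouping key: the row's values in the GROUPBY columns, in order.
groupKey :: [String] -> Row -> [Maybe String]
groupKey cols row = map (`lookup` row) cols

-- Keep the first row of each group, in order of first appearance.
groupByColumns :: [String] -> [Row] -> [Row]
groupByColumns cols = nubBy ((==) `on` groupKey cols)
```

nubBy is quadratic in the number of rows; a Data.Map keyed by the group key would scale better for large tables.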
ORDERBY [asc|desc] columnList
Orders the rows in the specified direction (asc or desc). Columns closer to the beginning of the column list take priority over the columns that follow them.
- the asc and desc keywords are case insensitive
- columnList: see the columnList description of the SELECT clause
SELECT name, surname ORDERBY asc surname, name
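Multi-column ordering maps naturally onto Haskell's Ordering monoid: a later column is only consulted when the earlier ones compare equal. The sketch below compares cells as plain strings for brevity (an assumption; the tool itself works with typed cells), and Direction, byColumns and orderRows are hypothetical names.

```haskell
import Data.List (sortBy)
import Data.Ord (comparing)

data Direction = Asc | Desc
  deriving (Show, Eq)

-- Simplified row: column name -> cell text.
type Row = [(String, String)]

-- Lexicographic comparison over the listed columns: the Ordering monoid only
-- consults the next column when the previous ones compare EQ.
byColumns :: [String] -> Row -> Row -> Ordering
byColumns cols = foldMap (\c -> comparing (lookup c)) cols

orderRows :: Direction -> [String] -> [Row] -> [Row]
orderRows Asc  cols = sortBy (byColumns cols)
orderRows Desc cols = sortBy (flip (byColumns cols))
```

sortBy is stable, so rows that are equal on every listed column keep their relative order.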
LIMIT n
Limits the output to n rows.
An example of a complex query:
select name, surname, department where `age` / `scorePerDay` <= 12 groupby building orderby asc name, surname limit 2
To save the current table (the last valid table printed out), type save, hit enter, then input a name, which will be used for both the table file and the table schema file (the schema file will have .schema appended to the name), and hit enter once again.
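A sketch of the file-naming rule for save, plus a naive writer. savePaths, renderTable and saveTable are hypothetical helpers: the table is rendered without quoting or escaping, and the schema text is passed in unchanged because its on-disk format is not described here.

```haskell
import Data.List (intercalate)

-- The two paths save would write for a given name:
-- table file = name, schema file = name ++ ".schema".
savePaths :: String -> (FilePath, FilePath)
savePaths name = (name, name ++ ".schema")

-- Serialise rows back into delimited text; no quoting or escaping is done here.
renderTable :: String -> [[String]] -> String
renderTable delim = unlines . map (intercalate delim)

-- Write both files; the schema text is supplied by the caller.
saveTable :: String -> String -> [[String]] -> String -> IO ()
saveTable delim name rows schemaText = do
  let (tablePath, schemaPath) = savePaths name
  writeFile tablePath (renderTable delim rows)
  writeFile schemaPath schemaText
```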
split, optparse-applicative - dependencies for string splitting and argument parsing, respectively
Either is used to pass transformed tables or (string) errors
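Using Either for both tables and errors lets clause execution be written as a fold in the Either monad, where the first error short-circuits the rest of the query. In the sketch below the Stage type, runStages and the two example stages are hypothetical; in particular, rejecting a negative LIMIT is an assumption.

```haskell
import Control.Monad (foldM)

-- Simplified table: rows of raw string cells.
type Table = [[String]]

-- A clause as a transformation that either yields a new table or an error.
type Stage = Table -> Either String Table

-- Apply the stages left to right; the first Left stops the whole query.
runStages :: [Stage] -> Table -> Either String Table
runStages stages t0 = foldM (\t stage -> stage t) t0 stages

-- Hypothetical LIMIT stage.
limitStage :: Int -> Stage
limitStage n t
  | n < 0     = Left "LIMIT expects a non-negative number"
  | otherwise = Right (take n t)

-- Hypothetical stage that always fails, e.g. for an unknown column.
failStage :: String -> Stage
failStage msg _ = Left msg
```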
Main.hs creates the functional core: the REPL and argument parsing.
Config.hs and Parser.hs define the command line arguments format and the query parser, respectively.
Types.hs defines all used data structures, including the core Cell, Expr and SubQuery types, as well as some basic functions (operator functions) and typeclasses with their implementations (a runtime type compatibility check using the newly defined Compat typeclass, a Num instance for numeric cells, ...).
I have used our lab/hw parser for parsing queries. The parsed query is a simple list of clauses, which is then reordered for execution (SELECT, for example, executes near the end of the entire query).
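A sketch of the reordering step, using a priority per clause and a stable sort. Only the fact that SELECT runs near the end comes from the description above; the Clause constructors (with the WHERE expression kept as raw text) and the exact priorities, with LIMIT last, are assumptions for illustration.

```haskell
import Data.List (sortOn)

-- Hypothetical parsed clauses; the WHERE expression is kept as raw text here.
data Clause
  = Select [String]
  | Where String
  | GroupBy [String]
  | OrderBy Bool [String] -- True = ascending
  | Limit Int
  deriving (Show, Eq)

-- Assumed execution priority: lower numbers run first.
priority :: Clause -> Int
priority c = case c of
  Where _     -> 0
  GroupBy _   -> 1
  OrderBy _ _ -> 2
  Select _    -> 3
  Limit _     -> 4

-- Reorder the clauses as written by the user into execution order.
executionOrder :: [Clause] -> [Clause]
executionOrder = sortOn priority
```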
The WHERE clause expression does NOT support parsing into a full expression tree (see the user documentation for the allowed expression types), but I think that this could be implemented using the existing parser and Expr data structure.
The table is represented using the Cell type, which is a sort of "discriminated union". This type is used:
a) to hold the data inside a single table cell
b) along with a column name, to specify a column's type (as part of the table schema)
This table representation provides sufficient information to perform the transformations required by the clauses. Additionally, runtime "type checks" are performed when evaluating the WHERE clause expression so that type errors do not cause a crash.
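To illustrate point b), the sketch below uses a schema of column names paired with "prototype" cells, where only the constructor matters, and parses raw CSV fields against it. Cell, Schema, parseField and parseRow are hypothetical names; the case-insensitive bool parsing and the use of readMaybe for numbers are assumptions.

```haskell
import Data.Char (toLower)
import Text.Read (readMaybe)

-- Hypothetical Cell type standing in for the one in Types.hs.
data Cell = CBool Bool | CInt Int | CDouble Double | CString String
  deriving (Show, Eq)

-- A schema entry: a column name plus a prototype Cell whose constructor is the
-- column type (the value inside it is ignored).
type Schema = [(String, Cell)]

-- Parse one raw field according to the prototype cell of its column.
parseField :: Cell -> String -> Either String Cell
parseField (CBool _) s = case map toLower s of
  "true"  -> Right (CBool True)
  "false" -> Right (CBool False)
  _       -> Left ("not a bool: " ++ s)
parseField (CInt _) s    = maybe (Left ("not an int: " ++ s)) (Right . CInt) (readMaybe s)
parseField (CDouble _) s = maybe (Left ("not a double: " ++ s)) (Right . CDouble) (readMaybe s)
parseField (CString _) s = Right (CString s)

-- Parse a whole row; the number of fields must match the schema.
parseRow :: Schema -> [String] -> Either String [Cell]
parseRow schema fields
  | length schema /= length fields = Left "field count does not match schema"
  | otherwise = sequence (zipWith (parseField . snd) schema fields)
```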
A simple testing framework is implemented to test some of the functionality.
To run tests:
cabal test
To generate documentation:
cabal haddock --haddock-executables
- enable full tree syntax for the where clause expression
- parse-time type checking (requires a parser with a state + schema available to it)
- schema column type does not need to be specified by a Cell (which also holds a value, unused in the schema)
- unary operator support (particularly boolean not, which can be implemented using xor True) [DONE]
- implement saving of the intermediate results into a file (command save fileName) [DONE]
- implement reloading the original table (reload command)
- add an option to specify schema by file (consistent with the saving method)
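The boolean NOT mentioned in the list above follows from the identity True XOR x = not x. A minimal sketch with hypothetical names:

```haskell
-- XOR on Bool is simply inequality.
xorB :: Bool -> Bool -> Bool
xorB = (/=)

-- NOT expressed through XOR with True: True ^ x == not x.
notViaXor :: Bool -> Bool
notViaXor = xorB True
```

In query terms, a condition such as true ^ `active` should negate a boolean column, since l bop l with a boolean literal is part of the grammar (the active column is hypothetical).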