osalpekar / auxiliary-task-for-text-to-sql


Zero-shot Text-to-SQL Learning with Auxiliary Task

Code for a CS224N project that builds on top of Zero-shot Text-to-SQL Learning with Auxiliary Task.

Usage

Conda Environments

Please use Python 3.6 and PyTorch 1.3. Other Python dependencies are listed in requirements.txt. Install them with:

	pip install -r requirements.txt

Download Data

The data can be found on Google Drive. Please download it and extract it into the repository root.

Preprocessing is required. See run_tapas.sh for parsing the data into a different format than the base dual-task code expects. embeddings.py can be used to generate embeddings (this requires the transformers library), and parse_embeddings.py parses the generated embeddings and saves them to disk as a torch tensor, which the model then consumes as an embedding.
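
As a rough illustration, here is a minimal sketch of that embedding pipeline, assuming bert-base-uncased via the transformers library; the example question, the mean-pooling step, and the file name question_embeddings.pt are illustrative choices, not the exact behavior of embeddings.py and parse_embeddings.py.

	import torch
	from transformers import BertModel, BertTokenizer

	tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
	model = BertModel.from_pretrained("bert-base-uncased")
	model.eval()

	questions = ["how many singers are from france"]  # placeholder input

	vectors = []
	with torch.no_grad():
	    for q in questions:
	        input_ids = tokenizer.encode(q, return_tensors="pt")
	        hidden_states = model(input_ids)[0]                    # (1, seq_len, 768)
	        vectors.append(hidden_states.mean(dim=1).squeeze(0))   # mean-pool to (768,)

	# Stack into one (num_questions, 768) tensor and save it to disk.
	embedding_matrix = torch.stack(vectors)
	torch.save(embedding_matrix, "question_embeddings.pt")

	# Later, the saved tensor can be loaded and wrapped as a frozen embedding layer.
	weights = torch.load("question_embeddings.pt")
	embedding_layer = torch.nn.Embedding.from_pretrained(weights)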

Generate our resplit WikiSQL data

	cd data_model/wikisql
	python make_zs.py
	python make_fs.py

Run the model on the original WikiSQL data and our split

	cd zero-shot-text-to-SQL
	./run.sh

Contributions

The following code was written from scratch (in the zero-shot-text-to-SQL directory):

  • similarity_study.py
  • Similarity_Analysis_And_Plots.ipynb
  • preprocess_bert.py
  • create_similarity_index.py
  • embeddings.py
  • parse_embeddings.py

The following code was changed significantly to implement my model architecture:

  • table/Models.py (Implements embeddings, encoders, decoders, layers, etc.)
  • table/ModelConstructor.py (connects model components)
  • table/Loss.py
  • table/IO.py
  • table/Trainer.py
  • train.py
  • evaluate.py

Various other smaller changes were required throughout the codebase.

Acknowledgement
