osalpekar / auxiliary-task-for-text-to-sql


Zero-shot Text-to-SQL Learning with Auxiliary Task

Code for a CS224N project that builds on top of Zero-shot Text-to-SQL Learning with Auxiliary Task.

Usage

Conda Environments

Please use Python 3.6 and PyTorch 1.3. Other Python dependencies are listed in requirements.txt. Install them with:

	pip install -r requirements.txt

Download Data

The data can be found on Google Drive. Please download it and extract it into the repository root.

Preprocessing is required. See run_tapas.sh for parsing the data into a different format than the base dual-task code expects. embeddings.py can be used to generate embeddings (this requires the transformers library), and parse_embeddings.py parses the generated embeddings and saves them to disk as a torch tensor, which the model then consumes as an embedding.
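
As a rough illustration, here is a minimal sketch of that embedding pipeline, assuming bert-base-uncased via the transformers library; the example question, the mean-pooling step, and the file name question_embeddings.pt are illustrative choices, not the exact behavior of embeddings.py and parse_embeddings.py.

	import torch
	from transformers import BertModel, BertTokenizer

	tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
	model = BertModel.from_pretrained("bert-base-uncased")
	model.eval()

	questions = ["how many singers are from france"]  # placeholder input

	vectors = []
	with torch.no_grad():
	    for q in questions:
	        input_ids = tokenizer.encode(q, return_tensors="pt")
	        hidden_states = model(input_ids)[0]                    # (1, seq_len, 768)
	        vectors.append(hidden_states.mean(dim=1).squeeze(0))   # mean-pool to (768,)

	# Stack into one (num_questions, 768) tensor and save it to disk.
	embedding_matrix = torch.stack(vectors)
	torch.save(embedding_matrix, "question_embeddings.pt")

	# Later, the saved tensor can be loaded and wrapped as a frozen embedding layer.
	weights = torch.load("question_embeddings.pt")
	embedding_layer = torch.nn.Embedding.from_pretrained(weights)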

Generate our resplit WikiSQL data

	cd data_model/wikisql
	python make_zs.py
	python make_fs.py

Run the model on the original WikiSQL data and our split

	cd zero-shot-text-to-SQL
	./run.sh

Contributions

The following code was written from scratch (in the zero-shot-text-to-SQL directory):

  • similarity_study.py
  • Similarity_Analysis_And_Plots.ipynb
  • preprocess_bert.py
  • create_similarity_index.py
  • embeddings.py
  • parse_embeddings.py

The following code was changed significantly to implement my model architecture:

  • table/Models.py (Implements embeddings, encoders, decoders, layers, etc.)
  • table/ModelConstructor.py (connects model components)
  • table/Loss.py
  • table/IO.py
  • table/Trainer.py
  • train.py
  • evaluate.py

Various other smaller changes were required throughout the codebase.

Acknowledgement
