TooTouch / SPARTA

A Semantic Parsing And Relational Table Aware model that generates SQL from questions written in Korean

SPARTA (Semantic Parsing And Relational Table Aware)

This is a term project for the Unstructured Text Analysis class. We implement a deep learning model that converts Korean questions into SQL queries.

Team Members

  • Hoonsang Yoon
  • Jaehyuk Heo
  • Jungwoo Choi
  • Jeongseob Kim

Information

Demo

See the demo here.

Video

Text2SQL Result Video

Dataset

tar xvjf data/data.tar.bz2

Korean WikiSQL dataset

unzip data/ko_token.zip
unzip data/ko_token_not_h.zip
unzip data/ko_from_table.zip
unzip data/ko_from_table_not_h.zip

Translation

We translated the English questions into Korean in the following four ways.

Download dataset

| No | Method | Data Name | Description |
|----|--------|-----------|-------------|
| 1 | Where+Select | ko_token | Keep where values in label and column used in select clause among the words in English question |
| 2 | Where | ko_token_not_h | Keep header of table among the words in English question |
| 3 | Table+Header | ko_from_table | Keep values and header in table among the words in English question |
| 4 | Table | ko_from_table_not_h | Keep values in table among the words in English question |
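
One way to keep selected words in English during translation is to replace them with placeholders before translating and restore them afterwards. The sketch below illustrates that idea only; it is not the repository's implementation, `mask_terms` and `unmask_terms` are hypothetical helpers, and the Korean sentence is a hand-written stand-in for translator output. Whether a given translator leaves such placeholders intact is also an assumption that would need checking.

```python
import re


def mask_terms(question, keep_terms):
    """Replace every kept term with a placeholder in a single regex pass."""
    # Longest terms first so a short term never splits a longer one.
    terms = sorted({t for t in keep_terms if t}, key=len, reverse=True)
    if not terms:
        return question, {}
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    mapping = {}

    def substitute(match):
        token = f"[[{len(mapping)}]]"
        mapping[token] = match.group(0)  # remember the original spelling
        return token

    return pattern.sub(substitute, question), mapping


def unmask_terms(translated, mapping):
    """Put the original English terms back into the translated sentence."""
    for token, term in mapping.items():
        translated = translated.replace(token, term)
    return translated


# Where+Select style: keep the where value and the selected column header.
question = "What is Terrence Ross' nationality"
masked, mapping = mask_terms(question, ["Terrence Ross", "Nationality"])
print(masked)

# Stand-in for the translated text; a real run would call a translator here.
translated = "[[0]]의 [[1]]은 무엇입니까?"
restored = unmask_terms(translated, mapping)
print(restored)
```

Changing the list passed as `keep_terms` (where values only, table values plus headers, and so on) would correspond to the four variants in the table.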

Run translation

  1. Create a question dataframe for translating English to Korean.
bash run_translate.sh value
  2. Translate the English questions to Korean using Google Translator and copy the resulting text file, such as 'ko_train_question.txt', into the ko_data directory.

  3. Insert the Korean questions.

bash run_translate.sh token

SPARTA Model

We use a pretrained multilingual BERT as the encoder.

Sub Task

  1. SQLova [ paper | github ]
  2. HydraNet [ paper | github ]

Seq2Seq

  1. BRIDGE (TabularSemanticParsing) [ paper | github ]

Evaluation

  • Logical Form Accuracy
  • Execution Accuracy
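
The two metrics can be sketched as follows. Logical form accuracy counts a prediction as correct when its query structure matches the gold query, while execution accuracy counts it as correct when both queries return the same result. This is a minimal illustration using an in-memory SQLite table with made-up rows, not the evaluation script used in this project; the function names are hypothetical.

```python
import sqlite3


def logical_form_match(pred, gold):
    """Compare slots structurally; condition order and value case are ignored."""
    def key(query):
        conds = frozenset((c, o, str(v).lower()) for c, o, v in query["conds"])
        return (query["sel"], query["agg"], conds)
    return key(pred) == key(gold)


def execution_match(conn, pred_sql, gold_sql):
    """Run both queries and compare their result rows regardless of order."""
    try:
        pred_rows = conn.execute(pred_sql).fetchall()
    except sqlite3.Error:
        return False  # a query that fails to run counts as wrong
    gold_rows = conn.execute(gold_sql).fetchall()
    return sorted(pred_rows, key=repr) == sorted(gold_rows, key=repr)


def accuracy(flags):
    """Percentage of True values in a sequence of per-example results."""
    flags = list(flags)
    return 100.0 * sum(flags) / len(flags) if flags else 0.0


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (player TEXT, nationality TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("Terrence Ross", "United States"), ("Jonas Valanciunas", "Lithuania")],
)

gold = {"sel": 1, "agg": 0, "conds": [[0, 0, "Terrence Ross"]]}
pred = {"sel": 1, "agg": 1, "conds": [[0, 0, "Terrence Ross"]]}  # wrong agg
gold_sql = "SELECT nationality FROM t WHERE player = 'Terrence Ross'"
pred_sql = "SELECT MAX(nationality) FROM t WHERE player = 'Terrence Ross'"

lf = logical_form_match(pred, gold)
ex = execution_match(conn, pred_sql, gold_sql)
print(lf, ex)
```

In this example the predicted aggregation is wrong, so the logical form does not match, yet both queries return the same single row. Cases like this are one reason execution accuracy can be higher than logical form accuracy.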

Experiments

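| Model | Task | Test Logical Form Accuracy (%) | Test Execution Accuracy (%) |
|-------|------|--------------------------------|-----------------------------|
| SQLova | Subtask | 65.8 | 74.3 |
| HydraNet | Subtask | 40.4 | 40.7 |
| Bridge | Generation | 54.6 | 62.1 |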

Download Trained Models

| Method | SQLova | Bridge |
|--------|--------|--------|
| Where+Select | Download | - |
| Where | Download | - |
| Table+Header | Download | - |
| Table | Download | - |

Presentation

Proposal

Interim Findings

Final

Reference

  • [1] Zhong, V., Xiong, C., & Socher, R. (2017). Seq2SQL: Generating structured queries from natural language using reinforcement learning.
  • [2] Hwang, W., Yim, J., Park, S., & Seo, M. (2019). A comprehensive exploration on WikiSQL with table-aware word contextualization. KR2ML Workshop at NeurIPS 2019.
  • [3] Lyu, Q., Chakrabarti, K., Hathi, S., Kundu, S., Zhang, J., & Chen, Z. (2020). Hybrid ranking network for text-to-SQL. arXiv preprint arXiv:2008.04759.
  • [4] Lin, X. V., Socher, R., & Xiong, C. (2020). Bridging textual and tabular data for cross-domain text-to-SQL semantic parsing. Findings of EMNLP 2020.
