A few scripts that help with processing the NLmaps corpus.
Original NLmaps queries are written in a bracket form that can be linearised into individual tokens by a pre-order traversal of the query tree. For example, the query
query(north(area(keyval('name','Paris')),nwr(keyval('building','cathedral'))),qtype(latlong))
can be linearised to
query@2 north@2 area@1 keyval@2 name@0 Paris@s nwr@1 keyval@2 building@0 cathedral@s qtype@1 latlong@0
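The following is a minimal Python sketch of the linearisation. It is not the code in linearise.py. It parses the bracketed query and emits a pre-order token list where each node carries its number of children as a suffix (e.g. north@2). The @s rule comes only from the example above and is an assumption: the value (second argument) of keyval is marked @s, while the key and all other leaves get @0. Values containing spaces and empty argument lists are not handled. The helper names parse and linearise are hypothetical.

```python
import re

# Quoted strings, bare identifiers, or one of the delimiters ( ) ,
TOKEN_RE = re.compile(r"'[^']*'|[^(),'\s]+|[(),]")


def parse(query):
    """Parse a bracketed NLmaps query into nested (label, children) tuples."""
    tokens = TOKEN_RE.findall(query)
    pos = 0

    def node():
        nonlocal pos
        label = tokens[pos]
        pos += 1
        children = []
        if pos < len(tokens) and tokens[pos] == "(":
            pos += 1  # consume "("
            while True:
                children.append(node())
                sep = tokens[pos]
                pos += 1  # consume "," or ")"
                if sep == ")":
                    break
        return label, children

    return node()


def linearise(tree, parent=None, index=0):
    """Pre-order traversal emitting label@arity tokens."""
    label, children = tree
    if label.startswith("'"):
        # Assumption inferred from the README example: the value (second
        # argument) of keyval is marked @s; other quoted leaves get @0.
        suffix = "s" if parent == "keyval" and index == 1 else "0"
        return [f"{label[1:-1]}@{suffix}"]
    tokens = [f"{label}@{len(children)}"]
    for i, child in enumerate(children):
        tokens.extend(linearise(child, label, i))
    return tokens


if __name__ == "__main__":
    q = "query(north(area(keyval('name','Paris')),nwr(keyval('building','cathedral'))),qtype(latlong))"
    print(" ".join(linearise(parse(q))))
    # prints: query@2 north@2 area@1 keyval@2 name@0 Paris@s nwr@1 keyval@2 building@0 cathedral@s qtype@1 latlong@0
```

Because the arity is stored on every token, the flat sequence still determines the tree uniquely. This is what makes the reverse conversion possible.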
We provide two scripts that handle the conversion, one for each direction. The linearisation approach was originally presented by Andreas et al. (2013), so some code in this repository closely resembles code from their smt-semparse repository, modified for the NLmaps corpus.
To linearise a file of NLmaps queries, use:
python linearise.py -i input_file -o output_file
and to reverse the linearisation, use:
python functionalise.py -i input_file -o output_file
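Reversing the linearisation only needs the arity suffixes. Reading tokens from left to right, a node with suffix @n consumes the next n subtrees. The sketch below is a hypothetical illustration, not functionalise.py. It assumes, as in the example above, that quoted strings occur only as arguments of keyval. Under that assumption, @s tokens and @0 tokens directly under keyval are re-quoted, and other @0 tokens stay bare. Other constructs in the corpus may need additional rules.

```python
def functionalise(linear):
    """Rebuild a bracketed NLmaps query from label@arity tokens."""
    tokens = linear.split()
    pos = 0

    def build(parent):
        nonlocal pos
        # Split on the last "@" so the arity suffix is always separated cleanly.
        label, _, suffix = tokens[pos].rpartition("@")
        pos += 1
        # Assumption: quoted strings occur only as keyval arguments.
        if suffix == "s" or parent == "keyval":
            return f"'{label}'"
        arity = int(suffix)
        if arity == 0:
            return label
        # Consume exactly `arity` subtrees, in order.
        args = [build(label) for _ in range(arity)]
        return f"{label}({','.join(args)})"

    return build(None)


if __name__ == "__main__":
    linear = "query@2 north@2 area@1 keyval@2 name@0 Paris@s nwr@1 keyval@2 building@0 cathedral@s qtype@1 latlong@0"
    print(functionalise(linear))
    # prints: query(north(area(keyval('name','Paris')),nwr(keyval('building','cathedral'))),qtype(latlong))
```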
Predicted NLmaps queries can be evaluated either at the query sequence level or at the answer level, i.e. on the answers obtained by executing the queries against an instance of the OpenStreetMap database.
To evaluate at the sequence level, use:
python seq_eval.py -i suggested_queries_file -g gold_queries_file
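At its simplest, sequence-level evaluation is exact-match accuracy over aligned lines. The sketch below assumes one query per line in both files and compares queries as whitespace-stripped strings. The function names are hypothetical, and seq_eval.py may compute a different or additional metric.

```python
def exact_match_accuracy(predicted, gold):
    """Fraction of lines whose predicted query equals the gold query exactly."""
    if len(predicted) != len(gold):
        raise ValueError("prediction and gold files must have the same number of lines")
    if not gold:
        return 0.0
    correct = sum(p.strip() == g.strip() for p, g in zip(predicted, gold))
    return correct / len(gold)


def read_lines(path):
    """Hypothetical helper: read one query per line from a UTF-8 file."""
    with open(path, encoding="utf-8") as f:
        return f.read().splitlines()
```

For example, `exact_match_accuracy(read_lines("suggested_queries_file"), read_lines("gold_queries_file"))` would return the accuracy as a value between 0 and 1.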
and to evaluate at the answer level, use:
python eval.py -i suggested_answers_file -g gold_answers_file
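One common answer-level scheme, assumed here rather than taken from eval.py, treats a query that yields no answer as unanswered. Precision is then measured over answered lines, recall over all lines, and F1 combines the two. The sketch assumes one answer per line, compares answers as exact strings (so answer order matters), and treats blank lines as unanswered. The `unanswered` parameter is a hypothetical hook for other "no answer" markers.

```python
def answer_scores(predicted, gold, unanswered=("",)):
    """Return (precision, recall, f1) over line-aligned answers."""
    if len(predicted) != len(gold):
        raise ValueError("prediction and gold files must have the same number of lines")
    answered = correct = 0
    for pred, ref in zip(predicted, gold):
        pred, ref = pred.strip(), ref.strip()
        if pred in unanswered:
            continue  # no answer: lowers recall but not precision
        answered += 1
        if pred == ref:
            correct += 1
    precision = correct / answered if answered else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Separating precision from recall shows whether a parser is wrong or simply fails to produce an executable query. A single accuracy figure would merge the two.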
Answer-level evaluation requires an installed instance of the OpenStreetMap database as well as overpass-nlmaps.
Answers can then be generated using:
./query_db -d $DB_DIR -a answer_file -f query_file