The goal of this package is to create the lexical data that is used in https://github.com/cltl/frame-annotation-tool.
Python 3.6 was used to create this project. It might work with older versions of Python.
The external modules this package needs are listed in requirements.txt. Depending on how you installed Python, you can probably install them with the following command:
pip install -r requirements.txt
Then run install.sh to download the required resources:
bash install.sh
The function create_lexicon_data_annotation_tool copies the data in doc/lexicon_data_for_frame_annotation_tool into the requested output folder:
from LexicalDataD2TAnnotationTool import create_lexicon_data_annotation_tool
create_lexicon_data_annotation_tool(path_readme='LexicalDataD2TAnnotationTool/doc/lexicon_data_for_frame_annotation_tool/README.md',
path_ud_information='LexicalDataD2TAnnotationTool/doc/lexicon_data_for_frame_annotation_tool/part_of_speech_ud_info.json',
path_mapping_ud_pos_to_fn_pos='LexicalDataD2TAnnotationTool/doc/lexicon_data_for_frame_annotation_tool/ud_pos_to_fn_pos.json',
output_folder='LexicalDataD2TAnnotationTool/test/lexicon_data_for_frame_annotation_tool',
verbose=2)
Once the folder has been created, you can add a file containing information about the frames using:
import LexicalDataD2TAnnotationTool
from LexicalDataD2TAnnotationTool import add_frame_to_info
from nltk.corpus import framenet as fn
add_frame_to_info(output_folder='LexicalDataD2TAnnotationTool/test/lexicon_data_for_frame_annotation_tool',
fn_en=fn,
premon=LexicalDataD2TAnnotationTool.premon,
verbose=0)
This will add the file frame_to_info.json to the output folder.
The next step is to add language-specific lexicon information:
import LexicalDataD2TAnnotationTool
from LexicalDataD2TAnnotationTool import add_lu_to_info
from nltk.corpus import framenet as fn
out_dir = 'LexicalDataD2TAnnotationTool/test/lexicon_data_for_frame_annotation_tool'
add_lu_to_info(your_fn=fn,
language='en',
premon=LexicalDataD2TAnnotationTool.premon,
namespace='http://rdf.cltl.nl/',
major_version=1,
minor_version=7,
output_folder=out_dir,
verbose=2)
The function creates one file in the output folder, at lexicons/LANGUAGE/lu_to_info.json, with the following structure:
- lu_url (created based on the provided namespace)
    - attributes of the lexical unit
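Because the file is keyed by LU URL, looking up the attributes of a lexical unit is a single dictionary access once the file is loaded. A minimal sketch, assuming the path layout described above (output folder, then lexicons, then the language code); the helper names lu_to_info_path and load_lu_to_info are hypothetical and not part of the package:

```python
import json
import os


def lu_to_info_path(output_folder, language):
    """Return the path of lu_to_info.json for one language (layout described above)."""
    return os.path.join(output_folder, 'lexicons', language, 'lu_to_info.json')


def load_lu_to_info(output_folder, language):
    """Load the mapping from LU URL to the attributes of that lexical unit."""
    with open(lu_to_info_path(output_folder, language), encoding='utf-8') as infile:
        return json.load(infile)
```

For example, lu_to_info = load_lu_to_info(out_dir, 'en') followed by a loop over lu_to_info.items() yields each LU URL together with its attributes.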
After running add_lu_to_info, there is a lu_to_info.json file for each language it was run for. You can also generate a mapping from lemma to part of speech to LU URLs by calling the following function:
from LexicalDataD2TAnnotationTool import add_lemma_to_pos_to_lu_urls
out_dir = 'LexicalDataD2TAnnotationTool/test/lexicon_data_for_frame_annotation_tool'
add_lemma_to_pos_to_lu_urls(output_folder=out_dir,
language='nl',
verbose=2)
This will add lemma_to_pos_to_lus.json to the lexicon folder of the language.
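The resulting file nests LU URLs under a lemma and then a part of speech. A minimal sketch of that grouping, assuming each entry in lu_to_info.json stores its lemma and part of speech under the keys 'lemma' and 'pos' (hypothetical key names, as are the LU URLs and POS tags in the example); build_lemma_to_pos_to_lus is illustrative and not the package's implementation:

```python
from collections import defaultdict


def build_lemma_to_pos_to_lus(lu_to_info, lemma_key='lemma', pos_key='pos'):
    """Group LU URLs by lemma and then by part of speech.

    lemma_key and pos_key are hypothetical attribute names; adapt them to
    the attributes actually stored in lu_to_info.json.
    """
    lemma_to_pos_to_lus = defaultdict(lambda: defaultdict(list))
    for lu_url, attributes in lu_to_info.items():
        lemma_to_pos_to_lus[attributes[lemma_key]][attributes[pos_key]].append(lu_url)
    # Convert the nested defaultdicts to plain dicts so the result can be passed to json.dump
    return {lemma: dict(pos_to_lus) for lemma, pos_to_lus in lemma_to_pos_to_lus.items()}


example = {
    'lu-1': {'lemma': 'verkiezing', 'pos': 'NOUN'},
    'lu-2': {'lemma': 'kiezen', 'pos': 'VERB'},
    'lu-3': {'lemma': 'kiezen', 'pos': 'VERB'},
}
print(build_lemma_to_pos_to_lus(example))
# {'verkiezing': {'NOUN': ['lu-1']}, 'kiezen': {'VERB': ['lu-2', 'lu-3']}}
```

A lemma with several lexical units for the same part of speech ends up with a list of LU URLs, which is what makes the lookup useful for ambiguous words.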
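To score how typical each frame is for an event type (here 'Q40231'), first initialize the typicality scores:
from nltk.corpus import framenet as fn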
import LexicalDataD2TAnnotationTool
from LexicalDataD2TAnnotationTool import initialize_typical_frames
initialize_typical_frames(output_folder='LexicalDataD2TAnnotationTool/test/lexicon_data_for_frame_annotation_tool',
fn_en=fn,
premon=LexicalDataD2TAnnotationTool.premon,
event_type='Q40231',
overwrite=False, # if False, will not be initialized if the file exists
verbose=2)
This will create a file at typicality/typicality_scores/Q40231.json containing a mapping from PreMOn frame URIs to typicality scores. All scores are zero at this stage. The next function updates the typicality scores:
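The initialization step amounts to writing every frame URI with a score of zero, and leaving an existing file alone when overwrite=False. A minimal sketch of that behavior, not the package's implementation: initialize_scores is a hypothetical helper, and the frame URIs are passed in directly rather than derived from PreMOn and FrameNet:

```python
import json
import os


def initialize_scores(path, frame_uris, overwrite=False):
    """Write a mapping frame URI -> 0 to path.

    Returns True if the file was written and False if it already existed
    and overwrite is False (mirroring the overwrite flag described above).
    """
    if os.path.exists(path) and not overwrite:
        return False
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    frame_to_score = {frame_uri: 0 for frame_uri in frame_uris}
    with open(path, 'w', encoding='utf-8') as outfile:
        json.dump(frame_to_score, outfile, indent=2)
    return True
```

Calling it a second time on the same path without overwrite=True returns False and keeps any scores that were already filled in.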
import LexicalDataD2TAnnotationTool
from LexicalDataD2TAnnotationTool import update_typical_frames
frame_to_typicality = {
'http://premon.fbk.eu/resource/fn17-change_of_leadership' : 0.8,
}
update_typical_frames(output_folder='LexicalDataD2TAnnotationTool/test/lexicon_data_for_frame_annotation_tool',
premon=LexicalDataD2TAnnotationTool.premon,
event_type='Q40231',
frame_to_typicality=frame_to_typicality,
frame_format='premon_frame_uri',
verbose=2)
This will update the JSON file with the scores provided in frame_to_typicality. The keys of frame_to_typicality can be given in one of two formats, selected with frame_format: fn_label or premon_frame_uri.
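The two key formats differ only in how a frame is named, so an update can normalize the keys to PreMOn URIs before applying the scores. A minimal sketch, assuming PreMOn's FrameNet 1.7 frame URIs follow the lowercased fn17-<label> pattern seen in the example above; fn_label_to_premon_uri and update_scores are hypothetical helpers, and this sketch rejects frames that were not initialized, which the package may handle differently:

```python
PREMON_FN17_PREFIX = 'http://premon.fbk.eu/resource/fn17-'


def fn_label_to_premon_uri(fn_label):
    """Map a FrameNet 1.7 frame label to a PreMOn frame URI (assumed naming pattern)."""
    return PREMON_FN17_PREFIX + fn_label.lower()


def update_scores(frame_to_score, frame_to_typicality, frame_format='premon_frame_uri'):
    """Return a copy of frame_to_score with the provided typicality scores applied."""
    if frame_format not in {'fn_label', 'premon_frame_uri'}:
        raise ValueError(f'unknown frame_format: {frame_format}')
    updated = dict(frame_to_score)
    for frame, score in frame_to_typicality.items():
        uri = fn_label_to_premon_uri(frame) if frame_format == 'fn_label' else frame
        if uri not in updated:
            raise KeyError(f'{uri} is not in the initialized scores')
        updated[uri] = score
    return updated
```

With this sketch, {'Change_of_leadership': 0.8} and frame_format='fn_label' has the same effect as the premon_frame_uri example above.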
It is possible to create a lexical lookup per event type.
import LexicalDataD2TAnnotationTool
from LexicalDataD2TAnnotationTool import create_lexical_lookup_per_eventtype
create_lexical_lookup_per_eventtype(event_type='Q40231',
language='nl',
premon=LexicalDataD2TAnnotationTool.premon,
output_folder='LexicalDataD2TAnnotationTool/test/lexicon_data_for_frame_annotation_tool',
overwrite=True,
verbose=2)
Provided that there exist:
- typicality scores for the event type 'Q40231' in the specified language
- lexical information about the language
This function will generate a JSON file at typicality/lexical_lookup/LANGUAGE/EVENT_TYPE.json with the following format:
'ordered_frames': a list of lists
    [
        [PreMOn URI, dropdown label, typicality_score],
        ...
    ]
'lexical_lookup':
    LEMMA
        POS
            [typicality_score, dropdown_label, frame_uri, lu_uri, lu_name, lexicon_url]
'all_frames':
    [PreMOn frame URI, PreMOn URI, ...]
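The ordered_frames list can be read as the frames of the event type ranked by typicality. A minimal sketch of producing such a list, assuming frames are sorted by descending typicality score (the ordering is not stated above) and that a dropdown label is already available for each frame; build_ordered_frames, the second frame URI, and the labels are hypothetical:

```python
def build_ordered_frames(frame_to_typicality, frame_to_label):
    """Return [[frame URI, dropdown label, typicality score], ...], highest score first."""
    rows = [
        [frame_uri, frame_to_label[frame_uri], score]
        for frame_uri, score in frame_to_typicality.items()
    ]
    # Highest typicality first; ties are broken alphabetically by dropdown label
    rows.sort(key=lambda row: (-row[2], row[1]))
    return rows


LEADERSHIP = 'http://premon.fbk.eu/resource/fn17-change_of_leadership'
BUY = 'http://premon.fbk.eu/resource/fn17-commerce_buy'
ordered = build_ordered_frames(
    {BUY: 0, LEADERSHIP: 0.8},
    {BUY: 'Commerce_buy', LEADERSHIP: 'Change_of_leadership'},
)
print(ordered)
```

Breaking ties by label keeps the output deterministic, which matters because most frames still have the initial score of zero.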
- Marten Postma (m.c.postma@vu.nl)
This project is licensed under the Apache 2.0 License - see the LICENSE.md file for details