Standardize loading of different spacy models
AbinayaM02 opened this issue
Abinaya Mahendiran commented
Some of the transformations/filters use different spaCy models (`en`, `es`, `zh`, `de`). The way they are loaded needs to be standardized. The function `initialize_models` in `initialize.py` needs to be rewritten to accept a language parameter, and the following transformations/filters should be updated accordingly.
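One possible shape for a language-aware loader is sketched below. It is only an illustration, not the project's actual code: the `lang` and `loader` parameters, the `SPACY_MODEL_NAMES` mapping, and the choice of the small spaCy pipelines are all assumptions. The sketch caches each pipeline so that a given language is loaded at most once per process.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical mapping from language code to spaCy pipeline package.
# The small pipelines are assumed here; the project may prefer others.
SPACY_MODEL_NAMES: Dict[str, str] = {
    "en": "en_core_web_sm",
    "es": "es_core_news_sm",
    "zh": "zh_core_web_sm",
    "de": "de_core_news_sm",
}

# Cache of already-loaded pipelines, keyed by language code.
_loaded_models: Dict[str, Any] = {}


def initialize_models(
    lang: str = "en",
    loader: Optional[Callable[[str], Any]] = None,
) -> Any:
    """Return the pipeline for `lang`, loading it only on first use.

    `loader` is an assumed hook (defaults to `spacy.load`) that makes the
    function easy to exercise without downloading any models.
    """
    if lang not in SPACY_MODEL_NAMES:
        raise ValueError(
            f"Unsupported language {lang!r}; "
            f"expected one of {sorted(SPACY_MODEL_NAMES)}"
        )
    if lang not in _loaded_models:
        if loader is None:
            import spacy  # imported lazily so the module imports without spaCy

            loader = spacy.load
        _loaded_models[lang] = loader(SPACY_MODEL_NAMES[lang])
    return _loaded_models[lang]
```

Under these assumptions, a transformation or filter would call `initialize_models("de")` (or whichever language it needs) instead of calling `spacy.load` directly, so the model name lives in one place.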
Once the changes are done, test each module individually with pytest using the command below:
pytest -s --t=<module_name>
Transformations:
- grapheme_to_phoneme_transformation
- city_names_transformation
- synonym_substitution
- ocr_perturbation
- change_person_named_entities
- antonyms_substitute
- emojify
- sentence_reordering
- transformer_fill
- auxiliary_negation_removal
- correct_common_misspellings
- word_noise
- yes_no_question
- subject_object_switch
- dyslexia_words_swap
- close_homophones_swap
- gender_neutral_rewrite
- tense
- adjectives_antonyms_switch
- abbreviation_transformation
- hashtagify
- token_replacement
- mr_value_replacement
- urban_dict_swap
- syntactically_diverse_paraphrase
- yoda_transform
- disability_transformation
- replace_numerical_values
- unit_converter
- suspecting_paraphraser
- change_date_format
- negate_strengthen
- gender_culture_diverse_name
- lexical_counterfactual_generator
- change_two_way_ne
- gender_culture_diverse_name_two_way
- replace_abbreviation_and_acronyms
- replace_financial_amounts
- slangificator
- summarization_transformation
- pinyin
- gender_neopronouns
- spanish_gender_swap
- add_hashtags
Filters:
- question_filter
- length
- polarity
- yesno_question
- keywords
- soundex
- numeric
- code_mixing
- speech_tag
- quantitative_ques
- group_inequity
- token_amount