GEM-benchmark / NL-Augmenter

NL-Augmenter 🦎 → 🐍 A Collaborative Repository of Natural Language Transformations



Standardize loading of different spaCy models

AbinayaM02 opened this issue.

Some of the transformations and filters use different spaCy models (en, es, zh, de), and the way these models are loaded needs to be standardized. The function initialize_models in initialize.py needs to be rewritten to accept a language parameter, and the following transformations and filters should be updated to use it.
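One way the rewritten initialize_models could look is sketched below. It is a hypothetical sketch, not the repository's code: the SPACY_MODEL_NAMES mapping, the per-language cache and the injectable loader argument are assumptions, and the small spaCy pipelines (en_core_web_sm, es_core_news_sm, zh_core_web_sm, de_core_news_sm) are only one possible choice per language. Each pipeline is loaded at most once and then shared by every transformation or filter that asks for the same language.

```python
from typing import Callable, Dict, Optional

# Hypothetical mapping from language code to spaCy pipeline package.
# These are the small official spaCy pipelines; larger ones could be swapped in.
SPACY_MODEL_NAMES: Dict[str, str] = {
    "en": "en_core_web_sm",
    "es": "es_core_news_sm",
    "zh": "zh_core_web_sm",
    "de": "de_core_news_sm",
}

# Cache of already-loaded pipelines, keyed by language code.
_spacy_models: Dict[str, object] = {}


def initialize_models(
    lang: str = "en", loader: Optional[Callable[[str], object]] = None
):
    """Load (at most once) and return the spaCy pipeline for `lang`.

    `loader` defaults to spacy.load; it is injectable (an assumption of this
    sketch) so the caching logic can be tested without the models installed.
    """
    if lang not in SPACY_MODEL_NAMES:
        raise ValueError(
            f"Unsupported language {lang!r}; expected one of {sorted(SPACY_MODEL_NAMES)}"
        )
    if lang not in _spacy_models:
        if loader is None:
            # Imported lazily so modules that never load a model do not need spaCy.
            import spacy

            loader = spacy.load
        _spacy_models[lang] = loader(SPACY_MODEL_NAMES[lang])
    return _spacy_models[lang]
```

A transformation such as spanish_gender_swap would then call initialize_models("es") instead of invoking spacy.load itself; the loader argument is only needed in tests where the spaCy packages are not installed.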

Once the changes are done, test each module individually with pytest using the command below:

pytest -s --t=<module_name>
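The --t flag is not built into pytest; it has to be registered by the repository, typically in a conftest.py. The sketch below shows how such an option could be registered with pytest's pytest_addoption hook and read back through config.getoption. The help text and the selected_module helper are hypothetical, and the repository's actual conftest.py may differ.

```python
# conftest.py (sketch; the repository's real conftest.py may differ)


def pytest_addoption(parser):
    """Register a `--t` option naming the transformation module to test."""
    parser.addoption(
        "--t",
        action="store",
        default=None,
        help="name of the transformation module to test, e.g. ocr_perturbation",
    )


def selected_module(config):
    """Hypothetical helper: return the module passed via --t, or None to test everything."""
    # pytest stores "--t" under the destination name "t".
    return config.getoption("t")
```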

Transformations:

  • grapheme_to_phoneme_transformation
  • city_names_transformation
  • synonym_substitution
  • ocr_perturbation
  • change_person_named_entities
  • antonyms_substitute
  • emojify
  • sentence_reordering
  • transformer_fill
  • auxiliary_negation_removal
  • correct_common_misspellings
  • word_noise
  • yes_no_question
  • subject_object_switch
  • dyslexia_words_swap
  • close_homophones_swap
  • gender_neutral_rewrite
  • tense
  • adjectives_antonyms_switch
  • abbreviation_transformation
  • hashtagify
  • token_replacement
  • mr_value_replacement
  • urban_dict_swap
  • syntactically_diverse_paraphrase
  • yoda_transform
  • disability_transformation
  • replace_numerical_values
  • unit_converter
  • suspecting_paraphraser
  • change_date_format
  • negate_strengthen
  • gender_culture_diverse_name
  • lexical_counterfactual_generator
  • change_two_way_ne
  • gender_culture_diverse_name_two_way
  • replace_abbreviation_and_acronyms
  • replace_financial_amounts
  • slangificator
  • summarization_transformation
  • pinyin
  • gender_neopronouns
  • spanish_gender_swap
  • add_hashtags

Filters:

  • question_filter
  • length
  • polarity
  • yesno_question
  • keywords
  • soundex
  • numeric
  • code_mixing
  • speech_tag
  • quantitative_ques
  • group_inequity
  • token_amount