GEM-benchmark / NL-Augmenter

NL-Augmenter 🦎 → 🐍 A Collaborative Repository of Natural Language Transformations

Spacy behaves differently when testing one case vs testing all cases

xiaohk opened this issue

It seems Spacy's tokenizer behaves differently when I run pytest -s --t=emojify and pytest -s --t=light --f=light.

For example, I added the following snippet in my generate() function:

print([str(t) for t in self.nlp(sentence)])

With input sentence "Apple is looking at buying U.K. startup for $132 billion."

pytest -s --t=emojify gives:

['Apple', 'is', 'looking', 'at', 'buying', 'U.K.', 'startup', 'for', '$', '132', 'billion', '.']

However, pytest -s --t=light --f=light gives:

['Apple', 'is', 'looking', 'at', 'buying', 'U.K.', 'startup', 'for', '$1', '32', 'billion.']

I use the following code to load spacy:

import spacy
from initialize import spacy_nlp
self.nlp = spacy_nlp if spacy_nlp else spacy.load("en_core_web_sm")

It looks very strange. Am I overlooking something?

Is it possible that some other task overwrites the tokenizer of the global spacy_nlp?

For example,

# initialise
self.nlp = spacy_nlp if spacy_nlp else spacy.load("en_core_web_sm")
self.nlp.tokenizer = Tokenizer(
    self.nlp.vocab,
    prefix_search=re.compile('''^\\$[a-zA-Z0-9]''').search
)

But this task (negate...) would have run after my emojify task, based on alphabetical order 🤔. I am not sure whether using a global spacy_nlp is a concern when testing all cases sequentially.
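
To illustrate what I suspect, here is a minimal, self-contained sketch (not the actual repository code) of how replacing the tokenizer on a shared spaCy object leaks into every later use of that object:

import re

import spacy
from spacy.tokenizer import Tokenizer

# One module-level model, analogous to the global spacy_nlp in initialize.py.
shared_nlp = spacy.load("en_core_web_sm")

sentence = "Apple is looking at buying U.K. startup for $132 billion."
print([t.text for t in shared_nlp(sentence)])   # default tokenizer: ..., '$', '132', 'billion', '.'

# One transformation swaps in a bare tokenizer with only a custom prefix rule ...
shared_nlp.tokenizer = Tokenizer(
    shared_nlp.vocab,
    prefix_search=re.compile(r"^\$[a-zA-Z0-9]").search,
)

# ... and every transformation that uses shared_nlp afterwards sees the change:
print([t.text for t in shared_nlp(sentence)])   # ..., '$1', '32', 'billion.'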

Hi @xiaohk: What you're suspecting might be right. Could you try resetting the tokenizer in your code to its Defaults and see if the pytest goes through in both cases?

import spacy
from spacy.tokenizer import Tokenizer
from spacy.util import compile_prefix_regex, compile_suffix_regex, compile_infix_regex

from initialize import spacy_nlp

nlp = spacy_nlp if spacy_nlp else spacy.load('en_core_web_sm')
rules = nlp.Defaults.tokenizer_exceptions
infix_re = compile_infix_regex(nlp.Defaults.infixes)
prefix_re = compile_prefix_regex(nlp.Defaults.prefixes)
suffix_re = compile_suffix_regex(nlp.Defaults.suffixes)

nlp.tokenizer = Tokenizer(
    nlp.vocab,
    rules=rules,
    prefix_search=prefix_re.search,
    suffix_search=suffix_re.search,
    infix_finditer=infix_re.finditer,
)
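
As a quick sanity check (assuming the reset above has been applied to the shared model), the sentence from your example should tokenize with the defaults again:

sentence = "Apple is looking at buying U.K. startup for $132 billion."
print([t.text for t in nlp(sentence)])
# ['Apple', 'is', 'looking', 'at', 'buying', 'U.K.', 'startup', 'for', '$', '132', 'billion', '.']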

Thanks @AbinayaM02! Yeah, resetting the tokenizer to the defaults solves the problem!

However, it only works if I reinitialize the tokenizer in my generate() method; it does not work if I do it in the __init__() of my transformation class. I guess pytest runs in two passes: (1) first it constructs all the transformation objects, (2) then it tests the generate() functions?

I am not sure reinitializing the tokenizer in every generate() call is the best solution, though... It's just a workaround for pytest 😄

Why can't we default the tokenizer in the initialize_models method where spacy is loaded? By default, everyone gets a default tokenizer, and each transformation can have its own tokenizer setting in its object initialization. Would that make sense?

You mean the initialize_models() function in initialize.py?

def initialize_models():
    global spacy_nlp
    # load spacy
    spacy_nlp = spacy.load("en_core_web_sm")

I did not read the test script carefully, but I suspect initialize_models() is only called once when testing all transformations. Some transformation might overwrite the tokenizer of the global spacy model.

Yeah, I meant the method in initialize.py. You're right, the test script calls initialize_models() only once.

How about adding another method, say reset_tokenizer, in initialize.py, and letting each transformation call it before setting its own tokenizer (applicable only to transformations using spacy)? That way the test function can still initialize the model once, and the transformations can reset the tokenizer as needed.

[Edit]
We can add a default_spacy_tokenizer() method in initialize.py and call it in test_main.py after each transformation is executed (inside the for loop). That way, every new run of a transformation in the test script will start with a default tokenizer.
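
Something along these lines (a sketch with hypothetical structure; the exact test loop in test_main.py is only indicated as comments):

# initialize.py (sketch)
import spacy
from spacy.tokenizer import Tokenizer
from spacy.util import compile_infix_regex, compile_prefix_regex, compile_suffix_regex

spacy_nlp = None

def initialize_models():
    global spacy_nlp
    # load spacy
    spacy_nlp = spacy.load("en_core_web_sm")

def default_spacy_tokenizer():
    """Restore the shared model's tokenizer to the en_core_web_sm defaults."""
    if spacy_nlp is None:
        return
    spacy_nlp.tokenizer = Tokenizer(
        spacy_nlp.vocab,
        rules=spacy_nlp.Defaults.tokenizer_exceptions,
        prefix_search=compile_prefix_regex(spacy_nlp.Defaults.prefixes).search,
        suffix_search=compile_suffix_regex(spacy_nlp.Defaults.suffixes).search,
        infix_finditer=compile_infix_regex(spacy_nlp.Defaults.infixes).finditer,
    )

# test_main.py (sketch): reset after each transformation inside the for loop
# for transformation in transformations:
#     run_transformation_tests(transformation)   # hypothetical helper name
#     default_spacy_tokenizer()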


Yes! I think it should work.

To keep track of how it's going: applying the reinitialization of the spacy tokenizer in the generate function of both #149 and #159 makes pytest pass for my transformations. However, both of them now fail when the test run reaches another transformation (close_homophones_swap).

Hi @sotwi: close_homophones_swap is failing because it also uses spacy :( Let me try to find all the transformations that use spacy. We may need to apply the same fix in many places in a clean way. I'll get back in a while with a proper solution.


Hi @sotwi: I have made a fix for the issue you're facing. Please pull the latest code and check if it's resolved.

Hello @AbinayaM02 , both #149 and #159 pass all the checks now. Thank you!

I'm sorry if I'm writing in the wrong place, but for me, SuspectingParaphraser still fails test cases:
https://github.com/GEM-benchmark/NL-Augmenter/pull/203/checks?check_run_id=3632344914


Hi @Erlemar: I actually changed suspecting_paraphraser's test.json in my fix, so it should now pass the test. Please pull the latest code and check whether the issue still persists.

Hello @AbinayaM02, I see that the first test still fails; it wasn't changed:

E           AssertionError: Mis-match in expected and predicted output for SuspectingParaphraser transformation: 
E              Expected Output: Sally finally returned the french book to Chris, didn't she? 
E              Predicted Output: Sally finally returned the french book to Chris, didn't it?
E           assert "Sally finall...s, didn't it?" == "Sally finall..., didn't she?"
E             Skipping 46 identical leading characters in diff, use -v to show
E             - s, didn't she?
E             ?           ^^^
E             + s, didn't it?
E             ?           ^^


I pulled your branch and tried running the test for suspecting_paraphraser only: it passes, but the test run over all light test cases fails for suspecting_paraphraser. When I run both tests in my cloned repository, they succeed! Could you try "fetch & merge" on your main branch [Fetch upstream on the GitHub UI], then pull the main branch into your shuffle_within_segements branch and try the pytest again?

Thank you for the answer! I have updated my branch again (fetch & merge), and the tests work locally.
I can't run the workflow in my PR, though, as this is my first contribution.


I have started the workflow. Let's see if it goes through now.

The checks have passed! Thanks!

Closing the issue since it's fixed.