When using multiprocessing pool.map, word._.negex raises "[E046] Can't retrieve unregistered extension attribute 'negex'"
K-7 opened this issue
Describe the bug
I am processing medical texts written by nurses and doctors, using the spaCy English() model and Negex to find the appropriate negations. The code works fine when I run it in a single process, but when I use multiprocessing to process texts simultaneously it raises the exception below:
```
  File "../code/process_notes.py", line 154, in multiprocessing_finding_negation
    pool_results = pool.map(self.process, split_dfs)
  File "/usr/local/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/local/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
AttributeError: ("[E046] Can't retrieve unregistered extension attribute 'negex'. Did you forget to call the set_extension method?", 'occurred at index 2')
```
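For context, E046 is what spaCy raises when code reads a custom attribute (here `ent._.negex`, which Negex registers via `Span.set_extension`) in a process where `set_extension` was never called. A stdlib-only stand-in for that mechanism (this `Underscore` class is a toy illustration, not spaCy's actual implementation):

```python
class Underscore:
    """Toy stand-in for spaCy's Span._ custom-attribute mechanism."""
    _extensions = {}  # stands in for spaCy's global extension registry

    @classmethod
    def set_extension(cls, name, default=None):
        cls._extensions[name] = default

    def __getattr__(self, name):
        # mirrors the message spaCy raises for unregistered extensions
        if name not in self._extensions:
            raise AttributeError(
                f"[E046] Can't retrieve unregistered extension attribute "
                f"{name!r}. Did you forget to call the set_extension method?")
        return self._extensions[name]

u = Underscore()
try:
    u.negex  # nothing ever called set_extension('negex', ...) in this process
except AttributeError as e:
    msg = str(e)
```

If `set_extension('negex', ...)` never runs in the process doing the lookup, the attribute access fails exactly like this, no matter what happened in the parent process.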
To Reproduce
```python
def load_spacy_model(self):
    nlp = English()
    nlp.add_pipe(nlp.create_pipe('sentencizer'))
    ruler = EntityRuler(nlp)
    # adding labels and patterns to the entity ruler
    ruler.add_patterns(self.progressnotes_constant_objects.return_pattern_label_list(self.client))
    # adding the entity ruler to the spaCy pipeline
    nlp.add_pipe(ruler)
    preceding_negations = en_clinical['preceding_negations']
    following_negations = en_clinical['following_negations']
    # adding custom preceding negations to the default preceding negations
    preceding_negations += self.progressnotes_constant_objects.return_custom_preceding_negations()
    # adding custom following negations to the default following negations
    following_negations += self.progressnotes_constant_objects.return_custom_following_negations()
    # negation words used to check whether a noun chunk is negated
    negation_words = self.progressnotes_constant_objects.return_negation_words()
    negex = Negex(nlp, language='en_clinical', chunk_prefix=negation_words,
                  preceding_negations=preceding_negations,
                  following_negations=following_negations)
    # adding negex to the spaCy pipeline
    # input ----->|entityruler|----->|negex|-----> entities
    nlp.add_pipe(negex, last=True)
    self.nlp = nlp

def process(self, split_df):
    # this function runs inside the multiprocessing pool
    return split_df.apply(self.lambda_func, axis=1)

def lambda_func(self, row):
    """
    Runs inside a multiprocessing pool worker.
    Reads a single row of the dataframe,
    applies basic cleanup using a replace dict,
    and finds positive words, their respective start-end indices, and negative words.
    Positive words are the words mentioned in the keyword patterns.
    """
    row['clean_note'] = row['notetext']
    # passing the text through the NLP pipeline
    doc = self.nlp(row['clean_note'])
    neg_list = list()
    pos_list, pos_index_list = list(), list()
    for word in doc.ents:
        # segregating positive and negative words
        if not word._.negex:
            # populating the positive and positive-index lists
            pos_list.append(word.text)
            pos_index_list.append((word.start_char, word.end_char))
        else:
            neg_list.append(word.text)

p = os.cpu_count() - 1
pool = mp.Pool(processes=p)
split_dfs = np.array_split(notes_df, 25)  # notes_df is a pandas dataframe
pool_results = pool.map(self.process, split_dfs)
pool.close()
pool.join()
```
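One detail worth noting about the snippet above: `pool.map(self.process, ...)` pickles the bound method, and with it `self` (including the pipeline), into each worker. Unpickling rebuilds an object's state but does not re-run its constructor, so any registration done as a side effect of construction (the way Negex calls `Span.set_extension` when it is created) never happens in the worker. A toy, spaCy-free illustration of that pickling behavior (all names here are hypothetical):

```python
import pickle

class Pipeline:
    registry = {}  # class-level registry, like spaCy's global extension table

    def __init__(self):
        # registration is a side effect of construction, like Negex
        # calling Span.set_extension(...) when it is instantiated
        type(self).registry['negex'] = True
        self.name = 'en_clinical'

p = Pipeline()
blob = pickle.dumps(p)      # roughly what pool.map does to `self`
Pipeline.registry.clear()   # simulates a fresh worker's memory space
q = pickle.loads(blob)      # instance state arrives, but __init__ is not re-run
missing = 'negex' not in Pipeline.registry  # the registration did not travel
```

The unpickled object looks intact (`q.name` survives), yet the class-level registration made in the parent is absent in the "worker".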
Expected behavior
pos_list and neg_list should be populated without errors.
Desktop (please complete the following information):
- OS: macOS Catalina, 8 GB RAM, 1.6 GHz dual-core
Hey - I haven't run it using pool multiprocessing myself, have you come across this post? https://stackoverflow.com/questions/58294624/multiprocessing-with-textacy-or-spacy/58317741#58317741
My guess is that the negex extension isn't properly showing up in the different memory spaces.
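One common way around this is to build the pipeline inside each worker, via the `initializer` argument of `multiprocessing.Pool`, instead of shipping it from the parent, so the extension gets registered in every worker's own memory space. A stdlib-only sketch of the pattern (`REGISTRY` stands in for spaCy's extension registry; in the real code the initializer would call something like your `load_spacy_model`):

```python
import multiprocessing as mp

REGISTRY = {}  # stands in for spaCy's per-process extension registry

def init_worker():
    # runs once in every worker process, so the "extension" is
    # registered in that worker's own memory space; with spaCy this
    # is where you would build nlp and let Negex call set_extension
    REGISTRY['negex'] = True

def check(_):
    # stands in for reading word._.negex inside lambda_func
    return 'negex' in REGISTRY

def run():
    # 'fork' keeps this sketch self-contained when run as a script;
    # the same initializer pattern applies under 'spawn' (the macOS
    # default on Python 3.8+, which is why the parent's registrations
    # are invisible to the workers in the first place)
    with mp.get_context('fork').Pool(2, initializer=init_worker) as pool:
        return pool.map(check, range(4))

if __name__ == '__main__':
    print(run())
```

An alternative is to call `Span.set_extension('negex', default=False)` at module import time, so it runs in every process that imports the module, whatever the start method.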
Closing due to inactivity
http://sujitpal.blogspot.com/2020/10/entities-from-cord-19-using-dask.html
may be of interest to you