Calling predict() in a loop clogs up memory
molokanov50 opened this issue
I need to determine the grammatical case of terms in the texts of a large dataset. I found that memory usage grows by roughly 0.3 to 0.7 MB on virtually every call of `forms = predictor.predict(terms)`.
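For completeness, `predictor` here is a single RNNMorphPredictor instance created once and reused for every call, along these lines (the `language="ru"` choice is just illustrative for my data):

```python
from rnnmorph.predictor import RNNMorphPredictor

# One predictor instance, created once and reused for every call below
# (language="ru" is illustrative; my texts are Russian).
predictor = RNNMorphPredictor(language="ru")
```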
Consider a simple example:
```python
import re

def findCase(termNumber, text):  # find the grammatical case of the term with the given index in the text
    terms = text.split()
    forms = predictor.predict(terms)
    myTag = forms[termNumber].tag
    parts = re.split('\\|', myTag)
    for part in parts:
        subparts = re.split('=', part)
        if len(subparts) < 2:
            continue
        if subparts[0] == 'Case':
            return subparts[1].upper()
    return 'UNDEF'
```
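For reference, `forms[termNumber].tag` is a string of `Key=Value` pairs separated by `|`, so `findCase` simply scans for the `Case` key. A hypothetical tag and the corresponding result:

```python
# Hypothetical tag string in the format the parsing above expects:
tag = "Case=Nom|Gender=Fem|Number=Sing"
# findCase would split it on '|' and '=' and return 'NOM' for this tag;
# if no 'Case=...' pair is present (e.g. for punctuation), it returns 'UNDEF'.
```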
Then, given a collection of texts, I run:
```python
myDict = {}
for i in range(len(texts)):
    case = findCase(0, texts[i])
    myDict[i] = case
```
I have 12,500 texts averaging about 700 characters each. Processing the whole dataset required an extra 1.5 GB of memory because of `predictor.predict(terms)`.
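A minimal sketch of how such growth can be observed across the loop, assuming psutil is installed (this measurement code is my own bookkeeping, not part of rnnmorph):

```python
import os
import psutil

process = psutil.Process(os.getpid())

def rss_mb():
    # Resident set size of the current process, in MB
    return process.memory_info().rss / (1024 * 1024)

before = rss_mb()
for i in range(len(texts)):
    myDict[i] = findCase(0, texts[i])
    if i % 1000 == 0:
        print(f"{i} texts processed, RSS grew by {rss_mb() - before:.1f} MB")
```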
It seems as though my local variable `forms` remains in memory after the function returns, but could it be that your RNNMorphPredictor model is somehow being trained further (accumulating state) in this scenario? How can I free this memory?
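If `forms` were ordinary Python data, simply letting it go out of scope (or forcing a collection) should be enough to reclaim it; something along these lines is the kind of cleanup I would expect to suffice, sketched with the standard gc module:

```python
import gc

for i in range(len(texts)):
    myDict[i] = findCase(0, texts[i])  # forms is local to findCase and out of scope here
    gc.collect()  # explicit collection; should release forms if it were plain Python objects
```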
Update: the length of each individual text makes no obvious difference. I reduced the input texts down to 10 tokens (roughly 80 characters) each, and memory usage stayed the same: 1.5 GB per 12,500 texts. This makes my question even more pressing.
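(By "reduced down to 10 tokens" I mean truncating each text before prediction, e.g.:)

```python
# Truncate each text to its first 10 whitespace-separated tokens before prediction
short_texts = [" ".join(t.split()[:10]) for t in texts]
```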