practical-nlp / practical-nlp-code

Official Repository for Code associated with 'Practical Natural Language Processing' book by O'Reilly Media

Home Page: http://www.practicalnlp.ai/

Update for current versions of spaCy, gensim, etc.?

wendlingd opened this issue

Great book! I read it cover to cover AND tried to run nearly all of the code, which I almost never do.

Could the code in this repo for spaCy and gensim be updated to the current versions? As one example, Ch5/01_KPE.ipynb does not run with the current version of spaCy. I am just learning, but I assume the changes for Ch5/01_KPE.ipynb might look like the following.

Book version:

!pip install textacy==0.9.1
!pip install spacy==2.2.4

import spacy
import textacy.ke
from textacy import *

print(f'Using textacy {textacy.__version__} and spaCy {spacy.__version__}')

# Worked with 2.2.4:
textacy.ke.textrank(doc, topn=10)

# Worked with 2.2.4:
print("Textrank output: ", [kps for kps, weights in textacy.ke.textrank(doc, normalize="lemma", topn=5)])

What appears to run okay as of this writing (December 2021) with spaCy 3.2.0:

# Lines of Ch5/01_KPE.ipynb revised for spaCy 3.2.0:
!pip install textacy==0.11.0   # or 0.12.0 but I haven't tried that
!pip install spacy==3.2.0

import spacy
import textacy
from textacy import extract
from textacy.extract import keyterms as kt

print(f'Using textacy {textacy.__version__}')
print(f'Using spaCy {spacy.__version__}')

# Works with 3.2.0:
kt.textrank(doc, normalize="lemma", topn=10)  # I'm not sure what role normalize plays

# Works with 3.2.0:
print("Textrank output: ", [kps for kps, weights in extract.keyterms.textrank(doc, normalize="lemma", topn=5)])

Would be great if someone smarter than me could update the book's spaCy- and gensim-related code to run current versions for 2022...
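
A minimal sketch of how a single notebook cell might tolerate both textacy layouts mentioned above. It tries `textacy.extract.keyterms` (the path reported to work with textacy 0.11.0) and falls back to `textacy.ke` (the path the book's cell uses with 0.9.1). The helper name `resolve_textrank` is hypothetical and not part of textacy or the book's code; it returns `None` when neither module can be imported.

```python
import importlib


def resolve_textrank():
    """Return textacy's textrank function from whichever module path exists.

    Hypothetical helper: tries the newer module path first, then the older one.
    Returns None if textacy is not installed or neither path exposes textrank.
    """
    for module_name in ("textacy.extract.keyterms", "textacy.ke"):
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Module path absent in this textacy version (or textacy missing).
            continue
        textrank = getattr(module, "textrank", None)
        if callable(textrank):
            return textrank
    return None


textrank = resolve_textrank()
if textrank is None:
    print("textacy is not installed, or no known textrank path was found")
else:
    print(f"Using textrank from {textrank.__module__}")
```

With either version installed, `textrank(doc, normalize="lemma", topn=5)` would then be called as in the cells above, assuming `doc` is the spaCy document built earlier in the notebook.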

Hi @wendlingd. We currently provide a requirements.txt file that pins the versions of the libraries we support. As long as you use those versions, our code should work.

As for updating the libraries and the code, that is in the works. We'll hopefully have a clear timeline on it in the near future.
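
To act on the pinned-versions advice above, one could compare the installed packages against the expected pins before running a notebook. The sketch below is an assumption-laden illustration: the `PINNED` values are the ones quoted from the notebook cell earlier in this thread (spaCy 2.2.4, textacy 0.9.1), not the contents of the repository's requirements.txt, and `check_pins` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed pins, copied from the notebook cell quoted in this thread;
# the repository's requirements.txt may list different versions.
PINNED = {"spacy": "2.2.4", "textacy": "0.9.1"}


def check_pins(pins):
    """Map each package name to (wanted, installed, matches)."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        report[name] = (wanted, installed, installed == wanted)
    return report


for name, (wanted, installed, ok) in check_pins(PINNED).items():
    status = "OK" if ok else "MISMATCH"
    print(f"{name}: wanted {wanted}, installed {installed} [{status}]")
```

A mismatch here would suggest either installing the pinned versions or adapting the code as described in the original post.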