ICLRandD / Blackstone

A spaCy pipeline and model for NLP on unstructured legal text.

Home Page: https://research.iclr.co.uk

Compatibility with spaCy 2.1.9 & 2.2+

phHartl opened this issue

Hi Blackstone team,
First, I want to thank you for your pre-trained models and your work on automatic legal text analysis. Your custom SentenceSegmenter and NER detection in particular work very well with our dataset of legal texts.
Unfortunately, this package still depends on spaCy 2.1, or more specifically on spaCy 2.1.8. That version has a major memory leak bug (explosion/spaCy#3618), which was fixed in 2.1.9. I have already modified Blackstone's dependency files so that I can install spaCy 2.1.9 instead of the required 2.1.8, and it works flawlessly on my machine. You might consider changing your dependencies accordingly.
However, it would be even better if you could update to a newer version of spaCy (e.g. 2.2+) to benefit from several performance optimizations made by Explosion. There is already a pending pull request (#22) addressing this, but without the training data you used to train the model, we have no way to retrain it ourselves.
It would be greatly appreciated if you could update your model and package to spaCy 2.2. As this might take some time, you could update your package's dependencies to spaCy 2.1.9 in the meantime to avoid the memory leak present in spaCy 2.1.8.
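
To make the version boundary concrete, here is a minimal sketch of a check for whether a given spaCy version string is at or above 2.1.9, the release that fixed the leak according to the report above. The helper names `release_tuple` and `has_leak_fix` are hypothetical and not part of spaCy or Blackstone.

```python
import re
from typing import Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """Turn a version string such as "2.1.9" into a tuple of ints.

    Parsing stops at the first component that does not start with a digit,
    so a suffix like ".dev3" is ignored.
    """
    numbers = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def has_leak_fix(version: str) -> bool:
    """Return True if the version is 2.1.9 or newer (hypothetical helper)."""
    return release_tuple(version) >= (2, 1, 9)
```

With spaCy installed, the installed version could be checked by passing `spacy.__version__` to `has_leak_fix`.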

commented

How did you install spaCy 2.1.9? When I try to install it, I get this error:
WARNING: Ignoring invalid distribution -ip (e:\react\geicochatbot-master\backend\venv\lib\site-packages)
WARNING: Ignoring invalid distribution -ip (e:\react\geicochatbot-master\backend\venv\lib\site-packages)
Collecting spacy==2.1.9
Using cached spacy-2.1.9.tar.gz (30.7 MB)
Installing build dependencies ... error
error: subprocess-exited-with-error

× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> [39 lines of output]
WARNING: Ignoring invalid distribution -ip (e:\react\geicochatbot-master\backend\venv\lib\site-packages)
Collecting setuptools
Using cached setuptools-65.0.0-py3-none-any.whl (1.2 MB)
Collecting wheel<0.33.0,>0.32.0
Using cached wheel-0.32.3-py2.py3-none-any.whl (21 kB)
Collecting Cython
Using cached Cython-0.29.32-py2.py3-none-any.whl (986 kB)
Collecting cymem<2.1.0,>=2.0.2
Using cached cymem-2.0.6-cp310-cp310-win_amd64.whl (36 kB)
Collecting preshed<2.1.0,>=2.0.1
Using cached preshed-2.0.1.tar.gz (113 kB)
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'error'
error: subprocess-exited-with-error

    python setup.py egg_info did not run successfully.
    exit code: 1

    [6 lines of output]
    Traceback (most recent call last):
      File "<string>", line 2, in <module>
      File "<pip-setuptools-caller>", line 34, in <module>
      File "C:\Users\OM\AppData\Local\Temp\pip-install-1axblb8v\preshed_554bc9bb4aa743d6b206c3b1263b5b66\setup.py", line 9, in <module>
        from distutils import ccompiler, msvccompiler
    ImportError: cannot import name 'msvccompiler' from 'distutils' (E:\react\GEICOChatBot-master\backend\venv\lib\site-packages\setuptools\_distutils\__init__.py)
    [end of output]

    note: This error originates from a subprocess, and is likely not a problem with pip.
  error: metadata-generation-failed

  Encountered error while generating package metadata.

  See above for output.

  note: This is an issue with the package mentioned above, not pip.
  hint: See above for details.
  WARNING: Ignoring invalid distribution -ip (e:\react\geicochatbot-master\backend\venv\lib\site-packages)
  WARNING: Ignoring invalid distribution -ip (e:\react\geicochatbot-master\backend\venv\lib\site-packages)
  WARNING: Ignoring invalid distribution -ip (e:\react\geicochatbot-master\backend\venv\lib\site-packages)
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
WARNING: Ignoring invalid distribution -ip (e:\react\geicochatbot-master\backend\venv\lib\site-packages)
WARNING: Ignoring invalid distribution -ip (e:\react\geicochatbot-master\backend\venv\lib\site-packages)
WARNING: Ignoring invalid distribution -ip (e:\react\geicochatbot-master\backend\venv\lib\site-packages)
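
One way to read a log like the one above: spacy-2.1.9 and preshed-2.0.1 are fetched as .tar.gz source archives, while cymem arrives as a cp310 wheel. That suggests pip found no prebuilt wheel of those two packages for this interpreter and tried to compile them, which is where the build fails. The following sketch lists the packages that pip pulled as source archives. It assumes the log uses pip's "Using cached <name>.tar.gz" wording seen here, and `source_builds` is a hypothetical helper name.

```python
import re
from typing import List

# Matches pip lines such as "Using cached preshed-2.0.1.tar.gz (113 kB)".
_CACHED_SDIST = re.compile(r"Using cached (\S+)\.tar\.gz")


def source_builds(pip_output: str) -> List[str]:
    """Return the name-version strings that pip fetched as source archives,
    in order of first appearance and without duplicates."""
    seen: List[str] = []
    for name in _CACHED_SDIST.findall(pip_output):
        if name not in seen:
            seen.append(name)
    return seen
```

Any package this reports is a candidate for the failing build step; wheels (.whl) are skipped because they need no compilation.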

Step by step guide to fix

1. Fork a copy of the Blackstone repository on GitHub.
2. Clone your forked repository onto your local machine using the command git clone <repository-url>, where <repository-url> is the URL of your forked repository on GitHub.
3. Navigate to the root directory of your local Blackstone repository.
4. Update the dependencies for Blackstone in the setup.py file by changing the required version of spaCy to 2.1.9 or higher.
5. Save the changes to the setup.py file.
6. Create a new conda environment using the command conda create --name blackstone python=3.7 in your terminal or Anaconda Prompt. (The log above shows the source build failing under Python 3.10. spaCy 2.1.x predates that Python release, so an older interpreter such as 3.7 should be the safer choice.)
7. Activate the new conda environment using the command conda activate blackstone.
8. Install your local copy of Blackstone into the conda environment using the command pip install -e . while still in the root directory of your local Blackstone repository.
9. Verify that the installation was successful by running import blackstone in a Python script or notebook.
10. Test the package to ensure that it is working correctly.
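
The dependency edit in the guide above could also be scripted. This is a minimal sketch, assuming Blackstone's dependency files pin spaCy with an exact `spacy==<version>` specifier (not confirmed here). `repin_spacy` is a hypothetical helper name, and the function only transforms text, so reading and writing the file is left to the caller.

```python
import re

# Matches an exact pin such as "spacy==2.1.8". The lookbehind keeps names that
# merely end in "spacy" from matching. Packages whose names start with
# "spacy-" are skipped because the "==" must directly follow "spacy".
_EXACT_SPACY_PIN = re.compile(r"(?<![\w-])spacy\s*==\s*[\w.]+")


def repin_spacy(text: str, new_version: str = "2.1.9") -> str:
    """Replace every exact spaCy pin in the given text with new_version."""
    return _EXACT_SPACY_PIN.sub(f"spacy=={new_version}", text)
```

For example, applying it to the contents of setup.py or a requirements file and writing the result back would change only the spaCy pin and leave every other requirement untouched.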
I hope that helps! Let me know if you have any further questions.