Eliminate non-pure Python dependencies without wheels
jamescurtin opened this issue · comments
Currently, `extra-model` depends on `pycld2==0.31`, `cytoolz==0.9.0`, and `spacy==2.0.18`. These packages either directly or indirectly use C extensions that are not shipped as wheels. As a result, `gcc` is a requirement of `extra-model` so that these dependencies can be built from source.
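One way to confirm which pinned releases lack wheels is to look at the files PyPI lists for each release. The sketch below assumes the PyPI JSON API (`https://pypi.org/pypi/<name>/<version>/json`), where each entry under `urls` has a `packagetype` of `bdist_wheel` or `sdist`. The helper names `has_wheel`, `fetch_release`, and `report` are made up for illustration.

```python
import json
from urllib.request import urlopen


def has_wheel(release):
    """Return True if the release metadata lists at least one wheel file."""
    return any(f.get("packagetype") == "bdist_wheel" for f in release.get("urls", []))


def fetch_release(name, version):
    """Download release metadata from PyPI (needs network access)."""
    with urlopen(f"https://pypi.org/pypi/{name}/{version}/json") as resp:
        return json.load(resp)


def report(pins):
    """Print, for each (name, version) pin, whether any wheel is published."""
    for name, version in pins:
        status = "wheel published" if has_wheel(fetch_release(name, version)) else "sdist only"
        print(f"{name}=={version}: {status}")
```

Calling `report([("pycld2", "0.31"), ("cytoolz", "0.9.0"), ("spacy", "2.0.18")])` would check the three pins above. Note that this only looks at direct dependencies, and a published wheel may still not match a given platform or Python version.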
Ideally, we would eliminate any dependency on `gcc`. Images deployed to production would then (1) be smaller, (2) build faster, and (3) be more secure. Additionally, users of `extra-model` would be less likely to encounter installation errors caused by missing C libraries.
It should be possible to eliminate the dependency on `gcc` with the following changes:
- `cytoolz`: Neither `cytoolz` nor `toolz` is used in the codebase (perhaps an old dependency that was never cleaned up?). We can remove this package from the requirements file.
- `pycld2`: This project hasn't been updated since 2019. If upgrading to use `cld3` would be acceptable (difference between `cld2` and `cld3`), we could use `pycld3` as a drop-in replacement. `pycld3` provides wheels for compatibility and is actively maintained.
- `spacy`: Newer releases of `spacy` eliminate the offending dependencies. There is already a PR (#54) that updates `spacy` to a compatible version.
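The claim that neither `cytoolz` nor `toolz` is imported can be double-checked with a short script. This is a minimal sketch that parses each file with the standard-library `ast` module. The function names are hypothetical, and it only sees static `import` statements, not dynamic imports.

```python
import ast
from pathlib import Path


def imported_modules(source):
    """Return the top-level module names imported by a piece of Python source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports such as "from . import x"
            found.add(node.module.split(".")[0])
    return found


def files_importing(root, targets):
    """Map each .py file under root to the target modules it imports."""
    hits = {}
    for path in Path(root).rglob("*.py"):
        used = imported_modules(path.read_text(encoding="utf-8")) & set(targets)
        if used:
            hits[str(path)] = used
    return hits
```

Running `files_importing(".", {"cytoolz", "toolz"})` from the repository root (the path is an assumption) should return an empty dict if the package really is unused.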
Once these changes are made, we can start using the `slim-buster` Docker image instead of `buster`. The slim version is substantially smaller (112MB vs. 875MB) and doesn't contain `gcc`, so it more closely replicates a desirable production environment.
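A quick sanity check, run inside the built image, can confirm that no C compiler is present. This is only a sketch: the function name and the default list of compiler names are assumptions, and the lookup function is injectable so the check can be exercised without a real container.

```python
import shutil


def compilers_on_path(names=("gcc", "cc", "g++"), which=shutil.which):
    """Return the compiler executables from `names` that are found on PATH."""
    return [name for name in names if which(name) is not None]
```

An empty list from `compilers_on_path()` inside the slim image would indicate that every dependency was installed without compiling from source. Installing with pip's `--only-binary=:all:` option is a complementary check, since pip then refuses to fall back to source distributions.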
I'll put up a draft PR to demonstrate. Once #54 is merged, I will update the PR to use the slim image.