dragnet-org / dragnet

Just the facts -- web page content extraction

incompatible sklearn/joblib version?

yeus opened this issue

Hi everyone,

First of all, great project ;). I am trying to get dragnet to run, but loading the pickled models fails. This is probably due to a version conflict in joblib, numpy, or sklearn. At least that's what I assume based on this Stack Overflow question:

https://stackoverflow.com/questions/48948209/keyerror-when-loading-pickled-scikit-learn-model-using-joblib

My own versions of sklearn and joblib and numpy are:

import sklearn
sklearn.__version__
Out: '0.19.1'

from sklearn.externals import joblib
joblib.__version__
Out: '0.14.1'

import numpy
numpy.__version__
Out: '1.17.4'
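
For anyone comparing environments, the same information can be gathered in one go. This is a minimal sketch, not part of dragnet: the helper name `collect_versions` is hypothetical, and it only reads each module's `__version__` attribute, reporting `None` when a module cannot be imported or has no such attribute.

```python
import importlib


def collect_versions(module_names):
    """Map each module name to its __version__, or None if unavailable.

    Hypothetical helper for illustration; it is not part of dragnet.
    """
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Module is not installed (or not importable) in this environment.
            versions[name] = None
            continue
        # Not every module defines __version__, so fall back to None.
        versions[name] = getattr(module, "__version__", None)
    return versions


if __name__ == "__main__":
    for name, version in collect_versions(["sklearn", "joblib", "numpy"]).items():
        print(name, version)
```

Pasting the printed lines into a report makes it easier to compare against the environment in which the model was pickled.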

I think this code section:

https://github.com/dragnet-org/dragnet/blob/master/dragnet/compat.py#L265

takes care of loading the pickle modules that match the installed versions. It doesn't mention joblib, though. On my system (Ubuntu, python3, sklearn installed with pip), sklearn uses the system-wide joblib version. So

import joblib == import sklearn.externals.joblib

I hope you can help me; maybe I can even contribute a little to the project. Which versions of joblib, sklearn, and numpy should I try? Here is the error:

content = extract_content(doc.summary())
Traceback (most recent call last):

  File "<ipython-input-5-882782be121c>", line 1, in <module>
    content = extract_content(doc.summary())

  File "/home/tom/.local/lib/python3.6/site-packages/dragnet/__init__.py", line 12, in extract_content
    'kohlschuetter_readability_weninger_content_model.pkl.gz')

  File "/home/tom/.local/lib/python3.6/site-packages/dragnet/util.py", line 168, in load_pickled_model
    return joblib.load(filepath)

  File "/home/tom/.local/lib/python3.6/site-packages/joblib/numpy_pickle.py", line 605, in load
    obj = _unpickle(fobj, filename, mmap_mode)

  File "/home/tom/.local/lib/python3.6/site-packages/joblib/numpy_pickle.py", line 529, in _unpickle
    obj = unpickler.load()

  File "/usr/lib/python3.6/pickle.py", line 1050, in load
    dispatch[key[0]](self)

KeyError: 0
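
For context on the last frame: the traceback ends in the standard library's `pickle.py`, at a lookup in the opcode dispatch table keyed by the byte just read, so `KeyError: 0` means the unpickler met a byte with value 0 where it expected an opcode. The sketch below reproduces that kind of failure in isolation. It assumes CPython, where the pure-Python unpickler is reachable as the private class `pickle._Unpickler`; the helper name `try_unpickle` is hypothetical, and the snippet says nothing about why the byte stream in the model file is read this way.

```python
import io
import pickle


def try_unpickle(data):
    """Feed raw bytes to the pure-Python unpickler.

    Returns the name of the exception type raised, or None on success.
    Hypothetical helper for illustration only.
    """
    # Assumption: pickle._Unpickler (private, CPython) is the pure-Python
    # unpickler whose load() dispatches on each opcode byte.
    unpickler = pickle._Unpickler(io.BytesIO(data))
    try:
        unpickler.load()
    except (KeyError, pickle.UnpicklingError) as exc:
        return type(exc).__name__
    return None


if __name__ == "__main__":
    # A well-formed pickle loads without error.
    print(try_unpickle(pickle.dumps([1, 2, 3])))
    # A stream starting with byte 0, which is not a pickle opcode.
    print(try_unpickle(b"\x00"))
```

A failure of this shape usually points at the byte stream not being in the layout the unpickler expects, which would be consistent with a joblib or numpy version mismatch, though that remains an assumption here.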