TeamHG-Memex / Formasaurus

Formasaurus tells you the type of an HTML form and its fields using machine learning

Max_iter error when running Formasaurus init

l0nedigit opened this issue

I'm seeing this output when running formasaurus init for the first time. I'm not sure whether it has any negative effect on predictions afterwards, but figured I'd bring it up.

On a fresh setup of Formasaurus, running formasaurus init produces the following output:

```
Loading training data...
Loading: 954 files [00:05, 176.17 files/s]

Training form type detector on 1426 example(s)...
/root/ericsn0/n0/src/n0/modules/testvenv/lib/python3.7/site-packages/sklearn/linear_model/_logistic.py:764: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG)
Training field type detector...
Training on 1363 forms
Using precise form types
Extracting features
```
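The warning is raised from scikit-learn's _logistic.py while the form type detector trains, so that detector appears to use LogisticRegression with the lbfgs solver, whose default max_iter is 100 in scikit-learn 0.23. Formasaurus's training code isn't shown here, so what follows is only a sketch on synthetic data (make_classification as a stand-in for the real training set) of one way to judge whether the warning matters. It fits once with a deliberately tiny max_iter, records whether a ConvergenceWarning fired, refits with a generous cap, and measures what fraction of predictions change. The fit_and_capture helper is a hypothetical name used only for this illustration.

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression


def fit_and_capture(X, y, max_iter):
    """Hypothetical helper: fit an lbfgs LogisticRegression and report
    whether scikit-learn raised a ConvergenceWarning during fit."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always", ConvergenceWarning)
        model = LogisticRegression(solver="lbfgs", max_iter=max_iter)
        model.fit(X, y)
    warned = any(issubclass(w.category, ConvergenceWarning) for w in caught)
    return model, warned


# Synthetic stand-in for a training set (not Formasaurus data).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# A deliberately tiny cap forces lbfgs to stop before converging.
low_model, low_warned = fit_and_capture(X, y, max_iter=5)
# A generous cap lets the optimizer finish.
high_model, high_warned = fit_and_capture(X, y, max_iter=1000)

# Fraction of training-set predictions that differ between the two models.
changed = np.mean(low_model.predict(X) != high_model.predict(X))

print("warned with max_iter=5:", low_warned)
print("warned with max_iter=1000:", high_warned)
print(f"predictions changed: {changed:.1%}")
```

If only a small fraction of predictions change, the unconverged model is probably close to the converged one. The numbers from this synthetic run say nothing definite about Formasaurus's own models; the same comparison would have to be repeated on its real training data.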

Python version: 3.7.7

Virtualenv packages:

```
Package          Version
---------------- ---------
certifi          2020.6.20
chardet          3.0.4
docopt           0.6.2
formasaurus      0.9.0
idna             2.10
joblib           0.16.0
lxml             4.5.2
numpy            1.19.1
pip              20.1.1
python-crfsuite  0.9.7
requests         2.24.0
requests-file    1.5.1
scikit-learn     0.23.1
scipy            1.5.2
setuptools       49.2.0
six              1.15.0
sklearn          0.0
sklearn-crfsuite 0.3.6
tabulate         0.8.7
threadpoolctl    2.1.0
tldextract       2.2.2
tqdm             4.48.0
urllib3          1.25.10
w3lib            1.22.0
wheel            0.34.2
```
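The warning text in the log also suggests scaling the data. As a hedged illustration, again on synthetic data and not on Formasaurus's own features, the sketch below gives the columns very different scales (1 to 1e5) and compares a bare LogisticRegression with a pipeline that standardizes the features first, keeping the solver and iteration cap identical. The converges helper is a hypothetical name for this example. On data like this, the unscaled fit typically stops with a ConvergenceWarning at max_iter=100, while the standardized one converges.

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def converges(model, X, y):
    """Hypothetical helper: fit `model` and return True if no
    ConvergenceWarning was raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always", ConvergenceWarning)
        model.fit(X, y)
    return not any(issubclass(w.category, ConvergenceWarning) for w in caught)


# Synthetic data (not Formasaurus features) with column scales from 1 to 1e5.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X = X * np.logspace(0, 5, X.shape[1])

# Same solver and iteration cap; only the preprocessing differs.
raw = LogisticRegression(solver="lbfgs", max_iter=100)
scaled = make_pipeline(
    StandardScaler(), LogisticRegression(solver="lbfgs", max_iter=100)
)

raw_ok = converges(raw, X, y)
scaled_ok = converges(scaled, X, y)
print("unscaled converged:", raw_ok)
print("scaled converged:", scaled_ok)
```

If the real features are a sparse matrix (common for text features, though that is an assumption about Formasaurus here), StandardScaler must be created with with_mean=False, or MaxAbsScaler used instead, because centering would make the matrix dense and StandardScaler refuses to do that on sparse input. Applying either fix inside Formasaurus would mean changing its training code, which this issue doesn't show.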