ecotaxa / ecotaxa_front

Front end of the EcoTaxa application

CLASSIFICATION : Saved RF models are too big

jiho opened this issue

From a recent test (zooscan_wp2), it seems that the saved models contain a lot of things (this one is 27GB) and therefore take a very long time to read.

What is required is only:

  • the definition of the RF trees (a couple thousand splits)
  • the definition of the PCA projection space (a covariance matrix whose size is the number of features, so ~60x60)

We should investigate what to discard and what to save.
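
One way to start that investigation might be to pickle each attribute of the saved object separately and compare sizes. The sketch below is only an illustration: the `SavedModel` class and its attributes (`trees`, `pca_matrix`, `training_features`) are hypothetical stand-ins, not the actual EcoTaxa structures, and `pickled_sizes` is a helper made up for this example.

```python
import pickle


def pickled_sizes(obj):
    """Return {attribute name: pickled size in bytes}, largest first."""
    sizes = {
        name: len(pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL))
        for name, value in vars(obj).items()
    }
    return dict(sorted(sizes.items(), key=lambda item: item[1], reverse=True))


class SavedModel:
    """Hypothetical stand-in for whatever object currently gets saved."""

    def __init__(self):
        # What is actually needed: the splits and a ~60x60 matrix.
        self.trees = [(i % 60, 0.5) for i in range(2000)]
        self.pca_matrix = [[0.0] * 60 for _ in range(60)]
        # Assumed culprit: training data kept alongside the model.
        self.training_features = [
            [float(i + j) for j in range(60)] for i in range(5000)
        ]


sizes = pickled_sizes(SavedModel())
for name, n_bytes in sizes.items():
    print(f"{name}: {n_bytes} bytes")
```

Running the same helper on a real saved model should show which attributes dominate the 27GB and are candidates for being discarded before saving.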

There is a cheap solution, which is to use the compress option of the "joblib" library when saving: https://joblib.readthedocs.io/en/latest/generated/joblib.dump.html
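
A minimal sketch of what this could look like, assuming the model is a scikit-learn `RandomForestClassifier` (the synthetic dataset, file names and compression level 3 are arbitrary choices for illustration, not what EcoTaxa uses):

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small synthetic problem with ~60 features, standing in for real data.
X, y = make_classification(
    n_samples=500, n_features=60, n_informative=10, random_state=0
)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    raw_path = os.path.join(tmp, "rf_raw.joblib")
    packed_path = os.path.join(tmp, "rf_packed.joblib")

    joblib.dump(model, raw_path)                 # default: no compression
    joblib.dump(model, packed_path, compress=3)  # zlib, level 3

    raw_size = os.path.getsize(raw_path)
    packed_size = os.path.getsize(packed_path)

    # Loading is unchanged; joblib detects the compression by itself.
    restored = joblib.load(packed_path)
    same_predictions = bool((restored.predict(X) == model.predict(X)).all())

print(f"uncompressed: {raw_size} bytes, compressed: {packed_size} bytes")
print(f"identical predictions after reload: {same_predictions}")
```

Compression shrinks the file but does not remove anything from the saved object, so whatever is unnecessary is still written and read back; higher levels trade smaller files for slower saves and loads.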

Worth a study or soon deprecated?

Linked to a now gone function.