[Bug] Importing covsirphy is slow because sklearn is imported on "import covsirphy"
lisphilar opened this issue
Checkbox
- I agree to follow code of conduct and to engage in discussion actively.
- I agree to create a new issue with the other templates if this is not a bug.
- I used the latest version of covsirphy.
Summary
covsirphy.Evaluator uses sklearn.metrics, and importing it takes about 5 seconds. sklearn is used only in Evaluator, so rewriting the metrics code with numpy may reduce the import time.
Reproducible example script
time python -c "import covsirphy"
time python -c "import sklearn"
time python -c "import numpy"
The current outputs
$ time python -c "import covsirphy"
real 0m18.736s
user 0m3.424s
sys 0m3.629s
$ time python -c "import sklearn"
real 0m5.610s
user 0m1.097s
sys 0m1.856s
$ time python -c "import numpy"
real 0m0.854s
user 0m0.214s
sys 0m0.643s
Expected outputs
`time python -c "import covsirphy"` takes about 13 seconds (= 18 - 5) in my environment.
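The same comparison can be sketched from Python, which is handy when the shell `time` builtin is not available. This is a minimal sketch, not part of covsirphy: the helper name `import_seconds` is hypothetical, and the measured value includes interpreter start-up, just as the shell commands above do.

```python
import subprocess
import sys
import time


def import_seconds(module: str) -> float:
    """Return wall-clock seconds for a fresh interpreter to import `module`.

    A fresh process is used so that modules already cached in
    sys.modules of the current process do not hide the real cost.
    """
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
    return time.perf_counter() - start


# Example (raises subprocess.CalledProcessError if the module is missing):
# import_seconds("covsirphy"), import_seconds("sklearn"), import_seconds("numpy")
```

For a per-module breakdown, CPython 3.7+ also accepts `python -X importtime -c "import covsirphy"`, which prints the cumulative import time of each imported module to stderr.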
Environment
- CovsirPhy version: 3.0.0-dev
- Python version: 3.10.7
Package manager (required if installation issue)
poetry
Platform (required if installation issue)
Ubuntu
Additional Context
No response
ME: maximum residual error
MAE: https://pystyle.info/ml-regression-criteria/#outline__2_5
MSE: https://www.askpython.com/python/examples/rmse-root-mean-square-error
MSLE: https://pystyle.info/ml-regression-criteria/#outline__2_3
MAPE: https://www.askpython.com/python/examples/mape-mean-absolute-percentage-error
RMSE: https://www.askpython.com/python/examples/rmse-root-mean-square-error
RMSLE: https://gist.github.com/Tafkas/7642141
R2: https://www.adamsmith.haus/python/answers/how-to-calculate-r-squared-with-numpy-in-python
import numpy as np

# Metrics: {name: (function(x1, x2), whether smaller is better or not)}
_METRICS_DICT = {
    # Maximum residual error
    "ME": (lambda x1, x2: np.max(np.abs(x2 - x1)), True),
    "MAE": (lambda x1, x2: np.mean(np.abs(x2 - x1)), True),
    "MSE": (lambda x1, x2: np.mean(np.square(x2 - x1)), True),
    "MSLE": (lambda x1, x2: np.mean(np.square(np.log1p(x2) - np.log1p(x1))), True),
    # Percentage (0-100 scale); undefined where x1 contains zeros
    "MAPE": (lambda x1, x2: np.mean(np.abs((x2 - x1) / x1)) * 100, True),
    "RMSE": (lambda x1, x2: np.sqrt(np.mean(np.square(x2 - x1))), True),
    "RMSLE": (lambda x1, x2: np.sqrt(np.mean(np.square(np.log1p(x2) - np.log1p(x1)))), True),
    # Squared Pearson correlation; equals 1 - SS_res / SS_tot only in special
    # cases (e.g. a least-squares linear fit), so it can differ from sklearn's r2_score
    "R2": (lambda x1, x2: np.corrcoef(x1, x2)[0, 1]**2, False),
}
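A quick sanity check of these definitions on small arrays may help before replacing sklearn.metrics. The sketch below assumes x1 holds the observed values and x2 the predicted ones (the issue does not say which is which); the function names `r2_pearson` and `r2_determination` are hypothetical. It also illustrates that the squared correlation used for "R2" is not the same quantity as the coefficient of determination 1 - SS_res / SS_tot, which is what sklearn.metrics.r2_score computes.

```python
import numpy as np

x1 = np.array([1.0, 2.0, 4.0])  # assumed: observed values
x2 = np.array([1.0, 3.0, 2.0])  # assumed: predicted values

# Residuals are [0, 1, -2]
me = np.max(np.abs(x2 - x1))                   # largest absolute residual: 2
mae = np.mean(np.abs(x2 - x1))                 # (0 + 1 + 2) / 3
mse = np.mean(np.square(x2 - x1))              # (0 + 1 + 4) / 3
rmse = np.sqrt(mse)
mape = np.mean(np.abs((x2 - x1) / x1)) * 100   # (0 + 0.5 + 0.5) / 3 * 100


def r2_pearson(a, b):
    """Squared Pearson correlation, as in the "R2" entry above."""
    return np.corrcoef(a, b)[0, 1] ** 2


def r2_determination(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    ss_res = np.sum(np.square(y_true - y_pred))
    ss_tot = np.sum(np.square(y_true - np.mean(y_true)))
    return 1 - ss_res / ss_tot


# Predictions that are perfectly correlated but biased (twice the truth)
y_true = np.array([1.0, 2.0, 3.0])
y_pred = 2 * y_true
print(r2_pearson(y_true, y_pred), r2_determination(y_true, y_pred))
```

With the biased predictions, the squared correlation is 1 while the coefficient of determination is negative, so scores from the numpy version are not necessarily comparable with earlier sklearn-based R2 values.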
This had no effect on the import time because sklearn is also used in another class for PCA.
Currently, the make importtime command (#1290) shows
With #1291,
- use numpy instead of sklearn.metrics in the Evaluator class
- import autots and pca inside methods because they have sklearn as a dependency, which makes importing slow
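The second point, importing inside methods, can be sketched as follows. This is an illustration of the general pattern under stated assumptions, not the code from #1291: the class `Analyzer` and its method are hypothetical, and the standard-library `statistics` module stands in for a heavy dependency such as sklearn.

```python
import sys


class Analyzer:
    """Hypothetical class; `statistics` stands in for a heavy dependency."""

    def summarize(self, values):
        # Deferred import: the module is loaded on the first call of this
        # method, not when the enclosing package is imported. Later calls
        # find the module in sys.modules, so the cost is paid only once.
        import statistics
        return statistics.mean(values)


result = Analyzer().summarize([1, 2, 3])
loaded = "statistics" in sys.modules  # the module is cached after the first call
```

The trade-off is that the import cost moves to the first method call, and a missing dependency is reported only when that method runs rather than at package import.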
In my environment at this time,
- before: 27 seconds in total
- after: 11 seconds in total