Regression in runtime
brynpickering opened this issue
Problem description
Since around #487, everything has slowed down considerably. This is likely due to some combination of updated dependencies and an inefficient implementation we've introduced.
Pandas v2.1.1 has a known performance regression that is likely hitting us when we do MultiIndex operations (pandas-dev/pandas#55256), so we should probably pin to <=2.0.3. However, that only explains part of the problem...
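If we do pin, the constraint could look something like this (shown for a hypothetical requirements file; the exact file and lower bound in this repo may differ):

```
pandas <= 2.0.3
```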
5 slowest tests pre #487:
11.82s call calliope/test/test_core_time.py::TestClustering::test_hartigans_rule
9.58s call calliope/test/test_example_models.py::TestNationalScaleClusteredExampleModelSenseChecks::test_nationalscale_clustered_example_closest_results_cbc
7.21s call calliope/test/test_example_models.py::TestModelPreproccessing::test_preprocess_time_clustering
6.74s call calliope/test/test_example_models.py::TestUrbanScaleExampleModelSenseChecks::test_urban_example_results_area[solve]
6.43s setup calliope/test/test_backend_pyomo.py::TestMILPConstraints::test_loc_techs_storage_capacity_max_purchase_milp_constraint
5 slowest tests in #487 with pandas=2.0.3:
20.23s call tests/test_example_models.py::TestNationalScaleClusteredExampleModelSenseChecks::test_nationalscale_clustered_example_closest_results_cbc[run]
15.85s call tests/test_core_time.py::TestClustering::test_hartigans_rule
15.41s setup tests/test_io.py::TestIO::test_save_netcdf
15.34s call tests/test_example_models.py::TestModelPreproccessing::test_preprocess_time_clustering
14.74s call tests/test_example_models.py::TestNationalScaleExampleModelSenseChecks::test_nationalscale_example_results_glpk
5 slowest tests post #487 with pandas=2.0.3:
==================================================================== slowest 5 durations =====================================================================
33.98s call tests/test_cli.py::TestCLI::test_run_from_netcdf
31.62s call tests/test_example_models.py::TestNationalScaleClusteredExampleModelSenseChecks::test_nationalscale_clustered_example_closest_results_cbc[run]
29.08s call tests/test_core_time.py::TestClustering::test_hartigans_rule
22.66s call tests/test_example_models.py::TestNationalScaleClusteredExampleModelSenseChecks::test_nationalscale_clustered_example_closest_results_cbc[solve]
20.40s call tests/test_backend_latex_backend.py::TestMathDocumentation::test_string_return[build_valid-tex-\n\\documentclass{article}]
5 slowest tests post #487 with pandas=2.1:
41.35s call tests/test_example_models.py::TestNationalScaleClusteredExampleModelSenseChecks::test_nationalscale_clustered_example_closest_results_cbc[run]
28.90s call tests/test_example_models.py::TestModelPreproccessing::test_preprocess_time_clustering
23.05s call tests/test_core_time.py::TestClustering::test_hartigans_rule
20.40s call tests/test_example_models.py::TestNationalScaleClusteredExampleModelSenseChecks::test_nationalscale_clustered_example_closest_results_cbc[solve]
19.58s setup tests/test_io.py::TestIO::test_save_netcdf
Calliope version
v0.7.0-dev
OK, a lot of this is explained by coverage being collected by default. Without coverage:
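Coverage is presumably switched on via pytest's `addopts`; a hedged sketch of what that looks like in `pyproject.toml` (the exact file and options in this repo may differ), with the per-run escape hatch noted:

```toml
[tool.pytest.ini_options]
# Removing "--cov" here, or passing --no-cov on the command line
# (provided by pytest-cov), skips coverage collection for timing runs.
addopts = "--cov"
```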
pandas=2.1.0 (all tests: 1min26)
==================================================================== slowest 5 durations =====================================================================
12.61s setup tests/test_io.py::TestIO::test_save_netcdf
10.87s call tests/test_cli.py::TestCLI::test_run_from_netcdf
10.78s call tests/test_example_models.py::TestUrbanScaleExampleModelSenseChecks::test_urban_example_results_area
10.67s call tests/test_cli.py::TestCLI::test_run_save_lp
10.60s call tests/test_example_models.py::TestNationalScaleClusteredExampleModelSenseChecks::test_nationalscale_clustered_example_closest_results_cbc[run]
pandas=2.0.3 (all tests: 1min19)
==================================================================== slowest 5 durations =====================================================================
15.85s call tests/test_core_time.py::TestClustering::test_hartigans_rule
11.32s call tests/test_cli.py::TestCLI::test_run_from_netcdf
11.05s call tests/test_example_models.py::TestNationalScaleClusteredExampleModelSenseChecks::test_nationalscale_clustered_example_closest_results_cbc[run]
9.99s setup tests/test_io.py::TestIO::test_save_netcdf
9.59s call tests/test_example_models.py::TestUrbanScaleExampleModelSenseChecks::test_urban_example_results_cap
pandas=2.1.1 (all tests: 1min27)
==================================================================== slowest 5 durations =====================================================================
11.78s call tests/test_core_time.py::TestClustering::test_hartigans_rule
10.95s call tests/test_example_models.py::TestNationalScaleClusteredExampleModelSenseChecks::test_nationalscale_clustered_example_closest_results_cbc[run]
10.54s call tests/test_example_models.py::TestModelPreproccessing::test_preprocess_time_clustering
10.02s setup tests/test_io.py::TestIO::test_save_netcdf
9.35s call tests/test_example_models.py::TestNationalScaleClusteredExampleModelSenseChecks::test_nationalscale_clustered_example_storage_inter_cluster
Looks like there is a marginal benefit to sticking with pandas=2.0.3 for now.
Still not perfect. Going back from current main in stages (skipping some commits), it looks like something was introduced in 95ce22d:
In current main (a01d4bb):
==================================================================== slowest 5 durations =====================================================================
26.80s call tests/test_core_time.py::TestClustering::test_hartigans_rule
21.20s call tests/test_example_models.py::TestNationalScaleClusteredExampleModelSenseChecks::test_nationalscale_clustered_example_closest_results_cbc[run]
16.23s call tests/test_example_models.py::TestNationalScaleExampleModelSenseChecks::test_considers_supply_generation_only_in_total_levelised_cost
16.04s call tests/test_example_models.py::TestModelPreproccessing::test_preprocess_time_clustering
13.36s call tests/test_cli.py::TestCLI::test_run_from_netcdf
In 95ce22d:
==================================================================== slowest 5 durations =====================================================================
21.50s call tests/test_example_models.py::TestNationalScaleClusteredExampleModelSenseChecks::test_nationalscale_clustered_example_closest_results_cbc[run]
18.41s call tests/test_core_time.py::TestClustering::test_hartigans_rule
15.89s call tests/test_cli.py::TestCLI::test_run_from_netcdf
12.72s call tests/test_example_models.py::TestModelPreproccessing::test_preprocess_time_clustering
11.75s call tests/test_example_models.py::TestNationalScaleExampleModelSenseChecks::test_nationalscale_example_results_cbc
In bf87c30:
==================================================================== slowest 5 durations =====================================================================
11.19s call tests/test_example_models.py::TestNationalScaleResampledExampleModelSenseChecks::test_nationalscale_example_results_cbc
11.10s call tests/test_cli.py::TestCLI::test_run_from_netcdf
11.09s setup tests/test_io.py::TestIO::test_save_netcdf
10.79s call tests/test_core_time.py::TestClustering::test_hartigans_rule
9.63s call tests/test_example_models.py::TestUrbanScaleExampleModelSenseChecks::test_urban_example_results_area
In 42500c7:
13.43s call tests/test_core_time.py::TestClustering::test_hartigans_rule
10.90s call tests/test_cli.py::TestCLI::test_run_from_yaml
10.68s call tests/test_example_models.py::TestNationalScaleResampledExampleModelSenseChecks::test_nationalscale_example_results_cbc
9.85s call tests/test_cli.py::TestCLI::test_run_from_netcdf
9.74s call tests/test_backend_latex_backend.py::TestMathDocumentation::test_string_return[build_all-tex-\n\\documentclass{article}]
In 5591b09:
==================================================================== slowest 5 durations =====================================================================
15.30s call tests/test_core_time.py::TestClustering::test_hartigans_rule
11.87s setup tests/test_io.py::TestIO::test_save_netcdf
10.25s call tests/test_example_models.py::TestNationalScaleClusteredExampleModelSenseChecks::test_nationalscale_clustered_example_closest_results_cbc[run]
10.21s call tests/test_example_models.py::TestModelPreproccessing::test_preprocess_time_clustering
9.67s call tests/test_cli.py::TestCLI::test_run_from_netcdf
Profiling tests/test_example_models.py::TestNationalScaleClusteredExampleModelSenseChecks::test_nationalscale_clustered_example_closest_results_cbc[run]
suggests that pytest's duration reporting is just quite volatile (or that timeseries clustering is; maybe that is the real cause of our pain):
In 95ce22d (should be 21secs according to pytest durations):
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 12.075 12.075 runner.py:111(pytest_runtest_protocol)
17/11 0.000 0.000 12.074 1.098 _hooks.py:479(__call__)
17/11 0.000 0.000 12.074 1.098 _manager.py:106(_hookexec)
17/11 0.000 0.000 12.074 1.098 _callers.py:27(_multicall)
1 0.000 0.000 12.074 12.074 runner.py:119(runtestprotocol)
3 0.000 0.000 12.074 4.025 runner.py:219(call_and_report)
3 0.000 0.000 12.074 4.025 runner.py:247(call_runtest_hook)
3 0.000 0.000 12.074 4.025 runner.py:318(from_call)
3 0.000 0.000 12.074 4.025 runner.py:262(<lambda>)
1 0.000 0.000 12.072 12.072 runner.py:160(pytest_runtest_call)
1 0.000 0.000 12.072 12.072 python.py:1790(runtest)
1 0.000 0.000 12.072 12.072 python.py:187(pytest_pyfunc_call)
1 0.000 0.000 12.072 12.072 test_example_models.py:499(test_nationalscale_clustered_example_closest_results_cbc)
1 0.000 0.000 12.072 12.072 test_example_models.py:477(example_tester_closest)
1 0.000 0.000 12.072 12.072 test_example_models.py:418(model_runner)
1 0.000 0.000 6.806 6.806 model.py:355(build)
1 0.000 0.000 6.464 6.464 model.py:376(_build)
69 0.007 0.000 6.434 0.093 backend_model.py:163(_add_component)
1 0.000 0.000 4.172 4.172 examples.py:31(time_clustering)
1 0.000 0.000 4.172 4.172 model.py:59(__init__)
In bf87c30 (should be <9secs according to pytest durations):
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 15.392 15.392 runner.py:111(pytest_runtest_protocol)
17/11 0.000 0.000 15.392 1.399 _hooks.py:479(__call__)
1 0.000 0.000 15.392 15.392 runner.py:119(runtestprotocol)
17/11 0.000 0.000 15.392 1.399 _manager.py:106(_hookexec)
3 0.000 0.000 15.392 5.131 runner.py:219(call_and_report)
17/11 0.000 0.000 15.392 1.399 _callers.py:27(_multicall)
3 0.000 0.000 15.391 5.130 runner.py:247(call_runtest_hook)
3 0.000 0.000 15.391 5.130 runner.py:318(from_call)
3 0.000 0.000 15.391 5.130 runner.py:262(<lambda>)
1 0.000 0.000 15.390 15.390 runner.py:160(pytest_runtest_call)
1 0.000 0.000 15.390 15.390 python.py:1790(runtest)
1 0.000 0.000 15.390 15.390 python.py:187(pytest_pyfunc_call)
1 0.000 0.000 15.390 15.390 test_example_models.py:499(test_nationalscale_clustered_example_closest_results_cbc)
1 0.000 0.000 15.390 15.390 test_example_models.py:477(example_tester_closest)
1 0.000 0.000 15.390 15.390 test_example_models.py:418(model_runner)
1 0.000 0.000 7.920 7.920 model.py:355(build)
1 0.000 0.000 7.518 7.518 model.py:376(_build)
69 0.006 0.000 7.486 0.108 backend_model.py:163(_add_component)
1 0.000 0.000 6.373 6.373 examples.py:31(time_clustering)
1 0.000 0.000 6.373 6.373 model.py:59(__init__)
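The per-call tables above are cProfile output; a minimal, self-contained sketch of producing one programmatically (the `slow_build` function is a stand-in for an expensive call like `model.build()`, not Calliope code):

```python
import cProfile
import io
import pstats

def slow_build(n):
    """Stand-in for an expensive call such as a model build."""
    return sum(i * i for i in range(n))

profiler = cProfile.Profile()
profiler.enable()
result = slow_build(100_000)
profiler.disable()

# Sort by cumulative time, as in the tables above, and keep the top rows.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumtime").print_stats(5)
print(stream.getvalue())
```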
Fixed the pandas and --cov issues in #499.