`poetry run` is very slow
JasonGross opened this issue · comments
No idea why, but, e.g., poetry run python -m gbmi.exp_max_of_n.train --force load
is very slow (7--8 seconds). The profile from cProfile
shows literally nothing taking any significant time, Ctrl+C interruption suggests it's an issue with poetry loading pytorch?
time poetry run python -c 'import cProfile; from gbmi.exp_max_of_n import train'
takes about 5--6 seconds
poetry run python -c 'import cProfile; cProfile.run("from gbmi.exp_max_of_n import train")'
6390808 function calls (6217572 primitive calls) in 9.675 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 :1(<module>)
1 0.000 0.000 0.000 0.000 :1(ClientTimeoutAttributes)
1 0.000 0.000 0.000 0.000 :1(ConnectionKeyAttributes)
1 0.000 0.000 0.000 0.000 :1(ContentDispositionAttributes)
1 0.000 0.000 0.000 0.000 :1(ETagAttributes)
1 0.000 0.000 0.000 0.000 :1(MimeTypeAttributes)
...
Ah, I guess I was sorting the profile wrong.
poetry run python -m cProfile -s cumtime -m gbmi.exp_max_of_n.train --force load
gives profile.log
profile
6396417 function calls (6224556 primitive calls) in 14.744 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
920 0.043 0.000 57.806 0.063 __init__.py:1(<module>)
2 0.000 0.000 15.955 7.977 train.py:1(<module>)
7387/1 0.287 0.000 14.755 14.755 {built-in method builtins.exec}
1 0.000 0.000 14.754 14.754 <string>:1(<module>)
1 0.000 0.000 14.754 14.754 <frozen runpy>:201(run_module)
1 0.000 0.000 14.752 14.752 <frozen runpy>:65(_run_code)
5289/9 0.052 0.000 13.981 1.553 <frozen importlib._bootstrap>:1165(_find_and_load)
5181/9 0.043 0.000 13.981 1.553 <frozen importlib._bootstrap>:1120(_find_and_load_unlocked)
4969/9 0.037 0.000 13.977 1.553 <frozen importlib._bootstrap>:666(_load_unlocked)
4818/9 0.021 0.000 13.977 1.553 <frozen importlib._bootstrap_external>:934(exec_module)
11562/19 0.012 0.000 13.973 0.735 <frozen importlib._bootstrap>:233(_call_with_frames_removed)
1593/79 0.007 0.000 11.257 0.142 {built-in method builtins.__import__}
5614/1381 0.016 0.000 9.609 0.007 <frozen importlib._bootstrap>:1207(_handle_fromlist)
61 0.003 0.000 4.109 0.067 utils.py:1(<module>)
4819 0.072 0.000 3.109 0.001 <frozen importlib._bootstrap_external>:1007(get_code)
3 0.000 0.000 2.655 0.885 model.py:1(<module>)
4969/4938 0.015 0.000 2.544 0.001 <frozen importlib._bootstrap>:566(module_from_spec)
140/104 0.000 0.000 2.514 0.024 __init__.py:108(import_module)
145/104 0.000 0.000 2.514 0.024 <frozen importlib._bootstrap>:1192(_gcd_import)
99244/61787 0.052 0.000 2.417 0.000 {built-in method builtins.hasattr}
129/124 0.001 0.000 2.312 0.019 <frozen importlib._bootstrap_external>:1231(create_module)
129/124 2.299 0.018 2.311 0.019 {built-in method _imp.create_dynamic}
36/11 0.000 0.000 1.976 0.180 import_utils.py:1366(__getattr__)
24/12 0.000 0.000 1.976 0.165 import_utils.py:1380(_get_module)
222282/191425 0.117 0.000 1.913 0.000 {built-in method builtins.getattr}
10836/10575 0.220 0.000 1.899 0.000 {built-in method builtins.__build_class__}
4820 0.029 0.000 1.865 0.000 <frozen importlib._bootstrap_external>:1127(get_data)
1 0.000 0.000 1.725 1.725 HookedTransformer.py:1(<module>)
1 0.000 0.000 1.704 1.704 loading_from_pretrained.py:1(<module>)
1 0.000 0.000 1.700 1.700 modeling_bert.py:1(<module>)
1 0.000 0.000 1.603 1.603 modeling_utils.py:1(<module>)
5 0.000 0.000 1.581 0.316 accelerator.py:1(<module>)
4973 1.580 0.000 1.580 0.000 {method 'read' of '_io.BufferedReader' objects}
1 0.000 0.000 1.549 1.549 checkpointing.py:1(<module>)
2 0.000 0.000 1.519 0.760 batch_size_finder.py:1(<module>)
2 0.000 0.000 1.510 0.755 callback.py:1(<module>)
9 0.001 0.000 1.505 0.167 types.py:1(<module>)
1 0.000 0.000 1.463 1.463 fsdp_utils.py:1(<module>)
1 0.000 0.000 1.374 1.374 state_dict_loader.py:1(<module>)
1 0.000 0.000 1.365 1.365 default_planner.py:1(<module>)
4 0.000 0.000 1.347 0.337 embedding_ops.py:1(<module>)
1 0.000 0.000 1.340 1.340 op_schema.py:1(<module>)
1 0.000 0.000 1.336 1.336 placement_types.py:1(<module>)
1 0.000 0.000 1.334 1.334 arrow_dataset.py:1(<module>)
1 0.000 0.000 1.333 1.333 _functional_collectives.py:1(<module>)
2 0.000 0.000 1.180 0.590 pit.py:1(<module>)
1 0.000 0.000 1.177 1.177 checks.py:1(<module>)
2 0.000 0.000 1.169 0.585 metric.py:1(<module>)
2 0.000 0.000 1.131 0.565 plot.py:1(<module>)
5242/5161 0.087 0.000 1.119 0.000 <frozen importlib._bootstrap>:1054(_find_spec)
4818 0.024 0.000 1.012 0.000 <frozen importlib._bootstrap_external>:727(_compile_bytecode)
4818 0.979 0.000 0.979 0.000 {built-in method marshal.loads}
24 0.001 0.000 0.909 0.038 api.py:1(<module>)
2 0.000 0.000 0.898 0.449 _ops.py:1(<module>)
5234 0.009 0.000 0.846 0.000 <frozen importlib._bootstrap_external>:1496(find_spec)
5235/5234 0.029 0.000 0.837 0.000 <frozen importlib._bootstrap_external>:1464(_get_spec)
1 0.000 0.000 0.765 0.765 _ipython_extension.py:1(<module>)
1 0.000 0.000 0.763 0.763 model.py:240(train_or_load_model)
6829 0.165 0.000 0.758 0.000 <frozen importlib._bootstrap_external>:1604(find_spec)
1 0.000 0.000 0.748 0.748 normalize.py:37(wrapper)
1 0.000 0.000 0.748 0.748 api.py:933(artifact)
3 0.000 0.000 0.742 0.247 retry.py:94(__call__)
3 0.000 0.000 0.742 0.247 client.py:48(execute)
3 0.000 0.000 0.742 0.247 client.py:58(_get_result)
3 0.000 0.000 0.741 0.247 gql_request.py:42(execute)
1 0.049 0.049 0.739 0.739 _meta_registrations.py:1(<module>)
3 0.000 0.000 0.736 0.245 sessions.py:626(post)
3 0.000 0.000 0.736 0.245 sessions.py:502(request)
3 0.000 0.000 0.722 0.241 sessions.py:673(send)
3 0.000 0.000 0.719 0.240 adapters.py:434(send)
3 0.000 0.000 0.715 0.238 connectionpool.py:595(urlopen)
3 0.000 0.000 0.714 0.238 connectionpool.py:380(_make_request)
14 0.001 0.000 0.684 0.049 config.py:1(<module>)
1 0.000 0.000 0.621 0.621 embed.py:1(<module>)
1 0.000 0.000 0.612 0.612 allowed_functions.py:1(<module>)
3 0.000 0.000 0.605 0.202 internal.py:1(<module>)
2 0.000 0.000 0.579 0.289 __init__.py:1820(__getattr__)
20475 0.036 0.000 0.567 0.000 __init__.py:272(_compile)
24304 0.543 0.000 0.543 0.000 {built-in method posix.stat}
5 0.000 0.000 0.533 0.107 __init__.py:342(__init__)
5 0.533 0.107 0.533 0.107 {built-in method _ctypes.dlopen}
1 0.000 0.000 0.533 0.533 __init__.py:165(_load_global_deps)
29196/27393 0.047 0.000 0.519 0.000 typing.py:352(inner)
2 0.000 0.000 0.517 0.258 retry.py:210(wrapped_fn)
2 0.000 0.000 0.517 0.258 api.py:66(execute)
677 0.006 0.000 0.510 0.001 _compiler.py:738(compile)
1 0.000 0.000 0.504 0.504 wandb_init.py:1(<module>)
1 0.000 0.000 0.494 0.494 artifact.py:1(<module>)
2 0.001 0.000 0.492 0.246 interactiveshell.py:1(<module>)
105 0.002 0.000 0.489 0.005 artist.py:159(_update_set_signature_and_docstring)
7435 0.004 0.000 0.488 0.000 __init__.py:225(compile)
104 0.001 0.000 0.478 0.005 artist.py:126(__init_subclass__)
1 0.000 0.000 0.475 0.475 dependency_versions_check.py:1(<module>)
545 0.001 0.000 0.447 0.001 dataclasses.py:1219(wrap)
545 0.034 0.000 0.445 0.001 dataclasses.py:884(_process_class)
3 0.000 0.000 0.442 0.147 _base.py:1(<module>)
2 0.000 0.000 0.406 0.203 backend.py:1(<module>)
1 0.000 0.000 0.393 0.393 api.py:615(_parse_artifact_path)
1 0.000 0.000 0.393 0.393 api.py:491(default_entity)
136 0.006 0.000 0.383 0.003 {method 'readline' of '_io.BufferedReader' objects}
117 0.001 0.000 0.383 0.003 artist.py:1837(kwdoc)
3 0.000 0.000 0.380 0.127 connection.py:435(getresponse)
6 0.001 0.000 0.379 0.063 functional.py:1(<module>)
3 0.000 0.000 0.379 0.126 client.py:1334(getresponse)
3 0.000 0.000 0.378 0.126 client.py:311(begin)
3 0.000 0.000 0.377 0.126 client.py:278(_read_status)
3 0.000 0.000 0.377 0.126 socket.py:692(readinto)
3 0.000 0.000 0.377 0.126 ssl.py:1300(recv_into)
3 0.000 0.000 0.377 0.126 ssl.py:1158(read)
3 0.377 0.126 0.377 0.126 {method 'read' of '_ssl._SSLSocket' objects}
68/64 0.000 0.000 0.375 0.006 __init__.py:365(__getattr__)
5 0.000 0.000 0.372 0.074 decorators.py:1(<module>)
1 0.000 0.000 0.370 0.370 offsetbox.py:1(<module>)
1 0.000 0.000 0.368 0.368 decorators.py:101(inner)
10 0.000 0.000 0.368 0.037 allowed_functions.py:82(remove)
28/27 0.001 0.000 0.368 0.014 allowed_functions.py:63(__call__)
1 0.001 0.001 0.367 0.367 allowed_functions.py:149(_allowed_function_ids)
629/2 0.076 0.000 0.365 0.182 allowed_functions.py:188(_find_torch_objects)