hpcaitech / Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

Home Page: https://hpcaitech.github.io/Open-Sora/

ModuleNotFoundError: No module named 'torch._six'

IMJONEZZ opened this issue

I installed on Ubuntu using the instructions in the README.

Everything installed correctly, but when I attempt to run using this command:
torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/16x256x256.py --ckpt-path OpenSora-v1-HQ-16x256x256.pth --prompt-path ./assets/texts/t2v_samples.txt

I get this traceback:

[04/05/24 13:17:15] INFO     colossalai - colossalai - INFO:
                             /home/user/miniconda3/envs/opensora/lib/python3.10/site-packages/colossalai/initialize.py
                             :67 launch
                    INFO     colossalai - colossalai - INFO: Distributed environment is initialized, world size: 1
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████| 2/2 [00:41<00:00, 20.93s/it]
Traceback (most recent call last):
  File "/home/user/miniconda3/envs/opensora/lib/python3.10/site-packages/opensora/models/layers/blocks.py", line 33, in get_layernorm
    from apex.normalization import FusedLayerNorm
  File "/home/user/miniconda3/envs/opensora/lib/python3.10/site-packages/apex/__init__.py", line 8, in <module>
    from . import amp
  File "/home/user/miniconda3/envs/opensora/lib/python3.10/site-packages/apex/amp/__init__.py", line 1, in <module>
    from .amp import init, half_function, float_function, promote_function,\
  File "/home/user/miniconda3/envs/opensora/lib/python3.10/site-packages/apex/amp/amp.py", line 5, in <module>
    from .frontend import *
  File "/home/user/miniconda3/envs/opensora/lib/python3.10/site-packages/apex/amp/frontend.py", line 2, in <module>
    from ._initialize import _initialize
  File "/home/user/miniconda3/envs/opensora/lib/python3.10/site-packages/apex/amp/_initialize.py", line 2, in <module>
    from torch._six import string_classes
ModuleNotFoundError: No module named 'torch._six'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/Open-Sora/scripts/inference.py", line 112, in <module>
    main()
  File "/home/user/Open-Sora/scripts/inference.py", line 58, in main
    model = build_module(
  File "/home/user/miniconda3/envs/opensora/lib/python3.10/site-packages/opensora/registry.py", line 22, in build_module
    return builder.build(cfg)
  File "/home/user/miniconda3/envs/opensora/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/home/user/miniconda3/envs/opensora/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "/home/user/miniconda3/envs/opensora/lib/python3.10/site-packages/opensora/models/stdit/stdit.py", line 385, in STDiT_XL_2
    model = STDiT(depth=28, hidden_size=1152, patch_size=(1, 2, 2), num_heads=16, **kwargs)
  File "/home/user/miniconda3/envs/opensora/lib/python3.10/site-packages/opensora/models/stdit/stdit.py", line 181, in __init__
    [
  File "/home/user/miniconda3/envs/opensora/lib/python3.10/site-packages/opensora/models/stdit/stdit.py", line 182, in <listcomp>
    STDiTBlock(
  File "/home/user/miniconda3/envs/opensora/lib/python3.10/site-packages/opensora/models/stdit/stdit.py", line 56, in __init__
    self.norm1 = get_layernorm(hidden_size, eps=1e-6, affine=False, use_kernel=enable_layernorm_kernel)
  File "/home/user/miniconda3/envs/opensora/lib/python3.10/site-packages/opensora/models/layers/blocks.py", line 37, in get_layernorm
    raise RuntimeError("FusedLayerNorm not available. Please install apex.")
RuntimeError: FusedLayerNorm not available. Please install apex.
[2024-04-05 13:18:01,454] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 1417) of binary: /home/user/miniconda3/envs/opensora/bin/python
Traceback (most recent call last):
  File "/home/user/miniconda3/envs/opensora/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==2.2.2', 'console_scripts', 'torchrun')())
  File "/home/user/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
  File "/home/user/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/distributed/run.py", line 812, in main
    run(args)
  File "/home/user/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/distributed/run.py", line 803, in run
    elastic_launch(
  File "/home/user/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 135, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/user/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 268, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
scripts/inference.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-04-05_13:18:01
  host      : DESKTOP-G7PO0IO.
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 1417)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

It says that FusedLayerNorm isn't available and to install apex, but apex is correctly installed.
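The "Please install apex" message is raised while handling the original import failure, so it hides the real cause. A direct import attempt shows what actually went wrong. The sketch below is a minimal, hypothetical helper (probe_import is not part of Open-Sora or apex) that tries to import a module by name and reports the underlying exception instead of a generic message.

```python
import importlib


def probe_import(module_name):
    """Try to import module_name and return (ok, detail).

    Hypothetical diagnostic helper, not part of Open-Sora or apex.
    """
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # broad on purpose: report whatever the import raised
        return False, f"{type(exc).__name__}: {exc}"
    return True, "imported successfully"


if __name__ == "__main__":
    # Module names taken from the traceback above.
    for name in ("apex", "apex.normalization", "torch._six"):
        ok, detail = probe_import(name)
        print(f"{name}: {detail}")
```

If apex itself fails with the torch._six error, the package is present but incompatible with the installed PyTorch, which is a different problem from apex being missing.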

I found this issue: microsoft/DeepSpeed#2845

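It looks like torch._six was deprecated and is no longer present in the installed PyTorch (torch==2.2.2 according to the traceback), so the installed apex is importing a module that does not exist anymore.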
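A common pattern for coping with a removed private module is a guarded import with a fallback. The sketch below assumes that the old torch._six.string_classes was the tuple (str, bytes). That value is recalled from older PyTorch sources rather than stated in this thread, so treat it as an assumption. It only illustrates the pattern and is not necessarily the patch that apex itself applies.

```python
try:
    # Available only in older PyTorch releases.
    from torch._six import string_classes
except ImportError:
    # ModuleNotFoundError is a subclass of ImportError, so this also covers
    # the error from the traceback. The fallback value is an assumption
    # about what older PyTorch defined.
    string_classes = (str, bytes)


def is_string_like(value):
    """Return True if value is one of the string types (illustrative helper)."""
    return isinstance(value, string_classes)
```

Editing files inside site-packages by hand is fragile. Installing an apex build that matches the installed PyTorch is usually the cleaner route.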

I then deleted the environment and the repo and reinstalled everything from scratch. Now I get a different error:
ModuleNotFoundError: No module named 'colossalai'

This is puzzling, because colossalai imports without any problem from the Python interpreter:

Python 3.10.14 (main, Mar 21 2024, 16:24:04) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import colossalai
>>> print(colossalai.__version__)
0.3.6

ModuleNotFoundError: No module named 'colossalai'

This could be caused by ambiguity about which Python interpreter colossalai was installed into. Can you please post the output of "which python"?
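To make that check concrete, the following sketch prints which interpreter is running and where, if anywhere, it would load colossalai from. It uses only the standard library. The function name describe_module is made up for illustration.

```python
import importlib.util
import sys


def describe_module(module_name):
    """Return the file a top-level module would load from.

    Returns None if the module cannot be found by this interpreter
    (namespace packages also report None as their origin).
    """
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("colossalai :", describe_module("colossalai"))
```

Running this once with plain python and once with the interpreter that torchrun uses (the traceback above shows it under the environment's bin directory) should reveal whether the two point at different installations.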

This issue is stale because it has been open for 7 days with no activity.