pytorch/PiPPy
Pipeline Parallelism for PyTorch
Stargazers: 577
Watchers: 35
Issues: 233
Forks: 71
pytorch/PiPPy Issues
TorchVision Installation issue
Updated
a month ago
Comments count
1
Non-0 rank creates CUDA context on GPU 0
Updated
a month ago
Comments count
1
Issue with the unet example on a different model, reduction dim mismatch
Updated
2 months ago
Comments count
1
examples/Inference failed
Updated
2 months ago
Comments count
2
Add lazy shape inference with buffer shape validator
Closed
2 months ago
Support for UNet1D and UNet2D models?
Closed
2 months ago
Comments count
4
PipelineStage/Schedule issues
Updated
2 months ago
Comments count
2
format.sh does not format correctly enough for check.sh
Updated
2 months ago
PipelineStage: Improve error logging and debuggability
Updated
2 months ago
Add device mesh / process group support for PipelineStage
Updated
2 months ago
Add support for loss function, update the PipelineStage output
Updated
2 months ago
Fix backward implementation and remove setting grad in forward()
Updated
2 months ago
Shape prop error when kwargs have constants
Closed
3 months ago
Re-support kwargs at run time
Closed
3 months ago
Issue with optimizer instantiation
Updated
4 months ago
Comments count
2
Check if remap_qualname still works after refactorization
Closed
4 months ago
Comments count
1
Check if stage-wise checkpoint loading still works after refactorization
Updated
4 months ago
Check if meta device tracing still works after refactorization
Updated
4 months ago
ResNet example always underfitting when pippy training
Updated
4 months ago
Comments count
5
PyTorch renaming submod indices leading to assert break
Updated
4 months ago
Pipeline Schedule confused
Updated
4 months ago
Comments count
1
Decouple graph interpretation from pipeline executor
Updated
5 months ago
[H100] local test C10D forward does not have tensor result equivalency (16% mismatch)
Updated
5 months ago
Incompatible with pytorch 2.0?
Closed
6 months ago
Failed to run fine-tuning (freezing some layers) of hf model with pippy
Updated
7 months ago
split_into_equal_size returns submodules with non-optimizable parameters
Updated
7 months ago
Any plan to support PEFT LoRA models?
Updated
8 months ago
Comments count
2
Why does parallel pipeline require a master
Updated
8 months ago
Comments count
1
tp+pp and gspmd examples not running
Closed
10 months ago
Comments count
1
[spmd] spmd logging doesn't work with logging level
Closed
10 months ago
How did this error happen when i run example about resnet?
Updated
10 months ago
Split each layer in multiple gpu
Updated
10 months ago
Request for Examples of Pipeline Parallelism with Multiple Machines in PiPPy
Updated
a year ago
Comments count
1
TP+PiPPy failing on HF examples.
Updated
a year ago
Comments count
4
How to run the gpt2 example on a single node with four GPU?
Updated
a year ago
Could pippy be coexisted with deepspeed?
Updated
a year ago
Comments count
1
Incorrect loss value of huggingface bert example
Updated
a year ago
init_empty_weights only works with torchrun and is very slow
Closed
a year ago
Comments count
6
How to reduce memory costs when running on CPU
Closed
a year ago
Pippy ddp2pipe example doesn't work for pipeline
Updated
a year ago
Comments count
4
Problem reproducing minimal example
Closed
a year ago
Comments count
2
[SPMD] Missing DT support NotImplementedError: Operator aten.amax.default does not have a DistributedTensor rule registered.
Updated
a year ago
[SPMD] Add support for convolution ops to DTensor sharding prop
Updated
a year ago
[DTensor] missing rule for aten.fill.Scalar causing unit tests to fail for SPMD
Updated
a year ago
Issue with FX tracing of HF seq2seq models
Updated
a year ago
Remove checkpoint files moved to PT
Closed
a year ago
Fix test failure in test/spmd/checkpoint/test_dt_planner.py
Closed
a year ago
Fix test failure in test/spmd/checkpoint/test_pg_planner.py
Closed
a year ago
[SPMD][Fusion] add bucket size/ num_bytes policy for fusion
Updated
a year ago
[SPMD][Fusion] - ensure matching ProcessGroups for fused comm calls
Updated
a year ago