Pipeline Parallelism for PyTorch
Stargazers: 577
Watchers: 36
Issues: 236
Forks: 71
pytorch/PiPPy Issues
Issue with optimizer instantiation
  Updated 4 months ago; comments: 2
Check if remap_qualname still works after refactorization
  Closed 4 months ago; comments: 1
Check if stage-wise checkpoint loading still works after refactorization
  Updated 4 months ago
Check if meta device tracing still works after refactorization
  Updated 4 months ago
ResNet example always underfitting when pippy training
  Updated 4 months ago; comments: 5
PyTorch renaming submod indices leading to assert break
  Updated 4 months ago
Pipeline Schedule confused
  Updated 4 months ago; comments: 1
Decouple graph interpretation from pipeline executor
  Updated 5 months ago
[H100] local test C10D forward does not have tensor result equivalency (16% mismatch)
  Updated 5 months ago
Incompatible with pytorch 2.0?
  Closed 6 months ago
Failed to run fine-tuning (freezing some layers) of hf model with pippy
  Updated 7 months ago
split_into_equal_size returns submodules with non-optimizable parameters
  Updated 7 months ago
[spmd] spmd api tracing warning need to investigate
  Closed 8 months ago; comments: 2
Any plan to support PEFT LoRA models?
  Updated 8 months ago; comments: 2
Why does parallel pipeline require a master
  Updated 8 months ago; comments: 1
tp+pp and gspmd examples not running
  Closed 10 months ago; comments: 1
[spmd] spmd logging doesn't work with logging level
  Closed 10 months ago
How did this error happen when i run example about resnet?
  Updated 10 months ago
Split each layer in multiple gpu
  Updated a year ago
Request for Examples of Pipeline Parallelism with Multiple Machines in PiPPy
  Updated a year ago; comments: 1
TP+PiPPy failing on HF examples.
  Updated a year ago; comments: 4
How to run the gpt2 example on a single node with four GPU?
  Updated a year ago
Could pippy be coexisted with deepspeed?
  Updated a year ago; comments: 1
Incorrect loss value of huggingface bert example
  Updated a year ago
init_empty_weights only works with torchrun and is very slow
  Closed a year ago; comments: 6
How to reduce memory costs when running on CPU
  Closed a year ago
Move DTensor from tau to PyTorch
  Closed a year ago
Pippy ddp2pipe example doesn't work for pipeline
  Updated a year ago; comments: 4
Problem reproducing minimal example
  Closed a year ago; comments: 2
[SPMD] Missing DT support NotImplementedError: Operator aten.amax.default does not have a DistributedTensor rule registered.
  Updated a year ago
[SPMD] Add support for convolution ops to DTensor sharding prop
  Updated a year ago
[DTensor] missing rule for aten.fill.Scalar causing unit tests to fail for SPMD
  Updated a year ago
Issue with FX tracing of HF seq2seq models
  Updated a year ago
Remove checkpoint files moved to PT
  Closed a year ago
Fix test failure in test/spmd/checkpoint/test_dt_planner.py
  Closed a year ago
Fix test failure in test/spmd/checkpoint/test_pg_planner.py
  Closed a year ago
[SPMD][Fusion] add bucket size/ num_bytes policy for fusion
  Updated a year ago
[SPMD][Fusion] - ensure matching ProcessGroups for fused comm calls
  Updated a year ago
[SPMD][Fusion] - ensure buffer dtype matches gradient tensor dtype
  Updated a year ago
[SPMD][Fusion] Add unit tests for fusion
  Updated a year ago
[SPMD][Fusion] tracking - move global buffer to just before first fusion
  Updated a year ago
[spmd] incorrect aten.expand call with nn.linear (expanded size must match existing size at dim 0)
  Updated a year ago; comments: 1
'CLIPVisionConfig' object has no attribute 'vocab_size'
  Updated a year ago
[SPMD] Remove Gradient tensor clones added during DTensor comm collective insertion
  Updated a year ago
pytests_test_gpu(0) will fail if allocated a non-4 gpu server - add guard/skip?
  Updated a year ago
Support Segformer models in HF tests
  Updated a year ago
[spmd] torch.cat (aten.cat.default) not implemented for Distributed Tensor (tracking)
  Updated a year ago
[spmd] self-attention not converging
  Updated 2 years ago; comments: 1
[spmd] self-attention module's proj.bias isn't properly updated on all ranks but rank 0
  Updated 2 years ago; comments: 1
Buck run device error
  Closed 2 years ago