Too many bindings of type StorageBuffers in Stage ShaderStages(COMPUTE)
ianmarmour opened this issue
Describe the bug
When I train my model with a batch size > 1, there are too many bindings of type StorageBuffer. This causes a panic in WGPU and crashes the training process. The correct limits for the adapter appear to be inferred, but fusion may not be respecting them: if fusion is disabled, this error isn't encountered.
To Reproduce
- Use WGPU as the device type on a Mac.
- .... unsure...
- Run cargo run to perform training with a batch size > 1.
- On the second batch of items the panic will occur.
Expected behavior
Training proceeds without exceeding the maximum number of available StorageBuffers.
Screenshots
2024-07-04T01:51:02.976015Z INFO burn_fusion::stream::store::base: New execution plan 76 - Operations: 1 - Triggers 1
2024-07-04T01:51:02.976063Z INFO burn_fusion::stream::store::base: New execution plan 77 - Operations: 3 - Triggers 1
2024-07-04T01:51:02.985188Z INFO burn_jit::fusion::kernel: Compiling ... "mri0pvnx16y16z16g0vs7ubdfg"
2024-07-04T01:51:03.194143Z INFO burn_jit::fusion::kernel: Compiling ... "mri0pvnx16y16z1675o2aql5m4"
2024-07-04T01:51:03.404701Z INFO burn_compute::tune::tuner: Fastest result burn_jit::fusion::kernel::AutotunableKernel<burn_wgpu::runtime::WgpuRuntime>-Fusion ElemWise - num_operations: 4 shape: [32, 64, 2, 64]
2024-07-04T01:51:03.428641Z INFO burn_jit::fusion::kernel: Compiling ... "mi0o0ri0pvnx16y16z16g0vs7ubdfg"
2024-07-04T01:51:03.515667Z INFO burn_fusion::stream::store::base: New execution plan 78 - Operations: 99 - Triggers 1
2024-07-04T01:51:03.595443Z INFO burn_jit::fusion::kernel: Compiling ... "mri0pi2pi3pi4pi5pi6pi7pi8pi9pi10pi11pi12pi13pi14pi15pi16pi17pi18pi19pi20pi21pi22pi23pi24pi25pi26pi27pi28pi29pi30pi31pi32pi33pi34pi35pi36pi37pi38pi39pi40pi41pi42pi43pi44pi45pi46pi47pi48pi49pi50pi51pi52pi53pi54pi55pi56pi57pi58pi59pi60pi61pi62pi63pi64pi66pvnx16y16z160u635da9uo"
2024-07-04T01:51:03.597239Z ERROR wgpu::backend::wgpu_core: Handling wgpu errors as fatal by default
2024-07-04T01:51:03.597268Z ERROR burn_train::learner::application_logger: PANIC => panicked at /Users/ian/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-0.20.1/src/backend/wgpu_core.rs:2996:5:
wgpu error: Validation Error
Caused by:
In Device::create_compute_pipeline
Unable to derive an implicit layout
Too many bindings of type StorageBuffers in Stage ShaderStages(COMPUTE), limit is 31, count was 102. Check the limit `max_storage_buffers_per_shader_stage` passed to `Adapter::request_device`
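To make the numbers in the error above concrete, here is a minimal sketch of the arithmetic involved in keeping a kernel within the binding limit. The helper `split_bindings` is hypothetical and is not part of Burn or wgpu; it only shows how 102 buffer bindings could be partitioned into groups of at most 31, and it ignores the data dependencies a real fusion pass would have to honor when splitting a kernel.

```rust
use std::ops::Range;

/// Hypothetical helper (not a Burn or wgpu API): splits `count` buffer
/// bindings into consecutive index ranges holding at most `limit` each.
fn split_bindings(count: usize, limit: usize) -> Vec<Range<usize>> {
    assert!(limit > 0, "limit must be non-zero");
    (0..count)
        .step_by(limit)
        .map(|start| start..(start + limit).min(count))
        .collect()
}

fn main() {
    // Values taken from the validation error: limit is 31, count was 102.
    let groups = split_bindings(102, 31);
    for (i, g) in groups.iter().enumerate() {
        println!(
            "group {i}: bindings {}..{} ({} buffers)",
            g.start,
            g.end,
            g.len()
        );
    }
}
```

With these inputs the bindings fall into four groups of 31, 31, 31, and 9 buffers, so a single 99-operation kernel of this shape would need to become at least four kernels to pass validation on this adapter.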
Desktop (please complete the following information):
- OS: macOS Sonoma
- CPU: M3 Max
- Version: 14.5
- Burn Version: 0.14.0
Additional context
Training Configuration
#[derive(Config)]
pub struct TrainingConfig {
    pub model: TasNetConfig,
    pub optimizer: AdamConfig,
    #[config(default = 10)]
    pub num_epochs: usize,
    #[config(default = 2)]
    pub batch_size: usize,
    #[config(default = 4)]
    pub num_workers: usize,
    #[config(default = 42)]
    pub seed: u64,
    #[config(default = 3.0e-4)]
    pub learning_rate: f64,
}

fn main() {
    let device = WgpuDevice::BestAvailable;
    training::train::<Autodiff<Wgpu>>(
        "/tmp/guide",
        training::TrainingConfig::new(TasNetConfig::new(500, 40, 1000, 10, 2), AdamConfig::new()),
        device,
    );
}
The Metal GPU Feature Set Table indicates that the inferred maximum limit of 31 is correct.