Fix Batch concurrency bug when c > 5
calvinnhieu opened this issue
Calvin Nhieu commented
When more than 5 loom requests for the same matrix (same set of cell IDs) miss the cache and are submitted in parallel, only 5 matrices are generated successfully and the rest fail during the Batch job. The error thrown during Batch resembles:
TypeError: Can't broadcast (63925, 50) -> (63925, 44)
It is raised while writing rows to the h5py.File/loom file.
Review the concurrency configurations for the ECS Query Runner, Redshift and Batch jobs to determine the root cause, then fix it.
Definition of Done:
- ensure > 5 identical matrices can be generated in parallel successfully for all formats
- ensure > 5 different matrices can be generated in parallel successfully for all formats
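
The two checks above could be exercised with a small harness along these lines. This is only a sketch: `submit` stands in for whatever call actually requests a matrix and waits for the Batch job to finish, and `fake_submit`, `identical_requests`, `different_requests` and the cell IDs are hypothetical placeholders, not part of the service.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Request = Tuple[Sequence[str], str]  # (cell IDs, output format)


def identical_requests(n: int, fmt: str) -> List[Request]:
    """n requests for the same cell ID set (should all hit the same matrix)."""
    return [(["cell-a", "cell-b"], fmt) for _ in range(n)]


def different_requests(n: int, fmt: str) -> List[Request]:
    """n requests, each with its own cell ID set."""
    return [([f"cell-{i}"], fmt) for i in range(n)]


def run_parallel(submit: Callable[[Sequence[str], str], object],
                 requests: List[Request]) -> List[object]:
    """Submit all requests concurrently; return each result or the exception it raised."""
    def attempt(request: Request) -> object:
        cell_ids, fmt = request
        try:
            return submit(cell_ids, fmt)
        except Exception as exc:  # keep the failure so it can be counted later
            return exc

    # One worker per request so none of them waits in the local queue.
    with ThreadPoolExecutor(max_workers=len(requests)) as pool:
        return list(pool.map(attempt, requests))


def failures(results: List[object]) -> List[Exception]:
    """Return only the requests that ended in an exception."""
    return [r for r in results if isinstance(r, Exception)]


def fake_submit(cell_ids: Sequence[str], fmt: str) -> str:
    """Hypothetical stand-in for the real request; replace with the actual client call."""
    return f"{fmt}:{len(set(cell_ids))}"


if __name__ == "__main__":
    for fmt in ["loom"]:  # extend with the other supported formats
        for build in (identical_requests, different_requests):
            results = run_parallel(fake_submit, build(6, fmt))
            print(fmt, build.__name__, "failures:", len(failures(results)))
```

With the real client call in place of `fake_submit`, the issue is done when both request sets report zero failures for every format with 6 or more parallel requests.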