HumanCellAtlas / matrix-service

DCP Expression Matrix Service

Fix Batch concurrency bug when c > 5

calvinnhieu opened this issue

When more than 5 loom requests for the same matrix (equal cell ID sets, cache miss) are submitted in parallel, only 5 matrices are generated successfully; the rest fail during the Batch job. The error thrown during Batch resembles:

TypeError: Can't broadcast (63925, 50) -> (63925, 44)

It is raised while writing rows to the h5py.File (the loom file).

Review the concurrency configurations for the ECS Query Runner, Redshift and Batch jobs to determine root cause and fix.

Definition of Done:

  • ensure > 5 identical matrices can be generated in parallel successfully for all formats
  • ensure > 5 different matrices can be generated in parallel successfully for all formats
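
The Definition of Done could be checked with a small parallel-submission harness along these lines. This is only a sketch: `run_parallel_requests`, `FakeService` and the "simulated Batch failure" message are hypothetical names made up for illustration, not part of the matrix service. The fake stands in for a real client call and simply fails every request after the fifth, to mimic the symptom described above.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def run_parallel_requests(
    request_matrix: Callable[[str], str],
    formats: List[str],
    concurrency: int = 6,
) -> List[Tuple[str, int, str]]:
    """Submit `concurrency` requests per format at once and collect failures.

    `request_matrix` is a hypothetical callable that takes a format name,
    blocks until the matrix is ready, and raises if generation fails.
    """
    failures = []
    for fmt in formats:
        with ThreadPoolExecutor(max_workers=concurrency) as pool:
            # Submit everything first so the requests really overlap.
            futures = [pool.submit(request_matrix, fmt) for _ in range(concurrency)]
            for index, future in enumerate(futures):
                try:
                    future.result()
                except Exception as exc:
                    failures.append((fmt, index, repr(exc)))
    return failures


class FakeService:
    """Hypothetical stand-in that fails every request after the first `limit`."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.count = 0
        self.lock = threading.Lock()

    def request(self, fmt: str) -> str:
        with self.lock:
            self.count += 1
            n = self.count
        if n > self.limit:
            raise RuntimeError("simulated Batch failure")
        return f"{fmt}-matrix-{n}"


if __name__ == "__main__":
    broken = FakeService(limit=5)
    failed = run_parallel_requests(broken.request, ["loom"], concurrency=8)
    print(f"{len(failed)} of 8 requests failed")
```

To cover the second bullet, the callable would need to vary the request payload per submission (different cell ID sets) rather than repeat the same one; the harness itself would stay the same.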