microsoft / CromwellOnAzure

Microsoft Genomics implementation of the Broad Institute's Cromwell workflow engine on Azure


Large Bam files fail to copy into the output directory

simalicrum opened this issue

Describe the bug
The failure occurs when running the Mondrian workflow (https://mondrian-scwgs.github.io/mondrian/#/) on a >100 GB dataset.

All workflow steps appear to complete successfully until the end of the last task, where a 77 GB output BAM should be copied into the output directory to complete the run. Every other output file is copied into the output directory.

The output BAM is built successfully but is never copied out of the /cromwell-executions/ container.

The trigger file is moved into the 'failed' directory with the error "CromwellFailed" in the JSON output.

Steps to Reproduce
Run the workflow at https://mondrian-scwgs.github.io/mondrian/#/ on a large dataset, so that a large output file is produced in the /cromwell-executions/ workflow directory.

Expected behavior
Large output files should be copied into the specified output location.

Deployment details: (any information you can provide would be helpful):
Cromwell on Azure 4.5 deployment with no changes to configuration.

Screenshots
Drilled down into the AKS workload container 'cromwell' and found the following exceptions:

[Two screenshots of the exceptions from the 'cromwell' container]

Additional context
Runs that use the identical workflow files but produce smaller output files complete successfully.

This issue is mitigated by ensuring that the Temp disks on the AKS nodes are large enough to hold the output file during the final copy from the cromwell-executions container into the output container. I also changed the size of the various PVCs on the cluster in case that had some kind of impact.

I'm 99% sure this is caused by the CSI driver for AKS. Some kind of staging happens on the node's Temp disk during the copy. It looks to me like the entire file is copied onto the Temp disk before it is written to the output directory.
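
If that staging theory is right, a quick way to test it is to compare the size of the output file with the free space on the node's Temp disk before the final copy. The snippet below is only a rough sketch of that check in Python: the function names, the 10% headroom, and both paths passed on the command line are my own assumptions, and the actual staging directory used by the CSI driver is not something I have confirmed.

```python
import os
import shutil
import sys


def can_stage(file_size_bytes: int, staging_dir: str, headroom: float = 0.10) -> bool:
    """Return True if staging_dir has room for a full copy of the file.

    headroom is an arbitrary safety margin (10% by default), not a value
    taken from the CSI driver or from Cromwell on Azure.
    """
    free_bytes = shutil.disk_usage(staging_dir).free
    return free_bytes >= file_size_bytes * (1 + headroom)


def report(output_file: str, staging_dir: str) -> bool:
    """Print the file size, the free space, and whether a full copy would fit."""
    size = os.path.getsize(output_file)
    free = shutil.disk_usage(staging_dir).free
    fits = can_stage(size, staging_dir)
    gib = 1024 ** 3
    print(f"output file: {size / gib:.1f} GiB")
    print(f"free on staging disk: {free / gib:.1f} GiB")
    print("full copy fits" if fits else "full copy does NOT fit")
    return fits


if __name__ == "__main__":
    # Both arguments are hypothetical: the path to the large output BAM
    # and a directory on the node's Temp disk.
    if len(sys.argv) != 3:
        sys.exit("usage: check_staging.py <output_file> <staging_dir>")
    sys.exit(0 if report(sys.argv[1], sys.argv[2]) else 1)
```

Run on the node (or in a pod that mounts the same Temp disk), a "does NOT fit" result for the 77 GB BAM would be consistent with the failure described above, though it would not prove that the driver is the cause.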