Input/output error when creating IGV report
fgvieira opened this issue · comments
Dear all,
I am trying to create an IGV report:
create_report --flanking 100 --info-columns DP AF ANN --sample-columns GT DP AD GQ --standalone --output out.html variants.annot.vcf.gz reference_genome.fasta --tracks sample.cram
But I am getting the error below:
[W::find_file_url] Failed to read reference "https://www.ebi.ac.uk/ena/cram/md5/97a317c689cbdd7e92a5c159acd290d2": Input/output error
[W::find_file_url] Failed to read reference "https://www.ebi.ac.uk/ena/cram/md5/97a317c689cbdd7e92a5c159acd290d2": Input/output error
[W::find_file_url] Failed to read reference "https://www.ebi.ac.uk/ena/cram/md5/97a317c689cbdd7e92a5c159acd290d2": Input/output error
[W::find_file_url] Failed to read reference "https://www.ebi.ac.uk/ena/cram/md5/97a317c689cbdd7e92a5c159acd290d2": Input/output error
[W::find_file_url] Failed to read reference "https://www.ebi.ac.uk/ena/cram/md5/97a317c689cbdd7e92a5c159acd290d2": Input/output error
[W::find_file_url] Failed to read reference "https://www.ebi.ac.uk/ena/cram/md5/97a317c689cbdd7e92a5c159acd290d2": Input/output error
[W::find_file_url] Failed to read reference "https://www.ebi.ac.uk/ena/cram/md5/97a317c689cbdd7e92a5c159acd290d2": Input/output error
[W::find_file_url] Failed to read reference "https://www.ebi.ac.uk/ena/cram/md5/97a317c689cbdd7e92a5c159acd290d2": Input/output error
[W::find_file_url] Failed to read reference "https://www.ebi.ac.uk/ena/cram/md5/97a317c689cbdd7e92a5c159acd290d2": Input/output error
[W::find_file_url] Failed to read reference "https://www.ebi.ac.uk/ena/cram/md5/97a317c689cbdd7e92a5c159acd290d2": Input/output error
[W::find_file_url] Failed to read reference "https://www.ebi.ac.uk/ena/cram/md5/97a317c689cbdd7e92a5c159acd290d2": Input/output error
[W::find_file_url] Failed to read reference "https://www.ebi.ac.uk/ena/cram/md5/97a317c689cbdd7e92a5c159acd290d2": Input/output error
[W::find_file_url] Failed to read reference "https://www.ebi.ac.uk/ena/cram/md5/97a317c689cbdd7e92a5c159acd290d2": Input/output error
[W::find_file_url] Failed to read reference "https://www.ebi.ac.uk/ena/cram/md5/97a317c689cbdd7e92a5c159acd290d2": Input/output error
[W::find_file_url] Failed to read reference "https://www.ebi.ac.uk/ena/cram/md5/97a317c689cbdd7e92a5c159acd290d2": Input/output error
[W::find_file_url] Failed to read reference "https://www.ebi.ac.uk/ena/cram/md5/97a317c689cbdd7e92a5c159acd290d2": Input/output error
[W::find_file_url] Failed to read reference "https://www.ebi.ac.uk/ena/cram/md5/97a317c689cbdd7e92a5c159acd290d2": Input/output error
[...]
What does this error mean? If I try access the URL, it works fine.
And why is IGV trying to access ENA if I provide the reference genome?
thanks,
@fgvieira I actually do not know, I suspect pysam is trying to access the file based on information in the header. If so this is out of my control. Do you have, or can you create, a "bam" file version of the cram to see if the error persists?
I found this in the samtools reference (pysam, used by igv-reports, wraps samtools)
When reading a CRAM the @SQ headers are interrogated to identify the reference sequence MD5sum (M5: tag) and the local reference sequence filename (UR: tag). Note that http:// and ftp:// based URLs in the UR: field are not used, but local fasta filenames (with or without file://) can be used.
To create a CRAM the @SQ headers will also be read to identify the reference sequences, but M5: and UR: tags may not be present. In this case the -T and -t options of samtools view may be used to specify the fasta or fasta.fai filenames respectively (provided the .fasta.fai file is also backed up by a .fasta file).
I cannot reproduce the issue with my test CRAM files, but I suspect there is some specific combination of headers that cause samtools to try to compute a MD5Sum from the reference specified in the header. I think the fix will be to pass the reference specified in igv-reports to pysam with the "-T" argument. I am actually on vacation now until July 12 and will not try a release until I return, but I have pushed this (potential) fix. If you are able to build from source and try it please report back here. Feel free to ping this issue after July 12 as a reminder to release.
I tried the latest commit, but got an error:
Traceback (most recent call last):
File "bin/create_report", line 8, in <module>
sys.exit(main())
File "lib/python3.9/site-packages/igv_reports/report.py", line 271, in main
create_report(args)
File "lib/python3.9/site-packages/igv_reports/report.py", line 51, in create_report
reader = utils.getreader(track, None, args.fasta)
File "lib/python3.9/site-packages/igv_reports/utils.py", line 11, in getreader
return bam.BamReader(path)
TypeError: __init__() missing 1 required positional argument: 'fasta'
I tried adding the --fasta
command line option, but it is not recognized.
OK I'll have to look at this next week, I'm traveling now. There is no. "--fasta" option in igv-reports, that is a pysam option, sorry if I wasn't clear.
Many people use this with CRAM files, I don't know what it is about yours that is triggering this. However, the error is coming from pysam, a python package. I suggest installing that package and trying to query the file for alignments, knowing if that is successful or not would be good information.
Looking a bit more into the code, I think the issue might be here:
igv-reports/igv_reports/feature.py
Lines 246 to 247 in a0c558a
@fgvieira Sorry for the delay, could you pull and try again when you have a chance?
You were on the right track, but its a little more subtle. For CRAM format we need to distinguish between input and output formats (CRAM is output as BAM in the report).
Seems to be working now. 👍
thanks!