jdidion / atropos

An NGS read trimming tool that is specific, sensitive, and speedy. (production)

Add support for output to BAM

plijnzaad opened this issue · comments

I partly fixed the File Not Found error (issue #86; see my pull request). Atropos now seems able to read both SAM and BAM files, but it can only write SAM, not BAM:

/Users/philip/tmp/atropos.fix/bin/atropos -a RA5=GATCGTCGGACTGTAGAACTCTGAAC -se grep_adapt.bam -o trimmed4.bam --report-file summary5.txt --input-format bam --output-format bam

With atropos 2.0.0a5.dev1+g3aa3791 (Python 3.7.7, pysam 0.15.4, Cython 0.29.16), this yields UnknownFileTypeError: File format <SequenceFileType.BAM: ({'.bam'}, False)> is unknown (expected 'fasta' or 'fastq').

For details see this attachment

Thanks! I'll try to work on it this week. Please submit a PR if you figure out the solution.

Just a comment for those who (like me) might try some FIFO magic for this: a command like

atropos -a RA5=GATCGTCGGACTGTAGAACTCTGAAC -se TM433_trunc.bam --input-format bam -o >(samtools view -b > trimmed.bam) --output-format sam --report-file summary3.txt
does not work: it crashes with a message like Path /dev/fd/63 is not writable. It looks like Python or pysam makes too many assumptions here.

It also crashes with -o - or -o /dev/stdout.

Hi @plijnzaad - writing to BAM is not currently supported, and I'm still debating whether or not to add it. For now you can work around this by writing SAM to stdout and piping it to samtools view -Sb.

Regarding the FIFO issue - can you please try again after installing bamnostic? atropos will use bamnostic first if it's installed and it should avoid most of the issues that exist with pysam. Thanks

Hi, I installed bamnostic (version 1.1.4) and atropos finds it, but it still doesn't work. With bamnostic, atropos again crashes on input BAM files that contain @CO header lines (it reports Malformed BGZF block; see e-bamnostic-baminput-with-CO.txt). Using a BAM file without a @CO header line and the following incantation:

atropos -a RA5=GATCGTCGGACTGTAGAACTCTGAAC --input-format bam -se $bam --output-format sam -o >(samtools view -b -o testout.bam )

it again crashes with Path /dev/fd/63 is not writable (see e-bamnostic-bamoutput.txt).

Using -o - or -o /dev/stdout instead leads to ValueError: Invalid path: /dev/stdout

This appears to be a problem with your BAM file. When I converted it to SAM and then back to BAM, it worked fine (samtools view TM249_trunc2.bam | samtools view -Sb > new.bam).

To write to stdout, simply omit the -o option. Specifying -o /dev/stdout or -o - should also work, but apparently it does not; I will fix that. The following command works as expected:

atropos -a RA5=GATCGTCGGACTGTAGAACTCTGAAC --input-format bam -se new.bam --output-format sam | samtools view -Sb -o testout.bam

Specifying stdout/stderr is now fixed in develop.
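
For readers wondering what treating -o - and -o /dev/stdout as aliases for standard output might involve, here is a minimal Python sketch. It is not atropos's actual code: the function name `open_text_output` and the exact set of accepted spellings are assumptions made for illustration.

```python
import sys


def open_text_output(path=None):
    """Return (handle, should_close) for a text output destination.

    Hypothetical helper: no path, "-" and "/dev/stdout" all mean standard
    output, "/dev/stderr" means standard error, anything else is a file.
    """
    if path is None or path in ("-", "/dev/stdout"):
        # Reuse the process's stdout; the caller must not close it.
        return sys.stdout, False
    if path == "/dev/stderr":
        return sys.stderr, False
    # A regular path: open it for writing; the caller owns the handle.
    return open(path, "w"), True


if __name__ == "__main__":
    handle, should_close = open_text_output("-")
    handle.write("@HD\tVN:1.6\n")  # a SAM header line, written to stdout
    if should_close:
        handle.close()
```

Resolving the aliases before touching the filesystem avoids "is this path writable?" checks, which are the kind of test that can reject /dev/fd/63 or /dev/stdout.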

Brilliant, seems to work fine, many thanks. I realized that I overlooked the fact that leaving out the -o option already resulted in output to stdout - sorry!

Still puzzled about the bam-formatting error that trips bamnostic up, I'll have another look.

(PS: your example conversion gets rid of all header lines, so doesn't say much)

Weirdly, the SAM conversions of the original BAM and of the reformatted (bam->sam->bam) version are identical. I also did a strict check with ValidateSamFile from picardtools-2.21.1; both BAMs give identical (and harmless) warnings. My conclusion is that bamnostic is broken. Is there a way to make atropos prefer pysam over bamnostic (other than uninstalling bamnostic)?

The error is "Malformed BGZF block", so the difference is not in the contents but in the compression of the data.
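
To make the contents-versus-compression distinction concrete, here is a minimal Python sketch that walks the BGZF blocks of a byte string and checks each block's framing, following the block layout in the SAM specification. It is not bamnostic's or pysam's actual check and may not reproduce the same error on the file discussed here. The names `make_bgzf_block` and `iter_bgzf_blocks` are made up for illustration.

```python
import struct
import zlib


def make_bgzf_block(payload: bytes) -> bytes:
    """Pack a small payload into a single BGZF block (for demonstration)."""
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw DEFLATE stream
    cdata = comp.compress(payload) + comp.flush()
    bsize = 12 + 6 + len(cdata) + 8 - 1  # total block length minus one
    header = struct.pack("<BBBBIBBH", 31, 139, 8, 4, 0, 0, 255, 6)
    extra = struct.pack("<BBHH", 66, 67, 2, bsize)  # 'B', 'C', SLEN, BSIZE
    trailer = struct.pack("<II", zlib.crc32(payload), len(payload))
    return header + extra + cdata + trailer


def iter_bgzf_blocks(data: bytes):
    """Yield (offset, block_length, payload); raise ValueError if malformed."""
    offset = 0
    while offset < len(data):
        if len(data) - offset < 12:
            raise ValueError(f"truncated gzip header at offset {offset}")
        id1, id2, cm, flg, _mtime, _xfl, _os, xlen = struct.unpack_from(
            "<BBBBIBBH", data, offset)
        if (id1, id2, cm) != (31, 139, 8) or not flg & 4:
            raise ValueError(f"not a BGZF block at offset {offset}")
        extra_end = offset + 12 + xlen
        # Scan the gzip extra field for the 'BC' subfield holding BSIZE.
        bsize, pos = None, offset + 12
        while pos + 4 <= min(extra_end, len(data)):
            si1, si2, slen = struct.unpack_from("<BBH", data, pos)
            if (si1, si2, slen) == (66, 67, 2) and pos + 6 <= len(data):
                (bsize,) = struct.unpack_from("<H", data, pos + 4)
            pos += 4 + slen
        if bsize is None:
            raise ValueError(f"missing BC subfield at offset {offset}")
        block_end = offset + bsize + 1
        if block_end > len(data) or block_end - 8 < extra_end:
            raise ValueError(f"bad block size at offset {offset}")
        crc, isize = struct.unpack_from("<II", data, block_end - 8)
        try:
            payload = zlib.decompress(data[extra_end:block_end - 8], -15)
        except zlib.error as exc:
            raise ValueError(f"bad DEFLATE data at offset {offset}") from exc
        if len(payload) != isize or zlib.crc32(payload) != crc:
            raise ValueError(f"size/CRC mismatch at offset {offset}")
        yield offset, bsize + 1, payload
        offset = block_end
```

Two files can decompress to identical records yet differ in how the data is split into blocks, so a reader that is strict about block framing can reject one and accept the other. To inspect a real file, one could pass `open(path, "rb").read()` to `iter_bgzf_blocks` and note the offset at which it raises, if it raises at all.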

I will add a new issue to enable the choice of BAM reading library to be configurable. For the time being, the solution is just to pip uninstall bamnostic.
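
As a rough idea of what a configurable choice of BAM reading library could look like, here is a minimal Python sketch. It is not atropos's implementation. The function name `choose_bam_backend`, its parameters, and the rule that an explicit preference must be installed are all assumptions. Only the default order (bamnostic first, then pysam) comes from this thread.

```python
import importlib.util

# Default preference described in this thread: bamnostic first, then pysam.
DEFAULT_ORDER = ("bamnostic", "pysam")


def _module_installed(name):
    """True if a top-level module with this name can be found."""
    return importlib.util.find_spec(name) is not None


def choose_bam_backend(preferred=None, order=DEFAULT_ORDER,
                       is_installed=_module_installed):
    """Return the name of the BAM library to use (hypothetical helper).

    preferred: an explicit user choice, e.g. taken from a command-line
        option; if given, it must be installed.
    is_installed: injectable check, which makes the function easy to test.
    """
    if preferred is not None:
        if not is_installed(preferred):
            raise ImportError(f"requested BAM library {preferred!r} "
                              "is not installed")
        return preferred
    for name in order:
        if is_installed(name):
            return name
    raise ImportError("no BAM library found; tried " + ", ".join(order))
```

With such a helper, `choose_bam_backend(preferred="pysam")` would select pysam even when bamnostic is installed, so uninstalling bamnostic would no longer be the only option.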

Brilliant, thanks!