OceanGenomics / mudskipper

A tool for projecting genomic alignments to transcriptomic coordinates

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

test_bam2bam fails

rob-p opened this issue · comments

The test test_bam2bam on line 32 of tests/bam_test.rs currently fails. It was failing before because the return type of read_and_process was not right. I fixed that, but this test is still failing. It should fail to find / write 33 records, but has 0. Actually since bam2bam didn't return an integer, it is unclear how this value is to be calculated, so I took a best guess (probably wrong). Can you advise @sjbaker47 and @HosseinAsghari (cc @mohsenzakeri, @haghshenas).

Thanks!
Rob

Based on the comments in test_bam2bam, the purpose of this test is to count the number of alignment records that could not be converted to any transcript locations. Below, I did the calculations on the original and converted bam files and it turns out there are 35 alignment records (not 33) that were not converted to transcript locations which confuses me.

$ mudskipper bulk --gtf NC_002333.2.gtf --alignment NC_002333.2.sam --out NC_002333.2_toTranscriptome.bam

$ samtools view NC_002333.2.sam | cut -f1 | sort | uniq | wc -l
85
$ samtools view NC_002333.2_toTranscriptome.bam | cut -f1 | sort | uniq | wc -l
50

So there are 85-50=35 records that mudskipper failed to convert.

Right now, in the code the return value of the read_and_process function which is basically the return value of the bam2bam function is something entirely different. It returns the number of alignment records that do not have any reads in them which is zero (I'm not sure how it can get to be non-zero).

I can modify the code so that we properly count the number of missing records but I think this number could change in the future versions of mudskipper (and possibly go lower for example if we improve the conversion of the complicated alignments). So, I'm not sure if this would be a proper unit test. What do you think?

Hi @HosseinAsghari,

Thanks! I think this still makes sense as a unit test. Presumably, barring bugs, what projections we see should be a function of the parameters we provide to mudskipper (e.g. fraction softclipping allowed, etc.). Therefore, as long as we control how the unit-test is parameterized, it's probably valid to have to make sure we don't produce any unexpected regressions.

Thanks!
Rob