legacy_report stops as soon as an adapter sequence is not trimmed
cokelaer opened this issue · comments
In principle, when we use a list of adapters, we get a report that looks like (if txt format is required):
--------
Trimming
--------
Pairs records fraction
----------------------------------- ------------ ------------
Total read pairs processed: 864,879
Read 1 with adapter: 106,932 12.4%
Read 2 with adapter: 694 0.1%
Pairs that were too short: 130,547 15.1%
Pairs written (passing filters): 734,332 84.9%
Base pairs bp fraction
----------------------------------- ------------ ------------
Total bp processed: 174,705,558
Read 1: 87,352,779
Read 2: 87,352,779
End Ns trimmed 251 0.0%
Read 1: 219 0.0%
Read 2: 32 0.0%
Quality-trimmed 17,022,625 9.7%
Read 1: 6,813,448 7.8%
Read 2: 10,209,177 11.7%
Total bp written (filtered): 143,928,089 82.4%
Read 1: 72,399,685 82.9%
Read 2: 71,528,404 81.9%
followed by a section that looks like this for each adapters:
----------------------------------------------------
First read: Adapter Universal_Adapter|name:universal
----------------------------------------------------
Sequence Type Length Trimmed (x)
---------------------------------------------------------- -------------- ------ -----------
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT variable 5'/3' 58 550
147 times, it overlapped the 5' end of a read
403 times, it overlapped the 3' end or was within the read
No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-29 bp: 2; 30-39 bp: 3; 40-49 bp: 4; 50-58 bp: 5
Overview of removed sequences (5'):
length count expect max.err error counts
0 1 2
------ ----- ------- ------- ------------
6 68 1,266.9 0 68
7 31 316.7 0 31
10 14 39.6 1 1 13
Now, in this case, the number of trimmed adapters is 550. If it was 0, the the whole report process is stopped. I believe this is a bug that occurs on line 582:
if adapter["total"] == 0:
return
that should be
adapter["total"] == 0:
continue
I have also found a potential bug in that file that is coming from line 721
def _print_base_histogram(title, hist, extra_width=4, index_name='Pos'):
When hist is None it throws an error and stops the report. Adding
if hist is None: _print("No Data") return
Allows the report to complete. I am not sure if this is related to the bug above or not.
Thanks to both of you! These fixes will go out in v1.1.16 today.