biogo / hts

biogo high throughput sequencing repository

Bam reader error on date format (different format)

ns3116 opened this issue · comments

Ran into a similar error running smoove > goleft > biogo

panic: parsing time "2018-08-02 114421" as "2006-01-02T150405": cannot parse " 114421" as "T": line 8: @RG\tID:H333CDSXX.1\tLB:EXPT_ID:180351;PREP_ID:182579\tPL:ILLUMINA\tPU:H333CDSXX.1\tSM:sample064\tCN:IGM\tDT:2018-08-02 11:44:21"

It is a NovaSeq, DRAGEN-aligned BAM.

Can you clarify whether the read group line uses "\t" (real tabs) or '\t' (a two-character tab replacement)? If it's the latter, this is a bug in the aligner or whatever wrote the RG line.
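
One way to check this on a header line is sketched below. It is a minimal illustration only: `tabStyle` is a hypothetical helper, not part of biogo/hts, and the sample lines are shortened versions of the @RG line from the error message.

```go
package main

import (
	"fmt"
	"strings"
)

// tabStyle is a hypothetical helper (not a biogo/hts function) that reports
// how the fields of a SAM header line appear to be separated.
func tabStyle(line string) string {
	switch {
	case strings.Contains(line, "\t"):
		// An actual tab character (0x09) is present.
		return "real tabs"
	case strings.Contains(line, `\t`):
		// A backslash followed by the letter t, i.e. two characters.
		return "two-character replacement"
	default:
		return "no tabs"
	}
}

func main() {
	realTabs := "@RG\tID:H333CDSXX.1\tSM:sample064"
	escaped := `@RG\tID:H333CDSXX.1\tSM:sample064`

	fmt.Println(tabStyle(realTabs))
	fmt.Println(tabStyle(escaped))
}
```

A panic message that quotes the line will show real tabs as `\t`, so the error text alone cannot settle the question; the check has to run on the raw header bytes.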

They are real tabs; they just show up that way in the error. From the actual BAM header:

@RG ID:H333CDSXX.1 LB:EXPT_ID:180351;PREP_ID:182579 PL:ILLUMINA PU:H333CDSXX.1 SM:sample064 CN:IGM DT:2018-08-02 11:44:21

(They are getting turned to spaces in the copy paste, but they are tabs)

The spec says, "Date the run was produced (ISO8601 date or date/time)." The date/time there is not ISO8601, as it has a space separator rather than a T. So this is working as intended (WAI), and the bug is still in whatever wrote the file. Fixing it here leads to this...

[Image: ISO8601]

To be clear, because of the extraordinarily loose definition of ISO8601, we already do a lot of work to try to parse time.

I suppose that I could add user-settable time formats to try if it's not zero, but the list of accepted time formats won't grow unless there are files that have the format and the format is ISO8601.
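
A rough sketch of what user-settable fallback formats could look like, using only the standard `time` package. Everything here is an assumption for illustration: `parseRGDate`, `strictLayouts`, and the layouts in that list are hypothetical and are not the API or the format list that biogo/hts actually uses.

```go
package main

import (
	"fmt"
	"time"
)

// strictLayouts is an illustrative set of "T"-separated ISO 8601-style
// layouts. It is NOT the list that biogo/hts tries internally.
var strictLayouts = []string{
	"2006-01-02",
	"2006-01-02T15:04:05Z07:00",
	"2006-01-02T15:04:05",
}

// parseRGDate is a hypothetical helper: it tries the strict layouts first,
// then any extra layouts supplied by the caller, and returns the first match.
func parseRGDate(s string, extra ...string) (time.Time, error) {
	layouts := append(append([]string{}, strictLayouts...), extra...)
	for _, layout := range layouts {
		if t, err := time.Parse(layout, s); err == nil {
			return t, nil
		}
	}
	return time.Time{}, fmt.Errorf("cannot parse DT value %q", s)
}

func main() {
	const dt = "2018-08-02 11:44:21" // DT value from the @RG line in this issue

	// With only the strict layouts, the space separator is rejected.
	if _, err := parseRGDate(dt); err != nil {
		fmt.Println("strict:", err)
	}

	// With a caller-supplied layout, the same value parses.
	t, err := parseRGDate(dt, "2006-01-02 15:04:05")
	if err != nil {
		fmt.Println("with extra layout:", err)
		return
	}
	fmt.Println("with extra layout:", t.Format(time.RFC3339))
}
```

Keeping the extra layouts opt-in leaves the default behaviour strict, so files that violate the spec are still reported unless the caller explicitly chooses to accept them.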

Figures it's not to spec. The RG lines are generated by DRAGEN. I will bring it up with Edico. Thanks.

Looks like this should be closed.