pycoQC looks for a 'read_len' column in sequencing_summary, but it isn't there

DrOllyGomez opened this issue 5 years ago

DrOllyGomez commented 5 years ago

Describe the bug
See the attached traceback:
traceback.txt

The sequencing_summary.txt file was produced as follows:

multi_to_single_fast5
Poreplex driving Albacore 2.3.4
All of this was run in parallel; the individual sequencing_summary.txt files were then collected and simply concatenated.

The output of the 'head' command on sequencing_summary.txt is attached:
headseqsum.txt

Expected behavior
I expected the attached call to generate a pycoQC report:
call.txt

Desktop:

OS: Ubuntu 18.04

Adrien Leger · Answer 1 · Tue Oct 08 2019 17:38:54 GMT+0800 (China Standard Time)

Hi @DrOllyGomez,

It seems that the summary file was generated by Poreplex itself, which I hadn't come across before.
Its format is a little different from the ONT format, so I pushed a compatibility fix to the dev branch.
Would you be able to give it a try with your full file?
=> pip install git+https://github.com/a-slide/pycoQC.git@dev --upgrade
And check that you have upgraded to version 2.5.0.11
Thanks
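As far as I can tell, `read_len` is the name pycoQC uses internally for read length, presumably filled from whichever length column the summary file provides, and the Poreplex file apparently names that column differently from ONT's basecallers. The sketch below shows the kind of header check a compatibility fix might perform. It is an illustration, not pycoQC's code: `find_read_len_column` is a hypothetical helper, and the candidate names are assumptions (`sequence_length_template` as written by ONT basecallers, and `sequence_length`, the column mentioned later in this thread).

```python
# Hypothetical candidate names for the read-length column, in order of
# preference. pycoQC's real handling (if it keeps such a list) may differ.
READ_LEN_CANDIDATES = ("sequence_length_template", "sequence_length")


def find_read_len_column(header_line, candidates=READ_LEN_CANDIDATES):
    """Return the first candidate found in a tab-separated header line, or None."""
    fields = header_line.rstrip("\r\n").split("\t")
    for name in candidates:
        if name in fields:
            return name
    return None
```

Running it on the first line of a summary file returns `None` when neither name is present, which corresponds to the situation the error message describes.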

Adrien Leger · Answer 2 · Tue Oct 08 2019 17:40:23 GMT+0800 (China Standard Time)

And you don't have to concatenate the files yourself; pycoQC also accepts a pattern that matches all of them.

Adrien Leger · Answer 3 · Tue Oct 08 2019 17:42:32 GMT+0800 (China Standard Time)

I forgot to say thanks a lot for the detailed error report. That's probably the best one I've had so far :D

DrOllyGomez · Answer 4 · Mon Oct 21 2019 17:46:42 GMT+0800 (China Standard Time)

Hi, progress I believe, but not quite there yet.

Here is the upgrade output from pip, showing dependencies, versions, etc.:
upgrade.txt

And here is the output, which shows apparently successful parsing but a problem later on:
Traceback2.txt

Any help very gratefully received.
Mike

Adrien Leger · Answer 5 · Tue Oct 22 2019 16:44:34 GMT+0800 (China Standard Time)

I believe there must be non-numeric entries in the sequence_length column of your file.
Could you please confirm that, or send me the entire file you used so I can replicate the issue?
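One way to check this before sending the file is to scan the column for values that do not parse as numbers. Below is a minimal sketch using Python's standard `csv` module, assuming a tab-separated file with a `sequence_length` column (the name used in this thread); `find_non_numeric` is a hypothetical helper, not part of pycoQC.

```python
import csv


def find_non_numeric(lines, column="sequence_length"):
    """Return (line_number, value) pairs for rows whose `column` is not numeric.

    `lines` is any iterable of text lines, e.g. an open file. Line numbers
    count the header as line 1 and assume no field contains a newline.
    """
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader, None)
    if header is None or column not in header:
        raise ValueError(f"no {column!r} column in header")
    idx = header.index(column)
    bad = []
    for row in reader:
        # Short or blank rows have no value for the column at all.
        value = row[idx] if idx < len(row) else None
        try:
            float(value)
        except (TypeError, ValueError):
            # TypeError: value is None; ValueError: text such as a repeated header.
            bad.append((reader.line_num, value))
    return bad
```

Pass it a file opened with `newline=""`. A repeated header line shows up as a hit whose value is the literal string `sequence_length`.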

DrOllyGomez · Answer 6 · Tue Oct 22 2019 17:29:16 GMT+0800 (China Standard Time)

Yes, I can confirm there are non-numeric entries: there are multiple instances of the header line, e.g.
'filename\tread_id\trun_id\tchannel\tstart_time\tduration\tnum_events...' etc.
...left over from my concatenating the original parallelised outputs! Arrgh! :)

I will write a script to excise these (leaving in the very first one), and I imagine pycoQC will then work fine. Will report back.
Thanks and apologies
Mike
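A minimal Python sketch of such a script, assuming every concatenated file started with the same header line, so each repeat is an exact copy of line 1. `drop_repeated_headers` is a hypothetical name.

```python
def drop_repeated_headers(lines):
    """Keep the first line (the header) and drop any later exact copies of it."""
    lines = iter(lines)
    header = next(lines, None)
    if header is None:
        return []  # empty input
    header_text = header.rstrip("\r\n")
    out = [header]
    for line in lines:
        # Compare without line endings so a final header lacking "\n" still matches.
        if line.rstrip("\r\n") != header_text:
            out.append(line)
    return out
```

For example, open the concatenated file for reading and a new file for writing, then call `dst.writelines(drop_repeated_headers(src))`.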

Adrien Leger · Answer 7 · Tue Oct 22 2019 17:59:24 GMT+0800 (China Standard Time)

You don't have to. pycoQC can take multiple summary files as input, and it then merges their data without the repeated headers. :D

DrOllyGomez · Answer 8 · Tue Oct 22 2019 17:59:29 GMT+0800 (China Standard Time)

Yep: that's nailed it: and it looks beautiful! Many thanks!!

DrOllyGomez · Answer 9 · Tue Oct 22 2019 18:00:42 GMT+0800 (China Standard Time)

Paths crossed there! I will try the method you suggest too. :)

Adrien Leger · Answer 10 · Tue Oct 22 2019 18:01:16 GMT+0800 (China Standard Time)

From the documentation:
Path to a sequencing_summary generated by Albacore 1.0.0+ (read_fast5_basecaller.py) / Guppy 2.1.3+ (guppy_basecaller). One can also pass multiple space separated file paths or a UNIX style regex matching multiple files (Required)
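To illustrate the merging described above (combining several summary files while keeping a single header), here is a minimal Python sketch that expands a pattern with the standard `glob` module and merges the matching files, refusing files whose headers differ. The documentation calls the pattern a "UNIX style regex"; this sketch treats it as a shell-style wildcard such as `sequencing_summary_*.txt`, and whether pycoQC expands patterns the same way internally is an assumption. `merge_summaries` and `merge_summary_lines` are hypothetical helpers, not pycoQC's API.

```python
import glob


def _with_newline(line):
    # A last line without a trailing newline would otherwise be glued
    # onto the first data line of the next file.
    return line if line.endswith("\n") else line + "\n"


def merge_summary_lines(files):
    """Merge tab-separated summaries (each an iterable of lines), keeping one header."""
    header = None
    merged = []
    for lines in files:
        lines = iter(lines)
        file_header = next(lines, None)
        if file_header is None:
            continue  # skip empty files
        if header is None:
            header = file_header.rstrip("\r\n")
            merged.append(header + "\n")
        elif file_header.rstrip("\r\n") != header:
            raise ValueError("summary files have different headers")
        merged.extend(_with_newline(line) for line in lines)
    return merged


def _read_all(paths):
    # Read each file in turn so only one handle is open at a time.
    for path in paths:
        with open(path) as fh:
            yield fh.readlines()


def merge_summaries(pattern):
    """Expand a shell-style wildcard, e.g. 'sequencing_summary_*.txt', and merge the matches."""
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files match {pattern!r}")
    return merge_summary_lines(_read_all(paths))
```

Checking that all headers match is a deliberate choice: silently merging files with different column layouts would produce exactly the kind of misaligned, non-numeric columns seen earlier in this thread.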

Adrien Leger · Answer 11 · Tue Oct 22 2019 18:07:00 GMT+0800 (China Standard Time)

I will close the issue then.
Thanks

thierryjanssens · Answer 12 · Wed Jan 26 2022 19:31:04 GMT+0800 (China Standard Time)

Hi,

I also ran into the same issue, but my separate sequencing_summary.txt files came as-is from the run, and they still generate the same error.

pycoQC v2.5.1.dev6

I ran it from a conda environment.

pycoQC --summary_file sequencing_summary_FAR75694_601e48d8.txt sequencing_summary_FAR75694_aa9a0e51.txt sequencing_summary_FAR75694_c645dc34.txt --barcode_file barcode/barcoding_summary.txt --html_outfile pycoQC_FlevoRUN2.html

I have attached the smaller of the three files.
sequencing_summary_FAR75694_c645dc34.txt

thierryjanssens · Answer 13 · Thu Jan 27 2022 02:29:19 GMT+0800 (China Standard Time)

Hi,

I realized that I had to use the sequencing_summary.txt produced by basecalling, not the file that is generated during the run.

Now it works.

Kind regards,

T.