BIMSBbioinfo / pigx_rnaseq

Bulk RNA-seq Data Processing, Quality Control, and Downstream Analysis Pipeline

sample_sheets - accept experiments without a fastq file referenced

smoe opened this issue

For some experiments in the sample sheet I am using, there is no FASTQ file available; the focus for these samples is on other phenotypes. Still, I would like to avoid any additional processing and reuse the file that other groups are using. This saves some editing and (even though this is not used today) allows one to learn about the distribution of phenotypes.

Would you accept a patch that makes pigx-rnaseq accept empty file names?

Hi @smoe,
I am not so sure about this one. Users might simply forget to fill in file names for samples that are supposed to have them. So, if you think it can be done easily, in a way that distinguishes intentionally empty file names from files that are actually missing, and without structural changes to the whole pipeline, it should be okay, I think.

I think the user needs to make sure that they provide one file name (or a pair) for each sample, and this should be explicit/intentional. Also, the sample sheet processing shouldn't be very complicated in this case; it is just excluding some lines without file names. Even so, I am guessing that accepting empty file names would lead to more errors for users.

If you think the use case you describe outweighs the risks of additional errors, we can consider this of course.

I agree. How about allowing "not available" or, preferably, "N/A" to state explicitly that a sample was not sequenced? It should just be anything that the wet-lab side would accept as an edit, so that the document can be shared between the dry and the wet side.

Okay, that sounds reasonable then. As long as the user can't leave this empty unintentionally, it should be fine I think.
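
To make the agreed behaviour concrete, here is a minimal sketch of how a sample sheet reader could treat an explicit "N/A" differently from an accidentally empty field. It is not the pigx-rnaseq implementation: the function name `split_sample_sheet`, the column name `reads`, and the exact set of accepted markers are assumptions made for illustration only.

```python
import csv
import io

# Hypothetical markers; the thread only suggests "N/A" or "not available".
NOT_SEQUENCED = {"n/a", "not available"}


def split_sample_sheet(handle, reads_column="reads"):
    """Split rows into (sequenced, skipped); reject empty file names.

    `reads_column` is an assumed column name, not taken from the pipeline.
    """
    sequenced, skipped = [], []
    # Data rows start on line 2 because line 1 holds the header.
    for line_no, row in enumerate(csv.DictReader(handle), start=2):
        value = (row.get(reads_column) or "").strip()
        if not value:
            # An empty field is treated as a likely mistake, never as intent.
            raise ValueError(
                f"line {line_no}: empty '{reads_column}' field; "
                "use 'N/A' if the sample was not sequenced"
            )
        if value.lower() in NOT_SEQUENCED:
            skipped.append(row)  # explicitly marked as not sequenced
        else:
            sequenced.append(row)
    return sequenced, skipped


sheet = io.StringIO("name,reads\nA,a.fastq.gz\nB,N/A\n")
sequenced, skipped = split_sample_sheet(sheet)
```

With this approach only the `sequenced` rows would be passed on to the pipeline, while the `skipped` rows stay in the shared document for their phenotype information.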