linsalrob / partie

PARTIE is a program to partition sequence read archive (SRA) metagenomics data into amplicon and shotgun data sets. The user-supplied annotations of the data sets cannot be trusted, so PARTIE separates the data automatically.

Repeated entry in tables (SRR360776)

luispedro opened this issue

The run SRR360776 appears twice in the tables:

$ grep SRR360776 SRA_PARTIE_DATA.txt SRA_Metagenome_Types.tsv    
...
SRA_PARTIE_DATA.txt:SRR360776   86.69725743     0       0       0.01    AMPLICON
SRA_PARTIE_DATA.txt:SRR360776   86.6972574342988        0       0       0.0100000000000051      WGS
...
SRA_Metagenome_Types.tsv:SRR360776      AMPLICON
SRA_Metagenome_Types.tsv:SRR360776      WGS

(removed some irrelevant matches)

Most confusingly, as you can see, it is classified once as AMPLICON and once as WGS.
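For anyone wanting to check the tables for other runs like this one, here is a minimal sketch, not part of PARTIE itself. It assumes each line is whitespace-separated with the run ID in the first column and the classification in the last column, which matches the lines shown above for both files. The function name `find_conflicting_runs` and the run `SRR000001` are hypothetical, made up for illustration.

```python
from collections import defaultdict


def find_conflicting_runs(lines):
    """Return {run_id: sorted labels} for runs that carry more than one label.

    Assumption: whitespace-separated columns, run ID first, label last.
    """
    labels = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        run_id, label = fields[0], fields[-1]
        labels[run_id].add(label)
    # Keep only the runs that were given two or more different labels.
    return {run: sorted(found) for run, found in labels.items() if len(found) > 1}


sample = [
    "SRR360776\tAMPLICON",
    "SRR360776\tWGS",
    "SRR000001\tWGS",  # hypothetical run, included only as a non-conflicting row
]
print(find_conflicting_runs(sample))
```

To run it against a real file, pass an open file handle instead of the list, for example `find_conflicting_runs(open("SRA_Metagenome_Types.tsv"))`. Note that this only reports conflicting labels; a run repeated with the same label twice would not be flagged.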

I don't know why this happened. We are trying to figure out why it would have been misclassified as AMPLICON.

However, we ran the classifier 100 times and it came out as WGS every time. Moreover, that sample is a metatranscriptome, so it should be WGS.

Fixing.

Files edited and closed.