AdmiralenOla / Scoary

Pan-genome wide association studies

Don't enforce "Non-unique gene name" and "Annotation" columns

AdmiralenOla opened this issue

Remove the enforcement of the columns "Non-unique gene name" and "Annotation" in the output. Some users might have an input file with only a single identifier column (Gene ID) before the sample information starts, and want to run with -s 2.

In the current version, this causes Scoary to fill the "Non-unique gene name" and "Annotation" columns with sample data, because it automatically assumes that this information is found in columns 2 and 3. There is really no need to enforce any column other than Gene ID.
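To make the proposal concrete, here is a minimal sketch (not Scoary's actual code) of splitting each row using only the start column, assuming `start_col` is 1-based like the -s option above. Only Gene ID is required. Whatever sits between it and the sample columns is treated as optional metadata. The names `split_row` and `parse_gene_table` are hypothetical.

```python
import csv
import io


def split_row(row, start_col):
    """Split one CSV row into (gene ID, metadata, samples).

    start_col is 1-based: the first column holding sample data
    (assumed to mirror the -s option discussed in this issue).
    """
    if start_col < 2:
        raise ValueError("start_col must leave room for the Gene ID column")
    gene_id = row[0]
    metadata = row[1:start_col - 1]  # zero or more optional columns
    samples = row[start_col - 1:]
    return gene_id, metadata, samples


def parse_gene_table(text, start_col):
    """Parse a gene presence/absence table without assuming columns 2 and 3."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    _, meta_names, sample_names = split_row(header, start_col)
    genes = {}
    for row in reader:
        gene_id, metadata, samples = split_row(row, start_col)
        genes[gene_id] = {
            # Metadata columns are only reported if they exist in the input
            "metadata": dict(zip(meta_names, metadata)),
            "samples": dict(zip(sample_names, samples)),
        }
    return genes
```

With a file that has only a Gene ID column before the samples (`-s 2`), no sample value leaks into a metadata field:

```python
table = "Gene,strainA,strainB\ngroup_1,1,0\ngroup_2,0,1\n"
genes = parse_gene_table(table, 2)
# genes["group_1"] == {"metadata": {}, "samples": {"strainA": "1", "strainB": "0"}}
```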

Actually, I was just about to suggest an alternative, allowing the user to specify column numbers to be included in the output (so I can see the gene numbers of specific strains in the dataset in the Scoary output).

I am now modifying the "Non-unique gene name" column for this and will then split it out.
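The request above can be sketched as copying user-chosen input columns, such as one strain's locus tags, into the results, keyed by Gene ID. This is an illustration only: the helper `include_input_columns` is hypothetical, and it assumes 1-based column numbers. It does not reproduce how Scoary's real `--include_input_columns` option parses its argument.

```python
import csv
import io


def include_input_columns(text, columns):
    """Return {gene_id: {column_name: value}} for the requested 1-based columns.

    Hypothetical helper: it only illustrates carrying selected input
    columns through to the output.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    for col in columns:
        if not 1 <= col <= len(header):
            raise IndexError(f"column {col} is outside 1..{len(header)}")
    picked = {}
    for row in reader:
        # Column 1 is the Gene ID, used as the key for each output row
        picked[row[0]] = {header[c - 1]: row[c - 1] for c in columns}
    return picked
```

For example, picking the two strain columns from a small Roary-like table:

```python
table = (
    "Gene,Annotation,strainA,strainB\n"
    "group_1,hypothetical protein,A_00001,\n"
    "group_2,transporter,A_00002,B_00007\n"
)
tags = include_input_columns(table, [3, 4])
# tags["group_2"] == {"strainA": "A_00002", "strainB": "B_00007"}
```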

Hi! Trying to wrap my head around this, but I don't quite see how it would work. I think I'm confused by "gene numbers of specific strains in the dataset". Do you mean grabbing columns from the input Roary file or producing some kind of aggregate column? Would you mind giving an example?

OK, I think I understand what you mean now. Sure, I can implement that, should be fairly easy! I will schedule it for the next release.

Hi @dutchscientist. This functionality is included in the latest version. Hope you like it!

Yes, this is great! Exactly what I wanted, the --include_input_columns is just what I needed. Thanks very much!