nextstrain / nextclade

Viral genome alignment, mutation calling, clade assignment, quality checks and phylogenetic placement

Home Page:https://clades.nextstrain.org


PCR primer mutation functionality[v3]

sureman opened this issue · comments

This is a request to restore the --input-pcr-primers functionality in CLI v3 as we do use this in our pipelines.

Thanks for reaching out with this feedback! The functionality is still present; what has changed is how the config is passed (in v3, as JSON within --input-pathogen-json rather than as a CSV with its own CLI argument). Could you let us know how you're using the CLI flag and the feature?

We weren't aware of anyone using this feature with custom primers. If you could give a bit more context, I'm sure we can figure out a way. What you want is likely already possible with v3.0.0.

Good reminder for me to complete that part of the v3 documentation.

Thank you for responding so quickly. We were using the --input-pcr-primers parameter to input a VarSkip primer tsv such as:

Country (Institute) Target Oligonucleotide Sequence
varskip_neb_vss2b nCoV-2019_1 varskip-0317-1_1_LEFT GGTAACAAACCAACCAACTTTCGA

Given this file, Nextclade would return the primer, mutation, and genome position in the pcrPrimerChanges field of the report whenever a primer mutation was detected.
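For pipelines that read this field downstream, a minimal Python sketch of pulling the affected samples out of the report might look like the following. It assumes the report is the tab-separated file written by --output-tsv and that it has `seqName` and `pcrPrimerChanges` columns. The cell content is treated as an opaque string, because its exact layout is not described in this thread. The function name `samples_with_primer_changes` and the example rows are made up for illustration.

```python
import csv
import io


def samples_with_primer_changes(report, name_col="seqName", changes_col="pcrPrimerChanges"):
    """Return {sequence name: raw primer-change cell} for rows with a non-empty cell.

    `report` is any iterable of text lines, e.g. an open file or io.StringIO.
    The column names are assumptions about the report layout.
    """
    reader = csv.DictReader(report, delimiter="\t")
    hits = {}
    for row in reader:
        # Missing or empty cells both count as "no primer changes reported".
        changes = (row.get(changes_col) or "").strip()
        if changes:
            hits[row[name_col]] = changes
    return hits


# Tiny invented report, used only to exercise the function.
example = io.StringIO(
    "seqName\tclade\tpcrPrimerChanges\n"
    "sample_A\tplaceholder_clade\t\n"
    "sample_B\tplaceholder_clade\tplaceholder_change\n"
)
print(samples_with_primer_changes(example))
```

In a real pipeline the `io.StringIO` object would be replaced by the opened report file, for example `open(path, newline="")`.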

Thanks for the quick reply @sureman, that's useful! Good to see the feature used for sequencing primers!

To reassure you, the PCR primer feature hasn't been removed; only the way the config file is passed has changed. I will document the how-to. For now, if the feature is important to your use case, there's nothing wrong with continuing to use v2. There's no degradation of v2 in any way, and datasets still work the same (though there likely won't be new v2 datasets, for SARS-CoV-2 that will only matter in 4-6 weeks anyway), so we have a bit of time.

The current v3.0.0 way of doing things would involve (if I understand your usage correctly):

  1. Downloading the dataset you would like to use, e.g. sars-cov-2, and putting it into a folder
  2. Patching the pathogen.json file by adding your primer details at the appropriate path (e.g. with a short Python script that takes a csv or bed file plus the downloaded pathogen json and outputs the patched/customized pathogen json)
  3. Running nextclade v3 with the downloaded dataset, except that you override the pathogen json by passing the patched one via --input-pathogen-json

Don't worry, the full docs are yet to come; this is just a quick summary of the steps needed to get it working with v3 as is.

Just so I have the full context, could you maybe tell me a bit more about:

  • which dataset you usually use
  • what other CLI arguments you use

Thanks @corneliusroemer! That all sounds good. We are pulling the sars-cov-2 dataset with the following arguments:

Previous arguments: nextclade run --verbose --jobs !{task.cpus} --input-pcr-primers !{primers} --input-tree !{tree} --input-qc-config !{qc} --output-tsv bcftools_nextclade/!{sample}_nextclade_report.tsv --output-errors bcftools_nextclade/!{sample}_nextclade_report.errors --output-insertions bcftools_nextclade/!{sample}_nextclade_report.insertions --input-root-seq !{reference} --input-gene-map !{genemap} --input-virus-properties !{virus_properties} !{fasta}

Updated arguments for v3 (without primers): nextclade run --verbose --jobs !{task.cpus} --input-tree !{tree} --output-tsv nextclade/!{sample}_nextclade_report.tsv --input-ref !{reference} --input-annotation !{genemap} --input-pathogen-json !{virus_properties} !{fasta}

Thanks a lot @sureman for sharing how you use Nextclade, that makes it easier to see what might work and what wouldn't.

Based on your current command, you seem to download a dataset before running and then pass each dataset file individually. Do you do that to have better control over each input file, or because you were not aware of the option to pass a whole directory via -D, --input-dataset <INPUT_DATASET> (path to a directory or a zip file containing a dataset)? Individual dataset files specified via their own CLI args take precedence over what's passed through -D, so you could make the invocations more ergonomic by changing them to:

# v2 using -D
nextclade2 run --verbose --jobs !{task.cpus} --input-dataset !{dataset_directory} --input-pcr-primers !{primers} --output-tsv bcftools_nextclade/!{sample}_nextclade_report.tsv --output-errors bcftools_nextclade/!{sample}_nextclade_report.errors --output-insertions bcftools_nextclade/!{sample}_nextclade_report.insertions

# v3 equivalent
nextclade run --verbose --jobs !{task.cpus} --input-dataset !{dataset_directory} --input-pathogen-json !{patched_pathogen_json} --output-tsv bcftools_nextclade/!{sample}_nextclade_report.tsv

Where the patched pathogen json would be the result of the original pathogen json + your custom primer config.

I'll write down the v3 primer config docs soon and let you know once they are ready.

If you have any other feedback or feature requests, feel free to open more issues. It's always appreciated to hear from real-world CLI users. In contrast to the GUI/web, where we can get a rough idea of usage via privacy-preserving web analytics, we don't know at all how the CLI is used in the wild (beyond what a GitHub code search reveals).

Thanks @corneliusroemer. We are aware of the --input-dataset parameter but as you thought, we do pass each dataset file individually to have better control. Patching the pathogen json file should work well for us and we look forward to the documentation.

Thank you for your quick response and solution and for all of the great work that you have done for the community.

@sureman Here's my first attempt at documenting how to add primers to the v3 pathogen json: https://github.com/nextstrain/nextclade/pull/1389/files

I hope this is already helpful. If you have questions, feel free to comment on the PR or even inline in the code.

The above works up to 3.0.1, but we've removed it in later versions, so ignore it if you come across this.

Thanks @corneliusroemer! We'll work on implementing and testing on our side. Your efforts are very much appreciated!

@sureman We decided to bring back the CSV version soon. Also, the proposed documentation is incorrect. I'll post here when the CSV version is back.

The functionality has been re-added in pull request #1398. We are currently testing how well it works. You can help by downloading CLI binaries here, testing it by making a few runs with your custom PCR primers, and reporting back.

@sureman Please ignore my comments above regarding adding primers to the pathogen.json: the now-merged PR that @ivan-aksamentov mentioned has removed that functionality. So the way forward is to simply pass primers like you did in v2, with --input-pcr-primers /path/to/primers.csv
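For anyone wrapping this in a script, here is a minimal Python sketch of assembling such an invocation. It uses only flags that appear in this thread (--verbose, --jobs, --input-dataset, --input-pcr-primers, --output-tsv) and assumes a Nextclade CLI version in which --input-pcr-primers is available again. The function name `build_nextclade_command` and all file paths are placeholders invented for illustration.

```python
import shlex


def build_nextclade_command(fasta, dataset_dir, primers_csv, output_tsv, jobs=1):
    """Assemble the argument list for a Nextclade run with custom PCR primers.

    Nothing is executed here; the caller decides how to run the command.
    """
    return [
        "nextclade", "run",
        "--verbose",
        "--jobs", str(jobs),
        "--input-dataset", str(dataset_dir),
        "--input-pcr-primers", str(primers_csv),
        "--output-tsv", str(output_tsv),
        str(fasta),  # input sequences go last, as in the commands above
    ]


# Placeholder paths, for illustration only.
cmd = build_nextclade_command(
    fasta="sample.fasta",
    dataset_dir="datasets/sars-cov-2",
    primers_csv="primers.csv",
    output_tsv="nextclade/sample_nextclade_report.tsv",
    jobs=4,
)
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids quoting problems when paths contain spaces.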

Thanks @ivan-aksamentov and @corneliusroemer for all of your efforts. We will test as soon as possible.

This was released in Nextclade CLI 3.1.0. Please comment or open another issue if there are remaining problems.