CDCgov / SARS-CoV-2_Sequencing

A collection of sequencing protocols and bioinformatic resources for SARS-CoV-2 sequencing.

Regarding Oxford Nanopore data analysis

ps120195 opened this issue · comments

Was there a specific issue, or is this more of a philosophical conjecture?

Happy to help. Which pipeline? Attachments were missing.

I ran it three times, and I still do not get the VCF details shown in image 1. Instead, it says the fasta sequence does not match the REF allele ... and so on

If these are two different systems, are you sure that the Perl environment and all dependencies are the same version on both?

commented

I'm not clear on the difference between screenshots 1 and 2.

In screenshot 1, it looks like it finished correctly. Did you get a reasonable consensus in 'consensus.fasta'?

In screenshot 2, something went wrong. Are you using the same reference fasta that was used for read mapping? Bcftools is very picky about the vcf and the reference to which it applies variants. It may be possible that the reference was getting masked incorrectly, but I can't work out why that would be. I wonder if you could check the samtools depth at position 8782 and potentially let me have a look at your vcf? Interestingly, position 8782 is one where we have observed a lot of variation.
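As a rough cross-check of the "does not match the REF allele" message, the reference base at a given 1-based position can be compared with the REF column of the VCF using only standard tools. This is a minimal sketch, not part of the pipeline: the function names `fasta_base_at` and `vcf_ref_at` are hypothetical, and it assumes a single-sequence fasta and an uncompressed VCF. Carriage returns are stripped before counting so that Windows line endings are not counted as bases.

```shell
# Print the base at a 1-based position in a single-sequence fasta file.
# Hypothetical helper; carriage returns are removed so CRLF files give the same answer.
fasta_base_at() {
    fasta=$1
    pos=$2
    tr -d '\r' < "$fasta" | awk -v pos="$pos" '
        /^>/ { next }                       # skip the header line
        {
            if (seen + length($0) >= pos) { # the position falls on this line
                print toupper(substr($0, pos - seen, 1))
                exit
            }
            seen += length($0)              # bases consumed so far
        }'
}

# Print the REF allele(s) recorded at a position in an uncompressed VCF.
vcf_ref_at() {
    vcf=$1
    pos=$2
    awk -F '\t' -v pos="$pos" '!/^#/ && $2 == pos { print $4 }' "$vcf"
}
```

For a SNP, `fasta_base_at MN908947.3.fasta 8782` and `vcf_ref_at VIC07_ONT.vcf 8782` would be expected to print the same base if the reference and the VCF agree at that site.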

consensus_and_reference
consensus2.fasta is my consensus fasta, and MN908947.3.fasta is my reference file, which I also used for mapping. I am also getting that C to T variant at the same position, 8782.
vcf_location

samtools depth at position 8782 is 1871

Here I ran the pipeline from start to finish, and the result is still the same. Please see the screenshot.
full_pipeline

commented

Hmm. I'd like to get to the bottom of this, but I need a little more info. Can you show me the output of the following:

bcftools view VIC07_ONT.vcf | grep -EC3 "\s8782\s"
bcftools view VIC07_ONT.vcf.masked.vcf.gz | grep -EC3 "\s8782\s"

bcftools_view_EC
It was -EC3, sorry.

commented

Those look OK to me. The only other thing I can think of is that there is something funky going on with the reference. Can you try running dos2unix MN908947.3.fasta and then running the script again? If that is the issue, I can make a change to fix this (I will add it in any case).
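Before converting anything, it can help to confirm whether the reference actually has Windows line endings. The following is a minimal sketch using only `awk` and `tr`; the function names `count_crlf_lines` and `strip_crlf` are hypothetical and are not part of the pipeline or of dos2unix.

```shell
# Count the lines that end in a carriage return (Windows-style CRLF endings).
count_crlf_lines() {
    awk '/\r$/ { n++ } END { print n + 0 }' "$1"
}

# Write a copy of the file with every carriage return removed.
# For a plain-text fasta this has roughly the same effect as dos2unix;
# the original file is left untouched.
strip_crlf() {
    tr -d '\r' < "$1" > "$2"
}
```

A non-zero count from `count_crlf_lines MN908947.3.fasta` would suggest the file was saved with Windows line endings; `strip_crlf MN908947.3.fasta MN908947.3.unix.fasta` then writes a converted copy under a new (here arbitrary) name.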

Yes, sure.

I tried the dos2unix command and ran the full script again. Still no change in the output.
Cap1
Cap2

I have now tried using MN908947.fna instead of MN908947.fasta, and it worked.
See the output
cap3

commented

OK, so it looks like you converted the line endings for "MN908947.fasta" and it worked. Using "MN908947.fna" (which is identical, except that its line endings were not converted to Unix line endings) throws the error. I think these are all consistent, unless I misunderstand you. I will make a change so that the script handles fasta files with Windows line endings.

Thank you for helping me out. I learnt a lot during this error hunt. Now that the error is resolved, I want to know whether I have to use only a .fna file for this pipeline.

commented

No worries! Glad you caught this, as it's an easy fix but annoying for users.
The filename doesn't matter, as long as the fasta header is the same and (for now) the Windows line endings of your reference file are converted to Unix line endings.
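One way to check the header condition is to compare the sequence ID in the reference fasta with the contig names in the VCF's CHROM column. This is a sketch under stated assumptions: `fasta_id` and `vcf_contigs` are hypothetical helper names, the fasta holds a single sequence, and the VCF is uncompressed.

```shell
# Print the sequence ID from the first fasta header:
# the text after ">" up to the first whitespace.
# Carriage returns are removed first so a CRLF file reports the same ID.
fasta_id() {
    head -n 1 "$1" | tr -d '\r' | sed 's/^>//' | awk '{ print $1 }'
}

# Print the distinct contig names used in the CHROM column of an uncompressed VCF.
vcf_contigs() {
    awk -F '\t' '!/^#/ { print $1 }' "$1" | sort -u
}
```

If `fasta_id MN908947.3.fasta` and `vcf_contigs VIC07_ONT.vcf` print the same single name, the header and the VCF refer to the same sequence, whatever the file is called.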

commented

@dmaccannell I think this can be closed