Parsoa / SVDSS

Improved structural variant discovery in accurate long reads using sample-specific strings (SFS)

How to prepare a BAM file

kuangzhuoran opened this issue

"It requires as input the BAM file of the sample to be genotyped."
For this step: "SVDSS smooth --bam sample.bam --workdir $PWD --reference GRCh38.fa --threads 16",
my understanding is that the HiFi reads (CCS data) are mapped to the reference genome to produce this BAM file.

If I have genomes and HiFi data (CCS data) for multiple species and need to make inter- and intra-species comparisons, should all the HiFi data be mapped to the same genome, or should each sample be mapped to its own genome?

Thanks a lot!

Hi,

To do smoothing, you need to map your input CCS reads to the reference genome and then run SVDSS smooth on the resulting BAM file.
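
As a rough sketch of those two steps, the snippet below only assembles the command lines and does not run anything. The choice of minimap2 (with its `map-hifi` preset) and samtools is an assumption, since the thread does not say which aligner to use; the file names and the `build_commands` helper are hypothetical. The `SVDSS smooth` options are the ones quoted in the question above. Passing the same FASTA to the mapping step and to `SVDSS smooth` keeps the contig names consistent between the BAM and the reference.

```python
import shlex


def build_commands(reads, reference, bam="sample.bam", threads=16, workdir="."):
    """Return shell command lines for mapping, indexing and smoothing.

    Assumption: minimap2 + samtools are used for the mapping step.
    The same reference FASTA is passed to both the aligner and SVDSS.
    """
    q = shlex.quote  # protect paths containing spaces or shell metacharacters
    map_cmd = (
        f"minimap2 -ax map-hifi -t {threads} {q(reference)} {q(reads)}"
        f" | samtools sort -@ {threads} -o {q(bam)} -"
    )
    index_cmd = f"samtools index {q(bam)}"
    smooth_cmd = (
        f"SVDSS smooth --bam {q(bam)} --workdir {q(workdir)}"
        f" --reference {q(reference)} --threads {threads}"
    )
    return [map_cmd, index_cmd, smooth_cmd]


if __name__ == "__main__":
    # Hypothetical input names, for illustration only.
    for cmd in build_commands("sample.ccs.fastq.gz", "GRCh38.fa"):
        print(cmd)
```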

What sort of comparison are you trying to perform? SVDSS is not directly meant for comparative analysis. You can, however, genotype each of your samples individually and then compare the variants.

If you have several samples of the same species, one option is to map all of them to the same reference genome, genotype them with SVDSS against that reference, and then compare the genotypes using other analysis tools.
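
To illustrate what comparing per-sample calls against a shared reference could look like, here is a deliberately naive sketch. It assumes each sample's calls are in a VCF whose INFO column carries `SVTYPE` and `SVLEN` (an assumption about the output, not something stated in this thread), and it only reports exact matches on chromosome, position, type and length. Breakpoints of the same SV often differ by a few bases between samples, so a real comparison would need some tolerance; the function names are hypothetical.

```python
def sv_keys(vcf_text):
    """Return the set of (chrom, pos, svtype, svlen) keys found in a VCF."""
    keys = set()
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        cols = line.split("\t")
        if len(cols) < 8:
            continue  # not a complete VCF record
        # INFO is a ';'-separated list; keep only KEY=VALUE entries.
        info = dict(
            item.split("=", 1) for item in cols[7].split(";") if "=" in item
        )
        keys.add((cols[0], int(cols[1]), info.get("SVTYPE"), info.get("SVLEN")))
    return keys


def compare(vcf_a, vcf_b):
    """Split the calls of two samples into shared and sample-specific sets."""
    a, b = sv_keys(vcf_a), sv_keys(vcf_b)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}
```

Because both samples are genotyped against the same reference, the coordinates are directly comparable; this would not hold if each sample were mapped to its own genome.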

You may find our earlier method PingPong useful for comparative analysis. SVDSS is based on PingPong.

After this step: "SVDSS smooth --bam sample.bam --workdir $PWD --reference GRCh38.fa --threads 16"
smoothed_reads.txt and ignored_reads.txt in the workdir are both empty. The log shows:
[I] Processed batch 1. Reads so far 20000. Reads per second: 20000. Time: 1
[I] Processed bases: 1, num mismatch: 0, mismatch rate: 0, ignored reads: 0
[I] Processed batch 2. Reads so far 30000. Reads per second: 3750. Time: 8
[I] Processed bases: 1, num mismatch: 0, mismatch rate: 0, ignored reads: 0

Hmm, that "[I] Processed bases: 1" is quite strange. If I'm not wrong, that number should be the total number of bases processed (and it is initialized to 1). It seems that all reads have been filtered out (smoothed_reads.txt should be non-empty).

Some reasons why this could happen:

  • there is no primary alignment (but I don't think this is the case)
  • the .bam is corrupted (as above)
  • every read is aligned to a chromosome not present in the input reference

    SVDSS/smoother.cpp

    Lines 265 to 268 in dc0333e

    string chrom(bam_header->target_name[alignment->core.tid]);
    if (chromosome_seqs.find(chrom) == chromosome_seqs.end()) {
        continue;
    }
  • alignments are too dirty (and then are skipped)
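
The third case can be checked without touching SVDSS. The sketch below compares the contig names in the BAM header (the `@SQ` lines, e.g. from `samtools view -H sample.bam`) with the sequence names in the reference FASTA. It assumes the reference names are the first whitespace-delimited token of each `>` line; how SVDSS actually keys `chromosome_seqs` is not shown in the excerpt above, so treat that as an assumption. The function names are hypothetical.

```python
def sam_header_contigs(header_text):
    """Contig names (SN: fields) from the @SQ lines of a SAM/BAM header."""
    names = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def fasta_contigs(fasta_text):
    """Sequence names from FASTA headers: text after '>' up to the first whitespace."""
    names = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.append(tokens[0])
    return names


def missing_from_reference(header_text, fasta_text):
    """BAM contigs that have no sequence of the same name in the reference."""
    reference = set(fasta_contigs(fasta_text))
    return [name for name in sam_header_contigs(header_text) if name not in reference]
```

If every contig in the BAM header is reported as missing (for example "1" in the BAM versus "chr1" in the FASTA), every alignment would hit the `continue` shown above, which would be consistent with empty output files.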

How did you map the reads? If possible, could you share the .bam?

Best,