No repeats were detected

HLHsieh opened this issue

Hi there,

Following your suggestions, I can now run LongTR on my data as follows:

LongTR --bams C9ORF72_1_10R.sorted.bam --fasta $dir/genome.fa --regions $dir/test.bed --tr-vcf $dir/test.vcf.gz --phased-bam --bam-samps C9ORF72_1 --bam-libs C9ORF72_1

The output was:

Using phased BAM tags to genotype and phase TRs (WARNING: Any arguments provided to --snp-vcf will be ignored)
Detected 1 BAM/CRAM files
User-specified read groups for 1 unique samples
Reading region file /scratch/kinfai_root/kinfai0/hsinlun/C9ORF72_1_9R_NanoSim_2x/test.bed
Region file contains 1 regions

Processing region chr9 27573528 27573546
111 reads overlapped region, of which
	0 were hard clipped
	0 had an 'N' base call
	0 had low MAPQ
	111 had low base quality scores
	0 did not span the STR
	0 did not have a unique mapping
Phased SNPs add info for 0 out of 0 reads
Skipping locus with too few reads: TOTAL=0, MIN=10

------HipSTR Execution Summary------
Skipped 1 loci with too few reads for stutter model model training or genotyping.
	 If this is a sizeable portion of your loci, see the --min-reads command line option
Genotyping succeeded for 0/0 loci

Approximate timing breakdown
 BAM seek time       = 0.007054 seconds
 Read filtering      = 0.065989 seconds
 SNP info extraction = 2.6e-05 seconds
 Genotyping          = 0 seconds
	 Trimming alignment        = 0 seconds
	 Haplotype generation  = 0 seconds
	 Haplotype alignment   = 0 seconds
LongTR execution finished: Total runtime = 3.21404 sec

Therefore, no repeats were detected. This data should contain some repeat regions. I am wondering if there is something wrong with the command or if you have any suggestions on how I can adjust it accordingly.

Many thanks,

Hi Hsin,

It seems that your locus was skipped because all overlapping reads had low base quality. You can lower the threshold by specifying a value for --min-mean-qual; the default value is 30.


Hi Helia,

Thank you for your quick response. I tried the following command, but got the same message.

LongTR --bams C9ORF72_1_10R.sorted.bam --fasta $dir/genome.fa --regions $dir/test.bed --tr-vcf $dir/test.vcf.gz --phased-bam --bam-samps C9ORF72_1 --bam-libs C9ORF72_1 --min-mean-qual 0 --min-reads 1
Detected 1 BAM/CRAM files
User-specified read groups for 1 unique samples
Reading region file /scratch/kinfai_root/kinfai0/hsinlun/C9ORF72_1_9R_NanoSim_2x/test.bed
Region file contains 1 regions

Processing region chr9 27573528 27573546
111 reads overlapped region, of which
	0 were hard clipped
	0 had an 'N' base call
	0 had low MAPQ
	111 had low base quality scores
	0 did not span the STR
	0 did not have a unique mapping
Phased SNPs add info for 0 out of 0 reads and 0 out of 0 samples
Skipping locus with too few reads: TOTAL=0, MIN=1

------HipSTR Execution Summary------
Skipped 1 loci with too few reads for stutter model model training or genotyping.
	 If this is a sizeable portion of your loci, see the --min-reads command line option
Genotyping succeeded for 0/0 loci

Approximate timing breakdown
 BAM seek time       = 0.006804 seconds
 Read filtering      = 0.065036 seconds
 SNP info extraction = 4.3e-05 seconds
 Genotyping          = 0 seconds
	 Trimming alignment        = 0 seconds
	 Haplotype generation  = 0 seconds
	 Haplotype alignment   = 0 seconds
LongTR execution finished: Total runtime = 3.1865 sec

Any suggestions would be appreciated.

Many thanks,

I have had to specify "-1" for that parameter (--min-mean-qual) to avoid removing all (nanopore) reads.


Thanks. Specifying "-1" for that parameter keeps all reads, but still no repeats were detected in the output file.


Detected 1 BAM/CRAM files
User-specified read groups for 1 unique samples
Reading region file /scratch/kinfai_root/kinfai0/hsinlun/C9ORF72_1_9R_NanoSim_2x/test.bed
Region file contains 1 regions

Processing region chr9 27573528 27573546
111 reads overlapped region, of which
        0 were hard clipped
        0 had an 'N' base call
        0 had low MAPQ
        0 had low base quality scores
        0 did not span the STR
        0 did not have a unique mapping
Phased SNPs add info for 0 out of 111 reads and 0 out of 1 samples
Trimming reads
Failed to trim align 6 out of 111 reads
Generating candidate haplotypes


Added 2 inexact haplotypes generated by POA
Aborting genotyping of the locus as the sequence upstream of the repeat is too repetitive for accurate genotyping
Locus timing:
 BAM seek time       = 0.0069 seconds
 Read filtering      = 0.085952 seconds
 SNP info extraction = 0.005669 seconds
 Stutter estimation  = 4e-06 seconds
 Genotyping          = 0.067338 seconds
        Trim alignment        = 0.057342 seconds
         Haplotype generation  = 0.009429 seconds
         Haplotype alignment   = 0 seconds

------HipSTR Execution Summary------
Genotyping succeeded for 0/1 loci

Approximate timing breakdown
 BAM seek time       = 0.0069 seconds
 Read filtering      = 0.085952 seconds
 SNP info extraction = 0.005669 seconds
 Genotyping          = 0.067338 seconds
         Trimming alignment        = 0.057342 seconds
         Haplotype generation  = 0.009429 seconds
         Haplotype alignment   = 0 seconds
LongTR execution finished: Total runtime = 3.27426 sec

VCF file

##command=LongTR-638942f-dirty --bams C9ORF72_1_10R_NanoSim_50x.sorted.bam --fasta /scratch/kinfai_root/kinfai0/hsinlun/C9ORF72_1_9R_NanoSim_2x/genome.fa --regions /scratch/kinfai_root/kinfai0/hsinlun/C9ORF72_1_9R_NanoSim_2x/test.bed --tr-vcf /scratch/kinfai_root/kinfai0/hsinlun/C9ORF72_1_9R_NanoSim_2x/test.vcf.gz --bam-samps C9ORF72_1 --bam-libs C9ORF72_1 --min-mean-qual -1 --min-reads 1
My test bed file:

chr9    27573529        27573546        6       3       C9orf72

Any suggestions would be appreciated.

Many thanks,

Hi Hsin,

You are getting this error because LongTR is trying to assemble the flanking sequence, which is unnecessary with long reads. We recommend adding --skip-assembly to the command when using long reads to avoid this step.
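
For reference, here is a rough sketch of what the full command might look like with the flags discussed in this thread combined. The file names, sample names, and flag values are copied from the commands posted above. The `dir` default and the `longtr_cmd` array name are placeholders introduced only for illustration. The script prints the command rather than running it, so it can be checked before use.

```shell
#!/usr/bin/env bash
# Dry-run sketch: builds the LongTR command and prints it instead of executing it.
# "dir" is a placeholder default (assumption); point it at your own data directory.
dir="${dir:-/path/to/data}"

# --min-mean-qual -1 : keeps reads that the quality filter would otherwise remove
# --min-reads 1      : value used earlier in this thread
# --skip-assembly    : skips the flanking-sequence assembly step for long reads
longtr_cmd=(
  LongTR
  --bams C9ORF72_1_10R_NanoSim_50x.sorted.bam
  --fasta "$dir/genome.fa"
  --regions "$dir/test.bed"
  --tr-vcf "$dir/test.vcf.gz"
  --bam-samps C9ORF72_1
  --bam-libs C9ORF72_1
  --min-mean-qual -1
  --min-reads 1
  --skip-assembly
)

# Print the assembled command on one line for inspection.
printf '%s\n' "${longtr_cmd[*]}"
```

Once the printed command looks right, it can be executed by replacing the final `printf` line with `"${longtr_cmd[@]}"`.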
