T2T CHM13 genome mapping with a high fraction of unmapped reads, and parameter tuning didn't work
gnilihzeux opened this issue
Dear author,
A very high fraction of reads is reported as unmapped ('too short' and 'other') when mapping to the T2T CHM13
genome, but it worked for the hg19 genome. BTW, 83% of the reads map to T2T with bowtie2.
Our data is RNA-seq with ribosome fractions.
Our group modified some parameters related to repeats, including setting --winAnchorMultimapNmax higher, setting --outFilterMultimapNmax higher, and setting --alignIntronMin 1. But none of these changes worked.
What parameters should be set?
Thanks a lot.
The logs are as follows:
T2T
Started job on | Jan 25 02:22:54
Started mapping on | Jan 25 02:23:47
Finished on | Jan 25 02:43:58
Mapping speed, Million of reads per hour | 108.31
Number of input reads | 36434753
Average input read length | 283
UNIQUE READS:
Uniquely mapped reads number | 2537986
Uniquely mapped reads % | 6.97%
Average mapped length | 278.61
Number of splices: Total | 1204395
Number of splices: Annotated (sjdb) | 1136229
Number of splices: GT/AG | 1164313
Number of splices: GC/AG | 9637
Number of splices: AT/AC | 1101
Number of splices: Non-canonical | 29344
Mismatch rate per base, % | 0.48%
Deletion rate per base | 0.09%
Deletion average length | 1.85
Insertion rate per base | 0.04%
Insertion average length | 1.37
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 456923
% of reads mapped to multiple loci | 1.25%
Number of reads mapped to too many loci | 190199
% of reads mapped to too many loci | 0.52%
UNMAPPED READS:
% of reads unmapped: too many mismatches | 0.06%
% of reads unmapped: too short | 48.17%
% of reads unmapped: other | 43.03%
CHIMERIC READS:
Number of chimeric reads | 26935
% of chimeric reads | 0.07%
hg19
Mapping speed, Million of reads per hour | 230.11
Number of input reads | 36434753
Average input read length | 283
UNIQUE READS:
Uniquely mapped reads number | 8063144
Uniquely mapped reads % | 22.13%
Average mapped length | 285.69
Number of splices: Total | 1959408
Number of splices: Annotated (sjdb) | 1133733
Number of splices: GT/AG | 1246660
Number of splices: GC/AG | 36075
Number of splices: AT/AC | 2267
Number of splices: Non-canonical | 674406
Mismatch rate per base, % | 0.39%
Deletion rate per base | 0.12%
Deletion average length | 1.18
Insertion rate per base | 0.02%
Insertion average length | 1.12
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 27457561
% of reads mapped to multiple loci | 75.36%
Number of reads mapped to too many loci | 13369
% of reads mapped to too many loci | 0.04%
UNMAPPED READS:
% of reads unmapped: too many mismatches | 0.07%
% of reads unmapped: too short | 2.32%
% of reads unmapped: other | 0.09%
CHIMERIC READS:
Number of chimeric reads | 614048
% of chimeric reads | 1.69%
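To compare the two runs side by side, the logs above can be parsed programmatically. The following is a minimal sketch, assuming the text is in the usual `name | value` layout shown above; the function names (`parse_star_log`, `percent`, `unmapped_total`) are hypothetical helpers, not part of STAR. For the logs in this issue, the three "unmapped" categories sum to 91.26% for T2T versus 2.48% for hg19.

```python
def parse_star_log(text):
    """Parse STAR Log.final.out text into a {field name: value string} dict."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as "UNIQUE READS:" carry no value
        key, value = line.split("|", 1)
        stats[key.strip()] = value.strip()
    return stats


def percent(stats, key):
    """Return a percentage field such as '48.17%' as a float."""
    return float(stats[key].rstrip("%"))


def unmapped_total(stats):
    """Sum the three 'unmapped' percentages reported in the log."""
    keys = [
        "% of reads unmapped: too many mismatches",
        "% of reads unmapped: too short",
        "% of reads unmapped: other",
    ]
    return sum(percent(stats, k) for k in keys)
```

A possible use is to read each `Log.final.out` file, pass its contents to `parse_star_log`, and print `unmapped_total` for every genome, so that the effect of a parameter change shows up as a single number per run.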
Hi @gnilihzeux
I would recommend exploring the reads that were mapped by bowtie2 and not mapped by STAR.
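One way to follow this suggestion is to compare read names between the two aligners' outputs. The sketch below assumes both outputs are available as plain-text SAM (a BAM file would first need converting, for example with `samtools view -h`); `mapped_read_names` and `bowtie2_only` are hypothetical helper names. It relies only on the SAM format itself: header lines start with `@`, the first column is the read name, and bit 0x4 of the FLAG column marks an unmapped record.

```python
def mapped_read_names(sam_lines):
    """Return the set of read names that have at least one mapped record."""
    names = set()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        flag = int(fields[1])
        if not flag & 0x4:  # bit 0x4 set means this record is unmapped
            names.add(fields[0])
    return names


def bowtie2_only(bowtie2_sam_lines, star_sam_lines):
    """Read names mapped by bowtie2 that have no mapped record from STAR."""
    return mapped_read_names(bowtie2_sam_lines) - mapped_read_names(star_sam_lines)
```

Both arguments can be open file handles, since the functions only iterate over lines. The resulting names can then be used to pull the corresponding reads out of the FASTQ files for manual inspection.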
@alexdobin Yes, I seem to have found what happened to the unmapped reads: most of them are palindromic between Read1 and Read2 (one mate is the reverse complement of the other).
However, I have not found a solution to this problem yet.
Some sequences are listed as follows:
>@illumina:8501:1210 mate1
GAGGCATTTGGCTACCTTAAGAGAGTCATAGTTACTCCCGCCGTTTACCCGCGCTTCATTGAATTTCTTCACTTTG
>@illumina:8501:1210 mate2
CAAAGTGAAGAAATTCAATGAAGCGCGGGTAAACGGCGGGAGTAACTATGACTCTCTTAAGGTAGCCAAATGCCTC
>@illumina:36606:2009 mate1
AGCCGTCCCGGAGCCGGTCGCGGCGCACCGCCGCGGTGGAAATGCGCCCGGCGGCGGCCGGTCGCCGGTCGGGGGACGGTCCCCCGCCGACCCCACCCCCGGCCCCGCCCGCCCACCCCCGCACCCGCCGGAGCCCGCCCCCTCCGGGGA
>@illumina:36606:2009 mate2
GGCCGTGTCGGCGGCCCGGCGGATCTTTCCCGCCCCCCGTTCCTCCCGACCCCTCCACCCGCCCTCCCTTCCCCCGCCGCCCCTCCTCCTCCTCCCCGGAGGGGGCGGGCTCCGGCGGGTGCGGGGGTGGGCGGGCGGGGCCGGGGGTGG
>@illumina:36347:2009 mate1
ATCGGCGAGTGCTGCTGCCGGGGGGGCTGTAACACTCGGGGGGGGTTTCGGTCCCGCCGCCGCCGCCGCCGCCGCCACCGCCGCCGCGAGGGGGGGGGAATCA
>@illumina:36347:2009 mate2
TGATTCCCCCCCCCTCGCGGCGGCGGTGGCGGCGGCGGCGGCGGCGGCGGGACCGAAACCCCCCCCGAGTGTTACAGCCCCCCCGGCAGCAGCACTCGCCGAT
>@illumina:49804:3788 mate1
GTAGTTCACCATCTTTCGGGTCCTAACACGTGCGCTCGTGCTCCACCTCCCCGGCGCGGCGGGCGAGACGGGCCGGTGGTGCGCCCTCGGCGGACTGGAGAGGCATCGGGATCCCACCTCGGGAAGCG
>@illumina:49804:3788 mate2
CAAGGAGTCTAACACGTGCGCGAGTCGGGGGCTCGCACGAAAGCCGCCGTGGCGCAATGAAGGTGAAGGCCGGCGCGCTCGCCGGCCGAGGTGGGATCCCGAGGCCTCTCCAGTCCGCCGAGGGCGCACCACCGGCCCGTCTCGCCCGCC
>@illumina:38521:29680 mate1
GTTTCGGTCCCGCCGCCGCCGCCGCCGCCGCCACCGCCGCCGCCGCCGCCGCCCCGACCCGCGCGCCCTCCCGAGGGAGGACGCGGGGCCGGGGGGCGGAGACGGGGGAGGAGGAGGACGGACGGACGGACGGACGGGGCCCCCCGAGCC
>@illumina:38521:29680 mate2
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
>@illumina:46639:29680 mate1
TACTATTCAAAGTTCTTTTCAACTTTCCCTTACGGTACTTGTTGACTCCC
>@illumina:46639:29680 mate2
GGGAGTCAACAAGTACCGTAAGGGAAAGTTGAAAAGAACTTTGAATAGTA
>@illumina:48673:29712 mate1
CCCATTTAAAGTTTGAGAATAGGTTGAGATCGTTTTCGGCCCCAAGACCTCTAATCNTTCGCTTTACCGGATAAAACTGCGTGGCGGGGGTGCGTCGGGTCTGCGAGAGCGCCAGCTATCCTGAGGGAAACTTCGGAGGGAACCAGCTAC
>@illumina:48673:29712 mate2
GAAACTCTGGTGGAGGTCCGTAGCGGTCCTGACGTGCAAATCGGTCGTCCGACCTGGGTATAGGGGCNAAAGACTAATCGAACCATCTAGTAGCTGGTTCCCTCCGAAGTTTCCCTCAGGATAGCTNGCGCTCTCGCAGACCCGACGCAC
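The "palindrome" observation can be checked directly on the mate sequences. The sketch below tests whether mate2 is the exact reverse complement of mate1, which holds for some pairs in the listing above (for example illumina:46639:29680) but not for all of them. The helper names are hypothetical, and the check only describes the reads; it does not by itself explain why STAR leaves them unmapped.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def is_full_reverse_complement(mate1, mate2):
    """True if mate2 is exactly the reverse complement of mate1."""
    return len(mate1) == len(mate2) and reverse_complement(mate1) == mate2


def fraction_fully_overlapping(pairs):
    """Fraction of (mate1, mate2) pairs that are exact reverse complements."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    hits = sum(is_full_reverse_complement(m1, m2) for m1, m2 in pairs)
    return hits / len(pairs)


# Example with one pair from the listing above (illumina:46639:29680).
mate1 = "TACTATTCAAAGTTCTTTTCAACTTTCCCTTACGGTACTTGTTGACTCCC"
mate2 = "GGGAGTCAACAAGTACCGTAAGGGAAAGTTGAAAAGAACTTTGAATAGTA"
print(is_full_reverse_complement(mate1, mate2))
```

Running `fraction_fully_overlapping` over all unmapped pairs would give a rough estimate of how many of them consist of two mates covering the same fragment end to end.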