When I use bwa for mapping with grch37.p13.fa and hg19.fa, the results differ in some regions.
zhangshouwei309194 opened this issue
Dear author,
When I use bwa to map reads against grch37.p13.fa and hg19.fa, the results differ in some regions.
grch37.p13.fa: https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/
hg19.fa: https://hgdownload.cse.ucsc.edu/goldenpath/hg19/bigZips/
Here is an example. I used the same command for both genomes. For a SNP at chr1:206647742, the result is correct with hg19.fa but not with grch37.p13.fa.
hg19:
samtools mpileup -d 200000 -q 0 -r chr1:206647742-206647743 -f hg19.fa test1.markdup.bam
[mpileup] 1 samples in 1 input files
chr1 206647742 A 1316 G$G$G$G$G$GGGGGgggggGGGggggGGggGGGgggGGGGgggggggggggGGggggggGGGGGGGggGGgggGGGGGGGGGGGgggGGGGGGGGGGGGGGGGGggggGGGGGggggggGGGGGGGggggGGGGGgggGGGggggggggGgGgggGGGGGGGgggggggGGggggggGGGGgGGGGGGGGGGGGGGGGgGGGGGGGgGGggGGGgggGgggGGGGGGGGGGGGGGGGGGGGGGGgggGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGgGGGGGGGGGGGGGGGGGgGGGGGGggggGGGGGGGGGGgGGGGGGGGGGgggggggggGGGGGggggggggGGGGGGGGGGGGGGGggggGGGGGGGgggGGGGGGGGGGGGGGGgggGGGGGGGGGGGgGGGGGGGggGGGgggggGGGGGGGGGGGGGGggggGGGGGGGggGgggggggGGGgggggGGGGGGGGGGGGGGGGGGGGGGGGGGGGGggGGGGGGGGGGGGGGGGGGGGGGggGGGGGggGGGGGGGGGGgGGGGGGGGGGgggGGGGGGGGGGGGggggGGGgggGGGGGGGGGGGGGGgGGGGGggggGGGGGGGGggggGGGGGggGGGGGGGGGGGGGGGgggGGGGGGGGGGggggggggGGGGGGGGGGggGGGggggGGGGGgGGGGGGGGgggGgggggggGGggggggGGGGGGGGgGGGGGGGGGGgggGgggggGgGgGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGggGGGGGGGGGGGGggGGGGGGGGGGGGGGGGGGGGggGGGGGGGGGGGGGGGGGGGGGGgggGGGGGGGGGGGGGGggggGGgggGGGGGGGGGgGGGGGGGGGGGGGGGGGGGGGggGGGGGggGGGGGggGGGGggGGggggggGGGGGGGGGGGgggGGGGGGGGGGGGGGGGGgGGGGGGgGGGGGGGgGGGGGGGGGGGGGGgGGGGGGGGGgggggggGGGGGGGGGGGGGggGGGGGGGGGGgggggGGGGGGGGGGGGGGGGGGGGGGGGGggGGGGGGGGGGGGgGGGGGGGGGGGGGGGggGGGgGGGGGGGGGggggGGGGGGGGGGGGGGGGggGGGGGGGGGGGGGGGGGGGGGGGGGGGGGgGGGGGGGGGGGGggGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGgGGGGggGGGgGGGGGgGGGGGGGGGGGGGGGGGGGGGGgGGGGGgggGGGGGGGGGGGGggGggggggGGGgggGGGGGGGGGGGGgGGGGGGGGGGGGGGGGGggGGGgg 
FFFFFOOIO_:::::FFkFFFFkFFFkkFFFFkkkkFFFFF:FFFFFFFFFFFFFFkkkkFkFFkkFFFkkkkkkkFkkFFFFkkFFFFkFFk^FkFkFFFFFFkkkkFFFFFFFkkkFkkkFFFFkkkkkFF:kFkFFFFFFFFkFkFFFkFkF^FFFFFFFFFFFFFFFFFkkkkFFkkkkkFkFkkkFkkkFSkkkFkkFkkFFFkFFFFkFFFFkkkkkFkkkkkFFkkk^kkkkFFFFkkkkkFkkkkkFkkSkkkkkFkkkFkkSkFkkkkkFkkFkkkFkFkFkFkkkkFkkkkkkFFFFkkkkFkkkSkFkFkkFkkFFFFFFFFFFFFkkFkFFFFFFFFFFkSFFkk^kkFFkFkFFFFkkkFFFkFFFkkFFkkkkkkFkkkkFFF3kkkkkFFkF_FkkkkkkkFFkkkFFFFFFkkkkkkkFkFFkFFFFFkkkFkFkFFFFFFFFFFkkkFFFFFFFkkFFkkkkFkkFkkFkFkkkFkkFFFkFFkkkkkFFFFkFkFkkkkkFFkkFFkkFkFFFFkFkFFkkkkFkkkkkkkkFkFFFkkkFFkFFkFFkFFFFkkkFFFFFkkFkkkkkkkkkFFFFFkFFFFkkFkkkkFFFFFkFFkFFFkkFFkkFkkkkkkF:FFFkkFkkFkFkkFFFFFFFFkkFFFkkFFkFFkkFFFFFkFkkkFkkkFkkFkFFFFFFFFFFFkkFFFFFFkkkkFkFkFkFkFFFFFkkFFFkFFFFFkFFFFkFFkkkFkkFFFFFFFFFFkFFFkkFkFkkkFFFkFkkFFFkk_kFFFFkFFFFkFkkkkkkkFFFFFFFFFkFFFkFkkFFkkkFkFkkFkFFFkFkkFkFFFkFFkFFFFFFkFFFFkkFFkFkFFFFFFkkkkFFFkFkFkkkkkkFFFFFkF:FFkkFkFFFFFkFFFFFFFFFFFkFkkFFFkkFFFFkkFkFkkFFkFFFFkkFFkkkFkFFFFkFFkkFkkkkkkkkFFkFkkFFkFFkFkFFFFFFFFFkFkkFFFkkFkkFFFkFFFFFkkkFFFFFFkFFFkkFkFkFFkkkFkFFkFFFFFFFFFFkFkFFkFkFFFFFFFkFkkFFkkkFFFFkFFkkkkFkFkFFFFFFFkFFFFFFFFkFkFFFFFkFFFkkFFFkFkkkkkFkFFFFFkFFFFFFFkFFFFFFFFkFFkkkFFkkFFFFFkFFFFFFFFFkkFFFFkFkFFFkFFFFFFFkFF:FkFFFFFFFkFkFFFFFFFFFFFFkkFFkFkFFFkFFkFFFFFFFkFFFkFFFkFFkFFFFFFkFFFFFFFFkFkF:FFkkFk^FFFFkFFkFFFkFkFFFQ9Q99
chr1 206647743 G 1322 .$.$.$.$.,,,,,...,,,,..,,...,,,....,,,,,,,,,,,..,,,,,,.......,,..,,,...........,,,.................,,,,.....,,,,,,.......,,,,.....,,,...,,,,,,,,.,.,,,.......,,,,,,,..,,,,,,....,................,.......,..,,,...,,,.,,,.......................,,,...................................,,.................,......,,,,..........,..........,,,,,,,,,.....,,,,,,,,...............,,,,.......,,,...............,,,.$..........,.......,,...,,,,,..............,,,,.......,,.,,,,,,,,...,,,,,.............................,,......................,,.....,,..........,..........,,,............,,,,...,,,..............,.....,,,,........,,,,.....,,...............,,,..........,,,,,,,..........,,...,,,,.....,........,,,.,,,,,,,..,,,,,,........,..........,,,.,,,,,.,.,.................................,,...........,,,....................,,......................,,,..............,,,,,..,,,.........,.....................,,.....,,.....,,....,,..,,,,,,...........,,,.................,......,.......,..............,.........,,,,,,,.............,,..........,,,,,.........................,,............,...............,,...,.........,,,,................,,.............................,...........,,........................................,....,,,...,.....,......................,.....,,,............,,.,,,,,,...,,,............,.................,,...,,^].^].^].^].^].^].^].^], 
OOIO_:::::FFkFFFFkFFFkkFFFFkkkkFFFFFFFFFFFFFFFFFFFFkkkkFkFFkkFFFkkkkkkkFkkFFFFkkFFFFkFFk^FkFkF:FFFFkkkkFFFFFFFkkkFkkkFFFFkkkkkFFFkFkFFFFFFFFkFkFFFkFkF^FFFFFFFFFFFFFFFFFkkkkFFkkkkkFkFkkkFkkkFSkkkFkkFkkFFFFkFFFFkFFFFkkkkkFkkkkkFFkkk^kkkkFFFFkkkkkFkkkkkFkkSkkkkkFkkkFkkSkFkkkk_FFkkFkkkFkFkFkFkkkkFkkkkkkFFFFkkkkFkkkSkFkFkkFkkFFFFFFFFFFFFkkFkFFFFFFFFFFkSFFkk^kkFFkFkFFFFkkkFFFkFFFkkFFkkkkkkFkkkkFFF3kkkkkFFkFkFkkkkkkkFFkQkFFFFFFkkkkkkkFkFFkFFFFFkkkFkFkFFFFFFFFFFFkkkFFFFFFFkkFFkkkkFkkJkkFkFkkkFkkFFFkFFkkkkkFFFFkFkFkkkkkFFkkFFkkFkFFFJkFkFFkkkkFkkkkkk_kFkFFFkkkFFkFFkFFkFFFFkkkFFFFjkkFkkkkkkkkkFFFFFkFFFFkkFkkkkFFFFFkFFkFFFkkFFkkFkkkkkkFFFFFkkFkkFkFkkFFFFFFFkkFFFkkFFkFFkkFFFFFk:kkkFkkkFkkFkFFFFFFFFFFFkkFFFFFFkkkkFkFkFkFkFFFFFkkFFFkFFFFFkFFFFkFFkkkFkkFFFFFFFFFFkFFFkkFkFkkkFFFkFkkFFFkkkFF>FFkFFFFkFkkkkkkkFFFFFFFFFkFFFkFkkFFkkkFkFkkFkFFFkFkkFkFFFkFFkFFFFFFFk:FFFk_FFkFkFFFFFFkkkkFFFkFkFkkkkkkFFFFFkFFFFkkFkFFFFFkFFFFFFFFFFFkFkkFFFkkFFFFkkFkFkkFFkFFFFkkFFkkkFkFFFFkFFkkFkkkkkkkkFFkJkkFFkFFkFkFFFFFFFFFkFkkFFFkkFkkFFFkFFFFFkkkFFFFFFkFFFkkFkFkFFkkkFkFFkFFFFFFFFFFkFkFFkFkFFFFFFFkFkkFFkkkFFFFkFFkkkkFkFkFFFFFFFkJFFFFFFFkFkFFFFFkFFFkkFJFkFkkkkkFkFFFFFkFFFFFFFkFFFFFFFkFFkkkFFkkFFFFFkFFFFFFFFFkkFFFFkFkFFFkFFFFFFFkFFFFFkFFFFFF:kFkFFFFFFjFFFFFkkFFkFkFFFkFFkFFFFFFFkFFFkFFFkFFkFFFFFFkFFFFFFFFkFkFFF:kkFk^FFFFkFFkFFFkFkFFFQ9Q99iEEiiEEE
grch37.p13:
samtools mpileup -d 200000 -r 1:206647742-206647743 -q 0 -f GRCh37.p13.genome.fa test.markdup.bam
[mpileup] 1 samples in 1 input files
1 206647742 A 1 T F
1 206647743 G 6 .^!.^!.^!.^!.^!, FiEiiE
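The raw pileup strings are hard to compare by eye. Below is a minimal Python sketch that tallies the bases on one `samtools mpileup` line, assuming the default six-column output (contig, position, reference base, depth, read bases, base qualities). The function names `tally_pileup_bases` and `summarize_pileup_line` are made up for this illustration and are not part of samtools.

```python
from collections import Counter


def tally_pileup_bases(ref_base, bases):
    """Count observed bases in the read-bases column of one pileup line."""
    counts = Counter()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            # '^' marks a read start and is followed by one mapping-quality character.
            i += 2
        elif c == "$":
            # '$' marks a read end and carries no base of its own.
            i += 1
        elif c in "+-":
            # Indel: sign, decimal length, then that many inserted/deleted bases.
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            i = j + int(bases[i + 1:j])
        elif c in ".,":
            # '.' and ',' mean "matches the reference" (forward / reverse strand).
            counts[ref_base.upper()] += 1
            i += 1
        elif c in "*#<>":
            # Deletion placeholders and reference skips: no base to count.
            i += 1
        else:
            # A mismatching base; the case only encodes the strand.
            counts[c.upper()] += 1
            i += 1
    return counts


def summarize_pileup_line(line):
    """Split one pileup line and return its position, depth and base counts."""
    fields = line.split()
    bases = fields[4] if len(fields) > 4 else ""
    return {
        "contig": fields[0],
        "pos": int(fields[1]),
        "ref": fields[2],
        "depth": int(fields[3]),
        "counts": tally_pileup_bases(fields[2], bases),
    }


if __name__ == "__main__":
    # The two short GRCh37 lines from the output above.
    for line in ["1 206647742 A 1 T F", "1 206647743 G 6 .^!.^!.^!.^!.^!, FiEiiE"]:
        print(summarize_pileup_line(line))
```

Running the same function over the hg19 lines gives a per-base summary that can be put next to the GRCh37 one directly.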
Then I extracted the 500 bp flanking this position on each side from both genomes and aligned the two sequences:
samtools faidx hg19.fa chr1:206647242-206648242
>chr1:206647242-206648242
tgcagtgagctgagatcttgacactgcactccagcctgggtgacagagcgaggctccgtc
tcaaaaaaaaaaaaaaaaaaaaaaaagaaTTGGAGCCATACAGACCAGGTTCCAATCCCT
TCCCTGCTGCTAACCCCAGGGAGTGTTAGCTGCCCTGTGATGATTGTCAATAGCAATTGT
AATAATGACAACAAGCCATCCCCTGCAGAAGATCAGAGTGTCAGGATCTTGTCACCTCCC
AGTGCTGGACTCTCTACCCCTTGAGAGGGAAAGGCGGTGCGGATGGGAGCCCCCATCCAA
CCAGGCTAATCTCTGGGGTTGGGCTGGCCGGAGAGGCTGAATGGAGGCCCAGGAGAGGGT
GGCTGCTCCCCTGTGGGAGTGGGACATGTGCTAATCCCATGCTGTCTCCCACTGCTCCCT
CCCCAATGGCAGAAATCCGGAGAGCTGGTTGCTGTGAAGGTCTTCAACACTACCAGCTAC
CTGCGGCCCCGCGAGGTGCAAGTGAGGGAGTTTGAGGTCCTGCGGAAGCTGAACCACCAG
AACATTGTCAAGCTCTTTGCGGTGGAGGAGACGGTAGGTCCGGTGCTTGGTCAGAGAATG
GTCTTGTCCTTGACCCTTATGGTCTGGGGAGAATCAGGCCACATGATAACAGAGATTTGG
TCCCATGCTCATCAGCAGGTCAGAGACAGCAGGCAAATTGCAGAAGGGAGCAAAGGGGGC
AAGGGGGTGGGGGCGGTGCACTGGAAAGGAACGATGGACAGAATCAGTACCTAAGCAGAG
GGCTTCCTGGAATAACTGACTTTGGATTCCAGTGTGCGGGATCAGTGTGAGGCCAAGGAG
GGAAGGCCAGGCCAGAAGCTGGGACCTGGAGAATGGGGGCTCTGGGCTCCAGGCTGAGCC
ACTTCTTCCTGGTGGGTGGGGAGGAGAAGTGCCGTCCTCATGAGCCCCTCTCTGTCCCAC
CCATAGGGCGGAAGCCGGCAGAAGGTACTGGTGATGGAGTA
samtools faidx GRCh37.p13.genome.fa 1:206647242-206648242
>1:206647242-206648242
TGCAGTGAGCTGAGATCTTGACACTGCACTCCAGCCTGGGTGACAGAGCGAGGCTCCGTC
TCAAAAAAAAAAAAAAAAAAAAAAAAGAATTGGAGCCATACAGACCAGGTTCCAATCCCT
TCCCTGCTGCTAACCCCAGGGAGTGTTAGCTGCCCTGTGATGATTGTCAATAGCAATTGT
AATAATGACAACAAGCCATCCCCTGCAGAAGATCAGAGTGTCAGGATCTTGTCACCTCCC
AGTGCTGGACTCTCTACCCCTTGAGAGGGAAAGGCGGTGCGGATGGGAGCCCCCATCCAA
CCAGGCTAATCTCTGGGGTTGGGCTGGCCGGAGAGGCTGAATGGAGGCCCAGGAGAGGGT
GGCTGCTCCCCTGTGGGAGTGGGACATGTGCTAATCCCATGCTGTCTCCCACTGCTCCCT
CCCCAATGGCAGAAATCCGGAGAGCTGGTTGCTGTGAAGGTCTTCAACACTACCAGCTAC
CTGCGGCCCCGCGAGGTGCAAGTGAGGGAGTTTGAGGTCCTGCGGAAGCTGAACCACCAG
AACATTGTCAAGCTCTTTGCGGTGGAGGAGACGGTAGGTCCGGTGCTTGGTCAGAGAATG
GTCTTGTCCTTGACCCTTATGGTCTGGGGAGAATCAGGCCACATGATAACAGAGATTTGG
TCCCATGCTCATCAGCAGGTCAGAGACAGCAGGCAAATTGCAGAAGGGAGCAAAGGGGGC
AAGGGGGTGGGGGCGGTGCACTGGAAAGGAACGATGGACAGAATCAGTACCTAAGCAGAG
GGCTTCCTGGAATAACTGACTTTGGATTCCAGTGTGCGGGATCAGTGTGAGGCCAAGGAG
GGAAGGCCAGGCCAGAAGCTGGGACCTGGAGAATGGGGGCTCTGGGCTCCAGGCTGAGCC
ACTTCTTCCTGGTGGGTGGGGAGGAGAAGTGCCGTCCTCATGAGCCCCTCTCTGTCCCAC
CCATAGGGCGGAAGCCGGCAGAAGGTACTGGTGATGGAGTA
########################################
Program: needle
Rundate: Thu 28 Dec 2023 01:45:15
Commandline: needle
-auto
-stdout
-asequence emboss_needle-I20231228-014512-0468-51177144-p1m.asequence
-bsequence emboss_needle-I20231228-014512-0468-51177144-p1m.bsequence
-datafile EDNAFULL
-gapopen 10.0
-gapextend 0.5
-endopen 10.0
-endextend 0.5
-aformat3 pair
-snucleotide1
-snucleotide2
Align_format: pair
Report_file: stdout
########################################
#=======================================
Aligned_sequences: 2
1: 206647242-206648242
2: 206647242-206648242
Matrix: EDNAFULL
Gap_penalty: 10.0
Extend_penalty: 0.5
Length: 1001
Identity: 1001/1001 (100.0%)
Similarity: 1001/1001 (100.0%)
Gaps: 0/1001 ( 0.0%)
Score: 5005.0
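The same comparison can be scripted without a web tool. This is a minimal Python sketch, assuming the two `samtools faidx` outputs have been captured as text with their `>` header lines. It folds case before comparing, because hg19.fa soft-masks repeats in lowercase (as in the first lines of the hg19 extract above) while the GRCh37 extract is all uppercase. The helper names are hypothetical, and only the first 60 bases of each extract are pasted in to keep the example short.

```python
def flank_region(contig, pos, flank=500):
    """Build a samtools-style region string covering pos +/- flank (1-based)."""
    start = max(1, pos - flank)
    return f"{contig}:{start}-{pos + flank}"


def fasta_sequence(text):
    """Join the sequence lines of one FASTA record, dropping '>' headers and case."""
    lines = [ln.strip() for ln in text.splitlines()]
    return "".join(ln for ln in lines if ln and not ln.startswith(">")).upper()


def first_difference(seq_a, seq_b):
    """Return the 0-based index of the first mismatch, or None if identical."""
    for i, (a, b) in enumerate(zip(seq_a, seq_b)):
        if a != b:
            return i
    if len(seq_a) != len(seq_b):
        # One sequence is a prefix of the other.
        return min(len(seq_a), len(seq_b))
    return None


# First sequence line of each extract shown above.
hg19_head = (
    ">chr1:206647242-206648242\n"
    "tgcagtgagctgagatcttgacactgcactccagcctgggtgacagagcgaggctccgtc\n"
)
grch37_head = (
    ">1:206647242-206648242\n"
    "TGCAGTGAGCTGAGATCTTGACACTGCACTCCAGCCTGGGTGACAGAGCGAGGCTCCGTC\n"
)

if __name__ == "__main__":
    print(flank_region("chr1", 206647742))
    print(first_difference(fasta_sequence(hg19_head), fasta_sequence(grch37_head)))
```

A result of `None` from `first_difference` means the two extracts carry the same bases once case is ignored.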
The two sequences are exactly the same, so I don't understand why the pileups differ so much.
With both genomes, the SNV/InDel calls agree in most regions, but they differ in some regions, and I don't know how to resolve this. GRCh37 and hg19 are identical in most regions, yet even in a region where the two sequences are exactly the same, the alignments differ as described above.
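To list exactly which positions disagree, the depth columns of the two pileups can be compared after normalising contig names (`chr1` in the hg19 run, `1` in the GRCh37 run). Below is a rough Python sketch. The function names and the 10-fold threshold are arbitrary choices for illustration, not part of samtools, and only a leading `chr` prefix is handled.

```python
def normalize_contig(name):
    """Strip a leading 'chr' so 'chr1' and '1' compare equal (assumption: no other renaming)."""
    return name[3:] if name.startswith("chr") else name


def depth_table(pileup_lines):
    """Map (contig, position) -> depth from the first four pileup columns."""
    depths = {}
    for line in pileup_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        depths[(normalize_contig(fields[0]), int(fields[1]))] = int(fields[3])
    return depths


def discordant_sites(lines_a, lines_b, min_ratio=10.0):
    """Return sites whose depth differs by at least min_ratio between two pileups."""
    a, b = depth_table(lines_a), depth_table(lines_b)
    out = []
    for site in sorted(set(a) | set(b)):
        da, db = a.get(site, 0), b.get(site, 0)
        high, low = max(da, db), min(da, db)
        # max(low, 1) avoids dividing by zero when a site is missing in one pileup.
        if high >= min_ratio * max(low, 1):
            out.append((site[0], site[1], da, db))
    return out


if __name__ == "__main__":
    # First four columns of the pileup lines shown earlier in this issue.
    hg19 = ["chr1 206647742 A 1316", "chr1 206647743 G 1322"]
    grch37 = ["1 206647742 A 1", "1 206647743 G 6"]
    for row in discordant_sites(hg19, grch37):
        print(row)
```

Feeding it whole-chromosome pileups from both runs would give a list of regions worth inspecting more closely.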
Looking forward to your reply. Thank you!