marbl / canu

A single molecule sequence assembler for genomes large and small.

Home Page: http://canu.readthedocs.io/

*.contigs.layout.readToTig file

tianjio opened this issue

In the *.contigs.layout.readToTig file, the first column is the read ID, the second column is the tig ID, the third column is the start position of the read in the tig, and the fourth column is the end position of the read in the tig.
I sorted the file by read start position and found that many reads overlap other reads in tig coordinates.
Are these reads redundant? Can I ignore them if I want to reconstruct the path of reads that makes up a tig from their overlaps?

Based on their positions in the contig, two reads overlap by 247,283 bp, but the overlap between the same two reads reported by the ovStoreDump command is only 127,200 bp.

The readToTig file includes all reads, including contained reads, because it is used to build the consensus, where contained reads are informative for the sequence. In general, the read overlaps implied by the file will match the overlap store. However, if you're running with the -pacbio-hifi option, the ovStore will be in homopolymer-compressed (HPC) space while the readToTig file will not.
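If you do want to set contained reads aside when tracing a path, one rough way is to flag reads whose tig interval lies entirely inside another read's interval. The sketch below is only an illustration: the function name `contained_reads` is made up here, it assumes all placements come from a single tig, and it treats bgn > end as a reversed read. Containment in tig coordinates is an approximation and is not guaranteed to agree with the "contained" label that ovStoreDump prints. The coordinates are copied from the head listing shown in this reply.

```python
from typing import Iterable, List, Set, Tuple

# (read_id, bgn, end) in tig coordinates, all from the same tig (assumption).
Placement = Tuple[int, int, int]


def contained_reads(placements: Iterable[Placement]) -> Set[int]:
    """Return IDs of reads whose tig interval lies inside another read's interval."""
    # Normalize to (lo, hi, read_id); bgn > end is assumed to mean a reversed read.
    spans: List[Tuple[int, int, int]] = [
        (min(bgn, end), max(bgn, end), read_id) for read_id, bgn, end in placements
    ]
    # Sort by start ascending, then end descending, so on ties the longer read comes first.
    spans.sort(key=lambda s: (s[0], -s[1]))

    contained: Set[int] = set()
    furthest_end = -1  # largest end coordinate among reads seen so far
    for lo, hi, read_id in spans:
        if hi <= furthest_end:
            # An earlier read starts at or before lo and ends at or after hi.
            contained.add(read_id)
        else:
            furthest_end = hi
    return contained


example = [
    (2402, 0, 13221),
    (1862, 14388, 539),
    (4794, 15075, 1017),
    (7705, 14300, 1248),
    (5508, 19451, 5833),
    (4343, 6326, 19953),
    (914, 7826, 22286),
    (4848, 8471, 22375),
    (4584, 9511, 22617),
]
print(contained_reads(example))  # {7705}
```

Read 7705 is the only one flagged here, which happens to agree with the "contained" label it gets in the ovStoreDump output for read 2402.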

Sometimes an overlap will have a different size in the readToTig file than in the overlap store. The readToTig file is built pairwise, followed by read placement, so a read that was not originally used in the path may be placed based on a different overlap than the one you've dumped. This can happen with reads from the alt haplotype. If a read is too diverged, consensus is allowed to skip over it. If you want to know the exact path the assembler uses to reconstruct the contig, use ovStoreDump with the -picture option and -bogart unitigging/4-unitigger/*.best. For example:

% head asm.contigs.layout.readToTig 
#readID tigID   bgn     end
2402    1       0       13221
1862    1       14388   539
4794    1       15075   1017
7705    1       14300   1248
5508    1       19451   5833
4343    1       6326    19953
914     1       7826    22286
4848    1       8471    22375
4584    1       9511    22617
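
To compare a listing like the one above with the overlap store, it can help to load it and compute the overlap that the tig coordinates imply for a pair of reads. This is a minimal sketch, not part of Canu: `parse_read_to_tig` and `tig_overlap` are hypothetical helper names, each read is assumed to appear once in the file, and bgn > end is assumed to mean the read is placed in reverse orientation.

```python
from typing import Dict, NamedTuple


class Placement(NamedTuple):
    read_id: int
    tig_id: int
    lo: int        # leftmost tig coordinate
    hi: int        # rightmost tig coordinate
    reverse: bool  # True when bgn > end in the file (assumed reverse orientation)


def parse_read_to_tig(text: str) -> Dict[int, Placement]:
    """Parse readToTig text into a dict keyed by read ID, skipping '#' header lines."""
    placements: Dict[int, Placement] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        read_id, tig_id, bgn, end = (int(field) for field in line.split()[:4])
        placements[read_id] = Placement(
            read_id, tig_id, min(bgn, end), max(bgn, end), bgn > end
        )
    return placements


def tig_overlap(a: Placement, b: Placement) -> int:
    """Length of the overlap implied by tig coordinates (0 if on different tigs)."""
    if a.tig_id != b.tig_id:
        return 0
    return max(0, min(a.hi, b.hi) - max(a.lo, b.lo))


SAMPLE = """\
#readID tigID   bgn     end
2402    1       0       13221
1862    1       14388   539
4794    1       15075   1017
"""

reads = parse_read_to_tig(SAMPLE)
print(reads[1862])
print(tig_overlap(reads[2402], reads[1862]))  # 12682
```

For reads 2402 and 1862 the tig coordinates imply 12,682 bp, while the ovStoreDump output for read 2402 lists the overlap with 1862 as 539:13217, or 12,678 bp. That is close but not identical, which is the kind of small difference to expect even when the placement and the store agree.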

% ovStoreDump -S asm.seqStore -O unitigging/asm.ovlStore -picture 2402 -bogart `ls unitigging/4-unitigger/*.best.edges |sed s/.edges//g`
Opened seqStore 'asm.seqStore' for 'corrected-trimmed' reads.
A       0:13217   A      2402       0:13217     13217                   |--------------------------------------------------------------------------------------------------->|
A       0:1426    B      5660   14208:15635     15635   0.000%   +14208 |---------->                                                                                         |          dovetail
A       0:3445    B      8031   13683:17130     17130   0.000%   +13683 |-------------------------->                                                                         |          dovetail
A       0:3617    B       585   10856:14475     14475   0.000%   +10856 |--------------------------->                                                                        |          dovetail
A       0:3998    B      4958   10507:14507     14507   0.025%   +10507 |------------------------------>                                                                     |          dovetail
A       0:4327    B      1482    9207:13536     13536   0.023%    +9207 |cccccccccccccccccccccccccccccccc>                                                                   |          contained
A       0:4378    B      4399       0:4379      14537   0.023%   +10158 |<---------------------------------                                                                  |          dovetail
A       0:4660    B      3481       0:4666      13737   0.021%    +9071 |<ccccccccccccccccccccccccccccccccccc                                                                |          contained
A       0:5939    B      6791       0:5939      13725   0.067%    +7786 |<cccccccccccccccccccccccccccccccccccccccccccc                                                       |          contained
A       0:6208    B      6648    6941:13159     13159   0.081%    +6941 |cccccccccccccccccccccccccccccccccccccccccccccc>                                                     |          contained
A       0:6810    B      7889       0:6812      16283   0.015%    +9471 |<---------------------------------------------------                                                |          dovetail
A       0:6837    B      4238       0:6833      14927   0.015%    +8094 |<---------------------------------------------------                                                |          dovetail
A       0:8174    B      7985       0:8177      13712   0.012%    +5535 |<-------------------------------------------------------------                                      |          dovetail
A       0:10132   B      5250    4115:14251     14251   0.020%    +4115 |============================================================================>                       |          dovetail
A     539:13217   B      1862    1165:13849     13849   0.032%          |    <===============================================================================================| +1165    dovetail
A    1017:13217   B      4794    1854:14054     14054   0.041%          |       <--------------------------------------------------------------------------------------------| +1854    dovetail
A    1247:13217   B      7705    1076:13063     13063   0.033%          |         <cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc| +1076    contained
A    2276:13217   B       734       0:10943     14822   0.037%          |                 gggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggg>| +3879    coverage-gap
A    5832:13217   B      5508    6228:13623     13623   0.068%          |                                            <-------------------------------------------------------| +6228    dovetail
A    6325:13217   B      4343       0:6888      13624   0.058%          |                                               ---------------------------------------------------->| +6736    dovetail
A    7824:13217   B       914       0:5394      14453   0.056%          |                                                           ---------------------------------------->| +9059    dovetail
A    8469:13217   B      4848       0:4745      13897   0.042%          |                                                                ----------------------------------->| +9152    dovetail
A    9508:13217   B      4584       0:3712      13110   0.027%          |                                                                       ---------------------------->| +9398    dovetail
A   12366:13217   B      8759       0:851       13928   0.118%          |                                                                                             ------>| +13077   dovetail

% ovStoreDump -S asm.seqStore -O unitigging/asm.ovlStore -picture 1862 -bogart `ls unitigging/4-unitigger/*.best.edges |sed s/.edges//g`
Opened seqStore 'asm.seqStore' for 'corrected-trimmed' reads.
A       0:13849   A      1862       0:13849     13849                   |--------------------------------------------------------------------------------------------------->|
A       0:513     B      7143   14232:14747     14747   0.388%   +14232 |--->                                                                                                |          dovetail
A       0:744     B      7041   13867:14610     14610   0.268%   +13867 |ccccc>                                                                                              |          contained
A       0:752     B      3932   13976:14730     14730   0.663%   +13976 |----->                                                                                              |          dovetail
A       0:999     B      5442   12129:13129     13129   0.599%   +12129 |------->                                                                                            |          dovetail
A       0:1673    B      6202   14292:15952     15952   0.179%   +14292 |gggggggggggg>                                                                                       |          coverage-gap
A       0:2016    B      8759       0:2018      13928   0.198%   +11910 |<--------------                                                                                     |          dovetail
A       0:4876    B      4584       0:4879      13110   0.000%    +8231 |<-----------------------------------                                                                |          dovetail
A       0:5916    B      4848       0:5912      13897   0.169%    +7985 |<------------------------------------------                                                         |          dovetail
A       0:6562    B       914       0:6560      14453   0.015%    +7893 |<-----------------------------------------------                                                    |          dovetail
A       0:8062    B      4343       0:8050      13624   0.012%    +5574 |<----------------------------------------------------------                                         |          dovetail
A       0:8555    B      5508    5062:13623     13623   0.070%    +5062 |------------------------------------------------------------->                                      |          dovetail
A       0:12111   B       734       0:12112     14822   0.000%    +2710 |<ggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggg            |          coverage-gap
A       0:13371   B      4794     687:14054     14054   0.075%     +687 |================================================================================================>   |          dovetail
A      87:13140   B      7705       0:13063     13063   0.008%          |cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc>     |          contained
A    1165:13849   B      2402     539:13217     13217   0.032%          |        <===========================================================================================| +539     dovetail
A    4250:13849   B      5250    4655:14251     14251   0.052%          |                              <---------------------------------------------------------------------| +4655    dovetail
A    6212:13849   B      7985       0:7637      13712   0.026%          |                                            ------------------------------------------------------->| +6075    dovetail
A    7550:13849   B      4238       0:6294      14927   0.000%          |                                                      --------------------------------------------->| +8633    dovetail
A    7577:13849   B      7889       0:6271      16283   0.000%          |                                                      --------------------------------------------->| +10012   dovetail
A    8179:13849   B      6648    7481:13159     13159   0.106%          |                                                           <cccccccccccccccccccccccccccccccccccccccc| +7481    contained
A    8448:13849   B      6791       0:5399      13725   0.093%          |                                                             cccccccccccccccccccccccccccccccccccccc>| +8326    contained
A    9727:13849   B      3481       0:4127      13737   0.000%          |                                                                      ccccccccccccccccccccccccccccc>| +9610    contained
A   10009:13849   B      4399       0:3839      14537   0.000%          |                                                                        --------------------------->| +10698   dovetail
A   10060:13849   B      1482    9747:13536     13536   0.000%          |                                                                        <ccccccccccccccccccccccccccc| +9747    contained
A   10389:13849   B      4958   11047:14507     14507   0.029%          |                                                                           <------------------------| +11047   dovetail
A   10770:13849   B       585   11396:14475     14475   0.000%          |                                                                             <----------------------| +11396   dovetail
A   10942:13849   B      8031   14223:17130     17130   0.000%          |                                                                               <--------------------| +14223   dovetail
A   12961:13849   B      5660   14746:15635     15635   0.000%          |                                                                                             <------| +14746   dovetail

The === lines mark the best edges in the graph, so here the path starts with read 2402, followed by 1862 and then 4794.
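
If you need to do this for many reads, the best-edge rows can be pulled out of the -picture text with a small script. The sketch below assumes the row layout seen in the output above (A range, B read ID, B range, read length, error rate) and that "===" only appears in best-edge pictures. Both are inferred from this example, not from a documented format, and `best_edges` is a hypothetical helper. The picture column in the sample is shortened for display.

```python
import re
from typing import List, NamedTuple


class Edge(NamedTuple):
    read_id: int  # the B read on the other end of the edge
    a_bgn: int    # overlap start on the A (queried) read
    a_end: int    # overlap end on the A (queried) read
    kind: str     # last column, e.g. "dovetail"


# Row layout inferred from the example output; treat it as an assumption.
ROW = re.compile(r"^A\s+(\d+):(\d+)\s+B\s+(\d+)\s+\d+:\d+\s+\d+\s+[\d.]+%")


def best_edges(dump_text: str) -> List[Edge]:
    """Return the overlaps whose picture is drawn with '=' characters."""
    edges: List[Edge] = []
    for line in dump_text.splitlines():
        match = ROW.match(line.strip())
        if match is None or "===" not in line:
            continue
        a_bgn, a_end, read_id = (int(group) for group in match.groups())
        edges.append(Edge(read_id, a_bgn, a_end, line.split()[-1]))
    return edges


# Rows copied from the dump of read 2402, with the picture column shortened.
DUMP_2402 = """\
Opened seqStore 'asm.seqStore' for 'corrected-trimmed' reads.
A       0:13217   A      2402       0:13217     13217                   |---------->|
A       0:8174    B      7985       0:8177      13712   0.012%    +5535 |<------    |          dovetail
A       0:10132   B      5250    4115:14251     14251   0.020%    +4115 |========>  |          dovetail
A     539:13217   B      1862    1165:13849     13849   0.032%          | <=========| +1165    dovetail
A    1017:13217   B      4794    1854:14054     14054   0.041%          |  <--------| +1854    dovetail
"""

for edge in best_edges(DUMP_2402):
    print(edge.read_id, edge.a_bgn, edge.a_end, edge.kind)
```

Each read can have a best edge on either end, so read 2402 reports both 5250 and 1862. Following the edge off one end and repeating the dump for the next read walks the path one step at a time.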