c-zhou / yahs

Yet another Hi-C scaffolding tool

Contig was cut into consecutive pieces

xiekunwhy opened this issue

Hi,

I found that YaHS cuts some contigs into consecutive pieces, as in the following example (utg261 in scaffold_28). Why not join these consecutive pieces, or add an option to join them?

scaffold_28 1 22000 1 W utg261 13001 35000 +
scaffold_28 22001 22200 2 N 200 scaffold yes proximity_ligation
scaffold_28 22201 34200 3 W utg261 35001 47000 +
scaffold_28 34201 34400 4 N 200 scaffold yes proximity_ligation
scaffold_28 34401 454400 5 W utg261 47001 467000 +
scaffold_28 454401 454600 6 N 200 scaffold yes proximity_ligation
scaffold_28 454601 1549263 7 W utg413 9001 1103663 +
scaffold_28 1549264 1549463 8 N 200 scaffold yes proximity_ligation
scaffold_28 1549464 1849806 9 W utg6105 1 300343 -
scaffold_28 1849807 1850006 10 N 200 scaffold yes proximity_ligation
scaffold_28 1850007 1896624 11 W utg6020 378001 424618 -
scaffold_28 1896625 1896824 12 N 200 scaffold yes proximity_ligation
scaffold_28 1896825 2274824 13 W utg6020 1 378000 -
scaffold_28 2274825 2275024 14 N 200 scaffold yes proximity_ligation
scaffold_28 2275025 3467380 15 W utg675 1 1192356 +
scaffold_28 3467381 3467580 16 N 200 scaffold yes proximity_ligation
scaffold_28 3467581 3952820 17 W utg1310 1 485240 +

Best,
Kun

Hello Kun,

These could be false breaks introduced in the contig error-correction step. We do error correction at a relatively high resolution (small bin size), so a contig can be cut in a region that lacks Hi-C signal, for example because of sequencing coverage bias. The false breaks were then put back together at lower resolutions (larger bin sizes), i.e., when we zoom out on the Hi-C contact map.
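To see why bin size matters, here is a toy illustration in Python. It is not YaHS's error-correction algorithm: the per-bin signal values, the simple averaging, and the fixed `THRESHOLD` are all hypothetical. A single low-coverage bin falls below the cut-off at fine resolution, which would mark a break, but once four fine bins are averaged into one coarse bin the dip no longer stands out.

```python
# Toy model only; not YaHS's algorithm. Signal values and threshold are hypothetical.
from typing import List


def rebin(signal: List[float], factor: int) -> List[float]:
    """Average every `factor` consecutive fine bins into one coarse bin."""
    return [
        sum(signal[i:i + factor]) / len(signal[i:i + factor])
        for i in range(0, len(signal), factor)
    ]


def low_signal_bins(signal: List[float], threshold: float) -> List[int]:
    """Indices of bins whose Hi-C signal falls below `threshold` (break candidates)."""
    return [i for i, s in enumerate(signal) if s < threshold]


# Hypothetical per-bin Hi-C signal along one contig; bin 3 has a coverage dip.
fine = [10.0, 12.0, 11.0, 0.5, 10.0, 11.0, 12.0, 10.0]
THRESHOLD = 2.0

print(low_signal_bins(fine, THRESHOLD))            # [3]: a break would be called here
print(low_signal_bins(rebin(fine, 4), THRESHOLD))  # []: the dip is averaged away
```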

Thanks for the suggestion. Yes, we can add an option to do a final check to remove these breaks. I will let you know when it is done.

Best,
Chenxi
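The final check suggested in this thread could be approximated as a post-processing step on the AGP output. The Python sketch below is hypothetical and not part of YaHS; all function names are made up for illustration. It merges two pieces of the same contig whenever the scaffold places them next to each other, separated by a single gap, in the same orientation and with contiguous contig coordinates (for `-` pieces, the later piece must end just before the earlier one begins). It then renumbers the parts and recomputes the object coordinates.

```python
# Hypothetical post-processing sketch; not part of YaHS.
from dataclasses import dataclass
from typing import Iterable, List, Union


@dataclass
class Contig:
    """AGP sequence component (W line); beg/end are 1-based and inclusive."""
    obj: str
    comp_id: str
    beg: int
    end: int
    strand: str

    @property
    def length(self) -> int:
        return self.end - self.beg + 1


@dataclass
class Gap:
    """AGP gap component (N or U line)."""
    obj: str
    kind: str
    length: int
    rest: List[str]  # gap type, linkage, evidence: copied through unchanged


Part = Union[Contig, Gap]


def parse_agp(lines: Iterable[str]) -> List[Part]:
    parts: List[Part] = []
    for line in lines:
        f = line.split()  # real AGP is tab-separated; split() also accepts spaces
        if not f or f[0].startswith("#"):
            continue
        if f[4] in ("N", "U"):
            parts.append(Gap(f[0], f[4], int(f[5]), f[6:]))
        else:
            parts.append(Contig(f[0], f[5], int(f[6]), int(f[7]), f[8]))
    return parts


def continues(prev: Contig, cur: Contig) -> bool:
    """True if `cur` resumes `prev`'s contig exactly where `prev` stopped."""
    if (prev.obj, prev.comp_id, prev.strand) != (cur.obj, cur.comp_id, cur.strand):
        return False
    if cur.strand == "+":
        return prev.end + 1 == cur.beg
    return cur.end + 1 == prev.beg  # '-': the scaffold walks the contig backwards


def merge_split_contigs(parts: List[Part]) -> List[Part]:
    """Collapse Contig-Gap-Contig runs where the second piece continues the first."""
    out: List[Part] = []
    for p in parts:
        if (isinstance(p, Contig) and len(out) >= 2
                and isinstance(out[-1], Gap) and isinstance(out[-2], Contig)
                and continues(out[-2], p)):
            out.pop()  # drop the gap that separated the two pieces
            prev = out.pop()
            out.append(Contig(prev.obj, prev.comp_id, min(prev.beg, p.beg),
                              max(prev.end, p.end), prev.strand))
        else:
            out.append(p)
    return out


def format_agp(parts: List[Part]) -> List[str]:
    """Renumber parts and recompute object coordinates after merging."""
    rows: List[str] = []
    obj, pos, part_no = None, 0, 0
    for p in parts:
        if p.obj != obj:  # new scaffold: restart coordinates and numbering
            obj, pos, part_no = p.obj, 0, 0
        part_no += 1
        if isinstance(p, Contig):
            cols = ["W", p.comp_id, str(p.beg), str(p.end), p.strand]
        else:
            cols = [p.kind, str(p.length)] + p.rest
        start, stop = pos + 1, pos + p.length
        rows.append("\t".join([p.obj, str(start), str(stop), str(part_no)] + cols))
        pos = stop
    return rows


# The three utg261 pieces from scaffold_28 in the issue.
example = """\
scaffold_28 1 22000 1 W utg261 13001 35000 +
scaffold_28 22001 22200 2 N 200 scaffold yes proximity_ligation
scaffold_28 22201 34200 3 W utg261 35001 47000 +
scaffold_28 34201 34400 4 N 200 scaffold yes proximity_ligation
scaffold_28 34401 454400 5 W utg261 47001 467000 +"""
for row in format_agp(merge_split_contigs(parse_agp(example.splitlines()))):
    print(row)
```

On the example, the three utg261 pieces collapse into a single row covering utg261 13001-467000. Applied to the full scaffold_28 listing, the sketch would also rejoin the two utg6020 pieces (378001-424618 and 1-378000, both `-`), because they are contiguous in reverse orientation, and the scaffold would shrink by the 600 bp of removed gaps. It only edits the AGP; the scaffold FASTA would have to be regenerated from it. It also assumes that every such adjacency is a false break rather than a genuine misassembly that happened to be placed back in order.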