Telomere-to-telomere consortium

Introduction

We have sequenced the CHM13hTERT human cell line on the Oxford Nanopore GridION. We have also generated approximately 50x coverage of 10X Genomics data, as well as BioNano DLS and Arima Genomics HiC data. PacBio data for this cell line was previously generated by the Washington University School of Medicine and the University of Washington, and is available from NCBI SRA.

Human genomic DNA was extracted from the cultured cell line. As the DNA is native, modified bases will be preserved. We followed Josh Quick's ultra-long read (UL) protocol for library preparation and sequencing.

Data reuse and license

All data is released to the public domain (CC0) and we encourage its reuse. While not required, we would appreciate it if you acknowledged the "telomere-to-telomere" (T2T) consortium for the creation of this data, and we encourage you to join us if you would like to help finish the human reference genome. More information about our consortium can be found on the T2T homepage.

Draft Assembly

The current assembly draft (v0.4) was generated with Canu v1.7.1 using rel1 data up to 2018/11/15 together with the previously released PacBio data. Two gaps on the X chromosome plus the centromere were manually resolved. The assembly was polished with two rounds of nanopolish and two rounds of arrow. The estimated base accuracy is currently QV36, which we expect to improve with future integration of the 10X Genomics data. Structural variants on the X chromosome were identified with BioNano; nanopore reads mapping locally to each were selected, reassembled, and used to patch the assembly. However, these patches have not yet been polished or validated using BioNano. The assembly has not been curated outside of the X chromosome.
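To put the QV36 estimate in more concrete terms, the sketch below converts a Phred-scaled quality value into a per-base error probability using the standard relation QV = -10 * log10(p). The function names are illustrative only and are not part of any tool used for this assembly.

```shell
# Convert a Phred-scaled quality value (QV) into a per-base error probability.
# QV = -10 * log10(p)  =>  p = 10^(-QV/10)
qv_to_error() {
    awk -v qv="$1" 'BEGIN { printf "%.6f\n", 10 ^ (-qv / 10) }'
}

# Expected number of bases per single error at a given QV.
qv_to_bases_per_error() {
    awk -v qv="$1" 'BEGIN { printf "%.0f\n", 10 ^ (qv / 10) }'
}

qv_to_error 36            # per-base error probability at QV36
qv_to_bases_per_error 36  # roughly one error every ~4 kbp
```

Under this relation, QV36 corresponds to roughly one consensus error every 4 kbp.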

The assembly is 2.94 Gbp in size with 657 contigs and an NG50 of 85.8 Mbp.
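For readers unfamiliar with NG50, the following is a minimal sketch of how the statistic is commonly computed: contigs are sorted from longest to shortest and summed until half of an assumed genome size is covered, and the length of the contig that crosses that threshold is reported. The genome size used for the figure above is not stated here, so the helper takes it as an argument; the function name and the toy lengths are hypothetical.

```shell
# NG50: length of the contig at which the running total of contig lengths
# (longest first) reaches half of an assumed genome size.
# $1 = assumed genome size in bp; contig lengths arrive one per line on stdin.
ng50() {
    sort -rn | awk -v g="$1" '
        { sum += $1; if (sum >= g / 2) { print $1; found = 1; exit } }
        END { if (!found) print "NA" }'
}

# Toy example: five contigs against an assumed genome size of 200 bp.
printf '%s\n' 10 50 30 20 40 | ng50 200   # prints 30
```

Passing the total assembly length as the genome size yields the ordinary N50 instead.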

This should be considered a draft and likely has mis-assemblies, inaccurate consensus, and frame-shifted genes. It will be further validated, scaffolded with BioNano, and polished using the available data.

Downloads

Assembly draft v0.4 (md5: 7e3c2fff9479ba45f7916fa1eee1310b)
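The md5 values listed next to each download can be checked after transfer. A minimal sketch follows, assuming GNU coreutils md5sum is available (on macOS, `md5 -q` is the usual substitute). The helper name and the file name in the usage comment are hypothetical, since the download file names are not given in this document.

```shell
# Compare a file's MD5 digest against an expected value.
# Assumes GNU coreutils md5sum is on the PATH.
check_md5() {
    file=$1
    expected=$2
    actual=$(md5sum "$file" | awk '{ print $1 }')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got $actual, expected $expected)" >&2
        return 1
    fi
}

# Usage (hypothetical file name):
# check_md5 chm13_draft_v0.4.fasta.gz 7e3c2fff9479ba45f7916fa1eee1310b
```

The same helper applies to every md5 value listed in the download sections of this document.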

Sequencing Data

Oxford Nanopore Data

We sequenced approximately 100 flowcells of UL data for a total of 155 Gbp (50x coverage, 1.6 Gbp/flowcell). The read N50 is 70 kbp and there are 99 Gbp of data in reads >50 kbp (32x). The longest mapping read is 1.04 Mbp.
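The coverage figures above follow from dividing sequenced bases by genome size. The sketch below reproduces them; the 3.1 Gbp genome size is an assumption inferred from 155 Gbp corresponding to about 50x and is not stated explicitly in this document.

```shell
# Fold coverage = total sequenced bases / genome size.
# GENOME_GBP is an assumption inferred from "155 Gbp (50x coverage)".
GENOME_GBP=3.1

# $1 = sequenced bases in Gbp; $2 = genome size in Gbp (optional).
coverage() {
    awk -v bases="$1" -v genome="${2:-$GENOME_GBP}" \
        'BEGIN { printf "%.0f\n", bases / genome }'
}

coverage 155   # all reads: prints 50
coverage 99    # reads > 50 kbp: prints 32
```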

rel2 (genomic DNA)

rel2 is the same data as rel1 but re-basecalled with the latest-generation basecaller (Guppy flip-flop 2.3.1). We provide minimap2 mappings, in CRAM format, to both our current draft assembly and GRCh38 with decoys.

Downloads

Guppy flip-flop 2.3.1 (md5: 7e3f4ded02d500a3db0c76c84cdc42b9)
Guppy flip-flop mapped to asm v0.4 with minimap2 (md5: 09d87ae044d1628056cb95690dc93378)
Guppy flip-flop mapped to GRCh38 with decoys with minimap2 (md5: 1a4888cafbc935a21c17f449b4802438)

rel1 (genomic DNA)

The full dataset as of 2019/01/09. These basecalls were generated on-instrument and use older versions of Guppy (depending on when the flowcell ran on the instrument).

Downloads

Guppy on-instrument (md5: c2cb74601eb657df21b7d25980908288)

fast5 data

The raw fast5 data, without basecalls, is available for completeness. The data is grouped into 96 sets.

Downloads

Partition 001 (md5: c837460c50a4446fc8320c95dc88f204)
Partition 002 (md5: 05ceccf4256d248aaec2a4c61e58c26c)
Partition 003 (md5: 879e3a6391e5da5f943fa46b92decd47)
Partition 004 (md5: 600bfa46c741eeff0064b1d8040b9349)
Partition 005 (md5: 1a72beff4b2e4556c5033176ed1cd109)
Partition 006 (md5: fcd6f8ceeac2034eddaa33cedf6d0010)
Partition 007 (md5: 0d44cb41a4888b55bce2cba7e70107ba)
Partition 008 (md5: 52242770505ac9aca1070e0b926c4769)
Partition 009 (md5: 4e85e63a4ebf8efb2f97fdcee46e5737)
Partition 010 (md5: e495530dd8a68b7bc9864ab89a4ef52f)
Partition 011 (md5: 3b57e6256d0162d83a281e74157134e0)
Partition 012 (md5: 735a0a03c6bec1e0ed417baa0c2d7db2)
Partition 013 (md5: 90c51a9ab06266b2a980bcc16d3d3960)
Partition 014 (md5: 645ea0b4edc2bfc71c708a53d5b0d92b)
Partition 015 (md5: 24f456adb4c1c6579fe34f07c82179e7)
Partition 016 (md5: 6b72ddda5a7a1c10b50f3026914519ec)
Partition 017 (md5: 14e7b918b28ecc784b68569454fa27d9)
Partition 018 (md5: d5f7c9b1d88cf48298f6cbbb2a2a45a9)
Partition 019 (md5: cefa121a627dfcf9a1dfb117065a7264)
Partition 020 (md5: ca0729b28cd4cccc81eba670c6e86689)
Partition 021 (md5: 51a873a2019f2b091ab035cc3f074bb8)
Partition 022 (md5: e9235f052d651b4ba1fdaaa06ad134d0)
Partition 023 (md5: 75ebfdb40745d667962a19a0aa838837)
Partition 024 (md5: e1e05425f9823e50650bd2cf1efa41c6)
Partition 025 (md5: f8efb23a5e77b12f46bce73b2ddba36a)
Partition 026 (md5: 829f32786514b092da9e4fb8701da037)
Partition 027 (md5: 15ebb086d975583386c1d0e49fbca932)
Partition 028 (md5: 3dd39dee6efea9b1b50d282d1d2aae19)
Partition 029 (md5: 3c5b3522dd741214554f84d8645cdf20)
Partition 030 (md5: 1ef7fe24c315085d8dcfe4e6ba9b4de2)
Partition 031 (md5: e9501d4d0fd38d64c2ad1c81f8d1a0e3)
Partition 032 (md5: 1f3ff51da0e87c2009bef8256b930f0b)
Partition 033 (md5: 76a518084b021db82fd5dab7540e88bb)
Partition 034 (md5: fd9f4dcfaeb89134a4f700a5346c16fa)
Partition 035 (md5: dbdd53ba61d67a7f61405ae39d2b931b)
Partition 036 (md5: c243b8f64bde0051fe104e8baaecf09b)
Partition 037 (md5: aafa1d558881b2b4856fde3af0cbb9b2)
Partition 038 (md5: d2e39e42eaf6a0a63d0542435590dd88)
Partition 039 (md5: ef48d5c46f19de02fb6f6646726c95de)
Partition 040 (md5: 17d7d34b45e14b2a79fc30e5c5084315)
Partition 041 (md5: eb6a16d0b37d538bdbf90c3bfcc0f098)
Partition 042 (md5: 7dbf87d75c901463b2e4e4afdc4adb52)
Partition 043 (md5: 97c071a1d0a170e9f4809f6cdc459a6b)
Partition 044 (md5: 27dc707435a2c98fc7201ccefec68c9d)
Partition 045 (md5: 54ce28e1e1b54ab9fd8dd072711acd30)
Partition 046 (md5: b174c7826fc399312fad331660745e55)
Partition 047 (md5: 2b6ce400051fce5d2de09fd8fd461fc8)
Partition 048 (md5: 81415b29f2b6a605473af6d3529758b1)
Partition 049 (md5: ffc9182d8a9ad9752b6571d3d2f2b69d)
Partition 050 (md5: 790281fcf0512a798b6f0e75b14620be)
Partition 051 (md5: 4fc5dc17819a3727e5cedaa89550ef9f)
Partition 052 (md5: d33a70e926dee0e67cf1a75d50ee1249)
Partition 053 (md5: 9d66e1372866dd454173f486d57ae322)
Partition 054 (md5: 958b62e07349258d93ee3e089c6f91ff)
Partition 055 (md5: 0e605a04d9bbeb0573aefddbfae12bd6)
Partition 056 (md5: 29b205c649f66e3d44ea9f598b492bc2)
Partition 057 (md5: 7336b91e333ae912b4cfc6e366570c54)
Partition 058 (md5: 2d992482005a2523f710487f2c0a0a31)
Partition 059 (md5: 3b45c205982796a90aa0f40955c4937b)
Partition 060 (md5: f085ae6a4818c44d03a6f5adfc445699)
Partition 061 (md5: 1c5a3a0ed8b53a930535b9d34e6a0667)
Partition 062 (md5: fbfd4ffb7cf8fca4d613d0ec67d3104c)
Partition 063 (md5: 9ddf7a9fe7e9cf8ceb02b8debed41fcc)
Partition 064 (md5: ee3ac8080a19d4a6ab3af84074d03d7a)
Partition 065 (md5: d94a12692d399c44612cab8b2aea8164)
Partition 066 (md5: a9f3bfa69bbc248b33f99f42827331eb)
Partition 067 (md5: 6c9d4b38edc6f78521f3cfdd8edc571c)
Partition 068 (md5: 76a29683bfad7c4a0b8a0bdbbbd6fd49)
Partition 069 (md5: f924667636c528d56e46aa92db0a182d)
Partition 070 (md5: f813b0a4b2a4a2353c7deb539f16f286)
Partition 071 (md5: fa56e2524ea2cc57e79f692466375b83)
Partition 072 (md5: 23b1df220d55ab9df2735c74849a53c9)
Partition 073 (md5: 70839cbc61d3d8af7fafcb7ba8f96461)
Partition 074 (md5: 109b91ceda32ab0f8b9edb24cb35fb23)
Partition 075 (md5: 53c466af09a3a119df3255189091bcda)
Partition 076 (md5: 22ad2327db64767e34378508afe60706)
Partition 077 (md5: 64c7c1702e3476137c54ebc0c07d970e)
Partition 078 (md5: 6e2048a8a2ceb36bb679455e0af81230)
Partition 079 (md5: 45717c24fe844f2605be81bd8e15d856)
Partition 080 (md5: 1ac20637828f0f3115f1c0f289e006aa)
Partition 081 (md5: e7b5e584de5f2cbda1d53ec2f6e2668e)
Partition 082 (md5: aad214d168ad3a59488dfac71fcedc22)
Partition 083 (md5: d557dee3b08c61d540fd6a00689341fa)
Partition 084 (md5: cc2b4676515b988dd4f64724e49c3304)
Partition 085 (md5: 34e6154991e5d5c641e22a529c5f06e1)
Partition 086 (md5: 2f9ff4371f32c3a33ea081ad8825437e)
Partition 087 (md5: 945504e89ba54cdab032eac63985d216)
Partition 088 (md5: 46a8ba05cb12b268c7f7ce04575d24da)
Partition 089 (md5: 5fd0219c9c99aa08ce07bb35e647144c)
Partition 090 (md5: da0e3f19f81c99a89bcff7e8f74dc6cb)
Partition 091 (md5: c11b11f3386d47dd33acc3cba7f44fb2)
Partition 092 (md5: 87dfa60ae9308214b43aa7075ddd9f44)
Partition 093 (md5: 6eced035881d3e804bea7103d26c042e)
Partition 094 (md5: 59ebbc64994779244e5f7431c54b819e)
Partition 095 (md5: 4de3c1f5163357a256847c1082379df3)
Partition 096 (md5: cf16e88c803b82b052651171490d6d5a)

10X Genomics Data

Raw fastq files

Approximately 50x of data was generated on a NovaSeq instrument. Based on the summary output of Supernova, there are 1.2 billion reads with 41x effective coverage. The mean molecule length is 130 kbp, with an N50 of 864 reads per barcode.

Downloads

CHM13_prep5_S13_L002_I1_001 (md5: 84af4586ca9f78060d5802b36cdd9e8a)
CHM13_prep5_S13_L002_R1_001 (md5: 231633e0cf2fbdeba732dc7ad6233fa0)
CHM13_prep5_S13_L002_R2_001 (md5: 386febfc3fc760e11e315e69310ed3d8)
CHM13_prep5_S14_L002_I1_001 (md5: f0b7628e90dfaf2f702ec613c7b61ca7)
CHM13_prep5_S14_L002_R1_001 (md5: 86afbc7a41ea1c81657bf1ca64d1178c)
CHM13_prep5_S14_L002_R2_001 (md5: 3dfbe58b5ae715213e20614837dcf3b7)
CHM13_prep5_S15_L002_I1_001 (md5: ee34f03c765787ea069050d8eaac1de4)
CHM13_prep5_S15_L002_R1_001 (md5: 73edcb56dd18d7b7b2705b4db7b4efc5)
CHM13_prep5_S15_L002_R2_001 (md5: a0de8e5bc127203129e4e1437b3e6aaa)
CHM13_prep5_S16_L002_I1_001 (md5: 42db246f7e5725a7b6ff3f5f5aedfd6e)
CHM13_prep5_S16_L002_R1_001 (md5: 3d3db7eccaf388fbcd901cbc6ad47630)
CHM13_prep5_S16_L002_R2_001 (md5: 9dfcc17398a7acd906212a09ab4c8903)

BioNano DLS Data

Approximately 430x of data was generated using the Saphyr instrument and the DLE-1 enzyme. There are 15.2 M molecules with an N50 molecule length of 115.9 kbp and a max of 2.3 Mbp (2 M molecules > 150 kbp, N50 218 kbp). The assembly of the molecules is 2.97 Gbp in size with 255 contigs and an NG50 of 59.6 Mbp.

Downloads

BNX (md5: 59a7a5583e900e1e5cecb08a34b5b0dc)
CMAP (md5: cf1a6fbcf006a26673499b9297664fdb)

HiC Data

The HiC raw data will be available soon.

Previously generated PacBio data

The PacBio data was previously generated and is available from the SRA.

Notes on downloading files.

Files are generously hosted by Amazon Web Services. Although they are available as straightforward HTTP links, download performance is improved by using the Amazon Web Services command-line interface. To do so, amend each link to use the s3:// addressing scheme, i.e. replace https://s3.amazon.com/nanopore-human-wgs/ with s3://nanopore-human-wgs/. For example, to download CHM13_prep5_S13_L002_I1_001.fastq.gz to the current working directory, use the following command.

aws s3 --no-sign-request cp s3://nanopore-human-wgs/chm13/10x/CHM13_prep5_S13_L002_I1_001.fastq.gz .

Or, to download the full dataset, use the following command.

aws s3 --no-sign-request sync s3://nanopore-human-wgs/chm13/ .
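The link rewrite described above can be scripted when many HTTP links need converting. This is a minimal sketch: the helper name is hypothetical, and it matches on the bucket name rather than on a specific host, so it accepts any HTTPS link that points into nanopore-human-wgs.

```shell
# Rewrite an HTTPS download link into the equivalent s3:// URI.
# Everything up to and including "/nanopore-human-wgs/" is replaced,
# whatever the host part of the link is.
https_to_s3() {
    case $1 in
        https://*/nanopore-human-wgs/*)
            printf 's3://nanopore-human-wgs/%s\n' "${1#https://*/nanopore-human-wgs/}" ;;
        *)
            echo "not a nanopore-human-wgs HTTPS link: $1" >&2
            return 1 ;;
    esac
}

https_to_s3 'https://s3.amazon.com/nanopore-human-wgs/chm13/10x/CHM13_prep5_S13_L002_I1_001.fastq.gz'
```

The printed URI can be passed directly to `aws s3 --no-sign-request cp`.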

The s3 command can also be used to get information on the dataset, for example reporting the size of every file in human-readable format.

aws s3 --no-sign-request ls --recursive --human-readable --summarize s3://nanopore-human-wgs/chm13/

Or, to obtain technology-specific sizes, use the following commands.

aws s3 --no-sign-request ls --recursive --human-readable --summarize s3://nanopore-human-wgs/chm13/nanopore/fast5
aws s3 --no-sign-request ls --recursive --human-readable --summarize s3://nanopore-human-wgs/chm13/nanopore/rel2
aws s3 --no-sign-request ls --recursive --human-readable --summarize s3://nanopore-human-wgs/chm13/assemblies

Adjusting AWS CLI settings such as max_concurrent_requests, as per this guide, will improve download performance further.
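As a rough illustration of that tuning, the sketch below prints an AWS CLI config stanza that raises max_concurrent_requests. The helper name is hypothetical and the value 20 is only an example (the AWS CLI default is 10); suitable values depend on your network and machine.

```shell
# Print an AWS CLI config stanza that raises S3 transfer concurrency.
# print_s3_tuning is a hypothetical helper; 20 is an illustrative value.
print_s3_tuning() {
    cat <<EOF
[default]
s3 =
  max_concurrent_requests = ${1:-20}
EOF
}

# Review the output, then merge it into ~/.aws/config by hand.
# With the AWS CLI installed, the equivalent one-liner is:
#   aws configure set default.s3.max_concurrent_requests 20
print_s3_tuning 20
```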

Contact

Please raise issues concerning this dataset on this GitHub repository.

History

* rel1 and rel2: 2nd March 2019. Initial release.
