Yedomon / pangenome

Pangenome

A pan-genome of 69 Arabidopsis thaliana accessions reveals a conserved genome structure throughout the global species range

Code availability

Pan-genome analysis reveals genomic variations associated with domestication traits in broomcorn millet

Code

The pan-genome and local adaptation of Arabidopsis thaliana

A Citrullus genus super-pangenome reveals extensive variations in wild and cultivated watermelons and sheds light on watermelon evolution and domestication

Pan-genome analysis highlights the role of structural variation in the evolution and environmental adaptation of Asian honeybees | Tweet | Code

A pangenome reference of 36 Chinese populations

A draft human pangenome reference

Editing and scientific illustration services https://www.letpub.com/

A graph-based genome and pan-genome variation of the model plant Setaria | Code github |

Tweet: https://twitter.com/RJABuggs/status/1667073518721286145

Paper: Transposon signatures of allopolyploid genome evolution

Code github

Pangenomic analysis identifies structural variation associated with heat tolerance in pearl millet

Plant pan-genomics: recent advances, new challenges, and roads ahead

Graph-based pan-genome: increased opportunities in plant genomics

A pan-genome and chromosome-length reference genome of narrow-leafed lupin (Lupinus angustifolius) reveals genomic diversity and insights into key industry and biological traits

Genomic approaches for studying crop evolution

Yam genomics supports west Africa as a major cradle of crop domestication

Plantae blog

Pan-genome analysis of 33 genetically diverse rice accessions reveals hidden genomic variations

How the pan-genome is changing crop genomics and improvement

Cai, Chang, Wang & co construct a pan-genome of Brassica rapa from 18 genomes covering different morphologies. They infer the ancestral genome, and construct a graph genome, which they use to genotype SVs in 524 accessions. Leafy head linked to a deletion.

Impacts of allopolyploidization and structural variation on intraspecific diversification in Brassica rapa

Breaking news, 23 May 2021: a sorghum pangenome from another team is out

Extensive variation within the pan-genome of cultivated and wild sorghum

Code availability

Breaking news | Cotton pangenome out

Cotton pan-genome retrieves the lost sequences and genes during domestication and selection

New Special Issue: Feeding the World: The Future of Plant Breeding. An awesome paper from Rajeev on genomic breeding: Designing Future Crops: Genomics-Assisted Breeding Comes of Age

A quote:

The caveats associated with fragmented genome assemblies came to the fore and a pressing need was to construct more genome sequences representative of species (pangenome) or even the entire genus (super-pangenome) in order to capture a comprehensive view of genetic diversity that spans the entire crop gene pool

Pangenome literature (IRD)

A nice review from Agnieszka, a pioneer of plant pangenomics | 2019 | Trends in Genetics | Pangenomics Comes of Age: From Bacteria to Plant and Animal Applications

Another one from Lei et al. | 2021 | Annual Review of Plant Biology | Plant Pan-Genomics Comes of Age

Sorghum pangenome: strong competition between the USA and India | Both teams published bioRxiv preprints.

USA | Pan-genome Analysis in Sorghum Highlights the Extent of Genomic Variation and Sugarcane Aphid Resistance Genes | Video presentation by Bo Wang (Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA) here

India | Sorghum pan-genome explores the functional utility to accelerate the genetic gain

They used the vg toolkit.

Construction of Graph-based Pan-genome

MH63RS3 was set as a reference and the pan-PAVs sequences were saved in variant call format (VCF). The graph-based pan-genome was constructed via the vg (https://github.com/vgteam/vg, version v1.29.0) toolkit (Garrison et al., 2018) with default parameters.

Personally, I think the vg toolkit is for variant detection.

A nice review on pangenome graph construction here

They point to minigraph and seqwish as two newer approaches to building a pangenome graph:

Recent unpublished methods explore two new alternatives to alignment-based pangenome construction. Minigraph (https://github.com/lh3/minigraph) extends the minimap2 (83) alignment chaining model to work on graphs. It applies this alignment model to progressively build out a pangenome graph from a series of genomes that contains large sequences (>250 base pairs) that were not previously seen in other genomes. The resulting pangenome does not contain all input sequences and variation between them but rather a representative subset and large structural variants. By contrast, seqwish (48; https://github.com/ekg/seqwish) generates the full variation graph implied by a collection of sequences and alignments between them. The paths embedded in its output graph precisely and completely reconstruct the input sequences, while the topology of the graph describes all variants represented in the input alignments.
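To make the seqwish description concrete, here is a minimal toy sketch of a variation graph whose embedded paths spell the input sequences back exactly. The node ids, sequences, and genome names are invented for illustration, and the model is simplified (forward strand only, no node orientation); it is not how seqwish or vg store graphs internally.

```python
# Toy variation graph: node id -> DNA sequence (hypothetical data).
nodes = {1: "ACGT", 2: "A", 3: "G", 4: "TTCA"}

# One path per input genome: the ordered list of node ids it walks through.
paths = {
    "genomeA": [1, 2, 4],
    "genomeB": [1, 3, 4],
}


def spell(path, nodes):
    """Reconstruct the sequence of a genome by concatenating its path nodes."""
    return "".join(nodes[node_id] for node_id in path)


def edges_from_paths(paths):
    """Collect the graph edges implied by consecutive nodes in every path."""
    edges = set()
    for path in paths.values():
        edges.update(zip(path, path[1:]))
    return edges


if __name__ == "__main__":
    for name, path in paths.items():
        print(name, spell(path, nodes))
    print(sorted(edges_from_paths(paths)))
```

In this toy graph, nodes 2 and 3 form a small bubble between nodes 1 and 4, which is how a single-base difference between the two genomes shows up in the graph topology.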

See also: Building pan-genome infrastructures for crop plants and their use in association genetics

The soybean pangenome paper finally showed me that vg is used for structural variation detection. After constructing the graph-based pangenome, it is possible to genotype variants across thousands of accessions. They used MUMmer and vg for this section.

Structural Variation Identification

SNPs and indels were identified using show-snps (-ClrT) of the MUMmer4 toolkit. We use the SVMU (structural variants from MUMmer) (Chakraborty et al., 2019) pipeline to automate presence and absence variation (PAV) discovery by parsing the result of NUCmer. From the SVMU results, insertion/deletion (with tag INS/DEL) was treated as PAV. The genome region neither detected as synteny block by NUCmer nor insertion/deletion by SVMU was also treated as the PAV region.
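The last rule above (regions covered neither by a synteny block nor by an SVMU insertion/deletion are treated as PAV) amounts to taking the complement of a set of intervals. A minimal sketch, assuming half-open 0-based coordinates that lie within the chromosome; the function name and the example coordinates are hypothetical and not taken from the paper's pipeline:

```python
def uncovered_regions(chrom_length, covered):
    """Return the gaps in [0, chrom_length) not covered by any interval.

    covered: iterable of (start, end) half-open intervals, e.g. synteny
    blocks plus INS/DEL calls on one chromosome. Intervals may overlap.
    """
    gaps = []
    pos = 0  # rightmost position covered so far
    for start, end in sorted(covered):
        if start > pos:
            gaps.append((pos, start))
        pos = max(pos, end)
    if pos < chrom_length:
        gaps.append((pos, chrom_length))
    return gaps


if __name__ == "__main__":
    blocks = [(0, 300), (250, 400), (600, 900)]  # made-up coordinates
    print(uncovered_regions(1000, blocks))
```

In a real analysis the intervals would come from parsing the NUCmer and SVMU output, which this sketch does not attempt.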

For copy number variation (CNV), we first filtered the synteny block less than 100 bp. The sequence region with two or more separate synteny blocks (> 90% identity) overlapping was detected as CNV. Translocation and inversion events (both refer to structure variation ≥ 1 Kbp) were detected by manual check depending on their location and orientation to their neighboring blocks based on the non-allelic homology blocks from the above alignment using MUMmer4. The neighboring blocks belonging to same type of events were merged together. For structural variation merging, we referred to a reported method from human beings (Audano et al., 2019). The ZH13 genome was set as the reference genome, SoyW01 served as the initial callset and new sites were added per sample. Any variants in the sample that had 50% reciprocal overlap with an existing discovery variant was excluded. This merging was performed separately by each variant type.
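The merging step described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: variants are `(chrom, start, end, svtype)` tuples with half-open coordinates, "50% reciprocal overlap" is read as a reciprocal overlap of at least 0.5, and each sample is compared only against variants discovered in earlier samples.

```python
def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions between intervals a and b.

    a, b: (chrom, start, end) with half-open coordinates.
    """
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def merge_callsets(callsets, min_ro=0.5):
    """Merge per-sample SV calls into one non-redundant set per variant type.

    callsets: list of per-sample lists of (chrom, start, end, svtype);
    the first list plays the role of the initial callset.
    """
    merged = {}  # svtype -> list of (chrom, start, end)
    for calls in callsets:
        new_calls = []
        for chrom, start, end, svtype in calls:
            candidate = (chrom, start, end)
            existing = merged.get(svtype, [])
            if any(reciprocal_overlap(candidate, e) >= min_ro for e in existing):
                continue  # already represented by an earlier discovery variant
            new_calls.append((svtype, candidate))
        for svtype, candidate in new_calls:
            merged.setdefault(svtype, []).append(candidate)
    return merged


if __name__ == "__main__":
    initial = [("chr1", 100, 200, "DEL")]
    sample2 = [("chr1", 120, 210, "DEL"), ("chr1", 120, 210, "INS"),
               ("chr1", 190, 400, "DEL")]
    print(merge_callsets([initial, sample2]))
```

Keeping one list per variant type mirrors the statement that merging was performed separately for each type, so a deletion never suppresses an insertion at the same position.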

For graph-based genome construction and analyses, the ZH13 genome was set as a reference, the nonredundant structural variations with repetitive sequences less than 90% were saved in variant call format (VCF), and graph-based genome construction was performed via the vg (https://github.com/vgteam/vg, version v1.6.0) toolkit (Garrison et al., 2018). To genotype the structural variations in 2,898 accessions, we mapped the Illumina short reads from each accession to the graph-based genome via vg toolkit using default parameters.

Another interesting analysis is the Core and Dispensable Gene Family Clustering

They used OrthoMCL for orthologue detection and performed the annotation with KEGG and InterProScan.

The core and dispensable gene sets were estimated based on gene family clustering using OrthoMCL (Li et al., 2003) v2.0.9. For each de novo accession and ZH13, a gene containing CDS with 100% similarity to other genes was removed by using the cd-hit-est of CD-HIT (Li and Godzik, 2006) v4.6 toolkit with the parameter of -c 1 -aS 1. Protein sequences of the remaining genes were subjected to homologous searching by BLASTp (Camacho et al., 2009) with parameters of -evalue 1e-10 -max_target_seqs 116. OrthoMCL (version 2.0.9) was used to deal with the BLAST result with the parameter of percentMatchCutoff = 50 and -I 1.5 to make gene family clustering. The gene families that were shared among accessions were defined as core gene families, the gene families that were missed in one or two accessions were defined as softcore gene families, the gene families that were missed in more than two accessions were defined as dispensable gene families, and those that only existed in one accession were defined as private gene families. For phylogenetic analysis of each gene family, MUSCLE (Edgar, 2004) v3.8.31 was used for sequence alignment and MEGA6 (Tamura et al., 2013) was used for phylogenetic tree building.
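The four categories above can be expressed as a small classification function. This is a sketch of the definitions only, assuming the gene families have already been clustered (for example by OrthoMCL) and reduced to the set of accessions in which each family is present. The function names and example accession labels are hypothetical, and checking "private" before "dispensable" is my reading of the text, since a family found in a single accession also satisfies "missed in more than two accessions".

```python
def classify_family(present_in, n_accessions):
    """Classify one gene family as core, softcore, dispensable or private.

    present_in: set of accessions that carry at least one gene of the family.
    n_accessions: total number of accessions in the pangenome.
    """
    n_present = len(present_in)
    if not 1 <= n_present <= n_accessions:
        raise ValueError("family must be present in 1..n_accessions accessions")
    n_missing = n_accessions - n_present
    if n_missing == 0:
        return "core"
    if n_present == 1:
        return "private"      # checked before the 'dispensable' rule (assumption)
    if n_missing <= 2:
        return "softcore"
    return "dispensable"


def classify_families(families, accessions):
    """Map family id -> category for a dict of family id -> accession set."""
    known = set(accessions)
    return {
        family: classify_family(set(members) & known, len(known))
        for family, members in families.items()
    }


if __name__ == "__main__":
    accessions = ["acc1", "acc2", "acc3", "acc4", "acc5"]  # made-up labels
    families = {
        "famA": {"acc1", "acc2", "acc3", "acc4", "acc5"},
        "famB": {"acc1", "acc2", "acc3", "acc4"},
        "famC": {"acc1", "acc2"},
        "famD": {"acc3"},
    }
    for family, category in classify_families(families, accessions).items():
        print(family, category)
```

With very few accessions the categories collapse (for instance, with three accessions there is no room for a dispensable family), so the thresholds only make sense for panels of roughly the size used in the paper.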

For gene function annotation, KEGG pathway analysis was performed using KOBAS 3.0 (Xie et al., 2011), protein domain was annotated by InterProScan 5 (Jones et al., 2014), and Gene Ontology was annotated by PANNZER2 (Törönen et al., 2018). The enrichment test was performed by the ClusterProfiler (Yu et al., 2012) v3.10.1 package in R 3.5.0 (R Development Core Team, 2013). QTL information was obtained from SoyBase (https://www.soybase.org/search/qtllist_by_symbol.php).

The soybean pan-genome paper is definitely an excellent guide for my pangenome thesis.

FURTHER READING
Tettelin, H. et al. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial “pan-genome”. Proc. Natl Acad. Sci. USA 102, 13950–13955 (2005).

Li, R. et al. Building the sequence map of the human pan-genome. Nat. Biotechnol. 28, 57–63 (2010).

Golicz, A. A. et al. The pangenome of an agronomically important crop plant Brassica oleracea. Nat. Commun. 7, 13390 (2016).

Montenegro, J. D. et al. The pangenome of hexaploid bread wheat. Plant J. 90, 1007–1013 (2017).

Zhao, Q. et al. Pan-genome analysis highlights the extent of genomic variation in cultivated and wild rice. Nat. Genet. 50, 278–284 (2018).

Gao, L. et al. The tomato pan-genome uncovers new genes and a rare allele regulating fruit flavor. Nat. Genet. 51, 1044–1051 (2019).

Alonge, M. et al. Major impacts of widespread structural variation on gene expression and crop improvement in tomato. Cell 182, 145–161 (2020).

Liu, Y. et al. Pan-genome of wild and cultivated soybeans. Cell 182, 162–176 (2020).

