Pangenome

A Citrullus genus super-pangenome reveals extensive variations in wild and cultivated watermelons and sheds light on watermelon evolution and domestication

Pan-genome analysis highlights the role of structural variation in the evolution and environmental adaptation of Asian honeybees | Tweet | Code

A pangenome reference of 36 Chinese populations

A draft human pangenome reference

Editing and scientific illustration services https://www.letpub.com/

A graph-based genome and pan-genome variation of the model plant Setaria | Code github |

Tweet === > https://twitter.com/RJABuggs/status/1667073518721286145

Paper ===> Transposon signatures of allopolyploid genome evolution

Code github

Pangenomic analysis identifies structural variation associated with heat tolerance in pearl millet

Plant pan-genomics: recent advances, new challenges, and roads ahead

Graph-based pan-genome: increased opportunities in plant genomics

A pan-genome and chromosome-length reference genome of narrow-leafed lupin (Lupinus angustifolius) reveals genomic diversity and insights into key industry and biological traits

Genomic approaches for studying crop evolution

Yam genomics supports west Africa as a major cradle of crop domestication

Plantae blog

Pan-genome analysis of 33 genetically diverse rice accessions reveals hidden genomic variations

How the pan-genome is changing crop genomics and improvement

Cai, Chang, Wang & co construct a pan-genome of Brassica rapa from 18 genomes covering different morphologies. They infer the ancestral genome, and construct a graph genome, which they use to genotype SVs in 524 accessions. Leafy head linked to a deletion.

Impacts of allopolyploidization and structural variation on intraspecific diversification in Brassica rapa

Breaking news, 23 May 2021: a sorghum pangenome from another team is out

Extensive variation within the pan-genome of cultivated and wild sorghum

Code availability

Breaking news | Cotton pangenome out

Cotton pan-genome retrieves the lost sequences and genes during domestication and selection

New Special Issue: Feeding the World: The Future of Plant Breeding, from Rajeev, about genomic breeding. Awesome! | Designing Future Crops: Genomics-Assisted Breeding Comes of Age

A quote:

The caveats associated with fragmented genome assemblies came to the fore and a pressing need was to construct more genome sequences representative of species (pangenome) or even the entire genus (super-pangenome) in order to capture a comprehensive view of genetic diversity that spans the entire crop gene pool

Pangenome literature IRD

Nice review from Agnieszka, the pioneer of plant pangenomics | 2019 | Trends in Genetics | Pangenomics Comes of Age: From Bacteria to Plant and Animal Applications

Another one from Lei et al. | 2021 | Annual Review of Plant Biology | Plant Pan-Genomics Comes of Age

Pangenome of sorghum: great competition between the USA and India | Both teams published bioRxiv preprints.

USA | Pan-genome Analysis in Sorghum Highlights the Extent of Genomic Variation and Sugarcane Aphid Resistance Genes | USA video presentation by Bo Wang from Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA here

India | Sorghum pan-genome explores the functional utility to accelerate the genetic gain

The Gap-free Rice Genomes Provide Insights for Centromere Structure and Function Exploration and Graph-based Pan-genome Construction

They used the vg toolkit.

Construction of Graph-based Pan-genome: MH63RS3 was set as a reference and the pan-PAV sequences were saved in variant call format (VCF). The graph-based pan-genome was constructed via the vg (https://github.com/vgteam/vg, version v1.29.0) toolkit (Garrison et al., 2018) with default parameters.
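As a rough sketch of what "reference plus VCF" construction looks like in practice, the helper below only assembles a `vg construct` command line; it does not run vg. The file names `MH63RS3.fa` and `pan_PAVs.vcf.gz` are hypothetical placeholders, and the paper does not give its exact invocation, so treat this as an assumption about a typical call. `-r` takes the reference FASTA and `-v` a bgzip-compressed, tabix-indexed VCF; vg writes the graph to standard output.

```python
import shlex


def vg_construct_command(reference_fasta, variants_vcf_gz):
    """Build the argv list for `vg construct` (reference FASTA + VCF)."""
    # -r: linear reference; -v: bgzipped and tabix-indexed VCF of variants
    return ["vg", "construct", "-r", reference_fasta, "-v", variants_vcf_gz]


# Hypothetical file names, for illustration only.
cmd = vg_construct_command("MH63RS3.fa", "pan_PAVs.vcf.gz")

# vg writes the graph to stdout, so a shell redirect captures it.
print(shlex.join(cmd) + " > graph.vg")
```

Keeping the command as a list makes it easy to pass to `subprocess.run` later without shell quoting problems.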

Personally, I think the vg toolkit is for variant detection.

A nice review on pangenome graph construction here

They suggest minigraph and seqwish for alignment-based pangenome construction:

Recent unpublished methods explore two new alternatives to alignment-based pangenome construction. Minigraph (https://github.com/lh3/minigraph) extends the minimap2 (83) alignment chaining model to work on graphs. It applies this alignment model to progressively build out a pangenome graph from a series of genomes that contains large sequences (>250 base pairs) that were not previously seen in other genomes. The resulting pangenome does not contain all input sequences and variation between them but rather a representative subset and large structural variants. By contrast, seqwish (48; https://github.com/ekg/seqwish) generates the full variation graph implied by a collection of sequences and alignments between them. The paths embedded in its output graph precisely and completely reconstruct the input sequences, while the topology of the graph describes all variants represented in the input alignments.
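To make the statement about embedded paths concrete, here is a minimal sketch that rebuilds each input sequence by walking the paths of a graph. It assumes the graph is stored as GFA v1 with `S` (segment) and `P` (path) lines and no overlaps between segments; the toy graph and the sample names are invented for illustration and are not from any of the papers above.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def path_sequences(gfa_text):
    """Return {path name: sequence spelled by that path} for a GFA v1 text."""
    segments, paths = {}, {}
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "S":            # S <name> <sequence>
            segments[fields[1]] = fields[2]
        elif fields[0] == "P":          # P <name> <seg+,seg-,...> <overlaps>
            paths[fields[1]] = fields[2].split(",")
    spelled = {}
    for name, steps in paths.items():
        parts = []
        for step in steps:
            seg_id, orient = step[:-1], step[-1]
            seq = segments[seg_id]
            parts.append(seq if orient == "+" else reverse_complement(seq))
        spelled[name] = "".join(parts)
    return spelled


# Toy graph: segment 3 is an insertion carried only by sampleB,
# and sampleC traverses segment 2 in reverse orientation.
TOY_GFA = "\n".join([
    "H\tVN:Z:1.0",
    "S\t1\tACG",
    "S\t2\tT",
    "S\t3\tTTGA",
    "S\t4\tCC",
    "P\tsampleA\t1+,2+,4+\t*",
    "P\tsampleB\t1+,3+,4+\t*",
    "P\tsampleC\t1+,2-,4+\t*",
])

sequences = path_sequences(TOY_GFA)
print(sequences)
```

In a full variation graph every input genome should be recoverable this way, whereas a graph holding only a representative subset of sequence would not guarantee it for every sample.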

See also: Building pan-genome infrastructures for crop plants and their use in association genetics

The soybean pangenome paper finally showed me that vg is for structural variation detection. After constructing the graph-based pangenome, it is possible to call variants across thousands of genotypes. They used MUMmer and vg for this section.

Structural Variation Identification

SNPs and indels were identified using show-snps (-ClrT) of the MUMmer4 toolkit. We use the SVMU (structural variants from MUMmer) (Chakraborty et al., 2019) pipeline to automate presence and absence variation (PAV) discovery by parsing the result of NUCmer. From the SVMU results, insertion/deletion (with tag INS/DEL) was treated as PAV. The genome region neither detected as synteny block by NUCmer nor insertion/deletion by SVMU was also treated as the PAV region.
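The last rule in that paragraph (regions that are neither in a synteny block nor in an SVMU insertion/deletion are also PAV) amounts to taking the complement of a set of intervals. A minimal sketch, assuming 1-based inclusive coordinates on a single chromosome; the function name and the toy coordinates are hypothetical, and parsing of the NUCmer or SVMU output is not shown.

```python
def uncovered_regions(chrom_length, covered):
    """Return 1-based inclusive intervals of a chromosome that no interval in
    `covered` touches. `covered` is a list of (start, end) tuples."""
    gaps = []
    next_free = 1  # first position not yet known to be covered
    for start, end in sorted(covered):
        if start > next_free:
            gaps.append((next_free, start - 1))
        next_free = max(next_free, end + 1)
    if next_free <= chrom_length:
        gaps.append((next_free, chrom_length))
    return gaps


# Toy coordinates for illustration.
synteny_blocks = [(1, 100), (151, 300)]
svmu_indels = [(101, 120)]

extra_pav = uncovered_regions(400, synteny_blocks + svmu_indels)
print(extra_pav)
```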

For copy number variation (CNV), we first filtered the synteny block less than 100 bp. The sequence region with two or more separate synteny blocks (> 90% identity) overlapping was detected as CNV. Translocation and inversion events (both refer to structure variation ≥ 1 Kbp) were detected by manual check depending on their location and orientation to their neighboring blocks based on the non-allelic homology blocks from the above alignment using MUMmer4. The neighboring blocks belonging to same type of events were merged together. For structural variation merging, we referred to a reported method from human beings (Audano et al., 2019). The ZH13 genome was set as the reference genome, SoyW01 served as the initial callset and new sites were added per sample. Any variants in the sample that had 50% reciprocal overlap with an existing discovery variant was excluded. This merging was performed separately by each variant type.
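The merging step can be sketched as follows: start from the first callset, then add a variant from each later sample only if it does not have at least 50% reciprocal overlap with a variant of the same type already in the merged set. This is a simplified illustration, not the authors' code. It assumes half-open coordinates with `start < end`, so insertions with no reference span would need separate handling, and every sample name other than SoyW01 is hypothetical.

```python
def reciprocal_overlap(a, b):
    """Smaller of the two fractions of each interval covered by their overlap."""
    (a_start, a_end), (b_start, b_end) = a, b
    shared = min(a_end, b_end) - max(a_start, b_start)
    if shared <= 0:
        return 0.0
    return min(shared / (a_end - a_start), shared / (b_end - b_start))


def merge_callsets(samples, min_overlap=0.5):
    """Merge per-sample SV calls in order; `samples` is a list of lists of
    (chrom, start, end, svtype). Comparison is restricted to the same
    chromosome and the same variant type."""
    merged = []
    for calls in samples:
        for chrom, start, end, svtype in calls:
            already_known = any(
                chrom == m_chrom
                and svtype == m_type
                and reciprocal_overlap((start, end), (m_start, m_end)) >= min_overlap
                for m_chrom, m_start, m_end, m_type in merged
            )
            if not already_known:
                merged.append((chrom, start, end, svtype))
    return merged


soyw01 = [("Chr01", 1000, 2000, "DEL")]      # initial callset
sample2 = [                                   # hypothetical second sample
    ("Chr01", 1100, 2100, "DEL"),             # 90% reciprocal overlap: dropped
    ("Chr01", 1000, 2000, "INV"),             # different type: kept
    ("Chr01", 1900, 4000, "DEL"),             # small overlap: kept
]

nonredundant = merge_callsets([soyw01, sample2])
print(nonredundant)
```

Because the merged set grows as samples are processed, the result depends on sample order, which is why the paper fixes an initial callset.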

For graph-based genome construction and analyses, the ZH13 genome was set as a reference, the nonredundant structural variations with repetitive sequences less than 90% were saved in variant call format (VCF), and graph-based genome construction was performed via the vg (https://github.com/vgteam/vg, version v1.6.0) toolkit (Garrison et al., 2018). To genotype the structural variations in 2,898 accessions, we mapped the Illumina short reads from each accession to the graph-based genome via vg toolkit using default parameters.

Another interesting analysis is the Core and Dispensable Gene Family Clustering

They used OrthoMCL for orthologue detection and performed the annotation using KEGG and InterProScan.

The core and dispensable gene sets were estimated based on gene family clustering using OrthoMCL (Li et al., 2003) v2.0.9. For each de novo accession and ZH13, a gene containing CDS with 100% similarity to other genes was removed by using the cd-hit-est of CD-HIT (Li and Godzik, 2006) v4.6 toolkit with the parameter of -c 1 -aS 1. Protein sequences of the remaining genes were subjected to homologous searching by BLASTp (Camacho et al., 2009) with parameters of -evalue 1e-10 -max_target_seqs 116. OrthoMCL (version 2.0.9) was used to deal with the BLAST result with the parameter of percentMatchCutoff = 50 and -I 1.5 to make gene family clustering. The gene families that were shared among accessions were defined as core gene families, the gene families that were missed in one or two accessions were defined as softcore gene families, the gene families that were missed in more than two accessions were defined as dispensable gene families, and those that only existed in one accession were defined as private gene families. For phylogenetic analysis of each gene family, MUSCLE (Edgar, 2004) v3.8.31 was used for sequence alignment and MEGA6 (Tamura et al., 2013) was used for phylogenetic tree building.
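The four categories defined above reduce to counting in how many accessions each gene family is present. A minimal sketch, assuming the clustering output has already been turned into a mapping from family to the set of accessions that carry it. The accession and family names are hypothetical, and checking "private" before "softcore" is my own tie-break for very small panels, where the two definitions could otherwise collide.

```python
def classify_family(n_present, n_accessions):
    """Assign a pangenome category from the number of accessions with the family."""
    if n_present == n_accessions:
        return "core"            # shared by every accession
    if n_present == 1:
        return "private"         # found in a single accession
    if n_accessions - n_present <= 2:
        return "softcore"        # missing in one or two accessions
    return "dispensable"         # missing in more than two accessions


def classify_families(presence, accessions):
    """presence: {family id: set of accessions carrying at least one member}."""
    total = len(accessions)
    return {fam: classify_family(len(members), total)
            for fam, members in presence.items()}


# Hypothetical panel of six accessions.
accessions = ["acc1", "acc2", "acc3", "acc4", "acc5", "acc6"]
presence = {
    "fam_core": set(accessions),
    "fam_soft": {"acc1", "acc2", "acc3", "acc4", "acc5"},
    "fam_disp": {"acc1", "acc2", "acc3"},
    "fam_priv": {"acc6"},
}

categories = classify_families(presence, accessions)
print(categories)
```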

For gene function annotation, KEGG pathway analysis was performed using KOBAS 3.0 (Xie et al., 2011), protein domain was annotated by InterProScan 5 (Jones et al., 2014), and Gene Ontology was annotated by PANNZER2 (Törönen et al., 2018). The enrichment test was performed by the ClusterProfiler (Yu et al., 2012) v3.10.1 package in R 3.5.0 (R Development Core Team, 2013). QTL information was obtained from SoyBase (https://www.soybase.org/search/qtllist_by_symbol.php).
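For intuition about the enrichment test: over-representation analysis of the kind clusterProfiler offers is based on the hypergeometric distribution. The sketch below computes that upper-tail p-value with the standard library only. Which clusterProfiler function and background set the authors used is not stated, so the set-up here is an assumption, and multiple-testing correction is not shown.

```python
from math import comb


def enrichment_pvalue(k, n, M, N):
    """P(X >= k) for a hypergeometric draw.

    N: genes in the background, M: background genes annotated to the term,
    n: genes in the list of interest, k: genes in the list annotated to the term.
    """
    total = comb(N, n)
    upper = min(n, M)
    # comb() returns 0 when the lower index exceeds the upper, so impossible
    # combinations contribute nothing to the sum.
    favourable = sum(comb(M, i) * comb(N - M, n - i) for i in range(k, upper + 1))
    return favourable / total


# Toy numbers: 10 background genes, 5 annotated to a term, and a list of
# 5 genes that are all annotated to it.
p_value = enrichment_pvalue(k=5, n=5, M=5, N=10)
print(p_value)
```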

Definitely, the soybean pan-genome is an excellent guide for my pangenome thesis.

EUPAN TOOLBOX EUPAN enables pan-genome studies of a large number of eukaryotic genomes
Nice review on the application of the pangenome How the pan-genome is changing crop genomics and improvement
Great inspiration from my virtual mentor Philipp E. Bayer for R-genes pangenomics publications
Sesame pangenome
Tool for multiple whole-genome alignment of eukaryotes in a pangenome context | SibeliaZ | Github | Conda version available using the following command line: `conda install -c bioconda sibeliaz`
GSAlign: an efficient sequence alignment tool for intra-species genomes
2021 | Pan-genomes: moving beyond the reference

FURTHER READING
Tettelin, H. et al. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial “pan-genome”. Proc. Natl Acad. Sci. USA 102, 13950–13955 (2005).

Li, R. et al. Building the sequence map of the human pan-genome. Nat. Biotechnol. 28, 57–63 (2010).

Golicz, A. A. et al. The pangenome of an agronomically important crop plant Brassica oleracea. Nat. Commun. 7, 13390 (2016).

Montenegro, J. D. et al. The pangenome of hexaploid bread wheat. Plant J. 90, 1007–1013 (2017).

Zhao, Q. et al. Pan-genome analysis highlights the extent of genomic variation in cultivated and wild rice. Nat. Genet. 50, 278–284 (2018).

Gao, L. et al. The tomato pan-genome uncovers new genes and a rare allele regulating fruit flavor. Nat. Genet. 51, 1044–1051 (2019).

Alonge, M. et al. Major impacts of widespread structural variation on gene expression and crop improvement in tomato. Cell 182, 145–161 (2020).

Liu, Y. et al. Pan-genome of wild and cultivated soybeans. Cell 182, 162–176 (2020).

2021 | Targeted plant improvement through genome editing: from laboratory to field
2021 | Hotter, drier, CRISPR: the latest edit on climate change
2021 | Building pan-genome infrastructures for crop plants and their use in association genetics
2016 | The pangenome of an agronomically important crop plant Brassica oleracea
definition
Apple pan-genome | 02 November 2020 | Domestication | Assembly improvement | Phased diploid genome assemblies and pan-genomes provide insights into the genetic history of apple domestication | Code availability
PacBio said: "Sequencing multiple individuals is the best way to understand genomic variation in a species or across closely related species." photo
Eight high-quality genomes reveal pan-genome architecture and ecotype differentiation of Brassica napus
Discovery and population genomics of structural variation in a songbird genus
Pan-Genome of Wild and Cultivated Soybeans
Six reference-quality genomes reveal evolution of bat adaptations
Sunflower pan-genome analysis shows that hybridization altered gene content and disease resistance
Beyond a Single Reference Genome – The Advantages of Sequencing Multiple Individuals
Rice | Rice wild
Plant pan-genomes are the new reference
Super-Pangenome by Integrating the Wild Side of a Species for Accelerated Crop Improvement
