PeterMulhair / Lepidoptera_homeobox

Data and scripts used in Mulhair et al. 2023 to analyse the evolution of homeobox genes across Lepidoptera

Lepidoptera homeobox evolution

This repository contains the datasets and scripts needed to reproduce the results in "Diversity, duplication, and genomic organization of homeobox genes in Lepidoptera".

Each folder contains the data and code required to recreate the results and figures for one section of the manuscript.

If you use any scripts from this repository, please cite Mulhair et al. 2023.

Instructions

  • 00_all_homeobox/ contains files listing the homeobox genes present in each species, for each homeobox gene class. It also contains the .tsv files and hbx_count_heatmap.R required to reproduce Figure 1 in the manuscript, and expression_results, which holds all code and data required to measure expression of homeobox genes in a selected set of species.
  • 01_Hox_gene_cluster/ contains files listing the genes present in the Hox cluster for each species, as well as plot_Hox_cluster.R, which is required to reproduce Supplementary Figure 2. It also contains two subdirectories, TAD_analysis/ and gene_tree/, which hold all data and code required to reproduce the analyses and figures for Figures 3 and 4.
  • 02_Shx_duplications/ contains files on LINE density in the Hox cluster and its association with Shx gene duplication, including all data and code required to reproduce Figure 5. It also contains the subdirectory TE_annotation/, which holds all code required to annotate TE content in the genomes and to recreate Supplementary Figure 3.
  • 03_NK_gene_cluster/ contains files listing the genes present in the NK cluster for each species, as well as plot_NK_cluster.R, which is required to reproduce Supplementary Figure 3.
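
Before running any of the analyses, it can help to confirm that a local copy of the repository has the folder layout described above. The following is a minimal sketch, not part of the repository itself: the function name missing_entries is hypothetical, and it assumes that each named script or subdirectory sits directly inside the folder it is listed under, which the descriptions above suggest but do not state outright.

```python
from pathlib import Path

# Folder and file names are taken from the directory descriptions above.
# ASSUMPTION: each entry sits directly inside its folder, not in a deeper
# subdirectory.
EXPECTED = {
    "00_all_homeobox": ["hbx_count_heatmap.R", "expression_results"],
    "01_Hox_gene_cluster": ["plot_Hox_cluster.R", "TAD_analysis", "gene_tree"],
    "02_Shx_duplications": ["TE_annotation"],
    "03_NK_gene_cluster": ["plot_NK_cluster.R"],
}


def missing_entries(repo_root):
    """Return the expected relative paths that do not exist under repo_root."""
    root = Path(repo_root)
    missing = []
    for folder, entries in EXPECTED.items():
        if not (root / folder).is_dir():
            # Report the folder once rather than every entry inside it.
            missing.append(folder)
            continue
        for entry in entries:
            if not (root / folder / entry).exists():
                missing.append(f"{folder}/{entry}")
    return missing


if __name__ == "__main__":
    import sys

    # Path to the local clone; defaults to the current directory.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path in missing_entries(root):
        print(f"missing: {path}")
```

Run from the top of a local clone, the script prints one line per missing path and prints nothing if every expected entry is found.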

The genomes used in this analysis come from the Darwin Tree of Life project and can be downloaded using code from here.

Homeobox genes from all classes can be annotated in these genomes using the HbxFinder pipeline.


About

License: GNU General Public License v3.0


Languages

Python 66.1%, R 20.2%, HyPhy 8.1%, Shell 5.6%