This repository contains the datasets and scripts needed to reproduce the results in "Diversity, duplication, and genomic organization of homeobox genes in Lepidoptera".
Each folder contains the data and code required to recreate the results and figures in the corresponding section of the manuscript.
If you use any scripts from this repository, please cite: Mulhair et al. 2023.
00_all_homeobox/
contains files on the homeobox genes present in each species, for each homeobox gene class. It also contains the .tsv files and hbx_count_heatmap.R required to reproduce Figure 1 in the manuscript. This directory also contains expression_results, which has all the code and data required to measure expression of homeobox genes in a subset of species.
01_Hox_gene_cluster/
contains files on the genes present in the Hox cluster for each species, as well as plot_Hox_cluster.R, required to reproduce Supplementary Figure 2. It also contains two subdirectories, TAD_analysis/ and gene_tree/, which contain all the data and code required to reproduce the analyses and figures for Figures 3 and 4.
02_Shx_duplications/
contains files on LINE density in the Hox cluster and its association with Shx gene duplication. All data and code required to reproduce Figure 5 are present. It also contains the subdirectory TE_annotation/, which contains all the code required to annotate TE content in the genomes and to recreate Supplementary Figure 3.
03_NK_gene_cluster/
contains files on the genes present in the NK cluster for each species, as well as plot_NK_cluster.R, required to reproduce Supplementary Figure 3.
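As a rough illustration of the kind of per-species summary a cluster plot draws on, the Python sketch below orders one species' cluster genes along a scaffold and reports the spacing between neighbouring genes. The Gene record, its fields, the cluster_layout function and the toy coordinates are all hypothetical; they are not the format of the files in 03_NK_gene_cluster/, and plot_NK_cluster.R may work quite differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; not the repository's file format."""
    name: str
    scaffold: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def cluster_layout(genes):
    """Order genes along a single scaffold and report the gap to the next gene.

    Returns (name, strand, gap_to_next) tuples sorted by start coordinate.
    gap_to_next counts the bases between this gene's end and the next gene's
    start (negative if they overlap); it is None for the last gene.
    """
    scaffolds = {g.scaffold for g in genes}
    if len(scaffolds) != 1:
        # A cluster split across scaffolds needs separate handling.
        raise ValueError(f"expected one scaffold, got {sorted(scaffolds)}")
    ordered = sorted(genes, key=lambda g: g.start)
    layout = []
    for current, nxt in zip(ordered, ordered[1:] + [None]):
        gap = None if nxt is None else nxt.start - current.end - 1
        layout.append((current.name, current.strand, gap))
    return layout


if __name__ == "__main__":
    # Toy coordinates for illustration only (not real annotations).
    toy = [
        Gene("gene_C", "chr7", 5_000, 5_900, "-"),
        Gene("gene_A", "chr7", 1_000, 1_800, "+"),
        Gene("gene_B", "chr7", 2_300, 3_100, "+"),
    ]
    for row in cluster_layout(toy):
        print(row)
```

Run on the toy input, this prints ('gene_A', '+', 499), ('gene_B', '+', 1899) and ('gene_C', '-', None). Raising an error for multi-scaffold input is a deliberate simplification: in fragmented assemblies a real cluster may span several scaffolds.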
Genomes from the Darwin Tree of Life project used in this analysis can be downloaded using the code from here.
Homeobox genes from all classes can be annotated in these genomes using the HbxFinder pipeline.