HuffordLab / Gene_Annotation_Pipeline

Gene Annotation Pipeline

Repository from GitHub: https://github.com/HuffordLab/Gene_Annotation_Pipeline

Steps

  1. Organizing inputs: Naming genomes and RNAseq reads
  2. Mapping RNAseq reads to the genome (STAR)
  3. Calculating genome quality metrics (Assemblathon, BUSCO) and separating alt-haplotypes in the genome, if any.
  4. Remapping RNAseq reads to primary scaffolds/contigs only (STAR)
  5. Running transcript assembly (Cufflinks, Strawberry, StringTie, CLASS2, and Trinity) and splice junction analysis (Portcullis)
  6. Run ab initio gene prediction (BRAKER)
  7. Map Trinity to genome to generate GFF3 (GMAP)
  8. Pick transcripts for evidence-based predictions (Mikado)
  9. Combine annotations: Mikado and homology-based predictions with ab initio predictions (GeMoMa)
  10. Identify primary transcripts (TRaCE)
  11. Finalize GFF3 files (custom) and calculate annotation metrics (AGAT, BUSCO)
  12. Perform repeat annotations (EDTA)
  13. Functional Annotations (TBD)
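
The steps above run in a fixed order, and later steps depend on the outputs of earlier ones (for example, step 4 remaps reads only after step 3 has separated alt-haplotypes). As a minimal sketch, and not part of the repository itself, the outline could be encoded as an ordered table so that a run can be planned or resumed from a given step. The names `Step`, `STEPS`, `plan`, and `tools_needed` are hypothetical helpers introduced here for illustration; the step descriptions and tool names are taken from the list above.

```python
from typing import List, NamedTuple, Tuple


class Step(NamedTuple):
    """One pipeline step: its number, what it does, and the tools it uses."""
    number: int
    description: str
    tools: Tuple[str, ...]


# Hypothetical encoding of the step list above (not the repository's own code).
STEPS = (
    Step(1, "Organize inputs: name genomes and RNAseq reads", ()),
    Step(2, "Map RNAseq reads to the genome", ("STAR",)),
    Step(3, "Calculate genome quality metrics; separate alt-haplotypes",
         ("Assemblathon", "BUSCO")),
    Step(4, "Remap RNAseq reads to primary scaffolds/contigs only", ("STAR",)),
    Step(5, "Assemble transcripts and analyze splice junctions",
         ("Cufflinks", "Strawberry", "StringTie", "CLASS2", "Trinity",
          "Portcullis")),
    Step(6, "Run ab initio gene prediction", ("BRAKER",)),
    Step(7, "Map Trinity transcripts to the genome to generate GFF3", ("GMAP",)),
    Step(8, "Pick transcripts for evidence-based predictions", ("Mikado",)),
    Step(9, "Combine annotations", ("GeMoMa",)),
    Step(10, "Identify primary transcripts", ("TRaCE",)),
    Step(11, "Finalize GFF3 files and calculate annotation metrics",
         ("AGAT", "BUSCO")),
    Step(12, "Perform repeat annotation", ("EDTA",)),
    Step(13, "Functional annotation (TBD)", ()),
)


def plan(start: int = 1, stop: int = len(STEPS)) -> List[Step]:
    """Return the steps numbered start..stop (inclusive), in pipeline order."""
    if not 1 <= start <= stop <= len(STEPS):
        raise ValueError(f"expected 1 <= start <= stop <= {len(STEPS)}")
    return [step for step in STEPS if start <= step.number <= stop]


def tools_needed(steps: List[Step]) -> List[str]:
    """List each tool once, in the order it is first required."""
    seen: List[str] = []
    for step in steps:
        for tool in step.tools:
            if tool not in seen:
                seen.append(tool)
    return seen


if __name__ == "__main__":
    # Example: resume after read mapping and stop before repeat annotation.
    selected = plan(start=5, stop=11)
    for step in selected:
        tools = ", ".join(step.tools) or "no external tool listed"
        print(f"{step.number:>2}. {step.description} [{tools}]")
    print("Tools to install:", ", ".join(tools_needed(selected)))
```

Calling `tools_needed(plan())` gives a quick checklist of software to install before a full run; the actual commands and parameters for each tool are not specified in this outline and would come from the repository's per-step documentation.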
