mingyue-mingyue / GoatAssemblyScripts

A loose collection of scripts and utilities for processing and analyzing the Goat reference genome assembly

Goat Assembly Scripts


Accessory scripts designed to scaffold, analyze and validate eukaryotic genome assemblies generated from long reads.

Table of contents

The scripts and data files are split into four main categories, each with its own directory in the repository:

  • assembly_frc_benchmarking
  • gap_identification_scripts
  • pga_assembly_benchmarking
  • pga_assembly_correction

Here is a brief description of the contents of each folder:

Assembly FRC Benchmarking

This folder contains scripts that were used to estimate the quality value (QV), structural variant types and feature response curves (FRC) for each tested assembly. In order to run these scripts, you must first install the following software on your PATH:

Further instructions on how to use the shell scripts are provided by the README in the directory.
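As a point of reference for the QV figure mentioned above, quality values are conventionally reported on the Phred scale, QV = -10 * log10(error rate). The sketch below only illustrates that conversion. The function name `phred_qv` and its two count arguments are hypothetical and are not taken from the scripts in this folder, which may derive their error counts differently.

```python
import math


def phred_qv(error_bases: int, total_bases: int) -> float:
    """Convert an error count into a Phred-scaled quality value.

    Hypothetical helper: error_bases is the number of bases judged to be
    wrong, total_bases is the number of bases that were evaluated.
    """
    if total_bases <= 0:
        raise ValueError("total_bases must be positive")
    if error_bases < 0 or error_bases > total_bases:
        raise ValueError("error_bases must be between 0 and total_bases")
    if error_bases == 0:
        # No observed errors: the Phred scale is unbounded in this case.
        return math.inf
    return -10.0 * math.log10(error_bases / total_bases)
```

Under this convention, one error per 1,000 evaluated bases corresponds to a QV of about 30, and one error per 10,000 bases to a QV of about 40.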

Gap Identification Scripts

This folder contains Perl scripts used to identify gap regions, cross-align gap-flanking sequence to other assemblies, and estimate the number of filled gaps within an assembly. Several Perl modules must be installed before these scripts can be used:

  • Mouse
  • Mouse::NativeTraits
  • namespace::autoclean

To print a usage statement, run any of the scripts without arguments. Examples of script usage are provided in the manuscript supplementary notes.
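To make the idea of gap identification concrete, here is a minimal Python sketch. It assumes that gaps are represented as runs of `N` characters in the assembly sequence, which is the usual FASTA convention. It is not a port of the Perl scripts in this folder, and the names `find_gaps` and `flanks` are hypothetical.

```python
import re
from typing import Iterator, Tuple

# Assumption: a gap is any run of N or n characters in the sequence.
GAP_PATTERN = re.compile(r"[Nn]+")


def find_gaps(sequence: str, min_length: int = 1) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) coordinates of each gap, 0-based and end-exclusive."""
    for match in GAP_PATTERN.finditer(sequence):
        if match.end() - match.start() >= min_length:
            yield match.start(), match.end()


def flanks(sequence: str, start: int, end: int, flank_size: int) -> Tuple[str, str]:
    """Return the sequence immediately left and right of a gap.

    These flanking sequences are what one would align against another
    assembly to see whether the gap has been filled there.
    """
    left = sequence[max(0, start - flank_size):start]
    right = sequence[end:end + flank_size]
    return left, right
```

For example, `list(find_gaps("ACGTNNNNACGTnnAC"))` returns `[(4, 8), (12, 14)]`, and `flanks("ACGTNNNNACGTnnAC", 4, 8, 2)` returns `("GT", "AC")`.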

PGA Assembly Benchmarking

This folder contains a Python script and data files that were used to compare the PGA scaffolding results and to summarize output.

PGA Assembly Correction

This folder contains Perl scripts that were used to interrogate PGA scaffold order against a known genetic map. These scripts require the same Perl modules as listed for the gap_identification_scripts folder.
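One simple way to think about checking scaffold order against a genetic map is sketched below in Python. It assumes each marker has already been placed on a scaffold and has a known map position, and it flags adjacent markers whose map order runs against the scaffold's dominant orientation. The function `order_conflicts` and its input layout are hypothetical, and the Perl scripts in this folder may use a different method.

```python
from typing import List, Tuple


def order_conflicts(map_positions: List[float]) -> List[Tuple[int, int]]:
    """Flag adjacent marker pairs that disagree with the genetic map.

    map_positions holds the genetic map position of each marker, listed in
    the order the markers appear along one scaffold. The scaffold may be in
    either orientation, so the majority direction is taken as correct.
    Returns index pairs (i, i + 1) that run against that direction.
    """
    steps = [b - a for a, b in zip(map_positions, map_positions[1:])]
    increasing = sum(1 for step in steps if step > 0)
    decreasing = sum(1 for step in steps if step < 0)
    sign = 1 if increasing >= decreasing else -1
    return [(i, i + 1) for i, step in enumerate(steps) if step * sign < 0]
```

For example, `order_conflicts([0.0, 1.5, 3.0, 2.0, 4.0])` returns `[(2, 3)]`, pointing at the pair of markers whose map positions step backwards on an otherwise forward-oriented scaffold.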

Languages

Perl 45.8%, Python 42.9%, Shell 11.3%