Accessory scripts designed to scaffold, analyze, and validate eukaryotic genome assemblies generated from long reads.
The scripts and data files are split into four main categories, each with its own directory in the repository:
- assembly_frc_benchmarking
- gap_identification_scripts
- pga_assembly_benchmarking
- pga_assembly_correction
Here is a brief description of the contents of each folder:
The assembly_frc_benchmarking folder contains scripts that were used to estimate the quality value (QV), structural variant types, and feature response curves (FRC) for each tested assembly. To run these scripts, you must first install the following software on your PATH:
Further instructions on how to use the shell scripts are provided in the README in that directory.
The gap_identification_scripts folder contains Perl scripts used to identify gap regions, cross-align gap-flanking sequences to other assemblies, and estimate the number of filled gaps within an assembly. These scripts require the following Perl modules:
- Mouse
- Mouse::NativeTraits
- namespace::autoclean
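
Before running the Perl scripts, it can help to confirm that the modules listed above are loadable. The following is a minimal sketch, not part of the repository: the function names `perl_check_command` and `missing_modules` are hypothetical, and it assumes `perl` is on your PATH. It asks Perl to load each module with `-M` and reports those that fail.

```python
import subprocess

# Module names copied from the list above.
REQUIRED_MODULES = ["Mouse", "Mouse::NativeTraits", "namespace::autoclean"]


def perl_check_command(module):
    """Build a command that exits with status 0 only if Perl can load `module`."""
    return ["perl", f"-M{module}", "-e", "1"]


def missing_modules(modules, run=subprocess.run):
    """Return the modules that Perl fails to load.

    `run` is injectable so the check can be exercised without Perl installed.
    """
    missing = []
    for module in modules:
        result = run(perl_check_command(module), capture_output=True)
        if result.returncode != 0:
            missing.append(module)
    return missing
```

Calling `missing_modules(REQUIRED_MODULES)` returns an empty list when everything is installed. If `perl` itself is not on your PATH, `subprocess.run` raises `FileNotFoundError`.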
To print a usage statement, run a script without arguments. Examples of script usage are provided in the manuscript supplementary notes.
The pga_assembly_benchmarking folder contains a Python script and data files that were used to compare the PGA scaffolding results and to summarize output.
The pga_assembly_correction folder contains Perl scripts that were used to interrogate PGA scaffold order against a known genetic map. These scripts require the same Perl modules as listed for the gap_identification_scripts folder.
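
To illustrate the general idea of checking scaffold order against a genetic map, here is a small sketch. It is not the method implemented by the Perl scripts in this folder; the function name `adjacent_order_agreement` and the input layout (pairs of scaffold position and map position for markers on one scaffold) are assumptions made for this example. It sorts markers by scaffold position and reports the fraction of adjacent marker pairs whose map positions do not decrease.

```python
def adjacent_order_agreement(markers):
    """Fraction of adjacent marker pairs whose genetic map order matches scaffold order.

    `markers` is a list of (scaffold_position, map_position) tuples for a
    single scaffold (hypothetical layout). Returns None when fewer than two
    markers are given, because no pair can be compared.
    """
    # Walk the markers in the order they appear on the scaffold.
    ordered = sorted(markers, key=lambda m: m[0])
    pairs = list(zip(ordered, ordered[1:]))
    if not pairs:
        return None
    # A pair agrees when the map position does not decrease along the scaffold.
    agree = sum(1 for left, right in pairs if right[1] >= left[1])
    return agree / len(pairs)


example = [(100, 0.0), (200, 1.5), (300, 1.0), (400, 3.2)]
score = adjacent_order_agreement(example)
```

A value near 1 suggests the scaffold follows the map order. A value near 0 may simply mean the scaffold is in reverse orientation relative to the map, which this simple measure does not distinguish from misordering.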