levinas / francisella_project

Francisella sequence comparison

Francisella Sequence Comparison

Define regions of difference (RDs or SNPs) by comparing genomes of six isolates to a published F. tularensis Schu S4 genomic sequence.

Reference

Francisella tularensis Schu S4: GenBank AJ749949.2

Six isolates

Summary

You can find the table of regions of difference in RD.html.

You can see the complete list of 17 SNPs and 4498 INDELs in RD_full.html.

You can find a detailed report in report.md.

Phylogenetic tree

  • NJ tree based on GSNP counts (SNPs flanked by 20 identical base pairs)

  • NJ tree based on 4515 aligned variant bases
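The GSNP definition above can be made concrete with a small sketch. It assumes the flank requirement means 20 identical, gap-free columns on each side of the SNP in a pairwise alignment; the README does not say whether the 20 bp applies per side or in total, so this is one reading, not the project's own script. The function names and toy sequences are hypothetical.

```python
from itertools import combinations


def gsnp_positions(seq_a, seq_b, flank=20):
    """Return 0-based alignment columns where the two aligned sequences
    differ by a substitution and the `flank` columns on each side are
    identical and gap-free (assumed reading of the GSNP definition)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    hits = []
    for i in range(flank, len(seq_a) - flank):
        a, b = seq_a[i], seq_b[i]
        if a == b or "-" in (a, b):
            continue  # identical column or indel, not a substitution
        left_same = seq_a[i - flank:i] == seq_b[i - flank:i]
        right_same = seq_a[i + 1:i + 1 + flank] == seq_b[i + 1:i + 1 + flank]
        # Flanks are equal at this point, so checking one sequence for gaps suffices.
        gap_free = "-" not in seq_a[i - flank:i + flank + 1]
        if left_same and right_same and gap_free:
            hits.append(i)
    return hits


def gsnp_count_matrix(aligned, flank=20):
    """Pairwise GSNP counts for a dict of name -> aligned sequence.
    Such counts could serve as distances for a neighbor-joining tree."""
    counts = {}
    for x, y in combinations(sorted(aligned), 2):
        counts[(x, y)] = len(gsnp_positions(aligned[x], aligned[y], flank))
    return counts


# Toy alignment (made-up sequences) with a short flank so the example stays small.
toy = {
    "ref": "ACGTACGTAC",
    "iso1": "ACGTTCGTAC",
    "iso2": "ACGTTCCTAC",
}
toy_counts = gsnp_count_matrix(toy, flank=3)
```

In the toy data, `ref` and `iso2` differ at two nearby columns, so neither substitution has clean flanks and the pair contributes no GSNPs, while each of the other two pairs contributes one.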

SNPs

| Isolate name | SNPs (NCBI assembly) | SNPs (reads: Pond lib) | SNPs (reads: Solexa lib) | Ref bases with no read coverage |
|---|---|---|---|---|
| FSC043 | 1 | 0/0 | 0/0 | 0 |
| FTS-634 | 8 | 6/6 | 6/6 | 0 |
| NR-10492 | 1 | 0/0 | 0/1 | 0 |
| NR-28534 | 2 | 0/0 | 0/2 | 0 |
| NR-643 | 8 | 7/7 | 10/11 | 0 |
| SL | 10 | 0/0 | 0/1 | 298 (0.016%) |

Each isolate is associated with two paired-end libraries. SNP counts estimated by read mapping are shown as # filtered SNPs / # raw SNPs. Filtered SNPs have a minimum variant fraction of 0.3.
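As an illustration of the 0.3 variant-fraction cutoff, here is a minimal sketch that filters VCF records by the fraction of reads supporting the alternate allele. It assumes the fraction is derived from the `DP4` INFO field (ref-forward, ref-reverse, alt-forward, alt-reverse read counts) that Samtools/BCFtools 0.1.x writes; the README does not state how the fraction was actually computed. The function names and example records are hypothetical.

```python
def variant_fraction(info):
    """Fraction of DP4 reads supporting the alternate allele.
    Assumes INFO carries DP4=ref_fwd,ref_rev,alt_fwd,alt_rev."""
    fields = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
    ref_fwd, ref_rev, alt_fwd, alt_rev = map(int, fields["DP4"].split(","))
    total = ref_fwd + ref_rev + alt_fwd + alt_rev
    return (alt_fwd + alt_rev) / total if total else 0.0


def filter_snps(vcf_lines, min_fraction=0.3):
    """Keep SNP records whose variant fraction is at least min_fraction.
    Returns (chrom, pos, ref, alt) tuples; header and INDEL records are skipped."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        info = cols[7]
        if "INDEL" in info.split(";"):
            continue  # INDEL flag marks non-SNP records
        if variant_fraction(info) >= min_fraction:
            kept.append((cols[0], int(cols[1]), cols[3], cols[4]))
    return kept


# Made-up records for illustration only; positions and counts are not from the project.
raw_records = [
    "##fileformat=VCFv4.1",
    "AJ749949.2\t100\t.\tA\tG\t50\t.\tDP=20;DP4=2,2,8,8;MQ=60",
    "AJ749949.2\t200\t.\tC\tT\t10\t.\tDP=20;DP4=9,8,2,1;MQ=60",
    "AJ749949.2\t300\t.\tGA\tG\t30\t.\tINDEL;DP=20;DP4=1,1,9,9;MQ=60",
]
filtered = filter_snps(raw_records)
```

With these toy records, the raw SNP count is 2 and the filtered count is 1, which would be reported as 1/2 in the table's notation.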

Analysis

  • vcf/: Raw SNPs estimated by read mapping (BWA-mem v0.7.7, Samtools v0.1.19)
  • snps/: Annotated filtered SNPs
  • bbhs/: Protein-protein bidirectional best hits and unique proteins (reference vs de novo assemblies, both annotated using RAST)
  • mummer/: DNA-level difference computed using Mummer (reference vs de novo assemblies)
  • uncov/: Base positions in the reference contigs with no read coverage (empty if none)
  • assembly/: De novo assemblies computed using Assembly-RAST (with --recipe "rast")
  • rast2/: RAST2 annotations of de novo assembled contigs
  • olive-mummer/: SNPs estimated by comparing reference and NCBI scaffolds downloaded from the Olive data portal of Broad Institute.
  • mugsy/: Whole-genome DNA alignment of all NCBI scaffolds and reference using Mugsy
  • trees/: Phylogenetic trees of the seven isolates
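
The bidirectional best hit idea behind bbhs/ can be sketched as follows: two proteins form a pair when each is the other's top-scoring hit, and proteins left without a partner are reported as unique. This is a generic illustration that assumes hits are available as (query, subject, score) tuples; it is not the project's Perl code, and the protein IDs and scores are invented.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.
    `hits` is an iterable of (query, subject, score) tuples."""
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


def unique_proteins(all_ids, bbh_pairs, side):
    """Proteins with no bidirectional best hit.
    `side` is 0 for the first genome in each pair, 1 for the second."""
    paired = {pair[side] for pair in bbh_pairs}
    return sorted(set(all_ids) - paired)


# Hypothetical hit tables: reference proteins vs proteins from one de novo assembly.
ref_vs_iso = [
    ("refA", "isoX", 500.0),
    ("refA", "isoY", 300.0),
    ("refB", "isoY", 200.0),
    ("refC", "isoY", 250.0),
]
iso_vs_ref = [
    ("isoX", "refA", 500.0),
    ("isoY", "refC", 250.0),
    ("isoY", "refB", 200.0),
]
pairs = bidirectional_best_hits(ref_vs_iso, iso_vs_ref)
ref_only = unique_proteins(["refA", "refB", "refC"], pairs, side=0)
```

Here `refB` hits `isoY`, but `isoY` prefers `refC`, so `refB` ends up in the unique list for the reference side.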

Languages

Perl 57.5%, Gnuplot 39.9%, Shell 2.6%