cogerk / GenomicAnalysis

General repo for genomic analysis of nitrifying organisms

GenomicAnalysis

This project improves the reproducibility and speed of analyzing 16S rRNA sequencing results from complex microbial communities in biofilms, for the Winkler Lab at the University of Washington.

Pipeline.ipynb is a Jupyter Notebook that takes the user's raw sequencing results and produces a taxonomic table. It uses USEARCH for data manipulation and sequence binning, and the Ribosomal Database Project (RDP) database for taxonomic assignment. Users should understand how their sequencing data is processed before they publish it, so the notebook walks step by step through how the data is manipulated and processed to yield the final table.
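As a rough illustration of what such a pipeline can look like, the sketch below assembles a typical sequence of USEARCH commands (quality filtering, dereplication, OTU clustering, OTU table construction, and SINTAX classification against rdp_16s_v16.udb). The specific steps, the `-fastq_maxee 1.0` and `-sintax_cutoff 0.8` thresholds, the output file names, and the function name `build_usearch_commands` are assumptions chosen for illustration; they are not taken from Pipeline.ipynb, which remains the authoritative description of the workflow. The commands are only built and printed, not executed.

```python
from pathlib import Path


def build_usearch_commands(reads_fastq, db="rdp_16s_v16.udb",
                           workdir="out", usearch="usearch"):
    """Return a list of USEARCH command lines (as argument lists).

    Hypothetical helper: the step order and thresholds are assumptions,
    not the commands used by Pipeline.ipynb.
    """
    out = Path(workdir)
    filtered = (out / "filtered.fa").as_posix()
    uniques = (out / "uniques.fa").as_posix()
    otus = (out / "otus.fa").as_posix()
    otutab = (out / "otutab.txt").as_posix()
    taxonomy = (out / "sintax.txt").as_posix()

    return [
        # 1. Discard reads with more than 1.0 expected errors (assumed threshold).
        [usearch, "-fastq_filter", reads_fastq,
         "-fastq_maxee", "1.0", "-fastaout", filtered],
        # 2. Collapse identical sequences and record their abundance.
        [usearch, "-fastx_uniques", filtered,
         "-sizeout", "-fastaout", uniques],
        # 3. Bin sequences into OTUs.
        [usearch, "-cluster_otus", uniques,
         "-otus", otus, "-relabel", "Otu"],
        # 4. Map the reads back to the OTUs to build a count table.
        [usearch, "-otutab", reads_fastq,
         "-otus", otus, "-otutabout", otutab],
        # 5. Assign taxonomy with SINTAX against the RDP database file.
        [usearch, "-sintax", otus, "-db", db,
         "-tabbedout", taxonomy, "-strand", "both",
         "-sintax_cutoff", "0.8"],
    ]


commands = build_usearch_commands("reads.fastq")
for cmd in commands:
    print(" ".join(cmd))
```

Keeping the commands as argument lists makes it straightforward to pass each one to `subprocess.run` later, once the paths and thresholds have been checked against the notebook.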

Data Analysis.ipynb is a Jupyter Notebook that takes the output from Pipeline.ipynb and uses pandas to clean it for easy visualization with R, matplotlib, or even Excel.
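A minimal sketch of this kind of cleaning step is shown below. The table layout (an `#OTU ID` column, one count column per sample, and a `genus` column), the sample names, the counts, and the function name `tidy_relative_abundance` are all made up for illustration; the actual output of Pipeline.ipynb may be structured differently.

```python
import io

import pandas as pd

# Hypothetical OTU table: one row per OTU, one count column per sample,
# plus a genus label. The values are invented for this example.
RAW = (
    "#OTU ID\tReactorA\tReactorB\tgenus\n"
    "Otu1\t120\t30\tNitrosomonas\n"
    "Otu2\t60\t0\tNitrospira\n"
    "Otu3\t20\t70\tNitrosomonas\n"
    "Otu4\t0\t0\tunclassified\n"
)


def tidy_relative_abundance(table):
    """Aggregate counts by genus and return percentages in long format."""
    sample_cols = [c for c in table.columns if c not in ("#OTU ID", "genus")]
    # Sum the read counts of all OTUs that share a genus.
    by_genus = table.groupby("genus")[sample_cols].sum()
    # Drop genera with no reads in any sample.
    by_genus = by_genus[by_genus.sum(axis=1) > 0]
    # Convert counts to percent of reads within each sample.
    percent = by_genus / by_genus.sum(axis=0) * 100
    # One row per (genus, sample) pair, a shape that plotting tools handle well.
    return percent.reset_index().melt(
        id_vars="genus", var_name="sample", value_name="percent"
    )


table = pd.read_csv(io.StringIO(RAW), sep="\t")
tidy = tidy_relative_abundance(table)
print(tidy)
```

The long format can be written out with `tidy.to_csv(...)` and opened directly in R or Excel, or plotted with matplotlib.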

_split_demux_fastq.py is a supporting script required for the analysis performed in Pipeline.ipynb.

rdp_16s_v16.udb is the database file associated with the Ribosomal Database Project: Wang, Q., G. M. Garrity, J. M. Tiedje, and J. R. Cole. 2007. Naïve Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy. Appl Environ Microbiol 73(16):5261-5267. doi: 10.1128/AEM.00062-07 [PMID: 17586664]

Note that this pipeline was designed using sequencing results from MR DNA. It will need to be modified if the raw sequencing data format differs from the MR DNA standard deliverable.


Languages

Jupyter Notebook 97.9%, Python 2.1%