lvn3668's repositories
fastqcparser
Python code to compute adapter content in reads, k-mer content, per-base GC content (at a specific position in a read alignment against the reference genome), per-base NC content (at a specific position in a read alignment against the reference genome), per-base sequence quality (across aligned reads), per-base sequence content, per-base quality scores, and per-tile sequence quality
rebase-using-kmp
Restriction enzyme cleavage site identifier using the KMP (Knuth-Morris-Pratt) algorithm and REBASE
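As a rough illustration of the KMP search this description names, here is a minimal Python sketch. It is not taken from the repository: the function names are hypothetical, only exact (non-degenerate) recognition sites are handled, and reading sites out of a REBASE file is left out. The EcoRI site GAATTC is used purely as an example.

```python
def build_failure_table(pattern):
    """Length of the longest proper prefix that is also a suffix, per position."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_find_all(text, pattern):
    """Return every 0-based start index of pattern in text, overlaps included."""
    if not pattern:
        return []
    table = build_failure_table(pattern)
    hits = []
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = table[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = table[k - 1]  # keep scanning for overlapping matches
    return hits


# Hypothetical usage: locate EcoRI recognition sites in a short sequence.
sites = kmp_find_all("AAGAATTCGGAATTC", "GAATTC")
```

Sites with IUPAC ambiguity codes would need a different comparison step than the plain equality test used here.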
BaselineSurveyResponseSimulator
Simulation of baseline survey response variables to store information in EPIC. Number of variables: 495. Code: Java 8. Reads in a CSV file of variables and the value range of permitted responses. Uses randomization to a) induce errors and b) generate valid responses, and simulates 1M responses for use as prototype data to populate i2b2 deployments in support of the MVP project for VABHS.
caBIO-load-scripts
ETL scripts to populate caBIO database
caMODloadscripts
ETL scripts to populate caMOD database with MTB data from Jackson Labs
DNAExtractionModule
Code that polls Beckman Coulter and stores rack information and the protocols initiated (if any) on samples.
DNAtoProteintranslation
Converts DNA to protein along 1, 6 (forward or reverse strand), or user-defined reading frames
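A minimal Python sketch of frame-wise translation, as described above, might look like the following. It assumes the standard genetic code and uses hypothetical function names; the repository's own language, interface, and codon table are not shown in this listing.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq, frame=1):
    """Translate one frame: 1..3 on the forward strand, -1..-3 on the reverse."""
    if frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError("frame must be one of 1, 2, 3, -1, -2, -3")
    strand = seq.upper() if frame > 0 else reverse_complement(seq)
    start = abs(frame) - 1
    protein = []
    for i in range(start, len(strand) - 2, 3):
        # Codons containing anything other than A, C, G, T become 'X'.
        protein.append(CODON_TABLE.get(strand[i:i + 3], "X"))
    return "".join(protein)


def six_frame(seq):
    """Translate all six frames, keyed by frame number."""
    return {f: translate(seq, f) for f in (1, 2, 3, -1, -2, -3)}
```

Stop codons are emitted as `*` and translation continues past them; trimming at the first stop would be a separate design choice.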
findNmerfrequencies
C++ code to calculate n-mer frequencies (n = 1 to 6) and write them out to a file
findPalindromesandInvertedRepeats
Finds palindromes and inverted repeats in DNA sequences based on user-defined inputs
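To make the two terms concrete: a DNA palindrome is commonly defined as a stretch that equals its own reverse complement, and an inverted repeat as an arm followed downstream by its reverse complement, possibly after a spacer. The Python sketch below follows those definitions; the function names and parameters (length range, arm length, gap range) are assumptions standing in for the "user-defined inputs", not the repository's actual options.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_palindromes(seq, min_len=4, max_len=12):
    """Return (start, subsequence) for every window equal to its reverse complement."""
    seq = seq.upper()
    hits = []
    for length in range(min_len, max_len + 1):
        for start in range(len(seq) - length + 1):
            window = seq[start:start + length]
            if window == reverse_complement(window):
                hits.append((start, window))
    return hits


def find_inverted_repeats(seq, arm_len, min_gap=1, max_gap=10):
    """Return (left_start, right_start, left_arm) for arms separated by a spacer."""
    seq = seq.upper()
    hits = []
    for left in range(len(seq) - arm_len + 1):
        arm = seq[left:left + arm_len]
        target = reverse_complement(arm)
        for gap in range(min_gap, max_gap + 1):
            right = left + arm_len + gap
            if seq[right:right + arm_len] == target:
                hits.append((left, right, arm))
    return hits
```

Both functions scan every window, which is quadratic in the worst case but adequate for short sequences.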
gatkparser
Python package to parse GATK output and extract summary statistics at mbq 0, 10, 20, 30 and variant evaluation metrics
InterferenceEstimation
Java-based implementation of an MLE method using a chi-square test to calculate interference during meiotic crossover (the number of double-strand DNA breaks that don't result in a crossover)
LaminarFlowHoodModule
Module for tracking tubes and aliquots and assigning storage in the freezers; part of the MVP specimen processing system for VABHS. Prototype.
Microarray
Microarray data analysis using R / Bioconductor
naivevariantcaller_ECGR_variantdetection
Python code to detect ECGR mutations; takes a reference genome and a set of reads as input and finds mutations (1-3 bp in length) supported by more than 5 reads
Phycastats
16S rRNA microbial profiling: R scripts to find the most significant OTUs in 16S rRNA data after data normalization, followed by ordination, clustering, and plotting with iTOL
pileupnotationvariantcaller
Variant caller from pileup notation / samtools alignment
probeDesign
Takes as input an FNA file, a PTT file, the desired probe length, the cross-reactivity allowed, and the overhang
RShinyEntrezViewer
Application to view Entrez data (distribution of Hs / Mm genes per chromosome) using RShiny and MongoDB
samtoolsparsers
Parses samtools output and extracts flagstat results, such as the number of passing and failing reads, the number of properly aligned reads, etc.
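A flagstat parser along these lines can be sketched in a few lines of Python. The sketch assumes the plain-text `samtools flagstat` layout, where each line starts with `<passed> + <failed>` followed by a category label; the function name and the sample counts are made up for illustration and do not come from the repository.

```python
import re

# "<QC-passed count> + <QC-failed count> <category text>"
LINE_RE = re.compile(r"^(\d+) \+ (\d+) (.+)$")
# Trailing "(...%...)" or "(QC-passed ...)" notes are dropped from the label.
NOTE_RE = re.compile(r"\s*\((?:[^()]*%[^()]*|QC-passed[^()]*)\)$")


def parse_flagstat(text):
    """Map each category label to a (qc_passed, qc_failed) pair of read counts."""
    stats = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip blank or unrecognized lines
        passed, failed, rest = match.groups()
        stats[NOTE_RE.sub("", rest)] = (int(passed), int(failed))
    return stats


# Illustrative input only; the numbers are invented.
SAMPLE = """\
1000 + 20 in total (QC-passed reads + QC-failed reads)
950 + 10 mapped (95.00% : 50.00%)
900 + 0 properly paired (94.74% : N/A)
"""
stats = parse_flagstat(SAMPLE)
```

Looking up `stats["properly paired"]` then gives the passed and failed counts for that category as a tuple.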
UMBICARB
1. Partial scripts to process sequence clusters from 16000 microbial genomes to find orthologous protein clusters, using the most representative sequence per cluster. 2. Find the fold distribution across protein hits from SCOP and ASTRAL. 3. Find the most significant structural hits and perform structure alignment. 4. Eliminate LGT in sequence clusters and realign the phylogenetic tree for each of the pruned sets of sequence clusters (pruned on the basis of number of sequences, the most representative sequence not being an LGT, and age in the reference phylogenetic tree). 5. Correlate gaps in the sequence alignment with gaps in the sequence representation of the structure alignment to test the hypothesis that indels cause fold evolution.
variantAnnotation
Variant annotation of vcf file using exac and vep
KmerCounter
The k-mer counter is written in Go v1.16.5. To install Go on Windows, follow the instructions at https://golang.org/doc/install. It is a Go implementation of an n-mer counter for DNA sequences that tests the validity of its input. It reads in the file name of a FASTA file and the k-mer length for which counts are desired, and writes to a file the counts of all overlapping k-mers of size 1 through the specified length. It checks whether the FASTA file is empty and whether the k-mer length is specified.
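The counting logic described above can be sketched as follows. The repository itself is written in Go; this is a Python illustration with hypothetical function names, and it reads FASTA lines from any iterable rather than a named file so that it stays self-contained. Concatenating all records into one sequence is an assumption of the sketch.

```python
from collections import Counter


def read_fasta(lines):
    """Join all sequence lines, skipping '>' headers and blank lines."""
    parts = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith(">"):
            parts.append(line.upper())
    return "".join(parts)


def count_kmers(seq, max_k):
    """Count overlapping k-mers for every k from 1 through max_k."""
    if not seq:
        raise ValueError("FASTA input contains no sequence")
    if not isinstance(max_k, int) or max_k < 1:
        raise ValueError("k-mer length must be a positive integer")
    counts = {}
    for k in range(1, max_k + 1):
        counts[k] = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return counts


# Hypothetical usage on an in-memory FASTA record.
sequence = read_fasta([">example", "acg", "TAC"])
counts = count_kmers(sequence, 2)
```

Writing the result to a file would be a loop over `counts[k].items()` for each `k`; it is omitted here to keep the sketch short.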