Gaurav Sablok's repositories
expression_deep_neural_network
A deep neural network, expression-based classifier, demonstrated to fit an unbalanced dataset of expression data across samples.
plant_bacterial_computational_KO_phylogenemoics
This repository contains the Python, R, and Java code that I wrote, using mathematical expressions, for genome-based ontology annotation and phylogenomic informativeness.
plant_microarray_analysis
Analyzes already normalized microarray expression profiles, performs batch analysis, and produces volcano plots and differential expression results.
plant_resistance_gene_fetcher
I coded this custom function to fetch the DNA and protein sequences from the plant resistance gene database. It returns the corresponding dna_sequence and protein_sequence.
tairaccession
A Python package for working with TAIR and Phytozome, including conversions and an annotation and coordinates checker.
awk_shell_file_directory_size_plotter_awk
An awk-based sort and index approach to plotting the sizes of files or directories across Docker containers. Integrate it into your ~/.bashrc or ~/.zshrc, or run it as a cron job, to manage disk space across containers.
bacterial_insertion_crispr_site_checker
A faster string-search implementation that checks whether insertion elements and CRISPR sites are present in the genome string, then clips those sites to give a clean genome.
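The search-then-clip idea described above can be sketched in a few lines of Python. This is only an illustration, not the repository's implementation: the function names `find_sites` and `clip_elements`, the toy genome, and the element sequence are all hypothetical.

```python
def find_sites(genome: str, element: str) -> list[int]:
    """Return the 0-based start of every match, including overlapping ones."""
    positions = []
    start = genome.find(element)
    while start != -1:
        positions.append(start)
        start = genome.find(element, start + 1)
    return positions


def clip_elements(genome: str, elements: list[str]) -> str:
    """Remove every occurrence of each element, longest element first."""
    for element in sorted(elements, key=len, reverse=True):
        genome = genome.replace(element, "")
    return genome


# Hypothetical toy data, for illustration only.
genome = "ATGCCGGATTACAGGCTTGATTACAAA"
insertion_elements = ["GATTACA"]

sites = find_sites(genome, insertion_elements[0])
clean_genome = clip_elements(genome, insertion_elements)
print(sites, clean_genome)
```

A real genome would also need the reverse complement strand checked, which this sketch leaves out.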
bacterial_tolerance_rate_support_vector_machine
A support vector machine that predicts tolerance rates in bacterial infections. It uses eps-regression, although C-type classification can be applied if you want to predict the time variable.
diff_alternative_data_structure_R
I read a post today that mentioned diff, which I have used a lot in R. I am putting this on git just to show that you can also do this from a data structure point of view.
gene_annotation_count_arguably
I used arguably to implement a function that counts genome annotations for microbiomes and for other genomes. It takes a genome annotation or a text file and prepares the counts, which can also be used for gene ontology analysis.
genome_annotation_clean
A parallel genome annotation cleaner for cluster computing. It takes a genome annotation file, cleans the annotations, and prepares them for machine learning.
genotyping_platform_prepare
This repository contains a custom function for preparing files for genotyping or sequencing. You specify the path and the FASTA files and mark them according to the desired genotyping or sequencing condition.
gitMaker
A Ruby class that handles the git tasks: initializing, committing, pushing, generating git tokens, and committing to specific branches.
linear_regression_bounded_memory_linear_regression
Fits a linear regression on the height and bolting time of lettuce phenotypes to see whether a linear relationship can be established.
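As a minimal sketch of what such a fit involves, the snippet below computes an ordinary least squares line and its R² in plain Python. The helper `fit_line` and the five height and bolting-time values are made up for illustration and are not data from the repository.

```python
def fit_line(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Ordinary least squares for y = slope * x + intercept, plus R squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical phenotype values, for illustration only.
height_cm = [10.0, 12.0, 14.0, 16.0, 18.0]
bolting_days = [50.0, 54.0, 59.0, 62.0, 65.0]

slope, intercept, r_squared = fit_line(height_cm, bolting_days)
print(f"bolting_days = {slope:.2f} * height_cm + {intercept:.2f} (R^2 = {r_squared:.3f})")
```

An R² close to 1 would suggest that a straight line describes the relationship well. A low value would suggest that it does not.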
metagenomics_abundance_normalize
A metagenomics abundance normalizer that takes an OTU abundance file and gives you normalized ratios for plotting the species.
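One common way to normalize OTU counts is to convert them to relative abundances within each sample, sketched below. This assumes total-sum scaling, which the description does not confirm. The function name `normalize_abundance` and the counts are hypothetical.

```python
def normalize_abundance(counts: dict[str, dict[str, int]]) -> dict[str, dict[str, float]]:
    """Scale each sample's OTU counts so that they sum to 1 (total-sum scaling)."""
    normalized = {}
    for sample, otus in counts.items():
        total = sum(otus.values())
        normalized[sample] = {
            otu: (count / total if total else 0.0) for otu, count in otus.items()
        }
    return normalized


# Hypothetical OTU table: sample -> OTU -> raw count.
otu_counts = {
    "sample_1": {"OTU_1": 30, "OTU_2": 70},
    "sample_2": {"OTU_1": 5, "OTU_2": 15},
}

ratios = normalize_abundance(otu_counts)
print(ratios)
```

Because every sample then sums to 1, samples with different sequencing depths can be plotted on the same axis.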
MiSeq-NextSeq-NovaSeq_genome_shell_assembler
A pure shell assembler that takes only a directory path and handles everything from start to finish: read cleaning, mapping, remapping, and assembly. It works with MiSeq, NextSeq, and NovaSeq.
pangenomeMetagenomicsNormalizer
A pangenome metagenomics normalizer. Given a gene ontology presence/absence file and a species file, it first summarizes the counts across the species, then takes the counts of the gene ontologies and presents a ratio. The higher the ratio, the more that ontology is present across the species.
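One possible reading of that ratio is the fraction of species that carry each ontology term, sketched below. This is an assumption about the calculation, not the repository's code. The name `ontology_ratios` and the presence data are hypothetical.

```python
def ontology_ratios(presence: dict[str, set[str]]) -> dict[str, float]:
    """For each GO term, return (species carrying the term) / (all species)."""
    n_species = len(presence)
    counts: dict[str, int] = {}
    for terms in presence.values():
        for term in terms:
            counts[term] = counts.get(term, 0) + 1
    return {term: count / n_species for term, count in counts.items()}


# Hypothetical presence/absence data: species -> GO terms present.
presence = {
    "species_a": {"GO:0006952", "GO:0008152"},
    "species_b": {"GO:0006952"},
    "species_c": {"GO:0006952", "GO:0008152"},
    "species_d": {"GO:0016020"},
}

ratios = ontology_ratios(presence)
for term, ratio in sorted(ratios.items(), key=lambda item: -item[1]):
    print(term, ratio)
```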
pbs_backup_simulator
This repository contains the code for the PBS backup simulator. You can run your code within the simulator to avoid breakage and loss of system configurations.
pbs_configure_python_function
This repository contains two custom functions that prepare PBS files for cluster computing. Call a function, answer its prompts for the parameters, and it outputs a complete PBS file that you can submit to the cluster.
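A rough sketch of such a generator is shown below. It takes the parameters as arguments instead of prompting for them. The function name `make_pbs_script`, the queue name, and the job command are hypothetical. The `nodes=…:ppn=…` resource syntax is the Torque style, and other PBS variants may expect a different form.

```python
def make_pbs_script(
    job_name: str,
    queue: str,
    nodes: int,
    ppn: int,
    walltime: str,
    command: str,
) -> str:
    """Build the text of a PBS job script from a handful of parameters."""
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",                # job name
        f"#PBS -q {queue}",                   # destination queue
        f"#PBS -l nodes={nodes}:ppn={ppn}",   # Torque-style node and core request
        f"#PBS -l walltime={walltime}",       # HH:MM:SS limit
        "cd $PBS_O_WORKDIR",                  # start in the submission directory
        command,
    ]
    return "\n".join(lines) + "\n"


# Hypothetical job parameters, for illustration only.
script = make_pbs_script(
    job_name="assembly",
    queue="batch",
    nodes=1,
    ppn=4,
    walltime="02:00:00",
    command="python run_assembly.py",
)
print(script)
```

Writing the returned text to a file and passing that file to qsub would submit the job.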
pbs_configure_R
This version contains the R code for PBS users, so that they can invoke an R session and submit it to PBS clusters.
plant_resistance_gene_logistic_regressor
An application of logistic regression to plant disease resistance genes. Given a FASTA file, the corresponding expression file, and the motif types that you think are associated with plant disease resistance, it prepares the classification datasets and then fits a logistic regressor for model building.
plant_resistance_gene_miner
I coded this plant resistance gene miner, which combines regular expressions with web scraping. Given a resistance gene ID, it returns the GenBank ID.
ruby_genome_annotation_iterator_large_scale
A genome annotation length calculator written in Ruby. It invokes a shell subprocess within Ruby to run the iterators at a faster rate. If you have dozens of sequenced genomes, simply give the column number and the iterator will hash the lengths. added support for the features as
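The column-based length hashing can be illustrated as follows. The sketch is in Python instead of Ruby and assumes a tab-separated, GFF-like layout with 1-based inclusive coordinates. The function name `annotation_lengths`, its default column numbers, and the two records are hypothetical.

```python
def annotation_lengths(
    lines: list[str],
    start_col: int = 4,
    end_col: int = 5,
    key_col: int = 9,
) -> dict[str, int]:
    """Hash each feature's key column to its length (end - start + 1)."""
    lengths = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.rstrip("\n").split("\t")
        start = int(fields[start_col - 1])  # column numbers are 1-based
        end = int(fields[end_col - 1])
        lengths[fields[key_col - 1]] = end - start + 1
    return lengths


# Hypothetical GFF-like records, for illustration only.
records = [
    "##gff-version 3",
    "chr1\tsrc\tgene\t100\t250\t.\t+\t.\tID=gene1",
    "chr1\tsrc\tgene\t300\t399\t.\t-\t.\tID=gene2",
]

lengths = annotation_lengths(records)
print(lengths)
```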
ruby_on_rails_app_for_genomic_trait_analysis_genome
This repository contains a complete Ruby on Rails application for the development and analysis of genomic traits in sequenced genomes, and shows how it can be deployed for machine learning. It integrates sqlite3 and postgresql as a backend and uses Bootstrap for the custom appearance.
rust_based_docker_containerization_arrays
I applied a Nushell, Rust-based programming approach to Docker containerization and created arrays from it. A fresh way to view Docker containerization.
shell_plotter
A shell plotting function that extends a Ruby framework and plots metagenomics abundances right in your shell, so you can check the abundance distribution.
tair_gff_ids
A set of functions that provide easy access to a cleaned GFF from TAIR. They use a dataframe and data science approach to get the systematic TAIR IDs and their coordinates from the TAIR10 GFF, and can be applied to any TAIR version for systematic retrieval of the TAIR IDs.
transdecoder_trinity_assembly_visualization
A regular expression based encoder for TransDecoder predictions on Trinity assemblies. It parses the predictions and prepares the transcript annotations, with coordinates as tuples, for visualization with any genome visualization kit, such as pygenomeviz, mauve, and others.
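A minimal sketch of the regular expression step, assuming each prediction header ends with a `transcript:start-end(strand)` token. The sample header is modeled on typical TransDecoder output and is not taken from the repository. The names `COORD_PATTERN` and `parse_orf_headers` are hypothetical.

```python
import re

# Matches a trailing "transcript:start-end(strand)" token.
COORD_PATTERN = re.compile(r"(\S+):(\d+)-(\d+)\(([+-])\)\s*$")


def parse_orf_headers(headers: list[str]) -> list[tuple[str, str, int, int, str]]:
    """Turn FASTA headers into (orf_id, transcript, start, end, strand) tuples."""
    records = []
    for header in headers:
        match = COORD_PATTERN.search(header)
        if match is None:
            continue  # header does not carry coordinates in the assumed layout
        orf_id = header.lstrip(">").split()[0]
        transcript, start, end, strand = match.groups()
        records.append((orf_id, transcript, int(start), int(end), strand))
    return records


# Hypothetical header, for illustration only.
headers = [
    ">TRINITY_DN100_c0_g1_i1.p1 GENE.TRINITY_DN100_c0_g1_i1~~TRINITY_DN100_c0_g1_i1.p1"
    "  ORF type:complete len:120 (+),score=25.10 TRINITY_DN100_c0_g1_i1:15-374(+)",
]

records = parse_orf_headers(headers)
print(records)
```

Tuples of this shape are easy to hand to a plotting library as feature coordinates.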
warp_bioinformatics_workflows
Warp bioinformatics workflows, ready for integration into Warp, for launching complete workflows on a computing cluster.
warp_datascience_workflows
A collection of Warp workflows that I have written for direct integration into Warp. Either integrate all of them using the shell or add each workflow independently.