sablokgaurav

Gaurav Sablok's repositories

python_analytics_classes

A collection of Python classes for several purposes, from plotting to data structures, data science, and engineering.

Language:Jupyter NotebookMIT2 10

fungal_metagenomics_ITS_coverage_calculator

I coded this function to estimate the fraction of ITS predictions from fungal metagenomics data. It takes into account the sequence length and the ITS1 and ITS2 start and stop coordinates. Given a keyword argument, it estimates the coverage accordingly.

Language:PythonMIT1 10
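
The calculation described above can be sketched as follows. This is a minimal illustration, not the repository's code: the function name `its_coverage`, the `region` keyword, and the 1-based inclusive coordinate convention are all assumptions, and ITS1 and ITS2 are assumed not to overlap.

```python
def its_coverage(seq_length, its1=None, its2=None, region="both"):
    """Return the fraction of a sequence covered by ITS1, ITS2, or both.

    Coordinates are (start, stop) tuples, assumed 1-based and inclusive.
    The `region` keyword (hypothetical) selects what to count.
    """
    if seq_length <= 0:
        raise ValueError("seq_length must be positive")

    def span(coords):
        # Length of one region; a missing region contributes nothing.
        if coords is None:
            return 0
        start, stop = coords
        if not 1 <= start <= stop <= seq_length:
            raise ValueError(f"invalid coordinates: {coords}")
        return stop - start + 1

    if region == "ITS1":
        covered = span(its1)
    elif region == "ITS2":
        covered = span(its2)
    elif region == "both":
        covered = span(its1) + span(its2)
    else:
        raise ValueError("region must be 'ITS1', 'ITS2' or 'both'")
    return covered / seq_length
```

For a 600 bp sequence with ITS1 at 51 to 200 and ITS2 at 351 to 500, `its_coverage(600, (51, 200), (351, 500))` gives 0.5, and `region="ITS1"` gives 0.25.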

keras_bacterial_class_machine_learning

A Keras implementation of machine learning for bacterial genomes. It takes a FASTA file, the annotation features, and the genes on which you want to train the Keras model.

Language:PythonMIT1 10

long_read_polyATGC_trimmer

A regular-expression-based polyATGC trimmer for long reads or FASTQ reads. It is extremely fast and returns a FASTA file and a dataframe for sequence classification.

Language:PythonMIT1 10
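
A regular-expression trimmer of this kind might look like the sketch below. It is an assumption-laden illustration, not the repository's implementation: the names `trim_homopolymers`, `trim_reads`, and `to_fasta`, the `min_run` threshold, and the choice to remove runs anywhere in the read are hypothetical. Plain dictionaries stand in for the dataframe rows so the example needs only the standard library.

```python
import re


def trim_homopolymers(seq, min_run=10):
    """Remove every run of one base (A, T, G or C) at least min_run long."""
    if min_run < 2:
        raise ValueError("min_run must be at least 2")
    # ([ATGC]) captures one base; \1{n,} requires n or more further copies.
    pattern = re.compile(r"([ATGC])\1{%d,}" % (min_run - 1))
    return pattern.sub("", seq.upper())


def trim_reads(reads, min_run=10):
    """Trim a {read_id: sequence} mapping and return one row per read."""
    rows = []
    for read_id, seq in reads.items():
        trimmed = trim_homopolymers(seq, min_run)
        rows.append({
            "id": read_id,
            "raw_length": len(seq),
            "trimmed_length": len(trimmed),
            "sequence": trimmed,
        })
    return rows


def to_fasta(rows):
    """Format the trimmed rows as FASTA text."""
    return "".join(f">{row['id']}\n{row['sequence']}\n" for row in rows)
```

The rows returned by `trim_reads` can be passed to a dataframe constructor if a tabular view is wanted.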

pacbio_oxford_nanopore_repeat_coverage

A long-read repeat coverage calculator. Given a long-read file before assembly, either direct from the sequencing run or after cleaning, it calculates the total amount of repeat stretches present in the sequencing reads, which you can plot before assembly.

Language:PythonMIT1 10
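
One way to estimate repeat content in raw reads is sketched below. The description does not say how a repeat stretch is defined, so this example assumes tandem repeats of a 1 to 3 base unit with at least five copies; the function names and both thresholds are hypothetical.

```python
import re


def repeat_stretches(seq, max_unit=3, min_copies=5):
    """Return (start, end, unit) for each tandem repeat found in seq.

    Positions are 0-based with an exclusive end. The unit is matched
    lazily, so the shortest repeating unit is reported.
    """
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1)
    )
    return [(m.start(), m.end(), m.group(1))
            for m in pattern.finditer(seq.upper())]


def repeat_coverage(reads, max_unit=3, min_copies=5):
    """Fraction of all read bases that fall inside a repeat stretch."""
    total_bases = sum(len(read) for read in reads)
    if total_bases == 0:
        return 0.0
    repeat_bases = sum(
        end - start
        for read in reads
        for start, end, _ in repeat_stretches(read, max_unit, min_copies)
    )
    return repeat_bases / total_bases
```

The per-read tuples from `repeat_stretches` give the lengths needed for a pre-assembly plot.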

python_algorithms_structures_data_structures

This repository contains the code I have posted on LinkedIn as solutions to LeetCode, Interview Query, and Codewars questions, using a different approach from the ones commonly mentioned.

Language:PythonMIT1 20

plant_long_read_resistance_gene_isolator

I coded this function for comprehensive isolation of plant resistance genes from long-read sequencing. Given PacBio or Oxford Nanopore reads, it assembles them, predicts the plant disease resistance genes, and lets you analyze the mutations in those genes.

Language:ShellMIT010

candida_literature_miner

This prepares the Candida literature for machine learning. Although prepared for Candida, it can be used for any specific term that you want to search in PubMed.

Language:PythonMIT020

candida_ontology_network_analyzer

A faster implementation of a Gene Ontology analyzer for Candida genomes. Given the Candida GO ontology files and a GO term to search, it extracts all the alt_id, relationship_ids, and functions associated with that term, for network analysis and for linking with expression analysis.

Language:PythonMIT010
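
The lookup described above can be illustrated with a small parser. This sketch assumes the ontology file is in OBO flat-file format with stanzas separated by blank lines; the function name `find_go_term` and the returned dictionary layout are hypothetical, and the sample stanza in the usage note uses made-up identifiers.

```python
def find_go_term(obo_text, go_id):
    """Return the [Term] stanza whose id or alt_id equals go_id, or None."""
    for stanza in obo_text.split("\n\n"):
        lines = [line.strip() for line in stanza.strip().splitlines()]
        if not lines or lines[0] != "[Term]":
            continue  # skip the file header and non-term stanzas
        record = {"id": None, "name": None,
                  "alt_id": [], "is_a": [], "relationship": []}
        for line in lines[1:]:
            key, sep, value = line.partition(": ")
            if not sep or key not in record:
                continue  # ignore tags this sketch does not collect
            value = value.split(" ! ")[0].strip()  # drop trailing comment
            if isinstance(record[key], list):
                record[key].append(value)
            else:
                record[key] = value
        if go_id == record["id"] or go_id in record["alt_id"]:
            return record
    return None
```

Given a stanza with `id: GO:9999001`, `alt_id: GO:9999002`, and `relationship: part_of GO:9999003 ! example parent`, searching for either identifier returns the same record, with `"part_of GO:9999003"` in its relationship list.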

devops_ruby_server_profiler

A Ruby function to aid DevOps for sequencing reads. It runs and generates the file paths for the entire sequencing reads cluster.

Language:RubyMIT000

devops_system_util_profiler

A Ruby DevOps system utility to get your system information and perform tasks on clusters and computing storage.

Language:RubyMIT000

evolutionary_fitness_calculation

A data structure approach to generating a random sequence from polyATGC stretches for evolutionary fitness. Given a stretch of homopolymers*2, it generates the sequence that would result under evolutionary fitness if selection pressure had acted without slipped-strand mutation.

Language:RubyMIT000

evolutionary_rate_analyer

An R function for analyzing evolutionary rates from FASTA files. It uses Ka/Ks and dN/dS and plots the evolutionary rates.

Language:RMIT000

evolutionCal

An evolutionary function in Ruby. Given a similarity score (the number of similar and dissimilar bases), the sequence rate, and the divergence rate, it calculates a ratio that indicates how much sequencing depth should be covered.

Language:RubyMIT000

exact_motif_localizer

An exact motif localizer based on a string pattern searching algorithm. It returns a dataframe with the start and stop positions, which you can easily use to extract those positions.

Language:RMIT000
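
The repository itself is in R; the sketch below shows the same idea in Python for illustration only. The function name `locate_motif`, the 1-based inclusive coordinates, and the decision to report overlapping hits are assumptions. A list of dictionaries stands in for the dataframe.

```python
def locate_motif(sequence, motif):
    """Return every exact occurrence of motif as a start/stop record.

    Coordinates are 1-based and inclusive; overlapping hits are reported.
    """
    if not motif:
        raise ValueError("motif must not be empty")
    sequence, motif = sequence.upper(), motif.upper()
    hits = []
    start = sequence.find(motif)
    while start != -1:
        hits.append({"start": start + 1, "stop": start + len(motif)})
        # Resume one base later so overlapping occurrences are found.
        start = sequence.find(motif, start + 1)
    return hits
```

Each record can then be used to slice the sequence, for example `sequence[hit["start"] - 1:hit["stop"]]`.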

long_read_polyATGC_trimmer_recursion

A shorter version of long_read_polyATGC_trimmer, implementing an array recompile and storing the variables while running a subtle part of the recursion.

Language:PythonMIT000

longread_bcftools_filter

Making bcftools filtering easy: bcftools_filter allows faster filtering of variant calls according to allelic depth and tags, using simple overlap approaches as compared to implementing regular patterns.

Language:PythonMIT010
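
Filtering on allelic depth without regular expressions can be sketched with plain string splitting. This is not the repository's code and it does not call bcftools: the function name `filter_by_allelic_depth`, the `min_alt_depth` threshold, and the assumption of a single-sample VCF with an `AD` entry in the FORMAT column are all hypothetical.

```python
def filter_by_allelic_depth(vcf_lines, min_alt_depth=5):
    """Keep header lines and records whose best ALT depth passes the cutoff.

    Assumes tab-separated VCF text with one sample column (column 10)
    and an AD value of the form ref_depth,alt_depth[,alt_depth...].
    """
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)  # headers pass through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns to inspect
        format_keys = fields[8].split(":")
        sample_values = fields[9].split(":")
        if "AD" not in format_keys:
            continue
        ad_index = format_keys.index("AD")
        if ad_index >= len(sample_values):
            continue
        depths = sample_values[ad_index].split(",")
        alt_depths = [int(d) for d in depths[1:] if d != "."]
        if alt_depths and max(alt_depths) >= min_alt_depth:
            kept.append(line)
    return kept
```

Looking up `AD` by its position in the FORMAT column keeps the filter correct when the order of tags differs between files.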

machine_learning_automated_framework_expression_sequences

An automated framework for sequence-to-expression machine learning. It takes the FASTA sequences and the expression file from the expression analysis, then writes a pickle file.

Language:PythonMIT010

miRNA_neural_network

A function to prepare the neural network sequences for miRNA predictions. It uses the target predictions and the transcripts to prepare the targets for the neural networks.

Language:PythonMIT000

neural_network_metagenomics_metatranscriptomics_transcriptomics_hidden_layer

A function to generate the hidden layers from the given FASTA and expression files. It takes the replicate columns and then calculates the expression and length as a hidden layer. It applies to transcriptomics, metatranscriptomics, and other expression datasets.

Language:PythonMIT000

pacbio_nanopore_R_function_to_estimate_the_proportion_of_patterns

An application of the stringr package in R to pattern detection in PacBio and Oxford Nanopore reads.

Language:RMIT010

pacbio_oxford_assembly_binding_ruby_motif_scanner

A FASTA class for making genomic sliding windows. If you have a binding site and want to prepare motifs for it as a sliding window, you can use this class. You can also search the sliding-window motifs starting with a particular tag to see how many of them are generated.

Language:RubyMIT000
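
The repository is written in Ruby; the following Python sketch only illustrates the sliding-window idea. The function names, the `step` parameter, and the interpretation of "tag" as a sequence prefix are assumptions.

```python
def sliding_windows(sequence, size, step=1):
    """Return every window of length `size`, moving `step` bases each time."""
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    return [sequence[i:i + size]
            for i in range(0, len(sequence) - size + 1, step)]


def windows_with_prefix(sequence, size, prefix, step=1):
    """Return only the windows that start with the given prefix (tag)."""
    return [window for window in sliding_windows(sequence, size, step)
            if window.startswith(prefix)]
```

For the sequence `ATGCAT` and a window size of 3, the windows are `ATG`, `TGC`, `GCA`, and `CAT`, and one of them starts with `AT`.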

pdf_BERT_literature_trainining_classification

A Python class for training on biomedical literature. It reads the text from a PDF, tokenizes it, and then uses BERT to train the model.

Language:PythonMIT010

plant_long_read_resistance_gene_variant_estimator

This contains a complete workflow for variant estimation from long reads, using NBS-LRR-specific resistance genes.

Language:ShellMIT010

recurrent_neural_network_sequence_classification

Implements a recurrent neural network (RNN) for sequence classification, trained on plant organelles to identify the subset.

MIT000

ruby_gem_creator

A Ruby class that makes creating your Ruby gem easy. It has two methods: one to make the template and one to write it. You can check the template before writing, then simply write the template for gem creation.

Language:RubyMIT000

scalable_parallel_faster_genomic_transcriptomics_annotations

This repository contains a scalable and faster implementation of genome and transcriptome annotation for large-scale sequencing datasets.

Language:PythonMIT010

sequence_evolutionary_rate_function

Applies a sequence evolutionary function to a multitude of files, estimating it across the site and branch models and estimating the lineage evolutionary pressure.

Language:PythonMIT000

sequencing_barcoding_labelling_generator

A Ruby function to generate DNA sequence barcodes for sequencing labelling. It takes a barcode length and the number of iterations you want to produce.

Language:RubyMIT000
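
The repository is in Ruby; as an illustration in Python, a barcode generator taking a length and a count might look like this. The function name `generate_barcodes`, the optional `seed`, and the choice to return unique barcodes are assumptions not stated in the description.

```python
import random


def generate_barcodes(length, count, seed=None):
    """Return `count` distinct random DNA barcodes of the given length."""
    if length <= 0 or count < 0:
        raise ValueError("length must be positive and count non-negative")
    if count > 4 ** length:
        raise ValueError("not enough distinct barcodes of this length")
    rng = random.Random(seed)  # a seed makes the output reproducible
    barcodes = set()
    while len(barcodes) < count:
        barcodes.add("".join(rng.choice("ACGT") for _ in range(length)))
    return sorted(barcodes)
```

Collecting the barcodes in a set means a duplicate draw is simply retried, so the result always has exactly `count` entries.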

tair_pubmed_connector

There is no function to automatically fetch the PubMed article links reported in TAIR for use with language models, so I coded this function. It takes the TAIR information and a gene or locus tag, fetches the corresponding PubMed entries, and then fetches the corresponding abstracts from PubMed.

Language:PythonMIT010