Gaurav Sablok's repositories
python_analytics_classes
A collection of Python classes for several purposes, from plotting to data structures, data science, and engineering.
fungal_metagenomics_ITS_coverage_calculator
I coded this function to estimate the fraction of ITS predictions from fungal metagenomics data. It takes into account the sequence length and the ITS1 and ITS2 start and stop coordinates. Given a keyword argument, it estimates the coverage accordingly.
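The coverage idea can be sketched as follows. This is a minimal illustration, not the repository's code: the function name `its_coverage`, the `region` keyword, and the 1-based inclusive coordinate convention are all assumptions, and ITS1 and ITS2 are assumed not to overlap.

```python
def its_coverage(seq_length, its1=None, its2=None, region="both"):
    """Return the fraction of a sequence covered by the chosen ITS region(s).

    its1 and its2 are hypothetical (start, stop) tuples, 1-based inclusive.
    region is a hypothetical keyword: "ITS1", "ITS2", or "both".
    """
    if seq_length <= 0:
        raise ValueError("seq_length must be positive")

    def span(coords):
        # Length of one region; a missing region contributes nothing.
        if coords is None:
            return 0
        start, stop = coords
        if not (1 <= start <= stop <= seq_length):
            raise ValueError("coordinates must lie within the sequence")
        return stop - start + 1

    if region == "ITS1":
        covered = span(its1)
    elif region == "ITS2":
        covered = span(its2)
    elif region == "both":
        covered = span(its1) + span(its2)
    else:
        raise ValueError("region must be 'ITS1', 'ITS2', or 'both'")
    return covered / seq_length


# Example: a 1000 bp sequence with two 200 bp ITS regions.
fraction = its_coverage(1000, its1=(101, 300), its2=(501, 700), region="both")
```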
keras_bacterial_class_machine_learning
A Keras implementation of machine learning for bacterial genomes. It takes a FASTA file, the annotation features, and the genes you want to use to train the Keras model.
long_read_polyATGC_trimmer
A regular-expression-based polyATGC trimmer for long reads or FASTQ reads. It is extremely fast and returns a FASTA file and also a dataframe for sequence classification.
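A regular-expression trimmer of this kind might look like the sketch below. It is only an illustration under stated assumptions: the name `trim_homopolymer_ends`, the `min_run` threshold, and the choice to trim runs only at the two ends of a read (in a single pass) are hypothetical, and the FASTA and dataframe outputs are left out.

```python
import re


def trim_homopolymer_ends(seq, min_run=5):
    """Remove one leading and one trailing homopolymer run of A, T, G, or C.

    min_run is a hypothetical threshold: the shortest run that gets trimmed.
    """
    if min_run < 2:
        raise ValueError("min_run must be at least 2")
    # ([ACGT])\1{n-1,} matches the same base repeated min_run or more times.
    leading = re.compile(r"^([ACGT])\1{%d,}" % (min_run - 1))
    trailing = re.compile(r"([ACGT])\1{%d,}$" % (min_run - 1))
    seq = leading.sub("", seq)
    seq = trailing.sub("", seq)
    return seq


trimmed = trim_homopolymer_ends("AAAAAAGCTAGCTTTTTT", min_run=5)
```

A single pass is used here, so a second run exposed by the first trim is left in place; looping until the sequence stops changing would be one way to handle that.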
pacbio_oxford_nanopore_repeat_coverage
A long-read repeat coverage calculator. Given a long-read file before assembly, either direct from the sequencing runs or after cleaning, it calculates the total amount of repeat stretches present in the sequencing reads, which you can plot before assembly.
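As a rough sketch of the calculation, assuming that a "repeat stretch" means a homopolymer run of at least `min_run` bases (the repository may define repeats differently), the total can be counted per read set like this. The function name and parameters are hypothetical.

```python
import re


def repeat_coverage(reads, min_run=5):
    """Return (repeat_bases, fraction) for an iterable of read sequences.

    A repeat is assumed to be a run of one base at least min_run long.
    """
    if min_run < 2:
        raise ValueError("min_run must be at least 2")
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_run - 1))
    total_bases = 0
    repeat_bases = 0
    for read in reads:
        total_bases += len(read)
        # Sum the lengths of all non-overlapping runs found in this read.
        repeat_bases += sum(m.end() - m.start() for m in pattern.finditer(read))
    fraction = repeat_bases / total_bases if total_bases else 0.0
    return repeat_bases, fraction


bases, fraction = repeat_coverage(["AAAAAGCGC", "GCTTTTTTGC"], min_run=5)
```

The per-read run lengths could also be collected into a list instead of summed, which would be the natural input for a plot.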
python_algorithms_structures_data_structures
This repository contains the code I have posted on LinkedIn as solutions to LeetCode, Interview Query, and Codewars questions, using a different approach from the one commonly mentioned elsewhere.
plant_long_read_resistance_gene_isolator
I coded this function to perform comprehensive isolation of plant resistance genes from long-read sequencing. Given PacBio or Oxford Nanopore reads, it assembles them, predicts the plant disease resistance genes, and lets you analyze the mutations in those genes.
candida_literature_miner
This prepares the Candida literature for machine learning. Although prepared for Candida, it can be used for any specific term that you want to search in PubMed.
candida_ontology_network_analyzer
A faster implementation of a gene ontology analyzer for Candida genomes. Given the Candida GO ontology files and a search GO term, it extracts all the alt_id and relationship_ids entries and the functions associated with that gene ontology term, for network analysis and for linking with expression analysis.
devops_ruby_server_profiler
A Ruby function to aid your DevOps for sequencing reads. It runs and generates the paths for the files across the entire sequencing reads cluster.
devops_system_util_profiler
A DevOps Ruby system utility to get your system information and perform tasks on clusters and computing storage.
evolutionary_fitness_calculation
A data structure approach to generate a random sequence from polyATGC stretches for evolutionary fitness. Given a stretch of homopolymers*2, it generates the sequence that would result if selection pressure had acted under evolutionary fitness without slip-strand mutation.
evolutionary_rate_analyer
An R function for the analysis of evolutionary rates from FASTA files. It uses Ka/Ks and dN/dS and plots the evolutionary rates.
evolutionCal
An evolutionary function in Ruby. Given a similarity score (the number of similar and dissimilar bases), the sequence rate, and the divergence rate, it calculates a ratio that indicates how much sequencing depth should be covered.
exact_motif_localizer
An exact motif localizer based on a string pattern searching algorithm. It returns a dataframe with the start and stop positions, which you can easily use to extract those positions.
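Exact motif localization can be sketched with plain substring search. This is an illustration rather than the repository's implementation: `locate_motif` is a hypothetical name, coordinates are assumed to be 1-based inclusive, and the hits are returned as a list of dictionaries instead of a dataframe to keep the example dependency-free.

```python
def locate_motif(sequence, motif):
    """Return every exact occurrence of motif, including overlapping ones.

    Each hit is a dict with 1-based inclusive "start" and "stop" keys.
    """
    if not motif:
        raise ValueError("motif must not be empty")
    hits = []
    start = sequence.find(motif)
    while start != -1:
        hits.append({"start": start + 1, "stop": start + len(motif)})
        # Advance by one base so overlapping occurrences are also found.
        start = sequence.find(motif, start + 1)
    return hits


hits = locate_motif("ATGATGATG", "ATGA")
```

The list of dictionaries can be passed directly to `pandas.DataFrame(hits)` if a dataframe with `start` and `stop` columns is wanted.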
long_read_polyATGC_trimmer_recursion
A shorter version of long_read_polyATGC_trimmer, implemented with an array recompile and storing the variables while running a subtle part of the recursion.
longread_bcftools_filter
Making bcftools filtering easy. bcftools_filter allows faster filtering of variant calls according to the allelic depth and the tags, using simple overlap approaches as compared to implementing regular patterns.
machine_learning_automated_framework_expression_sequences
An automated framework for sequence-to-expression machine learning. It takes the FASTA sequences and the expression file from the expression analysis and then writes a pickle file.
miRNA_neural_network
A function to prepare the neural network sequences for miRNA predictions. It uses the target prediction and the transcript and prepares the targets for the neural networks.
neural_network_metagenomics_metatranscriptomics_transcriptomics_hidden_layer
A function to generate the hidden layers from the given FASTA and expression files. It takes the replicate columns and then calculates the expression and length as a hidden layer. It applies to transcriptomics, metatranscriptomics, and other expression datasets.
pacbio_nanopore_R_function_to_estimate_the_proportion_of_patterns
An application of the stringr package in R to calculate pattern detection for PacBio and Oxford Nanopore reads.
pacbio_oxford_assembly_binding_ruby_motif_scanner
A FASTA class to make genomic sliding windows. If you have a binding site and want to prepare the motifs for the binding site as a sliding window, you can use it for that as well. You can also search the sliding window motifs starting with a particular tag to see how many of them are generated.
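The sliding-window idea can be sketched as below. The repository itself is written in Ruby; this Python version is only an illustration, and the function names, the `step` parameter, and the treatment of the tag as a plain prefix are assumptions.

```python
def sliding_windows(sequence, size, step=1):
    """Return all windows of the given size, moving step bases at a time."""
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    # The range is empty when the sequence is shorter than one window.
    return [sequence[i:i + size]
            for i in range(0, len(sequence) - size + 1, step)]


def windows_with_tag(windows, tag):
    """Keep only the windows that start with the given tag (prefix)."""
    return [w for w in windows if w.startswith(tag)]


windows = sliding_windows("ATGCATGC", size=4)
tagged = windows_with_tag(windows, "AT")
count = len(tagged)
```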
pdf_BERT_literature_trainining_classification
A Python class for training on biomedical literature. It reads the text from the PDF, tokenizes it, and then uses the BERT model for training.
plant_long_read_resistance_gene_variant_estimator
This contains a complete workflow for variant estimation from long reads, using the NBS-LRR-specific resistance genes.
recurrent_neural_network_sequence_classification
An RNN implementation for sequence classification, trained on plant organelle sequences to identify the subset.
ruby_gem_creator
A Ruby class that makes creating your Ruby gem easy. It has two methods: one to make the template and one to write the template. You can check the template before writing it and then simply write the template for gem creation.
scalable_parallel_faster_genomic_transcriptomics_annotations
This repository contains a scalable and faster implementation of genome and transcriptome annotation for large-scale sequencing datasets.
sequence_evolutionary_rate_function
Applies a sequence evolutionary function to a multitude of files, estimating it across the site and branch models and estimating the lineage evolutionary pressure.
sequencing_barcoding_labelling_generator
A Ruby function to generate DNA sequence barcodes for sequencing labelling. It takes a barcode length and the number of iterations you want to produce.
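A barcode generator along these lines might look like the following sketch. The repository is in Ruby; this Python version is an illustration only, and the name `generate_barcodes`, the `seed` parameter, and the decision to return unique barcodes are assumptions.

```python
import random


def generate_barcodes(length, count, seed=None):
    """Return count unique random DNA barcodes, each of the given length."""
    if length <= 0 or count < 0:
        raise ValueError("length must be positive and count non-negative")
    if count > 4 ** length:
        raise ValueError("not enough distinct barcodes of this length")
    rng = random.Random(seed)  # a seed makes the output reproducible
    barcodes = set()
    while len(barcodes) < count:
        barcodes.add("".join(rng.choice("ACGT") for _ in range(length)))
    return sorted(barcodes)


barcodes = generate_barcodes(length=8, count=10, seed=1)
```

Rejection sampling into a set is simple but slows down when `count` approaches the number of possible barcodes; for that case, sampling from the full enumeration would be a better fit.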
tair_pubmed_connector
There is no function to automatically fetch the information on the PubMed article links reported in TAIR for use with language models, so I coded this function. It takes the TAIR information and a gene or locus tag, fetches the corresponding PubMed entries, and then fetches the corresponding abstracts from PubMed.