Gaurav Sablok's repositories
python_analytics_classes
A collection of Python classes for several purposes, from plotting to data structures, data science, and engineering.
fungal_metagenomics_ITS_coverage_calculator
I coded this function to estimate the fraction of ITS predictions in fungal metagenomics data. It takes into account the sequence length and the ITS1 and ITS2 start and stop coordinates. Given a keyword argument, it estimates the coverage accordingly.
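A minimal sketch of how such a coverage fraction could be computed, not the repository's actual code. The function name `its_coverage`, the `(start, stop)` tuple layout (1-based, inclusive), and the `region` keyword are assumptions made for illustration; ITS1 and ITS2 are assumed not to overlap.

```python
def its_coverage(seq_length, its1=None, its2=None, region="both"):
    """Return the fraction of a sequence covered by the predicted ITS region(s).

    its1 and its2 are hypothetical (start, stop) tuples, 1-based and inclusive.
    region selects which prediction to count: "its1", "its2", or "both".
    """
    if seq_length <= 0:
        raise ValueError("seq_length must be positive")
    spans = {"its1": its1, "its2": its2}
    if region == "both":
        selected = [span for span in spans.values() if span is not None]
    elif region in spans:
        selected = [spans[region]] if spans[region] is not None else []
    else:
        raise ValueError("region must be 'its1', 'its2', or 'both'")
    # Inclusive coordinates, so each span covers stop - start + 1 bases.
    covered = sum(stop - start + 1 for start, stop in selected)
    return covered / seq_length


if __name__ == "__main__":
    print(its_coverage(600, its1=(31, 210), its2=(371, 550)))
    print(its_coverage(600, its1=(31, 210), its2=(371, 550), region="its1"))
```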
large_scale_genomic_alignment_extraction
A scalable genomic fraction aligner and extractor for large-scale alignment of genomes and transcriptomes. It processes the alignments across multiple cores to extract the aligned regions. The aligned regions can also be passed to the length plotter and used to train machine learning models for specific applications.
linear_regression_training_model_based_on_sequence_characteristics
I coded this linear regression training model based on features computed across the sequences. It has two arguments: train the model only, or train the model and predict.
numpy_shell_builder
A numpy shell builder showing how to extract from and use numpy across arrays. I am putting the entire manual here for those who like to search immediately rather than looking here and there.
pacbio_oxford_nanopore_repeat_coverage
A long-read repeat coverage calculator. Given a long-read file, either direct from the sequencing run or after cleaning, it calculates the total amount of repeat stretches present in the sequencing reads, which you can plot before assembly.
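One way to count repeat stretches in raw reads is sketched below. It is an illustration only, not the repository's method: here a "repeat stretch" is assumed to be a tandem repeat with a unit of 1 to `max_unit` bases repeated at least `min_copies` times, and the function name and parameters are hypothetical. Reads are passed in as plain strings, so FASTA/FASTQ parsing is left out.

```python
import re


def repeat_coverage(reads, max_unit=6, min_copies=3):
    """Count bases that fall inside simple tandem-repeat stretches.

    Returns (repeat_bases, total_bases, fraction). min_copies must be >= 2.
    """
    # A lazily matched unit of 1..max_unit bases followed by at least
    # (min_copies - 1) further copies of the same unit.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    repeat_bases = 0
    total_bases = 0
    for read in reads:
        read = read.upper()
        total_bases += len(read)
        repeat_bases += sum(m.end() - m.start() for m in pattern.finditer(read))
    fraction = repeat_bases / total_bases if total_bases else 0.0
    return repeat_bases, total_bases, fraction


if __name__ == "__main__":
    # Toy read: a homopolymer run (A x 6) and a dinucleotide repeat (CA x 5).
    toy_read = "ACGT" + "A" * 6 + "GG" + "CA" * 5 + "TT"
    print(repeat_coverage([toy_read]))
```

The per-read match lengths could be collected instead of summed if a length distribution is wanted for plotting.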
python_algorithms_structures_data_structures
This repository contains the code I have posted on LinkedIn as solutions to LeetCode, Interview Query, and Codewars questions. I used a different approach from the ones commonly mentioned elsewhere.
slurm_pbs_cluster_scripts
SLURM and PBS scripts for Illumina and long-read genome assembly, transcriptome, metagenomics, and comparative analysis.
plant_long_read_resistance_gene_isolator
I coded this function for comprehensive isolation of plant resistance genes from long-read sequencing. Given PacBio or Oxford Nanopore reads, it assembles them, predicts the plant disease resistance genes, and lets you analyze the mutations in those genes.
bacterial_disease_model_attributable_fractions
This repository contains the risk model function of the disease model for Vibrio infections and shows how it can be used to estimate the attributable rates.
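The description does not say which formulas the repository uses. As a hedged illustration, the sketch below applies two standard epidemiological definitions: the attributable fraction among the exposed, (RR - 1) / RR, and the population attributable fraction from Levin's formula. The function names and the example numbers are hypothetical.

```python
def attributable_fraction_exposed(relative_risk):
    """Attributable fraction among the exposed: (RR - 1) / RR."""
    if relative_risk <= 0:
        raise ValueError("relative_risk must be positive")
    return (relative_risk - 1.0) / relative_risk


def population_attributable_fraction(relative_risk, exposure_prevalence):
    """Levin's formula: p(RR - 1) / (1 + p(RR - 1))."""
    if not 0.0 <= exposure_prevalence <= 1.0:
        raise ValueError("exposure_prevalence must be between 0 and 1")
    excess = exposure_prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


if __name__ == "__main__":
    # Made-up inputs: relative risk of 4 with 25% of the population exposed.
    print(attributable_fraction_exposed(4.0))
    print(population_attributable_fraction(4.0, 0.25))
```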
bacterial_plant_fungal_domain_analyzer
This repository contains a faster, data-science-based implementation for processing domain predictions from InterProScan. It gives you complete domain information, coordinates, and other associated information. I used a mapping dataframe approach to make it faster, rather than looping over the predictions again and again.
bacterial_plant_fungal_domain_directed_graphs
This repository contains a function that prepares the domain graph analysis. If you specify a domain or an InterPro ID, it gives you all the parent and child graphs for directed and undirected graph modelling.
candida_ontology_network_analyzer
A faster implementation of the gene ontology analyzer for Candida genomes. Given the Candida GO ontology files and a search GO term, it extracts all the alt_id, relationship_ids, and associated functions for that term, for network analysis and for linking with expression analysis.
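A minimal sketch of pulling `alt_id` and `relationship` entries out of an OBO-formatted ontology file, assuming the input uses the usual `[Term]` stanza layout with `id:` as the first tag of each stanza. The function name `parse_obo_terms` and the toy stanza are hypothetical, and this is not the repository's implementation.

```python
def parse_obo_terms(lines):
    """Map each term ID to its name, alt_id list, and relationship list."""
    terms = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are collected.
            if line == "[Term]":
                current = {"name": "", "alt_id": [], "relationship": []}
            else:
                current = None
        elif current is not None and ": " in line:
            key, value = line.split(": ", 1)
            # Drop the trailing "! human readable name" comment, if any.
            value = value.split(" ! ", 1)[0]
            if key == "id":
                terms[value] = current
            elif key == "name":
                current["name"] = value
            elif key in ("alt_id", "relationship"):
                current[key].append(value)
    return terms


if __name__ == "__main__":
    # Toy stanza with made-up identifiers, for illustration only.
    toy = [
        "[Term]",
        "id: GO:0000001",
        "name: toy term",
        "alt_id: GO:0000999",
        "relationship: part_of GO:0000002 ! toy parent",
    ]
    print(parse_obo_terms(toy)["GO:0000001"])
```

Looking up a search term is then a dictionary access, which avoids rescanning the file for every query.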
cookiecutter
A cross-platform command-line utility that creates projects from cookiecutters (project templates), e.g. Python package projects, C projects.
cookiecutter-django
Cookiecutter Django is a framework for jumpstarting production-ready Django projects quickly.
cookiecutter-django-rest
Build best-practice APIs fast with Python 3
cookiecutter-flask
A flask template with Bootstrap, asset bundling+minification with webpack, starter templates, and registration/authentication. For use with cookiecutter.
dive
A tool for exploring each layer in a docker image
djangopackages
Django Packages is a directory of reusable apps, sites, tools, and more for your Django projects.
genome_transcriptome_annotation_make
A function to make genome and transcriptome annotation maps that reflect the gene regions and how they should be displayed. Given a GFF file, the requested annotation, and other columns, it uses pygenomeviz to make all the annotation maps.
genomics_datascience_quick_bash
This repository is meant to assist you in writing Bash-based workflows. It covers how to do common Bash tasks and how to develop and deploy workflows on a cluster.
longread_bcftools_filter
Making bcftools filtering easy. bcftools_filter allows faster filtering of variant calls according to the allelic depth and the tags, using a simple overlap approach as compared to implementing the regular patterns.
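To illustrate what filtering on allelic depth involves, the sketch below reads VCF records as text and keeps those whose first sample has an alternate-allele depth at or above a threshold. It assumes the standard VCF layout, where the `FORMAT` column lists keys and `AD` holds comma-separated depths with the reference allele first. The function name and threshold are hypothetical, and this does not reproduce the repository's approach.

```python
def filter_by_alt_depth(vcf_lines, min_alt_depth=10):
    """Keep VCF records whose first sample has an ALT depth >= min_alt_depth."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT and sample columns
        format_keys = fields[8].split(":")
        sample_values = fields[9].split(":")
        if "AD" not in format_keys:
            continue
        ad_index = format_keys.index("AD")
        if ad_index >= len(sample_values):
            continue  # trailing sample fields were omitted
        depths = [int(d) for d in sample_values[ad_index].split(",") if d != "."]
        # depths[0] is the reference allele; the rest are alternate alleles.
        if len(depths) >= 2 and max(depths[1:]) >= min_alt_depth:
            kept.append(line)
    return kept


if __name__ == "__main__":
    # Toy records with made-up positions and depths.
    records = [
        "##fileformat=VCFv4.2",
        "\t".join(["chr1", "100", ".", "A", "G", "50", "PASS", ".", "GT:AD", "0/1:12,15"]),
        "\t".join(["chr1", "200", ".", "C", "T", "50", "PASS", ".", "GT:AD", "0/1:30,2"]),
    ]
    for record in filter_by_alt_depth(records):
        print(record)
```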
odd_ratio_estimator_from_specific_geographical_location
This function takes a data frame of the outbreak and estimates the odds ratios and the likelihood of occurrence of the disease in a specific geographical location.
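A hedged sketch of an odds ratio for one location against all others, using a plain list of records instead of a data frame. The record keys `location` and `case`, the function name, and the Woolf (log-based) 95% confidence interval are assumptions for illustration; the description does not state how the repository computes its estimate.

```python
import math


def location_odds_ratio(records, location, z=1.96):
    """Odds ratio of being a case in `location` versus everywhere else.

    Returns (odds_ratio, ci_lower, ci_upper) using Woolf's log method.
    """
    a = b = c = d = 0
    for rec in records:
        in_location = rec["location"] == location
        if in_location and rec["case"]:
            a += 1  # cases in the location
        elif in_location:
            b += 1  # non-cases in the location
        elif rec["case"]:
            c += 1  # cases elsewhere
        else:
            d += 1  # non-cases elsewhere
    if 0 in (a, b, c, d):
        raise ValueError("every cell of the 2x2 table must be non-zero")
    odds_ratio = (a * d) / (b * c)
    std_err = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * std_err)
    upper = math.exp(math.log(odds_ratio) + z * std_err)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Made-up outbreak data: 20/100 cases on the coast, 10/100 inland.
    data = (
        [{"location": "coast", "case": True}] * 20
        + [{"location": "coast", "case": False}] * 80
        + [{"location": "inland", "case": True}] * 10
        + [{"location": "inland", "case": False}] * 90
    )
    print(location_odds_ratio(data, "coast"))
```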
pbs_altair_pro_bash_manual
This repository contains the code for the PBS Altair Pro at CHPC. You can save this code with a .sh extension and run it as a shell script, so you don't have to remember the PBS Pro manual.
phytozome_pacid_fetcher
This function takes an IDs file with the genes of interest and the Phytozome GFF files, and fetches the pacid for the genes of interest.
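A minimal sketch of such a lookup, assuming that the GFF `mRNA` rows carry both a `Parent` attribute naming the gene and a `pacid` attribute in the ninth column. That layout, the function name `fetch_pacids`, and the toy identifiers are assumptions; check them against the actual GFF files before relying on this.

```python
def fetch_pacids(gene_ids, gff_lines):
    """Return {gene_id: [pacid, ...]} for the requested genes."""
    wanted = set(gene_ids)
    pacids = {}
    for line in gff_lines:
        if line.startswith("#"):
            continue  # comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(
            item.split("=", 1) for item in cols[8].split(";") if "=" in item
        )
        parent = attrs.get("Parent")
        if parent in wanted and "pacid" in attrs:
            pacids.setdefault(parent, []).append(attrs["pacid"])
    return pacids


if __name__ == "__main__":
    # Toy GFF rows with made-up gene names and pacids.
    gff = [
        "##gff-version 3",
        "\t".join(["Chr1", "src", "gene", "1", "900", ".", "+", ".", "ID=geneA;Name=geneA"]),
        "\t".join(["Chr1", "src", "mRNA", "1", "900", ".", "+", ".",
                   "ID=geneA.1;Parent=geneA;pacid=1001"]),
        "\t".join(["Chr1", "src", "mRNA", "2000", "2900", ".", "-", ".",
                   "ID=geneB.1;Parent=geneB;pacid=1002"]),
    ]
    print(fetch_pacids(["geneA"], gff))
```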
pubmed_indexer_abstract_fetcher
This function prepares the abstract and ID information for all the PubMed articles that you want to read and cite. I coded it using a web scraping approach; it is blazing fast and parses better than NCBI E-utilities.
scalable_parallel_faster_genomic_transcriptomics_annotations
This repository contains a scalable and faster implementation for genome and transcriptome annotation of large-scale sequencing datasets.
seagrass_supplementary_seagrassdb
This repository contains the sequence repository for the seagrass paper's transcriptome assembly and database. The server is down, so please use these files for further analysis such as BLAST and comparative analysis.
tair_pubmed_connector
There is no function to automatically fetch the PubMed article links reported in TAIR for use with language models, so I coded this function. It takes the TAIR information and a gene or locus tag, fetches the corresponding PubMed entries, and then fetches the corresponding abstracts from PubMed.
ZSH_POSH_web_scrapping
This repository contains Bash-based web scraping for installing the Nerd Fonts for programming.