Molitor Corentin's repositories
vargen
VarGen is an R package designed to get a list of variants related to a disease. It needs only an OMIM morbid ID as input and, optionally, a list of tissues / GWAS traits of interest to complete the results. You can also use your own customised list of genes. VarGen can annotate the variants to help you identify the most impactful ones.
Solanum_sitiens_assembly
Instructions to reproduce the de novo assembly of Solanum sitiens (accession LA1974).
plot_transcripts_filtering.py
Script to plot the number of transcripts remaining after low-expression filtering
plotReadLengths
Python script to create a histogram of the sequence lengths in a FASTA file (useful, for example, to get the length distribution of PacBio reads)
ChromosomesOverview
R code to create a bar graph of chromosomes and mark the positions of transcripts/genes on them
efficientR
Efficient R programming: a book
Solanum_chilense_assembly
Scripts and files used to perform the de novo assembly of Solanum chilense (LA1972)
sra-cleaning
Python script to automatically parse a "Contamination.txt" file from the Sequence Read Archive (SRA) and correct the assembly FASTA file and annotation GTF file.
tutorial-kmer-spectra
R Markdown document explaining k-mer spectra and how sequencing errors and heterozygosity affect them.
GeneToCN
Gene copy number prediction from k-mer frequencies
PRS-Tutorial
A tutorial on how to run basic polygenic risk score analysis
run_pilon_batches.sh
Script to run Pilon on batches of contigs (to avoid out-of-memory issues)
SAIGE
Development for SAIGE and SAIGE-GENE(+)
SIFT4G_Create_Genomic_DB
Create genomic databases with SIFT predictions. Input is an organism's genomic DNA (.fa) file and the gene annotation file (.gtf). Output will be a database that can be used with SIFT4G_Annotator.jar to annotate VCF files.