Greg Gavelis (ggavelis)



Company: Bigelow Laboratory for Ocean Science

Location: Boothbay, Maine

Home Page: https://www.researchgate.net/profile/Greg-Gavelis


Greg Gavelis's repositories

Language: Nextflow · Stargazers: 0 · Issues: 0 · Issues: 0
License: MIT · Stargazers: 0 · Issues: 0 · Issues: 0

condense_InterProScan_annots

InterProScan is useful, but its output spreads each sequence's annotations over multiple, often redundant lines. This script collapses them into a single, human-readable line for each annotated sequence.

Language: Python · Stargazers: 0 · Issues: 1 · Issues: 0
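A minimal sketch of the collapsing idea, assuming InterProScan's standard TSV output (column 1 = protein accession, 4 = analysis, 5 = signature accession, 6 = signature description, 12/13 = InterPro accession/description, which may be "-" or absent). The function name and the "description (accession)" label format are hypothetical choices, not the repository's actual code.

```python
import csv
import io


def condense_interproscan(tsv_text):
    """Collapse InterProScan TSV rows into one line per protein (hypothetical helper).

    Assumes the standard InterProScan TSV column layout described above.
    """
    per_protein = {}  # dicts keep insertion order, so proteins stay in input order
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue
        protein = row[0]
        ipr_acc = row[11] if len(row) > 11 else "-"
        ipr_desc = row[12] if len(row) > 12 else "-"
        if ipr_acc not in ("", "-"):
            # InterPro entries merge equivalent hits from several member databases.
            label = f"{ipr_desc} ({ipr_acc})"
        else:
            # No InterPro entry: fall back to the member-database signature.
            sig_desc = row[5] if row[5] not in ("", "-") else row[4]
            label = f"{sig_desc} ({row[3]}:{row[4]})"
        labels = per_protein.setdefault(protein, [])
        if label not in labels:  # drop redundant repeats
            labels.append(label)
    return [f"{p}\t" + "; ".join(labels) for p, labels in per_protein.items()]
```

Keying on the InterPro accession where one exists is what removes most redundancy, since several member databases (e.g. Pfam and PRINTS) often hit the same InterPro entry.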

HGT_v_Contamination_assessor

How can we discriminate contaminants from horizontal gene transfer (HGT)? Alien indices (AI) are often used to screen out foreign sequences, but they can "overclean" by removing bona fide HGT. This script uses metadata about each DNA/AA sequence (i.e., whether it is spliced, or has a poly(A) tail or spliced leader) to assess the extent to which AI-based cleaning removes legitimate HGT.

Language: Python · Stargazers: 0 · Issues: 1 · Issues: 0
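A minimal sketch of the assessment, assuming one common alien-index formulation, AI = ln(best ingroup E-value + 1e-200) − ln(best outgroup E-value + 1e-200), where positive values mean the outgroup hit is better. The `Seq` record, its field names, and the default threshold of 0 are hypothetical; the script's actual inputs and cutoffs may differ. The sketch reports what fraction of AI-flagged sequences still carry host-like features (splicing, poly(A) tail, spliced leader), i.e. how much the AI filter might be overcleaning.

```python
import math
from dataclasses import dataclass


@dataclass
class Seq:
    # Hypothetical record; E-values come from BLAST against each taxon set.
    seq_id: str
    best_e_ingroup: float   # best hit to the organism's own lineage (assumed ingroup)
    best_e_outgroup: float  # best hit outside it (assumed outgroup)
    spliced: bool = False
    poly_a: bool = False
    spliced_leader: bool = False


def alien_index(s):
    """Positive AI: the outgroup hit is better than the ingroup hit."""
    return math.log(s.best_e_ingroup + 1e-200) - math.log(s.best_e_outgroup + 1e-200)


def has_host_features(s):
    return s.spliced or s.poly_a or s.spliced_leader


def assess(seqs, ai_threshold=0.0):
    """Return (flagged, rescued, fraction): sequences an AI filter would remove,
    the subset with host-like features (candidate HGT), and their share."""
    flagged = [s for s in seqs if alien_index(s) > ai_threshold]
    rescued = [s for s in flagged if has_host_features(s)]
    fraction = len(rescued) / len(flagged) if flagged else 0.0
    return flagged, rescued, fraction
```

A high fraction would suggest that many "foreign" sequences are actually expressed from the host genome, so removing all of them would discard genuine HGT.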

infer_splice_variants_from_Trinity

Did you know that Trinity predicts splice variants? (Its Chrysalis step works even for de novo transcriptomes, and its predictions, though heuristic, are valuable.) Likewise, TransDecoder can predict multiple ORFs per gene, potentially capturing alternative splicing. These predictions are usually lost once we rename our proteins to shorter sequence IDs. This script stores and abbreviates potential splicing information for later use, e.g. for distinguishing prokaryotic from eukaryotic transcripts.

Language: Python · Stargazers: 0 · Issues: 1 · Issues: 0
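The information at stake is encoded in the sequence IDs themselves: Trinity (v2.x) names transcripts like `TRINITY_DN1000_c0_g1_i2` (gene `..._g1`, isoform `i2`), and TransDecoder appends `.p1`, `.p2`, ... to each predicted ORF. A minimal sketch of extracting that before renaming, assuming those ID formats; the function name and the `iso<N>_orf<M>` tag are hypothetical. Isoforms are counted only if they appear in the input list.

```python
import re
from collections import defaultdict

# Assumed formats: Trinity v2.x transcript IDs, optionally with a TransDecoder ".pN" suffix.
ID_RE = re.compile(
    r"^(?P<gene>TRINITY_DN\d+_c\d+_g\d+)_i(?P<iso>\d+)(?:\.p(?P<orf>\d+))?$"
)


def summarize_splicing(protein_ids):
    """Map each protein ID to a short tag: isoforms seen for its gene,
    and ORFs seen for its transcript (hypothetical abbreviation)."""
    isoforms = defaultdict(set)  # gene -> isoform numbers
    orfs = defaultdict(set)      # transcript -> ORF numbers
    parsed = {}
    for pid in protein_ids:
        m = ID_RE.match(pid)
        if m is None:
            continue  # not a Trinity/TransDecoder ID; leave it out
        gene, iso = m["gene"], int(m["iso"])
        transcript = f"{gene}_i{iso}"
        isoforms[gene].add(iso)
        if m["orf"] is not None:
            orfs[transcript].add(int(m["orf"]))
        parsed[pid] = (gene, transcript)
    tags = {}
    for pid, (gene, transcript) in parsed.items():
        n_iso = len(isoforms[gene])
        n_orf = max(len(orfs[transcript]), 1)  # an un-suffixed ID counts as one ORF
        tags[pid] = f"iso{n_iso}_orf{n_orf}"
    return tags
```

The resulting dict can be saved alongside the short-ID mapping so the splicing hints survive renaming.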

Protein_renamer

Tools to add phylogeny-ready names (including accession, genus, species, lineage, and taxid) to protein FASTA files from any of: (A) GenBank, (B) SRA, or (C) supplementary data from genome papers.

Language: Python · Stargazers: 0 · Issues: 0 · Issues: 0
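A minimal sketch of the renaming step, assuming the metadata has already been retrieved (e.g. from GenBank or NCBI Taxonomy) into a dict keyed by accession. The `lineage|Genus_species|taxid|accession` layout and the metadata schema are hypothetical. Characters that Newick reserves (whitespace, parentheses, colons, semicolons, commas, square brackets, single quotes) are replaced with underscores so the names survive tree building.

```python
import re

# Runs of characters that break Newick trees (and many alignment tools) in taxon names.
UNSAFE = re.compile(r"[\s():;,\[\]']+")


def safe(token):
    return UNSAFE.sub("_", str(token).strip()).strip("_")


def rename_fasta(fasta_text, metadata):
    """Rewrite FASTA headers as lineage|Genus_species|taxid|accession.

    metadata: {accession: {"lineage", "genus", "species", "taxid"}} (hypothetical schema).
    Headers whose accession is not in metadata are left unchanged.
    """
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            acc = line[1:].split()[0] if line[1:].split() else ""
            meta = metadata.get(acc)
            if meta:
                fields = [meta["lineage"], f"{meta['genus']}_{meta['species']}",
                          meta["taxid"], acc]
                line = ">" + "|".join(safe(f) for f in fields)
        out.append(line)
    return "\n".join(out) + "\n"
```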

tblastn_exon_stitcher

Need an ORF from an unannotated genome? This script exploits NCBI's ability to run tBLASTn against genome assemblies, using the resulting BLAST query-hit alignments to get provisional exon sets.

Language: Jupyter Notebook · Stargazers: 0 · Issues: 1 · Issues: 0
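One way to turn tBLASTn hits into a provisional exon set is to keep the highest-scoring chain of HSPs that is colinear in both protein and genome coordinates. A minimal sketch, assuming the HSPs are already parsed (e.g. from tabular BLAST output) and all lie on one contig and one strand; summing bitscores, the `max_intron` default, and the `HSP` record are assumptions, not necessarily what the notebook does.

```python
from dataclasses import dataclass


@dataclass
class HSP:
    q_start: int      # protein (query) coordinates, 1-based inclusive
    q_end: int
    s_start: int      # genome (subject) coordinates as tblastn reports them;
    s_end: int        # s_start > s_end means a minus-strand hit
    bitscore: float


def stitch_exons(hsps, max_intron=20000):
    """Return the best-scoring colinear chain of HSPs, in exon order."""
    if not hsps:
        return []
    minus = hsps[0].s_start > hsps[0].s_end
    # Negating minus-strand coordinates makes positions increase 5'->3' on both strands.
    def g_start(h): return -h.s_start if minus else h.s_start
    def g_end(h): return -h.s_end if minus else h.s_end

    hs = sorted(hsps, key=lambda h: h.q_start)
    best = [h.bitscore for h in hs]   # best chain score ending at each HSP
    prev = [-1] * len(hs)             # back-pointers for traceback
    for j, hj in enumerate(hs):
        for i in range(j):
            hi = hs[i]
            intron = g_start(hj) - g_end(hi)
            colinear = hi.q_start < hj.q_start and hi.q_end <= hj.q_end
            if colinear and 0 < intron <= max_intron and best[i] + hj.bitscore > best[j]:
                best[j] = best[i] + hj.bitscore
                prev[j] = i
    j = max(range(len(hs)), key=best.__getitem__)
    chain = []
    while j != -1:
        chain.append(hs[j])
        j = prev[j]
    return chain[::-1]
```

This is the classic chaining dynamic program; it tolerates small query overlaps between adjacent HSPs but drops hits that are out of order or too far away to be plausible introns.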