This Repo contains my own 'study notes' as I learn genomic-scale cloud bioinformatics. It includes descriptions of common tools, platforms and summaries of my work with clients. I update this Repo frequently. It is organized via the folder structure shown below.
- 🗒️ Concepts and Terms (genomic file types, use cases, terminology and whitepapers)
- 🔬 Lab Testing (Illumina and more)
- ⚒️ Genomic Tools (GATK, VariantSpark, HAIL and many more - this section updates OFTEN)
- 📦 Genomics Platforms (Terra.bio, Galaxy Project, IDSeq and others)
- ☁️ Public Cloud Genomics (Alibaba Cloud, AWS, Azure or GCP). The general approach is to implement a cloud-native Data Lake pattern for scalable genomic analysis. A conceptual rendering of this pattern is shown below.
- 📚 LLMs for Bioinformatics (Reading List). So many papers and tools are being published in this area. Here's what I am reading now.
In addition to this Repo, I have a number of other Repos with cloud bioinformatics information. Also, I've included two of my favorite link aggregator resources here for additional learning.
- GENERAL CLOUD - my `learn-cloud` Repo - https://github.com/lynnlangit/learning-cloud
- GCP - my `gcp-for-bioinformatics` open source course - https://github.com/lynnlangit/gcp-for-bioinformatics
- AWS - my `aws-for-bioinformatics` open source course - https://github.com/lynnlangit/aws-for-bioinformatics
- WDL language - my `learn-wdl` open source course - https://github.com/openwdl/learn-wdl
- Link collection - a Repo (awesome bioinformatics) with a large number of curated links for learning about bioinformatics tools and topics
- Bioinformatics benchmark papers - links to published benchmark papers for bioinformatics
The Data Lake (or Data Mesh [Lake of Lakes]) pattern is key for implementing bioinformatics workloads effectively on any public cloud. Shown below is a simple conceptual explanation of this key concept.
Teri is the impetus for my movement into the world of genomic research. She was diagnosed with breast cancer in 2016. She survived, but suffered a long course of intense and painful treatment due in part to the lack of availability of personalized treatment options at the time of her diagnosis.