Delvin So's repositories
covid19_unique_tweets
An ongoing dataset of hashtags, n-gram counts, and other miscellaneous NLP artifacts for COVID-19 analysis, derived from over 100 000 000 tweets collected since mid-January 2020.
friends-tv-show-analysis
Analysis of the Friends series by mining transcripts of all 236 episodes.
spotify-predict-playlist-followers
A repository outlining the retrieval of Spotify's featured playlists and track-level characteristics, feature engineering, exploratory data analysis, and modelling of a playlist's success based on followers.
scraping-and-analyzing-aggregate-review-sites
Can we identify fraudulent behaviour using inferential testing?
topic-modelling-subreddit-toronto
Using Google BigQuery and Topic Modelling to understand /r/Toronto
a-2017
Public Repository for cs109a, 2017 edition
abstract-screening
Manuscript code
exome-report-scripts
Scripts for handling exome reports output from CCM's various pipelines. These are supplementary to those scripts found in `report-scripts` in the CCM repo.
kidney_label_classifier
Convolutional neural network for predicting ultrasound views
chicago_crime
An exploratory analysis of Chicago crime from 2001 to 2018.
crg2
Research pipeline for exploring clinically relevant genomic variants
divvy-chicago
Code for '14 Million Bike Rides in The Windy City (2013 - 2017)' on my site
label-studio
Label Studio is a multi-type data labeling and annotation tool with standardized output format
OCCC
Repository for OCCC-related tidbits
perl_scripts
A random assortment of Perl scripts and snippets. See the README for more details.
pushshift-most-requent-words-posts
Using Google BigQuery and Pushshift to Analyze Occurrences of Words in Titles of Reddit Submissions
rna-seq_master_script_primers
Primers for creating master shell scripts commonly associated with RNA-seq tools and their analysis of NGS data.
zillow-nyc-housing-scrape-prediction
Scraping NYC property data from Zillow, GIS feature engineering, and predictive modelling