Olivia Liang's repositories
Time-Series-Forcasting-Seq2Seq
A time series forecasting project from Kaggle that uses a Seq2Seq model with LSTM layers to forecast headcounts. It includes a detailed explanation of how this network architecture works.
Statistical-Similarity-Measurement
A methodology for validating the statistical similarity between real data and synthetic data generated by GAN models. The methods used include autoencoders, PCA, t-SNE, KL divergence, clustering, and cosine similarity.
All-About-Movie-Data
A hub for data science and analytics work on movie-related data. The techniques used include EDA, NLP topic analysis, recommender systems, and advanced visualization in Tableau.
COVID-19-Forecasting
A self-driven project that uses ARIMA, Seq2Seq, and XGBoost to design a COVID-19 forecasting algorithm. The data comes from a Kaggle competition and JHU CSSE.
Sport-Analytics
A project that applies clustering and association rule mining to soccer match data to find insights for better player formation strategies.
Causal-Inference-Experiment
An A/B testing project, conducted through a survey, that examines whether people are more likely to click on thumbnails that contain their own racial features.
AWS-Click-Prediction
A big data project that uses S3, Athena, EMR, SageMaker, and QuickSight on AWS to build Random Forest and XGBoost models in Spark and SQL that predict the click-through rate (CTR) of ads from a large relational database.
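
As an illustration of two of the metrics named under Statistical-Similarity-Measurement above, here is a minimal sketch of how KL divergence and cosine similarity might be computed between a real and a synthetic numeric column. This is not taken from the repository: the function names, the histogram-based estimate, the bin count, and the smoothing constant are all assumptions made for the example.

```python
import numpy as np


def kl_divergence(real, synthetic, bins=20, eps=1e-9):
    """Histogram-based estimate of KL(real || synthetic) for one numeric column.

    Hypothetical helper: the bin count and eps smoothing are illustrative choices.
    """
    # Use a shared range so both histograms have identical bin edges.
    lo = min(real.min(), synthetic.min())
    hi = max(real.max(), synthetic.max())
    p, _ = np.histogram(real, bins=bins, range=(lo, hi))
    q, _ = np.histogram(synthetic, bins=bins, range=(lo, hi))
    # Smooth empty bins, then normalize counts into probabilities.
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy data standing in for one real column and two synthetic candidates.
rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, 1000)
close_synthetic = rng.normal(0.0, 1.0, 1000)
far_synthetic = rng.normal(3.0, 1.0, 1000)

kl_close = kl_divergence(real, close_synthetic)
kl_far = kl_divergence(real, far_synthetic)
```

A lower KL value suggests the synthetic column's distribution is closer to the real one, so `kl_close` should come out smaller than `kl_far` in this toy setup. Cosine similarity would more naturally be applied to summary vectors, such as per-column means of the two datasets.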