Audrey Le's repositories
categorical_encoding
Repository for the research and implementation of categorical encoding into a Featuretools-compatible Python library
Enron-email-parser
Email parser written in Scala to clean the unstructured Enron email dataset
iCorruptionHack
Auditing FEC Bulk Data
market-data-stream
A repository for some of the code I used in Kaggle data science and machine learning tasks.
my-data-ontology
Provides standardized, extensible semantics for representing information about a person’s profile.
News-Analysis
Using Gensim and spaCy models for topic modeling in the news, and experimenting with LSTMs and GRUs to explore features such as writing style and sentiment per news category
Predicting-ICU-Deaths
Submission for the WiDS 2020 Kaggle competition: predicting ICU deaths using CatBoost, XGBoost and random forest classifiers on the MIT GOSSIS dataset
Real-Time-Stock-Updates
Published a data stream to Azure Event Hubs through its Kafka-compatible endpoint, serialized it, and wrote SparkSQL queries against the resulting Delta Lake tables.
social-network-sql-database
Wrote a SQLite backend in Python to add, delete, search and update users, their statuses and images. Includes unit tests, a REST API, and a few optimizations in SQLite and MongoDB.
Wikipedia-Property-Graph
Scala function to transform RDF into a property graph using the Spark GraphX API
private-data-objects
The Private Data Objects lab provides technology for confidentiality-preserving, off-chain smart contracts.