Eric Xiao's repositories
spark-syntax
This is a repo documenting the best practices in PySpark.
deep-dive-into-spark
Workshop on optimizing PySpark pipelines.
Miscellaneous
Scripts and code examples. Includes Spark notes, Jupyter notebook examples for Spark, Impala and Oracle.
TwitterAPI
My own Twitter Class
1brc
1️⃣🐝🏎️ The One Billion Row Challenge -- A fun exploration of how quickly 1B rows from a text file can be aggregated with Java
dbt-presto
The Presto adapter plugin for dbt (https://getdbt.com)
grok_sdi_educative
Grokking the System Design Interview Course
incubator-paimon
Apache Paimon (incubating) is a streaming data lake platform that supports high-speed data ingestion, change data tracking, and efficient real-time analytics.
matterport-dl
A downloader for matterport virtual tours
TakeHomeDataChallenges
My solutions to the book "A collection of Data Science Take-home Challenges"