Drake Wang's repositories
fast-data-dev
Kafka Docker for development. Kafka, Zookeeper, Schema Registry, Kafka-Connect, Landoop Tools, 20+ connectors
ByConity
ByConity is an open source cloud-native data warehouse
cloudera-playbook
Cloudera deployment automation with Ansible
delta-sharing
An open protocol for secure data sharing
dolphinscheduler
Apache DolphinScheduler is a distributed and extensible workflow scheduler platform with powerful DAG visual interfaces, dedicated to solving complex job dependencies in the data pipeline and providing various types of jobs out of the box.
doris
Apache Doris is an easy-to-use, high performance and unified analytics database.
facebook-hive-udfs
Facebook's Hive UDFs
impala
Apache Impala
starrocks
StarRocks is a next-gen sub-second MPP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics and ad-hoc query.
flink-parquet-demo
A simple demo that writes files to HDFS in Parquet format.
infoworld-post
Code examples for a blog post on infoworld.com
Java2Scala
Some demo code while playing with Java & Scala
jdbook_crawler
Crawls book information from jd.com
LearningSparkV2
This is the github repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
medium-blog-kafka-udemy
Supporting repository for the blog post at https://medium.com/@stephane.maarek/how-to-use-apache-kafka-to-transform-a-batch-pipeline-into-a-real-time-one-831b48a6ad85
nifi
Mirror of Apache NiFi
openai-cookbook
Examples and guides for using the OpenAI API
openbilibili-go-common
🙈!🙉!🙊! I have no idea what any of this is… If you want to talk about morality, please head out the door and turn right to 996.icu!
scala-labs-exercises
My standalone version of the scala-labs project
spark
Apache Spark - A unified analytics engine for large-scale data processing
SparkInternals
Notes talking about the design and implementation of Apache Spark
trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)