Evan Sun's repositories
airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
aws_notebook
AWS notebook
dask
Parallel computing with task scheduling
dolphinscheduler
Apache DolphinScheduler is a distributed and extensible workflow scheduler platform with powerful visual DAG interfaces, dedicated to solving complex job dependencies in data pipelines and providing various types of jobs out of the box.
dva-example-user-dashboard
👲 👬 👨‍👩‍👧 👨‍👩‍👦‍👦
elyra
Elyra extends JupyterLab Notebooks with an AI-centric approach.
enterprise_gateway
A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across distributed clusters such as Apache Spark, Kubernetes and others.
hadoop-yarn-api-python-client
Python client for Hadoop® YARN API
hudi
Upserts, Deletes and Incremental Processing on Big Data.
hue
Open source SQL Query Assistant service for Databases/Warehouses
incubator-livy
Mirror of Apache Livy (Incubating)
incubator-seatunnel
SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).
ipython-sql
%%sql magic for IPython, hopefully evolving into a full SQL client
jupyter_client
Jupyter protocol client APIs
jupyter_server
The backend (i.e., core services, APIs, and REST endpoints) for Jupyter web applications.
jupyterhub
Multi-user server for Jupyter notebooks
mage-ai
🧙 The modern replacement for Airflow. Build, run, and manage data pipelines for integrating and transforming data.
python
Official Python client library for Kubernetes
scala
Scala 2 compiler and standard library. For bugs, see scala/bug
skein
A tool and library for easily deploying applications on Apache YARN
spark
Apache Spark - A unified analytics engine for large-scale data processing
sudospawner
Spawn JupyterHub single-user servers with sudo
superset
Apache Superset is a Data Visualization and Data Exploration Platform
trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
watchdog
Python library and shell utilities to monitor filesystem events.
xgboost
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow