XiuhongTang's repositories
datahub-helm
Repository of helm charts for deploying DataHub on a Kubernetes cluster
incubator-seatunnel-web
SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).
tiktok-scraper
TikTok Scraper. Download video posts, collect user/trend/hashtag/music feed metadata, sign URLs, etc.
LakeSoul
LakeSoul is an end-to-end, real-time, cloud-native Lakehouse framework with fast data ingestion, concurrent updates, and incremental data analytics on cloud storage for both BI and AI applications.
cube-studio
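Cloud-native one-stop machine learning platform: multi-tenancy, data assets, online notebook development, drag-and-drop task pipeline orchestration, multi-node multi-GPU distributed training, hyperparameter search, inference serving, multi-cluster scheduling, multi-project-group resource groups, edge computing, real-time training of large models, and an AI app store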
kubesphere
The container platform tailored for Kubernetes multi-cloud, datacenter, and edge management ⎈ 🖥 ☁️
xiaohongshu
Xiaohongshu automation: automatic login, optional cookie-based login, and uploading of image posts and videos with automatic publishing
flink-table-store-101
Playground for Flink Table Store with use cases and performance features
Auto-GPT
An experimental open-source attempt to make GPT-4 fully autonomous.
hudi
Upserts, Deletes And Incremental Processing on Big Data.
arroyo
Arroyo is a distributed stream processing engine written in Rust
emqx
The most scalable open-source MQTT broker for IoT, IIoT, and connected vehicles
the-algorithm
Source code for Twitter's Recommendation Algorithm
God-Of-BigData
Focused on big data learning and interview preparation; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive...
chitu-sdp
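The Chitu real-time computing platform is an enterprise-grade, one-stop, high-performance, low-barrier real-time big data computing platform built on Apache Flink, widely applicable to streaming data application development scenarios.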
spark
Apache Spark - A unified analytics engine for large-scale data processing
docker-hadoop
Apache Hadoop docker image
hadoop
Apache Hadoop
ddia
Chinese translation of "Designing Data-Intensive Applications" (DDIA)
alldata
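💥🔥 To address the pain points enterprises face when building big data platforms, this project performs secondary development and maintenance of the many Apache big data platform components and delivers a general-purpose big data platform foundation. It focuses on the problems and challenges encountered in data collection, data storage, data computing, data development, and data operations. The original goal is to build an industry-leading, open-source, one-stop big data platform, empower the rapid business growth of thousands of small and medium-sized enterprises, and provide a range of solutions for developers who love big data.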
alluxio
Alluxio, data orchestration for analytics and machine learning in the cloud
docker-krb5-server
A Krb5Server Docker image that is easy and simple to use.
ozone
Scalable, redundant, and distributed object store for Apache Hadoop
juicefs
JuiceFS is a distributed POSIX file system built on top of Redis and S3.
flink-sql-security
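A row-level permission solution for FlinkSQL, with source code. It supports user-level row-level data access control: a given user can access only the rows they are authorized for, and unauthorized rows are hidden. This is a Flink solution for the real-time domain, similar to the Ranger Row-level Filter solution for Hive in offline data warehouses.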
mootdx
A simple, convenient wrapper for reading Tongdaxin (通达信) data