Laurence Geng (bluishglc)

Location: Shanghai, China

Home Page: https://laurence.blog.csdn.net/

Laurence Geng's starred repositories

awesome-public-datasets

A topic-centric list of HQ open datasets.

paper-reading

Paragraph-by-paragraph close reading of classic and new deep learning papers

License: Apache-2.0 | Stargazers: 26505 | Issues: 725 | Issues: 0

datahub

The Metadata Platform for your Data Stack

Language: Java | License: Apache-2.0 | Stargazers: 9755 | Issues: 252 | Issues: 2172

handson-ml3

A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 7470 | Issues: 137 | Issues: 105

YCSB

Yahoo! Cloud Serving Benchmark

Language: Java | License: Apache-2.0 | Stargazers: 4928 | Issues: 215 | Issues: 953

docker-hadoop

Apache Hadoop docker image

Language: Scala | License: Apache-2.0 | Stargazers: 584 | Issues: 373 | Issues: 62

docker-hadoop-spark

Multi-container environment with Hadoop, Spark and Hive

Language: Shell | Stargazers: 195 | Issues: 9 | Issues: 0

bdp

A prototype big data platform; the source code for the book Big Data Platform Architecture and Prototype

nyc-tlc-data

Backup for NYC TLC data for the DE Zoomcamp course

ssb-dbgen

Star Schema Benchmark dbgen

ssb-kylin

Star Schema Benchmark Tool for Apache Kylin

Language: C | License: Apache-2.0 | Stargazers: 96 | Issues: 127 | Issues: 2

tpcds-kit

TPC-DS benchmark kit with some modifications/fixes

Language: C | Stargazers: 87 | Issues: 69 | Issues: 0

docker-hadoop-workbench

A Hadoop cluster based on Docker, including Hive and Spark.

Language: Shell | License: Apache-2.0 | Stargazers: 78 | Issues: 4 | Issues: 5

hive-testbench

Testbench for experimenting with Apache Hive at any data scale.

kafkaproxy

kafkaproxy is a reverse proxy for the wire protocol of Apache Kafka.

Language: Java | License: Apache-2.0 | Stargazers: 64 | Issues: 8 | Issues: 25

kafka-connect-examples

Kafka Connect Examples

Language: Shell | Stargazers: 42 | Issues: 2 | Issues: 0

lastfm-dataset-2020

New Last.fm Dataset 2020 for music auto-tagging purposes.

Language: Python | License: MIT | Stargazers: 28 | Issues: 2 | Issues: 2

flink-sql-CDC

Self-contained demo using Flink SQL and Debezium to build a CDC-based analytics pipeline. All you need is Docker! 🐳

Language: Dockerfile | Stargazers: 24 | Issues: 1 | Issues: 0

aasPractice

Exercises from the book Advanced Analytics with Spark (《Spark高级数据分析》)

Language: Scala | Stargazers: 22 | Issues: 0 | Issues: 0

serverless-datalake-example

A serverless data lake project and framework based on AWS S3, Glue, Athena, MWAA and QuickSight. Through a series of best practices, it shows you how to build a serverless data lake.

Language: Shell | Stargazers: 16 | Issues: 0 | Issues: 0

apache-hudi-core-conceptions

A set of notebooks that explore and explain core concepts of Apache Hudi, such as file layouts, file sizing, compaction and clustering.

Language: Jupyter Notebook | Stargazers: 10 | Issues: 0 | Issues: 0

ranger-emr-cli-installer

A CLI tool that automates the installation of Apache Ranger and AWS EMR and their integration with OpenLDAP and Windows AD. It supports both open-source Ranger and EMR-native Ranger, both OpenLDAP and Windows AD, and works in all AWS regions, including the China regions.

aws-cli-plus

A command-line tool that complements aws-cli. It offers a suite of utilities for managing and operating EC2, EMR and other AWS services.

Language: Shell | Stargazers: 1 | Issues: 0 | Issues: 0