KennethanCeyer / awesome-data-pipeline

Awesome list for data pipelines

Home Page: https://github.com/KennethanCeyer/awesome-data-pipeline

Awesome Data Pipeline

A data pipeline is:
A series of steps that moves data from a source to a destination efficiently and automatically.
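
To make the definition concrete, here is a minimal sketch of such a series of steps in plain Python. It is only an illustration, not tied to any tool listed below: the `extract`/`transform`/`load` function names, the `RAW_SOURCE` sample data, and the list used as a destination are all assumptions made for this example.

```python
import csv
import io
from typing import Iterable, Iterator

# Hypothetical in-memory "source"; a real pipeline would read from a file, queue, or API.
RAW_SOURCE = "user,amount\nalice,10\nbob,not-a-number\ncarol,32\n"


def extract(raw: str) -> Iterator[dict]:
    """Read rows from the source one at a time."""
    yield from csv.DictReader(io.StringIO(raw))


def transform(rows: Iterable[dict]) -> Iterator[dict]:
    """Convert types and drop rows that fail validation."""
    for row in rows:
        try:
            yield {"user": row["user"], "amount": int(row["amount"])}
        except ValueError:
            continue  # skip malformed records instead of stopping the pipeline


def load(rows: Iterable[dict], destination: list) -> int:
    """Append rows to the destination and return how many were written."""
    count = 0
    for row in rows:
        destination.append(row)
        count += 1
    return count


def run_pipeline(raw: str, destination: list) -> int:
    """Chain the steps: source -> extract -> transform -> load -> destination."""
    return load(transform(extract(raw)), destination)


if __name__ == "__main__":
    warehouse: list = []  # stand-in for a real destination such as a warehouse table
    written = run_pipeline(RAW_SOURCE, warehouse)
    print(written, warehouse)
```

Each step is a generator, so rows flow through one at a time rather than being held in memory all at once. The components listed below cover the same roles (ingestion, transformation, storage) at production scale.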

Contents

Components

Workflow Management

Data Ingestion

  • Apache Flume - (Apache foundation / Data Ingestion / Open Source / Free).
  • Stitch - (Talend / ETL / Subscription fee).
  • Logstash - (Elastic / Data Ingestion / Cloud or On-prem / Hybrid fee).
  • Filebeat - (Elastic / Data Ingestion / Cloud or On-prem / Hybrid fee).
  • Fluentd - (CNCF foundation / Open Source / Free or License fee).
  • Datadog - (Datadog / Cloud / APM / Subscription fee).
  • New Relic - (New Relic / Cloud / APM / Subscription fee).

Data Lake

Data Warehouse

  • Apache Hive - (Apache foundation / Hadoop-friendly / MapReduce / Free).
  • Snowflake - (Multi-cloud / SQL-friendly / Subscription fee).
  • AWS Redshift - (AWS Cloud / SQL-friendly / Subscription fee).
  • Azure Synapse Analytics - (Azure Cloud / SQL-friendly / Subscription fee).
  • GCP BigQuery - (Google Cloud / SQL-friendly / On-demand fee).
  • IBM DB2 - (IBM / On-prem / SQL-friendly / Subscription fee).

Data Store

  • Apache Druid - (Apache foundation / Real-time datastore / Free).
  • Apache Pinot - (Apache foundation / Real-time datastore / Free).
  • AWS Aurora - (AWS Cloud / Rich-cloud datastore / Subscription fee).
  • GCP Cloud Spanner - (Google Cloud / Globally distributed, strongly consistent HA datastore / Subscription fee).
  • Azure Cosmos DB - (Azure Cloud / NoSQL datastore / Subscription fee).

Query Engine

  • Presto - (Facebook / Open Source / SQL-friendly / Free or License fee).
  • Apache Impala - (Apache foundation / Cloudera / Open Source / SQL-friendly / Free or License fee).
  • AWS Athena - (AWS Cloud / SQL-friendly / On-demand fee).
  • AWS Redshift Spectrum - (AWS Cloud / SQL-friendly / On-demand fee).

Streaming

  • Apache Kafka - (Apache foundation / Confluent / LinkedIn / Message Broker / Open Source / Free or License fee).
  • RabbitMQ - (VMware / Messaging Queue / Free or License fee).
  • AWS Kinesis - (AWS Cloud / Message Broker / Subscription fee).
  • AWS SQS - (AWS Cloud / Messaging Queue / Subscription fee).
  • GCP PubSub - (Google Cloud / Message Broker / Subscription fee).
  • Azure Event Hub - (Azure Cloud / Message Broker / Subscription fee).

Data Transformation

  • Apache Spark - (Apache foundation / Databricks / In-memory processing / Open Source / Free or License fee).
  • Apache Beam - (Apache foundation / Google / Data processing / Open Source / Free or License fee).
  • Apache Storm - (Apache foundation / Backtype / Twitter / Stream processing / Open Source / Free).
  • Apache Flink - (Apache foundation / Stream processing / Open Source / Free).
  • AWS Glue - (AWS Cloud / Integrated Data System / ETL / On-demand fee).

Data Analysis

  • Apache Superset - (Apache foundation / Airbnb / Business Intelligence (BI) / Open Source / Free).
  • Airpal - (Airbnb / Query Editor / Open Source / Free).
  • Hue - (Cloudera / Query Editor / Open Source / Free).
  • Kibana - (Elastic / Dashboard / Hybrid fee).
  • Databricks Notebook - (Databricks / Notebook / Hybrid fee).
  • Jupyter Notebook - (Jupyter / Notebook / Open Source / Free).
  • Pandas - (NumFOCUS / Data processing / Open Source / Free).
  • Plotly - (Plotly / Data visualization / Hybrid fee).

Data Format

  • Apache Parquet - (Apache foundation / Data Format / Open Source / Free).
  • Apache ORC - (Apache foundation / Hortonworks / Facebook / Data Format / Open Source / Free).
  • Apache Avro - (Apache foundation / Data Format / Open Source / Free).
  • Apache Kudu - (Apache foundation / Cloudera / Columnar storage engine / Open Source / Free).
  • Apache Arrow - (Apache foundation / Data Format / Open Source / Free).
  • Delta - (Databricks / Data Format / Free or License fee).
  • JSON - (Data Format / Free).
  • CSV - (Data Format / Free).
  • TSV - (Data Format / Free).
  • HDF5 - (The HDF Group / Data Format / Open Source (BSD-style license) / Free).

Business Intelligence

  • Apache Zeppelin - (Apache foundation / Business Intelligence (BI) / Open Source / Free or License fee).
  • Tableau - (Salesforce / Business Intelligence (BI) / Hybrid fee).
  • Redash - (Redash Inc / Databricks / Business Intelligence (BI) / Hybrid fee).
  • Looker - (Looker Data Sciences Inc / Business Intelligence (BI) / Subscription fee).
  • Data Studio - (Google Cloud / Business Intelligence (BI) / Free).
  • Power BI - (Microsoft / Business Intelligence (BI) / Subscription fee).

AI/ML

  • H2O - (H2O.ai / Model Evaluation / Subscription fee).
  • Feast - (Tecton / Gojek / Feature Store / Open Source / Free).
  • Vertex AI - (Google Cloud / Hybrid Features for AI / Subscription fee).
  • DataRobot - (DataRobot Inc / Feature Engineering / Subscription fee).
  • WandB - (Weights & Biases / Model Evaluation / Subscription fee).

Community

Vendors

Open Source / Foundation

Materials

Books

Dummies Guide