There are 77 repositories under the etl topic.
Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
An orchestration platform for the development, production, and observation of data assets.
Fancy stream processing made operationally mundane
Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Kestra is an infinitely scalable orchestration and scheduling platform, creating, running, scheduling, and monitoring millions of complex pipelines.
Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.
Actively curated list of awesome BI tools. PRs welcome!
A Python stream processing engine modeled after Yahoo! Pipes
🧙 Mage is an open-source tool for building and running data pipelines that transform your data.
Desktop application to efficiently search and analyze super-structured data. Powered by Zed.
Sync data between persistence engines, like ETL only not stodgy
A lightweight stream processing library for Go
A Go daemon that syncs MongoDB to Elasticsearch in real time. You know, for search.
This repository is a getting started guide to Singer.
Example project implementing best practices for PySpark ETL jobs and applications.
React components to build CSV files on the fly based on an array or object literal of data
AIStore: scalable storage for AI applications
Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.
Logical Replication extension for PostgreSQL 14, 13, 12, 11, 10, 9.6, 9.5, 9.4 (Postgres), providing much faster replication than Slony, Bucardo or Londiste, as well as cross-version upgrades.
A scalable general purpose micro-framework for defining dataflows. You can use it to build dataframes, numpy matrices, python objects, ML models, etc.
The open source, end-to-end computer vision platform. Label, build, train, tune, deploy and automate in a unified platform that runs on any cloud and on-premises.
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Dataform is a framework for managing SQL-based data operations in BigQuery, Snowflake, and Redshift
ETL framework for .NET (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)
A hackable data integration & analysis tool to enable non-technical users to edit data processing jobs and visualise data on demand.
Data ETL & Analysis on the dataset 'Baby Names from Social Security Card Applications - National Data'.
🔮 Transform, query, and download geospatial data on the web.
Extract, Transform, Load: Any SQL Database in 4 lines of Code.
SmartCode = IDataSource -> IBuildTask -> IOutput => Build Everything!!!
A simplified, lightweight ETL Framework based on Apache Spark
A serverless cluster computing system for the Go programming language
The premier open source Data Quality solution
omniparser: a native Golang ETL streaming parser and transform library for CSV, JSON, XML, EDI, text, etc.