There are 50 repositories under the data-processing topic.
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
A collection of handy Bash One-Liners and terminal tricks for data processing and Linux system maintenance.
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
Select, put and delete data from JSON, TOML, YAML, XML and CSV files with a single tool. Supports conversion between formats and can be used as a Go package.
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
A lightweight data processing framework built on DuckDB and 3FS.
A lightweight, flexible, and expressive statistical data testing library
Data transformation framework for AI. Ultra-performant, with incremental processing.
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/
Extract, Transform, Load for Python 3.5+
Source code accompanying the book Data Science on the Google Cloud Platform (Valliappa Lakshmanan, O'Reilly, 2017)
Scalable data preprocessing and curation toolkit for LLMs
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Command-line tool to download and extract data from HTML/XML pages or JSON APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.
Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/
HStreamDB is an open-source, cloud-native streaming database for IoT and beyond. Modernize your data stack for real-time applications.
All-in-one text deduplication
A tool that uses advanced Monte Carlo simulations and Turbit parallel processing to create possible Bitcoin prediction scenarios.
A list about Apache Kafka
Machine Learning notebooks for refreshing concepts.
Harmonious distributed data analysis in Rust.
Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
📈 PatternPy: A Python package revolutionizing trading analysis with high-speed pattern recognition, leveraging Pandas & NumPy. Effortlessly spot Head & Shoulders, Tops & Bottoms, and Support & Resistance levels. For experts & beginners. #TradingMadeEasy 🔥