There are 28 repositories under the dataframe topic.
Modin: Scale your Pandas workflows by changing a single line of code
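A minimal sketch of that one-line change, assuming a local CSV named data.csv (both the file and the column name are placeholders):

```python
# The advertised change: `import pandas as pd` becomes the line below.
import modin.pandas as pd

df = pd.read_csv("data.csv")          # parallelized across cores by Modin
print(df.groupby("region").size())    # "region" is a hypothetical column
```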
Vaex: an out-of-core hybrid Apache Arrow/NumPy DataFrame for Python, for ML, visualization, and exploration of big tabular data at a billion rows per second 🚀
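A hedged sketch of Vaex's lazy, out-of-core style, using the sample dataset that ships with the library:

```python
import vaex

df = vaex.example()       # bundled sample data, memory-mapped rather than loaded
print(df.mean(df.x))      # aggregations stream over the data out of core
```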
Apache DataFusion SQL Query Engine
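A short sketch of DataFusion's Python bindings, assuming a local file data.csv (a placeholder):

```python
from datafusion import SessionContext

ctx = SessionContext()
ctx.register_csv("t", "data.csv")              # expose the file as table "t"
ctx.sql("SELECT COUNT(*) AS n FROM t").show()  # plan and run the query over Arrow batches
```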
Danfo.js is an open-source JavaScript library providing high-performance, intuitive, and easy-to-use data structures for manipulating and processing structured data.
Mimesis is a robust data generator for Python that can produce a wide range of fake data in multiple languages.
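A minimal sketch (provider and locale names follow recent Mimesis releases; older versions accept plain locale strings like "en"):

```python
from mimesis import Person
from mimesis.locales import Locale

person = Person(Locale.EN)                 # locale-aware fake-data provider
print(person.full_name(), person.email())  # e.g. a plausible name and address
```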
Koalas: pandas API on Apache Spark
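A minimal sketch of the pandas-style API (Koalas was later upstreamed into Spark itself as pyspark.pandas):

```python
import databricks.koalas as ks

kdf = ks.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]})
print(kdf.mean())   # looks like pandas, executes on Spark
```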
C++ DataFrame for statistical, financial, and ML analysis in modern C++
Mars is a tensor-based unified framework for large-scale data computation that scales numpy, pandas, scikit-learn, and Python functions.
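A small sketch of Mars' lazy, chunked execution model:

```python
import mars.tensor as mt

t = mt.random.rand(10000, 10000, chunk_size=1000)  # tiled into 1000x1000 chunks
print(t.sum().execute())                           # builds a task graph, then runs it
```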
ConnectorX: the fastest library to load data from databases into DataFrames in Rust and Python
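A hedged sketch; the connection string, query, and partition column are placeholders:

```python
import connectorx as cx

df = cx.read_sql(
    "postgresql://user:pass@localhost:5432/db",
    "SELECT * FROM lineitem",
    partition_on="l_orderkey",  # split the query so partitions load in parallel
    partition_num=4,
)
```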
Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. It runs and scales everywhere Python does.
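A minimal sketch of the idea: function names define outputs and parameter names declare inputs, so the dataflow documents its own lineage (the module and column names are placeholders):

```python
# funcs.py
import pandas as pd

def spend_per_signup(spend: pd.Series, signups: pd.Series) -> pd.Series:
    """Lineage is encoded by the signature: this node depends on spend and signups."""
    return spend / signups

# run.py
from hamilton import driver
import funcs

dr = driver.Driver({}, funcs)  # config dict, then the module(s) defining the dataflow
print(dr.execute(
    ["spend_per_signup"],
    inputs={"spend": pd.Series([10.0, 20.0]), "signups": pd.Series([1, 2])},
))
```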
AI code-writing assistant that understands data content
📺(tv) Tidy Viewer is a cross-platform CLI CSV pretty printer that uses column styling to maximize viewer enjoyment.
ArcticDB is a high-performance, serverless DataFrame database built for the Python data science ecosystem.
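A minimal sketch using a local LMDB backend (the URI, library, and symbol names are placeholders):

```python
import pandas as pd
import arcticdb as adb

ac = adb.Arctic("lmdb://./arctic_demo")                  # serverless: storage is just a path
lib = ac.get_library("quotes", create_if_missing=True)
lib.write("prices", pd.DataFrame({"close": [1.0, 2.0]}))
print(lib.read("prices").data)                           # versioned read returns the DataFrame
```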
Apache DataFusion Ballista Distributed Query Engine
A curated list of amazingly awesome Cybersecurity datasets
Machine learning with dataframes
pyjanitor: clean APIs for data cleaning. A Python implementation of the R package janitor.
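A minimal sketch of the method-chaining style it registers on pandas DataFrames:

```python
import pandas as pd
import janitor  # noqa: F401 -- importing registers the cleaning methods

df = pd.DataFrame({"First Name": ["Ada"], "Age": [36]})
print(df.clean_names().columns.tolist())  # -> ['first_name', 'age']
```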
DataFrames for Go: for statistics, machine learning, and data manipulation/exploration
A nimble options backtesting library for Python
GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs
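A minimal PySpark sketch, assuming the graphframes package is on the Spark classpath:

```python
from pyspark.sql import SparkSession
from graphframes import GraphFrame

spark = SparkSession.builder.getOrCreate()
v = spark.createDataFrame([("a", "Alice"), ("b", "Bob")], ["id", "name"])           # vertices need `id`
e = spark.createDataFrame([("a", "b", "follows")], ["src", "dst", "relationship"])  # edges need `src`/`dst`
g = GraphFrame(v, e)
g.inDegrees.show()
```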
A connector for Spark that allows reading from and writing to a Redis cluster
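A hedged PySpark sketch, assuming the spark-redis jar is on the classpath and spark.redis.host/port are configured (the table and key column are placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("Alice", 30)], ["name", "age"])

# write rows to Redis under the "people" table namespace
df.write.format("org.apache.spark.sql.redis") \
    .option("table", "people") \
    .option("key.column", "name") \
    .save()

# read them back as a DataFrame
people = spark.read.format("org.apache.spark.sql.redis") \
    .option("table", "people").load()
people.show()
```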
A scalable, general-purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton
The most advanced data processing framework, for building scalable data processing pipelines and moving data between various data sources and destinations.