data-processing

data-processing's repositories

kafka-embedded

Runs embedded, in-memory Apache Kafka instances. Helpful for integration testing.

Language: Scala | License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0

kafka-manager

A tool for managing Apache Kafka.

Language: Scala | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

kangaroo

Hadoop utilities for Kafka

Language: Java | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

klio

Smarter data pipelines for audio.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

mpire

A Python package for easy multiprocessing that is faster than the standard multiprocessing module

Language: Python | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

Neuraxle

Build neat pipelines with the right abstractions to do AutoML. Let your pipeline steps have hyperparameter spaces. Enable checkpoints to cut duplicate calculations. Go from research to production environment easily.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

rabit

Reliable Allreduce and Broadcast Interface for distributed machine learning

Language: C++ | License: BSD-3-Clause | Stargazers: 0 | Issues: 0 | Issues: 0

Language: Scala | Stargazers: 0 | Issues: 0 | Issues: 0

bloop

A hot bloop for your productivity

Language: Scala | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

crawler4j

Open Source Web Crawler for Java

Language: Java | Stargazers: 0 | Issues: 0 | Issues: 0

dask

Task scheduling and blocked algorithms for parallel processing

Language: Python | License: BSD-3-Clause | Stargazers: 0 | Issues: 0 | Issues: 0

dataduct

DataPipeline for humans.

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0

disque

Disque is a distributed message broker

Language: C | License: BSD-3-Clause | Stargazers: 0 | Issues: 0 | Issues: 0

emr-bootstrap-actions

This repository holds the Amazon Elastic MapReduce sample bootstrap actions

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 1 | Issues: 0

faust

Python Stream Processing

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0

fireant

Data analysis and reporting tool for quick access to custom charts and tables in Jupyter Notebooks and in the shell.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

flink

Mirror of Apache Flink

Language: Java | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

gain

Web crawling framework based on asyncio for everyone.

Language: Python | License: GPL-3.0 | Stargazers: 0 | Issues: 0 | Issues: 0

GoogleScraper

A Python module to scrape several search engines (such as Google, Yandex, Bing, DuckDuckGo, Baidu and others) using proxies (SOCKS4/5, HTTP proxy) and many different IPs, with asynchronous networking support (very fast).

Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0

grpc-java

The Java gRPC implementation. HTTP/2-based RPC

Language: Java | License: BSD-3-Clause | Stargazers: 0 | Issues: 0 | Issues: 0

HiBench

HiBench is a Hadoop benchmark suite.

Language: Java | License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0

hydra

Hydra is a framework for elegantly configuring complex applications

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

Persimmon

A visual dataflow programming language for sklearn

Language: Python | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

pyspider

A Powerful Spider System with Web UI

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

Language: Java | License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0

samoa

SAMOA (Scalable Advanced Massive Online Analysis) is a distributed streaming machine learning (ML) framework that contains a programming abstraction for distributed streaming ML algorithms.

Language: Java | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

spark-redshift

Spark and Redshift integration

Language: Scala | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

Stream-Framework

Stream Framework is a Python library that lets you build news feeds, activity streams, and notification systems using Cassandra and/or Redis. The authors of Stream-Framework also provide a cloud service for feed technology.

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0

Language: Python | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

ufora

Compiled, automatically parallel Python for data science

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0