hugovk / pescador

Stochastic multi-stream sampling for iterative learning

Home Page:https://pescador.readthedocs.io

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

pescador

PyPI Anaconda-Server Badge Build Status Coverage Status Documentation Status DOI

Pescador is a library for streaming (numerical) data, primarily for use in machine learning applications.

Pescador addresses the following use cases:

  • Hierarchical sampling
  • Out-of-core learning
  • Parallel streaming

These use cases arise in the following common scenarios:

  • Say you have three data sources (A, B, C) that you want to sample. For example, each data source could contain all the examples of a particular category.

    Pescador can dynamically interleave these sources to provide a randomized stream D <- (A, B, C). The distribution over (A, B, C) need not be uniform: you can specify any distribution you like!

  • Now, say you have 3000 data sources, each of which may contain a large number of samples. Maybe that's too much data to fit in RAM at once.

    Pescador makes it easy to interleave these sources while maintaining a small working set. Not all sources are simultaneously active, but Pescador manages the working set so you don't have to.

  • If loading data incurs substantial latency (e.g., due to accessing on-disk storage or pre-processing), this can be a problem.

    Pescador can seamlessly move data generation into a background process, so that your main thread can continue working.

Want to learn more? Read the docs!

Installation

Pescador can be installed from PyPI through pip:

pip install pescador

or with conda using the conda-forge channel:

conda install -c conda-forge pescador

About

Stochastic multi-stream sampling for iterative learning

https://pescador.readthedocs.io

License:ISC License


Languages

Language:Python 99.3%Language:Shell 0.7%