- PyFlink: the Python API for Apache Flink
- Flink allows you to build scalable batch and streaming workloads
- UDF support and integration with pandas
- powerful relational queries (e.g. SQL)
- more low-level
- enables full use of third-party Python libraries
- scalar, table and modular functions
- parallelization
- can configure the batch size used when converting rows to pandas Series
- decreased serialization overhead
- Python version >= 3.5
- download from PyPI
- process and group
- Latent Dirichlet Allocation (LDA): explains why some parts of a dataset are related or similar to each other
- set up sources and sinks, and a Table for each