GenStage is a specification for exchanging events between producers and consumers.
This project currently provides the following functionality:
-
Experimental.GenStage
(docs) - a behaviour for implementing producer and consumer stages -
Experimental.DynamicSupervisor
(docs) - a supervisor designed for starting children dynamically. Besides being a replacement for the:simple_one_for_one
strategy in the regularSupervisor
, aDynamicSupervisor
can also be used as a stage consumer, making it straight-forward to spawn a new process for every event in a stage pipeline
The module names are marked as Experimental
to avoid conflicts as they are meant to be included in future Elixir releases. In your code, you may add alias Experimental.{DynamicSupervisor, GenStage}
to the top of your files and use the relative names from then on.
You can find examples on how to use the modules above in the examples directory:
-
ProducerConsumer - a simple example of setting up a pipeline of
A -> B -> C
stages and having events flowing through -
DynamicSupervisor - an example of how to use one or more
DynamicSupervisor
as a consumer to a producer that works as a counter -
GenEvent - an example of how to use
GenStage
to implement aGenEvent
replacement that leverages concurrency and provides more flexibility regarding buffer size and back-pressure
GenStage requires Elixir v1.3.
-
Add
:gen_stage
to your list of dependencies in mix.exs:def deps do [{:gen_stage, "~> 0.1"}] end
-
Ensure
:gen_stage
is started before your application:def application do [applications: [:gen_stage]] end
Here is a list of potential topics to be explored by this project (in no particular order or guarantee):
-
Provide examples with delivery guarantees and/or checkpoints
-
Consider using DynamicSupervisor to implement Task.Supervisor (as a consumer)
-
TCP and UDP acceptors as producers
-
Ability to attach filtering to producers - today, if we need to filter events from a producer, we need an intermediary process which incurs extra copying
-
Connecting stages across nodes - because there is no guarantee demand is delivered, the demand driven approach in
GenStage
won't perform well in a distributed setup -
Integration with streams - how the
GenStage
foundation integrates with Elixir streams? In particular, streams composition is still purely functional while stages introduce asynchronicity. -
Exploit parallelism - how the GenStage foundation can help us implement conveniences like pmap, chunked map, farming and so on
-
Explore different windowing strategies - the ideas behind the Apache Beam project are interesting, specially the mechanism that divides operations between what/where/when/how (1, 2) as well as windowing from the perspective of aggregation (3)
-
Introduce key-based functions - after a
group_by
is performed, there are many operations that can be inlined likemap_by_key
,reduce_by_key
and so on. The Spark project, for example, provides many functions for key-based functionality (4)
Other research topics include the Titan (5), Naiad's Differential Dataflow engine (6) and Lasp (7).
- https://cloud.google.com/blog/big-data/2016/05/why-apache-beam-a-google-perspective
- http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf
- http://www.vldb.org/pvldb/vol8/p702-tangwongsan.pdf
- http://spark.apache.org/docs/latest/programming-guide.html#working-with-key-value-pairs
- http://asc.di.fct.unl.pt/~nmp/pubs/clouddp-2013.pdf
- http://research-srv.microsoft.com/pubs/176693/differentialdataflow.pdf
- https://lasp-lang.org/
Same as Elixir.