Fito is to data science what SQLAlchemy is to databases. Fito is the data science ORM.
It helps you organize your code and integrate different technologies while maintaining a consistent and clear object model.
Things like:
- Mapping between config files and behaviour,
- Caching results of execution (in memory, or in any key-value store),
- Or attaching metadata to executions (like metrics, scores, plots, etc.)
become trivial.
Fito is a package built around four concepts:
First there are Specs. A Spec specifies an object.
It provides the capability of specifying things, like models or data sources.
A Spec can also be combined with another Spec, which allows them to specify things like experiments that combine both models and data sources.
Specs are both JSON-serializable and hashable.
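To make the idea concrete, here is a minimal sketch of what "JSON-serializable, hashable and combinable" can mean in plain Python. It does not use fito at all: ToySpec, Model, DataSource and Experiment are hypothetical names invented for this illustration, and fito's real Spec class may work differently.

```python
import json


class ToySpec(object):
    """Hypothetical stand-in for a Spec. This is NOT fito's real class."""

    def __init__(self, **fields):
        self.fields = fields

    def to_dict(self):
        # Nested specs are serialized recursively, so specs can be combined.
        out = {'type': type(self).__name__}
        for name, value in self.fields.items():
            out[name] = value.to_dict() if isinstance(value, ToySpec) else value
        return out

    def to_json(self):
        # sort_keys yields the same string for specs with the same content.
        return json.dumps(self.to_dict(), sort_keys=True)

    def __eq__(self, other):
        return isinstance(other, ToySpec) and self.to_json() == other.to_json()

    def __hash__(self):
        return hash(self.to_json())


class Model(ToySpec):
    pass


class DataSource(ToySpec):
    pass


class Experiment(ToySpec):
    pass


# An experiment is a spec built out of two other specs.
experiment = Experiment(
    model=Model(kind='linear', alpha=0.1),
    data=DataSource(path='train.csv'),
)
same_experiment = Experiment(
    model=Model(kind='linear', alpha=0.1),
    data=DataSource(path='train.csv'),
)
```

Because equal specs hash equally, a spec like this can be used as a dictionary key, which is what makes indexing and caching by spec possible.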
An Operation is a Spec that computes something out of it. It can be thought of as a curried function.
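The "curried function" analogy can be sketched in plain Python: the arguments are fixed when the object is created, and the computation only happens when it is asked for. ToyOperation is a hypothetical class written for this illustration; it is not fito's Operation.

```python
class ToyOperation(object):
    """Hypothetical illustration: bind the arguments now, compute later."""

    def __init__(self, func, *args, **kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def execute(self):
        # Nothing is computed until execute() is called.
        return self.func(*self.args, **self.kwargs)


def add(x, y=1):
    return x + y


op = ToyOperation(add, 1, y=2)  # describes the computation "add(1, y=2)"
result = op.execute()           # actually runs it
```

Since the object fully describes the computation before it runs, it can be stored, compared or used as a key, just like any other specification.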
That leads us to the DataStore, whose capability is to index a Spec (or any subclass, of course).
There are three implementations: one that uses Python dictionaries, another that uses the file system, and a third one backed by MongoDB.
One nice consequence of having this abstraction is that we can do automatic caching. That can be done just by linking operations and data stores together.
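As a rough sketch of the idea, assuming nothing about fito's internals: if an operation has a hashable key and a store can index values by that key, caching is just "look it up, otherwise compute and save". ToyDataStore and ToyCachedOperation are hypothetical names made up for this example.

```python
class ToyDataStore(object):
    """Hypothetical in-memory store that indexes values by a hashable key."""

    def __init__(self):
        self._values = {}

    def __contains__(self, key):
        return key in self._values

    def get(self, key):
        return self._values[key]

    def save(self, key, value):
        self._values[key] = value


class ToyCachedOperation(object):
    """Hypothetical operation linked to a store."""

    def __init__(self, func, args, store):
        self.func = func
        self.args = tuple(args)
        self.store = store
        # The function name plus its arguments identify the computation.
        self.key = (func.__name__, self.args)

    def execute(self):
        if self.key in self.store:
            return self.store.get(self.key)  # cache hit
        value = self.func(*self.args)        # cache miss: compute
        self.store.save(self.key, value)     # and remember the result
        return value


calls = []


def slow_square(x):
    calls.append(x)  # record every real execution
    return x * x


store = ToyDataStore()
first = ToyCachedOperation(slow_square, (4,), store).execute()
second = ToyCachedOperation(slow_square, (4,), store).execute()
```

Here slow_square runs only once, even though two separate operation objects were executed, because both map to the same key in the store.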
Besides that, fito provides a very helpful decorator, as_operation, that turns any function into a subclass of Operation.
It looks like this:
from fito.data_store import DictDataStore
from fito import as_operation

ds = DictDataStore()  # Can be any implementation of data store

@as_operation(cache_on=ds)
def f(x, y=1):
    return x + y

f(1).execute()  # executed
f(1).execute()  # retrieved from cache
That code is enough to cache the executions of f in memory.
You can see more examples here:
- A simple execution flow: Shows how operations can be used to express entities linked together by their execution
- The auto caching decorator: Shows how operations combined with data stores can be used for automatic function caching
- The execution FIFO: Shows how we can leverage the execution cache to avoid recomputing recently executed operations
This is my first piece of open source software that I'm committing myself to maintain for the next year.
Let me know if you happen to use it!
And please, do not hesitate to send pull requests :D
pip install fito