Codato ("COvid DAta TOols") is a machine learning framework that integrates a number of data science applications, packaged as lightweight, autonomous, pluggable apps. Although the framework is dedicated to analyzing Covid-19, it could easily be applied to any epidemiological study. (Indeed, given a bit more time, we'd like to run it on historical outbreaks whose outcomes we already know.) Moreover, the models are fully swappable, so datasets need not be limited to the epidemiological.
It's data driven, not hard coded, and the toolset itself is domain agnostic, so it could easily be used with any time series data influenced by environmental factors. Models can be written either in C (as the SEIR model here is) or Python (as the NLP model is, although it's also Cythonised). Obviously, there's a pretty big performance difference.
- Overview
- Build
- Install
- Deploy
- Run
- Modules
- SEIR+ Modeling
- Web Server
- Examples
- Release Notes
- Contributing
- Resources
- Live Demo
- License
- In Gratitude
Codato models, such as the SEIR epidemiological model, the NLP model used for question answering (“unendlich-verstehen”), curve fitting and others are written in C / Cython, so they need to be compiled for your architecture.
Source files can be found in build/src/c. Simply run the build.sh script. Oh, it didn't work? Welcome to the machine.
Codato is designed to be fault tolerant of missing apps, so you should be able to run it without building anything, and everything else should just work. Test coverage on this is, shall we say, less than complete.
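A minimal sketch of what "fault tolerant of missing apps" can look like in Python: try to import each app and carry on if one isn't there. This is an illustration, not Codato's actual loader; `load_optional_apps` and the module name `codato_unbuilt_seir` are made up for the example.

```python
import importlib


def load_optional_apps(module_names):
    """Import each named module, skipping any that are missing or fail to import."""
    loaded, skipped = {}, {}
    for name in module_names:
        try:
            loaded[name] = importlib.import_module(name)
        except ImportError as exc:  # also covers ModuleNotFoundError
            skipped[name] = str(exc)
    return loaded, skipped


# "json" ships with Python; "codato_unbuilt_seir" is a hypothetical name standing
# in for a C/Cython extension that has not been compiled yet.
loaded, skipped = load_optional_apps(["json", "codato_unbuilt_seir"])
print("loaded:", sorted(loaded))
print("skipped:", sorted(skipped))
```

Keeping the skipped names and their error messages around makes it easy to tell the user which apps are unavailable rather than failing silently.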
- Dependencies
  * Codato is a framework, not a library; all dependencies are managed in the environment.
- Installation steps (once you have built)
  * create env / source env file
  * run installer
environment.yml should have everything you need to get started. If you’re not using Conda, I feel bad for you son, I got 99 problems but dependencies ain’t one.
NB: there are 2 environment files shipped, production and development. The production env currently still uses the built-in server.
Answer a couple questions and it's off to the races:
./install.sh
Codato is deployable via Kubernetes, plain old Docker or bare naked metal if you prefer to go containerless (a bit like streaking through a data center, I’d say.)
Helm charts are located in: /charts
At some point, we will probably move all 3 deploy methods into the deploy directory.
docker compose up
if you have root. If you don’t have root, you go hungry, I guess. Maybe find someone who does.
Just run the installer.
\m/ metal
Now you can run codato server start from the command line, or invoke start.sh directly if you're still using Ansible.
./start.sh
To start the app, run the following command. Yes, it's a server, that's how the app works. It's all back to front.
run webserver
This is essentially the equivalent of invoking:
python run.py
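Since the API is served with Flask, you should be able to start the application server with flask run. Note that flask run only auto-discovers app.py or wsgi.py; since the entry point here is run.py, point Flask at it explicitly, e.g. flask --app run run (or set FLASK_APP=run.py on older Flask versions).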
Simply pass the theme name to codato webserver, such as:
codato webserver --theme=pride
(rainbow pride)
codato webserver --theme=blm
(dark mode)
codato webserver --defaults
(default theme)
Ex: Run Predict as standalone module: (ie not using python -m)
python apps/predict.py
A number of flags are supported on startup:
-D, --defaults    load default settings
-d, --dates       date range to display
-h, --help        display help page
-m, --model       model to be used
-t, --train       select training period
-v, --version     display version info
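For reference, the flag list above maps fairly directly onto `argparse`. This is a sketch only: how Codato actually parses its command line isn't shown here, and the value formats for `--dates`, `--model` and `--train` are assumptions (plain strings).

```python
import argparse


def build_parser():
    # argparse adds -h/--help on its own, so it is not declared here.
    parser = argparse.ArgumentParser(prog="codato")
    parser.add_argument("-D", "--defaults", action="store_true",
                        help="load default settings")
    parser.add_argument("-d", "--dates", help="date range to display")
    parser.add_argument("-m", "--model", help="model to be used")
    parser.add_argument("-t", "--train", help="select training period")
    parser.add_argument("-v", "--version", action="version",
                        version="%(prog)s 0.1")
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(["-D", "--model", "seir"])
print(args.defaults, args.model, args.dates)
```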
(aka FEATURES or APPS)
Not to be confused with features of a model, Codato platform features are standalone, pluggable modules (aka apps) which define a reactive front end and the machine learning callbacks it requires, and are automatically added simply by dropping them in the apps directory. Currently each app's main callback needs to be manually registered in server.py, but this will be autodetected as well, allowing apps to easily be added and/or swapped out unidirectionally. Martin Fowler is smiling his happy smile.
In the current setup, some apps export layout as a variable and some expose it as a function. Does it make sense to support both? @Question
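If both styles stay, one option is a small adapter so the rest of the server never has to care. A sketch, assuming an app is just a Python module with a `layout` attribute; `resolve_layout` is a hypothetical helper, not something that exists in server.py today.

```python
import types


def resolve_layout(app_module):
    """Return an app's layout whether it is exported as a value or a function."""
    layout = getattr(app_module, "layout", None)
    if layout is None:
        raise AttributeError(f"{app_module.__name__} does not expose a layout")
    # Call it if it is a function, otherwise use the value as-is.
    return layout() if callable(layout) else layout


# Two stand-in app modules built in memory for the demonstration.
static_app = types.ModuleType("static_app")
static_app.layout = ["<h1>Static</h1>"]

dynamic_app = types.ModuleType("dynamic_app")
dynamic_app.layout = lambda: ["<h1>Dynamic</h1>"]

print(resolve_layout(static_app))
print(resolve_layout(dynamic_app))
```

The trade-off: supporting both is cheap, but a function-style layout can be rebuilt per request while a variable is fixed at import time, so the two are not quite interchangeable.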
Codato includes the following apps by default (move an app to the ".inactive" folder to remove it):
This is not an app per se but is the underlying interface all apps implement. Documentation is generated from the OpenAPI spec using either ReDoc or Swagger.
* API first platform
* Ref Doc via redoc
* Fully Stateless Endpoints
* Front End Conversational Interface
* Natural language queries over any dataset
* Add dataset via upload or URL
* Browse data tables
* Edit data in browser
* Save or download the updated dataset
* @TODO: Data output directly available to Model app.
* Visually explore datasets to identify salient features
* Feature distance comparator
* Add features to your model
* Fit using fractal gradient reduction
* Run model over any available dataset
* Adjust parameters in real time
* Match model parameters to real-world data
* Validated output fed into Re-model
* Time-series cross-validation
* Hyperparameter tuning
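Standard k-fold cross-validation doesn't suit time series data, since folds can end up training on observations that come after the ones they are tested on. The framework therefore includes custom time-series cross-validation (TX) that avoids "rolling" the data.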
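To make the problem concrete, here is a generic forward-chaining split in which every training set ends before its test set begins. It is not the TX implementation, whose details aren't described here; `expanding_window_splits` and its parameters are illustrative names only.

```python
def expanding_window_splits(n_samples, n_folds, min_train):
    """Yield (train_indices, test_indices) pairs in chronological order."""
    if min_train >= n_samples:
        raise ValueError("min_train must be smaller than n_samples")
    test_size = (n_samples - min_train) // n_folds
    if test_size < 1:
        raise ValueError("not enough samples for the requested folds")
    for fold in range(n_folds):
        split = min_train + fold * test_size
        # Train on everything before the split, test on the block right after it.
        yield list(range(split)), list(range(split, split + test_size))


splits = list(expanding_window_splits(n_samples=10, n_folds=3, min_train=4))
for train, test in splits:
    print("train up to index", train[-1], "-> test", test)
```

Each fold only ever looks backwards in time, which is the property shuffled k-fold loses.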
* Visual explorer of real world Covid data
- Fixed population constraint in SEIR modeling.
- Standard SEIR models assume the infection period is the same for those who recover and those who do not. This is only an assumption.
- They also implicitly assume that the infectious period is equivalent to the contagion period, which doesn't allow for latency in symptom onset. (It's classically defined as the delay in symptom onset from exposure. But in the case of coronavirus, patients could be infectious for days or even weeks before they become symptomatic, if they become symptomatic at all.)
The limitations of the standard SEIR model have been addressed, with additional considerations:
- Recovered ("Removed" in SIR mode) split into recovered, and not-recovered.
Isolation and Quarantine rates are predicted based on Social Mobility Data Sets. (The default is Google.) Tune model to see exactly why 10 days is chosen as the optimal isolation constant (and 14 for quarantine.) Change dates of social distancing directives and degree of compliance. Download updated model or feed into simulation.
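As a rough illustration of the split "Removed" compartment, here is a toy SEIR integration in plain Python. It is not the C model shipped with Codato: it leaves out isolation and quarantine, keeps the population fixed, and every name and parameter value (`beta`, `sigma`, `gamma`, `mu`) is a placeholder rather than a fitted number.

```python
def simulate_seir(population, exposed0, beta, sigma, gamma, mu, days, dt=0.1):
    """Euler-integrate an SEIR model whose removed compartment is split in two."""
    s, e, i = float(population - exposed0), float(exposed0), 0.0
    recovered, not_recovered = 0.0, 0.0
    for _ in range(round(days / dt)):
        new_exposed = beta * s * i / population * dt   # S -> E
        new_infectious = sigma * e * dt                # E -> I
        new_recovered = gamma * i * dt                 # I -> recovered
        new_not_recovered = mu * i * dt                # I -> not recovered
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered - new_not_recovered
        recovered += new_recovered
        not_recovered += new_not_recovered
    return s, e, i, recovered, not_recovered


final = simulate_seir(population=1_000_000, exposed0=10, beta=0.5, sigma=0.2,
                      gamma=0.1, mu=0.01, days=120)
print([round(x) for x in final])
```

Because the two exits from the infectious compartment have their own rates, each can later be tuned separately against real-world data.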
- We welcome contributions!
- Front End & Designers esp!
- Send that Pull Request!
Let us know how you really feel.
t/k
- Kind of a lot right now, it's literally version 0.1
And no, there's NO animated race chart, there’s enough of those on the Internet. But the point is how trivial it is to add one here by implementing a module that exposes 2 interfaces: layout, which is a list of at least one HTML element or reactive component, and callbacks, which returns a function for a reactive binding. That's it.
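A sketch of what such a module could look like. Plain dicts stand in for the HTML elements and reactive components, since the real component classes aren't shown here; the element ids, `update_chart`, and `is_valid_app` are all hypothetical.

```python
# Hypothetical contents of a file dropped into the apps directory.
layout = [
    {"tag": "h2", "text": "Race chart"},
    {"tag": "div", "id": "race-chart"},
]


def callbacks():
    """Return the function the front end calls when its input changes."""
    def update_chart(selected_day):
        # A real app would recompute the chart; this just echoes the frame.
        return {"id": "race-chart", "frame": int(selected_day)}
    return update_chart


def is_valid_app(app_layout, app_callbacks):
    """Check the two-part contract: a non-empty layout list and a callable binding."""
    return (isinstance(app_layout, list) and len(app_layout) >= 1
            and callable(app_callbacks()))


print(is_valid_app(layout, callbacks))
print(callbacks()("3"))
```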
Tell us one we haven't thought of!
- Unconditional Mean, Volatility, and the FOURIER-GARCH Representation
- Fractional and Fractal Formulations of Gradient Linear and Nonlinear Elasticity
A live demo of this project is running at Coda.to.
Codato is released under either the MIT or the GPL license, depending who you ask. Please contact us for clarification.
You know who you are.
About: It's less about optics or end users, more about collaborative tools.
Team: We are a group of research scientists and data engineers based mainly in New York, but collaborating over Zoom (actually, Slack)
Contact: Email suits us best.