This application is the backend server for the PhotonRanch Datalab. It is a Django application with a REST API for communicating with the Datalab UI.

Requirements:
- Python >= 3.9
- Django >= 4
Start by creating a virtualenv for this project and activating it:
```shell
python -m venv /path/to/my/virtualenv
source /path/to/my/virtualenv/bin/activate
```
Then install the dependencies:
```shell
pip install -e .
```
The project is configured to use a local SQLite database. You can switch it to PostgreSQL if you want, but SQLite is easy for development. Run the migrations to set up the database, then start the server:
```shell
./manage.py migrate
./manage.py runserver
```
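If you do switch to PostgreSQL, the change goes in the `DATABASES` setting of the project's Django settings module. The fragment below is a minimal sketch: the environment variable names and defaults are illustrative assumptions, not settings this project defines, and a PostgreSQL driver such as psycopg2 must be installed.

```python
import os

# Hypothetical replacement for the default SQLite entry in DATABASES.
# DB_NAME, DB_USER, etc. are illustrative environment variable names.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": os.environ.get("DB_NAME", "datalab"),
        "USER": os.environ.get("DB_USER", "postgres"),
        "PASSWORD": os.environ.get("DB_PASSWORD", ""),
        "HOST": os.environ.get("DB_HOST", "localhost"),
        "PORT": os.environ.get("DB_PORT", "5432"),
    }
}
```

After changing the database, run `./manage.py migrate` again so the schema is created in the new database.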
If you want to start with some test data, run this management command once, after running the migrations. It creates two datasessions with some operations, and a user test_user with password test_pass and API token 123456789abcdefg.
```shell
./manage.py populate_test_data
```
The application has a REST API with the following endpoints. You must pass your user's API token in the request header to access any of them; if you are using Python's requests library, the header looks like {'Authorization': 'Token 123456789abcdefg'}.
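As a dependency-free sketch of the token header, the snippet below builds (but does not send) an authenticated request with the standard library's `urllib.request`. It assumes the development server is running at the `runserver` default address and uses the test token from `populate_test_data`; the helper names are hypothetical.

```python
import urllib.request

# Token created by `./manage.py populate_test_data`; use your own user's token.
API_TOKEN = "123456789abcdefg"


def auth_headers(token):
    """Return the Authorization header the API expects on every request."""
    return {"Authorization": f"Token {token}"}


def build_get_request(url, token):
    """Build (but do not send) an authenticated GET request."""
    return urllib.request.Request(url, headers=auth_headers(token), method="GET")


# Assumes `./manage.py runserver` on its default address, 127.0.0.1:8000.
request = build_get_request("http://127.0.0.1:8000/api/datasessions/", API_TOKEN)
# urllib.request.urlopen(request) would actually send it.
```

With requests, the same dict goes in the `headers` argument, e.g. `requests.get(url, headers=auth_headers(API_TOKEN))`.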
Datasessions can take an input_data parameter, which should contain a list of data objects. The current format is described below, but it will probably evolve as we learn more about how it is used.
```python
session_input_data = [
    {
        'type': 'fitsfile',
        'source': 'archive',
        'basename': 'mrc1-sq005mm-20231114-00010332'
    },
    {
        'type': 'fitsfile',
        'source': 'archive',
        'basename': 'mrc1-sq005mm-20231114-00010333'
    },
]
```
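A client can catch malformed session input before sending it. The function below is a hypothetical check derived only from the example format (a list of dicts, each with `type`, `source` and `basename`); the backend's own validation, if any, may differ.

```python
# Keys every data object has in the documented example; an assumption,
# since the format is expected to evolve.
REQUIRED_KEYS = {"type", "source", "basename"}


def validate_session_input_data(input_data):
    """Raise ValueError if input_data does not match the documented shape."""
    if not isinstance(input_data, list):
        raise ValueError("input_data must be a list of data objects")
    for index, item in enumerate(input_data):
        if not isinstance(item, dict):
            raise ValueError(f"item {index} must be a dict")
        missing = REQUIRED_KEYS - set(item)
        if missing:
            raise ValueError(f"item {index} is missing keys: {sorted(missing)}")
    return input_data


session_input_data = [
    {"type": "fitsfile", "source": "archive", "basename": "mrc1-sq005mm-20231114-00010332"},
    {"type": "fitsfile", "source": "archive", "basename": "mrc1-sq005mm-20231114-00010333"},
]
validate_session_input_data(session_input_data)
```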
Data operations can have a varying set of named keys within their input_data, specific to each operation. For example, for an operation that expects a list of files and a threshold value, it would look like this:
```python
operation_input_data = {
    'input_files': [
        {
            'type': 'fitsfile',
            'source': 'archive',
            'basename': 'mrc1-sq005mm-20231114-00010332'
        }
    ],
    'threshold': 255.0
}
```
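Because each operation defines its own keys, one option is to give each operation a typed view of its input_data. The dataclasses below are a hypothetical sketch for the files-plus-threshold example only; the class names are illustrative and not part of the project.

```python
from dataclasses import dataclass


@dataclass
class FileInput:
    """One data object, as in the documented input format."""
    type: str
    source: str
    basename: str


@dataclass
class ThresholdOperationInput:
    """Hypothetical typed view of the example operation's input_data."""
    input_files: list[FileInput]
    threshold: float

    @classmethod
    def from_input_data(cls, input_data):
        # FileInput(**item) raises TypeError on missing or unexpected keys.
        files = [FileInput(**item) for item in input_data["input_files"]]
        return cls(input_files=files, threshold=float(input_data["threshold"]))


operation_input_data = {
    "input_files": [
        {"type": "fitsfile", "source": "archive", "basename": "mrc1-sq005mm-20231114-00010332"}
    ],
    "threshold": 255.0,
}
parsed = ThresholdOperationInput.from_input_data(operation_input_data)
```

This design makes missing keys fail early, at the cost of needing one input class per operation.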
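Create a datasession:

POST /api/datasessions/

```python
post_data = {
    'name': 'My New Session Name',
    'input_data': session_input_data
}
```

List datasessions:

GET /api/datasessions/

Retrieve a single datasession:

GET /api/datasessions/datasession_id/

Delete a datasession:

DELETE /api/datasessions/datasession_id/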
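The datasession endpoints can be wrapped in a small request builder. This is a sketch using `urllib.request`; it assumes the API accepts JSON request bodies (nested input_data cannot be form-encoded), the `runserver` default address, the test token, and a hypothetical session id of 1.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000/api"  # assumes the local runserver default
API_TOKEN = "123456789abcdefg"          # token created by populate_test_data


def datasession_request(method, session_id=None, body=None, token=API_TOKEN):
    """Build (but do not send) a request for one of the datasession endpoints."""
    url = f"{BASE_URL}/datasessions/"
    if session_id is not None:
        url += f"{session_id}/"
    headers = {"Authorization": f"Token {token}"}
    data = None
    if body is not None:
        # Assumption: the API parses JSON bodies.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


session_input_data = [
    {"type": "fitsfile", "source": "archive", "basename": "mrc1-sq005mm-20231114-00010332"},
]
create = datasession_request(
    "POST", body={"name": "My New Session Name", "input_data": session_input_data}
)
listing = datasession_request("GET")
detail = datasession_request("GET", session_id=1)  # hypothetical id
delete = datasession_request("DELETE", session_id=1)
```

Each request is sent with `urllib.request.urlopen(...)`; the response body can then be decoded with `json.load`.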
Available operations are introspected from the data_operations directory and must subclass the BaseDataOperation class. I expect those classes will be fleshed out once we actually start using them.
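One common way to implement this kind of introspection is to import every module in the package and then collect the base class's subclasses. The sketch below is not the project's code: `BaseDataOperation` here is a hypothetical stand-in with an assumed `name` attribute and `operate` method, and `Median` is an illustrative placeholder.

```python
import importlib
import pkgutil
from abc import ABC, abstractmethod


class BaseDataOperation(ABC):
    """Hypothetical stand-in for the project's BaseDataOperation."""

    # Each concrete operation sets the exact name clients pass when adding it.
    name = None

    @abstractmethod
    def operate(self, input_data):
        """Run the operation on its operation input_data."""


class Median(BaseDataOperation):
    name = "Median"

    def operate(self, input_data):
        # Placeholder only: returns the basenames it would combine.
        return [item["basename"] for item in input_data["input_files"]]


def import_operation_modules(package_name):
    """Import every module in a package so its subclasses get defined."""
    package = importlib.import_module(package_name)
    for module_info in pkgutil.iter_modules(package.__path__):
        importlib.import_module(f"{package_name}.{module_info.name}")


def available_operations():
    """Map operation names to classes (direct subclasses only)."""
    return {cls.name: cls for cls in BaseDataOperation.__subclasses__()}
```

In a real project, something like `import_operation_modules("data_operations")` (hypothetical package path) would run first, so that every operation class is defined before `available_operations()` is called.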
List the operations in a datasession:

GET /api/datasessions/datasession_id/operations/

Add an operation to a datasession:

POST /api/datasessions/datasession_id/operations/

```python
post_data = {
    'name': 'Median',  # Must exactly match the name of an available operation
    'input_data': operation_input_data
}
```

Remove an operation from a datasession:

DELETE /api/datasessions/datasession_id/operations/operation_id/
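The nested operation endpoints follow the same pattern as the datasession ones. This sketch builds (without sending) add and delete requests; as before, it assumes JSON bodies, the `runserver` default address and the test token, and the session and operation ids are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000/api"  # assumes the local runserver default
API_TOKEN = "123456789abcdefg"          # token created by populate_test_data


def operations_url(session_id, operation_id=None):
    """Build the URL for a session's operations, or for one operation."""
    url = f"{BASE_URL}/datasessions/{session_id}/operations/"
    if operation_id is not None:
        url += f"{operation_id}/"
    return url


def add_operation_request(session_id, name, input_data, token=API_TOKEN):
    """Build a POST that adds an operation; name must match an available operation."""
    body = json.dumps({"name": name, "input_data": input_data}).encode("utf-8")
    return urllib.request.Request(
        operations_url(session_id),
        data=body,
        headers={"Authorization": f"Token {token}", "Content-Type": "application/json"},
        method="POST",
    )


def delete_operation_request(session_id, operation_id, token=API_TOKEN):
    """Build a DELETE that removes one operation from a session."""
    return urllib.request.Request(
        operations_url(session_id, operation_id),
        headers={"Authorization": f"Token {token}"},
        method="DELETE",
    )


operation_input_data = {
    "input_files": [
        {"type": "fitsfile", "source": "archive", "basename": "mrc1-sq005mm-20231114-00010332"}
    ],
    "threshold": 255.0,
}
add = add_operation_request(1, "Median", operation_input_data)  # hypothetical id
remove = delete_operation_request(1, 2)  # hypothetical ids
```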
TODO:
- Come up with an operation wizard_description format and add an endpoint that returns it for all available operations, so the frontend can auto-create UI wizards for new operations.
- Figure out user accounts between PTR and Datalab: Datalab needs user accounts for permissions, so that users can only access their own sessions.
- Implement operations to actually do something when they are added to a session
- Figure out caching and storage of intermediate results
- Decide between an asynchronous task queue and Temporal for executing operations
- Add operation results/status to the serialized operations output (and to the model as needed)