statisticsnorway / microdata-job-service

M2.0 Service for managing jobs in the microdata platform

microdata-job-service

Service for managing jobs in the microdata platform.

Endpoints

[GET] /jobs

Query for existing jobs.

Example request

curl '<url>/jobs?status=queued&operation=ADD,REMOVE'

Responses
status json
200 [{...}, {...}]
400 {"message": "<Error message>"}
500 {"message": "<Error message>"}
Query Parameters
  • status - filter on job status
  • operation[] - filter on job operation
  • ignoreCompleted - ignore completed jobs True | False
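
As an illustration of how the query parameters above combine, here is a minimal Python sketch that assembles the request URL. The helper name build_jobs_query_url and the base URL are hypothetical; only the parameter names and the comma-separated operation value come from the example request.

```python
from urllib.parse import urlencode


def build_jobs_query_url(base_url, status=None, operations=None, ignore_completed=None):
    """Build the URL for GET /jobs from optional filters (hypothetical helper)."""
    params = {}
    if status is not None:
        params["status"] = status
    if operations:
        # The example request passes several operations as one comma-separated value.
        params["operation"] = ",".join(operations)
    if ignore_completed is not None:
        # The parameter is documented as "True | False".
        params["ignoreCompleted"] = "True" if ignore_completed else "False"
    # Keep commas unescaped so the URL matches the documented example.
    query = urlencode(params, safe=",")
    return f"{base_url}/jobs?{query}" if query else f"{base_url}/jobs"


# "http://localhost:8000" is a placeholder for <url>.
print(build_jobs_query_url("http://localhost:8000", status="queued", operations=["ADD", "REMOVE"]))
```

Building the query string programmatically also avoids the shell treating & as a control operator.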

[GET] /jobs/<jobId>

Get one existing job by job id

Example request

curl <url>/jobs/123

Responses
status json
200 {"message": "OK"}
400 {"message": "<Error message>"}
500 {"message": "<Error message>"}

[POST] /jobs

Post new jobs

Example request

curl -X POST <url>/jobs -d '{"jobs": [{...}, {...}]}'

Responses
status json
200 [{"job_id": "123-123-123", "status": "CREATED", "msg": "OK"}, {"status": "FAILED", "msg": "Missing operation"}]
400 {"message": "<Error message>"}
500 {"message": "<Error message>"}
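
Note that the 200 response above can contain both created and failed jobs, so a client has to inspect each entry. A minimal Python sketch of that check follows; the function name summarize_post_results is hypothetical, and the field names are taken from the example response.

```python
import json


def summarize_post_results(response_body):
    """Split the per-job results of POST /jobs into created ids and failure messages."""
    results = json.loads(response_body)
    created = [r["job_id"] for r in results if r.get("status") == "CREATED"]
    failed = [r.get("msg", "") for r in results if r.get("status") == "FAILED"]
    return created, failed


# Sample body copied from the documented 200 response.
body = (
    '[{"job_id": "123-123-123", "status": "CREATED", "msg": "OK"}, '
    '{"status": "FAILED", "msg": "Missing operation"}]'
)
created, failed = summarize_post_results(body)
print(created)
print(failed)
```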

[PUT] /jobs

Update an existing job with a new description or a change of status.

Example request

curl -X PUT <url>/jobs -d '{"status": "failed", "log": "Unexpected failure"}'

Request Body
Must include either description or status.
  • description - Job description
  • status - Updated job status
  • log - Log message describing the update (optional)
Responses
status json
200 {"message": "OK"}
400 {"message": "<Error message>"}
500 {"message": "<Error message>"}
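
The request body rule can be checked on the client side before sending. The sketch below is a hypothetical helper, not the service's own validation; it assumes "either description or status" means at least one of the two must be present.

```python
import json


def build_job_update(description=None, status=None, log=None):
    """Return the JSON body for the PUT request, enforcing the documented rule."""
    # Assumption: at least one of description or status is required.
    if description is None and status is None:
        raise ValueError("Must include either description or status")
    body = {}
    if description is not None:
        body["description"] = description
    if status is not None:
        body["status"] = status
    if log is not None:
        # log is optional and only sent when provided.
        body["log"] = log
    return json.dumps(body)


print(build_job_update(status="failed", log="Unexpected failure"))
```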

[GET] /importable-datasets

Get information about the importable datasets in the input directory.

Example request

curl -X GET <url>/importable-datasets

Responses
status json
200 [{"datasetName": "MY_DATASET", "hasMetadata": false, "hasData": true}, ...]
400 {"message": "<Error message>"}
500 {"message": "<Error message>"}

[DELETE] /importable-datasets/<dataset_name>

Delete an importable dataset from the input directory.

Example request

curl -X DELETE <url>/importable-datasets/<dataset_name>

Responses
status json
200 {"message": "OK, <dataset_name> deleted"}
400 {"message": "<Error message>"}
500 {"message": "<Error message>"}

[GET] /targets

Get all target objects

Example request

curl -X GET <url>/targets

Responses
status json
200 [...targets]
400 {"message": "<Error message>"}
500 {"message": "<Error message>"}

[GET] /targets/<name>/jobs

Get all jobs for the target with the given name, as well as datastore bump jobs whose datastore updates contain the target name.

Example request

curl -X GET <url>/targets/<name>/jobs


[POST] /maintenance-status

Sets a flag that prevents new jobs from starting. This is typically used when new versions of containers are about to be deployed.

Example request

curl -X POST <url>/maintenance-status -d '{"msg": "We need to upgrade", "pause": 1}'

Responses
status json
200
400 {"message": "<Error message>"}
500 {"message": "<Error message>"}

[GET] /maintenance-status

Retrieves the flag that prevents new jobs from starting.

Example request

curl -X GET <url>/maintenance-status

Responses
status json
200
400 {"message": "<Error message>"}
500 {"message": "<Error message>"}

[GET] /maintenance-history

Retrieves a list of all maintenance statuses sorted in reverse chronological order.

Example request

curl -X GET <url>/maintenance-history

Responses
status json
200
400 {"message": "<Error message>"}
500 {"message": "<Error message>"}



Database

This service is bundled with an instance of mongodb run through docker, and uses two collections under the jobdb database:

  • completed: completed jobs
  • inprogress: jobs in progress. This collection has a unique index on the datasetName field. In other words, there can only be one active import job for a given dataset at any given time.

SETUP

When initializing the database, log into the mongodb container and enter the mongo shell:

mongo --username <USER> --password <PASSWORD>

In the mongo shell create an index for the inprogress collection:

> use jobdb
> db.inprogress.createIndex({"datasetName": 1}, {unique: true})
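
The same index could also be created from Python. The sketch below assumes a pymongo-style collection object; the source does not say which driver the service uses, so the pymongo usage is an assumption and the function name is hypothetical.

```python
def ensure_unique_dataset_index(collection):
    """Create the unique index on datasetName for the given collection.

    `collection` is assumed to behave like a pymongo Collection, whose
    create_index method accepts a list of (field, direction) pairs.
    """
    return collection.create_index([("datasetName", 1)], unique=True)


# Usage with pymongo (assumes pymongo is installed and the container is reachable):
# from pymongo import MongoClient
# client = MongoClient("mongodb://localhost:27017", username="<USER>", password="<PASSWORD>")
# ensure_unique_dataset_index(client["jobdb"]["inprogress"])
```

With the index in place, inserting a second document with the same datasetName into inprogress is rejected by MongoDB with a duplicate key error.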

Contribute

Set up

To work on this repository you need to install poetry:

# macOS / linux / BashOnWindows
curl -sSL https://install.python-poetry.org | python3 -

# Windows powershell
(Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | py -

Then install the virtual environment from the root directory:

poetry install

Intellij IDEA

Add Python Interpreter "Poetry Environment".

Running unit tests

Open a terminal, go to the root directory of the project, and run:

poetry run pytest --cov=job_service/

Running locally

If you want to test the service completely in your local environment:

  • Run docker compose up in test/resources/local to run a mongodb instance in docker
  • Set environment variables on your system:
export MONGODB_URL=mongodb://localhost:27017/jobdb
export MONGODB_USER=USER
export MONGODB_PASSWORD=123
export INPUT_DIR=tests/resources/input_directory
export DOCKER_HOST_NAME=localhost
  • Run application from root directory: poetry run gunicorn job_service.app:app
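
Before starting the application it can help to confirm that the variables listed above are set. The following is a hypothetical pre-flight check, not the service's own configuration code; it assumes all five variables are required.

```python
import os

# Variable names taken from the local setup steps; treating all as required is an assumption.
REQUIRED_VARIABLES = (
    "MONGODB_URL",
    "MONGODB_USER",
    "MONGODB_PASSWORD",
    "INPUT_DIR",
    "DOCKER_HOST_NAME",
)


def load_settings(environ=None):
    """Return the required settings, or raise if any variable is missing or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARIABLES if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARIABLES}


# Example with an explicit mapping instead of the real environment.
example = {
    "MONGODB_URL": "mongodb://localhost:27017/jobdb",
    "MONGODB_USER": "USER",
    "MONGODB_PASSWORD": "123",
    "INPUT_DIR": "tests/resources/input_directory",
    "DOCKER_HOST_NAME": "localhost",
}
print(load_settings(example)["MONGODB_URL"])
```

Calling load_settings() without arguments reads from the real process environment.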

Running on server

Pull the mongo image locally, tag it, and push it to Nexus so that it is available to Jenkins in the secure zone:

docker pull mongo:5
sudo docker tag mongo:5 nexus.ssb.no:8443/raird/mongo:latest
sudo docker push nexus.ssb.no:8443/raird/mongo:latest

Built with

  • Poetry - Python dependency and package management
  • Gunicorn - Python WSGI-server for UNIX
  • Flask - Web framework
  • Pydantic - Data validation

License: MIT License


Languages

Python 97.9%, Dockerfile 2.1%