tskir / validator

opentargets-validator

Evidence string validator.

Purpose

This tool is intended to validate JSON files that have a single JSON object per line. This is the format that is required from the data sources that provide us with evidence for our target-disease associations.

The validator checks each line against the expected structure, which is defined in a JSON schema that must be provided via the --schema argument.

Be aware that this is not a general-purpose JSON validator, and use of "pretty-printed" JSON will cause errors.
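
To make the one-object-per-line requirement concrete, here is a minimal Python 3 sketch of a pre-check that you could run before calling the validator. It is not part of opentargets-validator; the function name check_json_lines is hypothetical and the check only covers the line format, not the schema.

```python
import json


def check_json_lines(lines):
    """Return (line_number, message) pairs for lines that are not one JSON object.

    Hypothetical helper for illustration; it is not the validator's own logic.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        try:
            value = json.loads(line)
        except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
            problems.append((number, str(exc)))
            continue
        if not isinstance(value, dict):
            problems.append((number, "line is valid JSON but not an object"))
    return problems


# One complete object per line: no problems reported.
compact = ['{"a": 1}', '{"b": {"c": 2}}']
print(check_json_lines(compact))

# Pretty-printed JSON spreads one object over several lines,
# so every physical line fails to parse on its own.
pretty = json.dumps({"a": 1}, indent=2).splitlines()
print(len(check_json_lines(pretty)))
```

The second call shows why pretty-printed input fails: each physical line is only a fragment of an object.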

Schema URLs

The Open Targets JSON schema is located at https://github.com/opentargets/json_schema. Do not use master, as it may change at any time; instead, use the latest available tag, e.g. 1.6.3. If you are a data provider, you will always receive an email from Open Targets with information about which JSON schema version to use. When specifying the schema to the validator, you have to use the "raw" GitHub URL:

https://raw.githubusercontent.com/opentargets/json_schema/1.6.3/opentargets.json
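
If you build the schema URL in a script, a small helper along these lines may be useful. This is a sketch in Python 3; the function name schema_url is hypothetical, and the template simply follows the raw URL pattern shown above.

```python
SCHEMA_URL_TEMPLATE = (
    "https://raw.githubusercontent.com/opentargets/json_schema/"
    "{tag_version}/opentargets.json"
)


def schema_url(tag_version):
    """Build the raw GitHub URL for a tagged schema version (hypothetical helper)."""
    if tag_version == "master":
        # master may change at any time, so a pinned tag is required here.
        raise ValueError("use a tagged schema version, not master")
    return SCHEMA_URL_TEMPLATE.format(tag_version=tag_version)


print(schema_url("1.6.3"))
```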

How to install it

The easiest way is with pip:

pip install -U opentargets-validator

It supports both Python 2 and Python 3.

How to use it

You have two options:

  • pass a filename or URL as a positional argument
  • read from stdin (e.g. a shell pipe)

Read from stdin

cat file.json | opentargets_validator --schema https://raw.githubusercontent.com/opentargets/json_schema/{tag_version}/opentargets.json

Read from positional argument

This can automatically decompress gzipped files. Compression is detected from the filename, e.g. one ending with .json.gz.

An example of an acceptable path:

opentargets_validator --schema https://raw.githubusercontent.com/opentargets/json_schema/{tag_version}/opentargets.json https://where/myfile/is/located.json
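
Filename-based gzip detection, as described above, can be sketched in a few lines of Python 3. This illustrates the general technique only; the validator's own implementation may differ, and open_evidence_file is a hypothetical name. It handles local files only, not URLs.

```python
import gzip


def open_evidence_file(path):
    """Open a local file as UTF-8 text, decompressing if the name ends with .gz.

    Hypothetical helper; detection relies on the filename, not the file contents.
    """
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")
```

Because detection relies on the name alone, a gzipped file without the .gz suffix would be read as plain text in this sketch.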

Note

There used to be a --log-lines argument that could be used to exit early when a certain number of errors occurred. This is no longer supported, and with parallelization improvements it is rarely necessary in practice.

Evidence lines are checked for uniqueness by calculating the hash of the unique_association_fields field. This can be done in the validator using the --hash argument.
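
The idea behind the uniqueness check can be sketched as follows in Python 3. This is an illustration under stated assumptions, not the validator's code: the choice of SHA-256, the canonical JSON serialisation, and the names association_hash and find_duplicates are all hypothetical. Only the unique_association_fields field name comes from the description above.

```python
import hashlib
import json


def association_hash(evidence):
    """Hash the unique_association_fields of one evidence object.

    Assumption: keys are sorted so that field order does not change the hash.
    """
    fields = evidence["unique_association_fields"]
    canonical = json.dumps(fields, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_duplicates(lines):
    """Return (first_line, duplicate_line) pairs for repeated hashes."""
    first_seen = {}
    duplicates = []
    for number, line in enumerate(lines, start=1):
        digest = association_hash(json.loads(line))
        if digest in first_seen:
            duplicates.append((first_seen[digest], number))
        else:
            first_seen[digest] = number
    return duplicates


lines = [
    '{"unique_association_fields": {"target": "T1", "disease": "D1"}}',
    '{"unique_association_fields": {"disease": "D1", "target": "T1"}}',
    '{"unique_association_fields": {"target": "T2", "disease": "D1"}}',
]
print(find_duplicates(lines))
```

In this sample the first two lines carry the same fields in a different key order, so the second is reported as a duplicate of the first.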

How to develop

Within a virtualenv you can install with:

pip install -e .[dev]

and you can run the tests with:

pytest --cov=opentargets_validator --cov-report term tests/ --fulltrace

This repository has Travis integration and CodeCov integration.

Releases are put on PyPI automatically via Travis from GitHub tags.

About

License: Apache License 2.0

