build automated testing suite for SmartAPI Registry uptime and contents, API_LIST APIs used, and `testExamples` x-bte annotation

Question

colleenXu opened this issue 4 months ago · comments

[#782 needs to be done first, before working on this]

For many of the x-bte operations I write, I've recorded the info for a small manual test (if I query with this input, do I get this in the response?) in the operation's testExamples field. During the on-site week (rough notes here), we agreed that it'd be useful to process this info, turn it into an automated test suite, and then run that test suite regularly.

It could be useful for regression testing and for alerting us to issues with those APIs or BTE:

- whether the API is available or not (right now we don't do any uptime monitoring for APIs we use through x-bte annotation)
- data / response-structure changes
- issues with BTE's subquerying or post-processing

Example of the annotation and how I manually test using it

This operation for ComplexPortal (an external API) has 1 testExamples entry:

      testExamples:
        - qInput: "GO:0070026"                  ## nitric oxide binding
          oneOutput: "ComplexPortal:CPX-52"     ## iNOS-S100A8/A9 complex

To test this operation, I query ONLY this API through BTE (http://localhost:3000/v1/smartapi/326eb1e437303bee27d3cef29227125d/query). I build the query from the operation's inputs, outputs, predicate, and qInput fields:

{
    "message": {
        "query_graph": {
            "edges": {
                "e01": {
                    "subject": "n0",
                    "object": "n1"
                }
            },
            "nodes": {
                "n0": {
                    "ids": ["GO:0070026"],
                    "categories": ["biolink:MolecularActivity"]
                },
                "n1": {
                    "categories": ["biolink:MacromolecularComplex"]
                }
            }
        }
    }
}
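A test suite could generate this query from the annotation instead of writing it by hand. Below is a minimal sketch of that step. The `OperationInfo` shape and the `buildTestQuery` name are assumptions made for illustration, not BTE's actual types or functions; the category and predicate values are the ones shown in this issue.

```typescript
// Hypothetical shapes for illustration; these are not BTE's real types.
interface TestExample {
  qInput: string;
  oneOutput: string;
}

interface OperationInfo {
  inputCategory: string; // e.g. "biolink:MolecularActivity"
  outputCategory: string; // e.g. "biolink:MacromolecularComplex"
  predicate: string; // kept for checking the response, not sent in this query
  testExamples: TestExample[];
}

interface QNode {
  ids?: string[];
  categories: string[];
}

interface TrapiQuery {
  message: {
    query_graph: {
      edges: Record<string, { subject: string; object: string }>;
      nodes: Record<string, QNode>;
    };
  };
}

// Build a one-hop query whose subject node carries the given input IDs.
function buildTestQuery(op: OperationInfo, qInputs: string[]): TrapiQuery {
  return {
    message: {
      query_graph: {
        edges: { e01: { subject: "n0", object: "n1" } },
        nodes: {
          n0: { ids: qInputs, categories: [op.inputCategory] },
          n1: { categories: [op.outputCategory] },
        },
      },
    },
  };
}

// The ComplexPortal operation from this issue, written in the assumed shape.
const complexPortalOp: OperationInfo = {
  inputCategory: "biolink:MolecularActivity",
  outputCategory: "biolink:MacromolecularComplex",
  predicate: "biolink:enabled_by",
  testExamples: [{ qInput: "GO:0070026", oneOutput: "ComplexPortal:CPX-52" }],
};

const complexPortalQuery = buildTestQuery(
  complexPortalOp,
  complexPortalOp.testExamples.map((ex) => ex.qInput),
);
console.log(JSON.stringify(complexPortalQuery, null, 4));
```

Passing the input IDs as a list keeps the same function usable for operations with several testExamples entries.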

In the response, I expect to see this edge connecting the qInput (GO:0070026) to the oneOutput (ComplexPortal:CPX-52):

                "a1fdb08982935506514b277e69c3e41c": {
                    "predicate": "biolink:enabled_by",
                    "subject": "GO:0070026",
                    "object": "ComplexPortal:CPX-52",
                    "attributes": [],
                    "sources": [
                        {
                            "resource_id": "infores:complex-portal",
                            "resource_role": "primary_knowledge_source"
                        },
                        {
                            "resource_id": "infores:service-provider-trapi",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:complex-portal"
                            ]
                        }
                    ]
                }
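The check itself could be automated along these lines. This is only a sketch: `findExpectedEdge` is a hypothetical helper, and it assumes the edges are read from `message.knowledge_graph.edges` of the TRAPI response and that the qInput appears as the edge's subject, as in the excerpt above.

```typescript
// Only the fields needed for the check; real edges carry more (attributes, sources).
interface KGEdge {
  predicate: string;
  subject: string;
  object: string;
}

// Return the ID of the first edge linking qInput to oneOutput with the
// expected predicate, or undefined if the response has no such edge.
function findExpectedEdge(
  edges: Record<string, KGEdge>,
  qInput: string,
  oneOutput: string,
  predicate: string,
): string | undefined {
  for (const edgeId of Object.keys(edges)) {
    const edge = edges[edgeId];
    if (
      edge !== undefined &&
      edge.subject === qInput &&
      edge.object === oneOutput &&
      edge.predicate === predicate
    ) {
      return edgeId;
    }
  }
  return undefined;
}

// The edge from the excerpt above, reduced to the fields being checked.
const responseEdges: Record<string, KGEdge> = {
  a1fdb08982935506514b277e69c3e41c: {
    predicate: "biolink:enabled_by",
    subject: "GO:0070026",
    object: "ComplexPortal:CPX-52",
  },
};

const matchedEdgeId = findExpectedEdge(
  responseEdges,
  "GO:0070026",
  "ComplexPortal:CPX-52",
  "biolink:enabled_by",
);
console.log(matchedEdgeId !== undefined ? "testExample passed" : "testExample failed");
```

A fuller version would probably also accept the reversed edge direction and equivalent IDs for the same node.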

Noting complex situations

- Some operations have qualifier sets. We'd want to turn those into qualifier constraints for the test queries (ex: BioThings DGIdb).
- Some operations have multiple testExamples elements. These could be tested one-by-one or together (ex: v3 Monarch operation).
  - In my manual tests, I test them together: I run both qInputs in 1 test query and check that each has an edge to its corresponding oneOutput. I do that to test that BTE is handling multiple input IDs correctly.
- I wrote operations using the IDs found in the API, which may not be the primary node IDs used by Translator. That'll make finding the matching edge in the response more complicated (ex: the v3 Monarch operation uses HGNC Gene IDs, but BTE will end up using NCBIGene IDs, as noted in the comments).
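A rough sketch of how the suite might handle these cases follows. It makes several assumptions: that an operation's qualifiers are annotated as a flat map of qualifier type to value, that the annotated values can be sent as-is, and that some ID-resolution step supplies the `aliases` map. Both function names are hypothetical, and the `EXAMPLE:` IDs in the usage are placeholders rather than real test data.

```typescript
interface TestExample {
  qInput: string;
  oneOutput: string;
}

interface KGEdge {
  subject: string;
  object: string;
  predicate: string;
}

interface Qualifier {
  qualifier_type_id: string;
  qualifier_value: string;
}

// Turn a flat qualifier map (assumed annotation format) into a TRAPI
// qualifier_constraints array containing a single qualifier set.
function toQualifierConstraints(
  qualifiers: Record<string, string>,
): { qualifier_set: Qualifier[] }[] {
  const qualifier_set = Object.keys(qualifiers).map((type) => ({
    qualifier_type_id: `biolink:${type}`,
    qualifier_value: qualifiers[type] ?? "",
  }));
  return [{ qualifier_set }];
}

// Check every testExample against the edges of ONE batched response and
// return the examples that have no matching edge. `aliases` maps an ID as
// written in the annotation to the IDs BTE may report for the same node.
function missingExamples(
  examples: TestExample[],
  edges: KGEdge[],
  aliases: Record<string, string[]> = {},
): TestExample[] {
  const sameNode = (annotated: string, found: string): boolean =>
    annotated === found || (aliases[annotated] ?? []).indexOf(found) !== -1;
  return examples.filter(
    (ex) =>
      !edges.some(
        (edge) => sameNode(ex.qInput, edge.subject) && sameNode(ex.oneOutput, edge.object),
      ),
  );
}

// Usage with placeholder data: the annotation uses an HGNC ID, while the
// response reports the equivalent NCBIGene ID.
const stillMissing = missingExamples(
  [{ qInput: "HGNC:1100", oneOutput: "EXAMPLE:1" }],
  [{ subject: "NCBIGene:672", object: "EXAMPLE:1", predicate: "biolink:related_to" }],
  { "HGNC:1100": ["NCBIGene:672"] },
);
console.log(stillMissing.length === 0 ? "all examples matched" : "some examples missing");
```

Returning the unmatched examples, rather than a single boolean, lets a batched test report exactly which qInput lost its edge.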

In prep for this, I've uncommented all of the testExamples info that was in the x-bte annotations for Service-Provider-only APIs (main commit, adjustment, monarch v3).

The APIs with x-bte annotation but lacking testExamples info right now are:

- BioThings SEMMEDDB: its annotation is auto-generated. It may be possible to generate this info...
- Text-Mining Targeted + 4 Multiomics APIs (Wellness, EHR risk, biggim drug response, clinicaltrials): some info is in comments, but it's old, commented-out, and not in the current format. We'd want their cooperation to get the testExamples info written.

Jackson Callaghan · Answer 1 · Wed Feb 28 2024 01:48:52 GMT+0800 (China Standard Time)

Note that we intend to implement this as a nightly cron job, similar to the one that updates the SmartAPI specs. The procedure would be as follows (with failures at any point being reported to Sentry):

- Test that SmartAPI is available
- Update SmartAPI specs
- For each API enabled in the API_LIST:
  - Ensure that there is a SmartAPI spec for it
  - Test if the API is reachable
  - Run each testExample, checking that the response satisfies expectations
Additionally, these tests should be run against the instance-appropriate servers (where applicable), and we would want the ability to use a pnpm script locally to test against a specified instance level/API.

Much of the logic to run these tests can be obtained from the existing BTE structure (smartapi-kg, etc).