Martha
Google Cloud Functions for resolving DOS URIs.
Martha v1
Removed as of March 2020. Please use Martha v2.
Martha v2
To call martha_v2, perform an HTTP POST to the appropriate URL. The content-type of your request should be either application/json or application/x-www-form-urlencoded, with the content/body of your request encoded accordingly.
The body of the request must be a JSON Object with one value: a DOS URL. You may also specify an Authorization header on the request with a valid OAuth bearer token. Martha uses the DOS URL to retrieve a data object, unpacks it, and returns a JSON Object containing one or two values: the list of URIs where the underlying resource may be accessed, and (optionally) the private key information for the Google Service Account that you may use to access the underlying resource. The Google Service Account information will only be included in the response if you provided an Authorization header on your request.
Staging: https://us-central1-broad-dsde-staging.cloudfunctions.net/martha_v2
Production: https://us-central1-broad-dsde-prod.cloudfunctions.net/martha_v2
Martha v3
To call martha_v3, perform an HTTP POST to the appropriate URL. The content-type of your request should be either application/json or application/x-www-form-urlencoded, with the content/body of your request encoded accordingly.
The body of the request must be a JSON object with at least one value: a DOS or DRS URL. You must also specify an Authorization header on the request with a valid OAuth bearer token. Martha uses the URL to retrieve a data object, unpacks it, and returns a standard JSON Object containing the object metadata and (optionally) the private key information for the Google Service Account that you may use to access the underlying resource. The Google Service Account information will only be included in the response if the URL should return the service account from the account linking service Bond.
Staging: https://us-central1-broad-dsde-staging.cloudfunctions.net/martha_v3
Production: https://us-central1-broad-dsde-prod.cloudfunctions.net/martha_v3
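As a rough illustration of the call described above, the following sketch assembles the pieces of a martha_v3 request in Node. The helper name buildMarthaV3Request and the placeholder DRS URL are hypothetical, and the "url" body key is taken from the curl examples in this README. The sketch only builds the request; sending it is left to whichever HTTP client you prefer.

```javascript
// Endpoints copied from this README.
const MARTHA_V3_URLS = {
  staging: 'https://us-central1-broad-dsde-staging.cloudfunctions.net/martha_v3',
  production: 'https://us-central1-broad-dsde-prod.cloudfunctions.net/martha_v3',
};

// Hypothetical helper: returns the endpoint and request options for a
// martha_v3 POST. It does not send anything over the network.
function buildMarthaV3Request(environment, dataObjectUrl, bearerToken) {
  const endpoint = MARTHA_V3_URLS[environment];
  if (!endpoint) {
    throw new Error(`Unknown environment: ${environment}`);
  }
  if (!bearerToken) {
    // martha_v3 requires an Authorization header with a bearer token.
    throw new Error('A bearer token is required for martha_v3');
  }
  return {
    endpoint,
    options: {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${bearerToken}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ url: dataObjectUrl }),
    },
  };
}

// Placeholder values for illustration only.
const request = buildMarthaV3Request('staging', 'drs://example.org/some-object-id', '<token>');
console.log(request.endpoint);
console.log(request.options.body);
```

The options object follows the shape used by fetch-style clients, so on a runtime with a global fetch it could be passed as fetch(request.endpoint, request.options).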
A martha_v3 response is an object with these properties:
contentType: string [resolver sometimes returns null],
size: int [resolver sometimes returns null],
timeCreated: string [the time created formatted using ISO 8601, resolver sometimes returns null],
timeUpdated: string [the time updated formatted using ISO 8601, resolver sometimes returns null],
bucket: string [resolver sometimes returns null],
name: string [resolver sometimes returns null],
gsUri: string [resolver sometimes returns null],
googleServiceAccount: object [null unless the DOS URL belongs to a Bond-supported host],
fileName: string [resolver sometimes returns null],
accessUrl: object [resolver sometimes returns null],
hashes: object [contains the hashes type and their checksum value; if unknown, it returns null]
Example response for /martha_v3:
{
  "contentType": "application/octet-stream",
  "size": 156018255,
  "timeCreated": "2020-04-27T15:56:09.696Z",
  "timeUpdated": "2020-04-27T15:56:09.696Z",
  "bucket": "my-bucket",
  "name": "dd3c716a-852f-4d74-9073-9920e835ec8a/f3b148ac-1802-4acc-a0b9-610ea266fb61",
  "gsUri": "gs://my-bucket/dd3c716a-852f-4d74-9073-9920e835ec8a/f3b148ac-1802-4acc-a0b9-610ea266fb61",
  "googleServiceAccount": null,
  "fileName": "hello.txt",
  "accessUrl": {
    "url": "https://storage.example.com/f3b148ac-1802-4acc-a0b9-610ea266fb61?sig=ABC",
    "headers": {
      "Authorization": "Basic Z2E0Z2g6ZHJz"
    }
  },
  "hashes": {
    "md5": "336ea55913bc261b72875bd259753046",
    "sha256": "f76877f8e86ec3932fd2ae04239fbabb8c90199dab0019ae55fa42b31c314c44",
    "crc32c": "8a366443"
  }
}
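Because most of these properties may be null, client code usually needs to guard each access. The sketch below shows one way to do that; the helper name summarizeMarthaResponse, the preference for accessUrl over gsUri, and the hash ordering are illustrative assumptions rather than anything Martha prescribes.

```javascript
// Hypothetical helper: condenses a martha_v3 response while tolerating
// null (or missing) properties.
function summarizeMarthaResponse(response) {
  const hashes = response.hashes || {};
  // Assumed preference order; pick whichever hash is present first.
  const hashType = ['sha256', 'md5', 'crc32c'].find((type) => hashes[type]);
  return {
    // Assumption: prefer the access URL when present, else fall back to gsUri.
    location: response.accessUrl ? response.accessUrl.url : (response.gsUri || null),
    hasServiceAccount: response.googleServiceAccount != null,
    size: typeof response.size === 'number' ? response.size : null,
    checksum: hashType ? { type: hashType, value: hashes[hashType] } : null,
  };
}

// A trimmed version of the example response, with accessUrl set to null.
const summary = summarizeMarthaResponse({
  size: 156018255,
  gsUri: 'gs://my-bucket/dd3c716a-852f-4d74-9073-9920e835ec8a/f3b148ac-1802-4acc-a0b9-610ea266fb61',
  googleServiceAccount: null,
  accessUrl: null,
  hashes: { md5: '336ea55913bc261b72875bd259753046', crc32c: '8a366443' },
});
console.log(summary);
```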
The fields are:
- gsUri: The full Google Cloud Storage URI/URL/path to the blob storing the data
- bucket: The bucket name part of the gsUri
- name: The object name part of the gsUri
- fileName: The file name for the bytes
- contentType: The type of data stored in the bytes
- size: The size of the bytes
- accessUrl: The url and optional headers to fetch the bytes
- hashes: The various hash types and values for the bytes
- timeCreated: The time of creation for the bytes
- timeUpdated: The time of last update for the bytes
- googleServiceAccount: An optional service account that should be used to access the gsUri
- bondProvider: An optional Bond provider that may be used to retrieve credentials to access the bytes
The body of the request JSON object may also contain a key named fields with a value of an array of strings. The response will only contain the fields listed in the array. The array should only contain field names from the above list.
Example request to return the default fields:
curl \
localhost:8010/martha_v3 \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: application/json' \
--data '{"url": "dos://foo/bar"}'
Example request to return only hashes, size, and bondProvider:
curl \
localhost:8010/martha_v3 \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: application/json' \
--data '{"url": "dos://foo/bar", "fields": ["hashes", "size", "bondProvider"]}'
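Since the fields array should only contain documented field names, a client may want to check it before sending. The following is a minimal sketch of such a check; the helper name buildRequestBody is hypothetical, and the validation happens purely on the client side (this README does not say how Martha itself treats unknown names).

```javascript
// Field names documented in this README for martha_v3 responses.
const MARTHA_V3_FIELDS = [
  'gsUri', 'bucket', 'name', 'fileName', 'contentType', 'size', 'accessUrl',
  'hashes', 'timeCreated', 'timeUpdated', 'googleServiceAccount', 'bondProvider',
];

// Hypothetical helper: builds the request body and rejects field names
// that are not in the documented list.
function buildRequestBody(url, fields) {
  if (fields === undefined) {
    // No "fields" key: the request asks for the default fields.
    return { url };
  }
  const unknown = fields.filter((field) => !MARTHA_V3_FIELDS.includes(field));
  if (unknown.length > 0) {
    throw new Error(`Unknown martha_v3 fields: ${unknown.join(', ')}`);
  }
  return { url, fields };
}

// Mirrors the second curl example.
console.log(JSON.stringify(buildRequestBody('dos://foo/bar', ['hashes', 'size', 'bondProvider'])));
```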
NOTE:
An early recommendation instructed users to convert their URL schemes from "dos" to "drs". Some underlying servers hosting the DOS/DRS metadata have not yet upgraded to support the DRS request path-prefix and DRS response JSON metadata, so martha_v2 and martha_v3 still communicate with those servers using the older request/response format.
At the same time, those server hosts are also working to submit test accounts for automated testing purposes. The list of supported martha_v3 servers is still being finalized while those test accounts are being created.
Martha's martha_v3 implementation translates requests-to and responses-from the following hosts:
- β Jade Data Repo (JDR)
  - Prod host: data.terra.bio
  - Dev host: jade.datarepo-dev.broadinstitute.org
  - Martha testing: Continuous Automated
  - Returns Bond SA: No
  - Requires OAuth for metadata: Yes
  - Example: drs://jade.datarepo-dev.broadinstitute.org/v1_0c86170e-312d-4b39-a0a4-2a2bfaa24c7a_c0e40912-8b14-43f6-9a2f-b278144d0060
- β DataGuids.org (any drs://dg.* other than drs://dg.4503, drs://dg.712C, drs://dg.ANV0, drs://dg.4DFC, drs://dg.F82A1A, and not drs://dataguids.org)
  - Prod host: gen3.biodatacatalyst.nhlbi.nih.gov
  - Dev host: staging.gen3.biodatacatalyst.nhlbi.nih.gov
  - Martha testing: Manual (in production)
  - Returns Bond SA: Yes, Bond provider dcf-fence
  - Requires OAuth for metadata: No
  - Example: unknown
- β DataGuids.org (drs://dg.4503 in prod and drs://dg.712C in non-prod)
  - Prod host: gen3.biodatacatalyst.nhlbi.nih.gov
  - Dev host: staging.gen3.biodatacatalyst.nhlbi.nih.gov
  - Martha testing: Manual
  - Returns Bond SA: Yes, Bond provider fence
  - Requires OAuth for metadata: No
  - Example: drs://dg.712C/fa640b0e-9779-452f-99a6-16d833d15bd0
- β The Analysis, Visualization and Informatics Lab-space (The AnVIL, dg.ANV0)
  - Prod host: gen3.theanvil.io
  - Dev host: staging.theanvil.io
  - Martha testing: Mock only
  - Returns Bond SA: Yes, Bond provider anvil
  - Requires OAuth for metadata: No
  - Example: drs://dg.ANV0/00008531-03d7-418c-b3d3-b7b22b5381a0
- β DataGuids.org (drs://dataguids.org, but not drs://dg.*)
  - Prod host: dataguids.org
  - Dev host: unknown
  - Martha testing: Mock only
  - Returns Bond SA: Yes, Bond provider dcf-fence
  - Requires OAuth for metadata: No
  - Example: dos://dataguids.org/a41b0c4f-ebfb-4277-a941-507340dea85d
- β UCSC Single Cell Dev Server
  - Prod host: unknown
  - Dev host: drs.dev.singlecell.gi.ucsc.edu
  - Martha testing: Mock only
  - Returns Bond SA: Yes, Bond provider dcf-fence
  - Requires OAuth for metadata: Yes
  - Example: drs://drs.dev.singlecell.gi.ucsc.edu/bee7a822-ea28-4374-8e18-8b9941392723?version=2019-05-15T205839.080730Z
- β Gabriella Miller Kids First Pediatric Data Resource (drs://dg.F82A1A)
  - Prod host: data.kidsfirstdrc.org
  - Dev host: gen3staging.kidsfirstdrc.org
  - Martha testing: Mock only
  - Returns Bond SA: Yes, Bond provider kids-first
  - Requires OAuth for metadata: No
  - Example: drs://data.kidsfirstdrc.org/ed6be7ab-068e-46c8-824a-f39cfbb885cc
- β Cancer Research Data Commons (CRDC, drs://dg.4DFC)
  - Prod host: nci-crdc.datacommons.io
  - Dev host: nci-crdc-staging.datacommons.io
  - Martha testing: Mock only
  - Returns Bond SA: Yes, Bond provider dcf-fence
  - Requires OAuth for metadata: No
  - Example: drs://nci-crdc.datacommons.io/0027045b-9ed6-45af-a68e-f55037b5184c
β = Hosts that either a) don't support DRS v1.0, or b) haven't been tested with Martha's `martha_v3` endpoint
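The relationship between compact identifier prefixes and prod hosts in the list above can be captured in a small lookup table. The sketch below is only an illustration of that mapping: the function names are hypothetical, the table covers just the prefixes that the list names explicitly, and it is not Martha's actual resolution logic.

```javascript
// Compact identifier prefixes and prod hosts, copied from the list above.
// Keys are upper-cased so that lookups are case-insensitive.
const PROD_HOST_BY_PREFIX = {
  'DG.4503': 'gen3.biodatacatalyst.nhlbi.nih.gov',
  'DG.ANV0': 'gen3.theanvil.io',
  'DG.F82A1A': 'data.kidsfirstdrc.org',
  'DG.4DFC': 'nci-crdc.datacommons.io',
};

// Hypothetical helper: splits a dos:// or drs:// URI into its parts.
function parseDataObjectUri(uri) {
  const match = /^(dos|drs):\/\/([^/?#]+)\/([^?#]+)/i.exec(uri);
  if (!match) {
    throw new Error(`Not a DOS/DRS URI: ${uri}`);
  }
  const [, scheme, authority, objectId] = match;
  return { scheme: scheme.toLowerCase(), authority, objectId };
}

// Hypothetical helper: returns the prod host for a URI, or null when the
// URI uses a dg.* prefix that this table does not cover.
function lookupProdHost(uri) {
  const { authority } = parseDataObjectUri(uri);
  if (!authority.toLowerCase().startsWith('dg.')) {
    return authority; // Hostname-based URI: the authority is already a host.
  }
  return PROD_HOST_BY_PREFIX[authority.toUpperCase()] || null;
}

console.log(lookupProdHost('drs://dg.ANV0/00008531-03d7-418c-b3d3-b7b22b5381a0'));
```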
Other DRS servers might work with Martha's martha_v3
endpoint, however only the servers above are officially
supported. For more information see these documents:
- Mapping Data GUIDs to DRS Server Hostnames
- DRS 1.1 Transition within NCPI
- Getting Through the DRS 1.1 Compact Identifier Transition for Gen3/Terra
If you have an additional server you'd like to add to Martha, please store the test credentials in Vault and submit a PR with both the integration test and updated documentation. If you do not have direct access to Vault, please contact us via Jira to have your test credentials stored. NOTE: You will need to create a free account to access the Jira board.
File Summary v1
The file summary service will return metadata and a signed download URL good for one hour (in the case of a DOS URI, only if the caller is linked in Bond).
It expects the following:
- an Authorization header containing a bearer token
- Content-Type: application/json
- an object containing the key uri
It will always return an object with the same properties:
contentType: string,
size: int,
timeCreated: string by design [resolver sometimes returns null],
updated: string by design [resolver sometimes returns null],
md5Hash: string,
bucket: string,
name: string,
gsUri: string,
googleServiceAccount: string [always null],
signedUrl: string [absent for dos when caller is not linked in Bond]
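The request and response handling for the file summary service could be sketched as follows. Both helper names are hypothetical, and the endpoint and HTTP method are deliberately left out because this section does not state them; the sketch only covers the headers, the body, and the possibly absent signedUrl.

```javascript
// Hypothetical helper: builds the headers and JSON body the file summary
// service expects. The endpoint and HTTP method are not covered here.
function buildFileSummaryRequest(uri, bearerToken) {
  return {
    headers: {
      'Authorization': `Bearer ${bearerToken}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ uri }),
  };
}

// Hypothetical helper: returns the signed URL, or null when the response
// has none (for example, a DOS URI whose caller is not linked in Bond).
function getSignedUrlOrNull(summary) {
  return typeof summary.signedUrl === 'string' ? summary.signedUrl : null;
}

const summaryRequest = buildFileSummaryRequest('dos://foo/bar', '<token>');
console.log(summaryRequest.body);
```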
Get Signed Url
Requires a bearer token in the authorization header.
Expects JSON with the keys bucket, object, and optionally dataObjectUri.
Returns JSON with the key url.
If present, dataObjectUri is used to determine a provider for Bond. Otherwise, a standard pet service account from Sam is used.
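A minimal sketch of assembling the JSON body for this call is shown below. The helper name buildSignedUrlBody is hypothetical; the key names come from the description above.

```javascript
// Hypothetical helper: builds the JSON body for a signed URL request.
function buildSignedUrlBody(bucket, object, dataObjectUri) {
  if (!bucket || !object) {
    throw new Error('Both "bucket" and "object" are required');
  }
  const body = { bucket, object };
  if (dataObjectUri) {
    // Optional: when present, it is used to determine a provider for Bond.
    body.dataObjectUri = dataObjectUri;
  }
  return body;
}

// Without dataObjectUri, a standard pet service account from Sam is used.
console.log(JSON.stringify(buildSignedUrlBody('my-bucket', 'path/to/hello.txt')));
```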
Development
Setup
- Install Node 16, the current LTS
- Clone the Martha git repository and cd to it
- Make sure your version of npm is up-to-date:
npm install -g npm
- Install dependencies:
npm install
ESLint
ESLint is a tool for identifying and reporting on patterns found in ECMAScript/JavaScript code, with the goal of making code more consistent and avoiding bugs. More information can be found on its website.
Installation and Usage
Prerequisites: Node.js (>=16.x) built with SSL support
- Install ESLint using npm or yarn:
npm install eslint --save-dev
or
yarn add eslint --dev
- Set up your own configuration file using
npx eslint --init
(and follow the prompts), or use the .eslintrc.js file found at the root of this project
- Run ESLint on any specific file or directory:
npx eslint <file_name or directory_name>
To run ESLint from the root of the project, use
npx eslint .
Fix Automatically
Many problems ESLint finds can be fixed automatically. When ESLint is run on a file or directory, it reports at the end how many errors or warnings can be fixed automatically. Use the --fix option on the command line to apply those fixes:
npx eslint <file_name/directory_name> --fix
Google Cloud Functions (GCF) Framework
- The Martha functions may be run locally via the functions-framework, started with the following command:
npm start
- From another terminal, test the function:
curl \
localhost:8010/martha_v3 \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: application/json' \
--data '{"url": "dos://foo/bar"}'
- To stop the functions-framework, press Control-C in the terminal running npm start.
- Google application credentials are required to test any DRS URI that uses passports and mTLS. Download a service account key and set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path to that key. The service account must have the role secretmanager.secretAccessor on the Google project broad-dsde-dev.
Run Tests
npm test
Run Integration Tests
Prerequisites:
- Access to Vault for retrieving integration test credentials
- A checkout of Bond to run Bond locally on 127.0.0.1:8080
- Python virtual environments to run parts of Bond in Python 2 and Python 3:
Setup:
- From your martha directory render the credentials for Martha's integration tests
docker \
run \
--rm \
--volume "$PWD:$PWD" \
--env INPUT_PATH="$PWD/automation" \
--env OUT_PATH="$PWD/automation" \
--env ENVIRONMENT=dev \
--env VAULT_TOKEN="$(cat ~/.vault-token)" \
broadinstitute/dsde-toolbox \
render-templates.sh
- Follow the steps referenced in "Bond: Run locally" to start a local Bond server on 127.0.0.1:8080
- Ensure you have rendered the Bond configs
- You will be running two virtual environment sessions for Bond, one with Python 2 and one with Python 3
Running the Integration Tests:
- After finishing your setup, start your martha emulator in a separate terminal
  - Start Martha using
    ENV=mock npm start
    This will start the functions-framework to listen for requests on port 8010.
  - Console logs will print to the terminal
  - Whenever you make changes you will need to kill and restart Martha
  - Stop Martha using Control-C
- In a separate terminal window, run Martha's integration tests via:
ENV=mock npm run integration
Deployment and Releasing
- Deployments to the cromwell-dev tier are triggered manually by running ./deploy-cromwell-dev.sh. The script will build a docker image using your current working directory and current git branch name, and then deploy the resulting code to broad-dsde-cromwell-dev. There you can test out changes before submitting pull requests. broad-dsde-cromwell-dev is an environment administered by the DSP-Batch team, who previously worked primarily on Cromwell development and now also maintain Martha.
- Deployments to the dev tier are triggered automatically whenever code is pushed/merged to the dev branch on GitHub.
- When the latest code passes tests in CircleCI, it is tagged dev_tests_passed_[timestamp], where [timestamp] is the epoch time when the tag was created.
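For scripts that need to work with these tags, the naming scheme can be handled with a pair of small helpers. This is a sketch with hypothetical function names; the unit of the epoch timestamp (seconds or milliseconds) is not stated here, so the helpers treat it as an opaque integer.

```javascript
// Hypothetical helper: builds a tag name from an epoch timestamp.
function makeDevTestsPassedTag(epochTimestamp) {
  return `dev_tests_passed_${epochTimestamp}`;
}

// Hypothetical helper: extracts the timestamp from a tag name, or returns
// null when the tag does not follow the dev_tests_passed_[timestamp] scheme.
function parseDevTestsPassedTag(tag) {
  const match = /^dev_tests_passed_(\d+)$/.exec(tag);
  return match ? Number(match[1]) : null;
}

// The timestamp below is an arbitrary placeholder.
console.log(parseDevTestsPassedTag(makeDevTestsPassedTag(1588003200)));
```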
NOTE:
- Each deployment will redeploy all supported versions of functions.
- It is important that you deploy to all tiers. Because Martha is an "indie service", we should strive to make sure that all tiers other than cromwell-dev and dev are kept in sync and are running the same versions of code. This is essential so that, as other DSP services are tested during their release process, they can ensure that their code will work properly with the latest version of Martha running in prod.
Docker
The Dockerfile for Martha builds a Docker image that, when run, does the following:
- Starts the Google Cloud Functions Framework
- Serves all supported Martha functions via the functions-framework
- Exposes port 8010 (the port previously used by the functions-emulator)
- Handles HTTP requests to functions served over the exposed port
Run the Docker Container
To run the Martha container, whether running a locally built image or an image pulled from a repository, you must start the container with appropriate port mapping between the host and the container. You can choose whatever host port you may require; in the following example port 58010 is used:
docker run --publish 58010:8010 us.gcr.io/broad-dsp-gcr-public/martha:latest
Building Docker Images
Public images are published to Google Container Registry (GCR) for each branch.
To list images run:
gcloud container images list-tags us.gcr.io/broad-dsp-gcr-public/martha
To build a new Docker image for Martha:
- cd to the root of the Martha codebase
- Run:
docker build -f docker/Dockerfile .
Logs (for live app)
- Can be viewed on Google Cloud Platform
- Go to console.cloud.google.com
- Select Cloud Functions from the main (on the left side) menu
- Find the version of the function you want to check
- Click the vertical three dots and choose "view logs"