This repository holds the docker and TDS configuration files related to the Unidata TDS instance in EC2 used to test accessing data stored in S3 directly. In this case, we have two flavors of the Multiscale Ultrahigh Resolution (MUR) L4 daily analysis datasets:
The testing period covers the entirety of 2018 and 2019 for both flavors of MUR.
The Unidata S3 bucket is laid out in the following way.
Two top level prefixes:
MUR/
(0.01 degree, a.k.a. 1 km) *MUR25/
(0.25 degree)
Under the MUR/
prefix, objects are named as follows:
- yyyy_MM_dd_090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc
As an example, the full key to the MUR 1km analysis from 2019-02-13 would be MUR/2018_02_13_090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc
Similarly, under the MUR25/
prefix, objects are named as follows:
* yyyy_MM_dd_090000-JPL-L4_GHRSST-SSTfnd-MUR25-GLOB-v02.0-fv04.2.nc
As an example, the full key to the MUR 0.25 degree analysis from 2019-02-13 would be MUR25/2019_02_13_090000-JPL-L4_GHRSST-SSTfnd-MUR25-GLOB-v02.0-fv04.2.nc
.
These object names are a little different than the names of the files as obtained from JPL.
For example, the 2019-02-13 MUR 1km file from JPL is named 20190213090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc
I chose to rename the objects based on how I see designing the TDS S3 datasetScan
capability.
I can see providing the datasetScan
element a bucket, a prefix (e.g. MUR/
) and a delimiter (_
for my object names).
This will allow the user who browses the HTML catalogs the ability to "click through" by year, month, and date.
This will reduce the time needed by the TDS to generate a catalog at a given level, and will help keep the number of entries shown in any single catalog manageable (the maximum number of links you would see on any one html page would be 31, as opposed to all 1460 objects).
-
Two shell scripts to control
docker-compose
:- tds-start.sh (start the TDS)
- tds-stop.sh (stop the TDS)
Note that the underlying container is removed when the TDS is stopped.
-
docker-compose
configurationThe scripts above assume that the main configuration file used by docker-compose to setup the TDS container is named
docker-compose.yml
.docker-compose.yml.example
is an example of what that looks like on the Unidata system. This example config refers tocompose.env
, which holds certain container configuration related environmental variables.compose.env.example
is an example of what is used on the Unidata EC2 instance. The example config also refers to a few other files, namely:files/creds/aws_creds
files/creds/tomcat-users.xml
An example of what those files might look like can be found under
files/
. The root level.gitignore
is setup to ignore anything underfiles/creds/
, so it should be safe to place the files appropriate for your setup in that directory. Depending on how you are managing credentials, theaws_creds
file may not be needed. -
TDS configuration files
The TDS configuration files live under
tds-configs/
. Perhaps the most critical at this point is themur-1km.xml
TDS Configuration Catalog. This is where we tell the TDS where to look for data, what services to expose, what metadata to add, etc. The critical elements related to S3 are thedatasetRoot
elements, and the next-to-last dataset elements. ThedatasetRoot
element essentially provides a top level alias to the S3 bucket (i.e.mur-test
is associated with theunidata-jpl-sandbox
bucket). The firstdatasetRoot
, which is commented out, refers to the name of a profile in an AWS credentials file, which is used to define the AWS region in use as well as the credentials needed to access the S3 resources. It is better to avoid the use of a credentials file when possible, and one way to do that is to attach an IAM Policy role to the EC2 instance running the TDS, which is demonstrated by the uncommenteddatasetRoot
element. The IAM Policy will need to allowGet
andList
access to the bucket hosting the data. For example, the IAM Policy might look like:{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "s3:Get*", "s3:List*" ], "Resource": "*" } ] }
As always, work with your AWS administrator to set up an appropriate IAM Policy for your specific needs that meets all of the requirements for your organization. Finally, the
dataset
element then uses themur-test
datasetRoot name in combination with the key to the granule we with wish to serve (i.e.MUR/2019_01_01_090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc
).The
mur-1km.xml
is also setup to serve a seven day NcML aggregation. Aggregation of S3 objects have some limitations in how they are configured. First, you will need to edit yourthreddsConfig.xml
to enable an experimental option, which tells the netCDF-Java library to use its new builder interface. Next, the aggregation needs to live in a separate NcML file (i.e. the aggregation cannot be done in the catalog). In this case, the seven day aggregation lives at tds-configs/ncml_files/mur-2019-01-01-seven-day-agg-with-cred-profile.ncml (credentials file authorization) or tds-configs/ncml_files/mur-2019-01-01-seven-day-agg-iam.ncml (IAM authentication). Finally, the NcML aggregation must be defined by listing the individual granules (i.e. there isn't an S3 version of<datasetScan>
yet).These restrictions will be removed in future versionsof the TDS, but for now, welcome to the bleeding edge of our S3 capabilities.
One final note: if you are using an IAM policy, you will need to tell the TDS which AWS region your S3 bucket resides. There are two options that do not require the use of a credentials file:
- Set the
AWS_REGION
environmental variable to a valid region code (e.g.export AWS_REGION=us-west-2
- see https://docs.aws.amazon.com/general/latest/gr/rande.html#regional-endpoints for the options). - Set the
aws.region
Java System property on the Java procress running your servlet container (e.g.-Daws.region=-us-west-2
)
- Set the
The Unidata version of aws_creds
has a profile named mur-bucket
, which has the credentials needed to do basic read and list object calls on the bucket.
When you look at the TDS configuration catalogs that use the credentials file to managing S3 access level credentials, you will see things like cdms3://mur-bucket@aws/...
- the profile name is the userinfo
sub-component of the authority
component of the cdms3 URI.