hapi-server / servers

Catalogs of known HAPI servers

HAPI crawler

jbfaden opened this issue

Bob and I were thinking it would be nice to have a way to aggregate the labels for all datasets on HAPI servers, so that clients could use them to locate data. I could write an Autoplot script that goes through the known servers and creates a JSON file containing the dataset identifiers and descriptions. The file would be posted here whenever changes are detected, so that we could see the history and evolution of the servers and clients would have a known place to search for parameters. Bob's use case was DST, which might be "dst" on one server and "DST" on another, and it's not even clear who is hosting it.
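To make the DST use case concrete, here is a minimal sketch of a case-insensitive search over such an aggregated file. The index layout (server URL mapped to a list of `id`/`title` records), the function name `find_datasets`, and the example servers are all assumptions for illustration, not the format of any published file.

```python
def find_datasets(index, term):
    """Return (server, dataset id) pairs whose id or title contains term, ignoring case."""
    needle = term.casefold()
    hits = []
    for server, datasets in index.items():
        for ds in datasets:
            # Search both the identifier and the description; title may be absent.
            haystack = (ds.get("id", "") + " " + (ds.get("title") or "")).casefold()
            if needle in haystack:
                hits.append((server, ds["id"]))
    return hits


# Hypothetical index contents, for illustration only.
example_index = {
    "https://example.org/serverA/hapi": [
        {"id": "dst", "title": "Dst index"},
    ],
    "https://example.org/serverB/hapi": [
        {"id": "DST_HOURLY", "title": "Hourly DST"},
        {"id": "ace_mag", "title": "ACE magnetometer"},
    ],
}
```

Calling `find_datasets(example_index, "dst")` would report the matching datasets on both hypothetical servers, which also answers the "who is hosting it" question.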

This is done; the result is at https://github.com/hapi-server/servers/tree/master/index

I'll have Srikar convert this to a script that only requires Python.

It seems that this could be done in about 10 lines of Python using the Python client. I'll look into this. We should also try to figure out a decent sampleStartDate for datasets that don't have one.
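A rough sketch of what such a Python-only crawler might look like follows. It uses only the standard library rather than the Python client, so it has no dependencies; it assumes each server exposes the standard HAPI `/catalog` endpoint returning a `catalog` list of records with `id` and optional `title`. The names `fetch_json` and `build_index` are hypothetical, and the server list is left to the caller.

```python
import json
from urllib.request import urlopen


def fetch_json(url, timeout=30):
    """Download a URL and parse the body as JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def build_index(servers, fetch=fetch_json):
    """Collect dataset ids and titles from each server's /catalog response.

    Returns (index, failures): index maps server URL to a list of
    {"id", "title"} records; failures maps server URL to an error message.
    """
    index = {}
    failures = {}
    for server in servers:
        try:
            catalog = fetch(server.rstrip("/") + "/catalog")
        except Exception as err:  # one unreachable server should not stop the crawl
            failures[server] = str(err)
            continue
        index[server] = [
            {"id": ds["id"], "title": ds.get("title", "")}
            for ds in catalog.get("catalog", [])
        ]
    return index, failures
```

Writing the result with `json.dump(index, f, indent=2, sort_keys=True)` keeps the output stable between runs, so a commit only appears in the history when a server's catalog actually changes.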

This was done with an Autoplot script that runs weekly and commits the results to https://github.com/hapi-server/servers/index/. The job was broken for the past 6 months but is running again. I've also added dataset time intervals, which are useful for constraining searches.
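As an illustration of how the time intervals can constrain a search, the sketch below keeps only datasets whose interval overlaps a requested range. It assumes each record carries `startDate` and `stopDate` strings as in a HAPI `info` response, and that they use the calendar form (`YYYY-MM-DD`, optionally with a time and a trailing `Z`); day-of-year forms are not handled. The function names are hypothetical.

```python
from datetime import datetime


def parse_hapi_time(text):
    """Parse a calendar-form HAPI time such as 1970-01-01Z or 2000-01-01T00:00:00.000Z."""
    # Assumption: all times are UTC, so the trailing Z can simply be dropped.
    return datetime.fromisoformat(text.rstrip("Z"))


def overlaps(dataset, start, stop):
    """True if the dataset interval [startDate, stopDate) intersects [start, stop)."""
    ds_start = parse_hapi_time(dataset["startDate"])
    ds_stop = parse_hapi_time(dataset["stopDate"])
    return ds_start < parse_hapi_time(stop) and parse_hapi_time(start) < ds_stop


def datasets_covering(datasets, start, stop):
    """Return the ids of datasets that have data somewhere in the requested range."""
    return [ds["id"] for ds in datasets if overlaps(ds, start, stop)]
```

A client looking for DST in January 2001, for example, could discard servers whose DST dataset ended before that month without issuing any data requests.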

There's an issue where the crawl would gum up the CDAWeb server, so crawling CDAWeb is disabled, and I'll start another ticket for that problem.