ilagnev / barnes-tms-extract

Barnes Foundation Collection Website

barnes TMS extract

Scripts to import data from the Barnes eMuseum API into Elasticsearch for use by barnes-collection-www.

Elasticsearch and Kibana v5.4 are running on AWS. Contact Steven Brady for credentials.

For more context on the early decision making behind the system, see the architecture doc. For more information about how the CSV files of image information are created, see the datascience doc.

Data Pipeline

On a nightly basis, the scripts in scripts/update run on the admin server to:

  1. Export TMS data from the eMuseum API.
  2. Create a new, timestamped index in Elasticsearch. The naming pattern for Elasticsearch indices is collection_<timestamp>.
  3. Ingest the TMS data exported from eMuseum into the new Elasticsearch index.
  4. Add color data to all documents in the Elasticsearch index.
  5. Add image secrets to all documents, which allows the client to access images in S3.
  6. Add computer vision data to all documents.
  7. Add tags to all documents.
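
The ordering of these steps can be sketched as a small sequential runner. This is a minimal illustration, not the code in scripts/update: `newIndexName`, `runPipeline`, the shape of the step objects, and the use of a Unix-epoch timestamp in seconds are all assumptions made for the example.

```javascript
// Hypothetical sketch of the nightly run; not the project's actual code.
// Assumption: the <timestamp> suffix is a Unix epoch in seconds.
function newIndexName(date) {
  return `collection_${Math.floor(date.getTime() / 1000)}`;
}

// Runs each step in order against the same index. If a step rejects,
// the loop stops and the later steps never run, which leaves the index
// without the fields those steps would have added.
async function runPipeline(steps, indexName) {
  const completed = [];
  for (const step of steps) {
    await step.run(indexName);
    completed.push(step.name);
  }
  return completed;
}
```

Each entry in `steps` is an object of the form `{ name, run }`, where `run` is an async function that receives the index name. Because tags are added last, a run that stops early produces an index with no tags.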

These scripts rely on a series of CSV files to add image secrets, computer vision data, and so on. Those files must be stored in the directory named by CSV.dataPath in config/base.json. If any of these files are missing, the update scripts will not finish running, and the front-end will treat the collection index as incomplete and ignore it.
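
A quick check before a run can catch missing CSV files early. The sketch below is illustrative only: `findMissingCsvFiles` is a hypothetical helper, and the file names passed to it are placeholders, since this README does not list the actual file names.

```javascript
const fs = require("fs");
const path = require("path");

// Returns the entries of `requiredFiles` that do not exist under `dataPath`.
// `dataPath` stands in for the CSV.dataPath value from config/base.json;
// the caller supplies the file names (placeholders in this example).
function findMissingCsvFiles(dataPath, requiredFiles) {
  return requiredFiles.filter(
    (name) => !fs.existsSync(path.join(dataPath, name))
  );
}
```

An empty array means every listed file is present. A non-empty result names the files to restore before the update scripts are started.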

Using Elasticsearch

The barnes-collection-www application is a good example of reading from the most recent complete Elasticsearch index. It looks for indices named collection_* and selects the index with the latest timestamp that also has content in the tags field on individual objects.
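
The selection rule described above can be sketched as follows. This is not the code from barnes-collection-www. The helper names are hypothetical, the timestamp suffix is assumed to be numeric, and an `exists` query on `tags` is used as one possible way to test for tag content.

```javascript
// Search body that counts documents with at least one non-null value in
// `tags`. Send it to POST <index>/_search and read hits.total.
const TAGGED_DOCS_QUERY = { size: 0, query: { exists: { field: "tags" } } };

// Assumption: the suffix after "collection_" is numeric, so a larger
// number means a newer index.
function indexTimestamp(name) {
  return Number(name.slice("collection_".length));
}

// `indices` is an array of { name, taggedCount }, where taggedCount is the
// hits.total returned by TAGGED_DOCS_QUERY for that index.
function pickLatestCompleteIndex(indices) {
  const complete = indices
    .filter((i) => i.name.startsWith("collection_") && i.taggedCount > 0)
    .sort((a, b) => indexTimestamp(b.name) - indexTimestamp(a.name));
  return complete.length > 0 ? complete[0].name : null;
}
```

A newer index with no tagged documents is skipped in favor of an older one that has them, which matches the behavior of ignoring incomplete indices.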

You can get a sorted list of the collection indices by running the following in Kibana Dev Tools:

GET _cat/indices/collection_*?v&s=index

The most recent index will be at the bottom of the list. It should contain around 2265 documents.
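
The same check can be scripted by adding `format=json` to the cat request, which returns one object per index with string fields such as `index` and `docs.count`. The sketch below assumes the rows have already been fetched and parsed. `checkNewestIndex` and the tolerance of 50 documents are assumptions for illustration.

```javascript
// `rows` is the parsed JSON from:
//   GET _cat/indices/collection_*?format=json&s=index
function checkNewestIndex(rows, expectedCount = 2265, tolerance = 50) {
  if (rows.length === 0) return null;
  // Rows are sorted ascending by index name, so the newest is last.
  const newest = rows[rows.length - 1];
  const docCount = Number(newest["docs.count"]);
  return {
    index: newest.index,
    docCount,
    countLooksRight: Math.abs(docCount - expectedCount) <= tolerance,
  };
}
```

A document count far from the expected value suggests that the export or ingest step did not complete for that index.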

Data Mapping

The mapping for the collection data stored in Elasticsearch is defined in config/mapping.json.

You can also retrieve the mapping for a specific index by running the following in Kibana Dev Tools:

GET collection_<timestamp>/_mapping
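
In Elasticsearch 5.x the mapping response is nested by index name and then by document type. The sketch below flattens such a response into a map from field name to type. `listMappedFields` is a hypothetical helper, and it only reads top-level fields.

```javascript
// Expects a 5.x-style mapping response:
//   { "<index>": { "mappings": { "<type>": { "properties": { ... } } } } }
function listMappedFields(mappingResponse) {
  const fields = {};
  for (const index of Object.values(mappingResponse)) {
    for (const type of Object.values(index.mappings || {})) {
      for (const [name, def] of Object.entries(type.properties || {})) {
        // Object fields carry nested "properties" instead of a "type".
        fields[name] = def.type || "object";
      }
    }
  }
  return fields;
}
```

Comparing this output with config/mapping.json is one way to confirm that a new index was created with the intended mapping.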

Resources

The Elasticsearch v5.4 documentation is, unfortunately, the best resource we have found so far. Be aware that each version of Elasticsearch has a completely separate reference. If you look for solutions to problems through Google or StackOverflow, you will frequently land on a reference page for a different version than the one you're using. Watch for that, so that you don't get stuck trying a method that has been deprecated or hasn't been introduced in your version.
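
To confirm which version the cluster is actually running, `GET /` returns a JSON document whose `version.number` field holds the full version string. The sketch below compares its major and minor parts against the documentation version in use. `matchesDocsVersion` is a hypothetical helper.

```javascript
// `info` is the parsed JSON returned by GET / on the cluster.
function matchesDocsVersion(info, docsVersion = "5.4") {
  const [major, minor] = info.version.number.split(".");
  return `${major}.${minor}` === docsVersion;
}
```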

Clicking “View in Console” on any example queries in the docs will take you to your Kibana Dev Tools console and insert the query. This is very helpful.

License: GNU General Public License v3.0

