sul-dlss / purl-fetcher

An HTTP API for querying and updating PURLs

purl-fetcher

An HTTP API for querying and updating PURLs. See the API section below for docs.

Requirements

  1. Ruby (3.2 or greater)
  2. bundler gem
  3. Apache Kafka (0.10 or greater), or Docker

Installation

Clone the repository:

git clone https://github.com/sul-dlss/purl-fetcher.git
cd purl-fetcher

Install dependencies:

bundle install

Set up the database:

rake db:migrate

Developing

The API communicates with a Kafka broker to dispatch and process updates asynchronously. You can run a Kafka broker locally, or use the provided docker-compose configuration:

docker-compose up

Then, in a separate terminal, start a development API server:

bin/rails server

Finally, in another terminal, you can run the Kafka consumer to process updates from the Kafka broker:

bundle exec racecar PurlUpdatesConsumer

Making requests

You can make requests to the API using curl or a similar tool. To add an object to the database, you can first download its public Cocina JSON from production PURL:

curl https://purl.stanford.edu/bb112zx3193.json > bb112zx3193.json

Then, you can use the POST /purls/:druid endpoint to add the object to the database:

curl -X POST -H "Content-Type: application/json" -d @bb112zx3193.json http://localhost:3000/purls/bb112zx3193
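The same request can also be made from Ruby with the standard library's Net::HTTP. This is a minimal sketch. It assumes a development server on localhost:3000 and a Cocina JSON file saved as in the curl example. `build_purl_post` and `post_purl` are hypothetical helper names. Building the request separately from sending it makes it possible to inspect the request before anything goes over the wire.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: build (but do not send) a POST /purls/:druid request
# whose body is the public Cocina JSON for the object.
def build_purl_post(druid, cocina_json, base_url: "http://localhost:3000")
  uri = URI.join(base_url, "/purls/#{druid}")
  request = Net::HTTP::Post.new(uri)
  request["Content-Type"] = "application/json"
  request.body = cocina_json
  [uri, request]
end

# Hypothetical helper: send the request (assumes the development server is running).
def post_purl(druid, cocina_json)
  uri, request = build_purl_post(druid, cocina_json)
  Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
end
```

For example, `post_purl("bb112zx3193", File.read("bb112zx3193.json"))` mirrors the curl command above.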

After the object has been added, it will show up in the list of changes:

curl http://localhost:3000/docs/changes

Testing

The full test suite (with RuboCop style enforcement) can be run with the default rake task:

rake

The tests can be run without RuboCop style enforcement:

rake spec

The RuboCop style enforcement can be run without running the tests:

rake rubocop

API

Purls

/purls/:druid

POST /purls/:druid

Summary

Purl Document Update

Description

The POST /purls/:druid endpoint provides the ability to create or update a PURL document from public Cocina JSON. This endpoint is used by dor-services-app as part of SDR workflows.

Parameters
Name | Located In | Description | Required | Schema | Default
druid | url | Druid of a specific PURL | Yes | string (e.g. druid:cc111dd2222) | null
version | header | Version of the API request (e.g. version=1) | No | integer | 1
Example Response
true
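Druids have a fixed shape: two letters, three digits, two letters, four digits (for example bb112zx3193), optionally prefixed with `druid:`. A client can reject malformed values before calling the API. This is a simplified sketch. The regular expression is an assumption that accepts any letters, whereas real druids may draw from a more restricted alphabet. `normalize_druid` is a hypothetical helper.

```ruby
# Simplified druid pattern (assumption): two letters, three digits,
# two letters, four digits, with an optional "druid:" prefix.
DRUID_PATTERN = /\A(?:druid:)?([a-z]{2}\d{3}[a-z]{2}\d{4})\z/

# Hypothetical helper: return the bare druid (lowercased), or raise if malformed.
def normalize_druid(value)
  match = DRUID_PATTERN.match(value.to_s.strip.downcase)
  raise ArgumentError, "not a druid: #{value.inspect}" unless match

  match[1]
end
```

The bare form it returns matches the path segment used in the curl example (`/purls/bb112zx3193`).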

Docs

/docs/changes

GET /docs/changes

Summary

Purl Document Changes

Description

The /docs/changes endpoint provides information about public PURL documents that have changed, including their release tags and the collections they belong to. This endpoint can be queried using purl_fetcher-client.

Parameters
Name | Located In | Description | Required | Schema | Default
first_modified | query | Limit response by a beginning datetime | No | datetime (ISO 8601) | earliest possible date
last_modified | query | Limit response by an ending datetime | No | datetime (ISO 8601) | current time
page | query | Request a specific page of results | No | integer | 1
per_page | query | Limit the number of results per page | No | integer (1-10000) | 100
target | query | Release tag to filter on | No | string | nil
version | header | Version of the API request (e.g. version=1) | No | integer | 1
Example Response
{
  "changes": [
    {
      "druid": "druid:dd111ee2222",
      "latest_change": "2014-01-01T00:00:00Z",
      "true_targets": ["SearchWorksPreview"],
      "collections": ["druid:oo000oo0001"]
    },
    {
      "druid": "druid:bb111cc2222",
      "latest_change": "2015-01-01T00:00:00Z",
      "true_targets": ["SearchWorks", "Revs", "SearchWorksPreview"],
      "collections": ["druid:oo000oo0001", "druid:oo000oo0002"]
    },
    {
      "druid": "druid:aa111bb2222",
      "latest_change": "2016-06-06T00:00:00Z",
      "true_targets": ["SearchWorksPreview"]
    }
  ],
  "pages": {
    "current_page": 1,
    "next_page": null,
    "prev_page": null,
    "total_pages": 1,
    "per_page": 100,
    "offset_value": 0,
    "first_page?": true,
    "last_page?": true
  }
}
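Responses are paginated. In the single-page examples here, pages.next_page is null, which suggests it is null on the last page. A client that needs every entry can follow next_page until it runs out. This is a minimal sketch. `fetch_page` stands in for whatever performs the HTTP GET and JSON parsing (purl_fetcher-client, Net::HTTP, etc.). `each_entry` is a hypothetical helper name.

```ruby
# Hypothetical helper: yield every entry under `key` ("changes" or "deletes"),
# following pages["next_page"] until it is nil.
# `fetch_page` is any callable that takes a page number and returns the
# parsed JSON response (a Hash) for that page.
def each_entry(fetch_page, key = "changes")
  return enum_for(:each_entry, fetch_page, key) unless block_given?

  page = 1
  while page
    response = fetch_page.call(page)
    response.fetch(key).each { |entry| yield entry }
    page = response.fetch("pages")["next_page"]
  end
end
```

Because the deletes response uses the same pages object, the same loop works for /docs/deletes when called with `key = "deletes"`.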

/docs/deletes

GET /docs/deletes

Summary

Purl Document Deletes

Description

The /docs/deletes endpoint provides information about public PURL documents that have been deleted. This endpoint can be queried using purl_fetcher-client.

Parameters
Name | Located In | Description | Required | Schema | Default
first_modified | query | Limit response by a beginning datetime | No | datetime (ISO 8601) | earliest possible date
last_modified | query | Limit response by an ending datetime | No | datetime (ISO 8601) | current time
page | query | Request a specific page of results | No | integer | 1
per_page | query | Limit the number of results per page | No | integer (1-10000) | 100
target | query | Release tag to filter on | No | string | nil
version | header | Version of the API request (e.g. version=1) | No | integer | 1
Example Response
{
  "deletes": [
    {
      "druid": "druid:ee111ff2222",
      "latest_change": "2014-01-01T00:00:00Z"
    },
    {
      "druid": "druid:ff111gg2222",
      "latest_change": "2014-01-01T00:00:00Z"
    },
    {
      "druid": "druid:cc111dd2222",
      "latest_change": "2016-01-02T00:00:00Z"
    }
  ],
  "pages": {
    "current_page": 1,
    "next_page": null,
    "prev_page": null,
    "total_pages": 1,
    "per_page": 100,
    "offset_value": 0,
    "first_page?": true,
    "last_page?": true
  }
}
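The query parameters documented for this endpoint can be assembled with the standard library. This is a minimal sketch that assumes a local development server. `deletes_url` is a hypothetical helper that formats times as ISO 8601 in UTC and rejects per_page values outside the documented 1-10000 range.

```ruby
require "uri"
require "time"

# Hypothetical helper: build a GET /docs/deletes URL from the documented
# query parameters. Times are converted to UTC (without mutating the
# caller's Time) and serialized as ISO 8601.
def deletes_url(first_modified: nil, last_modified: nil, page: 1, per_page: 100,
                target: nil, base_url: "http://localhost:3000")
  unless (1..10_000).cover?(per_page)
    raise ArgumentError, "per_page must be between 1 and 10000"
  end

  params = { page: page, per_page: per_page }
  params[:first_modified] = first_modified.getutc.iso8601 if first_modified
  params[:last_modified] = last_modified.getutc.iso8601 if last_modified
  params[:target] = target if target

  uri = URI.join(base_url, "/docs/deletes")
  uri.query = URI.encode_www_form(params)
  uri.to_s
end
```

URI.encode_www_form percent-encodes the colons in the timestamps, so the URL can be passed directly to curl or Net::HTTP.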

Collections

/collections/:druid/purls

GET /collections/:druid/purls

Summary

Collection Purls route

Description

The /collections/:druid/purls endpoint provides a listing of PURLs for a specific collection. This endpoint is used by the Exhibits application.

Parameters
Name | Located In | Description | Required | Schema | Default
druid | url | Druid of a specific collection | Yes | string (e.g. druid:cc111dd2222) | null
page | query | Request a specific page of results | No | integer | 1
per_page | query | Limit the number of results per page | No | integer (1-10000) | 100
version | header | Version of the API request (e.g. version=1) | No | integer | 1
Example Response
{
  "purls": [
    {
      "druid": "druid:ee111ff2222",
      "published_at": "2013-01-01T00:00:00.000Z",
      "deleted_at": "2016-01-03T00:00:00.000Z",
      "object_type": "set",
      "catkey": "",
      "title": "Some test object number 4",
      "collections": [
        "druid:ff111gg2222"
      ],
      "true_targets": [
        "SearchWorksPreview"
      ]
    },
...
    {
      "druid": "druid:cc111dd2222",
      "published_at": "2016-01-01T00:00:00.000Z",
      "deleted_at": "2016-01-02T00:00:00.000Z",
      "object_type": "item",
      "catkey": "567",
      "title": "Some test object number 2",
      "collections": [
        "druid:ff111gg2222"
      ],
      "true_targets": [
        "SearchWorksPreview"
      ],
      "false_targets": [
        "SearchWorks"
      ]
    }
  ],
  "pages": {
    "current_page": 1,
    "next_page": null,
    "prev_page": null,
    "total_pages": 1,
    "per_page": 100,
    "offset_value": 0,
    "first_page?": true,
    "last_page?": true
  }
}
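Each entry lists release targets under true_targets and, in some entries, false_targets. These appear to record release tags set to true and false respectively. A consumer may want only the members released to a particular target. This is a minimal sketch, assuming that reading of true_targets. `druids_released_to` is a hypothetical helper.

```ruby
require "json"

# Hypothetical helper: druids from a /collections/:druid/purls response
# that list `target` among their true_targets. Entries without a
# true_targets key are treated as released nowhere.
def druids_released_to(response_json, target)
  JSON.parse(response_json).fetch("purls")
      .select { |purl| Array(purl["true_targets"]).include?(target) }
      .map { |purl| purl["druid"] }
end
```

Applied to the two entries shown above, "SearchWorksPreview" matches both and "SearchWorks" matches neither. The second entry lists SearchWorks only under false_targets.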

Administration

Reindexing

You can produce Kafka messages that cause every PURL to be reindexed by running the following in a Rails console:

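# unscoped bypasses any default scope, so every Purl record is included.
Purl.unscoped.find_in_batches.with_index do |group, batch|
  puts "Processing group ##{batch}"
  # Produce one indexer message per PURL in this batch.
  group.each(&:produce_indexer_log_message)
end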

Or, to reindex only the PURLs released to SearchWorks:

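Purl.target('Searchworks').find_in_batches.with_index do |group, batch|
  puts "Processing group ##{batch}"
  # Produce this batch's messages asynchronously, then wait until
  # Kafka has acknowledged delivery before moving to the next batch.
  Racecar.wait_for_delivery do
    group.each { |purl| purl.produce_indexer_log_message(async: true) }
  end
end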
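ActiveRecord's find_in_batches loads 1,000 records per batch by default, and with_index numbers batches from 0. Either loop therefore prints one "Processing group" line per 1,000 PURLs. The sketch below estimates how many batches a full reindex produces. It uses the PURL count from the sample summary report as an illustrative input, and `batch_count` is a hypothetical helper.

```ruby
# find_in_batches yields groups of up to 1,000 records by default,
# so the number of batches is the record count divided by 1,000, rounded up.
DEFAULT_BATCH_SIZE = 1_000

def batch_count(record_count, batch_size = DEFAULT_BATCH_SIZE)
  (record_count + batch_size - 1) / batch_size
end

purls = 193_960 # illustrative: total from the sample summary report
batches = batch_count(purls)
puts "#{batches} batches, numbered 0 to #{batches - 1}"
```

For 193,960 PURLs this gives 194 batches, the last of which holds 960 records.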

Reporting

Internally, the API uses ActiveRecord models (Purl, Collection, and ReleaseTag) to manage information about published PURLs. See app/models/ and db/schema.rb for details.

This gives administrators two ways to explore the data outside the API.

Using Rails runner

With the Rails runner, you can query the database using ActiveRecord. For example, running the Ruby in script/reports/summary.rb with:

RAILS_ENV=environment bundle exec rails runner script/reports/summary.rb

produces output like this:

Summary report as of 2016-08-24 09:52:49 -0700 on purl-fetcher-dev.stanford.edu
PURLs: 193960
Deleted PURLs: 1
Published PURLs: 193959
Published PURLs in last week: 0
Released to SearchWorks: 5

Using SQL

With the Rails dbconsole, you can query the database using SQL. For example, running the SQL in script/reports/summary.sql with:

RAILS_ENV=environment bundle exec rails dbconsole -p < script/reports/summary.sql

produces output like this:

PURLs	193960
Deleted PURLs	1
Published PURLs	193959
Published this year	9
Released to SearchWorks	5
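The dbconsole output is tab-separated label/count pairs, which makes it easy to post-process. The minimal sketch below parses that output and checks that deleted and published PURLs add up to the total, as they do in the sample above (1 + 193959 = 193960). The check assumes every PURL is counted as either deleted or published. `parse_summary` is a hypothetical helper.

```ruby
# Hypothetical helper: parse "label<TAB>count" lines into a Hash.
def parse_summary(output)
  output.each_line(chomp: true).reject(&:empty?).to_h do |line|
    label, count = line.split("\t", 2)
    [label, Integer(count)]
  end
end

sample = "PURLs\t193960\nDeleted PURLs\t1\nPublished PURLs\t193959\n" \
         "Published this year\t9\nReleased to SearchWorks\t5\n"
summary = parse_summary(sample)

# Sanity check, assuming every PURL is counted as either deleted or published.
consistent = summary["Deleted PURLs"] + summary["Published PURLs"] == summary["PURLs"]
puts "Totals consistent: #{consistent}"
```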
