REST API serving snapshots (`/snapshots`)
Legacy
Area | distribution-querying |
Language | Java |
Description | Serving up bulk snapshots of XML and JSON metadata. |
Production URLs | |
Quality | Sentry, no SONAR |
Upstream services | |
Upstream data | |
Downstream data | |
Source Code | |
Products |
Snapshot File Contents and Formats
Available snapshot files:
all.json.tar.gz
all.xml.tar.gz
AWS Access Control
“org.crossref.snapshots” is the AWS S3 bucket that holds the bulk extracts. The bucket is available for read-only access by the “service-snapshot” IAM user. This user has an explicit policy attached to it, so it does not appear in the S3 Permissions interface. The Content System (CS) services use this user’s access key to grant access to the bucket.
The configuration in the Content System is determined by the following properties in deployment-common.properties:
qs.snapshot.aws-bucket-name
qs.snapshot.aws-access-key
qs.snapshot.aws-secret-key
Crossref Access Control
The service can be used by anyone (i.e. browsing the structure), but downloading is restricted to Plus members. Therefore, a member must have both the Metadata Plus service and their accompanying access token to download a snapshot. This data is transferred from Sugar every 4 hours during normal business hours (US/Eastern).
To download a snapshot, the member must include a Crossref-Plus-API-Token HTTP header carrying their access token in the request:
Crossref-Plus-API-Token: Bearer XXX
When the member requests the download URL with the access token, their HTTP client is redirected to download the snapshot from S3 using a time-limited, secure URL. The URL must be used before it expires, which happens after 15 minutes. The time limit is determined by qs.snapshot.url-maximum-age.
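The request side of this flow can be sketched in Python using only the standard library. This is a minimal illustration, not the service's own client: the function names `build_download_request` and `download_snapshot` are hypothetical, and the sketch assumes the service answers with an ordinary HTTP redirect, which `urllib.request.urlopen` follows by default. The token is added as an unredirected header so that it is sent to the Crossref endpoint only and not repeated on the redirected request to S3.

```python
import shutil
import urllib.request

# Base URL of the snapshots service, as given in this document.
BASE_URL = "https://api.crossref.org/snapshots"


def build_download_request(key: str, token: str) -> urllib.request.Request:
    """Build a GET request for one snapshot file.

    `key` is the path below the base URL,
    e.g. "monthly/2018/03/journals.xml.tar.gz".
    """
    url = f"{BASE_URL}/{key.lstrip('/')}"
    request = urllib.request.Request(url)
    # Unredirected headers are sent with the first request only,
    # so the token is not passed along to the time-limited S3 URL.
    request.add_unredirected_header("Crossref-Plus-API-Token", f"Bearer {token}")
    return request


def download_snapshot(key: str, token: str, destination: str) -> None:
    """Follow the redirect and stream the snapshot to a local file."""
    request = build_download_request(key, token)
    with urllib.request.urlopen(request) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)
```

Because the redirect target expires, a client of this kind should start the transfer immediately after receiving the redirect rather than storing the S3 URL for later use.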
Navigation Interface
The base URL for viewing the snapshot organization is
https://api.crossref.org/snapshots
The navigation interface is HTML. It is built from the S3 bucket item details on a schedule and cached locally as an org.crossref.qs.snapshot.Listing, configured via org.crossref.qs.snapshot.SnapshotController.
An update can be forced via JMX, but this must be done for each deployment, e.g.
$ curl \
"http://svc1a:8080/jmx/exec/qs.snapshot:name=Controller/updateListing" \
"http://svc1b:8080/jmx/exec/qs.snapshot:name=Controller/updateListing"
Navigation can be done without a Plus access token. Downloading does require the access token in the Authorization header. For example:
$ curl -L \
-o journals.xml.tar.gz \
-H 'Authorization: Bearer XXX' \
'https://api.crossref.org/snapshots/monthly/2018/03/journals.xml.tar.gz'
If you want to download all of a month’s snapshots, you could use wget. In general, though, we expect members to use their own automation to download only the files they want rather than all of them.
A shortcut directs to the most recently uploaded set of files, for example:
https://api.crossref.org/snapshots/monthly/latest
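For automation that fetches only selected files, the download URLs can be composed from the period and the file name. The Python sketch below is illustrative only: `monthly_snapshot_url` is a hypothetical helper, and it assumes that a file under the “latest” shortcut is addressed by appending the file name in the same way as for a dated month.

```python
from typing import Optional

# Base URL of the snapshots service, as given in this document.
BASE_URL = "https://api.crossref.org/snapshots"


def monthly_snapshot_url(filename: str,
                         year: Optional[int] = None,
                         month: Optional[int] = None) -> str:
    """Return the URL of one monthly snapshot file.

    With no year and month, the "latest" shortcut is used
    (assumption: files sit directly below it).
    """
    if year is None and month is None:
        period = "latest"
    elif year is not None and month is not None:
        if not 1 <= month <= 12:
            raise ValueError("month must be between 1 and 12")
        # Months are zero-padded, as in monthly/2018/03.
        period = f"{year:04d}/{month:02d}"
    else:
        raise ValueError("year and month must be given together")
    return f"{BASE_URL}/monthly/{period}/{filename}"


# Build URLs for just the files a member wants, not every file of the month.
wanted = ["all.json.tar.gz", "all.xml.tar.gz"]
urls = [monthly_snapshot_url(name, 2018, 3) for name in wanted]
```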
Usage Data
Each request for a download URL is logged in a “snapshots_usage_YYYYMM” table in the “usages” MySQL database. The data can be requested using the following URL (note: a ‘from’ date is required to produce a result):
http://api.crossref.org/snapshots/usage?from=2019-07-01&until=2019-07-30
This results in a tab-separated list of records. Each record has “id”, “memberid”, “key”, and “requested” columns.
You can limit the data by providing query parameters. The parameters are:
Parameter | Meaning |
---|---|
memberid | Select only records with the given member id. The value is a decimal integer. |
key | Select only records with the given S3 key to the downloaded item. The value is a string. Eg “key=monthly/2018/03/journals.xml.tar.gz” |
from | Select only records requested after and including the given timestamp. The value is a string formatted YYYY-MM-DDTHH:MM:SS. |
until | Select only records requested before and excluding the given timestamp. The value is a string formatted YYYY-MM-DDTHH:MM:SS. |
orderby | Order the results by the named columns: “memberid”, “key”, and “requested”. Reverse the order using “desc” (ie descending), eg “orderby=requested+desc”. |
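As an illustration of how these parameters combine, the following Python sketch builds a usage query URL and parses a tab-separated response. The helper names `build_usage_url` and `parse_usage` are hypothetical. The parser assumes that the response has no header row and that the columns arrive in the order “id”, “memberid”, “key”, “requested”; neither is confirmed here, so check against a real response before relying on it.

```python
import csv
import io
from urllib.parse import urlencode

# Usage endpoint, as given in this document.
USAGE_URL = "http://api.crossref.org/snapshots/usage"

# Assumed column order of the tab-separated response.
COLUMNS = ("id", "memberid", "key", "requested")


def build_usage_url(from_ts, until_ts=None, memberid=None, key=None, orderby=None):
    """Build a usage query URL; 'from' is required, the rest are optional."""
    params = {"from": from_ts}
    optional = {"until": until_ts, "memberid": memberid, "key": key, "orderby": orderby}
    params.update({name: value for name, value in optional.items() if value is not None})
    # urlencode turns spaces into '+', so "requested desc" becomes "requested+desc".
    return f"{USAGE_URL}?{urlencode(params)}"


def parse_usage(text):
    """Parse the tab-separated response body into a list of dicts."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [dict(zip(COLUMNS, row)) for row in reader if row]


# Example: all downloads by member 42 in July 2019, newest first.
url = build_usage_url("2019-07-01T00:00:00", until_ts="2019-08-01T00:00:00",
                      memberid=42, orderby="requested desc")
```

Since “until” is exclusive, passing the first instant of the following month covers the whole month without missing records from its last day.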