Metadata Bucket

Component
Area: greenfield
Quality: No Sentry, no SONAR
Upstream data
Downstream services
Downstream data
Related services
Tags

The Metadata Bucket is a private S3 bucket containing metadata and updates. It is designed to feed the Cayenne REST API. The metadata has unrestricted references.

Bucket Structure

Data should be added to the Metadata Bucket using the following key structure:

{doi-hash}/{filename}

Assuming a bucket name of crossref-metadata-bucket-staging, some example object paths might be:

crossref-metadata-bucket-staging/8fd133785660bb26ebca632b8ca40104bef4ba7f/unixsd.xml
crossref-metadata-bucket-staging/8fd133785660bb26ebca632b8ca40104bef4ba7f/citation-update.json

Key Components

{doi-hash}

{doi-hash} is the hex-encoded SHA-1 hash of the lowercase DOI.

We use a hash for a number of reasons:

  1. DOIs can contain all kinds of characters, including non-printable ones, extra slashes, semicolons, question marks, etc.
  2. A hash supports much better prefix balancing than the literal DOI would. See here

We use SHA-1 for the hash because it is universally available, distributes keys evenly, and security isn’t a concern here.
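
As an illustration of the key scheme described above, here is a minimal Python sketch. The function names doi_hash and metadata_key are hypothetical, and encoding the DOI as UTF-8 before hashing is an assumption; this page does not specify an encoding.

```python
import hashlib


def doi_hash(doi: str) -> str:
    """Return the hex-encoded SHA-1 digest of the lowercased DOI.

    Assumption: the DOI is encoded as UTF-8 before hashing.
    """
    return hashlib.sha1(doi.lower().encode("utf-8")).hexdigest()


def metadata_key(doi: str, filename: str) -> str:
    """Build an object key of the form {doi-hash}/{filename}."""
    return f"{doi_hash(doi)}/{filename}"


if __name__ == "__main__":
    # 10.5555/12345678 is used here purely as a sample DOI.
    print(metadata_key("10.5555/12345678", "unixsd.xml"))
```

Because the DOI is lowercased before hashing, differently cased spellings of the same DOI map to the same key prefix.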

{filename}

The filename is based on the type of metadata, e.g. unixsd.xml or citation-update.json.

FAQ

Currently, a citation update file contains updates for multiple DOIs. Do we plan to keep one update file per DOI in this new architecture?

Yes. With this model, each citation update file will cover a single DOI only.

Metadata

Objects created in the Metadata Bucket should have an x-amz-meta-cr-doi metadata property added. Its value should be the lowercase DOI that the object relates to.
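
A minimal sketch of an upload that sets this property, assuming a boto3 S3 client is passed in as s3_client. The helper name put_metadata_object is hypothetical, and UTF-8 encoding of the DOI before hashing is an assumption. With boto3, entries in the Metadata argument of put_object are sent as x-amz-meta-<name> headers, so the key "cr-doi" is stored as x-amz-meta-cr-doi.

```python
import hashlib


def put_metadata_object(s3_client, bucket: str, doi: str, filename: str, body: bytes) -> str:
    """Upload one metadata file and record the lowercase DOI as user metadata.

    s3_client is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    Returns the object key that was written.
    """
    lower_doi = doi.lower()
    # Key structure: {doi-hash}/{filename}; UTF-8 encoding is an assumption.
    key = f"{hashlib.sha1(lower_doi.encode('utf-8')).hexdigest()}/{filename}"
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=body,
        # Sent by boto3 as the x-amz-meta-cr-doi header.
        Metadata={"cr-doi": lower_doi},
    )
    return key
```

A call might look like put_metadata_object(client, "crossref-metadata-bucket-staging", doi, "unixsd.xml", xml_bytes), where client, doi, and xml_bytes are supplied by the caller.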