Ingesting metadata from S3

The REST API can be run with profiles that make it ingest metadata from S3. For example:

AWS_REGION=eu-west-1 METADATA_BUCKET=crossref-metadata-bucket-temp lein run :nrepl :api :s3-ingest-xml :s3-ingest-update

When run with the :s3-ingest-xml or :s3-ingest-update profile, the REST API starts a task that pages through the data in METADATA_BUCKET (XML files or citation count updates, respectively) and indexes it in Elasticsearch.

It is also possible to index a single DOI by setting the METADATA_DOI environment variable, for example:

AWS_REGION=eu-west-1 METADATA_BUCKET=crossref-metadata-bucket-temp METADATA_DOI=10.1145/253228.253255 lein run :nrepl :api :s3-ingest-xml

To ingest from a local directory instead of S3, point METADATA_BUCKET at the directory and set METADATA_LOCAL_STORAGE=1:

METADATA_BUCKET=/location/to/local/metadata METADATA_LOCAL_STORAGE=1 lein run :nrepl :api :s3-ingest-xml

Note: the location named by METADATA_BUCKET must have been built using Metadata Bucket Builder or another tool that conforms to the spec.
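
Because a missing or mistyped variable may only surface once ingestion has started, it can help to check the environment first. The following is a minimal sketch, not part of the project: check_ingest_env is a hypothetical helper, and it assumes that local mode is selected only by METADATA_LOCAL_STORAGE=1 (as in the example above) and that S3 mode needs AWS_REGION to be set. It does not verify that the contents conform to the spec.

```shell
# Hypothetical preflight helper; not part of the REST API project.
# Returns 0 when the ingest-related variables look consistent, 1 otherwise.
check_ingest_env() {
    if [ -z "${METADATA_BUCKET:-}" ]; then
        echo "METADATA_BUCKET is not set" >&2
        return 1
    fi
    if [ "${METADATA_LOCAL_STORAGE:-}" = "1" ]; then
        # Local mode (assumed to be triggered by the value 1):
        # METADATA_BUCKET should be an existing directory.
        if [ ! -d "$METADATA_BUCKET" ]; then
            echo "METADATA_BUCKET is not a directory: $METADATA_BUCKET" >&2
            return 1
        fi
    elif [ -z "${AWS_REGION:-}" ]; then
        # S3 mode: assumes a region is required, as in the S3 examples.
        echo "AWS_REGION is not set" >&2
        return 1
    fi
    return 0
}
```

With the variables exported in the current shell, the check can guard the ingest command, for instance: check_ingest_env && lein run :nrepl :api :s3-ingest-xml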