Ingesting metadata from S3

The REST API can be run with profiles that make it ingest metadata from S3, for example:

AWS_REGION=eu-west-1 METADATA_BUCKET=crossref-metadata-bucket-temp lein run :nrepl :api :s3-ingest-xml :s3-ingest-update

When run with the s3-ingest-xml or s3-ingest-update profile, the REST API starts a task that pages through the data in METADATA_BUCKET (XML files or citation count updates, respectively) and indexes it in Elasticsearch.

To index a single DOI, set the METADATA_DOI environment variable, for example:

AWS_REGION=eu-west-1 METADATA_BUCKET=crossref-metadata-bucket-temp METADATA_DOI=10.1145/253228.253255 lein run :nrepl :api :s3-ingest-xml

If you do not want to rely on S3, you can read from a local directory instead by pointing METADATA_BUCKET at it and setting METADATA_LOCAL_STORAGE=1:

METADATA_BUCKET=/location/to/local/metadata METADATA_LOCAL_STORAGE=1 lein run :nrepl :api :s3-ingest-xml
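
The invocations above differ only in which environment variables are set. As a rough sketch, a small shell helper could assemble the right command line for either source. The function name ingest_command is hypothetical and not part of the REST API, the default region is simply copied from the examples above, and combining METADATA_DOI with local storage is an assumption that the text above does not confirm. The helper only prints the command; it does not run it.

```shell
#!/bin/sh
# Hypothetical helper (not part of the REST API): prints the ingest command
# for a given metadata source without running it.
# Usage: ingest_command <bucket-or-directory> [doi]
ingest_command() {
  source_location="$1"
  doi="${2:-}"

  if [ -z "$source_location" ]; then
    echo "usage: ingest_command <bucket-or-directory> [doi]" >&2
    return 1
  fi

  env_vars="METADATA_BUCKET=$source_location"
  if [ -d "$source_location" ]; then
    # An existing local directory: switch to local storage mode.
    env_vars="$env_vars METADATA_LOCAL_STORAGE=1"
  else
    # Otherwise treat the value as an S3 bucket name. The fallback region
    # is an assumption taken from the examples in this document.
    env_vars="AWS_REGION=${AWS_REGION:-eu-west-1} $env_vars"
  fi

  if [ -n "$doi" ]; then
    # Restrict ingestion to a single DOI.
    env_vars="$env_vars METADATA_DOI=$doi"
  fi

  echo "$env_vars lein run :nrepl :api :s3-ingest-xml"
}
```

Calling ingest_command with a bucket name and a DOI prints a line of the same shape as the single-DOI example above, which can be reviewed before it is pasted into a terminal.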

Note: The contents of METADATA_BUCKET must have been built with Metadata Bucket Builder or another tool that conforms to the spec.

Note: Ingestion currently runs in a single thread and does not scale.