Ingesting metadata from AWS SQS and S3

Incremental ingestion

The REST API can be run with a profile that makes it ingest metadata from SQS. For example:

AWS_REGION=eu-west-1 SQS_QUEUE_URL=http://sqs:9324/queue/ingest lein run :api :sqs-ingest

When run with sqs-ingest, the REST API starts tasks that consume data from SQS.

A message is produced to the queue whenever a new file is added to the metadata bucket or an existing file is replaced. Each SQS message contains the name of the metadata bucket, the key of the file, and the version of the file.
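As an illustration of the message contents described above, the following Python sketch extracts the bucket, key and version from a message body. It assumes the queue receives standard S3 event notifications (a `Records` array with `s3.bucket.name`, `s3.object.key` and `s3.object.versionId`); the source does not state the exact message schema, and `MetadataRef` and `parse_s3_event` are hypothetical names, not part of the REST API.

```python
import json
from typing import List, NamedTuple, Optional
from urllib.parse import unquote_plus


class MetadataRef(NamedTuple):
    """Hypothetical holder for the three fields carried by a message."""
    bucket: str
    key: str
    version: Optional[str]


def parse_s3_event(body: str) -> List[MetadataRef]:
    """Extract (bucket, key, version) from an S3 event notification body.

    Assumes the standard S3 event notification layout; the real service
    may use a different message shape.
    """
    event = json.loads(body)
    refs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        refs.append(MetadataRef(
            bucket=s3["bucket"]["name"],
            # S3 URL-encodes object keys in event notifications.
            key=unquote_plus(s3["object"]["key"]),
            # versionId is only present when the bucket is versioned.
            version=s3["object"].get("versionId"),
        ))
    return refs
```

A body without a `Records` array yields an empty list, so unrelated messages can be skipped without raising.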

After receiving a message, the REST API reads the relevant file from the metadata bucket (waiting for the version to become available, if needed), parses it, and indexes it in Elasticsearch (ES) as either a work document or a cited-by count update.
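The "waiting for the version to be available" step can be pictured as a retry loop. The sketch below is a minimal Python illustration, assuming exponential backoff and a caller-supplied `fetch` function that returns `None` while the requested version is not yet readable. The retry policy, the function name `fetch_when_available` and its parameters are assumptions for illustration; the source does not describe how the REST API actually waits.

```python
import time
from typing import Callable, Optional


def fetch_when_available(
    fetch: Callable[[], Optional[bytes]],
    attempts: int = 5,
    initial_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call fetch() until it returns data, doubling the delay between tries.

    fetch is a hypothetical callable that reads one object version from the
    metadata bucket and returns None if that version is not available yet.
    """
    delay = initial_delay
    for attempt in range(attempts):
        data = fetch()
        if data is not None:
            return data
        # Do not sleep after the final failed attempt.
        if attempt < attempts - 1:
            sleep(delay)
            delay *= 2
    raise TimeoutError(f"object version not available after {attempts} attempts")
```

Passing `sleep` as a parameter keeps the loop testable without real delays; parsing and indexing would follow once the bytes are returned.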

graph LR;
  integration(Integration Point)
  s3(S3 Bucket)
  sqs(AWS SQS queue)
  indexer(REST API Indexer)
  restApi(REST API)
  integration--Writes-->s3
  s3--Events-->sqs;
  sqs--Consumes-->indexer
  indexer--Ingests-->restApi

Bulk ingestion

The REST API can also be run with a profile that makes it push metadata keys to SQS. For example:

AWS_REGION=eu-west-1 SQS_QUEUE_URL=http://sqs:9324/queue/ingest lein run :api :s3-sqs-produce-xml

When run with s3-sqs-produce-xml or s3-sqs-produce-update, the REST API starts tasks that fetch all keys (XMLs or cited-by updates, respectively) from the metadata bucket.

Each of the keys fetched from the metadata bucket will be pushed to SQS, after which they will be handled by the incremental ingestion mechanism.
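The produce step described above can be sketched as follows. This Python example groups keys into batches of ten, which is the maximum number of entries SQS accepts in a single SendMessageBatch call, and hands each batch to a caller-supplied `send_batch` function. The message body shape (`bucket` and `key` as JSON) and the names `batches` and `produce_keys` are assumptions for illustration; the real tasks may publish a different format, for example one that matches the S3 event notifications consumed by the incremental mechanism.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, List

SQS_BATCH_LIMIT = 10  # SendMessageBatch accepts at most 10 entries per call


def batches(keys: Iterable[str], size: int = SQS_BATCH_LIMIT) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` keys."""
    batch: List[str] = []
    for key in keys:
        batch.append(key)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def produce_keys(
    bucket: str,
    keys: Iterable[str],
    send_batch: Callable[[List[Dict[str, str]]], None],
) -> int:
    """Push one message per key and return the number of messages sent.

    send_batch is a hypothetical callable that wraps the SQS client; each
    entry uses the Id/MessageBody fields expected by SendMessageBatch.
    """
    sent = 0
    for batch in batches(keys):
        entries = [
            # Assumed body shape: bucket and key as a JSON object.
            {"Id": str(i), "MessageBody": json.dumps({"bucket": bucket, "key": key})}
            for i, key in enumerate(batch)
        ]
        send_batch(entries)
        sent += len(entries)
    return sent
```

Because `keys` is consumed lazily, the listing of a large bucket can be streamed page by page rather than held in memory.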

graph LR;
  restApi(REST API S3 Ingest Task)
  restApiIncremental(REST API SQS Ingest Task)
  s3(S3 Bucket)
  sqs(AWS SQS queue)
  restApi--Reads-->s3
  s3--Keys-->restApi
  restApi--Produces-->sqs
  sqs--Consumes-->restApiIncremental
  click restApiIncremental "#incremental-ingestion" "Incremental Ingestion";