Troubleshooting REST API

Why is a DOI missing from the REST API?

A DOI should be indexed in the REST API within a day at worst.

Steps support can take to determine whether a DOI has been registered with us and should be appearing in the REST API, but is missing:

  1. Querying the API for the DOI will provide some useful information: https://api.crossref.org/works/[prefix]/[suffix] (see the example after this list).
  2. If the response is “Resource not found.”, it is fair to conclude that the DOI is missing from the REST API.
  3. Has the DOI been registered? This call will tell you which agency registered the DOI: https://api.crossref.org/works/[prefix]/[suffix]/agency
  4. A check of the reports tab in the admin tool will show the deposit history for the DOI in question.
  5. If a DOI has been registered with us and has a deposit history in the admin tool, it should appear in the API within a day.
  6. In some rare instances, a DOI may have been registered with a different registration agency and transferred to Crossref, but its metadata has not yet been redeposited with us. In those cases, the DOI will return a “Resource not found.” message when resolved in the REST API, yet the agency call above will still report the registration agency as “Crossref”. For that reason, checking the reports tab is the preferred way to confirm a DOI's absence from the API.
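
For example, both calls can be run from the command line with curl. The DOI below is a hypothetical example; substitute the prefix/suffix you are investigating:

    # Full metadata record; “Resource not found.” means the DOI is not in the REST API index
    curl -s "https://api.crossref.org/works/10.5555/12345678"

    # Which registration agency the DOI belongs to
    curl -s "https://api.crossref.org/works/10.5555/12345678/agency"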

Look at logs:

The normal updater sends XMLs to the indexer on the master Solr server, currently ssmds0a (as of 19 Jan 2020).
The indexer has three logs to be aware of:
1. /home/crossref/sasdata/cayenne/log/log.txt
2. /home/crossref/sasdata/cayenne/data/feed.log
3. /home/crossref/sasdata/cayenne/data/feed-thread.log

log.txt is the main output from the indexer. This is where you might find Solr errors, the status of the Solr adds/commits, or any low-level errors from the indexer itself.
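
A quick way to scan log.txt for recent problems (a sketch only; the exact strings to grep for depend on the indexer's log format, so the patterns below are assumptions):

    # Recent error entries in the main indexer log
    grep -i "error" /home/crossref/sasdata/cayenne/log/log.txt | tail -n 20

    # Recent Solr add/commit activity
    grep -iE "add|commit" /home/crossref/sasdata/cayenne/log/log.txt | tail -n 20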

feed.log contains information about files that have been pushed to the indexer and received by it. The file names include a UUID, so they are not easy to distinguish. The logging has been enhanced to also include the DOI (if available) when the file is parsed and when its processing is complete. Errors such as a bad file content type should also show up here.
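
To trace a specific DOI through the feed, grep for it in feed.log. The DOI below is a hypothetical example:

    # Find the feed file(s) that carried this DOI and check that parsing/processing completed
    grep "10.5555/12345678" /home/crossref/sasdata/cayenne/data/feed.log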

feed-thread.log contains a running status of the worker threads that look for and process the files that have been received. Errors don't usually show up here, but it is a good place to check whether the file in question was even picked up.
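
Once feed.log has given you the UUID-based file name, you can check whether a worker thread picked it up. The UUID fragment below is a placeholder:

    # Confirm the worker threads picked up a given feed file
    grep "a1b2c3d4" /home/crossref/sasdata/cayenne/data/feed-thread.log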

Look at the files:

In the same structure as the logs, there are three folders under /home/crossref/sasdata/cayenne/data/:
1. feed-in
2. feed-processed
3. feed-failed
As you may imagine, pushed files go into feed-in. They are then read by the threads and processed. If a thread encounters an error, it should move the file to feed-failed.
Normal pushes occur every 20 minutes, during which time the folder will populate and drain simultaneously. This usually only takes a couple of minutes.
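
To watch a push land and drain, or to look for a specific feed file, something like the following works (the file-name fragment is a placeholder):

    # How many files are currently waiting, processed, or failed
    for d in feed-in feed-processed feed-failed; do
        echo "$d: $(ls /home/crossref/sasdata/cayenne/data/$d | wc -l)"
    done

    # Look for a specific feed file in any of the three folders
    find /home/crossref/sasdata/cayenne/data -name "*a1b2c3d4*"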

The feed-processed folder contains files that, hopefully, were processed correctly. The folder is emptied by a cron job that removes files older than 60 minutes:
*/2 * * * * find /home/crossref/sasdata/cayenne/data/feed-processed -mmin +60 -print |xargs rm -rf ; > /tmp/feed.log 2>&1

The feed-failed folder is not currently cleaned up; roughly 10 files end up in there daily, and it holds about 32K files at the time of writing.
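
A couple of quick checks on feed-failed (adjust the time window as needed):

    # Total number of failed feed files
    ls /home/crossref/sasdata/cayenne/data/feed-failed | wc -l

    # Failures from the last 24 hours
    find /home/crossref/sasdata/cayenne/data/feed-failed -type f -mmin -1440 | wc -l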

Force-push

Using jconsole, connect to cr6:8097. 
Under the qs.crmds package, there are a number of classes. The crmdsPushService has the operation for pushing citation id ranges. The pushes are batched in increments of 1000, so if you push fewer, they may not be sent until the next scheduled push fills up the batch.
The attributes will tell you whether the correct servers are set up to be pushed to.
"CBC" stands for "CitedByCount"; those operations are not the DOI pushes.