XML Updates and Transformations

The XML that members register with us may be changed in two ways after they register it:

  • Mutation: the stored data itself is changed, and the change can’t be automatically rolled back.
  • Transformation: the stored data is unchanged, but the query process adds something to the data in the query output.
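
The difference between the two can be sketched in a few lines of Python. This is a minimal illustration, not a description of how the system is implemented: the in-memory `store`, the function names, and the `<query-extra>` element are all hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical in-memory stand-in for stored XML, keyed by DOI.
store = {
    "10.5555/example": ET.fromstring("<record><title>Old title</title></record>"),
}

def mutate_title(doi, new_title):
    """Mutation: the stored record itself is overwritten."""
    store[doi].find("title").text = new_title

def query(doi):
    """Transformation: the output gains an element; the stored record does not."""
    output = copy.deepcopy(store[doi])
    extra = ET.SubElement(output, "query-extra")  # hypothetical element name
    extra.text = "added at query time"
    return output

mutate_title("10.5555/example", "New title")
result = query("10.5555/example")
```

After this runs, the stored record carries the new title permanently, while the `<query-extra>` element exists only in `result`.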

These changes fall into four groups: changes made as part of the deposit process, automated processes that run outside the deposit process, one-off changes that we make ourselves (such as a patch), and changes made at the point of querying, purely for the query output.

Changes to XML through the deposit process

Format transformations

If a member registers JATS XML with us, we transform it into our own schema.

XML merges

Users can deposit partial updates to existing XML. Different rules apply to different types of change:

  • Resource-only deposit
  • Adding/deleting clinical trial numbers
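
As a rough sketch of what a partial update means, the example below merges a resource-only update into a stored record. The actual merge rules are not described here, so everything in it is an assumption: the simplified element layout, the `<update>` wrapper, and the `merge_resource_only` function are hypothetical.

```python
import xml.etree.ElementTree as ET

def merge_resource_only(stored, update):
    """Apply a partial update that carries only a new resource URL."""
    new_resource = update.find("resource")
    old_resource = stored.find("doi_data/resource")
    if new_resource is not None and old_resource is not None:
        # Only the resource changes; every other element is left untouched.
        old_resource.text = new_resource.text
    return stored

# Simplified, hypothetical stored record and partial update.
stored = ET.fromstring(
    "<record><title>A title</title>"
    "<doi_data><doi>10.5555/example</doi>"
    "<resource>https://example.org/old</resource></doi_data></record>"
)
update = ET.fromstring("<update><resource>https://example.org/new</resource></update>")
merged = merge_resource_only(stored, update)
```

The point of the sketch is that the rest of the stored XML, such as the title, survives the update unchanged.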

Product-specific changes

When a complete deposit is made, the XML of the deposit may be changed by the deposit process itself. This includes:

  • Adding a funder ID: if the deposit has no funder ID, we attempt to insert one by fuzzy matching against the Funder Registry.
  • Moving funding data from one place to another: funding metadata can live in one of two places in the XML, either inside the Crossmark metadata (if Crossmark is present) or in a separate section (only if there is no Crossmark). Sometimes we have to move the funding metadata, for example if a Crossmark section appears for the first time and the DOI already had funding metadata outside it, or if the Crossmark section is being deleted but the funding should be kept.
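
Both changes can be sketched roughly as follows. The example makes several assumptions that are not taken from the real system: matching is done on funder names with Python's `difflib`, the registry excerpt and its IDs are placeholders, the 0.85 cutoff is arbitrary, and the XML uses simplified, un-namespaced `<article>`, `<crossmark>`, and `<funding>` elements.

```python
import difflib
import xml.etree.ElementTree as ET

# Hypothetical registry excerpt: funder name -> placeholder funder ID.
REGISTRY = {
    "National Science Foundation": "funder-id-1",
    "Wellcome Trust": "funder-id-2",
}

def match_funder_id(name, cutoff=0.85):
    """Return the ID of the closest registry name, or None if nothing is close enough."""
    close = difflib.get_close_matches(name, list(REGISTRY), n=1, cutoff=cutoff)
    return REGISTRY[close[0]] if close else None

def move_funding_into_crossmark(article):
    """Crossmark is present: funding found outside it is moved inside."""
    crossmark = article.find("crossmark")
    funding = article.find("funding")  # direct child only, i.e. outside Crossmark
    if crossmark is not None and funding is not None:
        article.remove(funding)
        crossmark.append(funding)
    return article

def remove_crossmark_keep_funding(article):
    """Crossmark is being deleted: funding inside it is moved out first."""
    crossmark = article.find("crossmark")
    if crossmark is not None:
        funding = crossmark.find("funding")
        article.remove(crossmark)
        if funding is not None:
            article.append(funding)
    return article
```

A misspelled name such as "National Science Fundation" would still resolve to the first registry entry, while a name with no close match returns `None` and no ID is inserted.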

Regular changes to XML outside the deposit process

  • Do we go back to check for Funder IDs?
  • Reference matching: splitting references and adding DOIs to references

“One-off” changes

These are one-off modifications to a large number of items that we need to manage carefully, so they can’t be made by the members themselves.

  • Large-scale updates for large publishers with millions of DOIs.
  • Bug fixing.
  • Changing title ownership.

Changes applied by the process of querying

These changes are made as a transformation to the data at query time. They aren’t stored anywhere (for example, in the blob store).

CRM Items

XML elements, including <crm-item> elements, are added to query results. See ‘DOI Info’ above. An update to one DOI may change the crm-items for another DOI. For example, adding a new reference link may update the cited-by count for a different DOI, so the relevant crm-item element is different the next time that DOI is queried. Similarly, adding a DOI that expresses a relationship with a different DOI means the relationship crm-item is different the next time that DOI is queried.

  • Cited-by
  • Relations, for example components added. (Note that this covers only relations specified directly in the XML through <program xmlns="http://www.crossref.org/relations.xsd"><related_item>. Other relations, such as citations or funding links, are not processed in this way.)
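
The cross-DOI effect on the cited-by count can be sketched as below. This is an illustration under stated assumptions, not the actual query pipeline: the `stored` and `references` dictionaries, the function names, and the `name` attribute value on the crm-item are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stored XML, keyed by DOI. It is never modified below.
stored = {
    "10.5555/a": "<record><doi>10.5555/a</doi></record>",
    "10.5555/b": "<record><doi>10.5555/b</doi></record>",
}
# Hypothetical index of reference links: citing DOI -> set of cited DOIs.
references = {}

def add_reference(citing, cited):
    """Record a reference link deposited with the citing DOI."""
    references.setdefault(citing, set()).add(cited)

def query(doi):
    """Build query output, computing the cited-by crm-item at query time."""
    record = ET.fromstring(stored[doi])
    count = sum(1 for cited in references.values() if doi in cited)
    item = ET.SubElement(record, "crm-item", {"name": "citedby-count"})
    item.text = str(count)
    return record

before = query("10.5555/a").find("crm-item").text
add_reference("10.5555/b", "10.5555/a")  # an update to a different DOI
after = query("10.5555/a").find("crm-item").text
```

Here the record for 10.5555/a is never touched, yet its query output changes because a reference was added to 10.5555/b.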