DR-0003: Simplify Handle prefix auth

Decision made around 2016. This write-up is dated 2022-05-11.

Context

Put simply, the problem was that we recorded individual credentials for each new member and had to keep them in sync with the Handle registry. This was a major source of toil as we expanded our membership.

This decision concerns the simplification of the way that Crossref authenticates administrative access to the Handle Registry (e.g. registering and updating) for DOIs for which we are the registration agency.

This change was made in 2016 but there is no available decision record as of 2022. This document was written up in the context of investigating CR-209 and CR-229 from 2022-05-06. There are examples of Handle metadata which may become out of date, but the relevant data is quoted here.

Prefixes and Naming Authorities

In this document the word ‘Handle’ is used in the context of the Handle Registry system. The normative documents for terms are IETF RFC 3650 and IETF RFC 3651.

DOIs are a subset of Handles. Although DOIs are the main type of Handle that we deal with, we also deal with non-DOI Handles. The terms are not interchangeable and are (hopefully!) used accurately in this document.

Handles (and therefore DOIs) have a syntactic prefix, i.e. the bit before the slash, e.g. the 10.5555 in 10.5555/12345678. The prefix of a DOI is intrinsic to that DOI (the suffix can only be interpreted in the context of the prefix).

The Handle System has a number of Naming Authority Handles, e.g. 0.na/10.1016. These correspond directly to the syntactic prefix of the DOI. For every distinct DOI prefix there is one Naming Authority Handle.
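The correspondence between a Handle's syntactic prefix and its Naming Authority Handle can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical and are not taken from Content System or any Handle client library; the `0.na/` form is simply the one shown in the example above.

```python
def split_handle(handle: str) -> tuple[str, str]:
    """Split a Handle into (prefix, suffix) at the first slash.

    The suffix may itself contain slashes, so only the first one counts.
    """
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix Handle: {handle!r}")
    return prefix, suffix


def naming_authority_handle(handle: str) -> str:
    """Return the Naming Authority Handle for a Handle's syntactic prefix."""
    prefix, _suffix = split_handle(handle)
    return f"0.na/{prefix}"
```

For instance, `naming_authority_handle("10.5555/12345678")` yields `0.na/10.5555`.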

One of the roles of the Naming Authority is during resolution, to discover the authoritative Handle server to retrieve handle metadata. Because the Naming Authority handle corresponds to the prefix of the Handle being resolved it’s not possible (or meaningful) to change this connection.

The handle namespace can be considered a superset of many local namespaces, with each local namespace having a unique naming authority under the Handle System. The naming authority identifies the administrative unit of creation, although not necessarily continuing administration, of the associated handles. RFC3650§3

We attach other Handle metadata to Naming Authority handles, such as Content Negotiation and DUL link headers. An example is index code 20:

20 HS_NAMESPACE

<HS_NAMESPACE>  
<DOI.RA>10.SERV/CROSSREF</DOI.RA>  
<locs>10.SERV/CROSSREF</locs>  
<locs>10.SERV/CROSSREF.ELSEVIER_LINK_HEADER</locs>  
</HS_NAMESPACE>

https://hdl.handle.net/0.na/10.1016?noredirect=true

This indicates that DOIs on this prefix should defer to 10.SERV/CROSSREF for <locs>. A glance there at index 4 shows:

4 10320/loc

<locations http_sc="302">  
<location weight="0" http_role="conneg" href_template="https://api.crossref.org/v1/works/{hdl}/transform" />  
</locations>

https://hdl.handle.net/10.SERV/CROSSREF?noredirect=true

Content Negotiation is included here as an aside to illustrate that Naming Authority Handles are used for a variety of purposes and not just auth.
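As a rough illustration of how the two quoted values fit together, the Python sketch below reads the `<locs>` entries from the HS_NAMESPACE value and fills in the `{hdl}` placeholder of a `conneg` location. It is not how the Handle proxy is implemented: the function names are hypothetical, and the plain string substitution (no URL encoding, no weighting) is an assumption made for brevity.

```python
import xml.etree.ElementTree as ET

# Values quoted above, copied as plain strings.
HS_NAMESPACE = """<HS_NAMESPACE>
<DOI.RA>10.SERV/CROSSREF</DOI.RA>
<locs>10.SERV/CROSSREF</locs>
<locs>10.SERV/CROSSREF.ELSEVIER_LINK_HEADER</locs>
</HS_NAMESPACE>"""

LOCATIONS = """<locations http_sc="302">
<location weight="0" http_role="conneg" href_template="https://api.crossref.org/v1/works/{hdl}/transform" />
</locations>"""


def locs_handles(hs_namespace_xml: str) -> list[str]:
    """List the Handles that a Naming Authority record defers to for <locs>."""
    root = ET.fromstring(hs_namespace_xml)
    return [el.text.strip() for el in root.findall("locs") if el.text]


def conneg_urls(locations_xml: str, handle: str) -> list[str]:
    """Fill the {hdl} placeholder of every conneg location (assumed literal)."""
    root = ET.fromstring(locations_xml)
    return [
        loc.attrib["href_template"].replace("{hdl}", handle)
        for loc in root.findall("location")
        if loc.get("http_role") == "conneg" and "href_template" in loc.attrib
    ]
```

Calling `conneg_urls(LOCATIONS, "10.1016/j.pupt.2022.102128")` gives a single URL under `https://api.crossref.org/v1/works/`.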

Crossref “Owner Prefix”

The Naming Authority is a domain object that we attach meaning to. Originally Crossref issued one prefix to each ‘publisher’ and therefore assigned a Naming Authority to each ‘publisher’. This is the concept of a “Crossref Owner Prefix”. We attached other internal metadata such as a “Publisher Name” to this, and used it in billing and reporting.

As of 2016 (and at time of writing in 2022) we assign a new Owner Prefix (and with it a Naming Authority Handle) to every joining Member. But we do not assume that there is a one-to-one mapping.

This has become a source of confusion, as publishers have merged and split, meaning that some Crossref Members have multiple Owner Prefixes on their account (e.g. Elsevier) and in other cases they might share them. This confusion is, however, beyond the scope of this document!

HS_ADMIN

The HS_ADMIN value in Handle is used to denote which identity has permission to perform which administrative functions, see RFC 3651§3.2.1.

If a given DOI has an HS_ADMIN field that points to a Naming Authority handle, then authorisation is delegated to an identity connected to that Naming Authority. E.g. 10.1016/j.pupt.2022.102128 has the following entry:

100 HS_ADMIN

handle=0.na/10.1016; index=200; [delete hdl,read val,modify val,del val,add val,modify admin,del admin,add admin,list]

https://hdl.handle.net/10.1016/j.pupt.2022.102128?noredirect=true

i.e. the DOI 10.1016/j.pupt.2022.102128 should defer to the identities on 0.na/10.1016, index code 200 for auth. Looking there tells us:

200 HS_VLIST

300:10.cradmin/shillum

https://hdl.handle.net/0.na/10.1016?noredirect=true

i.e. the 10.cradmin/shillum identity.
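To make the delegation chain easier to follow, here is a small Python sketch that parses the two textual values quoted above. It works only on the text as rendered by the proxy's `noredirect` page, which is an assumption about presentation, not the binary HS_ADMIN encoding defined in RFC 3651. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdminRef:
    handle: str                   # Handle that auth is delegated to
    index: int                    # index on that Handle holding the identities
    permissions: tuple[str, ...]  # permission labels as displayed


def parse_hs_admin(text: str) -> AdminRef:
    """Parse 'handle=...; index=...; [perm,perm,...]' as shown above."""
    head, _, perms = text.partition("[")
    fields = {}
    for part in head.split(";"):
        key, sep, value = part.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    permissions = tuple(
        p.strip() for p in perms.rstrip("] \t").split(",") if p.strip()
    )
    return AdminRef(fields["handle"], int(fields["index"]), permissions)


def parse_vlist_entry(text: str) -> tuple[int, str]:
    """Parse an HS_VLIST entry such as '300:10.cradmin/shillum'."""
    index, sep, handle = text.partition(":")
    if not sep:
        raise ValueError(f"not an index:handle entry: {text!r}")
    return int(index), handle
```

Parsing the HS_ADMIN value of 10.1016/j.pupt.2022.102128 gives a reference to index 200 on 0.na/10.1016, and parsing the entry found there gives index 300 on 10.cradmin/shillum.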

When Crossref registers a DOI we add an HS_ADMIN value at index code 100 which points to its prefix at the time of creation. This denotes that the bearer of an identity with authorization for that Naming Authority has permission to subsequently update metadata on that handle (for the most part this metadata is just the resolution URL but there are other fields, such as multiple resolution).

The ability to change HS_ADMIN to point to another Naming Authority lets us switch the authorization of an individual DOI to another owner upon transfer. We do this by assigning the DOI to a different Owner Prefix within Content System, and then updating the Handle record to switch the HS_ADMIN from the old prefix to the new one.

Crossref stores Handle credentials, within Content System, against each Owner Prefix. These correspond to Handle Identities (e.g. 10.cradmin/shillum) for the corresponding Naming Authority records. Whenever we register or modify a DOI in Handle we retrieve these credentials from our secret store for the corresponding Owner Prefix and use them to authorize the request.

Dragons

There is a source of confusion here. DOIs have at least two distinct types of connections to Naming Authority Records because those Naming Authority records are used for at least two kinds of purposes:

  1. Authoritative query for Handle metadata, e.g. authoritative Handle server, Content Negotiation locations
  2. Authorization to perform admin functions, e.g. updating metadata on individual DOIs, such as Resource URLs

In the first case a DOI always points to the Naming Authority record that corresponds to its prefix. In the second case there is no intrinsic link, and that link is made only by the presence of the HS_ADMIN value. If that HS_ADMIN is not set for the DOI then the Naming Authority record will not be consulted for auth.
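The difference between the two connections can be shown with a short Python sketch. A Handle record is modelled here as a plain dictionary from index to (type, referenced handle); that shape, and both function names, are assumptions made for illustration only.

```python
def resolution_authority(doi: str) -> str:
    """Connection 1: always derived from the DOI's own prefix."""
    return "0.na/" + doi.split("/", 1)[0]


def auth_authorities(values: dict[int, tuple[str, str]]) -> list[str]:
    """Connection 2: only whatever HS_ADMIN values name, possibly nothing."""
    return [
        target
        for _index, (value_type, target) in sorted(values.items())
        if value_type == "HS_ADMIN"
    ]


# Hypothetical records: one with an HS_ADMIN value, one without.
with_admin = {100: ("HS_ADMIN", "0.na/10.1016")}
without_admin = {1: ("URL", "https://example.org/")}
```

`resolution_authority` never looks at the record, while `auth_authorities(without_admin)` returns an empty list: with no HS_ADMIN value, no Naming Authority is consulted for auth.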

Although the HS_ADMIN superficially indicates “who can edit this record” within the vocabulary of the Handle Registry, the actual access control is performed within Crossref prior to contacting Handle, and this access control is done via “publisher groups”, i.e. many-to-many mappings between Publisher Users (since renamed Legacy Roles) and Owner Prefixes.

Because Crossref always looks up the appropriate credential for the prefix, the per-prefix Handle credential provides no extra level of security (assuming that the records’ HS_ADMIN values are in sync between Handle and Content System).
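A minimal Python sketch of that ordering follows. Everything in it is hypothetical (the user names, the placeholder secrets, the function names); it is not Content System code. It only illustrates that the publisher-group check is the real gate, and the credential lookup that follows is mechanical.

```python
# Hypothetical data: (user, owner prefix) pairs model the many-to-many
# "publisher groups"; the secrets are placeholders, not real credentials.
PUBLISHER_GROUPS = {
    ("user-a", "10.1016"),
    ("user-a", "10.1093"),
    ("user-b", "10.5555"),
}
CREDENTIALS = {
    "10.1016": "secret-1016",
    "10.1093": "secret-1093",
    "10.5555": "secret-5555",
}


def may_edit(user: str, owner_prefix: str) -> bool:
    """Access control decided inside Crossref, before Handle is contacted."""
    return (user, owner_prefix) in PUBLISHER_GROUPS


def credential_for_update(user: str, owner_prefix: str) -> str:
    """Return the Handle credential only once the internal check has passed."""
    if not may_edit(user, owner_prefix):
        raise PermissionError(f"{user} may not edit {owner_prefix}")
    # Whoever gets this far is handed the matching credential automatically,
    # so the credential itself adds no further check.
    return CREDENTIALS[owner_prefix]
```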

Some people assume that the HS_ADMIN denotes some structural information (e.g. this DOI “belongs” to Member X) when in fact it indicates only which party is able to perform administrative functions. The “ownership” metadata is recorded entirely outside of the Handle Registry and although it correlates with HS_ADMIN it does not denote ‘ownership’ of the DOI (or stewardship of the metadata or content).

Methods

As of 2016 there are a number of methods (relevant to Crossref’s purposes) for authorizing changes to handle metadata for a given Content Item’s DOI.

The first method is the HS_ADMIN value on the DOI. This points to an index (usually 200) on a Naming Authority record. For example 10.1093/bja/45.4.363 has an HS_ADMIN entry that points to 0.na/10.1016 (a prefix associated with Elsevier).

100 HS_ADMIN
handle=0.na/10.1016; index=200; [delete hdl,read val,modify val,del val,add val,modify admin,del admin,add admin,list]

https://hdl.handle.net/10.1093/bja/45.4.363?noredirect=true

The fact that the HS_ADMIN points to a different Naming Authority (10.1016) than the one that is associated with the DOI’s literal prefix (10.1093) indicates that there has been a change in ownership during that DOI’s lifetime.

Another example, 10.1016/j.pupt.2022.102128, has an HS_ADMIN that points to 0.na/10.1016, which means that it has not been transferred during its lifetime.

100 HS_ADMIN
handle=0.na/10.1016; index=200; [delete hdl,read val,modify val,del val,add val,modify admin,del admin,add admin,list]

https://hdl.handle.net/10.1016/j.pupt.2022.102128?noredirect=true

This means that anyone bearing the credentials for that Naming Authority can modify the DOI. Content System retrieves these on each access.
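The comparison used in the two examples above can be written as a short Python check. The function names are hypothetical. The check mirrors only the reasoning in this section, so it says nothing about DOIs that were transferred away and later transferred back.

```python
from typing import Optional


def literal_prefix(doi: str) -> str:
    """The syntactic prefix of a DOI, i.e. the part before the first slash."""
    return doi.split("/", 1)[0]


def admin_prefix(admin_handle: str) -> Optional[str]:
    """Prefix named by an HS_ADMIN target of the form '0.na/<prefix>'."""
    na, _sep, prefix = admin_handle.partition("/")
    if na != "0.na" or not prefix:
        return None  # not a Naming Authority Handle
    return prefix


def appears_transferred(doi: str, admin_handle: str) -> bool:
    """True when HS_ADMIN names a different prefix than the DOI's own."""
    prefix = admin_prefix(admin_handle)
    return prefix is not None and prefix != literal_prefix(doi)
```

With the values quoted above, `appears_transferred("10.1093/bja/45.4.363", "0.na/10.1016")` is true and `appears_transferred("10.1016/j.pupt.2022.102128", "0.na/10.1016")` is false.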

The second method is a server-level credential which allows editing of Handles. The server that Crossref connects to is designated to be the Primary for all of “our” DOIs.

Problem statement

The setup as of 2016 involves toil and does not suit our ever growing membership. Specifically, to maintain the model we must maintain:

  • Handle Credentials in Content System in sync with Handle Identities
  • “Owner Prefix” links in Content System in sync with HS_ADMIN values on all individual DOIs.
  • Synchronous coordination when we onboard a new member and create a new Owner Prefix, and credentials along with it.
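To give a feel for the second point, this is the kind of check that would be needed to confirm the two systems agree. It is a sketch in Python over hypothetical in-memory data. We are not describing an existing tool, and the function name and the dictionary shapes are assumptions.

```python
def find_out_of_sync(
    owner_prefix_by_doi: dict[str, str],
    hs_admin_by_doi: dict[str, str],
) -> list[str]:
    """DOIs whose Owner Prefix in Content System disagrees with HS_ADMIN.

    owner_prefix_by_doi: DOI -> Owner Prefix recorded in Content System.
    hs_admin_by_doi:     DOI -> Handle named by HS_ADMIN in the registry.
    """
    mismatched = []
    for doi, owner_prefix in owner_prefix_by_doi.items():
        expected = f"0.na/{owner_prefix}"
        if hs_admin_by_doi.get(doi) != expected:
            mismatched.append(doi)
    return sorted(mismatched)
```

Run across every DOI, a comparison like this is exactly the sort of ongoing maintenance the model obliges us to do.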

There are no benefits to keeping this model, as separate credentials do not offer any additional security over our own checks.

In the past some members wished to control their own Handle credentials in case they wished to perform actions directly against Handle. This appears to have been a motivating factor early on in Crossref’s history but no longer applies.

Decision Drivers

  • Reduction of toil and possibility for manual error.
  • Reduction in complexity which might result in synchronization issues.
  • Reduction of complexity which will help future technical debt reduction.
  • Overall reduction in friction for our ever growing membership.

Considered Options

There are two places where we must make changes: Handle and Crossref’s Content System. Each has its own options.

Handle Option 1: Server-wide credentials on 10.SERV/CROSSREF

With this option we install server-wide credentials that directly allow access to any DOI homed on the Crossref Handle server. This allows Crossref to store only a single credential, which makes it easier to rotate secrets.

All “Crossref DOIs” are homed on Crossref’s server, i.e. it is designated the Authoritative server for those DOIs. This indication is made via the DOI’s corresponding Naming Authority record, which indicates 10.SERV/CROSSREF at index 1. Crossref is by definition entitled to perform administrative actions on any of the DOIs on this server.

This also means we could remove entirely the HS_ADMIN field from DOIs, representing a significant reduction in complexity. But we are not obliged to take this extra step.

Handle Option 2: Duplicate credentials

In this model there are no structural changes except that the generic Crossref identity is used. DOIs still defer, via an HS_ADMIN at index 100, to the Naming Authority record, and that Naming Authority record still has an HS_VLIST value at index 200. However, the identity in that value is the generic 10.cradmin/cruser, not the specific one we would have issued for that prefix.

Pros:

  • We gain the toil reduction of synchronous setups for new members.
  • (We avoid having to retrofit this to older members. At some point some would have wanted to retain their credentials, but this is no longer true.)

Cons:

  • We don’t take advantage of a single credential, as we store the same value multiple times.
  • It’s difficult to change in bulk.

Handle Option 3: DOI HS_ADMIN to 10.SERV/CROSSREF

We already have a Crossref-wide identity registered on the 10.SERV/CROSSREF record, e.g.

200 HS_VLIST

... « entries omitted » ...
300:10.cradmin/cruser  
... « entries omitted » ...

https://hdl.handle.net/10.SERV/CROSSREF?noredirect=true

For this option we would update each DOI record, switching the HS_ADMIN pointer away from the Naming Authority to Crossref directly. E.g. this:

100 HS_ADMIN

handle=0.na/10.24254; index=200; [delete hdl,read val,modify val,del val,add val,modify admin,del admin,add admin,list]

https://hdl.handle.net/10.24254/cnib.21.42?noredirect=true

would become this:

100 HS_ADMIN

handle=10.SERV/CROSSREF; index=200; [delete hdl,read val,modify val,del val,add val,modify admin,del admin,add admin,list]
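The rewrite itself is mechanical. As a sketch, in Python and operating only on the textual form quoted above (the real update would go through an authenticated Handle client, which is not shown), a hypothetical helper could look like this:

```python
import re

# Matches the leading 'handle=<target>' field of the displayed HS_ADMIN value.
_HANDLE_FIELD = re.compile(r"^(\s*handle=)[^;]+")


def retarget_hs_admin(value: str, new_handle: str) -> str:
    """Point an HS_ADMIN value at new_handle, leaving index and permissions."""
    updated, count = _HANDLE_FIELD.subn(
        lambda m: m.group(1) + new_handle, value, count=1
    )
    if count == 0:
        raise ValueError("no handle= field found in HS_ADMIN value")
    return updated
```

Applied to the first value with `new_handle="10.SERV/CROSSREF"` this produces the second value. The cost of this option lies in repeating it for every DOI, not in the rewrite.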

An important result to consider is that we would be removing any indication of “ownership” from each DOI within the Handle Registry. After this change, each DOI would have an HS_SERV link back to Crossref, but no link to its ‘owner’. Even though the prior HS_ADMIN configuration is not an accurate indication, the change would need to be communicated to avoid confusion.

Pros:

  • Correct use of HS_ADMIN, and the change in values would accurately reflect the auth semantics.
  • Easy to switch back at a later date.

Cons:

  • Means updating every DOI, all 100 million of them.
  • Noisy. Every DOI must link to Crossref as its administrator, which is a lot of data for no extra information. Handle Option 1 would be much cleaner.

Content System Option 1: Universal credential

This single credential could be stored as a single secret in our store. All Handle access could use this credential. We remove the concept of individual credentials for Owner Prefixes / Naming Authorities.

Compatible with Handle Option 1, as it corresponds to a server-wide credential.

Compatible with Handle Option 2, as long as we retrofitted the credential to all historical prefixes.

Pros:

  • This would reduce complexity and open the door to further simplifications of the Owner Prefix model.
  • This would reduce toil.
  • It would be easy to rotate secrets.

Cons:

  • Substantial code changes with a high impact if it goes wrong. This is stable code and there is no good way of automatically regression-testing.
  • Prevents individual prefixes from diverging from the standard credential. There’s no evidence that we still need to be able to do this, but this would have been a problem earlier in our history.

Content System Option 2: New credential for new Owner Prefixes but don’t retrofit

The 10.cradmin/cruser credential is added to each newly created Owner Prefix, meaning that new members are given the same Handle credentials. It is effectively duplicated for each new Prefix. From the perspective of the Content System, though, there is no change, only duplicated credential values.

Compatible with Handle Option 1 as long as we don’t remove the older identities from Handle.

Compatible with Handle Option 2 if we keep all values in sync.

Pros:

  • Simple to implement. No code change required.
  • If any members still have credentials and need to use them, they can.

Cons:

  • No easy way to update if the credential needs to change.
  • Inconsistently applied, with no easy way to tell if an existing member uses the new credential or not.
  • Requires the old HS_ADMIN setup to be maintained for older prefixes.
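The difference in rotation effort between the two Content System options can be sketched as follows. This is Python over hypothetical in-memory stores. The key name, the store shapes and the function names are all assumptions, and the real secret store is not modelled.

```python
def rotate_single(store: dict[str, str], new_secret: str) -> int:
    """Option 1 shape: one secret under one (hypothetical) key."""
    store["handle-credential"] = new_secret
    return 1  # number of entries written


def rotate_duplicated(
    store: dict[str, tuple[str, str]],
    shared_identity: str,
    new_secret: str,
) -> list[str]:
    """Option 2 shape: Owner Prefix -> (identity, secret).

    Every prefix must be scanned to find out which ones carry the shared
    identity; only those are rewritten.
    """
    touched = []
    for prefix, (identity, _old_secret) in list(store.items()):
        if identity == shared_identity:
            store[prefix] = (identity, new_secret)
            touched.append(prefix)
    return sorted(touched)
```

With Option 2 the scan is also the only way to learn which prefixes use the shared identity, which is the inconsistency noted in the cons above.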

Decision Outcome

  • We chose Handle Option 1, i.e. a credential was installed at the root level of the server denoted by 10.SERV/CROSSREF.
  • We chose Content System Option 2, i.e. simply duplicated the credential for all new Owner Prefixes.
  • We chose not to update historical Handle records, and correspondingly did not update historical Owner Prefixes.

Positive Consequences

  • The objective of toil reduction was met.
  • No bugs or regressions were introduced, as no code was changed.

Negative Consequences

We only did part of the job, solving only the narrow objectives. Therefore the following situation still applies:

  • No easy way to update if the credential needs to change.
  • Inconsistently applied, with no easy way to tell if an existing member uses the new credential or not.
  • Requires the old HS_ADMIN setup to be maintained in place for older prefixes.
  • Divergence amongst DOIs and amongst Naming Authority records can cause confusion and maintainability issues e.g. making it harder to debug problems which do crop up.

We did not change any code to reflect the model, resulting in meaningless HS_ADMIN data in Handle.

e.g. this affected DOI:

100 HS_ADMIN

handle=0.na/10.24254; index=200; [delete hdl,read val,modify val,del val,add val,modify admin,del admin,add admin,list]

https://hdl.handle.net/10.24254/cnib.21.42?noredirect=true

But if we follow that link to the relevant Naming Authority we see nothing at index 200:  https://hdl.handle.net/0.na/10.24254?noredirect=true . The HS_ADMIN value on the DOI is therefore impossible to follow.

Prior to the change this would have pointed to a VLIST. Here’s an old, unaffected DOI:

100 HS_ADMIN

handle=0.na/10.1016; index=200; [delete hdl,read val,modify val,del val,add val,modify admin,del admin,add admin,list]

https://hdl.handle.net/10.1016/j.pupt.2022.102128?noredirect=true

If we follow the Naming Authority we see the identity list at index 200:

200 HS_VLIST

300:10.cradmin/shillum

https://hdl.handle.net/0.na/10.1016?noredirect=true

These meaningless HS_ADMIN entries are created because the Content System behaviour still expects authorization to be performed via the HS_ADMIN delegation. It isn’t, and this link no longer serves its purpose.
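A dangling HS_ADMIN of this kind could be detected with a check along these lines. It is a Python sketch over a hypothetical snapshot of records, shaped as handle -> {index: (type, data)}. Only the values quoted in this document are included, and the function name is an assumption.

```python
# Hypothetical snapshot; values other than those quoted above are omitted.
RECORDS = {
    "0.na/10.1016": {200: ("HS_VLIST", ["300:10.cradmin/shillum"])},
    "0.na/10.24254": {},  # nothing at index 200
}


def admin_target_exists(
    admin_handle: str,
    admin_index: int,
    records: dict[str, dict[int, tuple[str, list[str]]]],
) -> bool:
    """True when the value an HS_ADMIN points at is actually present."""
    return admin_index in records.get(admin_handle, {})
```

Here `admin_target_exists("0.na/10.1016", 200, RECORDS)` holds, while the same check for `0.na/10.24254` does not, matching the two DOIs shown above.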

Conclusion

This change did its job, but bugs such as CR-209, CR-229 and others, along with the continued introduction of meaningless HS_ADMIN records, mean that we are reconsidering this decision in 2022 and will investigate revisiting Content System Option 1.

Last modified May 28, 2024: docs: lots of small updates (cc74111)