Checklist for Schema Changes

Every change to the schema is unique, but this checklist is a good skeleton procedure to follow. Not all points will apply to all changes, but you should consider each point. They are in chronological and dependency order, so later points rely on earlier points having been completed. Different tasks fall to the tech leads and product owners of a number of services across Crossref’s biome.

Hint: Checkboxes are included in this page. Copy and paste the source Markdown of this page into an Epic issue to tick them off. You can find the source easily by clicking ‘Edit this page’ at the bottom.
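
For reference, the task-list syntax as it appears in the page source looks like the following (the step names are placeholders); GitLab renders these as tickable checkboxes in an issue:

```markdown
- [ ] Step not yet done
- [x] Step completed
```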

Principles

This checklist is predicated on these principles:

  • Never add a retroactive constraint, for example reducing the maximum length of a field. Doing so would break backwards compatibility, meaning that we wouldn’t be able to automatically bump the schema bundle version for all emitted XML.
  • Our Schema Bundle comprises:
    • Deposit Schema: full metadata deposits. crossref<version>.xsd
    • Common: common elements used by other schemas. common<version>.xsd
    • Fundref: funding data. fundref<version>.xsd
    • Access Indicators: license data. AccessIndicators<version>.xsd
    • Relations: relationships between DOIs and other identifiers. relations<version>.xsd
    • Clinical Trials: relationships between publications that report on a clinical trial. clinicaltrials<version>.xsd
    • DOI Resources: used to append or update specific sets of metadata to an existing record. doi_resources<version>.xsd
    • Core: The whole metadata schema, used by the Deposit Schema, UniXref and query schema. New as of 5.0.0. crossref_core<version>.xsd
    • Query Input: Used to input XML queries to the system. Crossref_query_input<version>.xsd
    • Query Output: Returns Query Results. crossref_query_output<version>.xsd
    • UniXref: Returns query results in the UNIXML format. unixref<version>.xsd
    • OAI-PMH: Accommodates differences between Crossref’s OAI-PMH implementation and the published 2.0 schema. Unversioned. OAI-PMH.cr.xsd
  • The Schema bundle is versioned as a whole, according to Semantic Versioning.
  • For each version of the bundle, every versioned Schema within it has the same version.
    • e.g. for version 5.0.0, crossref5.0.0.xsd will reference crossref_core5.0.0.xsd, which will reference relations5.0.0.xsd.
    • Any change to any schema file will result in the whole bundle being bumped. So a new relation type would result in relations5.1.0.xsd, along with every other file using that version, and released as that version.
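
As an illustration of that version-locking, a fragment of what the deposit schema’s references might look like at 5.0.0 is sketched below. The XSD elements are standard; the namespace URIs are placeholders, and whether a given file is pulled in via include or import depends on whether it shares the target namespace, so check the real files rather than this sketch.

```xml
<!-- Illustrative fragment only: namespace URIs are placeholders, not the real ones. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.crossref.org/schema/5.0.0">

  <!-- Same-namespace pieces are pulled in with xsd:include... -->
  <xsd:include schemaLocation="crossref_core5.0.0.xsd"/>

  <!-- ...and separate vocabularies with xsd:import. Every schemaLocation is pinned
       to the same bundle version, so bumping one file bumps them all. -->
  <xsd:import namespace="http://www.crossref.org/relations"
              schemaLocation="relations5.0.0.xsd"/>
</xsd:schema>
```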

Step 1: Preparation

Planning

  • Create an implementation plan, based on this framework.
    • The list here should form an Epic, with sub points as user stories.
  • Are we planning to index new-version XML deposits in the REST API from the point at which we accept them, or are we going to index them at a later date?
    • If we plan to index immediately, REST API steps should be part of the whole project plan.
    • If not, they must still be considered so that we don’t introduce changes that will be subsequently problematic.
  • Are we planning to update our deposit tools to emit the new schema?
    • Web deposit form
    • Metadata Manager
    • POST Deposit
    • Synchronous deposit v1
    • Synchronous deposit v2

What kind of change, what does it represent, version number?

Aim to roll out schema changes one feature at a time, not as a big-bang release. This will mean splitting big changes up into small versions, each a useful cluster of features.

  • Each change to the Schema bundle follows Semantic Versioning. Decide on the new version number.
    • The schema bundle is versioned as a whole.
    • Each schema file within it will have the same version number.
    • Entirely new schema files will incur a bump to the Major version of the bundle.
  • Update to existing schema
    • Does this constitute a backward-incompatible breaking change to the schema? I.e. will documents that succeed with this schema fail with previous ones?
    • Does this introduce new enumerations (e.g. new relation types) that will require changes in code (either CS or Cayenne)?
    • Decide on the new version number for the schema, referring to Semantic Versioning.
  • Does this affect content types or container types in the Deposit System and/or REST API? Consult Content Types. Does this require:
    • a new container type / subtype from the perspective of Content System’s CitationInfo vocabulary?
    • a new type within the REST API vocabulary?
    • a new REST API type (alongside /works) in the REST API?
    • changes to the Query System / OAI-PMH tables?
    • changes to OpenURL?
    • changes to the list of types in ORCID Auto-update?
  • Does this affect constraints, for example changing the length of a field, or the checks we perform? Review the following (an illustrative example follows this list):
    • Deposit System database storage for any field-length updates. Pay attention to character encoding / collation.
    • Deposit System database storage to see if we need to update foreign key constraints or other relational issues in the SQL schema. e.g. CitedBy table.
    • Query System database storage for field lengths. Pay attention to character encoding / collation.
    • REST API Elasticsearch schema and whether any mappings will need to be updated.
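
For example, a field-length change might look like the following in MySQL-style SQL. The table and column names are invented for illustration, and the real DDL depends on the database in question:

```sql
-- Hypothetical example: widen a title column while keeping a multibyte-safe
-- character set and collation, so maximum-length multibyte test values still fit.
ALTER TABLE citation_metadata
  MODIFY COLUMN article_title VARCHAR(1024)
  CHARACTER SET utf8mb4
  COLLATE utf8mb4_unicode_ci;
```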

Finalize Schema and data files

  • Create a feature branch of the Schema repository named after the version, e.g. 5.0.0.
    • This will contain the schema files as they are being worked on. All work for this schema should be done on the branch.
    • Don’t merge this until everything is finalized at the end of this checklist.
  • Create a merge request against the master branch prefixed WIP (Work In Progress), e.g. WIP: 5.0.0.
  • Create a version-bumped copy of every schema file in the Schema Repository’s /schemas directory, e.g.:
    • crossref_core<version>.xsd (this will be introduced as a common piece between the deposit schema and the query schema as of version 5.0.0)
    • crossref<version>.xsd
    • common<version>.xsd
    • fundref<version>.xsd (will be versioned as of 5.0.0)
    • clinicaltrials<version>.xsd (will be versioned as of 5.0.0)
    • doi_resources<version>.xsd
    • Crossref_query_input<version>.xsd
    • crossref_query_output<version>.xsd
    • unixref<version>.xsd (will be versioned as of 5.0.0)
    • crossref_output<version>.xsd (will be versioned as of 5.0.0)
  • Place the release notes in /releasenotes/<version>.md.
  • The merge request description should reference the release notes file.
  • Generate the Help documentation manually using Oxygen. Upload to inf7 so that it’s served from http://data.crossref.org/reports/help/schema_doc/<version>/
  • Prepare a production release-ready version of the new version of the relevant XSD Schema file.
    • This doesn’t prevent it from being iterated and improved in response to testing.
  • Prepare a number of complete sample XML files in /examples/<version> that implement every useful feature of the schema. These will be used for member education.
  • Prepare a set of XML test files in /test/<version> that exercise every corner of the whole schema, or at least every change. These will be public and checked into source control, so they must be of the right quality. Corners:
    • Every field present.
    • Presence / absence of optional fields.
    • Representation differences due to XML mixed content.
    • Exercise the maximum and minimum length of any fields that have limits, using a selection of multibyte characters.
  • Code-review the example files with, e.g., the tech lead of the Deposit System or REST API to ensure that every feature / new feature is exercised.
  • Ensure that the Continuous Integration tests run correctly to indicate that the schema is well formed and that both the test files and the example files validate using the new schema (a standalone validation sketch follows this list).
  • Work with the Product Owner and Tech Lead of the REST API:
    • How to model in JSON?
    • Is this going to cause problems extending the existing JSON schema? Will this need a breaking change in the REST API?
    • For each example file generate the expected JSON representation that will be produced by the REST API. Attach these to an implementation ticket.
    • Expected JSON output for every sample XML file.
    • Reviewed by tech lead of REST API.
  • Review all Schematron rules, both advisory and compulsory. Write implementation tickets, along with test cases.
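
A minimal standalone sketch of the kind of validation the CI step performs is below, using the JDK’s javax.xml.validation API. The directory layout and entry-point schema file name are assumptions; the actual CI job may use different tooling.

```java
import java.io.File;
import java.nio.file.*;
import java.util.stream.Stream;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.*;

public class ValidateBundleExamples {
    public static void main(String[] args) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // The top-level deposit schema pulls in the rest of the bundle via its own
        // includes/imports, so validating against it exercises the whole bundle.
        Schema schema = factory.newSchema(new File("schemas/crossref5.0.0.xsd"));

        for (String dir : new String[] {"examples/5.0.0", "test/5.0.0"}) {
            try (Stream<Path> files = Files.list(Paths.get(dir))) {
                for (Path p : (Iterable<Path>) files::iterator) {
                    if (!p.toString().endsWith(".xml")) continue;
                    Validator validator = schema.newValidator();
                    validator.validate(new StreamSource(p.toFile())); // throws on invalid XML
                    System.out.println("valid: " + p);
                }
            }
        }
    }
}
```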

Effects on how our system parses affected XML documents

The following questions should be addressed:

  • Will the code adaptations required to implement the new schema make the code incompatible with the old schema, forcing a branch in the code?
    • Deposit system
    • REST API
  • Will the change in container, type and subtype require any retroactive changes?
    • Deposit system
    • REST API
  • Will any entities we keep track of in a SQL database (such as authors or journals) need to be updated?
  • Will this require changes to the Metadata Repository Pusher to enable them to be indexed in the REST API?
    • Does it need to be updated to push new documents? Depending on the decision, add to the immediate implementation plan or for later.
    • If we don’t expect to immediately push into the REST API, do we need to add functionality to retroactively push missed items?
  • Any patching tools that are used on a regular basis should be reviewed.
  • Review all reports to check if they will be affected.

How will this be queried?

  • Prefer to query, filter and search exclusively via the REST API.
  • List all fields that might be queryable.
    • Consider every new field and how it might be queried / searched / filtered / faceted.
    • Write user stories for each type of query. Prototype those queries to at least sense-check them (example prototypes below).
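
Illustrative prototypes against the public REST API are sketched below. The type filter, bibliographic query and type-name facet exist today; a filter for a newly added field is shown only as a placeholder.

```shell
# Existing patterns to model the prototypes on:
curl 'https://api.crossref.org/works?filter=type:journal-article&rows=5'
curl 'https://api.crossref.org/works?query.bibliographic=clinical+trial&rows=5'
curl 'https://api.crossref.org/works?rows=0&facet=type-name:*'

# Placeholder for a hypothetical filter on the new field (does not exist yet):
curl 'https://api.crossref.org/works?filter=new-field:some-value&rows=5'
```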

Step 2: Implementation

Implementation in CS Deposit System

  • Copy the schema files into the CS repository in the /web/schemas directory.
  • Copy the example and test XML files into /web/test-resources.
  • Write (or extend) unit tests that load the test XML files. That may involve a refactor.
    • At the least, exercise org.crossref.xs.util.CrossRefXmlUtil's assertValidXsdFile and validateXml (a test sketch follows this list).
  • Make any SQL changes if they’ve been identified.
  • Update the list of schema files in org.crossref.common.xml.ValidateTool.
  • Update the list of schema files in org.crossref.common.xml.CrossRefXmlParserPoolImpl.
  • Update the CURRENT_SCHEMA_VERSION property. This will be reflected in the classes below. (Note: in the current CS code, prior to the release of 5.0.0, these are hard-coded to reference the latest schema. From 5.0.0, all references to the schema in code that emits XML will use the most recent version. The list below will need to be updated to use this property as part of the initial roll-out.)
    • org.crossref.qs.view.UnixsdView
    • org.crossref.qs.view.DoiInfoView
    • org.crossref.xs.notifications.formatters.CrossrefResult2Formatter
    • org.crossref.qs.view.QSErrorXmlView
    • org.crossref.qs.controllers.CoaccessController
    • org.crossref.qs.controllers.GuestQueryController
    • org.crossref.qs.view.ForwardLinkview
  • If new relation types are being added, update org.crossref.ds.relations.RelationTypes
  • Review org.crossref.xs.utils.CrossRefXmlUtil.
  • Make any changes identified to depositing code, e.g. Metadata manager, Web Deposit Tool, patch tools.
  • Make any changes identified to code that uses the new schema changes.
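
A sketch of the kind of unit test meant above follows, assuming JUnit 4 and plain JDK validation; the resource paths are assumptions. In CS itself, prefer CrossRefXmlUtil's assertValidXsdFile / validateXml (their exact signatures are in the class and are not reproduced here) so the test exercises the production validation path.

```java
import java.io.File;
import java.nio.file.*;
import java.util.stream.Stream;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.*;
import org.junit.Test;

public class NewSchemaResourcesTest {

    // Placeholder JDK validation; swap in CrossRefXmlUtil's helpers in CS.
    @Test
    public void newTestResourcesValidateAgainstNewSchema() throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new File("web/schemas/crossref5.0.0.xsd"));

        try (Stream<Path> files = Files.list(Paths.get("web/test-resources/5.0.0"))) {
            for (Path p : (Iterable<Path>) files::iterator) {
                if (!p.toString().endsWith(".xml")) continue;
                schema.newValidator().validate(new StreamSource(p.toFile()));
            }
        }
    }
}
```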

Implementation of Query in REST API

  • If any new schema features are required, implement them, probably in cayenne.formats.unixref.
  • Ensure all of the test and example XML files can be parsed correctly by Cayenne, without error and as expected.
    • Copy all test and example files into /dev-resources/parser-regression.
    • Follow the procedure (documented in Cayenne) to produce Item Tree EDN files for parser regression.
    • Follow the procedure for producing Item Tree to JSON files.
    • For each EDN and JSON file, check carefully against the expected files produced in the planning stage.
  • Implement and test all facets, filters and search queries.
  • If necessary review the list of relation types in cayenne.data.relations.
  • If necessary review the list of types in cayenne.ids.type.
  • Review API documentation in Swagger and related text.

Education Site

  • If needed, update the education site to include the new content type.
  • Update wording in https://www.crossref.org/education/content-registration/crossrefs-metadata-deposit-schema/schema-versions/
  • Update table in https://www.crossref.org/education/content-registration/crossrefs-metadata-deposit-schema/crossref-xsd-schema-quick-reference/
    • The Schema XSD link in the table should point to the schema file in the master branch.
    • The ‘Further Info’ link should point to the Oxygen Docs.

Step 3: Release

Pre-release

  • Deploy the feature branch of CS to the test sandbox and to production.
  • Deploy the new version of the REST API.
  • Merge, tag, release the feature branch of the Schemas repo. Check the master branch has the latest version numbers.
  • Smoke test with at least some of the more representative test XML files (example commands after this list).
    • Smoke-test deposits in test sandbox. Deposit through to retrieval via e.g. OpenURL.
    • Smoke-test deposits in production. Deposit through to retrieval via e.g. OpenURL.
    • Ensure that production-deposited files are indexed in the REST API. Retrieve directly by DOI and / or query and filter.
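
Illustrative smoke-test commands are below, assuming the HTTPS POST deposit endpoint; the example file path, credentials and DOI are placeholders, and the hostname should be switched for the test sandbox as appropriate.

```shell
# Deposit one of the example files (placeholder path and credentials):
curl -F 'operation=doMDUpload' \
     -F 'login_id=ROLE' -F 'login_passwd=PASSWORD' \
     -F 'fname=@examples/5.0.0/journal-article.xml' \
     'https://doi.crossref.org/servlet/deposit'

# Once indexed, retrieve directly from the REST API by DOI (placeholder DOI):
curl 'https://api.crossref.org/works/10.5555/example-doi'
```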

Release

  • Publish changes to the education site.
  • Make some kind of noise on social media.
  • Invite members to try depositing.

Finally

  • If there are bugs to iron out after roll-out, fixing them constitutes a change to the schema and should be released as a new version following Semantic Versioning. The previous schema version must remain as-is.