Every change to the schema is unique, but this checklist is a good skeleton procedure to follow. Not all points will apply to all changes, but you should consider each point. They are in chronological and dependency order, so later points rely on earlier points having been completed. Different tasks will fall to the tech leads and product owners of a number of services across Crossref’s biome.
Hint: Checkboxes are included in this page. Copy and paste the source Markdown of this page into an Epic issue to tick them off. You can find this easily by clicking ‘Edit this page’ at the bottom.
Principles
This checklist is predicated on these principles:
- Never add a retroactive constraint, for example reducing the maximum length of a field. Doing so would break backwards compatibility, meaning that we wouldn’t be able to automatically bump the schema bundle version for all emitted XML.
- Our Schema Bundle comprises:
- Deposit Schema: full metadata deposits. `crossref<version>.xsd`
- Common: common elements used by other schemas. `common<version>.xsd`
- Fundref: funding data. `fundref<version>.xsd`
- Access Indicators: license data. `AccessIndicators<version>.xsd`
- Relations: relationships between DOIs and other identifiers. `relations<version>.xsd`
- Clinical Trials: relationships between publications that report on a clinical trial. `clinicaltrials<version>.xsd`
- DOI Resources: used to append or update specific sets of metadata to an existing record. `doi_resources<version>.xsd`
- Core: The whole metadata schema, used by the Deposit Schema, UniXref and query schema. New as of 5.0.0. `crossref_core<version>.xsd`
- Query Input: Used to input XML queries to the system. `Crossref_query_input<version>.xsd`
- Query Output: Returns Query Results. `crossref_query_output<version>.xsd`
- UniXref: Returns query results in the UNIXML format. `unixref<version>.xsd`
- OAI-PMH: Accommodates differences between Crossref’s OAI-PMH implementation and the published 2.0 schema. Unversioned. `OAI-PMH.cr.xsd`
- The Schema bundle is versioned as a whole, according to Semantic Versioning.
- For each version of the bundle, every versioned Schema within it has the same version.
- e.g. for version `5.0.0`, `crossref5.0.0.xsd` will reference `crossref_core5.0.0.xsd`, which will reference `relations5.0.0.xsd`.
- Any change to any schema file will result in the whole bundle being bumped. So a new relation type would result in `relations5.1.0.xsd`, along with every other file using that version, and released as that version.
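The bundle-wide versioning rule can be pictured with a short sketch. The Python below is not part of any Crossref codebase: the function `bundle_filenames` and both constants are hypothetical names, and the file stems are simply those listed in the principles above.

```python
# Illustrative sketch only: the names here are hypothetical, not Crossref code.

# Stems of the versioned schema files listed in the bundle.
VERSIONED_STEMS = [
    "crossref",
    "common",
    "fundref",
    "AccessIndicators",
    "relations",
    "clinicaltrials",
    "doi_resources",
    "crossref_core",
    "Crossref_query_input",
    "crossref_query_output",
    "unixref",
]

# OAI-PMH is described as unversioned, so its file name never changes.
UNVERSIONED_FILES = ["OAI-PMH.cr.xsd"]


def bundle_filenames(version: str) -> list[str]:
    """Return every file name in the bundle for one bundle version."""
    versioned = [f"{stem}{version}.xsd" for stem in VERSIONED_STEMS]
    return versioned + UNVERSIONED_FILES


if __name__ == "__main__":
    # A new relation type bumps the whole bundle, not just the relations file.
    for name in bundle_filenames("5.1.0"):
        print(name)
```

Every versioned file carries the same version string, which is why a change to one file produces a new name for all of them.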
Step 1: Preparation
Planning
- Create an implementation plan, based on this framework.
- The list here should form an Epic, with sub points as user stories.
- Are we planning to index new-version XML deposits in the REST API from the point in time of accepting them, or are we going to index at a later date?
- If we plan to index immediately, REST API steps should be part of the whole project plan.
- If not, they must still be considered so that we don’t introduce changes that will be subsequently problematic.
- Are we planning to update our deposit tools to emit the new schema?
- Web deposit form
- Metadata Manager
- POST Deposit
- Synchronous deposit v1
- Synchronous deposit v2
What kind of change, what does it represent, version number?
Aim to roll out schema changes one feature at a time, not as a big-bang release. This will mean splitting big changes up into small versions made of useful clusters.
- Each change to the Schema bundle follows Semantic Versioning. Decide on the new version number.
- The whole schema bundle is versioned as a whole.
- Each schema file within it will have the same version number.
- Entirely new schema files will incur a bump to the Major version of the bundle.
- Update to existing schema
- Does this constitute a backward-incompatible breaking change to the schema? I.e. will documents that validate against previous versions fail against this one?
- Does this introduce new enumerations (e.g. new relation types) that will require changes in code (either CS or Cayenne)?
- Decide on the new version number for the Schema, referring to Semantic Versioning.
- Does this affect content types or container types in the Deposit System and/or REST API? Consult Content Types. Does this require:
- a new container type / subtype from the perspective of Content System’s `CitationInfo` vocabulary?
- a new type within the REST API vocabulary?
- a new REST API type (alongside `/works`) in the REST API?
- changes to the Query System / OAI-PMH tables?
- changes to OpenURL?
- changes to the list of types in ORCID Auto-update?
- Does this affect constraints? For example, changing the length of a field, or checks we perform? Review:
- Deposit System database storage for any field length updates. Pay attention to character encoding / collation.
- Deposit System database storage to see if we need to update foreign key constraints or other relational issues in the SQL schema. e.g. CitedBy table.
- Query System database storage for field lengths. Pay attention to character encoding / collation.
- REST API Elastic Search schema and whether any mappings will need to be updated.
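The version-number decision described in this section can be sketched as a small function. This is a minimal illustration of Semantic Versioning arithmetic, not Crossref tooling: the function `bump` and the change labels `"major"`, `"minor"` and `"patch"` are hypothetical.

```python
# Illustrative sketch only: `bump` and the change labels are hypothetical.

def bump(version: str, change: str) -> str:
    """Return the next Semantic Versioning number for the schema bundle.

    change: "major" for breaking changes or entirely new schema files,
            "minor" for backward-compatible additions (e.g. a new relation type),
            "patch" for backward-compatible fixes.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":
        return f"{major + 1}.0.0"
    if change == "minor":
        return f"{major}.{minor + 1}.0"
    if change == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")


if __name__ == "__main__":
    print(bump("5.0.0", "minor"))  # a new relation type
    print(bump("5.1.0", "major"))  # an entirely new schema file
```

Deciding which label applies is the human part of the checklist; the arithmetic only records that decision.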
Finalize Schema and data files
- Create a feature branch of the Schema repository named after the version, e.g. `5.0.0`.
- This will contain the schema files as they are being worked on. All work for this schema should be done on the branch.
- Don’t merge this until everything is finalized at the end of this checklist.
- Create a merge request against the master branch prefixed WIP (Work In Progress), e.g. `WIP: 5.0.0`.
- Create a version-bumped copy of every schema file in the Schema Repository’s `/schemas` directory, e.g.:
- `crossref-core<version>.xsd` (this will be introduced as a common piece between the deposit schema and the query schema as of version 5.0.0)
- `crossref<version>.xsd`
- `common<version>.xsd`
- `fundref<version>.xsd` (will be versioned as of 5.0.0)
- `clinicaltrials<version>.xsd` (will be versioned as of 5.0.0)
- `doi_resources<version>.xsd`
- `Crossref_query_input<version>.xsd`
- `crossref_query_output<version>.xsd`
- `unixref<version>.xsd` (will be versioned as of 5.0.0)
- `crossref_output<version>.xsd` (will be versioned as of 5.0.0)
- Place Release Notes in `/releasenotes/<version>.md`.
- The Merge Request description should reference the release notes file.
- Generate the Help documentation manually using Oxygen. Upload to `inf7` so that it’s served from `http://data.crossref.org/reports/help/schema_doc/<version>/`
- Prepare a production release-ready version of the new version of the relevant XSD Schema file.
- This doesn’t prevent it from being iterated and improved in response to testing.
- Prepare a number of complete sample XML files in `/examples/<version>` that implement every useful feature of the schema. These will be used for member education.
- Prepare a set of XML test files in `/test/<version>` that exercise every corner of the whole schema, or every change. These will be public and checked into source control, so must be of the right quality. Corners:
- Every field present.
- Presence / absence of optional fields.
- Representation differences due to XML mixed content representation.
- Exercise the maximum and minimum length of any fields that have limits, using a selection of multibyte characters.
- Code review example files with e.g. the tech lead of Deposit System or REST API to ensure that every feature / new feature is exercised.
- Ensure that the Continuous Integration tests run correctly to indicate that the schema is well formed and that both the test files and the example files validate using the new schema.
- Work with Product Owner and Tech Lead of REST API.
- How to model in JSON?
- Is this going to cause problems extending the existing JSON schema? Will this need a breaking change in the REST API?
- For each example file generate the expected JSON representation that will be produced by the REST API. Attach these to an implementation ticket.
- Expected JSON output for every sample XML file.
- Reviewed by tech lead of REST API.
- Review all Schematron rules, both advisory and compulsory. Write implementation tickets, along with test cases.
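For the test files that exercise maximum and minimum field lengths with multibyte characters, boundary values can be generated rather than typed by hand. The following is a minimal sketch under stated assumptions: the helper `boundary_value`, the sample characters and the limit of 8 are all made up for illustration and do not come from the schema.

```python
# Illustrative sketch only: the helper name, sample characters and the
# limit of 8 are assumptions, not values taken from the schema.

# Characters that need 1, 2, 3 and 4 bytes respectively in UTF-8.
SAMPLE_CHARS = "a\u00e9\u4e2d\U0001F600"


def boundary_value(length: int) -> str:
    """Build a string of exactly `length` characters, cycling SAMPLE_CHARS."""
    return "".join(SAMPLE_CHARS[i % len(SAMPLE_CHARS)] for i in range(length))


if __name__ == "__main__":
    # For a hypothetical limit of 8: the minimum, the limit, and one over.
    for n in (1, 8, 9):
        value = boundary_value(n)
        # Character count and UTF-8 byte count differ for multibyte text.
        print(n, len(value), len(value.encode("utf-8")))
```

Printing the byte count next to the character count makes it easier to spot fields where database storage measured in bytes could be tighter than a schema limit measured in characters.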
Effects on how our system parses affected XML documents
The following questions should be addressed:
- Will adaptations to the code required to implement the new schema mean that they become incompatible with the old schema, forcing a branch in the code?
- Deposit system
- REST API
- Will the change in container, type and subtype require any retroactive changes?
- Deposit system
- REST API
- Will any entities we keep track of in a SQL database (such as authors or journals) need to be updated?
- Will this require changes to the Metadata Repository Pusher to enable them to be indexed in the REST API?
- Does it need to be updated to push new documents? Depending on the decision, add to the immediate implementation plan or for later.
- If we don’t expect to immediately push into the REST API, do we need to add functionality to retroactively push missed items?
- Any patching tools that are used on a regular basis should be reviewed.
- Review all reports to check if they will be affected.
How will this be queried?
- Prefer query, filter or search exclusively via the REST API.
- List all fields that might be queryable.
- Consider every new field and how it might be queried / searched / filtered / faceted.
- Write user stories for each type of query. Prototype those queries to at least sense-check.
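Prototype queries can be written down as plain URLs before any implementation work starts. The sketch below assumes the public REST API base `https://api.crossref.org/works` and its `query`, `filter`, `facet` and `rows` parameters; the helper `works_query_url` is hypothetical, and the filter used in the example is an existing one chosen only for illustration. A filter for a new field would not work until it has been implemented.

```python
# Illustrative sketch only: the helper and the example values are assumptions.
from urllib.parse import urlencode

API_BASE = "https://api.crossref.org/works"


def works_query_url(query=None, filters=None, facet=None, rows=5):
    """Build a /works URL so a prototype query can be pasted into a browser."""
    params = {"rows": rows}
    if query:
        params["query"] = query
    if filters:
        # Filters are sent as comma-separated name:value pairs.
        params["filter"] = ",".join(f"{k}:{v}" for k, v in filters.items())
    if facet:
        params["facet"] = facet
    return f"{API_BASE}?{urlencode(params)}"


if __name__ == "__main__":
    print(works_query_url(query="metadata quality",
                          filters={"type": "journal-article"}))
```

Attaching one such URL to each user story gives reviewers something concrete to sense-check.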
Step 2: Implementation
Implementation in CS Deposit System
- Copy the schema files into the CS repository in the `/web/schemas` directory.
- Copy the example and test XML files into `/web/test-resources`.
- Write (or extend) unit tests that load the test XML files. That may involve a refactor.
- At the least exercise `org.crossref.xs.util.CrossRefXmlUtil`'s `assertValidXsdFile` and `validateXml`.
- Make any SQL changes if they’ve been identified.
- Update the list of schema files in `org.crossref.common.xml.ValidateTool`.
- Update the list of schema files in `org.crossref.common.xml.CrossRefXmlParserPoolImpl`.
- Update the `CURRENT_SCHEMA_VERSION` property. This will be reflected in the following. (Note: In the current CS code prior to the release of 5.0.0 these are hard-coded to reference the latest schema. From 5.0.0 all references to the schema in code that emits XML will use the most recent version. The below list will need to be updated to use this parameter as part of initial roll out.)
- `org.crossref.qs.view.UnixsdView`
- `org.crossref.qs.view.DoiInfoView`
- `org.crossref.xs.notifications.formatters.CrossrefResult2Formatter`
- `org.crossref.qs.view.QSErrorXmlView`
- `org.crossref.qs.controllers.CoaccessController`
- `org.crossref.qs.controllers.GuestQueryController`
- `org.crossref.qs.view.ForwardLinkview`
- If new relation types are being added, update `org.crossref.ds.relations.RelationTypes`.
- Review `org.crossref.xs.utils.CrossRefXmlUtil`.
. - Make any changes identified to depositing code, e.g. Metadata manager, Web Deposit Tool, patch tools.
- Make any changes identified to code that uses the new schema changes.
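Before the unit tests in CS are extended, a quick pre-check that every copied test file at least parses can save a round trip. The sketch below is a language-neutral illustration in Python, not the CS test code: the function `malformed_files` is hypothetical, and it checks only XML well-formedness, not validity against the XSD, which remains the job of the CS unit tests.

```python
# Illustrative sketch only: the function name and directory layout are
# assumptions. This checks well-formedness, not XSD validity.
import xml.etree.ElementTree as ET
from pathlib import Path


def malformed_files(directory):
    """Return (file name, error message) for each XML file that fails to parse."""
    failures = []
    for path in sorted(Path(directory).glob("*.xml")):
        try:
            ET.parse(str(path))
        except ET.ParseError as err:
            failures.append((path.name, str(err)))
    return failures


if __name__ == "__main__":
    # Hypothetical local copy of the test resources directory.
    for name, message in malformed_files("web/test-resources"):
        print(f"{name}: {message}")
```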
Implementation of Query in REST API
- If any new schema features are required, implement them, probably in `cayenne.formats.unixref`.
- Ensure all of the test and example XML files can be parsed correctly, without error and as expected by Cayenne.
- Copy all `test` and `example` files into `/dev-resources/parser-regression`.
- Follow the procedure (documented in Cayenne) to produce Item Tree EDN files for parser regression.
- Follow the procedure for producing Item Tree to JSON files.
- For each EDN and JSON file carefully check against the files produced in the planning stage.
- Implement and test all facets, filters and search queries.
- If necessary review the list of relation types in `cayenne.data.relations`.
- If necessary review the list of types in `cayenne.ids.type`.
- Review API documentation in Swagger and related text.
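Checking the generated JSON against the expected JSON written during planning can be partly mechanised. The sketch below is an assumption-laden illustration, not part of Cayenne: the function `json_mismatches` and the two directory names are hypothetical, and it presumes the expected and generated files share file names.

```python
# Illustrative sketch only: directory names and the helper are assumptions.
import json
from pathlib import Path


def json_mismatches(expected_dir, actual_dir):
    """Compare JSON files by name; return names that are missing or differ."""
    problems = []
    for expected_path in sorted(Path(expected_dir).glob("*.json")):
        actual_path = Path(actual_dir) / expected_path.name
        if not actual_path.exists():
            problems.append((expected_path.name, "missing"))
            continue
        expected = json.loads(expected_path.read_text(encoding="utf-8"))
        actual = json.loads(actual_path.read_text(encoding="utf-8"))
        # Comparing parsed values ignores key order and whitespace.
        if expected != actual:
            problems.append((expected_path.name, "differs"))
    return problems


if __name__ == "__main__":
    # Hypothetical directories holding planned and generated output.
    for name, reason in json_mismatches("expected-json", "generated-json"):
        print(f"{name}: {reason}")
```

A file reported as differing still needs a careful manual review; the comparison only narrows down where to look.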
Education Site
- If needed, update the education site to include the new content type.
- Update wording in https://www.crossref.org/education/content-registration/crossrefs-metadata-deposit-schema/schema-versions/
- Update table in https://www.crossref.org/education/content-registration/crossrefs-metadata-deposit-schema/crossref-xsd-schema-quick-reference/
- Schema XSD link in the table should point to the schema file in the master branch.
- ‘Further Info’ link should link to the Oxygen Docs.
Step 3: Release
Pre-release
- Deploy the feature branch of CS in test sandbox and production.
- Deploy the new version of the REST API.
- Merge, tag, release the feature branch of the Schemas repo. Check the master branch has the latest version numbers.
- Smoke test with at least some of the more representative test XML files.
- Smoke-test deposits in test sandbox. Deposit through to retrieval via e.g. OpenURL.
- Smoke-test deposits in production. Deposit through to retrieval via e.g. OpenURL.
- Ensure that production-deposited files are indexed in the REST API. Retrieve directly by DOI and / or query and filter.
Release
- Publish changes to the education site.
- Make some kind of noise on social media.
- Invite members to try depositing.
Finally
- If there are bugs to iron out after roll-out, fixing them constitutes a change to the schema and should be made following Semantic Versioning. The previous schema version must remain as-is.