UN/CEFACT Code Management Project Submission

G. Ken Holman

Crane Softwrights Ltd.

Status

This is a personal Canadian UN/CEFACT Expert contribution to the document titled “Code Management Project” dated 2016-10-17 for the consideration of UN/CEFACT regarding its project proposal. Although the author of this contribution is the chairman of the OASIS Code List Representation Technical Committee [CLRTC], this contribution is not to be considered a formal submission from OASIS [OASIS].

$Date: 2017/01/27 18:00:13 $(UTC)


Table of Contents

1. Introduction
2. Project scope
3. Conclusion
Bibliography

1. Introduction

Values from a coded value domain are important information items in electronic information interchange. Standalone validation of electronic documents is an important step in information processing, relieving applications from the burden of having to perform validation themselves. Expressing the validation constraints declaratively provides the opportunity to leverage the implementation of this step across all applications.

There is a dichotomy in electronic information validation: structural validation and value validation. Moreover, structural validation happens at two levels itself: the hierarchical structure of content components and their supplemental components, and the lexical structure of the values of the components themselves. And there is a dependency: only when the hierarchical and lexical structures of information have been confirmed does it make sense to validate the values found within those structures.

Structural validation is further distinguished from value validation by fluidity, or more appropriately, the lack thereof. Structural validation has a very long shelf life. Once the structure of information is properly ascertained, overcoming its inertia takes a lot of effort. Changes to structure have far-reaching implications in information representation and in deployment. Prohibiting changes to allowed structure improves the stability of a deployment. Trading partners can invest in the processes used to validate the structure of the information and be comfortable that those processes will not change frequently.

Value validation, on the other hand, can be quite fluid. Not only do coded value domains grow and shrink, but specific coded values get ambiguously redefined. For example, the “Romanian New Leu” currency code “RON” in 2005 represents a single value worth over 200,000 times the value of a single “RON” in 1951, after the code for the country’s currency changed twice in the meantime. Moreover, users of a business vocabulary may very well change their own requirements for trading partners over the course of a business day. Perhaps a change in a business relationship governs that the payment means codes applicable to documents in the morning are different from those applicable to the same trading partner in the afternoon due to a change in the trustworthiness of that partner. Perhaps legislative impacts from outside the trading relationship change the codes that are applicable within the trading relationship.

In the past, typically for XML documents [XML 1.0], structural and value validation have been conflated into a single W3C XML Schema [W3C Schema] (XSD) expression attempting to manage both the inertial components and the fluid components simultaneously, even though each has its own very different management life cycle and applicability. This can be a challenge to implementers.

2. Project scope

These comments regarding the UN/CEFACT project scope are framed by the work done in OASIS [OASIS] in developing business vocabularies such as OASIS Universal Business Language (UBL) [UBL-2.1], also standardized as ISO/IEC 19845:2015. The validation artefacts for UBL are governed by the specific application [UBL-NDR] of general business document naming and design rules [BDNDR-v1.0].

All six points from the UN/CEFACT project proposal, quoted here, can be addressed in light of the work done with UBL:

The project will define the procedures, rules and methodologies for the following identified issues. Existing rules, such as those defined in the Core Components Technical Specification, CCTS, should be taken into account and if applicable be respected.

The project should take into account any UN/CEFACT deliverable that apply codes.

The primary target audience is UN/CEFACT Experts developing deliverables using coded representations but guidance to end users should be added when appropriate.

  1. Version compatibility

    The ability to use the latest possible version of a code list in association with any version of a message, i.e. decoupling the versioning of code lists from the business message versions

  2. Extending code lists

    Evaluate if permanent extensions are possible and desirable

  3. Restricting code lists

    Provide rules and methodology for restricting code lists for use within specific context. Users of the UN/CEFACT libraries may identify any subset they wish from a specific code list for their own community requirements.

  4. Code list validation rules

    Provide rules and methodology for how to validate instance documents against an XML Schema or UN/EDIFACT message type in respect to code lists

  5. Temporary codes

    Provide rules and methodology for the inclusion of temporary codes that will be replaced by a permanent code at the next UN/CEFACT standardised release.

  6. Externally maintained code lists

    Define rules and procedures for referencing code lists maintained by organisations external to UN/CEFACT, e.g. ISO, ICC, W3C.

Regarding (1) “version compatibility” and the stated need to decouple code lists from business message versions, the OASIS BDNDR expressly decouples value validation from structural validation. This recognizes the dichotomy of inertial structural validation and fluid value validation. Structural validation is normative in the UBL specification. Value validation in UBL is not normative, recognizing that users will have many and varied and contextual requirements for validating the values found in a document. Nevertheless, the semantic definitions of various components in UBL are described with data type qualifications citing the coded value domains that are characteristic for the information item. This informs a demonstrative environment in UBL illustrating the application of value constraints to information items and the value validation of instances that include these items. Users are welcome to use the demonstrative environment as is, or they can adapt the value validation artefacts to their own needs without any hint of violation of conformance as conformance is restricted only to structural validation.

Regarding (4) “Code list validation rules” and the stated need to validate the codes found in instance documents, having both the normative and non-normative sections of the UBL specification dictates a two-pass validation regime. Structural validation is executed independently of value validation. W3C XML Schema is used with the normative and unchanging-for-all-users structural constraints in a community using UBL. Trading partners then execute value validation using whatever tools are appropriate to their environment. The UBL demonstration validation environment uses XSLT [XSLT 1.0] for the second-pass value validation process. Should value constraints need to change for whatever reason, the sacrosanct XML Schema expressions remain untouched. All community members use the same schemas for the inertial structural constraints, while the many and varied and contextual requirements for value validation agreed upon between trading partners, perhaps even in real time, are realized as needed. These two validation artefacts are labeled (1) and (2) in this diagram:

Figure 1. Two-phase validation

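The two-pass regime can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the first pass here is a stand-in that checks well-formedness and the presence of required elements, whereas a real deployment would run a W3C XML Schema validator; the document shape, element paths and function names are hypothetical, and namespaces are omitted for brevity.

```python
import xml.etree.ElementTree as ET


def structural_pass(xml_text, required_paths):
    """Pass 1 stand-in: well-formedness plus presence of required elements.

    Assumption: a real deployment would run an XSD validator here instead.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return None, [f"not well-formed: {err}"]
    errors = [f"missing element: {p}" for p in required_paths
              if root.find(p) is None]
    return root, errors


def value_pass(root, allowed_by_path):
    """Pass 2: check coded values against the lists agreed between partners."""
    errors = []
    for path, allowed in allowed_by_path.items():
        for element in root.findall(path):
            value = (element.text or "").strip()
            if value not in allowed:
                errors.append(f"{path}: value {value!r} not allowed")
    return errors


def validate(xml_text, required_paths, allowed_by_path):
    """Run value validation only when the structure has been confirmed."""
    root, errors = structural_pass(xml_text, required_paths)
    if errors:
        return errors
    return value_pass(root, allowed_by_path)


# Hypothetical, namespace-free document used only for illustration.
DOC = ("<Invoice><PaymentMeans><PaymentMeansCode>2</PaymentMeansCode>"
       "</PaymentMeans></Invoice>")
CODE_PATH = "PaymentMeans/PaymentMeansCode"

morning = validate(DOC, [CODE_PATH], {CODE_PATH: {"1", "2"}})
afternoon = validate(DOC, [CODE_PATH], {CODE_PATH: {"1"}})
```

In this sketch the value constraints change between the two calls while the structural constraints passed to the first pass stay the same, which mirrors the separation of the inertial and fluid artefacts.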

Regarding (6) “Externally maintained code lists”, the UBL distribution uses OASIS Genericode [genericode] to publish all of the code lists used in its demonstration environment. The genericode specification describes an XML vocabulary expressing arbitrary code list metadata, coded values and coded value metadata in a sparse matrix. The directory of genericode files for UBL 2.1 is at http://docs.oasis-open.org/ubl/os-UBL-2.1/cl/gc/default/. The preference in the UBL project would be to have custodians of code lists publish their own “master copies” of genericode versions of coded value domains, rather than be obliged to recreate them within the project in order to be utilized by the committee and by users.

This is an excerpt from the payment means genericode file at the above directory, showing the list metadata (<Identification>), the possible value metadata (<ColumnSet>) and the first two of 75 rows:

<gc:CodeList xmlns:gc="http://docs.oasis-open.org/codelist/ns/genericode/1.0/">
   <Identification>
      <ShortName>PaymentMeansCode</ShortName>
      <LongName xml:lang="en">Payment Means Code</LongName>
      <LongName Identifier="listID">UN/ECE 4461</LongName>
      <Version>D10B</Version>
      <CanonicalUri>urn:un:unece:uncefact:codelist:standard:UNECE:PaymentMeansCode</CanonicalUri>
      <CanonicalVersionUri>urn:un:unece:uncefact:codelist:standard:UNECE:PaymentMeansCode:D10B</CanonicalVersionUri>
      <LocationUri>http://docs.oasis-open.org/ubl/os-UBL-2.1/cl/gc/default/PaymentMeansCode-2.1.gc</LocationUri>
      <AlternateFormatLocationUri MimeType="text/xml">http://www.unece.org/fileadmin/DAM/uncefact/codelist/standard/UNECE_PaymentMeansCode_D10B.xsd</AlternateFormatLocationUri>
      <Agency>
         <LongName xml:lang="en">United Nations Economic Commission for Europe</LongName>
         <Identifier Identifier="http://www.unece.org/trade/untdid/d11a/tred/tred3055.htm">6</Identifier>
      </Agency>
   </Identification>
   <ColumnSet>
      <Column Id="code" Use="required">
         <ShortName>Code</ShortName>
         <Data Type="normalizedString"/>
      </Column>
      <Column Id="name" Use="required">
         <ShortName>Name</ShortName>
         <Data Type="string"/>
      </Column>
      <Column Id="description" Use="required">
         <ShortName>Description</ShortName>
         <Data Type="string"/>
      </Column>
      <Key Id="codeKey">
         <ShortName>CodeKey</ShortName>
         <ColumnRef Ref="code"/>
      </Key>
   </ColumnSet>
   <SimpleCodeList>
      <Row>
         <Value ColumnRef="code">
            <SimpleValue>1</SimpleValue>
         </Value>
         <Value ColumnRef="name">
            <SimpleValue>Instrument not defined</SimpleValue>
         </Value>
         <Value ColumnRef="description">
            <SimpleValue>Not defined legally enforceable agreement between two or more parties (expressing a contractual right or a right to the payment of money).</SimpleValue>
         </Value>
      </Row>
      <Row>
         <Value ColumnRef="code">
            <SimpleValue>2</SimpleValue>
         </Value>
         <Value ColumnRef="name">
            <SimpleValue>Automated clearing house credit</SimpleValue>
         </Value>
         <Value ColumnRef="description">
            <SimpleValue>A credit transaction made through the automated clearing house system.</SimpleValue>
         </Value>
      </Row>
      ...
   </SimpleCodeList>
</gc:CodeList>
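As a rough illustration of how an application might consume such a file, the following Python sketch reads the list metadata and the rows using only the standard library. The embedded XML is an abbreviated copy of the excerpt above; the function name `read_code_list` is hypothetical and the sketch ignores columns, keys and the other parts of the genericode vocabulary.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the excerpt above (two rows, two columns).
GENERICODE = """\
<gc:CodeList xmlns:gc="http://docs.oasis-open.org/codelist/ns/genericode/1.0/">
   <Identification>
      <ShortName>PaymentMeansCode</ShortName>
      <Version>D10B</Version>
      <CanonicalVersionUri>urn:un:unece:uncefact:codelist:standard:UNECE:PaymentMeansCode:D10B</CanonicalVersionUri>
   </Identification>
   <SimpleCodeList>
      <Row>
         <Value ColumnRef="code"><SimpleValue>1</SimpleValue></Value>
         <Value ColumnRef="name"><SimpleValue>Instrument not defined</SimpleValue></Value>
      </Row>
      <Row>
         <Value ColumnRef="code"><SimpleValue>2</SimpleValue></Value>
         <Value ColumnRef="name"><SimpleValue>Automated clearing house credit</SimpleValue></Value>
      </Row>
   </SimpleCodeList>
</gc:CodeList>"""


def read_code_list(xml_text):
    """Return (metadata, rows) for a genericode document.

    Only the root element is namespace-qualified, so the children are
    addressed by their plain names.
    """
    root = ET.fromstring(xml_text)
    identification = root.find("Identification")
    metadata = {child.tag: (child.text or "").strip()
                for child in identification}
    rows = []
    for row in root.findall("SimpleCodeList/Row"):
        # The matrix is sparse: a row holds only the columns it has values for.
        rows.append({value.get("ColumnRef"): value.findtext("SimpleValue", default="")
                     for value in row.findall("Value")})
    return metadata, rows


metadata, rows = read_code_list(GENERICODE)
codes = [row["code"] for row in rows]
```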

Regarding (2) “Extending code lists”, a permanent extension to a code list is realized by publishing the genericode code list with the old and new values and associated revised version information. That version information would distinguish each code list when referenced from the supplemental components of a BIE (if provided by the user).
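A minimal Python sketch of this idea follows. It assumes a simplified in-memory model of a code list (the `CodeList` class and `extend` function are hypothetical), forms the version URI by appending the version to the canonical URI as in the excerpt above, and uses an invented version label `D99X` and invented code `999` purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeList:
    """Hypothetical, simplified stand-in for a genericode file."""
    canonical_uri: str
    version: str
    codes: frozenset

    @property
    def canonical_version_uri(self):
        # Assumption: follows the "<canonical URI>:<version>" pattern
        # seen in the excerpt above.
        return f"{self.canonical_uri}:{self.version}"


def extend(base, new_codes, new_version):
    """Publish a new list holding the old and new values under revised version information."""
    if new_version == base.version:
        raise ValueError("an extended list needs its own version identifier")
    return CodeList(base.canonical_uri, new_version,
                    base.codes | frozenset(new_codes))


URI = "urn:un:unece:uncefact:codelist:standard:UNECE:PaymentMeansCode"
d10b = CodeList(URI, "D10B", frozenset({"1", "2"}))
extended = extend(d10b, {"999"}, "D99X")  # invented version and code

# Both versions stay published and are distinguished by version information.
published = {cl.canonical_version_uri: cl for cl in (d10b, extended)}
```

A document whose supplemental components cite the older version can still be checked against `published[d10b.canonical_version_uri]`, while documents citing the revised version see the additional value.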

Regarding (3) “Restricting code lists”, this is addressed in the UBL project by the use of genericode and the application of OASIS Context/value Association [CVA] files. A CVA file specifies the XPath [XPath 1.0] contexts of an XML document and the union of genericode code lists applicable to each context. A restricted code list is a shorter genericode version of the applicable full-list genericode file, but the shorter file has unique code list metadata reflecting the subset of values (after all, the subset list is not the complete list and so should have different metadata). To accommodate user data whose BIE metadata cites the complete list while the business environment mandates the restricted list, the CVA file employs the concept of a masquerade. The masquerade overlays the complete list’s metadata in place of the restricted list’s metadata during the validation process in real time. This prevents confusion and ambiguity regarding the identity of the restricted list, which is not and should not be identified as a complete list in its metadata. A BIE citing the full list will successfully validate against the restricted list using the masquerade of the full list. This ensures multiple restricted lists of the same full list can be uniquely identified and managed by their respective distinguished metadata. Different trading partners can use different restricted subsets of code lists.
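The effect of a masquerade can be modelled in Python as follows. This is a simplified sketch rather than the CVA vocabulary itself: the `CodeList` class, its `masquerade_as` field, the `is_valid` function and the subset's identifier are all hypothetical, and list identity is reduced to a single string.

```python
from dataclasses import dataclass
from typing import Optional

FULL_ID = "urn:un:unece:uncefact:codelist:standard:UNECE:PaymentMeansCode:D10B"


@dataclass(frozen=True)
class CodeList:
    """Hypothetical, simplified stand-in for a genericode file named in a CVA file."""
    list_id: str                          # identity in the list's own metadata
    codes: frozenset
    masquerade_as: Optional[str] = None   # identity presented during validation

    @property
    def effective_id(self):
        return self.masquerade_as or self.list_id


def is_valid(value, cited_list_id, applicable_lists):
    """Check a coded value against the lists applicable in one context.

    cited_list_id is the list identity found in the BIE's supplemental
    components, or None when the document supplies no such metadata.
    """
    for code_list in applicable_lists:
        if cited_list_id is not None and cited_list_id != code_list.effective_id:
            continue  # the document cites a different list
        if value in code_list.codes:
            return True
    return False


# The restricted list keeps its own identity but masquerades as the full list.
restricted = CodeList("urn:example:partner:PaymentMeansSubset:1",
                      frozenset({"1", "2"}), masquerade_as=FULL_ID)
# The same subset without a masquerade, for comparison.
unmasked = CodeList("urn:example:partner:PaymentMeansSubset:1",
                    frozenset({"1", "2"}))
```

With the masquerade, a document citing the full list validates against the subset; without it, the cited identity does not match and the value is rejected even though it is in the subset.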

Regarding (5) “Temporary Codes”, this is also addressed using a CVA file. The new temporary codes would be expressed in a dedicated genericode expression of only the new values. This dedicated file would have its own identity separate from the published base genericode list because it is, in fact, not part of that base list. The CVA file would express in real time in any given context the union of the published base genericode list with the temporary codes list, and the masquerade would make the entire list appear to have the base list’s metadata. In this way at no time is there an ambiguous publication of a mixed list with metadata that could be confused with the metadata of the published list. When the published list is revised, the new temporary values are incorporated as in (2) extending a code list.
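The union described here can be sketched in Python with the same kind of simplified model. The dictionaries, the `allowed_values` function, the temporary list's identifier and the temporary code `T1` are hypothetical and exist only to illustrate that the two lists stay separate on disk while appearing as one list during validation.

```python
BASE_ID = "urn:un:unece:uncefact:codelist:standard:UNECE:PaymentMeansCode:D10B"

# Published base list (abbreviated) and a separately identified temporary list.
base = {"id": BASE_ID, "codes": {"1", "2"}, "masquerade": None}
temporary = {"id": "urn:example:temporary:PaymentMeansCode:1",  # invented identity
             "codes": {"T1"},                                   # invented code
             "masquerade": BASE_ID}


def allowed_values(lists, cited_id):
    """Union of the codes from every list whose presented identity matches.

    cited_id is the list identity cited by the document, or None when the
    document carries no list metadata.
    """
    allowed = set()
    for code_list in lists:
        presented = code_list["masquerade"] or code_list["id"]
        if cited_id is None or cited_id == presented:
            allowed |= code_list["codes"]
    return allowed


# In this context the document sees one list carrying the base list's identity.
in_context = allowed_values([base, temporary], BASE_ID)
# Without the temporary list in the union, only the published codes apply.
published_only = allowed_values([base], BASE_ID)
```

Neither input is modified, so no mixed list is ever published under the base list's metadata.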

Some final notes regarding the use of CVA and genericode reflect the flexibility offered by the specifications.

Unlike XSD enumerations binding the same enumeration to all contexts of a globally-declared and reused BIE in a document, the use of XPath in CVA provides for specifying different unions of code lists at different contexts of the one BIE. Perhaps the user needs to validate against different lists of country codes at different country code BIE locations of a single document.
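The following Python sketch illustrates the idea of binding different unions of lists to different contexts of the same information item. It is a simplification under stated assumptions: the document shape is hypothetical and namespace-free, the contexts use the limited path syntax of the standard library's ElementTree rather than full XPath 1.0, and the country code sets are arbitrary samples.

```python
import xml.etree.ElementTree as ET

# Hypothetical document reusing one country code item in two places.
ORDER = """\
<Order>
  <BuyerParty><Country><IdentificationCode>CA</IdentificationCode></Country></BuyerParty>
  <SellerParty><Country><IdentificationCode>US</IdentificationCode></Country></SellerParty>
</Order>"""

# Each context names the code lists whose union applies there.
CONTEXTS = {
    "BuyerParty/Country/IdentificationCode": [{"CA"}, {"US", "MX"}],
    "SellerParty/Country/IdentificationCode": [{"US"}],
}


def check_contexts(root, contexts):
    """Return (path, value) pairs for values outside the union for their context."""
    problems = []
    for path, code_lists in contexts.items():
        allowed = set().union(*code_lists)
        for element in root.findall(path):
            value = (element.text or "").strip()
            if value not in allowed:
                problems.append((path, value))
    return problems


problems = check_contexts(ET.fromstring(ORDER), CONTEXTS)

# The same value can be acceptable in one context and rejected in another.
swapped = ORDER.replace(">US<", ">CA<")
swapped_problems = check_contexts(ET.fromstring(swapped), CONTEXTS)
```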

Both genericode and CVA specifications are declarative and, thus, can be implemented by arbitrary means and approaches, at least one of which is openly available to all. The data type qualification second-pass value validation artefact UBL-DefaultDTQ-2.1.xsl in UBL 2.1 at http://docs.oasis-open.org/ubl/os-UBL-2.1/val/ is published from the previously cited genericode files and the CVA file found at http://docs.oasis-open.org/ubl/os-UBL-2.1/cva/ through the application of ISO/IEC 19757-3 Schematron [Schematron] using the free “cva2sch” software that is publicly available on the author’s web site [Crane Resources]. The Schematron expressions leverage any code list metadata found in the BIE’s supplemental components to ensure the appropriate genericode expression of codes is used in the given document context.

Finally, these XML expressions can be processed by applications creating visual interfaces in order to tailor drop-down lists of coded value domains presented to users.

The use of CVA and genericode is illustrated in the following figure. Note the union of two genericode files expressed in the second of the three contexts in the CVA file.

Figure 2. Contextual application of genericode code lists using a CVA file


3. Conclusion

The UN/CEFACT proposal is for an important project related to an old issue of code list management, publication and application, and it is important to consider new ways of looking at old problems. If the new project is open to considering any or all of the concepts presented in this submission, the author is eager to participate in the project’s progress. The genericode and CVA specifications have been successfully demonstrated to apply data type qualifications found in published code lists to contexts of XML documents. More details and answers to questions can be provided on request.

Bibliography

[BDNDR-v1.0] Business Document Naming and Design Rules Version 1.0. Edited by Tim McGrath, Andy Schoka and G. Ken Holman. 18 January 2017. OASIS Standard. http://docs.oasis-open.org/ubl/Business-Document-NDR/v1.0/os/Business-Document-NDR-v1.0-os.html. Latest version: http://docs.oasis-open.org/ubl/Business-Document-NDR/v1.0/Business-Document-NDR-v1.0.html.

[Crane Resources] Crane Softwrights Ltd. Free developer resources

[genericode] Tony Coates. OASIS Code List Representation (Genericode) Version 1.0. 2007-12-28.

[UBL-2.1] Jon Bosak, Tim McGrath, G. Ken Holman. Universal Business Language Version 2.1. 04 November 2013. OASIS Standard. http://docs.oasis-open.org/ubl/os-UBL-2.1/UBL-2.1.html. ISO/IEC 19845:2015 International Standard. http://www.iso.org/iso/catalogue_detail.htm?csnumber=66370

[UBL-NDR] UBL Naming and Design Rules Version 3.0. 20 July 2016. OASIS Committee Note 01. http://docs.oasis-open.org/ubl/UBL-NDR/v3.0/cn01/UBL-NDR-v3.0-cn01.html. Latest version: http://docs.oasis-open.org/ubl/UBL-NDR/v3.0/UBL-NDR-v3.0.html.

[UBLTC] Jon Bosak, Tim McGrath. OASIS UBL Technical Committee. 2001.

[XPath 1.0] James Clark, Steve DeRose. XML Path Language (XPath) Version 1.0. 1999-11-16.

[XSLT 1.0] James Clark. XSL Transformations (XSLT) Version 1.0. 1999-11-16.