DPV as a SKOS vocabulary: Analysis

Analysing options for expressing DPV as a SKOS vocabulary
published:
by Harshvardhan J. Pandit
is part of: Data Privacy Vocabulary (DPV)
is about: Data Privacy Vocabulary (DPV)
DPV DPVCG semantic-web
skos reference
https://www.w3.org/TR/skos-reference/
skos primer
https://www.w3.org/TR/skos-primer/

1 SKOS Basics

1.1 Concept

  • skos:Concept
  • Concept is the equivalent of both class and instance within SKOS.
  • All concepts within DPV which are now provided as classes would become skos:Concept

1.2 Labels

  • skos:prefLabel states the preferred way of referring to a concept
  • SKOS permits at most one prefLabel per language for a concept. By convention (not as a formal constraint), no two concepts in the same scheme should share the same preferred label
  • In DPV, prefLabel is what we use to state the recommended way of referring to the concept, and also what we use in the IRI, i.e. dpv:<prefLabel>
  • For other labels, there is skos:altLabel and skos:hiddenLabel
  • skos:altLabel provides alternate ways of referring to the same concept
  • In DPV, altLabel provides a way to incorporate other ways of referring to the same concept. For example,

    dpv:PrivacyNotice a skos:Concept ;
        skos:prefLabel "Privacy Notice"@en ;
        skos:altLabel "Privacy Policy"@en .
    
  • In DPV, altLabel provides a way to have different labels arising from other standards or common uses. For example, instead of creating an entirely separate vocabulary for ISO, the equivalent concepts can be indicated using a customised property for ISO labels.

    dpv:labelISO rdfs:subPropertyOf skos:altLabel .
    dpv:DataController skos:prefLabel "Data Controller" ;
        dpv:labelISO "PII Controller" .
    
  • hiddenLabel is a way to have labels for convenience and other uses, but which usually would be hidden from general use of that concept.
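
The three label types can be combined on a single concept. The following is a minimal sketch rather than DPV's actual data: the dpv namespace IRI and the label strings (including the hidden misspelling) are assumptions chosen for illustration.

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix dpv:  <https://w3id.org/dpv#> .   # assumed namespace IRI

dpv:DataController a skos:Concept ;
    # at most one prefLabel per language tag
    skos:prefLabel "Data Controller"@en ;
    # alternate wording (hypothetical label)
    skos:altLabel "Controller"@en ;
    # common misspelling, useful for search but not for display (hypothetical label)
    skos:hiddenLabel "Data Controler"@en .
```

SKOS treats the three label properties as pairwise disjoint, so the same literal should not appear under more than one of them for a given concept.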

1.3 Hierarchy Relationships

  • In RDFS and OWL, hierarchy is specified using the unidirectional property subClassOf
  • In SKOS, the properties skos:broader and skos:narrower are used to indicate relationship between two concepts bidirectionally
  • This is used as follows: A skos:broader B means A has B as a broader concept of itself. For example cat skos:broader mammal

    I always find it confusing to remember how broader and narrower are to be used in terms of their direction. As a mnemonic, I use has as the prefix of these properties to make sense of what they are supposed to mean. Therefore, saying A <has>broader B helps understand A has B as its broader concept.

  • In SKOS, skos:broaderTransitive and skos:narrowerTransitive are the properties used for transitive inferences, and are super-properties of skos:broader and skos:narrower respectively.
  • In DPV, the hierarchies are expected to be transitive, which necessitates the use of the SKOS transitive properties whenever a relationship spans more than one level. Otherwise we lose inferences. See example:

    # given this data (example concepts)
    dpv:PersonalData skos:narrower dpv:SpecialCategoryPersonalData .
    dpv:SpecialCategoryPersonalData skos:narrower dpv:GeneticData .
    # this inference would be wrong because skos:broader and skos:narrower are not transitive
    dpv:GeneticData skos:broader dpv:PersonalData .
    
    # however, using transitive versions like this would work as expected
    dpv:PersonalData skos:narrowerTransitive dpv:SpecialCategoryPersonalData .
    dpv:SpecialCategoryPersonalData skos:narrowerTransitive dpv:GeneticData .
    # this inference is now possible and correct
    dpv:GeneticData skos:broaderTransitive dpv:PersonalData .
    # note: skos:broader is still not inferred here, because a super-property
    # (broaderTransitive) does not entail its sub-property (broader)
    
  • The property skos:related is used to indicate that another concept is related without establishing any specific relationship between the two. It is symmetric: A skos:related B also implies B skos:related A.
  • SKOS thus provides broader and narrower to indicate a hierarchy, and related to associate another concept non-hierarchically.
  • SKOS specifies that concepts related via a hierarchy should not also be associated through the related property. This means related is strictly a relationship between concepts that are not connected through a chain of broader or narrower links.
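
As a hedged alternative to asserting the transitive properties directly, the SKOS Reference describes a convention where only skos:broader (or skos:narrower) is asserted and the transitive statements are left to inference. A minimal sketch using the same example concepts, with the expected entailments shown as comments (the dpv namespace IRI and the skos:related pairing are assumptions for illustration):

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix dpv:  <https://w3id.org/dpv#> .   # assumed namespace IRI

# asserted: direct hierarchy links only
dpv:GeneticData skos:broader dpv:SpecialCategoryPersonalData .
dpv:SpecialCategoryPersonalData skos:broader dpv:PersonalData .

# entailed by a SKOS-aware reasoner (not asserted here):
#   dpv:SpecialCategoryPersonalData skos:narrower dpv:GeneticData .          (inverse)
#   dpv:GeneticData skos:broaderTransitive dpv:SpecialCategoryPersonalData . (super-property)
#   dpv:GeneticData skos:broaderTransitive dpv:PersonalData .                (transitivity)
# not entailed, because skos:broader itself is not transitive:
#   dpv:GeneticData skos:broader dpv:PersonalData .

# skos:related links concepts that are not in a broader/narrower chain
# (hypothetical pairing, for illustration only)
dpv:Consent skos:related dpv:PrivacyNotice .
```

With this pattern, queries that need the whole hierarchy would match on skos:broaderTransitive, while skos:broader keeps its meaning of a direct parent.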

1.4 Definitions, and notes for Examples, Changes, and Scope

  • SKOS has the general property skos:note to associate a note with a concept. A note can be anything: a literal, a string, another node.
  • SKOS provides subproperty skos:definition to indicate the definition of a concept.
  • In DPV, we currently use dct:description to indicate definitions. Instead, the skos:definition property is better suited for explicitly indicating definition.
  • SKOS provides subproperty skos:scopeNote to provide information about the scope of a concept.
  • In DPV, the scopeNote property can be used to indicate how the concept is to be interpreted, whether there are any specific considerations regarding the use or interpretation of that concept, or additional information not provided within a definition. This information could be what we usually put in notes alongside a concept in the definition.
  • historyNote, editorialNote, and changeNote are subproperties used to describe the historical, editorial, and provenance related information for a concept.
  • skos:example can be used to provide an example of a concept
  • In DPV, skos:example can be used in two ways: first, to provide textual examples of a concept as it occurs in the real world; second, to provide examples of how that concept is used in code and use-cases. The first is better for documentation, while the second is better for adoption and use. However, other vocabularies (e.g. dct, vann) exist that could be used to indicate code examples, so skos:example can be reserved for helping humans understand what the concept is about through example(s).
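
The documentation properties above could be applied to a single concept roughly as follows. This is a sketch only: the dpv namespace IRI is assumed, and every literal (definition, scope note, example, change note) is hypothetical wording written for this illustration, not text taken from DPV.

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix dpv:  <https://w3id.org/dpv#> .   # assumed namespace IRI

dpv:EmailAddress a skos:Concept ;
    skos:prefLabel "Email Address"@en ;
    # takes the place of dct:description for the definition (hypothetical wording)
    skos:definition "An address used to send and receive electronic mail"@en ;
    # interpretation guidance that does not belong in the definition itself
    skos:scopeNote "Covers personal and work addresses alike"@en ;
    # textual, real-world example aimed at human readers
    skos:example "The address a user provides when signing up for a newsletter"@en ;
    # provenance of an edit to the concept
    skos:changeNote "Definition moved from dct:description to skos:definition"@en .
```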

1.5 Structuring concept hierarchies using SKOS

1.5.1 Replicating OWL/RDFS hierarchy with SKOS broader/narrower

  • Currently, the hierarchy in DPV is expressed using rdfs:subClassOf property usage in the following manner: A -subClassOf-> B
  • An intuitive conversion of this would be: A -broader-> B and B -narrower-> A
  • While this will work in that it will provide a taxonomy of concepts structured in a hierarchy, it is not the best method nor the only one SKOS provides.
  • If this is used, there is no good reason to migrate DPV from OWL or RDFS to a similar structure within SKOS.
  • For one, there is no direct way to indicate a relationship between a concept and a top-concept. For example, EmailAddress is a subclass of PersonalData with some 4 or 5 levels of abstraction between them. To indicate that EmailAddress is a category of personal data, one would have to either travel up the chain of subclass relationships or use a reasoner to add statements that directly state EmailAddress is a subclass of PersonalData. Neither is intuitive to use.

1.5.2 Concept Schemes in SKOS

  • SKOS provides the skos:ConceptScheme class to group related concepts together in a concept scheme or a thesaurus.
  • ConceptScheme can have annotations dct:title and dct:creator
  • Concepts can be indicated to be a part of a scheme using skos:inScheme
  • To indicate hierarchies and the top-concept within that hierarchy, the property skos:hasTopConcept is used
  • The same concept can be part of different concept schemes
  • The entirety of DPV can be a skos:ConceptScheme with each of its core concepts and modules providing the top concept. This results in a single collection of concepts with multiple hierarchies defined by the top concepts.
  • Another alternative is to define each module or concept collection as a skos:ConceptScheme and to define the concepts within it as top concepts. However, there is no way to collect concept schemes within a package to create DPV.
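
The two options can be sketched side by side. The scheme IRIs under ex: and the titles are hypothetical, the dpv namespace IRI is assumed, and dpv:Purpose is used here only as a second illustrative top concept.

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix dpv:  <https://w3id.org/dpv#> .        # assumed namespace IRI
@prefix ex:   <http://example.org/schemes/> .  # hypothetical scheme IRIs

# Option 1: one scheme for all of DPV, with one top concept per core concept or module
ex:dpv a skos:ConceptScheme ;
    dct:title "Data Privacy Vocabulary"@en ;
    skos:hasTopConcept dpv:PersonalData, dpv:Purpose .
dpv:Marketing a skos:Concept ;
    skos:inScheme ex:dpv ;
    skos:broader dpv:Purpose .

# Option 2: one scheme per module; SKOS offers nothing to group the schemes together
ex:personal-data a skos:ConceptScheme ;
    dct:title "Personal Data categories"@en ;
    skos:hasTopConcept dpv:PersonalData .
ex:purposes a skos:ConceptScheme ;
    dct:title "Purposes"@en ;
    skos:hasTopConcept dpv:Purpose .
```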

1.5.3 Collections in SKOS

  • skos:Collection is a way to group related concepts together under an arbitrary label which is not itself a concept. The example given in the primer refers to milk and types of milk (cow, goat, buffalo) and a collection for milk by source animal that includes only the concepts for cow, goat, and buffalo milk.
  • Collections specify inclusion of a concept or another collection using the property skos:member
  • The primer describes where collections may be necessary, and that the same pattern could be replicated by declaring the collection label as a Concept and using broader and narrower properties to construct a hierarchy.
  • It concludes with the decision being based on whether the collection should be a concept or not. If yes, then ConceptScheme may be more suitable. If not, then Collection would be more suitable.
  • For DPV, using skos:Collection seems to incur additional complexity without any apparent benefit. So far, we do not have any specific hierarchy or collection that cannot be represented using ConceptScheme.
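
For reference, the milk example described above could be written along these lines. This is a sketch of the pattern, not a copy of the primer's listing; the ex: namespace and the exact names and labels are assumptions.

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

ex:milk a skos:Concept ;
    skos:prefLabel "milk"@en ;
    skos:narrower ex:cowMilk, ex:goatMilk, ex:buffaloMilk .
ex:cowMilk a skos:Concept ; skos:prefLabel "cow milk"@en .
ex:goatMilk a skos:Concept ; skos:prefLabel "goat milk"@en .
ex:buffaloMilk a skos:Concept ; skos:prefLabel "buffalo milk"@en .

# the grouping label is a collection, not a concept,
# so it takes no part in the broader/narrower hierarchy
ex:milkBySourceAnimal a skos:Collection ;
    skos:prefLabel "milk by source animal"@en ;
    skos:member ex:cowMilk, ex:goatMilk, ex:buffaloMilk .
```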

2 DPV as SKOS vocabulary

2.1 Requirements

  • We have Concepts that have a hierarchy; this can be specified using skos:Concept and skos:broader and skos:narrower relationships
  • We have properties that relate concepts, e.g. dpv:hasPersonalData whose range we want to have as an instance of dpv:PersonalData.
  • It should be possible to use multiple concepts as types, e.g. to declare something is an instance of two purposes as:

    ex:MyPurpose a dpv:Marketing, dpv:Personalisation .
    

    which is an issue as SKOS concepts cannot be 'combined' in a similar manner to what we assume RDFS/OWL2 classes can be.

  • If possible, we would like to keep meta-modelling and OWL-DL compatibility. This would mean having the T-box and A-box be disjoint sets. While not affecting the SKOS usage in any major manner, this has implications for the use of DPV in OWL2, and more specifically for reasoner-oriented tasks and use-cases.
  • We want a way to package all concepts and hierarchies within DPV. While currently we don't explicitly declare this in the RDFS/OWL2 vocabulary, if there is a way to express this formally, we could do it.

2.2 Proposal for providing DPV using both SKOS and RDFS/OWL

  • The top-level classes are declared as an instance of both owl:Class and skos:Concept. This permits creating instances of that class that are compatible with both OWL (as an instance) and SKOS (as members of a concept scheme). This also keeps the T-box and A-box separate.
  • In the example below, PersonalData is the top-level concept and is also declared as a class. This permits the following:

    <dpv> a skos:ConceptScheme ;
        skos:hasTopConcept dpv:PersonalData .
    dpv:PersonalData a owl:Class, skos:Concept ;
        dct:title "Personal Data"@en ;
        skos:inScheme <dpv> .
    dpv:Email a dpv:PersonalData, skos:Concept ;
        skos:prefLabel "Email"@en ;
        skos:narrower dpv:EmailAddress .
    dpv:EmailAddress a dpv:PersonalData, skos:Concept ;
        skos:prefLabel "Email Address"@en ;
        skos:broader dpv:Email .
    dpv:hasPersonalData a owl:ObjectProperty ;
        rdfs:range dpv:PersonalData .
    
  • This has the following implications:
    • The property hasPersonalData can be defined with range PersonalData and can correctly refer to both Email and EmailAddress
    • This use of PersonalData is okay, because we never expect the following: <Something> hasPersonalData PersonalData
    • Email and EmailAddress are related using the SKOS hierarchy instead of OWL
    • Email and EmailAddress cannot be resolved using subclass mechanism anymore. For this a separate OWL equivalence ontology would have to be created which specifies subClassOf relationships instead of broader. As the semantic implications of this OWL iteration are different from those of DPV, it would be better to provide it using a separate IRI.
    • Note that mixing SKOS and OWL for both classes and instances would turn this into OWL-Full and cause issues when using a reasoner, like this:

      dpv:PersonalData a owl:Class, skos:ConceptScheme .
      # issue1: instances of concept scheme are incorrect
      # issue2: a class as instance of another class
      dpv:Email a owl:Class, dpv:PersonalData .
      # issue3: property assertions are complex
      # issue4: skos:Concept and skos:ConceptScheme as disjoint
      dpv:Email a skos:Concept ;
          skos:inScheme dpv:PersonalData .
      ex:MyEmail a skos:Concept ;
          skos:inScheme dpv:PersonalData ;
          skos:broader dpv:Email .
      # the range of dpv:hasPersonalData cannot be stated
      # unless we use [ skos:inScheme dpv:PersonalData ] as path
      ex:PDH dpv:hasPersonalData dpv:Email .
      ex:PDH dpv:hasPersonalData ex:MyEmail .
      
  • To create further instances of a concept provided in DPV, such as EmailAddress and a specific email address, SKOS could still be used.

    dpv:EmailAddress a dpv:PersonalData, skos:Concept .
    ex:MyEmail a dpv:PersonalData, skos:Concept ;
        skos:broader dpv:EmailAddress ;
        skos:prefLabel "[email protected]"^^xsd:string .
    # okay to use property like this
    ex:PDH dpv:hasPersonalData dpv:EmailAddress .
    ex:PDH dpv:hasPersonalData ex:MyEmail .
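
The separate OWL serialisation mentioned above, which restores subclass relationships for reasoners that depend on them, might look roughly like this. It is a sketch under assumptions: the dpvo: prefix and its IRI are hypothetical placeholders for whatever separate IRI is eventually chosen.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix dpvo: <http://example.org/dpv-owl#> .   # hypothetical IRI for the OWL serialisation

dpvo:PersonalData a owl:Class .
# membership of the top-level class and skos:broader links in the SKOS version
# both become rdfs:subClassOf here
dpvo:Email a owl:Class ;
    rdfs:subClassOf dpvo:PersonalData .
dpvo:EmailAddress a owl:Class ;
    rdfs:subClassOf dpvo:Email .
dpvo:hasPersonalData a owl:ObjectProperty ;
    rdfs:range dpvo:PersonalData .
```

Keeping this under its own IRI avoids mixing its class-based semantics with the SKOS-based concepts of the main vocabulary.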
    

3 Summary

  • DPV as an ontology also becomes a skos:ConceptScheme
  • Core and other top-level classes become skos:Concept with skos:inScheme <DPV>
  • Core and other top-level classes are instances of owl:Class
  • Taxonomies are created using instances of skos:Concept and using skos:broader and skos:narrower relationships.
  • Properties are declared with domain or range as the appropriate top-level class, for example dpv:hasPersonalData rdfs:range dpv:PersonalData
  • What used to be instances of specific concepts are now represented as instances of skos:Concept and of whatever top-level concept they represent. For example: ex:MyEmail a dpv:PersonalData, skos:Concept . To declare their closest concept within the DPV taxonomy, SKOS properties are used: ex:MyEmail skos:broader dpv:EmailAddress, dpv:Identifier .
  • T-box and A-box are kept strictly separate, thus making this OWL-DL compatible. However, SPECIAL and TRAPEZE's reasoners won't work any longer because there are no sub-class relationships. To remedy this, a separate serialisation using OWL and using a separate IRI is provided.
  • For other general uses, SKOS and OWL mixed like this provide a better possibility for using as needed, whether requiring property domains and ranges, or for further extending concepts and creating instances at arbitrary levels of abstractions.
  • SKOS provides a lot of useful organisational tools, like ConceptScheme, which can be further used to group concepts without declaring hierarchies. For example, in LegalEntity, concept schemes can be created to separate what is essentially a legal role, such as a controller, from what is a type of organisation, such as an SME. Through this, the actual legal entity taxonomy would be clean and not include these categorisations, since ConceptScheme is disjoint from Concept within SKOS.

Example RDF for dpv-skos consistency checking

I created the following minimal set of information to test whether such usage of SKOS and OWL is okay or if a reasoner might throw errors and inconsistencies for using it.

Protege with the reasoners FaCT++ and Pellet produces no errors or inconsistencies. The OWL profile checker indicates issues for the OWL2 QL and EL profiles because of SKOS's use of transitive properties and property domain/range assertions. Other than that, this use has no issues for the OWL2 DL, RL, and Full profiles.