DPV as a SKOS vocabulary: Analysis Part 2

Refining DPV's expression in OWL and SKOS using ConceptScheme
published: Tue Jan 25 2022
by Harshvardhan J. Pandit
is part of: Data Privacy Vocabulary (DPV)
is about: Data Privacy Vocabulary (DPV)
DPV DPVCG semantic-web

I was cleaning the spreadsheet we use to maintain DPV concepts, and while working on the newly added location fields (Location, SupraNationalUnion, Country, City) I wondered how they might be used with SKOS, especially since this is one case where there will always be 'instances'. So I wrote down my thought process here to get some clarity, starting with a use-case to motivate finding a solution.

tl;dr: The top concept is declared as both an owl:Class and a skos:ConceptScheme. Subsequent concepts are declared as instances of that top concept and also as skos:Concept, with skos:broader/skos:narrower used to indicate relations between them, and skos:inScheme used to denote collections or groupings of concepts.

Atlantis is a Country. Atlantis City is the capital of Atlantis. Data of Atlantians is permitted to be stored only in Atlantis. If the data is stored within Atlantis City, we need to demonstrate that this is permitted because Atlantis City is within Atlantis.

The first question is how to model location. If we only use this:

dpv:Location a owl:Class .
dpv:Country a owl:Class .
dpv:City a owl:Class .

Then we lose the relations between Location, Country, and City. If we instead do this:

dpv:Location a owl:Class .
dpv:Country rdfs:subClassOf dpv:Location .
dpv:City rdfs:subClassOf dpv:Country .

This is useful if we want to say that everything in the set City is also in the set Country. But we also get incorrect inferences:

ex:AtlantisCity a dpv:City . # assertion
ex:AtlantisCity a dpv:Country . # inference
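The unwanted type can be reproduced with a minimal, hand-rolled sketch of the two RDFS entailment rules involved: rdfs11 (subClassOf is transitive) and rdfs9 (an instance of a class is also an instance of its superclasses). This is plain Python over string triples, not an RDF library or a real reasoner. The name rdfs_subclass_closure is hypothetical.

```python
# The subclass model above, as (subject, predicate, object) string triples.
triples = {
    ("dpv:Location", "rdf:type", "owl:Class"),
    ("dpv:Country", "rdfs:subClassOf", "dpv:Location"),
    ("dpv:City", "rdfs:subClassOf", "dpv:Country"),
    ("ex:AtlantisCity", "rdf:type", "dpv:City"),  # assertion
}

def rdfs_subclass_closure(graph):
    """Apply rdfs11 and rdfs9 repeatedly until no new triples appear."""
    inferred = set(graph)
    while True:
        sub = [(s, o) for s, p, o in inferred if p == "rdfs:subClassOf"]
        new = set()
        for c, d in sub:
            # rdfs11: c subClassOf d, d subClassOf e  =>  c subClassOf e
            for d2, e in sub:
                if d == d2:
                    new.add((c, "rdfs:subClassOf", e))
            # rdfs9: x type c, c subClassOf d  =>  x type d
            for x, p, t in inferred:
                if p == "rdf:type" and t == c:
                    new.add((x, "rdf:type", d))
        if new <= inferred:
            return inferred
        inferred |= new

types_of_city = sorted(o for s, p, o in rdfs_subclass_closure(triples)
                       if s == "ex:AtlantisCity" and p == "rdf:type")
print(types_of_city)  # ['dpv:City', 'dpv:Country', 'dpv:Location']
```

The city ends up typed as a Country, which is exactly the incorrect inference described above.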

We could take the horrible road of declaring both Atlantis and AtlantisCity as subclasses instead of instances. But that would not be a good model, because we want to use them as instances. So instead we create a property isPartOf and declare it transitive so that relations can be derived.

ex:isPartOf a owl:ObjectProperty, owl:TransitiveProperty .
dpv:Location a owl:Class .
dpv:Country a owl:Class .
dpv:City a owl:Class .
ex:Atlantis a dpv:Country .
ex:AtlantisCity a dpv:City ; ex:isPartOf ex:Atlantis .

Now, checking whether storage is permitted only requires checking whether there is a path from the storage location (AtlantisCity) to Atlantis, which there is because of the isPartOf relation.
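The path check can be sketched under simple assumptions. Triples are plain strings, and a breadth-first walk over ex:isPartOf stands in for what an OWL reasoner would entail from a transitive property. The helpers reachable and storage_permitted are hypothetical and only illustrate the idea.

```python
from collections import deque

# The OWL model above: Country and City are unrelated classes, and location
# instances are linked by the transitive property ex:isPartOf.
triples = {
    ("ex:Atlantis", "rdf:type", "dpv:Country"),
    ("ex:AtlantisCity", "rdf:type", "dpv:City"),
    ("ex:AtlantisCity", "ex:isPartOf", "ex:Atlantis"),
}

def reachable(graph, start, prop):
    """Every node reachable from `start` by following `prop` one or more
    times, i.e. what transitivity of `prop` would entail."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for s, p, o in graph:
            if s == node and p == prop and o not in seen:
                seen.add(o)
                queue.append(o)
    return seen

def storage_permitted(graph, storage_location, permitted_region):
    """Hypothetical policy check: storage is allowed if the storage location
    is the permitted region itself or is (transitively) part of it."""
    return (storage_location == permitted_region
            or permitted_region in reachable(graph, storage_location, "ex:isPartOf"))

print(storage_permitted(triples, "ex:AtlantisCity", "ex:Atlantis"))  # True
print(storage_permitted(triples, "ex:Atlantis", "ex:AtlantisCity"))  # False
```

Because ex:isPartOf only points upwards, the check correctly refuses the reverse direction: a country is not part of its capital.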

If we instead use only SKOS, all of these are concepts, with broader and narrower relations between them indicating the isPartOf relations:

dpv:Location a skos:Concept .
dpv:Country a skos:Concept ; skos:broader dpv:Location .
dpv:City a skos:Concept ; skos:broader dpv:Country .

However, we want to create 'instances' of countries and cities, which we can do by declaring them as skos:Concept and linking them using skos:broader:

ex:Atlantis a skos:Concept ; skos:broader dpv:Country .
ex:AtlantisCity a skos:Concept ; skos:broader ex:Atlantis .

Now, checking whether storage is permitted can be done by checking whether there is a path to Atlantis through skos:broaderTransitive instead of ex:isPartOf. Note that skos:broader is a sub-property of skos:broaderTransitive, which is transitive, so every skos:broader assertion also produces skos:broaderTransitive inferences chained along the hierarchy. SKOS recommends using skos:broader to link directly related concepts, and using skos:broaderTransitive for the relations that follow from chains of such links.

# inferences
ex:Atlantis skos:broaderTransitive dpv:Country .
ex:Atlantis skos:broaderTransitive dpv:Location .
ex:AtlantisCity skos:broaderTransitive ex:Atlantis .
ex:AtlantisCity skos:broaderTransitive dpv:Country .
ex:AtlantisCity skos:broaderTransitive dpv:Location .
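These entailed statements can be reproduced with a minimal sketch that computes the transitive closure of the asserted skos:broader links. It is plain Python over (narrower, broader) pairs rather than a SKOS-aware reasoner. broader_transitive is a hypothetical helper.

```python
# Asserted skos:broader links from the SKOS-only model, as (narrower, broader).
broader = {
    ("dpv:Country", "dpv:Location"),
    ("dpv:City", "dpv:Country"),
    ("ex:Atlantis", "dpv:Country"),
    ("ex:AtlantisCity", "ex:Atlantis"),
}

def broader_transitive(edges):
    """Transitive closure of skos:broader. Since skos:broader is a sub-property
    of the transitive skos:broaderTransitive, every pair in this closure is an
    entailed skos:broaderTransitive statement."""
    closure = set(edges)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c}
        if new <= closure:
            return closure
        closure |= new

for s, o in sorted(broader_transitive(broader)):
    if s.startswith("ex:"):
        print(s, "skos:broaderTransitive", o)
```

Filtering the closure to the ex: concepts yields the same five statements listed above, in a different order.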

So our design is veering towards having both owl:Class and skos:Concept to use the best of both worlds. But if we mix OWL and SKOS like this:

dpv:Location a owl:Class, skos:Concept .
dpv:Country a dpv:Location .
# or someone does
ex:Atlantis a dpv:Country .

Then we create an overlap between skos:Concept and owl:Class, which seems to be a big no-no according to this document about SKOS and OWL. Even though OWL 2 does provide punning and several other capabilities, it is better not to freely mix classes and instances, so as to avoid punning and class/instance overlaps everywhere. This is simply to reduce the complexity of graphs, make sensible use of concepts, and keep inferences reliable.
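A rough way to see this overlap is to flag every IRI that the graph uses both as a class and as an individual. Here "as a class" means declared owl:Class or appearing as the object of rdf:type. "As an individual" means the subject of an rdf:type that points at something other than a class declaration. The sketch below is a simplified heuristic in plain Python, not an OWL 2 profile checker, and punned_resources is a hypothetical name.

```python
# The mixed OWL/SKOS graph above, as (subject, predicate, object) triples.
triples = {
    ("dpv:Location", "rdf:type", "owl:Class"),
    ("dpv:Location", "rdf:type", "skos:Concept"),
    ("dpv:Country", "rdf:type", "dpv:Location"),
    ("ex:Atlantis", "rdf:type", "dpv:Country"),  # "or someone does"
}

# Types that declare their subject to be a class rather than an individual.
CLASS_DECLARATIONS = {"owl:Class", "rdfs:Class"}

def punned_resources(graph):
    """IRIs used both as a class and as an individual in `graph`."""
    as_class = {s for s, p, o in graph
                if p == "rdf:type" and o in CLASS_DECLARATIONS}
    as_class |= {o for s, p, o in graph if p == "rdf:type"}
    as_individual = {s for s, p, o in graph
                     if p == "rdf:type" and o not in CLASS_DECLARATIONS}
    return as_class & as_individual

print(sorted(punned_resources(triples)))  # ['dpv:Country', 'dpv:Location']
```

Both dpv:Location and dpv:Country are flagged. Each is used as a class by one triple and as an instance by another, which is the class/instance overlap described above.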

We want to declare something as both an OWL class and a SKOS concept either because we want to create instances of it in OWL, or because we want to link it in a hierarchy within SKOS. Both of these can be achieved through other SKOS mechanisms, such as skos:ConceptScheme, which is disjoint from skos:Concept.

dpv:Location a owl:Class, skos:ConceptScheme .
dpv:Country a dpv:Location, skos:Concept ;
    skos:inScheme dpv:Location .
dpv:City a dpv:Location, skos:Concept ;
    skos:inScheme dpv:Location ;
    skos:broader dpv:Country .
# use-case
ex:Atlantis a dpv:Location, skos:Concept ;
    skos:broader dpv:Country .
ex:AtlantisCity a dpv:Location, skos:Concept ; 
    skos:broader ex:Atlantis .

Now, to get a list of countries, one can query the pattern:

[ a skos:Concept ; skos:broader dpv:Country ]

To identify whether something occurs within a country, such as to check whether data storage is permitted, one can use the transitive path:

[ a skos:Concept ; skos:broaderTransitive ex:Atlantis ]
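The two patterns can be tried out on the example model with a small stdlib-only sketch. It uses plain set operations over string triples, not SPARQL or an RDF library, and the helper names are hypothetical. With the graph exactly as written, dpv:City also has skos:broader dpv:Country, so it matches the first pattern too. The sketch therefore shows the raw match and, separately, an extra filter assumed here for illustration (not part of the design above) that drops terms declared skos:inScheme dpv:Location. It also checks that no skos:Concept is also a skos:ConceptScheme or an owl:Class.

```python
# The combined OWL + SKOS model above, as (subject, predicate, object) triples.
RDF_TYPE, CONCEPT = "rdf:type", "skos:Concept"
IN_SCHEME, BROADER = "skos:inScheme", "skos:broader"

triples = {
    ("dpv:Location", RDF_TYPE, "owl:Class"),
    ("dpv:Location", RDF_TYPE, "skos:ConceptScheme"),
    ("dpv:Country", RDF_TYPE, "dpv:Location"), ("dpv:Country", RDF_TYPE, CONCEPT),
    ("dpv:Country", IN_SCHEME, "dpv:Location"),
    ("dpv:City", RDF_TYPE, "dpv:Location"), ("dpv:City", RDF_TYPE, CONCEPT),
    ("dpv:City", IN_SCHEME, "dpv:Location"), ("dpv:City", BROADER, "dpv:Country"),
    ("ex:Atlantis", RDF_TYPE, "dpv:Location"), ("ex:Atlantis", RDF_TYPE, CONCEPT),
    ("ex:Atlantis", BROADER, "dpv:Country"),
    ("ex:AtlantisCity", RDF_TYPE, "dpv:Location"), ("ex:AtlantisCity", RDF_TYPE, CONCEPT),
    ("ex:AtlantisCity", BROADER, "ex:Atlantis"),
}

def subjects(graph, pred, obj):
    """All s such that (s, pred, obj) is in the graph."""
    return {s for s, p, o in graph if p == pred and o == obj}

def broader_transitive_of(graph, concept):
    """All x with `x skos:broaderTransitive concept`, found by walking
    skos:broader links downwards from `concept`."""
    found, frontier = set(), {concept}
    while frontier:
        step = set()
        for c in frontier:
            step |= subjects(graph, BROADER, c)
        frontier = step - found
        found |= frontier
    return found

concepts = subjects(triples, RDF_TYPE, CONCEPT)

# Pattern 1: [ a skos:Concept ; skos:broader dpv:Country ]
countries_raw = concepts & subjects(triples, BROADER, "dpv:Country")
# Assumed extra filter: drop vocabulary terms that are skos:inScheme dpv:Location.
countries = countries_raw - subjects(triples, IN_SCHEME, "dpv:Location")

# Pattern 2: [ a skos:Concept ; skos:broaderTransitive ex:Atlantis ]
within_atlantis = concepts & broader_transitive_of(triples, "ex:Atlantis")

# No skos:Concept is also a skos:ConceptScheme or an owl:Class in this model.
assert not (concepts & subjects(triples, RDF_TYPE, "skos:ConceptScheme"))
assert not (concepts & subjects(triples, RDF_TYPE, "owl:Class"))

print(sorted(countries_raw))    # ['dpv:City', 'ex:Atlantis']
print(sorted(countries))        # ['ex:Atlantis']
print(sorted(within_atlantis))  # ['ex:AtlantisCity']
```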

To create a separate OWL-only variant, all the SKOS annotations are removed and replaced with their OWL equivalents:

dpv-owl:Location a owl:Class .
dpv-owl:Country a owl:Class ; rdfs:subClassOf dpv-owl:Location .
dpv-owl:City a owl:Class ; rdfs:subClassOf dpv-owl:Location .
dpv-owl:isPartOf a owl:ObjectProperty, owl:TransitiveProperty .
ex:Atlantis a dpv-owl:Country .
ex:AtlantisCity a dpv-owl:City ;
    dpv-owl:isPartOf ex:Atlantis .

Deriving the list of countries is easy because countries are instances of dpv-owl:Country. Checking whether something occurs within a region is also easy, because of the deterministic inferences drawn from either subclass relations or another property like isPartOf.
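Both conveniences can be checked with a small stdlib-only sketch of the OWL-only variant. instances_of applies the RDFS subclass rule so that countries and locations are found through their types, and transitive follows dpv-owl:isPartOf from ex:AtlantisCity to ex:Atlantis. Both helper names are hypothetical, and this is an illustration rather than a reasoner. Unlike the earlier City-under-Country chain, the city here is a Location but never a Country.

```python
# The OWL-only variant: Country and City are both subclasses of Location,
# and location instances are linked by the transitive dpv-owl:isPartOf.
triples = {
    ("dpv-owl:Country", "rdfs:subClassOf", "dpv-owl:Location"),
    ("dpv-owl:City", "rdfs:subClassOf", "dpv-owl:Location"),
    ("ex:Atlantis", "rdf:type", "dpv-owl:Country"),
    ("ex:AtlantisCity", "rdf:type", "dpv-owl:City"),
    ("ex:AtlantisCity", "dpv-owl:isPartOf", "ex:Atlantis"),
}

def transitive(graph, start, prop):
    """Nodes reachable from `start` by following `prop` one or more times."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for s, p, o in graph:
            if s == node and p == prop and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen

def instances_of(graph, cls):
    """Instances of `cls`, including instances of its direct or indirect
    subclasses (the rdfs9 rule applied over the subclass closure)."""
    classes = {cls} | {s for s, p, _ in graph
                       if p == "rdfs:subClassOf"
                       and cls in transitive(graph, s, "rdfs:subClassOf")}
    return {s for s, p, o in graph if p == "rdf:type" and o in classes}

print(sorted(instances_of(triples, "dpv-owl:Country")))   # ['ex:Atlantis']
print(sorted(instances_of(triples, "dpv-owl:Location")))  # ['ex:Atlantis', 'ex:AtlantisCity']
print("ex:Atlantis" in transitive(triples, "ex:AtlantisCity", "dpv-owl:isPartOf"))  # True
```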

A good reason for this separation through namespaces is that it enables interoperability between the SKOS and OWL versions without having them mix unless someone wants them to (they can explicitly declare equivalence). Even if this graph is merged with the main DPV vocabulary, there are no issues because the namespaces are separate. The two can be safely used alongside each other (as two variants of the same vocabulary), or, by aligning/mapping the two, one could be transformed into the other. This provides compatibility between work that uses only OWL and work that is based on SKOS. The assumption that most uses will be based on SKOS is the argument for why SKOS should be the primary serialisation instead of OWL.

The nice thing about this use of OWL and SKOS is that properties can safely have domains and ranges, without worrying about how they might be used with terms that used to be both classes and instances. This means there is no risk of suddenly finding a mixture of classes and instances (as long as you don't mix owl:Class and skos:Concept). So both of the following uses are okay in OWL and SKOS with the new model:

dpv:hasPersonalData a owl:ObjectProperty ;
    rdfs:range dpv:PersonalData .
dpv:PersonalData a owl:Class, skos:ConceptScheme ;
    skos:hasTopConcept dpv:EmailAddress .  # for convenience here
dpv:EmailAddress a dpv:PersonalData, skos:Concept ;
    skos:inScheme dpv:PersonalData .
ex:MyEmailAddress a dpv:PersonalData, skos:Concept ;
    skos:broader dpv:EmailAddress .
ex:PDH dpv:hasPersonalData dpv:EmailAddress .  # as "class"
ex:PDH dpv:hasPersonalData ex:MyEmailAddress . # as "instance"
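To see why both uses are acceptable, the following sketch checks, in closed-world style, that every object of dpv:hasPersonalData is explicitly typed dpv:PersonalData. It also checks that nothing is both an owl:Class and a skos:Concept. Under RDFS semantics a range declaration never produces a violation, because it simply entails the object's type (rule rdfs3). The check below is therefore closer to what a validator such as SHACL would do. range_violations is a hypothetical helper, and it assumes one declared range per property.

```python
# The personal-data example above, as (subject, predicate, object) triples.
RDF_TYPE = "rdf:type"
triples = {
    ("dpv:hasPersonalData", "rdfs:range", "dpv:PersonalData"),
    ("dpv:PersonalData", RDF_TYPE, "owl:Class"),
    ("dpv:PersonalData", RDF_TYPE, "skos:ConceptScheme"),
    ("dpv:PersonalData", "skos:hasTopConcept", "dpv:EmailAddress"),
    ("dpv:EmailAddress", RDF_TYPE, "dpv:PersonalData"),
    ("dpv:EmailAddress", RDF_TYPE, "skos:Concept"),
    ("dpv:EmailAddress", "skos:inScheme", "dpv:PersonalData"),
    ("ex:MyEmailAddress", RDF_TYPE, "dpv:PersonalData"),
    ("ex:MyEmailAddress", RDF_TYPE, "skos:Concept"),
    ("ex:MyEmailAddress", "skos:broader", "dpv:EmailAddress"),
    ("ex:PDH", "dpv:hasPersonalData", "dpv:EmailAddress"),   # as "class"
    ("ex:PDH", "dpv:hasPersonalData", "ex:MyEmailAddress"),  # as "instance"
}

def range_violations(graph):
    """Closed-world check: triples whose object is not explicitly typed with
    the (single) range declared for their predicate."""
    ranges = {s: o for s, p, o in graph if p == "rdfs:range"}
    types = {}
    for s, p, o in graph:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    return {(s, p, o) for s, p, o in graph
            if p in ranges and ranges[p] not in types.get(o, set())}

classes = {s for s, p, o in triples if p == RDF_TYPE and o == "owl:Class"}
concepts = {s for s, p, o in triples if p == RDF_TYPE and o == "skos:Concept"}

print(range_violations(triples))  # set()
print(classes & concepts)         # set()
```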