An Ontology Design Pattern for Describing Personal Data in Privacy Policies
Workshop on Ontology Design and Patterns (WOP) - co-located with International Semantic Web Conference (ISWC)
Harshvardhan J. Pandit*, Declan O'Sullivan, Dave Lewis
Provides an ODP for representing information about personal data within privacy policies
Motivation & Scope
The rest of this paper is structured as follows: Section 2 describes the pattern, Section 3 provides an example, and Section 4 concludes with a discussion of future work.
The pattern aims to answer the following competency questions:
- What personal data is collected? e.g. email
- Does the data have a category? e.g. contact information
- What was its source? e.g. user
- How is it collected? e.g. given by user, automated
- What is it used for? e.g. creating an account, authentication and verification
- How long is it retained for? e.g. 90 days after account deletion
- Who is it shared with? e.g. name of partner organisation(s)
- What is the legal basis? e.g. given consent, legitimate interest
- What processes/purposes was the data shared for? e.g. analytics, marketing
- What is the legal type of third party? e.g. processor, controller, authority
- How can personal data be rectified or corrected?
- How can personal data be deleted or removed?
- How can a copy of the personal data be obtained?
- How can personal data be transferred to another party?
- How can information about the personal data be obtained?
The pattern uses the GDPRtEXT and GDPRov ontologies for defining concepts relevant to the GDPR. GDPRov is an ontology for describing the provenance of consent and personal data lifecycles using GDPR relevant terminology, and is an extension of PROV-O and P-Plan. GDPRtEXT provides definitions of concepts and terms used within the text of the GDPR using SKOS.
The pattern is available online along with its documentation, and has been submitted to the ontology design patterns collaborative wiki.
Concepts & Relationships
A visualisation of the pattern is presented in Fig. 1. It was created using the yEd graph editor with the Graffoo palette.
PersonalDataCategory ⊑ PersonalData (1)
Data is collected through a gdprov:DataCollectionStep, and is represented using the property gdprov:collectsData. The data provider is represented using prov:Agent through the property gdprov:collectsDataFromAgent.
DataCollectionStep ⊑ ≥ 1 collectsData.PersonalData (2)
DataCollectionStep ⊑ ≥ 1 collectsDataFromAgent.Agent (3)
PersonalData ⊑ ∀ hasCollectionMechanism.CollectionMechanism (4)
PersonalData ⊑ ∀ hasDuration.Duration (5)
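Axioms (2) to (4) can also be written as OWL restrictions in Turtle. The following is a minimal sketch rather than the published ontology source: it assumes the gdprov namespace used in Listing 1 and the standard PROV-O namespace, and the IRI gdprov:CollectionMechanism is a hypothetical name for the class that axiom (4) calls CollectionMechanism.

```turtle
@prefix owl:    <http://www.w3.org/2002/07/owl#> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .
@prefix prov:   <http://www.w3.org/ns/prov#> .
@prefix gdprov: <http://purl.org/adaptcentre/openscience/ontologies/gdprov#> .

# Axiom (2): a collection step collects at least one item of personal data.
gdprov:DataCollectionStep rdfs:subClassOf [
    a owl:Restriction ;
    owl:onProperty gdprov:collectsData ;
    owl:minQualifiedCardinality "1"^^xsd:nonNegativeInteger ;
    owl:onClass gdprov:PersonalData
] .

# Axiom (3): a collection step collects data from at least one agent.
gdprov:DataCollectionStep rdfs:subClassOf [
    a owl:Restriction ;
    owl:onProperty gdprov:collectsDataFromAgent ;
    owl:minQualifiedCardinality "1"^^xsd:nonNegativeInteger ;
    owl:onClass prov:Agent
] .

# Axiom (4): any collection mechanism stated for personal data
# must be a CollectionMechanism.
gdprov:PersonalData rdfs:subClassOf [
    a owl:Restriction ;
    owl:onProperty gdprov:hasCollectionMechanism ;
    owl:allValuesFrom gdprov:CollectionMechanism  # assumed IRI, not confirmed by the source
] .
```

The qualified cardinality form (owl:minQualifiedCardinality with owl:onClass) mirrors the "≥ 1 property.Class" notation, while owl:allValuesFrom mirrors the "∀" notation.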
Data Usage & Processing
Process ⊑ ≥ 1 usesData.PersonalData (6)
Legal Basis for Data Usage
Every use of personal data within a process must have a legal basis under the GDPR. Examples of legal bases defined within GDPRtEXT include consent, legitimate interest, compliance with the law, and performance of contract. To represent this, the pattern uses the property gdprov:hasLegalBasis with the range gdprtext:LawfulBasisForProcessing. Since every data use must have at least one legal basis, this provides the axiom:
Process ⊑ ≥ 1 hasLegalBasis.LawfulBasisForProcessing (7)
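As an illustration, axioms (6) and (7) could be encoded in Turtle as two restrictions on gdprov:Process. This is a sketch under the assumption that the class and property IRIs match the prefixes used in Listing 1; it is not taken from the ontology files themselves.

```turtle
@prefix owl:      <http://www.w3.org/2002/07/owl#> .
@prefix rdfs:     <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:      <http://www.w3.org/2001/XMLSchema#> .
@prefix gdprov:   <http://purl.org/adaptcentre/openscience/ontologies/gdprov#> .
@prefix gdprtext: <http://purl.org/adaptcentre/openscience/ontologies/GDPRtEXT#> .

gdprov:Process rdfs:subClassOf
    # Axiom (6): a process uses at least one item of personal data.
    [ a owl:Restriction ;
      owl:onProperty gdprov:usesData ;
      owl:minQualifiedCardinality "1"^^xsd:nonNegativeInteger ;
      owl:onClass gdprov:PersonalData ] ,
    # Axiom (7): a process has at least one lawful basis for processing.
    [ a owl:Restriction ;
      owl:onProperty gdprov:hasLegalBasis ;
      owl:minQualifiedCardinality "1"^^xsd:nonNegativeInteger ;
      owl:onClass gdprtext:LawfulBasisForProcessing ] .
```

Under the open world assumption, a reasoner will not report a process with no stated legal basis as inconsistent; it only infers that some legal basis exists. Detecting a missing legal basis in a given policy would therefore need a separate closed-world check.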
The sharing of data involves the entity the data is shared with, the purposes for sharing, and their legal basis. This is represented within the pattern through the use of gdprov:DataSharingStep and the property gdprov:sharesData. The entity the data is shared with is represented using the gdprov:sharesDataWith property with the domain as gdprov:DataSharingStep and the range as a type of gdprov:Agent, such as another Data Controller, Data Processor, or an Authority. The purpose of sharing is represented using gdprov:Process and the property gdprov:sharesDataForProcess to model the data being used in that process after sharing. The legal basis of processes for which the data is shared is represented using gdprov:hasLegalBasis as specified earlier. Since it is mandatory to state who the data is shared with, along with the intended purposes and the specific legal basis, we have the following axioms:
DataSharingStep ⊑ ≥ 1 sharesData.PersonalData (8)
DataSharingStep ⊑ ≥ 1 sharesDataWith.Agent (9)
DataSharingStep ⊑ ≥ 1 sharesDataForProcess.Process (10)
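The sharing properties and their axioms can be sketched in Turtle as follows. The property declarations follow the domain and range described above. The use of prov:Agent as the range of gdprov:sharesDataWith is an assumption based on how agents are typed in Listing 1, since the text refers to gdprov:Agent; the domain of gdprov:sharesDataForProcess is likewise assumed.

```turtle
@prefix owl:    <http://www.w3.org/2002/07/owl#> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .
@prefix prov:   <http://www.w3.org/ns/prov#> .
@prefix gdprov: <http://purl.org/adaptcentre/openscience/ontologies/gdprov#> .

# Who the data is shared with (range is an assumption, see lead-in).
gdprov:sharesDataWith a owl:ObjectProperty ;
    rdfs:domain gdprov:DataSharingStep ;
    rdfs:range  prov:Agent .

# The process the shared data is used for (domain is an assumption).
gdprov:sharesDataForProcess a owl:ObjectProperty ;
    rdfs:domain gdprov:DataSharingStep ;
    rdfs:range  gdprov:Process .

gdprov:DataSharingStep rdfs:subClassOf
    # Axiom (8): at least one item of personal data is shared.
    [ a owl:Restriction ;
      owl:onProperty gdprov:sharesData ;
      owl:minQualifiedCardinality "1"^^xsd:nonNegativeInteger ;
      owl:onClass gdprov:PersonalData ] ,
    # Axiom (9): the data is shared with at least one agent.
    [ a owl:Restriction ;
      owl:onProperty gdprov:sharesDataWith ;
      owl:minQualifiedCardinality "1"^^xsd:nonNegativeInteger ;
      owl:onClass prov:Agent ] ,
    # Axiom (10): the data is shared for at least one process.
    [ a owl:Restriction ;
      owl:onProperty gdprov:sharesDataForProcess ;
      owl:minQualifiedCardinality "1"^^xsd:nonNegativeInteger ;
      owl:onClass gdprov:Process ] .
```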
The example use-case is illustrated in Fig. 2 using Graffoo and shows the classes, properties, and instances. The corresponding code is presented in Listing 1 using the Turtle notation for RDF. The answers to the competency questions corresponding to the use-case are provided below.
- What personal data is collected: Email Address
- Does the data have a category: Account Information
- What was its source: User
- How is it collected: Given by user
- What is it used for: Platform Services, Payments
- How long is it retained for: indefinitely (no end duration)
- Who is it shared with: Payments Controller
- What is the legal basis: Legitimate Interest, Contract
- What processes/purposes was the data shared for: Identity Verification
- What is the legal type of third party: Data Controller
Based on the intended motivation, the pattern provides a way to share the relevant information regarding personal data, and provides further avenues for research regarding similar patterns or meta-patterns related to privacy policies.
We consider our work an initial effort towards consolidating information within privacy policies. Using the pattern to reflect information from several distinct real-world privacy policies will demonstrate its feasibility and applicability in real-world scenarios. This presents a challenge as the pattern currently assumes the presence of all required information which may not be the case for some use-cases, particularly where interpretations of information are ambiguous. However, capturing such ambiguities through a meta-pattern can possibly aid in flagging them for review by legal experts.
In addition to the above, the pattern faces other challenges in modelling the information it aims to represent. For example, it is not clear what level of abstraction should be represented in the pattern regarding concepts such as storage and sharing. Should there be a DataStorageStep which can be further annotated to represent various pieces of information relating to the storage of personal data? Abstractions can help to represent different storage durations and formats for the same instance of personal data, such as storing the actual data for 6 months while a (pseudo-)anonymised copy is stored for 2 years. However, tacking such abstractions on to the pattern can make it rigid (in terms of modelling) and complex. More work needs to be undertaken to evaluate whether such abstractions are necessary in the pattern, and how they should be represented.
Another challenge is the representation of storage duration (or retention period). Concrete values such as 6 months or 2 years can be represented using appropriate ontologies, but ambiguous statements are difficult to represent using such ontologies. An example of this is the statement "data may be stored for as long as necessary..." in which there is no end to the duration for storage. Representing this as a time:Duration instance is problematic as there is no clear method to represent its end period. Not defining an end period is also not a solution due to the open world assumption. Our approach towards solving this issue is to abstract the storage activity as described earlier. However, we are open to other approaches and solutions to this problem.
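For the concrete case, a retention period of 6 months could be written with the W3C OWL-Time vocabulary as sketched below. The sketch assumes that hasDuration from axiom (5) is published as gdprov:hasDuration, and :SixMonths is a hypothetical instance in the example namespace of Listing 1.

```turtle
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .
@prefix time:   <http://www.w3.org/2006/time#> .
@prefix gdprov: <http://purl.org/adaptcentre/openscience/ontologies/gdprov#> .
@prefix :       <http://example.com/personaldata#> .

# The email address is retained for a fixed period.
:EmailAddress gdprov:hasDuration :SixMonths .   # property IRI is an assumption

# A concrete duration: a numeric value plus a temporal unit.
:SixMonths a time:Duration ;
    time:numericDuration "6"^^xsd:decimal ;
    time:unitType time:unitMonth .
```

A statement such as "as long as necessary" has no counterpart in this form: there is no numeric value to give, and simply omitting the duration does not assert that the period is unbounded.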
Some of this information was presented in this paper as additional competency questions. These help evaluate information regarding how the personal data can be changed (rectified), deleted, and obtained (download a copy). Additionally, GDPR allows the data subject to change their consent, thereby affecting the processes involving personal data. Capturing this information is essential for converting privacy policies into machine-readable data, with the paper demonstrating the suitability of ODPs for this task.
Listing 1: Example use-case in Turtle format presenting Email Address as an instance of personal data along with its collection, storage, and sharing
@prefix dct: <http://purl.org/dc/terms/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix gdprov: <http://purl.org/adaptcentre/openscience/ontologies/gdprov#> .
@prefix gdprtext: <http://purl.org/adaptcentre/openscience/ontologies/GDPRtEXT#> .
@prefix : <http://example.com/personaldata#> .

# Sharing of the email address with a third party
:PaymentProcess a gdprov:DataSharingStep ;
    rdfs:label "Payment Process"^^xsd:string ;
    gdprov:sharesData :EmailAddress ;
    gdprov:sharesDataForProcess :IdentityVerification ;
    gdprov:sharesDataWith :PaymentsController .

# Use of the email address by the platform
:PlatformServices a gdprov:Process ;
    rdfs:label "Provide, Improve, and Develop Platform"^^xsd:string ;
    gdprov:hasLegalBasis gdprtext:LegitimateInterest ;
    gdprov:usesData :EmailAddress .

# Collection of the email address from the user
:Registration a gdprov:DataCollectionStep ;
    rdfs:label "Registration for new users"^^xsd:string ;
    gdprov:collectsData :EmailAddress ;
    gdprov:collectsDataFromAgent :User ;
    gdprov:hasCollectionMechanism gdprtext:GivenByUser .

# Category of personal data
:AccountInformation a rdfs:Class, owl:Class ;
    rdfs:label "Account Information of an User"^^xsd:string ;
    rdfs:subClassOf gdprov:PersonalData .

# Process for which the data is shared
:IdentityVerification a gdprov:Process ;
    rdfs:label "Identity Verification"^^xsd:string ;
    gdprov:hasLegalBasis gdprtext:Contract ;
    gdprov:usesData :EmailAddress .

# Agents
:PaymentsController a gdprov:Controller, prov:Agent ;
    rdfs:label "Payments Controller"^^xsd:string .

:User a gdprov:DataSubject, prov:Agent ;
    rdfs:label "User of Service"^^xsd:string .

# The personal data instance
:EmailAddress a :AccountInformation, gdprov:PersonalData ;
    rdfs:label "Email Address"^^xsd:string .
This work is supported by the ADAPT Centre for Digital Content Technology which is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.
G. Contissa et al., “Claudette Meets GDPR: Automating the Evaluation of Privacy Policies Using Artificial Intelligence,” 2018.
B. Fabian, T. Ermakova, and T. Lentz, “Large-scale Readability Analysis of Privacy Policies,” in Proceedings of the International Conference on Web Intelligence, 2017, pp. 18–25, doi: 10.1145/3106426.3106427 [Online]. Available: http://doi.acm.org/10.1145/3106426.3106427. [Accessed: 15-Apr-2018]
C. Jensen and C. Potts, “Privacy Policies As Decision-making Tools: An Evaluation of Online Privacy Notices,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2004, pp. 471–478, doi: 10.1145/985692.985752 [Online]. Available: http://doi.acm.org/10.1145/985692.985752. [Accessed: 02-May-2018]
A. Oltramari et al., “PrivOnto: A semantic framework for the analysis of privacy policies,” Semantic Web, vol. 9, no. 2, pp. 185–203, Jan. 2018, doi: 10.3233/SW-170283 [Online]. Available: http://www.medra.org/servlet/aliasResolver?alias=iospress&doi=10.3233/SW-170283. [Accessed: 15-Apr-2018]
H. J. Pandit, K. Fatema, D. O’Sullivan, and D. Lewis, “GDPRtEXT - GDPR as a Linked Data Resource,” 2018, p. 14.
H. J. Pandit and D. Lewis, “Modelling Provenance for GDPR Compliance using Linked Open Data Vocabularies,” in Proceedings of the 5th Workshop on Society, Privacy and the Semantic Web - Policy and Technology (PrivOn2017), 2017 [Online]. Available: http://ceur-ws.org/Vol-1951/#paper-06
R. Falco, A. Gangemi, S. Peroni, D. Shotton, and F. Vitali, “Modelling OWL ontologies with Graffoo,” in The Semantic Web: ESWC 2014 Satellite Events, 2014, pp. 320–325.
The impact of GDPR on the readability of privacy policies is yet to be determined.