Personalised Privacy Policies
Workshop on Technical and Legal aspects of Data Privacy and Security (TELERISE) - co-located with European Conference on Advances in Databases and Information Systems (ADBIS)
Harshvardhan J. Pandit*, Declan O'Sullivan, Dave Lewis
Open-access copies of the publication: TARA, zenodo
Resources: code, slides
How privacy policies can be personalised to the individual for more relevance and transparency.
Under the General Data Protection Regulation (GDPR), data subjects have the right to information about their personal data. Service providers (Controllers) are required to provide this information to users (Data Subjects) upon request, which necessitates a technical implementation capable of recording and providing the required information. Such an implementation must be able to distinguish the requests of each data subject and provide only the information pertaining to that particular individual.
The related work is presented in two sections. The first, Section 2.1, presents work related to the systematic study and categorisation of privacy policies. This work is relevant to understanding the composition of information in privacy policies, and how it can be extracted and represented. The second, Section 2.2, presents work relevant to the visualisation of information associated with GDPR rights. This work is relevant to understanding what information must be presented to the user and the various approaches for doing so.
Study of Privacy Policies
Visualising information for GDPR rights
Dynamic Metadata in Privacy Policies
We focus on changes in information describing personal data collection, storage, usage, sharing, and deletion. At the time of writing this paper, the GDPR has not yet become applicable, and few organisations have public policies related to the provision of the various rights. We discuss our work and approach using the privacy policies publicly provided by Airbnb Ireland and Twitter, with archived copies made available in case the policies change in future. We selected these examples due to their prominence as known commercial enterprises and their suitability for the purposes of this research.
Structure of Information
The analysis of these policies requires identification of what information may change or is ambiguous and could be resolved using information provided from resolution of GDPR rights. For this purpose, the selected examples of privacy policies have a suitable structure which is ordered into contextual sections. Such organisation of information not only helps the reader better understand and navigate information, it also helps in categorising the different types of information represented within the policy. We primarily discuss this structure in relation to the policy provided by Airbnb Ireland, though the discussion is also applicable to the policy from Twitter.
The policy follows a very structured approach towards presenting information to the user. The broad sections of the policy provide information about data collection, usage, sharing, and rights. These are further classified based on the context of activity. We focus on the first section, which deals with data collection (termed “Information we collect” in the policy). The policy describes two sources of data: data collected directly from the data subject (sections 1.1 and 1.2) and data obtained from third parties (section 1.3). The information collected from the data subject is further categorised based on whether it is necessary (sections 1.1.1 and 1.1.3) or opt-in (section 1.1.2). Information about the nature of the data collection mechanism is also provided, whereby some of the data is collected via automated systems (section 1.2). This structure is presented visually in [fig:data-categories].
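The section hierarchy described above can be pictured as a small tree. The following is a minimal sketch in Python, not part of the original work: the section numbers and the title “Information we collect” come from the text, while the remaining labels are paraphrases assumed for illustration rather than quotations from the policy.

```python
# Static structure of the "Information we collect" part of the policy.
# Section numbers follow the text; labels other than the top-level title
# are illustrative paraphrases, not quotations from the policy.
POLICY_STRUCTURE = {
    "1": {
        "label": "Information we collect",
        "children": {
            "1.1": {
                "label": "Provided by the data subject",
                "children": {
                    "1.1.1": {"label": "Necessary information", "children": {}},
                    "1.1.2": {"label": "Opt-in information", "children": {}},
                    "1.1.3": {"label": "Necessary information", "children": {}},
                },
            },
            "1.2": {"label": "Collected via automated systems", "children": {}},
            "1.3": {"label": "Obtained from third parties", "children": {}},
        },
    }
}


def leaf_sections(tree):
    """Return the numbers of sections without sub-sections, in document order."""
    leaves = []
    for number, node in tree.items():
        if node["children"]:
            leaves.extend(leaf_sections(node["children"]))
        else:
            leaves.append(number)
    return leaves


print(leaf_sections(POLICY_STRUCTURE))
```

The leaves of this tree are the places where the policy lists concrete data categories, which is where personalisation would apply.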
While the above information reflects the structure of the policy, the contents within each section provide information about the personal data involved. For example, the information in section 1.1.1 describes the categories of data involved. Each category is further described with the specific types of data that fall under it. For example, Account Information is the first category within the section, which contains information about data types such as first name, last name, email address, and date of birth. Additionally, the sentence also mentions the specific process (account sign-up) used to collect this information.
This information is distinct from the earlier structuring of information in that it can change (is dynamic) based on the operation and provision of services. For example, it is possible that additional information such as nationality may be added as essential account information in the future. In such a case, it will be listed along with the other data types under the “Account Information” data category. Similarly, the mechanism for data collection may change as well to some other new or existing process or step.
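To make the distinction concrete, the sketch below models one data category whose contents can change over time. It is an illustration under stated assumptions: the class `DataCategory` and its field names are hypothetical, and only the category name, data types, and sign-up process are taken from the text.

```python
from dataclasses import dataclass, field


@dataclass
class DataCategory:
    """A data category named in the policy, with its changeable contents."""
    name: str
    collected_by: str                      # process that collects the data
    data_types: list = field(default_factory=list)

    def describe(self):
        """Render a policy sentence from the current metadata."""
        return (f"{self.name}: {', '.join(self.data_types)} "
                f"(collected during {self.collected_by}).")


account_info = DataCategory(
    name="Account Information",
    collected_by="account sign-up",
    data_types=["first name", "last name", "email address", "date of birth"],
)
before = account_info.describe()

# Hypothetical future change: nationality becomes essential account data.
account_info.data_types.append("nationality")
after = account_info.describe()

print(before)
print(after)
```

The category itself stays in place while the list of data types underneath it changes, so the rendered sentence always follows the current state of the metadata.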
We distinguish between metadata representing the ‘structure’ of information and metadata representing the underlying system. While the former will be common to all services and policies, the latter reflects information specific to the organisation or service (and to the data subject). In the example, all privacy policies will have a section describing the data categories, but the specific categories mentioned within the policy are unique to the organisation and the service it provides, and are updated based on changes to the system and its operations. We term such information ‘dynamic metadata’ to reflect this.
Storage and Representation of Metadata
For our work, we focus on the use of semantic web technologies due to their open and extensible nature. For representing the metadata related to processes and the data they use, we use the GDPRov ontology, which extends PROV-O and P-Plan. PROV-O is a W3C recommendation, which provides interoperability of provenance information. P-Plan is an extension of PROV-O that allows representation of abstract workflows. For annotating information with concepts and terms from the GDPR, we use the GDPRtEXT resource.
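As a rough illustration of what such metadata might look like, the sketch below builds a handful of triples and serialises them as N-Triples using only the Python standard library. The P-Plan and PROV-O terms are from the published vocabularies, and `gdprov:PersonalData` and the GDPRov namespace appear in the RDFa snippet in this paper. The `ex:` resources (`SignUpPlan`, `AccountSignUp`, `signup-run-1`), and the choice to attach personal data to a step with `p-plan:isOutputVarOf`, are assumptions made for this example, not the model used in the work.

```python
# Namespace IRIs. PROV-O and P-Plan are the published namespaces; the GDPRov
# IRI is the one used in this paper's RDFa snippet. EX is a made-up base.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
PROV = "http://www.w3.org/ns/prov#"
PPLAN = "http://purl.org/net/p-plan#"
GDPROV = "http://purl.org/adaptcentre/openscience/ontologies/gdprov#"
EX = "http://example.com/use-case#"

triples = [
    # Abstract workflow (the "model" of the system).
    (EX + "SignUpPlan", RDF + "type", PPLAN + "Plan"),
    (EX + "AccountSignUp", RDF + "type", PPLAN + "Step"),
    (EX + "AccountSignUp", PPLAN + "isStepOfPlan", EX + "SignUpPlan"),
    (EX + "first-name", RDF + "type", GDPROV + "PersonalData"),
    (EX + "first-name", PPLAN + "isOutputVarOf", EX + "AccountSignUp"),
    (EX + "first-name", RDFS + "label", '"First Name"'),
    # One concrete execution of the step (provenance of an actual sign-up).
    (EX + "signup-run-1", RDF + "type", PROV + "Activity"),
    (EX + "signup-run-1", PPLAN + "correspondsToStep", EX + "AccountSignUp"),
]


def to_ntriples(rows):
    """Serialise (subject, predicate, object) rows as N-Triples lines.

    Objects that start with a double quote are written as plain literals;
    everything else is treated as an IRI.
    """
    lines = []
    for s, p, o in rows:
        obj = o if o.startswith('"') else f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


ntriples = to_ntriples(triples)
print(ntriples)
```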
We present here a more detailed technical description of the implementation of this system to demonstrate the particular use-case.
```html
<body vocab="http://example.com/use-case"
      prefix="gdprov: http://purl.org/adaptcentre/openscience/ontologies/gdprov# rdfs: http://www.w3.org/2000/01/rdf-schema#">
  <p resource="#AccountInfo">
    <span property="rdfs:label">Account Information</span></p>
  <ul>
    <li><label resource="#first-name" typeof="gdprov:PersonalData #AccountInfo">
      <span property="rdfs:label">First Name</span>
    </label></li>
  </ul>
</body>
```
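To show how such annotations could be read back from a rendered policy, the sketch below collects the `rdfs:label` of each annotated resource using Python's standard `html.parser`. It is a simplified illustration and not a conforming RDFa processor: it only tracks the nearest enclosing `resource` attribute and ignores `vocab`, `prefix`, and `typeof`. The class name `LabelExtractor` is hypothetical.

```python
from html.parser import HTMLParser

# The annotated fragment from the paper, reproduced as a string.
SNIPPET = """
<body vocab="http://example.com/use-case"
      prefix="gdprov: http://purl.org/adaptcentre/openscience/ontologies/gdprov# rdfs: http://www.w3.org/2000/01/rdf-schema#">
  <p resource="#AccountInfo">
    <span property="rdfs:label">Account Information</span></p>
  <ul>
    <li><label resource="#first-name" typeof="gdprov:PersonalData #AccountInfo">
      <span property="rdfs:label">First Name</span>
    </label></li>
  </ul>
</body>
"""


class LabelExtractor(HTMLParser):
    """Map each element's `resource` value to the rdfs:label found inside it."""

    def __init__(self):
        super().__init__()
        self.stack = []        # one entry per open element: its resource or None
        self.in_label = False  # True while inside a property="rdfs:label" element
        self.labels = {}

    def current_subject(self):
        """Return the nearest enclosing resource, if any."""
        for resource in reversed(self.stack):
            if resource is not None:
                return resource
        return None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        self.stack.append(attrs.get("resource"))
        self.in_label = attrs.get("property") == "rdfs:label"

    def handle_endtag(self, tag):
        self.in_label = False
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        subject = self.current_subject()
        if self.in_label and subject and data.strip():
            self.labels[subject] = data.strip()


extractor = LabelExtractor()
extractor.feed(SNIPPET)
print(extractor.labels)
```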
The model of the system is defined using GDPRov and GDPRtEXT to represent activities and how they interact with personal data. This is stored as RDF data in a triple store.
As data subjects or users use and interact with the system, relevant metadata is stored using GDPRov in the triple store as RDF data.
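The two steps above can be sketched with a tiny in-memory stand-in for a triple store. This is an illustration only: a real deployment would use an RDF triple store and GDPRov terms, whereas the predicates `ex:inCategory` and `ex:provided`, the subjects `ex:alice` and `ex:bob`, and the helper `personal_data_for` are all hypothetical names introduced for this example.

```python
class TripleStore:
    """Tiny in-memory stand-in for an RDF triple store (illustration only)."""

    def __init__(self):
        self.triples = set()

    def add(self, s, p, o):
        self.triples.add((s, p, o))

    def match(self, s=None, p=None, o=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


store = TripleStore()

# System model: which data types belong to which category.
store.add("ex:first-name", "ex:inCategory", "ex:AccountInfo")
store.add("ex:email", "ex:inCategory", "ex:AccountInfo")
store.add("ex:nationality", "ex:inCategory", "ex:AccountInfo")

# Usage metadata recorded as individual data subjects use the system.
store.add("ex:alice", "ex:provided", "ex:first-name")
store.add("ex:alice", "ex:provided", "ex:email")
store.add("ex:bob", "ex:provided", "ex:first-name")
store.add("ex:bob", "ex:provided", "ex:nationality")


def personal_data_for(subject, category):
    """List the data types in `category` that `subject` has actually provided."""
    in_category = {s for s, _, _ in store.match(p="ex:inCategory", o=category)}
    provided = {o for _, _, o in store.match(s=subject, p="ex:provided")}
    return sorted(in_category & provided)


print(personal_data_for("ex:alice", "ex:AccountInfo"))
print(personal_data_for("ex:bob", "ex:AccountInfo"))
```

Intersecting the system model with the per-subject records yields only the entries relevant to one individual, which is the information a personalised policy section would display.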
The work described in this paper has broader applications beyond personalising privacy policies, such as addressing various rights and access requests (such as those under the GDPR) and automating other similarly structured documents.
Address GDPR Rights and SARs
Automate Reports and Documentation
Documentation related to compliance and other processes is often structured and refers to information in a specific way. An approach similar to the one described in this paper, where stored metadata is used to dynamically populate a structured document, can be used to automate the creation of such documentation. This can be used to generate reports that describe the various processes and how they relate to personal data, based on the underlying model of the system. It can also be extended to create various technical reports regarding the use of internal processes.
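A minimal sketch of this idea follows, assuming the stored metadata has already been retrieved as simple rows. The rows, the “Newsletter” process, and the report layout are hypothetical and serve only to show a structured document being populated from metadata.

```python
from collections import defaultdict

# Hypothetical stored metadata: (process, relation, personal data type).
METADATA = [
    ("Account sign-up", "collects", "first name"),
    ("Account sign-up", "collects", "email address"),
    ("Newsletter", "uses", "email address"),
]


def build_report(rows):
    """Group metadata rows by process and render a plain-text report."""
    by_process = defaultdict(list)
    for process, relation, data_type in rows:
        by_process[process].append(f"{relation} {data_type}")

    lines = ["Processing report"]
    for process in sorted(by_process):
        lines.append(f"- {process}: " + "; ".join(by_process[process]))
    return "\n".join(lines)


report = build_report(METADATA)
print(report)
```

Because the report is derived from the metadata each time it is built, it would follow changes to the underlying model without manual edits.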
Conclusion & Future Work
The primary future work is the implementation of such a personalised policy using the approaches and technologies described in this paper. With the advent of the GDPR, we expect to see more examples of similarly structured privacy policies, which will need to be analysed to identify relevant metadata. This also presents an opportunity to assess the information provided by various organisations in response to the various rights and SARs, and to modify this work to better reflect real-world use-cases.
This work is supported by the ADAPT Centre for Digital Content Technology which is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.