Towards Knowledge-based Systems for GDPR Compliance

Full Paper, peer-reviewed
Workshop on Contextualized Knowledge Graphs (CKG) - co-located with International Semantic Web Conference (ISWC)
Harshvardhan J. Pandit*, Declan O'Sullivan, Dave Lewis
Discussing creation of knowledge-based systems for compliance and information management regarding GDPR

Abstract

Legal compliance is traditionally seen to be sufficiently demonstrable using legal documents that describe how various operations and activities follow a given set of obligations. The General Data Protection Regulation (GDPR) places larger responsibilities upon organisations and provides motivation for the use of technological measures that can ease compliance. While there is no legal requirement to collaborate on compliance technologies or to use a common mechanism for defining knowledge, doing so has several benefits for the larger community. In this paper, we describe how open and shared technologies targeted towards the GDPR and its compliance can be used to create knowledge-based systems. Our approach uses semantic web technologies because of their open and flexible nature in describing concepts and relationships. We present a model for such a knowledge-based system along with work published to date.

Introduction

The General Data Protection Regulation (GDPR) is European data protection legislation that introduces changes to the way consent and personal data need to be managed by organisations. A large part of the motivation towards efficient adoption of the regulation is the significant fines that could be levied for non-compliance. As a result, solutions for compliance have seen considerable interest in both the industrial and academic communities.

The Semantic Web provides a common base of technologies and data representation formats that are both open and expressive. Their adoption can aid in the building of common solutions and foster interoperability by virtue of commonly understood knowledge forms. By using the semantic web to combine compliance-related data, it is possible to develop knowledge-based systems that can cater to a wide range of compliance tasks based on commonality in requirements. In this paper, we present work done to date towards such a knowledge-based system.¹

Work done to date

We have explored [1] the information flows between different organisations in the context of the GDPR with the goal of identifying a data model for GDPR-related interoperability. We identified entities and the nature of the relationships between them by analysing the text of the GDPR and categorising relevant articles based on points of interaction between the information flows. Through this, we identified five information categories (provenance, data sharing agreements, consent, certification, and compliance) along with the dependencies between them. We also presented an evaluation of the available standards based on maturity and recommendation for representation of identified information categories.

To date, we have developed and published a representation for provenance called the GDPR Provenance Ontology (GDPRov) [2], which describes the provenance of consent and data lifecycles using GDPR terminology. We have also investigated possible approaches towards representations for consent [3] and for data sharing agreements, the latter called Data Protection Rights Language (DPRL) [4]. GDPRtEXT [5] provides a way to refer to and link information related to specific articles, terms, and concepts within the GDPR in a machine-readable manner.

The ability to refer to GDPR concepts in a machine-readable manner is crucial to our aim of building a knowledge-based system. Information representations for certification and compliance are part of our planned future work.

Knowledge-based System for Compliance

We primarily express knowledge in the form of RDF triples described using suitable OWL ontologies. The triples are stored within a triple-store, with querying provided by SPARQL². The knowledge-based system and its infrastructure are depicted in Fig.1, and are based on the consent and data management model previously published [3]. Depending on the requirements of usage, access control mechanisms [6] can be used to ensure that only authorised accesses take place.
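To make the storage and querying idea concrete, the following is a minimal sketch in plain Python rather than an actual triple-store or SPARQL engine. The `TripleStore` class, the `http://example.org/` namespace, and the class and property names are all hypothetical and used only for illustration; a real system would use terms from the published ontologies.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical namespace and terms, for illustration only.
EX = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class TripleStore:
    """A toy in-memory store; a real deployment would use an RDF triple-store."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        """Record one (subject, predicate, object) statement."""
        self._triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        """Yield triples matching the pattern. None acts as a wildcard,
        similar in spirit to a variable in a SPARQL triple pattern."""
        for t in self._triples:
            if ((s is None or t[0] == s)
                    and (p is None or t[1] == p)
                    and (o is None or t[2] == o)):
                yield t


store = TripleStore()
store.add(EX + "CollectEmail", RDF_TYPE, EX + "DataCollectionStep")
store.add(EX + "CollectEmail", EX + "usesConsent", EX + "Consent42")
store.add(EX + "ShareEmail", RDF_TYPE, EX + "DataSharingStep")

# Roughly: SELECT ?step WHERE { ?step a ex:DataCollectionStep }
collection_steps = sorted(
    s for s, _, _ in store.match(p=RDF_TYPE, o=EX + "DataCollectionStep")
)
```

The pattern-matching call stands in for a single SPARQL triple pattern; joins, filters, and reasoning over OWL axioms are outside the scope of this sketch.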

Fig.1 An overview of the knowledge-based model for GDPR compliance

In the context of GDPR compliance, the knowledge base stores facts, assertions, records, and logs pertaining to tasks associated with the maintenance and demonstration of GDPR obligations. The five categories of information, mentioned previously, are used to categorise the information stored within the knowledge base. The information represented by the categories comes from the following data sources:

Data Subject: The data subject provides consent and personal data, for which the knowledge base would store information related to how the consent and data were acquired, and record subsequent changes to consent. Along with this, the exercising of rights would also be recorded as actions involving the data subject.
Data Controller: The bulk of information in the knowledge base relates to and comes from Data Controllers and Data Processors. This includes information about the various activities associated with consent and personal data such as collection, storage, sharing, archival, and deletion. This information includes provenance metadata about the activities and how they interact with data, including the provision of various rights and handling data breaches.
Data Processor: A Data Processor acts on the documented instructions of a Data Controller. The knowledge base therefore would contain these instructions in a form that can be queried or combined with other information.
Certification Authority: A Certification Authority awards certifications and seals to organisations based on certain criteria. The criteria and their evaluation mechanisms would be part of the knowledge base for introspection and for demonstration of compliance.
Supervisory Authority: Supervisory Authorities may define obligations in addition to those in the GDPR. Furthermore, any communication from or to the Supervisory Authority, such as in the case of a data breach, also needs to be stored and maintained for compliance purposes.

Query Interface

We envision a web-based interactive interface that allows users to query the knowledge base and explore the results. The interactive aspect is important as it allows the user to explore more information about a chosen query result. For example, a query for steps that collect consent returns results as a list of items. The user can then click on an item to get more information about that particular step, such as whether it is part of a larger process, or what version of terms and conditions it uses. Interactive systems allow information to be combined in more dynamic ways, which leads to better interfaces for the underlying knowledge base. The interface would allow users to specify SPARQL queries without knowing the technical complexities of the underlying system.
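One way such an interface could hide SPARQL from its users is through parameterised query templates that are filled in from menu selections or clicks. The sketch below is an assumption about how this might look, not a description of an implemented system: the `ex:` namespace, the template names, and the `build_query` helper are hypothetical.

```python
from string import Template

# Hypothetical namespace; real class names would come from the ontologies
# used in the knowledge base.
PREFIXES = "PREFIX ex: <http://example.org/gdpr#>\n"

# Named queries the interface could offer as menu items or click actions.
QUERY_TEMPLATES = {
    # List all steps of a chosen type, e.g. steps that collect consent.
    "steps_by_type": Template("SELECT ?step WHERE { ?step a ex:$step_type . }"),
    # Drill down into one result when the user clicks on it.
    "step_details": Template(
        "SELECT ?property ?value WHERE { ex:$step ?property ?value . }"
    ),
}


def build_query(name: str, **params: str) -> str:
    """Fill a named template with user selections and return SPARQL text."""
    for key, value in params.items():
        # Accept only simple local names so that user input cannot alter
        # the structure of the query.
        if not value.isidentifier():
            raise ValueError(f"invalid value for {key!r}: {value!r}")
    return PREFIXES + QUERY_TEMPLATES[name].substitute(**params)


list_query = build_query("steps_by_type", step_type="ConsentAcquisitionStep")
detail_query = build_query("step_details", step="AcquireConsentAtSignup")
```

The first query corresponds to the list of consent-collecting steps in the example above, and the second to the follow-up request issued when the user selects one item from that list.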

Inference Engine

The quantification of GDPR obligations into inference rules is a complex task. One possibility we intend to explore is the use of SHACL³, which is a constraint expression language for RDF, to define sets of constraints related to GDPR obligations. This allows the system to check whether the required set of information is present before higher-order rules are executed. For example, using SHACL, it is possible to check whether steps for sharing personal data always reference a valid consent or a legal basis as justification. Whether the sharing itself is compliant with the given consent can then be evaluated using other forms of rule-based inference, such as SWRL⁴, under the assumption that all required knowledge exists. This allows compliance to be inferred from constraints while ensuring that the data itself is present in the required and correct format.
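The two-stage idea can be sketched in plain Python as a stand-in for the SHACL and rule-based components, neither of which is used here. The `Consent` and `SharingStep` structures, their fields, and the specific rule (a recipient must be listed in a non-withdrawn consent) are illustrative assumptions and not taken from the GDPR or from a published ontology.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Consent:
    id: str
    permitted_recipients: Set[str] = field(default_factory=set)
    withdrawn: bool = False


@dataclass
class SharingStep:
    id: str
    recipient: str
    consent_id: Optional[str] = None
    legal_basis: Optional[str] = None


def missing_justification(steps: List[SharingStep]) -> List[str]:
    """Stage 1 (the role envisaged for SHACL): report steps that reference
    neither a consent nor another legal basis."""
    return [s.id for s in steps
            if s.consent_id is None and s.legal_basis is None]


def non_compliant_sharing(steps: List[SharingStep],
                          consents: Dict[str, Consent]) -> List[str]:
    """Stage 2 (the role envisaged for rule-based inference): for steps that
    rely on consent, check the sharing against what that consent permits."""
    violations = []
    for s in steps:
        if s.consent_id is None:
            continue  # relies on another legal basis; not evaluated here
        consent = consents.get(s.consent_id)
        if (consent is None or consent.withdrawn
                or s.recipient not in consent.permitted_recipients):
            violations.append(s.id)
    return violations


consents = {"c1": Consent("c1", {"PartnerA"})}
steps = [
    SharingStep("s1", "PartnerA", consent_id="c1"),
    SharingStep("s2", "PartnerB", consent_id="c1"),  # recipient not permitted
    SharingStep("s3", "PartnerC"),                   # no justification at all
]

incomplete = missing_justification(steps)
violations = non_compliant_sharing(steps, consents)
```

Running the first stage before the second mirrors the ordering described above: the rule check is only meaningful once the constraint check has confirmed that the required information is present.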

Linking knowledge using GDPRtEXT

The information in the knowledge base coming from different sources would have differing identifiers and may not be related to the required GDPR concepts. Additionally, defining compliance-related information requires a way to uniformly refer to GDPR obligations so that it can be analysed, queried, and retrieved effectively. GDPRtEXT provides a ‘glue’ layer for the linking of related information using GDPR concepts. For example, information related to handling the right to data portability can use the appropriate GDPR terms and concepts to state their relation to this obligation. Queries and results can then retrieve this information to display the intended actions to be taken in the model, the log of what actually happened upon requests, as well as the inference engine’s compliance information using the same concepts and terms as mentioned in the text of the GDPR. In future, we plan to extend the list of terms and concepts, as well as to create additional resources for defining compliance-related terms and concepts specific to the GDPR. The use of GDPRtEXT along with other components of the knowledge-base is provided as an overview in Fig.2.
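The linking role described here can be illustrated with a small sketch in which records from different sources, each with its own identifier scheme, are annotated with a shared concept identifier and then retrieved together. The `Record` structure, the record kinds, and the `http://example.org/gdprtext#` IRIs are placeholders chosen for illustration; they are not the actual GDPRtEXT identifiers.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple

# Placeholder namespace standing in for GDPRtEXT concept IRIs.
GDPR_CONCEPT = "http://example.org/gdprtext#"
DATA_PORTABILITY = GDPR_CONCEPT + "RightToDataPortability"
DATA_BREACH = GDPR_CONCEPT + "DataBreach"


class Record(NamedTuple):
    source_id: str    # identifier as used by the originating system
    kind: str         # "model", "log", or "compliance"
    concept: str      # shared GDPR concept the record relates to
    description: str


def by_concept(records: List[Record], concept: str) -> Dict[str, List[Record]]:
    """Collect all records linked to one GDPR concept, grouped by kind."""
    grouped: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        if record.concept == concept:
            grouped[record.kind].append(record)
    return dict(grouped)


records = [
    Record("proc:export-v2", "model", DATA_PORTABILITY,
           "Planned steps for exporting a data subject's data"),
    Record("log-2018-0412", "log", DATA_PORTABILITY,
           "Export request received and fulfilled"),
    Record("chk/77", "compliance", DATA_PORTABILITY,
           "Constraint check passed for export process"),
    Record("inc-9", "log", DATA_BREACH,
           "Breach notification sent to supervisory authority"),
]

portability = by_concept(records, DATA_PORTABILITY)
```

Because every record carries the same concept identifier, the planned model, the log of what happened, and the compliance result for data portability can be shown side by side even though their source identifiers have nothing in common.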

Fig.2 Semantic Web Technologies used in Knowledge Base

Applications

The nature of a knowledge-based system changes based on who the intended user(s) are. If the system is targeted towards data subjects, its aim will be to provide information about their personal data and consent, and how it is being collected and used. If the system is developed for controllers and processors, its use will be to assist in the management of compliance information. This involves checking whether the controller or processor fulfils certain obligations such as having systems in place for handling of data breaches and various rights, as well as providing exploration of the consent and data lifecycles within activities. For instances where privacy or access is a concern, only the metadata can be stored in the system. We describe specific use-cases of such applications below for controllers and processors.

Automated Compliance Checks: Due to the significant potential fines under the GDPR, the maintenance of compliance is an essential activity for organisations. A system that can assist in this process must scale to a large number of data subjects, which can only be done efficiently by automating most of its tasks. The knowledge-based system described in this paper provides for such automation through its machine-readable data and query system. Additionally, it is possible to record the entire process and show that due diligence was exercised when important changes were made to the system, as part of the Data Protection Impact Assessment (DPIA) process mentioned within the GDPR.

Compliance Documentation: Generation of compliance documentation will be an important activity under the GDPR. As part of this process, additional information may need to be queried or accessed in order to sufficiently demonstrate adherence to obligations. For example, showing that personal data is not shared without prior consent can be done by using the abstract model of the system, where the activities that share data are shown to depend on the permissions specified within the representation of the given consent. Using the knowledge-based system, it is possible to express dynamic queries over the obligations, whose results can be used as a form of compliance documentation. Therefore, the system can help an organisation show adherence to relevant obligations of the GDPR in a comprehensive manner. A periodic review of such documentation by the organisation itself can help meet the requirement for periodic assessment of compliance.
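As an illustration of turning query results into documentation, the sketch below formats the outcome of a check into a short plain-text report. The `document_obligation` function, the report layout, and the activity names are hypothetical; the input is assumed to be the result of a query such as the consent-dependency example above.

```python
from typing import List, Tuple


def document_obligation(obligation: str,
                        results: List[Tuple[str, bool]],
                        checked_on: str) -> str:
    """Render query results for one obligation as a plain-text report.

    `results` pairs each activity with whether it satisfied the check.
    """
    lines = [f"Obligation: {obligation}", f"Checked on: {checked_on}"]
    for activity, satisfied in results:
        status = "OK" if satisfied else "VIOLATION"
        lines.append(f"  [{status}] {activity}")
    if not results:
        # An empty result set is not evidence of adherence.
        lines.append("Result: no evidence available")
    elif all(satisfied for _, satisfied in results):
        lines.append("Result: adherence shown")
    else:
        lines.append("Result: issues found")
    return "\n".join(lines)


report = document_obligation(
    "Personal data is not shared without prior consent",
    [("ShareEmailWithPartnerA", True), ("ShareEmailWithPartnerB", False)],
    checked_on="2018-04-18",
)
```

Passing the date in as a parameter keeps the report reproducible, which is useful when the same documentation is regenerated during a periodic review.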

Related Work

Ontologies: An initial work [7] addressed a draft version of the GDPR and presented an OWL2 ontology for data controller duties from GDPR obligations, which can be used to structure compliance-related information.
Impact Assessment & Visualisation: There are existing works that address Data Protection Impact Assessment [8] and Privacy Impact Assessment [9]. Both aim to provide a methodology and a template for assessments in the context of the GDPR. There has been work on creating interactive dashboards [10] for data subjects that can show the information flows of their consent and personal data as well as provide features for the handling of various rights. Visualisation has also been applied for representing contracts [11] and legal rules [12]. These are useful as requirements for querying of information within the knowledge base.
Smart Contracts: There has been work on developing smart contracts [13] for data sharing agreements between organisations. Such smart contracts can be self-fulfilling and can be automated. The use of Artificial Intelligence techniques [14] has also been explored towards supporting the compliance process.
SPECIAL project: The Scalable Policy-aware Linked Data Architecture For Privacy, Transparency and Compliance (SPECIAL) project⁵ is a European H2020 project that aims to provide a technical solution involving big-data innovation and privacy-aware data protection. Apart from the publicly available deliverables⁶ that describe their findings and reports to date, they have also published their work on building a compliance model for GDPR [15], [16]. Our work will be influenced by their approach of modelling consent and compliance as a set of verifiable components, with a focus on query answering.
Knowledge Graphs: Legal knowledge graphs have also been built in the area of multilingual services [17]. Such knowledge graphs are expected to assist in the provision of compliance by/through design [18] so that they integrate efficiently into existing legal workflows. An overview of semantic web technologies in the areas of privacy, security, and policies published in the semantic web domain [6] discusses the various problems along with potential solutions and approaches. These are influential for the work discussed in this paper in terms of practical approaches and concerns.

Conclusion

The GDPR presents motivation and opportunities to apply technological solutions for compliance with legal obligations. In this paper, we presented our approach towards building a shared knowledge-based system to assist in compliance-related tasks using semantic web technologies. The knowledge base builds on a previously published GDPR model, and is designed around the information categories identified in our work on the GDPR interoperability model. The representation of the knowledge is discussed through our work published to date regarding metadata representations for provenance, consent, and data sharing agreements in the context of the GDPR. The paper discusses the approach towards implementing the presented knowledge base in terms of its data sources, inference engine, and usability through a query interface. The paper also discusses potential applications of the knowledge base in automating compliance checks and generating compliance documentation. For future work, we look towards implementing a proof-of-concept knowledge base from real-world data to demonstrate the feasibility of the approach.

Acknowledgements

This work is supported by the ADAPT Centre for Digital Content Technology which is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.

References

  1. H. J. Pandit, D. O’Sullivan, and D. Lewis, “GDPR Data Interoperability Model,” in 23rd EURAS Annual Standardisation Conference (in-press), 2018 [Online]. Available: http://purl.org/ADAPT/pub/E18EURAS
  2. H. J. Pandit and D. Lewis, “Modelling Provenance for GDPR Compliance using Linked Open Data Vocabularies,” in Proceedings of the 5th Workshop on Society, Privacy and the Semantic Web - Policy and Technology (PrivOn2017), 2017 [Online]. Available: http://ceur-ws.org/Vol-1951/#paper-06
  3. K. Fatema, E. Hadziselimovic, H. J. Pandit, C. Debruyne, D. Lewis, and D. O’Sullivan, “Compliance through Informed Consent: Semantic Based Consent Permission and Data Management Model,” in Proceedings of the 5th Workshop on Society, Privacy and the Semantic Web - Policy and Technology (PrivOn2017), 2017 [Online]. Available: http://ceur-ws.org/Vol-1951/#paper-05
  4. E. Hadziselimovic, K. Fatema, H. J. Pandit, and D. Lewis, “Linked Data Contracts to Support Data Protection and Data Ethics in the Sharing of Scientific Data,” in Proceedings of the First Workshop on Enabling Open Semantic Science (SemSci), 2017, pp. 55–62 [Online]. Available: http://ceur-ws.org/Vol-1931/#paper-08
  5. H. J. Pandit, K. Fatema, D. O’Sullivan, and D. Lewis, “GDPRtEXT - GDPR as a Linked Data Resource,” in 15th European Semantic Web Conference (in-press), 2018 [Online]. Available: http://purl.org/ADAPT/pub/E18ESWC_GDPRtEXT
  6. S. Kirrane, S. Villata, and M. d’Aquin, “Privacy, security and policies: A review of problems and solutions with semantic web technologies,” Semantic Web, vol. 9, no. 2, pp. 153–161, Jan. 2018, doi: 10.3233/SW-180289. [Online]. Available: https://content.iospress.com/articles/semantic-web/sw289. [Accessed: 18-Apr-2018]
  7. C. Bartolini and R. Muthuri, “Reconciling Data Protection Rights and Obligations: An Ontology of the Forthcoming EU Regulation,” in Workshop on language and semantic technology for legal domain, 2015.
  8. F. Bieker, M. Friedewald, M. Hansen, H. Obersteller, and M. Rost, “A Process for Data Protection Impact Assessment Under the European General Data Protection Regulation,” in Privacy Technologies and Policy, 2016, pp. 21–37, doi: 10.1007/978-3-319-44760-5_2.
  9. J. Reuben, L. A. Martucci, S. Fischer-Hübner, H. S. Packer, H. Hedbom, and L. Moreau, “Privacy Impact Assessment Template for Provenance,” in Availability, Reliability and Security (ARES), 2016 11th International Conference on, 2016, pp. 653–660.
  10. C. Bier, K. Kühne, and J. Beyerer, “PrivacyInsight: The Next Generation Privacy Dashboard,” in Privacy Technologies and Policy, 2016, pp. 135–152, doi: 10.1007/978-3-319-44760-5_9.
  11. S. Esayas, T. Mahler, and K. McGillivray, “Is a Picture Worth a Thousand Terms? Visualising Contract Terms and Data Protection Requirements for Cloud Computing Users,” in Current Trends in Web Engineering, 2016, pp. 39–56, doi: 10.1007/978-3-319-46963-8_4.
  12. S. Seppala, M. Ceci, H. Huang, L. O’Brien, and T. Butler, “SmaRT Visualisation of Legal Rules for Compliance,” in Proceedings of the 1st workshop on technologies for regulatory compliance co-located with the 30th international conference on legal knowledge and information systems (JURIX 2017), 2017.
  13. M. Corrales, P. Jurcys, and G. Kousiouris, “Smart Contracts and Smart Disclosure: Coding a GDPR Compliance Framework,” Social Science Research Network, Rochester, NY, SSRN Scholarly Paper ID 3121658, Feb. 2018 [Online]. Available: https://papers.ssrn.com/abstract=3121658. [Accessed: 18-Apr-2018]
  14. J. Kingston, “Using artificial intelligence to support compliance with the general data protection regulation,” Artificial Intelligence and Law, vol. 25, no. 4, pp. 429–443, Dec. 2017, doi: 10.1007/s10506-017-9206-9. [Online]. Available: https://link.springer.com/article/10.1007/s10506-017-9206-9. [Accessed: 18-Apr-2018]
  15. S. Agarwal, S. Steyskal, F. Antunovic, and S. Kirrane, “Legislative compliance assessment: Framework, model and GDPR instantiation,” in Privacy Technologies and Policy - 6th Annual Privacy Forum, APF 2018, Barcelona, Spain, June 13-14, 2018, Revised Selected Papers, 2018, pp. 131–149, doi: 10.1007/978-3-030-02547-2_8 [Online]. Available: https://doi.org/10.1007/978-3-030-02547-2_8
  16. S. Kirrane et al., “A scalable consent, transparency and compliance architecture,” in The Semantic Web: ESWC 2018 Satellite Events - ESWC 2018 Satellite Events, Heraklion, Crete, Greece, June 3-7, 2018, Revised Selected Papers, 2018, pp. 131–136, doi: 10.1007/978-3-319-98192-5_25 [Online]. Available: https://doi.org/10.1007/978-3-319-98192-5_25
  17. E. Montiel-Ponsoda, V. Rodríguez-Doncel, and J. Gracia, “Building the Legal Knowledge Graph for Smart Compliance Services in Multilingual Europe,” in Proceedings of the 1st workshop on technologies for regulatory compliance co-located with the 30th international conference on legal knowledge and information systems (JURIX 2017), 2017.
  18. W. Mayer, P. Casanovas, and M. Stumptner, “Semantic Workflows in Law Enforcement Investigations and Legal Requirements,” in Proceedings of the 1st workshop on technologies for regulatory compliance co-located with the 30th international conference on legal knowledge and information systems (JURIX 2017), 2017.

  1. Copyright © 2018 for this paper by its authors.
    Licensed under CC-by-4.0 (https://creativecommons.org/licenses/by/4.0/)

  2. https://www.w3.org/TR/sparql11-query/

  3. https://www.w3.org/TR/shacl/

  4. https://www.w3.org/Submission/SWRL/

  5. https://www.specialprivacy.eu/

  6. https://www.specialprivacy.eu/publications/public-deliverables