The RISKY Project

Exploring privacy risks of technologies using knowledge graphs
by Harshvardhan J. Pandit


Privacy is an important topic, as rapid technological development risks misuse of personal data at a large scale. Laws such as the General Data Protection Regulation (GDPR) enforce responsible use of personal data in technologies through the process of Privacy Impact Assessment (PIA) or Data Protection Impact Assessment (DPIA), where organisations must investigate potential risks and impacts to individuals. Conducting and auditing DPIAs is challenging for stakeholders (organisations, individuals, and regulators) due to a lack of resources for investigating privacy risks and reusing existing work on risks and mitigations.

A DPIA involves identifying the parameters (technologies, scale, automation, domain) affecting a scenario and exploring privacy risks and mitigations. Stakeholders (companies, individuals, regulators) face the difficult task of investigating the impacts of use-cases for which relevant resources are unavailable or difficult to use; existing sources (news articles, research publications) rarely explore the potential application of prior work in another domain because the relationships between domains are not documented. Combined with the possible use of novel technologies, or uncertainty in their application, this lack of domain knowledge makes existing work on privacy risks and mitigations difficult to adopt and reuse.

At the same time, a large body of work already addresses 'privacy' both as a concept and as a domain. Existing approaches regarding privacy taxonomies provide structured representations of terms and are based on specific domains or sociological viewpoints. Seminal work in this area includes the taxonomies developed by Solove and Wilson, and the work surrounding contextual integrity by Nissenbaum. These provide the foundation for privacy frameworks based on the notion of risk. In addition, domain-specific taxonomies have been developed for mHealth, online services, and websites, alongside standardisation efforts such as P3P and ISO/IEC 29100. Privacy concepts have also been formalised as ontologies, for example by Sacco et al., along with work addressing legal compliance with privacy laws such as GDPR, including the H2020 projects SPECIAL and MIREL and standardisation efforts such as the Data Privacy Vocabulary. There has also been work on the analysis of privacy policies, including automated extraction and representation of information for understandability, as well as investigative work on identifying attack vectors for unauthorised access to data in the cloud. From the sociological body of work, arguments by Lyon explore the relationship between surveillance and the digitisation of services.

A preliminary analysis of this existing work reveals the breadth of approaches and the variance in the intended meaning of 'privacy' within the state of the art. Most work published as addressing privacy in reality addresses only a particular aspect of it, most commonly security or protection of information, or legal compliance with privacy laws such as GDPR. Furthermore, approaches within a domain are isolated in their modelling of privacy and do not incorporate or relate to other domains, even though they share the common context of modelling privacy. One potential cause is the difficulty of understanding and adopting work from another domain or use-case, and of applying it to a given scenario. Thus, even though the privacy domain sees a large amount of work, there is very little cross-domain utilisation of knowledge.

The RISKY Solution

The solution to this situation is to provide a way for stakeholders to identify risks associated with their scenarios and use-cases, while simultaneously addressing the challenge of existing literature and resources being siloed into domain-centric discourses. Such a solution must be able to identify the applicability of an existing risk based on its relation to the situation's context, such as technologies, participants, and regulations. In addition, the solution must be future-proof in preparation for new technologies and risks as they arise.

The RISKY project aims to enable stakeholders to explore privacy risks and mitigations through a knowledge graph that stores direct and indirect relationships between parameters identified in existing work and identifies relevant applications for given scenarios. These parameters consist of concepts relevant to the assessment of privacy risks (e.g. technology, scale, automation, AI) and are associated with risks through relationships between them. For example, exploring the risks of using AI in healthcare devices involves breaking down the scenario into domain (healthcare) and technology (AI) and identifying relevant risks and mitigations in the knowledge graph.

The construction of the knowledge graph will involve researching relationships between privacy risks and parameters through analyses of news articles and research publications. The knowledge graph will use open and interoperable standards for information representation (RDF/OWL2) and retrieval (SPARQL) to encourage adoption and community contribution. The project will reuse and consolidate existing privacy taxonomies and mitigation methods into the knowledge graph, and apply it for DPIAs using my PhD work on GDPR compliance and data privacy vocabularies.

The project intends to consolidate the privacy taxonomies from the state of the art into a knowledge graph by using ontologies to represent information. This will address the lack of resources for exploring privacy risks in a scenario and, more importantly, for adopting existing work within the privacy domain to relevant and emerging areas, particularly regarding sensitive uses of technologies. Furthermore, the intended application of assisting with the DPIA process fills an important gap, as there is currently no work connecting DPIA requirements with work within the privacy domain. The methodologies and best practices developed by the community will be utilised for this, and will provide quality indicators for use in evaluating the work.

The novel aspects of the project include a comprehensive ontology that relates existing work across different areas of privacy risks and mitigations by identifying commonalities between them, thus facilitating their reuse across scenarios. This advances the state of the art by providing a framework on which future privacy research can be conducted, leveraging prior work and finding novel applications of existing risk mitigations for new technological challenges. In particular, the queries to retrieve related risks and mitigations will demonstrate how existing work can be broken down into components and reused for other use-cases, thereby providing motivation to construct such knowledge bases for other areas.

Stakeholders will be provided with an online service where specifying the concepts associated with a use-case retrieves relevant privacy risks and mitigation methods from the knowledge graph. The project will help reduce costs associated with data protection, particularly for SMEs and government organisations. While the project specifically addresses the DPIA requirements of GDPR, the research and its outcomes will be useful for exploring and reusing existing privacy risk mitigations across domains and jurisdictions.

Aims, Goals, and Outcomes

The main aim of this project is to assist stakeholders in conducting DPIAs by exploring privacy risks and mitigation measures through the use of a knowledge graph. Following this, the central research question is - "To what extent can a knowledge graph of privacy risks and mitigation methods assist stakeholders in conducting a DPIA?"

The answer to this enables understanding whether existing work on privacy risks can be represented and explored as a knowledge graph, and whether existing privacy risk mitigations can be applied across domains and use-cases using relationships between them. The research also enables exploration of existing privacy risks in connection with innovative future technologies.

The project contains the following goals:

  1. Formalise privacy risks and mitigation methods as a knowledge graph
  2. Enable identification of privacy risks for a given scenario from existing resources stored in the knowledge graph
  3. Enable stakeholders to explore privacy risks and mitigation measures through an online service

Through this process, RISKY aims to produce the following deliverables:

  1. Privacy Risks Management Ontology
  2. Knowledge graph of privacy risks and mitigation measures
  3. Queries for retrieving relevant privacy risks and mitigations based on the parameters of a given scenario
  4. Online service for exploration of privacy risks and mitigation measures

Commitment to Data Quality and Open Access

  • Data, in terms of results and resources (published and unpublished), produced as part of this project will be released under an open and permissive license (CC BY 4.0) on open repositories such as Zenodo, which provide a DOI. In addition, the source code for developed tools and services will be shared via open repositories such as GitHub under an open license (MIT, Apache v2, LGPL v3).
  • Quality of data will be ensured by adopting FAIR metrics, and where possible, Linked Data Principles. In addition, ontologies developed will be documented and provided online with accessible IRIs to enable their intended use by the community.
  • The source code and resources for the services will be provided on public repositories such as GitHub and Zenodo along with implementation documentation, so that stakeholders can host them on their own servers.
  • All publications will be deposited in open access archives and repositories such as TARA (Trinity's research archive) and Zenodo.

Funding Acknowledgements

RISKY starts on 01-OCT-2020 and will run for 2 years (until 30-SEP-2022). RISKY has been funded under the Irish Research Council Government of Ireland Postdoctoral Fellowship award #GOIPD/2020/790 and will be conducted at the ADAPT SFI Research Centre at Trinity College Dublin. The ADAPT SFI Centre for Digital Media Technology is funded by Science Foundation Ireland through the SFI Research Centres Programme and is co-funded under the European Regional Development Fund (ERDF) through Grant #13/RC/2106.