Relevant Research Questions For Decentralised (Personal) Data Governance
To be presented at the Trusting Decentralised Knowledge Graphs and Web Data Workshop (TrusDeKW), co-located with the Extended Semantic Web Conference (ESWC)
Anelia Kurteva*, Harshvardhan J. Pandit*
This article outlines several relevant questions from legal, privacy and technology standpoints that need to be considered regarding lawful decentralised data processing.
Abstract
Protecting and preserving individuals' personal data is a legal obligation set out by the European Union's General Data Protection Regulation (GDPR). However, the process of implementing data governance to support that, in a decentralised ecosystem, is still vague. Motivated by the need for lawful decentralised data processing, this paper outlines several relevant questions from legal, privacy and technology standpoints that need to be considered.
The rapid growth of the data economy and the privacy implications accompanying it have motivated a new paradigm shift towards decentralisation of data on the Web. This digital transformation aims to foster data sovereignty and interoperability and to empower individuals by allowing them to take back control over their personal data. Decentralised technology such as SOLID has shown promising results and has slowly started to replace centralised digital infrastructures in several organisations across the European Union (EU). Motivated by the need for lawful decentralised data processing, this paper outlines several relevant questions from legal, privacy and technology standpoints that need to be considered.
How to describe and catalogue data for decentralised interoperability?
In a decentralised ecosystem, resources (e.g. data) should be described in a way that supports interoperability across different services and machines, so that they can be discovered, reused and have their use policies interpreted correctly. Utilising a consistent RDF vocabulary such as the Data Catalog Vocabulary (DCAT), a W3C recommendation for describing datasets and online services in a machine-readable format, can be a starting point. DCAT’s support for semantically representing resources, their role within a system and the associated agents, together with the ability to classify them in catalogues based on themes, can help structure decentralised data sharing and direct a service to a specific available resource that can be used for a specific purpose. To use any such vocabulary to describe a decentralised resource, it should be clear what data exists in the resource and what its availability is, i.e. which agent can use it, when, for what purpose and in what way. Mechanisms that automatically ensure a resource’s quality and completeness (e.g. specification of its availability) based on the agreed standard resource description format are needed as well.
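As a minimal sketch of the kind of description and completeness check discussed above, the following builds a JSON-LD description of a dataset using DCAT and Dublin Core terms and reports which properties are missing. The `REQUIRED` profile, the function names and all `example.org` IRIs are assumptions made for illustration; DCAT itself does not mandate this particular set of properties.

```python
import json

# Namespace prefixes for DCAT and Dublin Core Terms.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}

# Hypothetical completeness profile for this sketch (not defined by DCAT).
REQUIRED = (
    "dct:title",
    "dct:publisher",
    "dct:license",
    "dcat:theme",
    "dcat:distribution",
)


def describe_dataset(iri, title, publisher, license_iri, themes, access_url):
    """Build a JSON-LD description of a dataset using DCAT terms."""
    return {
        "@context": CONTEXT,
        "@id": iri,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:publisher": {"@id": publisher},
        "dct:license": {"@id": license_iri},
        "dcat:theme": [{"@id": theme} for theme in themes],
        "dcat:distribution": [
            {"@type": "dcat:Distribution", "dcat:accessURL": {"@id": access_url}}
        ],
    }


def missing_properties(description):
    """Return the required properties that are absent or empty."""
    return [prop for prop in REQUIRED if not description.get(prop)]


# Example usage with placeholder IRIs.
dataset = describe_dataset(
    iri="https://example.org/datasets/energy-usage",
    title="Household energy usage",
    publisher="https://example.org/agents/alice",
    license_iri="https://example.org/licenses/research-only",
    themes=["https://example.org/themes/energy"],
    access_url="https://example.org/pods/alice/energy.ttl",
)
print(json.dumps(dataset, indent=2))
print("Missing:", missing_properties(dataset))
```

A service could run such a check before adding a resource to a catalogue, rejecting descriptions that do not state, for instance, a licence or a distribution through which the data is available.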
How to establish, trust, and verify identities in a decentralised system?
Merely using the Web’s domain-based identity may not be sufficient or even feasible in all cases. In some cases identities may always need to be known, such as a company’s legal identity for accountability purposes, while in other contexts the identity may need to be hidden, such as to create a safe space for marginalised communities that use pseudonyms or identifiers instead of their real names. The question of how to issue and manage identities suitable for ‘contextual identification’ therefore also becomes a question of trust: trust to show or hide identities, to not misuse them, and to counter malpractices such as fraud, without surveillance or exposing sensitive information regarding private lives.
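One possible way to illustrate contextual identification is to derive a different pseudonymous identifier for each context from a secret that only the individual holds. The sketch below uses HMAC-SHA256 for this; the function name and the `urn:example:pseudonym:` prefix are hypothetical, and the approach is an assumption chosen for illustration rather than a mechanism proposed in this paper. It does not address how a verifiable legal identity would be established where one is required.

```python
import hashlib
import hmac


def contextual_identifier(master_secret: bytes, context: str) -> str:
    """Derive a stable pseudonym for one context from a private secret.

    The same secret and context always give the same identifier, while
    identifiers for different contexts cannot be linked to each other
    without knowing the secret.
    """
    digest = hmac.new(master_secret, context.encode("utf-8"), hashlib.sha256)
    # Truncated hex digest used as a hypothetical URN-style identifier.
    return "urn:example:pseudonym:" + digest.hexdigest()[:32]


# Example usage: one individual, two unrelated contexts.
secret = b"keep-this-private"
forum_id = contextual_identifier(secret, "https://forum.example.org")
clinic_id = contextual_identifier(secret, "https://clinic.example.org")
print(forum_id != clinic_id)
```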
How to identify and ensure security of data and processing in decentralised systems?
Decentralised systems distribute the responsibilities for ensuring and enforcing security mechanisms across three levels. First, for data, which could be encrypted, hidden from discovery, or spread across locations. Second, for data storage and transfer infrastructure, such as through encrypted communications or access control. Third, in the secure processing of data, where the involvement of multiple systems and actors requires each actor to identify and ensure the security of data and processing taking place elsewhere, in order to avoid or detect lapses in security such as failure to validate correctness or data breaches. While decentralisation reduces the scale of affected data, it increases the severity of a breach, as all sensitive data relating to a context or individual would be present within the single breached resource. Establishing accountability is a challenge under the current cybersecurity and legal frameworks due to lack of precedent and knowledge.
How to support individuals in making sense of decentralised data sharing?
How to establish responsibility and foster accountability across actors in decentralised settings?
Currently, in centralised systems, service providers are responsible for storing and processing individuals’ data in a legally compliant way. In a decentralised system, data subjects are given control and ownership of their data, which can be a burden. The responsibilities of agents should be clearly defined, agreed upon and described within each resource’s metadata to establish accountability and transparency. For example, each resource can be catalogued and licensed (e.g. using technology such as the Data Licenses Clearance Center (DALICC)). Machine-readable contracts and consent, outlining each actor’s duties and responsibilities, can also be defined with specific service providers to minimise consent fatigue.
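To make the idea of a machine-readable agreement more concrete, the sketch below records the permitted purposes, a validity date and each actor's duties, and checks requests against that record automatically instead of prompting the individual each time. The `Agreement` structure, its field names and the example values are all hypothetical; they do not reflect DALICC or any specific contract or consent format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Agreement:
    """Hypothetical machine-readable agreement between two actors."""

    data_subject: str
    service_provider: str
    permitted_purposes: frozenset
    valid_until: date
    duties: tuple = ()  # (actor, duty) pairs


def is_permitted(agreement, requester, purpose, on):
    """Check a request against the agreement without asking the user again."""
    return (
        requester == agreement.service_provider
        and purpose in agreement.permitted_purposes
        and on <= agreement.valid_until
    )


def duties_of(agreement, actor):
    """List the duties the agreement assigns to one actor."""
    return [duty for who, duty in agreement.duties if who == actor]


# Example usage with placeholder actors and purposes.
agreement = Agreement(
    data_subject="https://example.org/agents/alice",
    service_provider="https://example.org/services/energy-advisor",
    permitted_purposes=frozenset({"service-provision"}),
    valid_until=date(2030, 1, 1),
    duties=(
        ("https://example.org/services/energy-advisor", "delete data on request"),
        ("https://example.org/agents/alice", "keep the resource description current"),
    ),
)
print(is_permitted(agreement, agreement.service_provider, "service-provision", date(2029, 6, 1)))
print(is_permitted(agreement, agreement.service_provider, "marketing", date(2029, 6, 1)))
```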
How to balance the legal obligations and responsibilities for decentralised actors?
Regulatory frameworks such as the GDPR are based on the conventional notion of centralised organisations collecting and managing data (as Controllers) that may utilise other actors to process it on their behalf (as Processors). The interpretation of such regulations for decentralised solutions is unknown, which creates uncertainty and ultimately hinders progress. Blindly charging forth with innovation may therefore end up harming not only the individuals involved, but also the service providers who want to develop new markets. A pragmatic and proactive approach is therefore to create solutions that function within the existing established boundaries of law, while developing new interpretations or legislation to facilitate further decentralisation. For example, users who shoulder all the legal responsibilities of being a Controller do not gain greater ‘freedom’ under any circumstances; instead, they will only face ‘burdens’ and ‘exploitation’ arising from a lack of knowledge or willingness. A reasonable path forward is therefore to identify mechanisms that either establish responsibilities, such as through model contracts for decentralised infrastructure and service providers, or share responsibilities, such as through community bargaining and gatekeepers. In the meantime, we should also engage with lawmakers and authorities to provide formal guidelines for these mechanisms and to develop future legislation. Of note, the European Union has already passed the Data Governance Act and has proposed Data Spaces that advance this conversation.
How to develop infrastructure and tools for decentralised systems?
In order to set up decentralised systems and services, an essential requirement is the availability of the necessary infrastructure and tooling. This includes, for example, identity providers, resources for data and processing (such as storage, querying and computing), as well as specific tools for developers to create, and for users to consume and manage, these resources. Before researching new methods to achieve the intended functionality, it is also necessary to enquire whether any of the existing tools and services can be reused or repurposed to meet all or some of the requirements. Where the market ecosystem has well-established practices based on formal or de facto standards, their reuse would be beneficial to increase the penetration and adoption of decentralised systems. For example, cloud technologies have reached the stage where they are widespread, are the subject of extensive standardisation, and have regulatory frameworks guiding responsible usage. Can we identify the "innovation" such existing technologies require to realise the decentralised vision and push market actors to develop these based on new markets and values? In parallel, existing infrastructure also has useful governance structures that can aid in resolving some of the pending issues with decentralisation. For example, rather than thinking of decentralisation as separation of independent nodes, we can establish decentralisation as communities where trust in services could be managed with gate-keeping or certification mechanisms such as those used within app stores. For all the above, the existence of standards or common specifications is not a strict necessity, but will certainly accelerate development and adoption.
What is required to develop effective tools for automation in decentralised systems?
Automation requires machine-readable information, which also needs to be interoperable if it is to be shared between systems. While we are a community that promotes semantic interoperability to achieve decentralisation, the key question to ask ourselves for any of the topics described here is this: “Can we ever reach an agreement to develop a standard?” While we have a variety of W3C recommendations as standards, and several tools and ontologies, often arising from large projects, we have seen neither their wider adoption (and thus their effectiveness) nor their acknowledgement as being superior. So the first requirement for the community is identifying what “standards” exist and what standards should exist, and from this creating a roadmap for achieving those. The second requirement is engaging with stakeholders to establish the minimum requirements agreeable to all, and codifying those as a standard to provide a guiding framework for interoperable solutions. The third requirement is then extending this standard with opinionated tools and methodologies to create operational services.
Decentralised systems are not identical replications spread over multiple locations, but instead facilitate diversity and variance while relying on commonality to communicate and interoperate. Therefore, as long as we have a common vocabulary (e.g. the Data Privacy Vocabulary (DPV)) that can be semantically expressed and whose interpretation is well-defined, we can have decentralised solutions that act in a predictable manner while being free to use any technology or tools that they prefer. All of the above research questions that we have outlined should therefore be reframed to ask how to reach an agreement on communication of that topic between decentralised systems.
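The role of a common vocabulary can be sketched as follows: two independent systems keep their own internal labels but map them to shared DPV concept IRIs when communicating. The internal labels and function names are hypothetical, and the two DPV purpose terms used (`Marketing` and `ServiceProvision` under the `https://w3id.org/dpv#` namespace) are assumed here for illustration; the current DPV specification should be consulted for the authoritative terms.

```python
DPV = "https://w3id.org/dpv#"

# Hypothetical internal labels of two independent systems, each mapped
# to a shared DPV purpose concept.
SYSTEM_A_TO_DPV = {
    "ads": DPV + "Marketing",
    "core-service": DPV + "ServiceProvision",
}
SYSTEM_B_TO_DPV = {
    "promo_campaign": DPV + "Marketing",
    "deliver": DPV + "ServiceProvision",
}


def to_shared(local_term, mapping):
    """Translate a system's internal label into the shared vocabulary IRI."""
    return mapping[local_term]


def from_shared(iri, mapping):
    """Translate a shared vocabulary IRI into a system's internal label."""
    for local_term, shared in mapping.items():
        if shared == iri:
            return local_term
    raise KeyError(iri)


def translate(local_term, source, target):
    """Translate between two systems using the shared vocabulary as a pivot."""
    return from_shared(to_shared(local_term, source), target)


# Example usage: system A's "ads" purpose as understood by system B.
print(translate("ads", SYSTEM_A_TO_DPV, SYSTEM_B_TO_DPV))
```

Neither system needs to know the other's internal terminology; each only has to agree on the meaning of the shared terms, which is the form of agreement the research questions above are reframed towards.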
Anelia Kurteva is financially supported by the RePlanIT project funded by a Topsector Energy subsidy from the Ministry of Economic Affairs and Climate Policy in the Netherlands. The author thanks Ruud Balkenende and Alessandro Bozzon for their support and supervision. Harshvardhan J. Pandit’s research was conducted with the financial support of Science Foundation Ireland at ADAPT, the SFI Research Center for AI-Driven Digital Content Technology at Dublin City University 13/RC/2106_P2. For the purpose of Open Access, the author has applied a CC BY public copyright license to any Author Accepted Manuscript version arising from this submission.