Making Sense of Solid for Data Governance and GDPR

Workshop
Unpublished work, inviting feedback
Harshvardhan J. Pandit*
Copies: harshp.com, OSF
This article explores Solid as a new technology, and provides a framework to describe its implementations and use-cases using cloud-technology terminology. It also explores GDPR's application and identifies existing issues and solutions that also apply to Solid.

Introduction

Solid¹ is an ongoing effort to decentralise the control of data by moving its storage away from centralised systems and into Pods that are controlled by individuals [1]. The Solid specifications [2] define the implementation of Pods as an architecture comprising identity management, access control, and communication. Individuals using a Pod control how and when apps can access data by granting or revoking access at any time. By controlling access to the data within their Pods, individuals also gain the ability to (re-)use it elsewhere, for competing services or for other features, which is not possible with conventional approaches where data is locked in and controlled by service providers.

Solid was initiated in 2016 and is led by Tim Berners-Lee, the inventor of the World Wide Web [1]. It has gained interest due to its radical approach of moving away from centralisation, lock-in, and the associated privacy issues. Solid represents a realisation of the data sovereignty philosophy, where individuals ‘control’ their data and how it is used. Since this involves the use of personal data, existing laws regarding data, privacy, and data protection, such as the EU’s General Data Protection Regulation (GDPR) [3], also apply to Solid. However, the radical deviation of Solid’s implementations from conventional methods, where data is collected and centrally retained by companies, has resulted in uncertainty regarding how laws such as GDPR should be interpreted in light of the new use-cases, specifically regarding their sufficiency and their potential for detrimental implications [4]–[6].

To further this discussion, we ask the question: “What assumptions from GDPR (and their interpretations) are still valid and applicable for a Solid user?” To answer it, the use of Solid must first be understood as defined by Solid’s own specifications. This requires identifying what a Solid Pod is, how it is created, how it is obtained and used by individuals, and how apps interact with the data stored in it. It also requires understanding the variations that may emerge when market actors apply existing practices for service provision, or as part of Solid’s organic push towards re-implementing conventional models. Through these, we should be able to answer who will implement and provide the various resources required by Solid and apps, how users will be involved in these processes, and who retains what degree of control. After establishing who does what in an implementation, the use-case must be interpreted through the investigative lens of GDPR, where the roles of controllers, processors, data subjects, and third parties are assigned to the entities in a use-case. Then, the obligations associated with each entity must be assessed for applicability and fulfilment in order to determine compliance. Finally, to ensure progress and benefits, we need to explore the practical implications of Solid implementations by examining how existing issues also apply to Solid-based use-cases, and to identify specific paths for improvement and problem mitigation through developments involving Solid itself.

To enable these processes for Solid, we explore the following primary questions. To assist with understanding Solid, we summarise its specifications and relevant work in Section 2, with references for further information.

  1. How can Solid Pods be described using existing ‘cloud’ terminology (Section 3), and how can the implementations of their functionalities be distinguished (Section 4)?

  2. What use-cases are possible from variations in resources and actors? (Section 5)

  3. How does GDPR apply to an implementation of a Solid Pod? (Section 6)

  4. What existing issues regarding privacy and data protection are also applicable for Solid Pods? And what are their potential impacts? (Section 7)

  5. What avenues are feasible for mitigating known issues through systemic extensions of Solid specifications? (Section 8)

The outcomes of this work are intended to help Solid stakeholders understand and apply GDPR (and other legal) concepts to their use-cases, and thereby inform future technical and legal developments in the use of Solid. This article also intends to highlight the risks involved in the use of Solid. There is no necessity or obligation for the Solid community to fix the identified issues, especially where they are broader and universal to the web and its applications. However, we hope that by highlighting them, Solid’s vision of using decentralisation and machine-readable information to empower users in controlling their data can benefit from the identification of potential paths to mitigate known issues, and from innovation on better privacy mechanisms through developments within, using, and led by Solid itself.

Background Information and Relevant Work regarding Solid

What is Solid?

Solid describes itself as a ‘specification’ for decentralised ‘data stores’ called ‘Pods’ that act as ‘secure personal web servers for data’ with controls for accessing and using stored data. The Solid protocols [7] define the use of web-based technologies for creating and using applications that act on data within Pods. They also define functionalities related to identity [8], access control authorisations [9], [10], handling of requests and notifications, governance of app requests and data access [11], and security considerations for implementations. The Solid project’s website² showcases existing applications and development tools.

In Solid, users and applications are represented as Agents, whose identity is represented by a URL that also provides information (e.g. profile, metadata) and is used in records and authorisations. Apps request access using pre-defined methods; users can grant such access and also revoke it later. Access to data within the Solid specifications is determined based on: (1) the data in question; (2) the operation to be performed; and (3) the entity requesting access, where the entity can be a process, an app, or a user. These are expressed as Access Requests, which are decided upon by the user, with the decisions stored within registries as Access Authorisations [11]. Details of the access are provided to requestors (e.g. apps) as Access Grants specifying what the access entails. The data within the scope of an access is specified using Data Grants. The Access Need concept specifies the necessity of the requested access (required or optional), its scenario (personal or shared access), and a description (human-intended text); Access Needs can be grouped into sets for collective reference and application [11]. We summarise this current model as the tuple: {data, operation, necessity, justification, agent}.
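The access model summarised above can be made concrete as a small data structure. The sketch below is a hypothetical Python illustration, not the Solid specifications’ own data model (which is expressed in RDF) or any Solid library: the class names AccessNeed and AccessRegistry, the example URLs, and the rule that declining any required need refuses the whole request are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Necessity(Enum):
    REQUIRED = "required"
    OPTIONAL = "optional"


@dataclass(frozen=True)
class AccessNeed:
    """One {data, operation, necessity, justification, agent} tuple (hypothetical)."""
    data: str           # resource or container the agent wants to access
    operation: str      # e.g. "read", "write", "append"
    necessity: Necessity
    justification: str  # human-intended description shown to the user
    agent: str          # URL identifying the requesting app or user


@dataclass
class AccessRegistry:
    """Hypothetical store of the user's decisions, standing in for Access Authorisations."""
    authorised: set = field(default_factory=set)

    def decide(self, needs, approved):
        # Assumption: declining any REQUIRED need refuses the whole request.
        for need in needs:
            if need.necessity is Necessity.REQUIRED and need not in approved:
                return False
        # Record only the needs the user approved (OPTIONAL ones may be declined).
        self.authorised.update(need for need in needs if need in approved)
        return True

    def revoke(self, agent):
        # Revoking removes every authorisation held by that agent.
        self.authorised = {n for n in self.authorised if n.agent != agent}

    def is_allowed(self, agent, data, operation):
        return any(n.agent == agent and n.data == data and n.operation == operation
                   for n in self.authorised)


app = "https://app.example/id#me"
contacts = AccessNeed("/contacts/", "read", Necessity.REQUIRED, "Show your address book", app)
photos = AccessNeed("/photos/", "read", Necessity.OPTIONAL, "Add photos to contacts", app)

registry = AccessRegistry()
registry.decide([contacts, photos], approved={contacts})
print(registry.is_allowed(app, "/contacts/", "read"))  # True
print(registry.is_allowed(app, "/photos/", "read"))    # False
registry.revoke(app)
print(registry.is_allowed(app, "/contacts/", "read"))  # False
```

Keeping the justification inside each need, rather than in a separate notice, keeps the human-readable explanation attached to exactly the access it describes.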

Known Implementations and Use-Cases

Solid as a specification has implementations³ that are open and community-led, as well as closed-source commercial variants. These are utilised by Pod Providers to provision services with varying degrees of freedom (e.g. control over infrastructure), where either the Pod Providers or the users can choose providers for domains and servers, including their locations (and hence jurisdictions). In addition, Solid can also be self-hosted, e.g. by manually installing it on a user-controlled server.

Notably, the Flemish government in Belgium has embarked on an ambitious project whereby all citizens will be provided a Solid Pod for the storage of, and control over, their government-issued documents. Apps can request access to this data based on the user’s consent, but only after establishing legal agreements with the government-established Data Utility Company outlining permitted purposes and processing [12], [13]. An earlier article by researchers involved in the project outlines further similar use-cases for citizens [14], [15].

State of the Art regarding analysis, applications, and explorations of Solid

Researchers have proposed and explored extensions of the Solid specifications to support policy management and its use in exercising more complex constraints over the use of consent and data in Pods [16]–[19], as well as using such policies to control the subsequent use of data beyond access [20]. Further extensions of Solid Pods have explored mechanisms through which possession of data (e.g. an educational degree) can be verifiably demonstrated without sharing it [21], moving the infrastructure to the local environment of a smartphone [22], and moving beyond documents to using ‘knowledge graphs’ to achieve richer data utility [23].

Solid Pods have been demonstrated to be practical for implementing GDPR’s Right to Data Portability [24] and Right of Access [25]. The legal considerations arising from the use of Solid have been explored in the context of use-cases (e.g. business to business) and the applicability of GDPR’s obligations [26], providing invaluable understanding of the relevant questions to ask regarding Solid implementations. Similar works have also examined the purported value derived from decentralisation, its legality under GDPR, and the existence of issues [5], [6], including the issue of a data subject being a controller [6]. There have also been security-focused investigations that explored the relevance of GDPR’s obligations in implementations of Solid [27], which emphasise the need for further investigation of this topic. Similarly, the European Data Protection Supervisor (EDPS) has outlined Solid [28], amongst other ‘Personal Information Management Systems’, as a topic of interest regarding GDPR, with specific emphasis on risks, consent management, transparency and traceability, exercising of rights, data accuracy, data portability and interoperability, and security.

Solid as ‘Cloud Technology’

Motivation for explicitly defining Solid as a Cloud technology

ISO/IEC 17788:2014 Information technology — Cloud computing — Overview and vocabulary [29] defines cloud computing as a “Paradigm for enabling network access to a scalable and elastic pool of shareable physical or virtual resources with self-service provisioning and administration on-demand” where resources can be servers, networks, software, applications, storage, etc. It defines cloud services as “One or more capabilities offered via cloud computing invoked using a defined interface”. This definition suits implementations where a Solid Pod is a form of cloud service that enables applications and (cloud) services to utilise stored data. Since the Solid specifications do not place limits on how storage and computing is built and provided, we can apply the full extent of existing cloud infrastructure and provision methods to describe possible implementations.

By defining Solid as a Cloud technology, we benefit from identifying and applying relevant cloud terminologies, standards, guidelines, legal requirements, and obligations, and from utilising existing domain expertise. For example, we can use the ISO 35.210 Cloud Computing⁴ standards for security, handling of sensitive data, interoperability, portability, policy management, and data governance; or ENISA’s Cloud Computing Risk Assessment [30] and cybersecurity resources. Another important example is the reuse of guidance on Controllers and Processors under GDPR [31], which outlines requirements and responsibilities for the use of cloud-based technologies through market providers.

The analysis of use-cases first requires accurately representing the specifics in terms of actors, processes, and information flows. For Solid, we reuse the existing Cloud technology concepts adapted from ISO/IEC 22123-1:2021 Information technology — Cloud computing — Part 1: Vocabulary [32] to provide a framework through which implementations of Solid as cloud technologies can be documented for common understanding. For this exercise, we first checked whether each term had relevance to Solid and the topic of this paper, and then rephrased their definitions to specify relevant descriptions of information associated with use of Solid Pods. The findings are summarised as a collection of entities in Fig [fig:cloud].

Actors

Actors refer to entities that have specific roles within the Solid ecosystem. Currently, the Solid specifications only refer to Agents, Social Agents, and Applications. We use the cloud-based terms from ISO/IEC terminology since they have standardised definitions and interpretations, and are also relevant in legal compliance investigations.

  • Provider - entity that makes available Pods or relevant resources (storage, computing, identity, application, domain, etc.), with prefixes indicating contextual concepts such as – pod provider for the entity that provides Pods, app provider for providing an app, storage provider for providing storage, and so on. Note that this concept only refers to the provision, which is separate from development.

  • Customer - entity that has a ‘business relationship’ for the purpose of using Pods or its relevant resources. This includes purchasing, leasing, subscribing, or establishing any form of contract or agreement, with or without monetary transactions. Customers typically will have a direct relationship with a provider, which may be facilitated by one or more broker entities. For example, a pod broker is an entity that negotiates a relationship between a customer and a pod provider.

  • Developer - entity with the responsibility for designing, developing, testing, and maintaining implementations of Solid Pods, applications, or relevant resources. Similar to provider, this concept can also be prefixed to indicate contextual roles, such as – pod developer for the entity that is responsible for development of Pods (as platforms, software, or apps), app developer for apps, and so on. Providers and developers are important concepts to distinguish since they determine accountability and responsibility for resources, and may have obligations depending on the extent of their role in provision and development respectively.

  • User - natural person that uses Pods or relevant resources. Note that the ISO/IEC definition of user also includes entities that Solid considers Agents, e.g. devices and applications. We distinguish between User and Agent so as to separate human and machine-based agency, which is necessary to later analyse processes such as consent. We also distinguish between customer and user since they may not be the same, for example when a Pod customer is a company that provisions Pods from a provider, customises them, and provides them to (end-)users. In this case, the company is a provider for the end-users but a customer of the Pod provider.

  • Data Subject - the individual that is the subject of the data within a Pod, and who may be different from the user. While ISO/IEC uses the terms PII Principal and Data Principal, we use data subject for its accuracy in this context as well as for consistency with GDPR investigations.
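Because these roles are contextual, it can help to record them per relationship rather than per entity. The minimal sketch below uses hypothetical entity names (AcmeCorp, PodHost, Alice) and a hypothetical Relationship type to encode the customer/provider example given for the User concept, where a company is a customer of a Pod provider while acting as a provider for its end-users.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    PROVIDER = "provider"
    CUSTOMER = "customer"
    DEVELOPER = "developer"
    USER = "user"
    DATA_SUBJECT = "data subject"


@dataclass(frozen=True)
class Relationship:
    """A role held by an entity towards a counterparty for one kind of resource."""
    entity: str
    role: Role
    counterparty: str
    resource: str  # used as the role prefix, e.g. "pod" gives "pod provider"


def roles_of(entity, relationships):
    """Collect the contextual roles an entity holds across all its relationships."""
    return {f"{r.resource} {r.role.value}" for r in relationships if r.entity == entity}


relationships = [
    Relationship("PodHost", Role.PROVIDER, "AcmeCorp", "pod"),
    Relationship("AcmeCorp", Role.CUSTOMER, "PodHost", "pod"),
    Relationship("AcmeCorp", Role.PROVIDER, "Alice", "pod"),
    Relationship("Alice", Role.USER, "AcmeCorp", "pod"),
]
print(sorted(roles_of("AcmeCorp", relationships)))  # ['pod customer', 'pod provider']
```

Modelling roles this way avoids assigning a single fixed role to an entity, which matters later when obligations are attached to roles rather than to organisations.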

Solid specifications use the term Owner with the definition: “An owner is a person or a social entity that is considered to have the rights and responsibilities of a data storage.” However, this term is problematic in that it may be interpreted as referring to data ownership, which is a specific legally relevant concept, and may produce unintended implications in terms of copyright and intellectual property rights. It also creates issues through its reference to responsibilities, which are based on legal obligations and rights and do not necessarily fall upon individuals. To avoid such implications and to restrict interpretations to well-established common practices, standards, and legal norms, we only use terms from ISO and GDPR.

Functionalities

Functionality refers to the capability or feature exhibited by a Solid Pod or its related resources, such as storage, computing, identity, or applications. Functionalities can be categorised based on capabilities, interoperability, and portability of data and applications. Pod capabilities refers to the classification of the capabilities that an implementation of a Pod allows customers or users to manage, for example support for adding additional storage, a computing or execution environment, or pre-configured applications. Capabilities can be described in relation to three broad concepts:

  • Application - how users manage applications. This relates to how users discover applications, interact with requests for data use, ‘install’ apps, and perform configuration or other management and governance tasks.

  • Platform - whether users can deploy, manage, and run processes. This relates to whether the tasks necessary for a Pod (such as identity verification, authorisation, policy management, or anything that requires computing or execution) can be managed by the users, are provided via pre-configured environments, or can also be modified by users and/or applications to support other functionalities.

  • Infrastructure - whether users can provision and control resources related to a Pod or an app, such as storage, computing, networks, etc.

Pod service category is the classification of services supported by a Pod based on available capabilities. This defines how Pods are implemented using physical and virtual resources e.g. as Applications, Platforms, and Infrastructure, and how they are provided to customers or users. This section does not list the full extent of how cloud technologies and services can be provisioned in terms of ‘anything as a service’ (XaaS), but only considers the broad categories necessary to describe use-cases with concise and complete information.

  • Infrastructure as a Service (IaaS) - Users can control infrastructure (e.g. storage space, computing servers) directly, with potential limitations to use specific providers or offered choices e.g. operating systems, networking components (e.g. web server, firewalls), data stores (e.g. databases or triple-stores), and virtualisation capabilities. Solid Pods can be readily deployed as self-controlled servers in an IaaS environment⁵.

  • Platform as a Service (PaaS) - Users of a Pod are given a ‘platform’ through which they exercise their control over apps and resources without explicitly dealing with the underlying infrastructure. Platforms determine how users interact with their Pods, data, and applications, and are not currently defined by the Solid specifications. A platform can be a dedicated development over a Solid server instance, or an extension of an existing platform to support Solid as an additional protocol. Examples of platforms as both dedicated services (e.g. Inrupt Pod Spaces⁶) and extensions (e.g. NextCloud⁷) are available in Solid’s documentation.

  • Software as a Service (SaaS) - Users use Pods via controlled interfaces (e.g. web-browsers, smartphone apps) and do not control resources. See ‘Pod Providers’ on Solid’s website⁸ for examples.

  • Compute as a Service (CaaS) - Providers provide dedicated computing environments controlled by the user (server or serverless) for process execution, e.g. to generate their own data or to enforce the requirement that data does not go outside controlled environments.

  • Storage as a Service (STaaS) - This is a hypothetical extension of current Solid specifications where different forms of storage are available as a service. For example, through dedicated SQL database, semantic-web triple-stores, binary or blob storages, or dedicated media storage services such as for photos and videos.

  • Data as a Service (DaaS) - For Solid Pods, this could be a service between data and apps that provides data-value and data-utility. For example, companies and applications would not need to centrally collect, store, and manage data, but could instead utilise the data available within a Solid Pod directly by using it on demand. Such services may involve operations over (raw) data, invoking specific queries to obtain answers, being limited to only data collection or storage, or also storing ephemeral and persistent outputs from processes.
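The difference between the main service categories can be summarised by which capabilities remain under the user’s control. The sketch below assumes one plausible mapping for IaaS, PaaS, and SaaS only (the dictionary USER_CONTROLS and the function controlled_by are hypothetical); actual Pod offerings may draw these boundaries differently, and the narrower categories (CaaS, STaaS, DaaS) are left out.

```python
from enum import Enum


class Capability(Enum):
    APPLICATION = "application"
    PLATFORM = "platform"
    INFRASTRUCTURE = "infrastructure"


# Assumed mapping: capabilities a user can manage under each service category.
USER_CONTROLS = {
    "IaaS": {Capability.APPLICATION, Capability.PLATFORM, Capability.INFRASTRUCTURE},
    "PaaS": {Capability.APPLICATION, Capability.PLATFORM},
    "SaaS": {Capability.APPLICATION},  # e.g. granting or revoking app access via an interface
}


def controlled_by(category, capability):
    """Return "user" if the user manages the capability, otherwise "provider"."""
    return "user" if capability in USER_CONTROLS[category] else "provider"


for category in USER_CONTROLS:
    summary = {c.value: controlled_by(category, c) for c in Capability}
    print(category, summary)
```

Stating the mapping explicitly makes it easier to see, for a given offering, which party determines how each capability is implemented.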

The ISO/IEC cloud standards define Portability as “ability to migrate an application from one cloud service to another”. Applied to Solid, portability refers to the extent to which data and applications can be migrated (i.e. moved) to another Pod, for example as:

  • Data portability - data can be migrated or moved outside of the Pod;

  • Application portability - apps can be moved between Pods;

  • Pods portability - Pods can be moved between providers;

  • Data Syntactic portability - data is ported using well-defined data formats;

  • Data Semantic portability - data is ported using defined semantics and data models;

  • Data Policy portability - data is ported while complying with relevant policies;

  • Application Syntactic portability - apps are ported by utilising well-defined formats;

  • Application Instruction portability - app instructions (i.e. executable code) can be ported;

  • Application Metadata portability - app metadata, such as profiles or established permissions and authorisations, can be ported to another Pod;

  • Application Behaviour portability - apps are ported without changes in functionality; and

  • Application Policy portability - apps are ported while complying with relevant policies.
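As an illustration of several portability forms at once, the hypothetical sketch below migrates an in-memory Pod between two providers: data syntactic portability is approximated by serialising to JSON, and application metadata portability by carrying app authorisations alongside the data. The Pod class, function names, and provider names are assumptions, and the sketch does not address semantic, behaviour, or policy portability.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Pod:
    """Hypothetical in-memory Pod: resources by path plus app authorisations."""
    provider: str
    resources: dict = field(default_factory=dict)       # path -> content
    authorisations: dict = field(default_factory=dict)  # app URL -> set of paths


def export_pod(pod):
    """Serialise data and app metadata into one well-defined format (JSON here)."""
    return json.dumps({
        "resources": pod.resources,
        # Sets are not JSON-serialisable, so authorised paths are exported as sorted lists.
        "authorisations": {app: sorted(paths) for app, paths in pod.authorisations.items()},
    })


def import_pod(bundle, provider):
    """Recreate the Pod at another provider from an exported bundle."""
    data = json.loads(bundle)
    authorisations = {app: set(paths) for app, paths in data["authorisations"].items()}
    return Pod(provider, data["resources"], authorisations)


old = Pod("provider-a.example", {"/notes/1": "Buy milk"},
          {"https://app.example/id": {"/notes/1"}})
new = import_pod(export_pod(old), "provider-b.example")
print(new.resources == old.resources, new.authorisations == old.authorisations)  # True True
```

If authorisations were left out of the bundle, the data would still move but every app would have to request access again at the new provider, which is the gap that application metadata portability describes.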

The ISO/IEC cloud standards define Interoperability as “ability of two or more systems or applications to exchange information and to mutually use the information that has been exchanged”. Applied to Solid Pods, interoperability refers to how Pods and applications can exchange and mutually use data. Here, portability refers only to the ability to migrate or move data or apps outside of a Pod, while interoperability also refers to the usefulness of that data or app in the new Pod. For interoperability, the following interpretations for Solid are provided based on ISO/IEC defined forms of interoperability:

  • Pods interoperability - Data and apps in a Pod are interoperable with other Pods (same or different provider), and Pods support import/export features to achieve this.

  • Application interoperability - Data is interoperable across different apps (same or different provider), and apps support import/export features to achieve this.

  • Data interoperability - Data is interoperable with other data from the same or different provider. This means both data providers and/or consumers support the same (or a set of) data formats or schemas to achieve the interoperability.

  • Syntactic interoperability - Data and apps use interoperable formats that are understood by both providers and consumers (e.g. CSV, JSON) to achieve interoperability;

  • Semantic interoperability - Data and apps use well-defined schemas, ontologies, or data models that are understood by both providers and consumers.

  • Behavioural interoperability - Interoperability does not impair functionality.

  • Policy interoperability - Interoperability takes place while maintaining compliance with legal, organisational, Pod, service, or user policies.
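The gap between syntactic and semantic interoperability can be shown with two apps that both store a contact as JSON but use different keys. In the sketch below, the app names and key mappings are hypothetical; the vCard ontology’s fn (formatted name) term is used as the shared vocabulary term purely as an example of mapping app-specific keys onto common semantics.

```python
import json

# Two apps write the same contact using the same syntax (JSON) but different keys.
app_a = json.dumps({"fullName": "Alice Example"})
app_b = json.dumps({"name": "Alice Example"})

# Syntactic interoperability: both payloads parse with the same parser.
a, b = json.loads(app_a), json.loads(app_b)
print(a.keys() == b.keys())  # False: parsing alone does not align meaning

# Semantic interoperability (sketch): each app declares how its keys map to a
# shared vocabulary term, here the vCard "fn" (formatted name) IRI.
VCARD_FN = "http://www.w3.org/2006/vcard/ns#fn"
MAPPINGS = {"app_a": {"fullName": VCARD_FN}, "app_b": {"name": VCARD_FN}}


def to_shared(record, app):
    """Rewrite an app's record using the shared vocabulary terms."""
    return {MAPPINGS[app][key]: value for key, value in record.items()}


print(to_shared(a, "app_a") == to_shared(b, "app_b"))  # True
```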

Data categories

While Solid Pods are defined as a data storage service for individuals to store and control their (personal) data, they can also contain several other categories relevant to the functioning of Pods, applications, and services – specified by ISO/IEC terminology as:

  • Personal Data - category for data associated with an individual (where exact definition depends on jurisdiction). Non-personal data is data that is not associated with an individual. Note that this concept is broader than PII which relates to identifiability of the data with an individual - for example, removal of identifiers may suffice to make the data non-PII but it would still be personal data, whereas its (complete) anonymisation results in non-personal data.

  • Customer Data and User Data - categories for data under the (legal, contractual, or other forms of) control of customers and users respectively. If customers, users, and data subjects are distinct, their respective data categories will also be distinct.

  • Derived Data - category for data produced as a result of interactions with services and applications. This can be logs such as those associated with data access, usage, or processes. It also includes data associated with authorisations, e.g. produced as a result of managing permissions for an app.

  • Provider Data - category for data under the control of providers, such as configurations for provisioned Pods, resources, or applications, or logs relevant to operational processes such as identity management, or used to calculate charges for the use of resources.

  • Account Data - information about accounts regarding Pods, resources, and apps.

  • Protected Data - data needed to be protected by provider, user, or application.

  • Publicly Accessible Data - publicly accessible data⁹, where ‘publicly’ does not imply unrestricted visibility or accessibility, but should be interpreted as specifying access within contextual boundaries.

In these definitions, while sensitive information and data are not explicitly defined, they are covered through other ISO/IEC standards related to data security and governance, such as ISO/IEC 19944-1:2020 Cloud computing and distributed platforms - Data flow, data categories and data use Part 1: Fundamentals [33] which refers to sensitive data categories associated with children, finance, health, and medicine - and provides guidance on how these are to be managed within cloud-based data flows. Through these, we understand the necessity to categorise data based on relevance to operations, actors, as well as sensitivity, and to use these categorisations within the relevant processes associated with security and oversight. Solid specifications currently do not explicitly define such categorisations, though they do support declaration of categories for specific data within a Pod.
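The need to categorise data both by its relevance to operations and actors and by its sensitivity can be sketched as two independent attributes of each item. In the hypothetical sketch below, the category names follow the list above and the sensitive topics are those mentioned for ISO/IEC 19944-1, while the paths, topic tags, and the function protection_plan are assumptions. It also shows that derived data (here, an access log for health records) may itself need to be treated as sensitive.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    PERSONAL = "personal data"
    DERIVED = "derived data"
    PROVIDER = "provider data"
    ACCOUNT = "account data"


# Sensitive topics mentioned for ISO/IEC 19944-1; using them as plain tags is an assumption.
SENSITIVE_TOPICS = frozenset({"children", "finance", "health", "medicine"})


@dataclass(frozen=True)
class DataItem:
    path: str
    category: Category               # relevance to operations and actors
    topics: frozenset = frozenset()  # what the data is about

    @property
    def sensitive(self):
        return bool(self.topics & SENSITIVE_TOPICS)


def protection_plan(items):
    """Group paths by category and mark each as 'sensitive' or 'standard'."""
    plan = {}
    for item in items:
        level = "sensitive" if item.sensitive else "standard"
        plan.setdefault(item.category.value, []).append((item.path, level))
    return plan


items = [
    DataItem("/health/bloodtests.ttl", Category.PERSONAL, frozenset({"health"})),
    DataItem("/logs/access.log", Category.DERIVED, frozenset({"health"})),
    DataItem("/settings/quota.json", Category.PROVIDER),
]
for category, entries in protection_plan(items).items():
    print(category, entries)
```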

Contracts and Agreements

The use of Solid Pods, related resources, applications, and other services is governed through agreements and contracts¹⁰ between providers and customers or users. ISO/IEC terminology describes a service agreement as an agreement between a provider and a customer regarding the provision of specific services. It also defines service level agreements (SLAs) between providers, customers, and suppliers that identify the services, how they are to be provided, their targets, and commitments to specific objectives and their qualitative characteristics, and which can be part of another contract or agreement. The distinction between the two relates to the amount of detail regarding what the service entails and what should be provided and/or expected regarding its use.

Applied to Solid, agreements can be associated with Pods, resources (e.g. storage, computing, identity), Apps, or Users. These can relate to specific functionalities (e.g. Pod storage space, computing in hours, resources for a specific app), and can be individually managed by customers or be part of a common contract governing terms of use for Pods and apps. In addition to these, the specific contracts and consenting agreements established by users are separate concepts which do not feature within ISO/IEC cloud-based terminologies, but are instead covered in separate standards associated with privacy, such as ISO/IEC 29184:2020 Information technology - Online privacy notices and consent [34].

The Solid specifications currently do not support or specify any form of agreement regarding the provisioning of Pods or resources, or for Apps. However, Pod service agreements are mentioned externally, for example as part of information on where to find a Pod provider. For interactions between Apps and users, the specifications only refer to ‘authorisations’ that are recorded and stored within the Pod.

Functionality Layers in a Solid Pod

In this section, we explore functionalities in terms of how data is stored, retrieved, and used through a Pod. Here, the relevance of cloud technologies is apparent, as each functionality can be implemented using a different and distinct set of technologies, and can be associated with a different cloud-based actor. These also relate to different capability types, such as where functionalities are implemented as infrastructures, platforms, or services.

The functionalities are represented as layers based on the systemic influence they have on each other, where each layer depends on the implementation of the layers below it to define its own functionality. The layers are intended to assist in the conceptual description of Solid’s implementations and provide a common basis for co-ordination of efforts associated with each ‘layer’. This approach is based on the OSI model, which consists of 7 layers describing the communication of information between two systems [35].

The layers also represent a ‘separation of concerns’, and act as a framework for indicating the scope of developments by indicating the affected layers. For example, a data storage solution has immediate relevance for the data retrieval layer as a dependency, but is not as strongly coupled with the interface layer. This separation of functionalities as layers also assists later in the establishment of processes regarding accountability, such as notices and consent, security measures, and identifying the role of entities in determining how data is being collected and used based on which layers they control and what limitations are imposed. In addition, the layers enable accurate analysis of cases where Pods may be provisioned with specific limitations on some of these functionalities, which may result in restrictions on portability and interoperability of data and apps.

We identified the following layers (see Fig [fig:layers]) by distinguishing between implementations of Solid Pods and applications in terms of interactions with data: data storage as the bottom layer given Solid’s reliance on data storage as a central concept, followed by data retrieval to retrieve stored data, computing to (optionally) perform computations on it, access control to control retrieval of information, communication to share information between entities and resources, and finally interface for interactions and management processes by actors. Orthogonal to these are layers associated with logging, policies, security, and identity, which have relevance to and are influenced by the implementations of each layer.
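The dependency structure of the stacked layers can be expressed as an ordered enumeration, which makes it straightforward to list which layers a change may affect. The sketch below is illustrative: the layer order follows the text, while the functions affected_by and distance, and the use of layer distance as a rough proxy for coupling, are assumptions.

```python
from enum import IntEnum


class Layer(IntEnum):
    """Stacked functionality layers, from bottom (1) to top (6)."""
    DATA_STORAGE = 1
    DATA_RETRIEVAL = 2
    COMPUTING = 3        # optional in some implementations
    ACCESS_CONTROL = 4
    COMMUNICATION = 5
    INTERFACE = 6


# Orthogonal concerns that relate to, and are influenced by, every stacked layer.
ORTHOGONAL = ("logging", "policies", "security", "identity")


def affected_by(changed):
    """Layers above `changed`, which depend on it, listed nearest first."""
    return [layer for layer in Layer if layer > changed]


def distance(a, b):
    """Assumed proxy for coupling: adjacent layers (distance 1) are most tightly coupled."""
    return abs(a - b)


print([layer.name for layer in affected_by(Layer.DATA_STORAGE)])
print(distance(Layer.DATA_STORAGE, Layer.DATA_RETRIEVAL),
      distance(Layer.DATA_STORAGE, Layer.INTERFACE))  # 1 5
```

Under this heuristic, a new storage backend is flagged first for review of data retrieval and only distantly for the interface, mirroring the example in the text.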

Data Storage Layer

Physical and Virtual Views

The Data Storage layer refers to the physical and virtual arrangement of data within a Pod or its aligned resources. Here, physical refers to the actual data storage mechanism, e.g. disks, bytes, or file systems, and virtual refers to non-physical or logical arrangements that provide alternative ways to represent and arrange data as an abstraction over underlying data mechanisms. One example is how a specific URL representing a path to a resource may or may not correspond to the actual location of that resource within a server. Another example is the encapsulation of information, such as a URL with a fragment identifier representing a specific element within an HTML document.

Defining virtual or logical views over physical resources enables data to be provided and stored using semantics or schemas, such as within databases. This enables the same physical data resource to be represented and used as different virtual resources, which opens possibilities for content negotiation and richer use of data within applications based on syntactic and semantic interoperability. For example, a collection of bytes on a disk can be interpreted as byte streams, documents, or semantic data (e.g. RDF triples), or converted on demand into requested formats and schemas.
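As a minimal sketch of this idea, the following code serves one physical resource (the raw bytes of a CSV file) as three virtual representations selected by media type, much as content negotiation would. The file contents, the `pod.example` IRIs, and the function names are hypothetical; the RDF output uses the real FOAF `name` and `phone` properties, with phone numbers written as `tel:` IRIs.

```python
import csv
import io
import json

# Hypothetical physical resource: the raw bytes of a CSV file on disk.
RAW = b"name,phone\nAlice,+353-1-555-0100\nBob,+353-1-555-0101\n"

FOAF = "http://xmlns.com/foaf/0.1/"


def as_records(raw: bytes) -> list:
    """Interpret the bytes as tabular records (one dict per row)."""
    return list(csv.DictReader(io.StringIO(raw.decode("utf-8"))))


def represent(raw: bytes, media_type: str) -> str:
    """Produce one of several virtual representations of the same physical bytes."""
    if media_type == "text/csv":
        return raw.decode("utf-8")  # the stored form, unchanged
    if media_type == "application/json":
        return json.dumps(as_records(raw))
    if media_type == "application/n-triples":
        lines = []
        for i, rec in enumerate(as_records(raw)):
            # Hypothetical IRIs minted on demand for each row.
            subject = f"<https://pod.example/contacts#c{i}>"
            lines.append(f'{subject} <{FOAF}name> "{rec["name"]}" .')
            lines.append(f"{subject} <{FOAF}phone> <tel:{rec['phone']}> .")
        return "\n".join(lines)
    raise ValueError(f"unsupported media type: {media_type}")
```

Nothing is duplicated on disk; each representation is computed from the same bytes when requested.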

The governance of access modes depends on how data can be stored, read, used, queried, and controlled, which in turn affects how data and access are communicated. The current Solid specifications specify URLs for accessing data, similar to how documents are stored using file paths. Virtualisation can be used to expand what these paths point to, e.g. by having the web server map different paths to different data or retrieval methods. Beyond such mappings, there are strong arguments for using virtualisation to promote greater and more innovative use of data beyond fixed documents and formats [23].
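A server-side path mapping of this kind might look like the following minimal sketch. The routing table, directory names, and `resolve` function are hypothetical, and real Pod servers may implement such mappings very differently.

```python
from pathlib import PurePosixPath

# Hypothetical routing table kept by the Pod's web server:
# virtual URL path prefix -> physical directory that actually holds the data.
ROUTES = {
    "/contacts/": "/mnt/disk2/addressbook",
    "/photos/": "/mnt/archive/images",
}


def resolve(url_path: str) -> str:
    """Map a virtual Pod path onto the physical path where the data is stored."""
    for prefix, physical_root in ROUTES.items():
        if url_path.startswith(prefix):
            relative = PurePosixPath(url_path[len(prefix):])
            # Refuse paths that try to escape the mapped directory.
            if ".." in relative.parts:
                raise PermissionError(url_path)
            return str(PurePosixPath(physical_root) / relative)
    raise FileNotFoundError(url_path)
```

Apps only ever see `/contacts/...` and `/photos/...`. The physical layout can change, for example by moving photos to another disk, without affecting the URLs apps use.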

Cataloguing

The data storage layer also determines how data can be grouped together, such as in the form of folders, catalogues, registries, or collections. Such arrangements are crucial for creating logical views that make related data within the same context easier to store and access. For example, apps such as those for contact books and photo albums rely on such contextual arrangements [23]. An inability to place the same data in more than one grouping limits how it can be reused, and also creates privacy issues due to a lack of separation capabilities [23]. Solid specifications currently specify a Data Instance as a specific and unique instance of data that conforms to a Shape Tree (a specific schema) and is stored within a Data Registry. However, these do not resolve the limitations on data reuse, since such shapes can differ for each user and app, and there is no way to support interoperability between data providers and data consumers.
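Supporting more than one grouping for the same data essentially means modelling membership as a many-to-many relation. The minimal sketch below does this with hypothetical names. It does not reflect Solid’s Data Registry or Shape Tree structures, which are not modelled here.

```python
from collections import defaultdict


class Catalogue:
    """Minimal sketch: one stored resource may belong to several collections."""

    def __init__(self):
        self._members = defaultdict(set)  # collection name -> resource IRIs

    def add(self, collection: str, resource: str) -> None:
        self._members[collection].add(resource)

    def remove(self, collection: str, resource: str) -> None:
        self._members[collection].discard(resource)

    def members(self, collection: str) -> set:
        # .get avoids creating empty collections as a side effect of lookups.
        return set(self._members.get(collection, set()))

    def collections_of(self, resource: str) -> set:
        return {c for c, rs in self._members.items() if resource in rs}
```

With such a structure, a photo can sit in both a ‘holiday’ and a ‘family’ album, and sharing the holiday album with an app exposes that photo without exposing the rest of the family album.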

Data Partitioning and Mirroring

Data partitioning is the separation of data, typically undertaken for reasons of efficiency, performance, control, or contextual needs. For example, data more likely to be requested from specific locations is stored closer to those locations to improve retrieval speeds. Another example is where sensitive data is stored in a separate physical or virtual location that can be supplemented with dedicated security measures. Data mirroring is the duplication of data, which can be performed separately from or alongside partitioning for the same reasons of increased availability and efficiency, as well as to reduce costs arising from network egress. Both are common aims when using Content Delivery Networks (CDNs), which are typically provided by a third party.

For Solid, a Pod is considered a holistic storage managed under a single identifier (i.e. an IRI for the Pod). Internally, it can be split into separate instances, each of which can be a virtual resource attached to the Pod or a separate Pod in itself. While the specifications do not explore this topic in detail, we later reuse these concepts in terms of how apps and users manage data, their usefulness for providing additional security for sensitive information, and the use of partitioning and mirroring to ensure (some) data is always stored or available within a jurisdiction.
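A placement rule that combines partitioning, mirroring, and a jurisdiction requirement could be sketched as follows. This is a hypothetical policy. The partition names and jurisdictions are made up, and `encrypted` stands in for whatever dedicated security measures a sensitive partition would have.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    name: str
    jurisdiction: str               # e.g. "EU"
    encrypted: bool = False         # stand-in for "dedicated security measures"
    items: set = field(default_factory=set)


def place(item: str, sensitive: bool, partitions: list, must_keep_in: str) -> list:
    """Hypothetical placement rule for a Pod split into partitions.

    Sensitive items only go to encrypted partitions; every eligible partition
    receives a mirror; at least one copy must sit in `must_keep_in`.
    """
    eligible = [p for p in partitions if p.encrypted or not sensitive]
    if not any(p.jurisdiction == must_keep_in for p in eligible):
        raise ValueError(f"no eligible partition in {must_keep_in} for {item}")
    for p in eligible:
        p.items.add(item)
    return [p.name for p in eligible]
```

For example, a non-sensitive photo could be mirrored to an EU partition and a US edge location, while health data would land only in the encrypted EU partition. A request that cannot keep a copy in the required jurisdiction is rejected rather than silently stored elsewhere.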

Data Retrieval Layer

Retrieval Forms

The retrieval of data depends on the data storage layer in terms of physical arrangements, such as retrieving data as bytes, blobs, or streams, and virtual arrangements, such as retrieving data using specific formats or schemas. In addition, the data storage layer also affects retrieval by defining how data can be accessed in terms of identifiers, paths, and arrangements. The implementation of the data storage layer therefore enables or disables the ways in which users and apps can access data. For example, if data can only be retrieved as binary documents using paths to specify targets, apps must either be developed to use this arrangement or expend additional effort to convert it into the required arrangements.

Data retrieval can be implemented directly by the Pod (i.e. as operating software), or by intermediaries (e.g. apps, services, or manual processes). In either case, the forms and requirements that retrieval supports determine which uses of data are possible and feasible. For example, if retrieval only supports entire documents rather than individual records or partial data, apps such as contact books will request access to all possible data in order to retrieve names and contact numbers [23]. In such cases, where data storage may not support virtualisation by itself, retrieval can utilise intermediaries that collect the data and strip it of unwanted elements, so that only the requested information is returned.
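The stripping step performed by such an intermediary amounts to a field projection. Here is a minimal sketch with hypothetical contact records and field names:

```python
# Hypothetical full contact documents, as a document-based Pod might return them.
CONTACTS_DOC = [
    {"name": "Alice", "phone": "+353-1-555-0100", "birthday": "1990-01-01",
     "notes": "met at a conference"},
    {"name": "Bob", "phone": "+353-1-555-0101", "address": "1 Main Street"},
]


def project(records: list, fields: list) -> list:
    """Intermediary step: keep only the requested fields of each record."""
    return [{f: r[f] for f in fields if f in r} for r in records]
```

Note that the intermediary itself still reads the full documents. This approach therefore only reduces what the app sees, and the intermediary must be trusted, e.g. by being operated by the user or the Pod provider.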

Solid specifications currently specify retrieval using IRIs or URLs that indicate the path of data, which the Pod server handles to retrieve the corresponding documents or data instances. Other forms of retrieval can also be added, for example as APIs over URLs that specify metadata such as requested formats, queries, or other pertinent information. Well-defined URLs can also provide a common method to obtain data based on categories, formats, schemas, etc. For example, /data/contacts could be a common way to retrieve contacts irrespective of how they are actually stored within the Pod, with parameters that specify formats or limit the number of records.
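A well-defined `/data/contacts` endpoint with `format` and `limit` parameters could be handled roughly as follows. The sketch uses only the standard library. The endpoint table, parameter names, and in-memory store are hypothetical, since Solid does not currently define such URLs.

```python
import json
from urllib.parse import parse_qs, urlsplit

# Hypothetical internal storage; apps never see how or where this is kept.
STORE = {"contacts": [{"name": "Alice"}, {"name": "Bob"}, {"name": "Carol"}]}

# Hypothetical well-known paths mapped to internal data categories.
WELL_KNOWN = {"/data/contacts": "contacts"}


def handle(url: str) -> str:
    """Answer a well-known URL, honouring optional `limit` and `format` parameters."""
    parts = urlsplit(url)
    category = WELL_KNOWN.get(parts.path)
    if category is None:
        raise FileNotFoundError(parts.path)
    params = parse_qs(parts.query)
    records = STORE[category]
    if "limit" in params:
        records = records[: int(params["limit"][0])]
    fmt = params.get("format", ["json"])[0]
    if fmt == "json":
        return json.dumps(records)
    if fmt == "text":
        return "\n".join(r["name"] for r in records)
    raise ValueError(f"unsupported format: {fmt}")
```

For instance, `handle("https://pod.example/data/contacts?limit=2&format=text")` returns the first two names, one per line, regardless of how the contacts are physically stored.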

Querying Data

Retrieval can also be based on queries, where specific criteria and constraints are expressed to identify and retrieve selective information. Queries can also solve the selective information problem described above, and enable clearer access by specifying only the relevant information to be retrieved. Further, queries can be used to create ad-hoc documents or graphs in requested formats, thereby also helping with the interoperability of information. An additional use of queries, as a broad concept and form of information retrieval, is to specify derivations (e.g. conversion of values) and inferences (e.g. statistical analytics) over data within the Pod. This allows apps to obtain information without accessing the underlying data, since the queries only return the requested (generated) information.
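The difference between selecting data and deriving information can be shown with a tiny query evaluator. In this sketch, plain Python predicates stand in for a real query language such as SPARQL. The records and function names are hypothetical.

```python
from datetime import date

# Hypothetical activity records stored in a Pod.
ACTIVITY = [
    {"day": date(2023, 5, 1), "km": 3.2, "route": "park"},
    {"day": date(2023, 5, 2), "km": 5.0, "route": "river"},
    {"day": date(2023, 5, 9), "km": 4.1, "route": "park"},
]


def query(records, where, select=None, derive=None):
    """Tiny query evaluator: filter with `where`, then either project or derive.

    `select` builds an ad-hoc document containing only the named fields;
    `derive` returns a computed value instead of any record.
    """
    matched = [r for r in records if where(r)]
    if derive is not None:
        return derive(matched)
    if select is not None:
        return [{k: r[k] for k in select} for r in matched]
    return matched
```

A call such as `query(ACTIVITY, lambda r: r["km"] > 4, derive=len)` returns only a count. The routes and dates never leave the Pod.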

However, compared to path-based document retrieval, queries are more expensive to compute, more difficult to assess in terms of permissions and privacy, and require more logic for handling calls and results. Solid specifications currently neither specify any form of querying nor mention support for its development, though approaches have been proposed to move Solid’s storage and retrieval mechanisms from being document-based to being knowledge-graph-based [23].

Computation Layer

Pods as Controlled Execution Environments

Alongside queries, computational resources associated with Pods also provide the opportunity to create user-controlled execution environments in which processes can be run locally. Since the data never leaves the Pod or its associated (user-controlled) resources, the overall process can be more trustworthy than sending the data to an app. This is an especially powerful paradigm in several cases, such as where: (i) computations do not need large amounts of data and are infrequent; (ii) computations involve sensitive data; (iii) apps and actors cannot be trusted to keep data; (iv) apps and actors cannot be trusted to delete data after use; and (v) the user wants to control the process themselves, e.g. to ensure there is no bias or inaccuracy in outputs.

The code or instructions required for a computation can be provided by an app, or retrieved by the user through methods provided by the app. For example, to calculate the mean of the user’s walking distance in the past week, the app can either submit a detailed query containing the calculation of the mean, or only indicate the method ‘mean’ with parameters, relying on a common understanding of how it should be executed. Alternatively, the Pod itself can determine which functions to execute based on the information requested. Continuing the previous example, the app only requests the ‘mean walking distance’ (e.g. using a URL, query, or API), and the Pod interprets the request, identifies and performs the computations involved, and returns the results.
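The last variant, where the app only names the information it wants and the Pod performs the computation, can be sketched as a registry of locally executed functions. The request name `mean-walking-distance`, the data layout, and the registry are hypothetical illustrations, not part of any Solid specification.

```python
from datetime import date, timedelta
from statistics import mean

# Hypothetical walking log held in the Pod: day -> kilometres walked.
WALKS = {date(2023, 5, d): km for d, km in
         [(1, 2.0), (2, 4.0), (3, 3.0), (4, 5.0), (5, 1.0), (6, 3.0), (7, 3.0)]}


def mean_walking_distance(data, today, days=7):
    """Mean distance over the `days` days ending on `today` (inclusive)."""
    start = today - timedelta(days=days)
    window = [km for d, km in data.items() if start < d <= today]
    return mean(window)


# Hypothetical registry of request names the Pod knows how to compute locally.
COMPUTATIONS = {"mean-walking-distance": mean_walking_distance}


def handle_request(name, **params):
    """Interpret a named request, run it inside the Pod, return only the result."""
    fn = COMPUTATIONS.get(name)
    if fn is None:
        raise LookupError(f"unsupported computation: {name}")
    return fn(WALKS, **params)


print(handle_request("mean-walking-distance", today=date(2023, 5, 7)))  # prints 3.0
```

The app receives a single number. The daily log stays in the Pod, and the user (or Pod provider) decides which computations the registry offers.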

Cost of Computations

Computations can be costly to execute depending on their complexity and resource availability. Solid does not require Pods to have computational capacities beyond those needed for basic operations related to data storage, retrieval, identity management, and access control. Computational resources therefore cannot be assumed to be present in all Pods. This means that support for computations would have to be added separately by users, for example through provisioned servers or serverless environments, which may vary between Pods and providers. If these are restricted to specific vendors or platforms, this can affect the interoperability of information and the behaviours associated with the use of Solid Pods.

Dedicated computations generally involve some cost in terms of processing resources or time, but computations may also be inherent to retrieval mechanisms, such as converting stored data into requested forms. For example, computations are involved in mapping request paths to underlying folder structures on a server, looking up indexes to determine which data paths are to be retrieved for a given API, or executing queries (of varying complexity) to selectively retrieve information. These may incur additional costs if such features are not part of the Pod infrastructure and services, or are metered based on quotas. For example, several cloud providers charge for network traffic (ingress/egress) beyond an initial amount provided for free. If users are not aware of this, even seemingly simple retrievals may end up costing money.
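The arithmetic behind such charges is simple, yet it is easy to underestimate. The sketch below uses a hypothetical free allowance of 100 GB and a hypothetical price of 0.09 per GB. These figures are illustrative only and do not represent any particular provider’s pricing.

```python
def egress_cost(gb_transferred: float, free_gb: float, price_per_gb: float) -> float:
    """Monthly egress charge: only traffic beyond the free allowance is billed."""
    billable = max(0.0, gb_transferred - free_gb)
    return round(billable * price_per_gb, 2)


# Hypothetical scenario: an app re-downloads a 50 MB photo album 3,000 times a month.
monthly_gb = 50 * 3000 / 1000        # 150 GB (taking 1 GB = 1000 MB)
print(egress_cost(monthly_gb, free_gb=100, price_per_gb=0.09))  # prints 4.5
```

Here an app that repeatedly fetches whole documents, rather than the few records it needs, pushes the user past the free allowance even though each individual retrieval looks trivial.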

Cloud services also enable providing virtual models over existing data through the Data as a Service (DaaS) paradigm. One way to utilise this within the Solid ecosystem is to provide DaaS features through which apps can request (raw) data, information (e.g. in a schema), or answers (e.g. to queries, as described under Querying Data above). In this case, the cost of retrieval can be paid by apps or shared with users. This can also be used to incentivise data providers to submit data in well-defined and supported forms, or to offset their non-conformance by enabling data consumers to convert the data in order to use it. Another avenue is to federate processes so that not all retrieval-related computations take place on Pods, and some data is instead processed on the apps’ servers or client-side (e.g. in a web browser), such as through the use of Triple Pattern Fragments (TPF) [36].
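The core idea of TPF is that the server only answers single triple patterns, which are cheap to compute, while the client performs joins itself. The sketch below is a simplified in-memory simulation of that split. It omits the paging, count metadata, and hypermedia controls of actual TPF interfaces, and the triples and function names are hypothetical.

```python
# Hypothetical triples held by a Pod server (prefixed names kept as plain strings).
TRIPLES = {
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:knows", "ex:carol"),
    ("ex:bob", "foaf:name", '"Bob"'),
    ("ex:carol", "foaf:name", '"Carol"'),
    ("ex:dave", "foaf:name", '"Dave"'),
}


def fragment(s=None, p=None, o=None):
    """Server side: answer one triple pattern (None acts as a wildcard)."""
    return [t for t in TRIPLES
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def names_of_friends(person):
    """Client side: join two fragments (friends, then each friend's name) locally."""
    names = set()
    for _, _, friend in fragment(s=person, p="foaf:knows"):
        for _, _, name in fragment(s=friend, p="foaf:name"):
            names.add(name)
    return names
```

The join logic runs on the client, so the Pod only ever performs simple lookups. The cost is more requests and more data sent over the network.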

Access Control Layer

Access Control refers to the method by which access to data within a Pod is controlled. It relies on the data retrieval and data storage layers to define how data is identified (e.g. by its path) and the method by which it is accessed (e.g. by path, API, or query). Its scope is limited to operations over data within a Pod, i.e. read, write, and erase. It excludes control over how that data is used, disseminated, or managed once it leaves the Pod, which is instead addressed by policies, discussed in a later section.

In the current Solid specifications, access control is implemented as a user accepting a request from an agent (e.g. an app) for operations (read, write, erase) over data (identified by URLs). However, the specifications do not clarify how such URL paths are mapped to data categories. Nor do they clarify how to express complex conditions as policies based on agent class, resource class, origin information, and constraints and conditions of use. These are important because they form part of service agreements and have legal implications.
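To make the missing pieces concrete, the sketch below evaluates grants that refer to agent classes, data categories, origins, and an expiry time, using a path-to-category mapping. It is a hypothetical model for discussion. It is not Solid’s Web Access Control or Access Control Policy mechanism, and all names and categories are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Grant:
    agent_class: str        # category of agent, e.g. "fitness-app" (hypothetical)
    resource_class: str     # data category rather than a raw URL path
    modes: frozenset        # subset of {"read", "write", "erase"}
    origin: str             # origin the request must come from
    expires: datetime       # condition of use: time-limited access


# Hypothetical mapping from URL paths to data categories, i.e. the piece
# the specifications currently leave unspecified.
CATEGORY_OF = {
    "/health/steps.ttl": "activity-data",
    "/health/diagnoses.ttl": "medical-data",
}


def allowed(grants, agent_class, origin, mode, path, now):
    """True if any grant covers this agent class, origin, mode, data category, and time."""
    category = CATEGORY_OF.get(path)
    return any(
        g.agent_class == agent_class
        and g.resource_class == category
        and mode in g.modes
        and g.origin == origin
        and now < g.expires
        for g in grants
    )
```

A single grant for ‘activity-data’ then covers every path mapped to that category, and it lapses automatically at its expiry time.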

For a Pod to implement different data storage and retrieval mechanisms, it would also need corresponding support from access control mechanisms to specify who can access data. Similarly, if retrieval methods can utilise queries and computations, access control mechanisms would also need to control who can perform them and how they relate to data. For example, an app that is permitted to retrieve only the statistical mean through queries could exploit a lack of control over what data can be queried to perform inference attacks that reveal the underlying data. Therefore, the correspondence between access control and data retrieval methods is also relevant to the implementation and investigation of security issues, beyond simple interpretations of permissions and access to data.
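The inference attack mentioned above can be as simple as differencing two permitted aggregates. In this sketch, which uses hypothetical data, the app is only ever allowed to ask for a mean over a subset of its choosing. Yet two such queries that differ by one record reveal that record exactly.

```python
from statistics import mean

# Hypothetical Pod data: resting heart rate per household member.
READINGS = {"alice": 60.0, "bob": 75.0, "carol": 90.0}


def mean_query(names):
    """The only operation the app is allowed: the mean over a subset it chooses."""
    return mean(READINGS[n] for n in names)


# Two individually "harmless" aggregate queries that differ by one record...
everyone = ["alice", "bob", "carol"]
all_but_carol = ["alice", "bob"]
# ...together disclose that record: n * mean(all) - (n - 1) * mean(rest).
inferred = len(everyone) * mean_query(everyone) - len(all_but_carol) * mean_query(all_but_carol)
print(inferred)  # prints 90.0, Carol's exact value
```

A permission that says only ‘mean queries’ is therefore not enough. Access control would also have to constrain which subsets may be queried, or how results are perturbed (e.g. through query-set restrictions or differential privacy).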

Communication Layer

Communication here refers to the method by which data and access are communicated (i.e. provided, requested, retrieved) in the context of a Pod. Solid specifications utilise web protocols (HTTP) for the communication of data and access control requests, where URLs are used to identify Pods, data within Pods, agents, and other resources. HTTP methods (GET, POST, PUT, DELETE, etc.) are used to express interactions with resources, and users and apps use these to interact with Pods and the data they contain. Once access authorisations have been established, the Solid specifications require subsequent communications to provide a security token that proves the prior relationship and can be verified as valid.
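As an illustration of this pattern, the sketch below prepares (but does not send) an authenticated HTTP GET for a Pod resource using Python’s standard `urllib.request.Request`. As we understand the Solid-OIDC specification, access tokens are bound to the client with DPoP (RFC 9449). They are therefore sent in an `Authorization: DPoP ...` header together with a `DPoP` proof header. The URL, token, and proof values here are placeholders, and how they are obtained is out of scope.

```python
from urllib.request import Request


def build_pod_request(url: str, method: str, access_token: str, dpop_proof: str) -> Request:
    """Prepare (but do not send) an authenticated request to a Pod resource."""
    return Request(
        url,
        method=method,
        headers={
            # Token bound to this client; the proof is a signed JWT (placeholder here).
            "Authorization": f"DPoP {access_token}",
            "DPoP": dpop_proof,
            "Accept": "text/turtle",
        },
    )


req = build_pod_request("https://pod.example/contacts/alice.ttl", "GET",
                        "ACCESS_TOKEN_PLACEHOLDER", "DPOP_PROOF_PLACEHOLDER")
```

Any change to how data is identified or accessed (for example, query endpoints instead of resource paths) would have to be reflected in requests like this one, which is why the communication layer is affected.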

If the data retrieval and access control mechanisms can utilise other forms of identifiers for referring to data, or support operations beyond those currently supported by Solid’s access control methods, the communication layer will also need to be modified to support these. For example, the current URL-based method presumes that access control permissions correspond to that specific path. If this is not the case, and paths are virtual views that do not correspond to the underlying data storage or retrieval paths, then the communication layer would need to be aware of such mappings in order to accept these requests.

Similarly, if communication includes additional content such as policies or other metadata related to the use of data or apps, then the ways in which apps declare themselves, perform initial authorisation requests, and make subsequent data use requests also need to incorporate these changes. Finally, other communication methods, such as well-defined or commonly established URLs or APIs, can be established as wrappers around existing HTTP/URL-based methods. These provide the abstraction necessary to hide implementation details of Pods and data (e.g. the exact path for data) from external agents.

Interface Layer

The interface layer is where users (i.e. humans) interact with a Pod and manage their data and apps’ access to it. This includes UI/UX features in the form of panels, dashboards, or similar design elements. Currently, the Solid specification does not provide any such feature, and leaves it to Pod providers to implement one for their users. The Solid website lists applications11 that provide interfaces such as file managers, messaging clients, calendars, and inbox management for app requests and notifications.

An important consideration in the development of such interfaces is what functionality they depend on within Solid Pods. For example, file managers rely on data storage and retrieval methods to understand what should be presented to users. Similarly, contacts and calendar apps rely on the ability to retrieve relevant data and metadata to populate their respective interfaces. If the data storage and retrieval forms are fixed, then these apps can only build their functionalities on those forms. This can result in unintended excessive data exposure, as well as difficulty in obtaining exactly and only the data required, in the correct forms [23].

Another consideration for interfaces arises where they require some computation that is expected to be executed on the client side (e.g. by the Pod or a web browser). In such cases, there is no way to express this, or to distinguish such interfaces from those that are a front-end to an external server. Users may thus not be aware whether they are interacting with an external agent or a locally executed process (which may or may not be controlled by an external agent).

Interfaces also involve the interactions that users undergo in relation to the use of their Pod, data, and apps. For example, an app’s request to use data may be presented outside the Pod as a notice on that app’s website. This may include information about why the app wants to access the data, what data it wants to access, and other pertinent information such as applicable policies. Such interfaces should also be considered when investigating Solid’s use-cases, since they are how users decide whether to grant access. These interfaces are also important for legal investigations, such as those regarding the provision of privacy notices or the validity of consent.

Other forms of interface are also within scope: notifications, notices, correspondence, updates, or other communications between users and entities (i.e. providers, developers, and brokers associated with Pods, resources, and apps). Solid’s inbox functionality is relevant for such communications, but is currently limited to handling only some interactions regarding other users and apps.

Actor Layer

Actors are entities associated with specific roles in relation to the Pod, data, and apps. They use the interface and communication layers to send and receive data and requests. We use the term actors to refer to legally accountable entities, distinguishing them from what Solid terms Agents, so as to understand who is responsible and accountable for implementations, decisions, and processes. For example, each Pod, dataset, and app can have distinct actors associated with its development (i.e. who created or maintains it), provision (i.e. who provided it), use (i.e. who has access to it), and investigative roles such as auditors and governance agencies. Understanding who the actors are is a crucial step in investigations associated with data. The actor layer only considers actors associated with data interactions.

We distinguish agents from actors and entities for the purpose of establishing accountability. This requires identifying or indicating actors with sufficient detail to enable legal processes to apply. A good example is specifying the legal name of a company along with its address, which enables identification of the applicable jurisdictional obligations and authorities. Solid currently does not mandate such disclosures, and instead leaves it to each app to provide pertinent details via its identity profile document. It considers domain-based identity a strong form of identification. It also does not explicitly support or require storing legal identities or other relevant information about actors within its Agent Registries, which would, for example, allow the user to introspect actors and their identities at any time. Explicitly identifying actors is also important for forming an agreement (e.g. contract, consent). Without explicit or implicit identification of actors, agreements established between the user and an app (for example) would not be valid or sufficient to trigger legal obligations and remedies. Note that this only refers to cases where information about actors is not provided.
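An Agent Registry entry that also carried accountability details could be checked for completeness along these lines. The field names and the set of required details are our assumptions for illustration. Solid does not define such a record.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ActorRecord:
    webid: str                            # technical identity (what Solid already uses)
    role: str                             # e.g. "app provider", "pod provider"
    legal_name: Optional[str] = None      # hypothetical accountability fields
    postal_address: Optional[str] = None
    jurisdiction: Optional[str] = None


# Hypothetical minimum details needed to locate obligations and authorities.
REQUIRED_FOR_ACCOUNTABILITY = ("legal_name", "postal_address", "jurisdiction")


def missing_details(actor: ActorRecord) -> list:
    """List which accountability details are absent from an actor's record."""
    return [f for f in REQUIRED_FOR_ACCOUNTABILITY if not getattr(actor, f)]
```

A Pod interface could surface a non-empty result to the user before access is granted, making it visible when an app has provided only a domain-based identity.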

Another important consideration regarding actors relates to data provenance, i.e. the actors that provide specific data and that may need to be recorded as the source of that data. Solid’s specifications support specifying who can write or edit data, but do not directly record its provenance. The source of data can affect the validity and authenticity of information, such as for official documents. It can also have legal rights associated with it, such as the right to request rectification of inaccuracies.
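Recording provenance at write time can be as simple as storing the source alongside the content. The sketch below uses a plain dictionary as a hypothetical store. In RDF, the same information could be expressed with W3C PROV-O terms such as `prov:wasAttributedTo`.

```python
from datetime import datetime, timezone


def write_with_provenance(store: dict, path: str, content: str, source_webid: str) -> None:
    """Store content together with a record of who supplied it and when."""
    store[path] = {
        "content": content,
        "provenance": {
            "source": source_webid,                                  # who provided the data
            "recorded_at": datetime.now(timezone.utc).isoformat(),   # when it was written
        },
    }
```

With the source recorded, a user who finds an inaccuracy in, say, an official document knows which actor to direct a rectification request to.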

Logging Layer (Orthogonal)

Logging refers to generating and storing information about processes and interactions in relation to the Pod, data, or apps as a form of record-keeping. Logs can be distinguished as data logs (read, write, edit, erase), access control logs (recording the creation and use of permissions), policy logs, identity management logs (e.g. registration, verification), and security logs (e.g. incident reports). In addition, logs may be kept by Pod, app, and data providers for contextual reasons, such as to assist in resolving operational issues. We distinguish logs, as supplementary records, from intentionally persistent information such as an index of access authorisations.

Logging is expressed as an orthogonal layer as it can apply individually or collectively to all other layers. For example, logs can be limited to data storage mechanisms (e.g. kept by the storage provider), data retrieval methods (e.g. kept by a server handling requests), the computing layer (e.g. recording computations being executed), or the access control layer (e.g. recording the success or failure of requests).

Logs can be stored as data within a Pod, and can constitute personal data depending on their contents. They may be mandatory and beyond the user’s control, for example as provider data to keep track of resource usage. Logs may be protected due to their sensitivity or importance in establishing accountability, for example through strong limitations on who can write and view access control logs in a Pod.
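One common way to protect logs used for accountability is to make them tamper-evident by hash chaining. We substitute this technique for illustration; it is not something Solid specifies. Each entry commits to the hash of the previous one, so editing or reordering entries breaks verification. Truncating the end of the log, or rewriting it wholesale, is only detectable if the latest hash is also kept elsewhere, e.g. by the user. The categories follow those listed above.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    # Canonical JSON so the same entry always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class HashChainedLog:
    """Append-only log in which each entry commits to the hash of the previous one."""

    def __init__(self):
        self.entries = []

    def append(self, category: str, event: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"category": category, "event": dict(event), "prev": prev}
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry["hash"]

    def verify(self) -> bool:
        """Recompute every hash; an edited or reordered entry breaks the chain."""
        prev = GENESIS
        for e in self.entries:
            body = {"category": e["category"], "event": e["event"], "prev": e["prev"]}
            if e["prev"] != prev or e["hash"] != _digest(body):
                return False
            prev = e["hash"]
        return True
```

For example, an access control log entry recording a granted read could not later be quietly changed to a write without `verify()` failing.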

Solid currently supports an extremely limited form of logging: decisions related to authorisations are recorded in a registry, and an Access Receipt is provided as a success response after authorisation. Such receipts do not specify particulars about the scope or contents of access, or how or where to store them, and are not required to be exchanged.

Policy Layer (Orthogonal)

Policy is a broad term referring to documented conditions, constraints, or agreements regarding data, operations on data, and actors with specific roles. In the context of Solid, policies can be associated with each layer, and are thus expressed as an orthogonal layer. We distinguish between the terms policy and agreement: a policy is considered an agreement if it can be interpreted as a formal and binding document between actors. A policy that is not an agreement refers to documented information regarding conditions, preferences, requirements, requests, offers, or other similar notions associated with Pods, resources, data, or apps.

Policies go beyond access control as expressed in the Solid specifications, since they can support conditions related to data categories, actors, locations, jurisdictions, and technical and operational requirements. They can also support richer contextual expressions, such as limitations on duration and frequency, and can involve concepts related to risks and rights. These can be expressed as abstract or universally applicable information, or refer to specific jurisdictional interpretations, such as those explicitly mentioning GDPR.
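A policy of this kind can be checked mechanically once its conditions are explicit. The sketch below encodes data categories, actors, jurisdictions, a duration limit, and a frequency limit as plain Python fields, all hypothetical. In practice, such policies would more likely be expressed in a policy language such as W3C ODRL.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Policy:
    data_categories: frozenset
    actors: frozenset          # who may use the data
    jurisdictions: frozenset   # where processing may take place
    valid_until: date          # duration limit
    max_uses_per_day: int      # frequency limit


@dataclass(frozen=True)
class UseRequest:
    data_category: str
    actor: str
    jurisdiction: str
    on: date
    uses_today: int            # uses already made today, before this one


def permits(policy: Policy, req: UseRequest) -> bool:
    """Every condition in the policy must hold for the request to be permitted."""
    return (
        req.data_category in policy.data_categories
        and req.actor in policy.actors
        and req.jurisdiction in policy.jurisdictions
        and req.on <= policy.valid_until
        and req.uses_today < policy.max_uses_per_day
    )
```

Unlike a read/write permission, this policy can refuse a request that targets permitted data but would be processed in the wrong jurisdiction or too often.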

Policies can also refer to contexts outside the Pod. This distinction is crucial for bringing operations on data outside the Pod into scope. For example, if an app’s policy only concerns what data it requires initially when asking for access, but not what it can do subsequently, this presents an issue regarding privacy and security, as the app may reuse or share that data for undeclared purposes.

Solid currently does not support policies beyond those associated with access control and the grouping of agents into categories. It also does not require an app to declare, in the form of a policy, what it intends to do with the data, or to record such policies within the Pod. While policies can be referenced as additional links in an app’s profile, there are no requirements on what information they should contain, their completeness, or how they should be expressed in relation to the use of data within Pods. These comments concern how such policies and information are retrieved and applied to the use of Solid Pods, with the understanding that their contents may be externally regulated, e.g. through legal obligations.

Policies can also be used to govern the usage of data and to influence computational executions. For example, data can be shared with sticky policies that dictate the conditions under which that data can be used, as well as specifying obligations to be fulfilled in return [16]. Similarly, usage policies can be used to ensure that any processing operations on data conform to and comply with specific requirements and constraints [20].
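The sticky-policy pattern can be sketched as data that travels together with its policy, where the consumer checks the policy before use and takes on its obligations. This is a simplified, hypothetical model. Here enforcement depends entirely on the consumer honouring the check, whereas practical sticky-policy schemes typically add cryptographic or trusted-party enforcement.

```python
from dataclasses import dataclass, field


@dataclass
class StickyPolicy:
    allowed_purposes: set
    obligations: list            # e.g. ["delete-after-30-days"] (hypothetical labels)


@dataclass
class StickyPackage:
    data: dict
    policy: StickyPolicy         # travels with the data wherever it goes


@dataclass
class Consumer:
    name: str
    pending_obligations: list = field(default_factory=list)

    def use(self, package: StickyPackage, purpose: str) -> dict:
        """Use data only if the attached policy permits it, and take on its obligations."""
        if purpose not in package.policy.allowed_purposes:
            raise PermissionError(f"{purpose!r} not permitted by attached policy")
        self.pending_obligations.extend(package.policy.obligations)
        return package.data
```

Because the policy is attached to the data rather than stored only in the Pod, it can still govern use after the data has left the Pod, which is the gap that access control alone does not address.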

Security Layer (Orthogonal)

Security is a broad topic that can be applied to every other layer. For example, data storage mechanisms can have security measures regarding data integrity, encryption, and usage limitations. Similarly, data access methods can implement security measures by placing limitations on what data can be retrieved, or by incorporating access control and policy checking as a precursor to ensure valid access. Separately, the servers implementing Solid Pods or associated resources can utilise authorisation, DDoS protection, HTTPS, or other similar security measures as part of their infrastructure and software. Given the breadth of the topic and its universality to all aspects of Solid Pods, data, and apps, we consider security an orthogonal layer. For clarity, we treat security and privacy as separate topics in this article.

Solid specifications have dedicated sections on security and privacy considerations. For example, strong and weak identifiability of applications are defined to distinguish how securely an app can be identified via existing domain certification methods [11]. Similarly, specific risks are acknowledged12 with information on their relevance and mitigation.

Identity Management Layer (Orthogonal)

Identity Management refers to how the identities of actors and agents are managed through identifiers, credentials, authorisations, and authentication mechanisms. It is considered an orthogonal layer since identities may be associated with each layer. Though closely tied to the security, policy, and access control layers, identity management is significant enough on its own to be considered separately. For example, the identity of a Pod is separate from the user’s identity, as reflected in their distinct URLs, and each may have its own requirements for being considered valid or trustworthy. Identity management can also be separated based on the involvement of entities and the resources they control. For example, a storage provider’s identity could be a separate implementation with its own corresponding access control methods.

Solid currently specifies identity management for users and apps through the WebID specification [8]. Identities may additionally need to be managed for the use of specific resources, for example to translate an app’s request to use some data into a data storage provider’s identifier for accessing that data.
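The translation step can be sketched as follows: the Pod holds its own credential for each storage backend and scopes it according to which app is asking, so the storage provider never needs to understand app WebIDs. The backend names, credential labels, and scopes are hypothetical.

```python
# Hypothetical: the Pod holds one service credential per storage backend...
BACKEND_CREDENTIALS = {"eu-store": "svc-account-eu"}

# ...and records, per app WebID, which part of which backend it may use.
APP_SCOPES = {"https://fit.example/app#id": {"eu-store": "activity/"}}


def translate(app_webid: str, backend: str) -> dict:
    """Turn an app's identity into the identifier the storage provider understands."""
    scopes = APP_SCOPES.get(app_webid, {})
    if backend not in scopes:
        raise PermissionError(f"{app_webid} has no access to {backend}")
    return {
        "credential": BACKEND_CREDENTIALS[backend],  # what the storage provider sees
        "scope": scopes[backend],                     # limits the request to the app's data
        "on_behalf_of": app_webid,                    # kept for the Pod's own logs
    }
```

This design keeps two identity systems separate (WebIDs facing users and apps, and provider credentials facing storage), with the Pod accountable for the mapping between them.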

Use-Cases Exploring Governance of Solid Infrastructure and Apps

The previous section established the terminology regarding actors, roles, functions, and processes associated with Solid. This section uses these terms to describe various possible use-cases for the implementation of Solid Pods and apps. The use-cases are a collection of different permutations and combinations of how Solid Pods can be implemented in terms of infrastructure. They are not exhaustive, since new paradigms and methods may always change how implementations are deployed and/or function. Instead, the categorisation of use-cases focuses on the aims of this article, which are: (1) to identify data governance patterns involving Solid Pods; (2) to express use-cases in terms of freedoms and limitations on abilities and actors; (3) to explore issues of trust and security that arise from specific governance patterns; (4) to explore variations in responsibilities and accountability; and (5) to guide how legal compliance investigations should approach the use of Solid. For related work, see prior descriptions [12][15], [24] and explorations of legal relevance [6], [26], [37].

The use-cases are categorised based on the specific topics they concern, i.e. Infrastructure (Section 5.1), Apps (Section 5.2), and limitations or extensions to functionalities (Section 5.3). Each of these categories reflects a separation of concerns in terms of implementations and behaviours, and the categories can be combined to create more complex use-cases. For example, the use-cases related to infrastructure, where Pods are managed by providers or users, can be combined with other use-cases where apps can be installed from anywhere or only from an ‘app-store’. Each use-case is given a unique identifier (with prefix UC-) for convenient reference within discussions.

Use-Cases based on Infrastructure Management

These use-cases are based on variations in the infrastructure used to implement and deploy Solid Pods. They concern the developers, providers, brokers, and consumers of Pods and associated resources (e.g. storage space, computing environments), and how these are provisioned to the (end-)users. These use-cases include limitations or restrictions placed on the use of (physical) resources (e.g. what storage providers are supported) but not those that concern how Solid as a software platform operates (e.g. API used to access storage, or software-based limitations). The latter categorisation is explored in Section 5.3.

UC-I1: Completely Self-Managed

The first use-case considers the user setting up everything themselves in terms of the necessary resources (server, storage disks, networking, domains). This could be a typical home-setup where the user runs the server within their home or a commercial setup where an organisation deploys servers from their premises. It provides a great degree of freedom in terms of choice of what hardware and software (including Solid implementations) are installed, including operating systems, firewalls, or any other applications and devices that are chosen by the users. At the same time, it also puts the onus of maintenance and responsibility of choosing technologies on the user as a (self-)provider.

UC-I2: Solid with IaaS

Subsequent use-cases diverge from UC-I1 by increasingly abstracting the management of resources. UC-I2 uses the IaaS cloud paradigm, where users provision infrastructure from a provider. Apart from managing the provisioned resources, users retain the freedom and flexibility to choose what software they install and use on their servers. Since the hardware is provisioned as part of IaaS, users are bound by the availability and flexibility of the chosen resources in terms of features, compatibility, and management options. Communications with resources are typically made through APIs defined by the resource providers. In some cases, an IaaS provider may only support or allow provisioning resources within its own cloud service offerings, or provide incentives such as lower costs for using the same provider. IaaS offerings also typically offer flexibility in choosing where each resource is located, for example having server and storage in different locations.

UC-I3: Solid as PaaS

Further abstracting the management of resources, applying the PaaS cloud paradigm to Solid enables users to request a Pod as a platform. They receive a pre-configured infrastructure deployment in which the details and management of underlying resources, such as servers and data storage disks, are abstracted and hidden. Users may have the option to provision additional resources or change existing ones based on supported options. Providers may offer pre-configured choices of implementations based on Pod specifications, resource quotas (e.g. storage size), locations, and software (e.g. underlying operating system). In the case of PaaS, users have less freedom, since direct access to hardware is restricted and all interactions happen through a dedicated virtual environment or APIs. However, users get more convenience, since they have fewer resources to manage.

UC-I4: Solid as SaaS

Applying the SaaS cloud paradigm to Solid would mean providing a Pod’s functionalities in the form of software that users can access and manage, e.g. through apps on their devices or through a web browser. In this case, all the underlying resources (e.g. servers, storage) would be hidden and managed by the provider. This represents the least freedom, since users would not have any choice in how their Pods are actually implemented, while also offering the most convenience, as users get a complete system that is managed and maintained by the provider. However, it also places greater responsibilities on providers for managing the resources, deciding how they should function together, addressing security, and following any specific policy or legal obligation arising from such decisions.
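The progression from UC-I1 to UC-I4 can be summarised by recording which party manages each layer of a Pod deployment. The sketch below is an illustrative simplification under our own assumptions: the layer names and the ‘user’, ‘provider’, and ‘shared’ assignments are hypothetical and are not defined by Solid or by any cloud standard.

```python
from enum import Enum

class Layer(Enum):
    HARDWARE = "hardware"          # servers, storage disks, networking
    OPERATING_SYSTEM = "os"        # operating system, firewalls, host software
    POD_SOFTWARE = "pod_software"  # Solid server implementation and its configuration
    POD_DATA = "pod_data"          # data and access grants held in the Pod

# Hypothetical simplification of UC-I1 to UC-I4: who manages each layer.
# "shared" means the user chooses from options that the provider then manages.
CONTROL: dict[str, dict[Layer, str]] = {
    "UC-I1": {layer: "user" for layer in Layer},
    "UC-I2": {
        Layer.HARDWARE: "provider",
        Layer.OPERATING_SYSTEM: "user",
        Layer.POD_SOFTWARE: "user",
        Layer.POD_DATA: "user",
    },
    "UC-I3": {
        Layer.HARDWARE: "provider",
        Layer.OPERATING_SYSTEM: "shared",
        Layer.POD_SOFTWARE: "shared",
        Layer.POD_DATA: "user",
    },
    "UC-I4": {
        Layer.HARDWARE: "provider",
        Layer.OPERATING_SYSTEM: "provider",
        Layer.POD_SOFTWARE: "provider",
        Layer.POD_DATA: "user",
    },
}

def managed_by(use_case: str, party: str) -> list[str]:
    """List the layers managed by `party` in a use-case, in Layer declaration order."""
    return [layer.value for layer in Layer if CONTROL[use_case][layer] == party]
```

For example, `managed_by("UC-I4", "provider")` lists every layer except the Pod’s data, mirroring the shift of responsibility towards the provider described above.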

The SaaS paradigm also enables capabilities such as integrating Solid specifications on top of existing products and services, such as those offered by NextCloud13, by using Solid as a thin layer of APIs on top of the underlying native solutions. This approach can also be extended beyond SaaS to provide the resulting solution as PaaS or IaaS offerings. More importantly, if Solid specifications are constrained to only define how data is stored and accessed through URLs, along with some requirements on identities of users and apps, then the SaaS model enables any cloud-based storage provider (e.g. Dropbox, Google Drive, Microsoft OneDrive) to provide Solid-compatible solutions through its existing, popular, and widely used products and services.

UC-I5: Solid with CaaS

CaaS in combination with Solid extends the other use-cases with additional resource management for providing computing functionalities, either in the form of servers or as serverless computing environments and APIs. This could be for executing processes associated with users (i.e. self-managed computing) or apps (i.e. a controlled environment for third-party computing). CaaS itself could be managed via the IaaS, PaaS, or SaaS paradigms, each of which has different implications for the roles and responsibilities involved in controlling the execution of processes. CaaS can offer valuable trust and security assurances where data does not leave known boundaries or always remains under the complete control of users. In such cases, CaaS can be an additional cloud-based service provisioned by providers or brokers that can be used by both users and apps.

UC-I6: Solid with STaaS or DaaS

The data stored within Solid may only provide specific value through additional operations or interpretations. In such cases, rather than apps asking for the entirety of (raw) data and then processing it, Solid enables the use of STaaS or DaaS paradigms, where apps can request specific information retrieval methods (e.g. SQL queries) or answers (e.g. via semantic reasoning) to be derived from data and provided in lieu of sharing the data itself. This can be a more privacy-considerate model where apps never receive the actual data. Information retrieval and question-answering mechanisms can be implemented using any of the other cloud paradigms (IaaS, PaaS, SaaS), with varying degrees of control for users over which data and features are exposed to apps.

The STaaS and DaaS functionalities also enable third-parties to provide additional services on top of Pod implementations by acting as information brokers between the users and the apps. For example, a STaaS/DaaS service provider can provide an API to calculate fitness metrics by retrieving data from a Pod, operating on it, and returning the derived information to an app. Separately, the STaaS/DaaS service can also be executed locally (i.e. within the Pod) through CaaS capabilities.
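To illustrate the difference between sharing raw data and sharing derived information, the sketch below models a hypothetical broker that reads daily step counts from a Pod and returns only an average to the app. The `PodStore` class, the function names, and the data layout are assumptions for illustration; Solid does not currently define such an interface.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class PodStore:
    """Hypothetical in-memory stand-in for fitness data held in a Pod."""
    daily_steps: dict[str, int]  # ISO date -> step count

def raw_data_request(pod: PodStore) -> dict[str, int]:
    # Conventional model: the app receives a full copy of the raw records.
    return dict(pod.daily_steps)

def average_daily_steps(pod: PodStore, dates: list[str]) -> float:
    # STaaS/DaaS model: the broker computes the metric and returns only the result.
    values = [pod.daily_steps[d] for d in dates if d in pod.daily_steps]
    if not values:
        raise ValueError("no data for the requested dates")
    return float(mean(values))
```

In this sketch the app learns a single number rather than the day-by-day record. A value derived about an identifiable individual may still be personal data, so this model reduces exposure without removing GDPR obligations.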

Use-Cases based on App Management

These use-cases are based on different methods by which applications can be managed in relation to a Pod, where their processes are executed, and how they are governed in terms of trust and security mechanisms.

UC-A1: Apps are Unmanaged

This use-case reflects the scenarios where any app can be used by the user without any checks and balances in place regarding their origin, conformance, or any form of control. It reflects the current situation in Solid where there are no mechanisms in place that require an app to declare necessary information (e.g. app provider, policies, legal compliance information) and which are checked before allowing the app to access data. Note that this refers to additional requirements as distinct from conformance to the Solid specifications which only require apps to utilise WebID profiles and exchange security tokens for data access requests.

UC-A2: Apps follow Conformance Protocols

This use-case adds requirements for apps in terms of how they should express conformance towards specific protocols, specifications, standards, or legal obligations, which can be assessed or verified to ensure trust and accountability. A simple example of such conformance is the current practice of smartphone app stores requiring applications to be packaged with specific metadata, which is checked before ‘installing’ the app on a smartphone. In the case of Solid, since apps are not installed but rather ‘registered’, the same process can be performed as part of the registration. The criteria and conditions for conformance can be based on requirements derived from organisational policies, user or community guidelines, legal requirements, or other sources.
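As a minimal sketch of conformance checking during registration, the code below refuses to register an app whose metadata lacks a set of required fields or whose privacy policy link is not an HTTPS URL. The field names and the rule set are hypothetical; the Solid specifications do not define them.

```python
from urllib.parse import urlparse

# Hypothetical metadata fields an app must declare before registration.
REQUIRED_FIELDS = ("webid", "provider", "privacy_policy", "legal_basis", "jurisdiction")

def conformance_issues(metadata: dict) -> list[str]:
    """Return a list of problems; an empty list means the app conforms."""
    issues = [f"missing field: {name}" for name in REQUIRED_FIELDS if not metadata.get(name)]
    policy = metadata.get("privacy_policy")
    if policy and urlparse(policy).scheme != "https":
        issues.append("privacy_policy must be an https URL")
    return issues

def register_app(registry: dict, metadata: dict) -> bool:
    """Register the app under its WebID only if no conformance issues are found."""
    if conformance_issues(metadata):
        return False
    registry[metadata["webid"]] = metadata
    return True
```

Returning the full list of issues, rather than failing on the first one, lets the registering party report every deficiency back to the app provider at once.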

UC-A3: Apps are managed through App Stores

App Stores or Marketplaces are mechanisms for digital distribution of software, and are a widely utilised service across devices and platforms. This also includes package and code repositories, and distribution of pre-compiled binaries, code that is then compiled on device, and virtualised applications which are provisioned via cloud services. App stores enable convenience for both providers and consumers by providing a common interface which can also provide features such as search, curation, recommendation, updates, versions, dependencies, configurations, security notifications, and installation management. App stores can be established as commercial enterprises (e.g. Apple, Google), or be community maintained efforts (e.g. Linux package repositories), and can have dedicated tools and services associated with app installation, verification, and updates.

App stores may or may not have policies that applications must satisfy in order to be listed and provided to users. For example, Linux distributions such as Arch have separate repositories for packages that undergo some level of oversight by repository maintainers (i.e. official repositories) and for those that do not (i.e. the Arch User Repository14). In addition, app stores also require specific mechanisms to verify applications in terms of security and tampering. Apple’s and Google’s smartphone app stores also feature oversight bodies that audit applications and remove those that do not conform to policies or are found to have violated terms.

Solid currently does not have an App store, but does maintain a curated list of apps on its website15. NextCloud’s implementation of Solid as an additional layer on top of its own services also provides some support (to an extent currently unknown) for using apps developed for NextCloud16.

UC-A4: Apps are Vetted or Certified

Vetting or certification is the process by which an app is audited or assessed and provided with a demonstrable and verifiable certificate or seal that it can present as a trust mechanism. Certification criteria can be based on standards, such as those applied by ISO certification agencies, on established codes and guidelines, and on legal compliance requirements, such as GDPR’s use of certifications and seals as a means of demonstrating compliance.

Certification processes are common in software provisioning, where developers sign the applications they create, and these signatures are in turn verified by execution environments before users are permitted to install or use them. Such mechanisms have also been integrated into app stores as part of their security measures. In the case of the Solid specifications, apps currently only need a verifiable identity attached to the (web) domain address under which they operate. An extreme form of certification is the use of legally enforceable agreements on what the app is permitted or restricted to do, which is the mechanism used by the Data Utility Company in the Flanders use-case.
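Execution environments typically verify a developer’s signature before allowing an app to run. As a simplified sketch, the code below instead checks an app bundle against a SHA-256 digest listed in a hypothetical certificate issued by a certifying body, along with the certificate’s validity period. This is an integrity check only: a real scheme would also verify the certificate’s own signature using public-key cryptography, which is omitted here, and the `Certificate` fields are our assumptions.

```python
import hashlib
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Certificate:
    """Hypothetical certificate issued by a certifying body for one app release."""
    app_id: str          # the app's identity, e.g. its WebID
    bundle_sha256: str   # digest of the certified app bundle
    scheme: str          # e.g. a certification scheme under GDPR Art.42
    valid_until: date

def is_certified(bundle: bytes, app_id: str, cert: Certificate, today: date) -> bool:
    # The certificate must name this app, still be valid, and match the bundle bytes.
    if cert.app_id != app_id or today > cert.valid_until:
        return False
    return hashlib.sha256(bundle).hexdigest() == cert.bundle_sha256
```

Binding the certificate to a digest of the exact bundle means any modification after certification causes verification to fail.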

UC-A6: Apps are installed ‘locally’ in a Pod

Tangential to how apps are provisioned is the question of how apps interact with data in the Pod. Solid’s specifications only consider cases where the app requests use of data through URLs. However, following the CaaS paradigm, it is also possible for apps to be provided as entire or partial code bundles that can be installed within a locally controlled environment such as the Pod or an associated server. This is akin to installing software on a device managed by the user. In such cases, execution may take place entirely or partially within the local environment, with external communications as an alternate mechanism for sending and receiving data and instructions. The execution environment and the control retained by apps are important considerations for investigating responsibilities.

UC-A7: Apps install ‘service’ within a Pod

Similar to how apps can execute locally with respect to a Pod, apps can also provide ‘services’ that are installed or executed locally within a Pod. The term service here refers to the architectural concept of a process executed in the ‘background’, i.e. without an active front-end or interface for the user. Examples of such services include those used for syncing, updates, information retrieval, and protocol handling (e.g. communications). A service can also be an installation separate from apps. For example, users may choose to install services that operate as part of their Pods without these having to be expressed as ‘apps’. Solid currently does not specify or allude to the use of such services, but their prevalence in other devices and platforms suggests they may be provided as part of implementations.

UC-A: Apps create a locked ecosystem

This use-case represents restrictions where only specific apps are allowed to be used or installed for a given Pod. This differs from the use of an App store, where any app that satisfies the criteria for inclusion can be used, and instead represents the case where a Pod provider specifies the apps that can be used. Users have no choice but to use only these apps, and nothing else can be used without the Provider’s support. Such measures can be enacted for the purposes of ensuring rigid security, controlling data use, or locking users into the Provider’s ecosystem. Currently, Solid places no such restrictions.
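The difference between UC-A3 and this use-case can be expressed as two admission policies. In the sketch below, whose names are hypothetical, an app store admits any app meeting its published criteria, whereas a locked ecosystem admits only apps on the Pod provider’s allowlist, regardless of what other criteria they satisfy.

```python
from typing import Callable

AppMeta = dict[str, str]
Policy = Callable[[AppMeta], bool]

def store_policy(criteria: list[Policy]) -> Policy:
    # App store model (UC-A3): admit any app that satisfies every published criterion.
    return lambda app: all(check(app) for check in criteria)

def locked_policy(allowlist: set[str]) -> Policy:
    # Locked ecosystem: admit only apps the Pod provider has named, whatever else they satisfy.
    return lambda app: app.get("webid") in allowlist
```

Under the locked policy, a third-party app that fully meets the store criteria is still refused, which is precisely what distinguishes lock-in from curation.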

Use-cases based on Extensions and Restrictions to Solid’s Functionalities

These use-cases represent additional extensions or restrictions that deviate from the current Solid specifications. These are mentioned as they have impact on how Solid Pods function, are provisioned, their portability and interoperability, and have important considerations on responsibilities of both providers and users, especially under GDPR.

UC-L1: Limitations on Data and/or Apps

A Pod or App provider may place limits on data or apps whereby the user cannot exercise control in terms of editing or deleting it. For example, a Pod provider may provision the pod with some pre-configured apps that the user cannot remove, or some pre-stored data that the user cannot modify. The user may have the option of adding and managing other additional data and apps. An extreme extension of this concept is the case where all data within the Pod is not within the user’s control, and where the Pod is effectively a storage-only solution. Less severe iterations are also possible, such as only allowing certain types of data to be stored in the Pod, or for the data to only be shared with certain apps.

UC-L2: Shared Pods with Multiple Users

A Pod may have multiple users, with or without separation of data amongst them. This is akin to multiple users using the same terminal or computer with separate accounts, who may be able to view, use, or control each other’s dedicated data, and may have common locations for shared data. Such separations can reflect private groups such as families, or organisational environments such as departments and teams. Sharing a Pod has important considerations in terms of responsibilities, security, and the necessary software support. Currently, Solid does allow multiple users to access data within a Pod with varying degrees of control, but only supports a single user being associated with the Pod in terms of its identity.

UC-L3: Pod where User is not a Data Subject

This use-case considers Solid Pods where the user is not the Data Subject, i.e. the data within the Pod is about an individual other than the user. This could be where the user is permitted to manage such data (e.g. as a parent, guardian, or legal representative), or where Pods are used as dedicated storage areas for an individual’s data and the users are members of the organisation managing it. In such cases, the user’s decisions about who accesses the data have implications for the data subject, which means users also share responsibilities depending on the specifics of their role and relationship with the data subject.

UC-L4: Virtualisation of Pods

A Solid Pod is presumed to be a single holistic resource. However, it is feasible for a Pod to be implemented as a virtualisation over several resources, or even over several Pods, that are hidden from external view and used to separate and manage data and relevant concerns. The inverse is also feasible, where separate Solid Pods are in actuality the same underlying resource with virtual separation representing the data of different users. Such patterns require support from various implementation layers, and may entail obligations for the entities deciding how such separations should be implemented and managed. For example, if Pods are implemented on a single storage infrastructure with separation managed virtually through software, then a valid security concern is whether accessing one user’s data can accidentally retrieve another user’s data.
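One common way to reduce the risk of cross-user access on shared storage is to resolve every requested path inside a per-Pod root and reject any path that escapes it. The sketch below assumes a hypothetical layout where each Pod is a directory under a shared storage root; it performs path arithmetic only, does not touch the filesystem, and does not account for symbolic links.

```python
import posixpath
from pathlib import PurePosixPath

STORAGE_ROOT = PurePosixPath("/srv/pods")  # hypothetical shared storage root

def resolve_in_pod(pod_name: str, requested: str) -> PurePosixPath:
    """Map a request path to storage, refusing paths that leave the Pod's area."""
    pod_root = STORAGE_ROOT / pod_name
    # normpath collapses '..' segments, so traversal attempts become visible.
    candidate = PurePosixPath(posixpath.normpath(str(pod_root / requested.lstrip("/"))))
    # Compare whole path components, so 'alice2' is not mistaken for 'alice'.
    if candidate != pod_root and pod_root not in candidate.parents:
        raise PermissionError(f"{requested!r} escapes pod {pod_name!r}")
    return candidate
```

A request such as `../bob/health.ttl` made against Alice’s Pod normalises to a path under Bob’s directory and is refused rather than served.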

Applying GDPR to Solid

The GDPR [3] is a relatively large and complex piece of legislation with 99 Articles17 and 173 Recitals that define roles, compliance processes, and obligations, along with supplementary guidelines and case law. We utilise GDPR’s Art.5 Principles as abstract, high-level goals through which other relevant articles and obligations are identified. Our investigations and arguments overlap with similar prior analyses [6], [26], [37], which we use to focus on the specifics of Solid’s implementations.

Lawfulness, Fairness, Transparency

GDPR Art.5-1a stipulates that all personal data must be “processed lawfully, fairly and in a transparent manner”. The principle of lawfulness refers to the necessity for any and all data processing to be justified through one or more of the six lawful bases defined by Art.6. Amongst others, these include consent (Art.6-1a), contract (Art.6-1b), legal obligation (Art.6-1c), and legitimate interests (Art.6-1f). In addition, certain contexts require separate legal bases, such as special categories of personal data requiring an Art.9 legal basis, and cross-border data transfers requiring Art.45, Art.46, or Art.49. The principle of lawfulness also requires the processing to comply with legal requirements outside of the GDPR.

The choice of legal basis depends on the intended purposes, processing operations, and personal data categories, as well as their necessity and importance. While the GDPR does not specify any particular order or preference among legal bases, an incorrect application can result in a compliance violation. The most common legal bases in use are contract (e.g. service provision), legitimate interest (e.g. fraud prevention), and consent (e.g. opt-in marketing).

Another point of note is that under GDPR, each (singular or joint) Data Controller must have its own separate and independent legal basis, i.e. if there are two Data Controllers with separation of purposes and concerns, and both require consent, they must collect such consent separately. Such conditions for the validity of consent are noted in Art.7, Rec.32, Rec.33, and Rec.43, along with the necessity to maintain records in Art.7 and Rec.42.

In Solid, while there is a record of access being granted to an app for specific data, it is not sufficient to meet GDPR’s requirements for either valid consent or a valid record of consent. For example, Solid’s mechanisms and records do not specify who the Data Controller is, the purposes for which the data is used, the categories of data (especially special categories), or whether other legal bases are also associated with that data.

Solid defines the concept of Access Needs as “a specific explanation of why that data type is being requested” [11], which can be considered analogous to a purpose. However, the terms access and needs carry specific contextual connotations that may be interpreted ambiguously and in a manner not compatible with the use of purpose in data protection and privacy laws. For example, access can refer only to the initial access to data, which is only one type of possible operation and does not cover subsequent uses and sharing. Similarly, needs implies some notion of necessity, which may be misleading in cases where the data is optional or not strictly necessary to fulfil a purpose but is useful to enhance it.

In order to be compliant with the GDPR, it is vital for a Data Controller to declare its legal basis and to indicate this clearly to data subjects. In the case of Solid, there is no corresponding concept or method by which an app can indicate the legal basis for how it uses the data. Since the model for data access is based on asking the user for permission, it may be misconstrued as being based only on consent, whereas in actuality it may be accompanied by other legal bases, such as contracts and legitimate interests. As a consequence, Solid also does not support recording the legal basis as part of establishing the access control authorisation from an app request, nor recording subsequent accesses to the data performed under specific legal bases.
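To make this gap concrete, the sketch below contrasts an approximation of what an access grant captures today (agent, resource, access modes, and a timestamp) with a hypothetical extended record carrying the GDPR-relevant information discussed above. The class and field names are our assumptions and do not correspond to any Solid vocabulary.

```python
from dataclasses import dataclass, field
from datetime import datetime

# The six lawful bases of GDPR Art.6(1)(a)-(f).
LEGAL_BASES = {"consent", "contract", "legal_obligation", "vital_interests",
               "public_task", "legitimate_interests"}

@dataclass
class AccessGrant:
    """Approximately what a grant records today: who may do what to which resource."""
    agent: str
    resource: str
    modes: set[str]
    granted_at: datetime

@dataclass
class GdprAwareGrant(AccessGrant):
    """Hypothetical extension carrying the information discussed above."""
    controller: str = ""
    purposes: list[str] = field(default_factory=list)
    legal_basis: str = ""
    data_categories: list[str] = field(default_factory=list)
    special_category: bool = False
    art9_condition: str = ""  # needed in addition to Art.6 for special categories

    def missing_information(self) -> list[str]:
        """Name each required field that is absent or invalid."""
        gaps = []
        if not self.controller:
            gaps.append("controller")
        if not self.purposes:
            gaps.append("purposes")
        if self.legal_basis not in LEGAL_BASES:
            gaps.append("legal_basis")
        if not self.data_categories:
            gaps.append("data_categories")
        if self.special_category and not self.art9_condition:
            gaps.append("art9_condition")
        return gaps
```

Recording a legal basis other than consent (e.g. `"legitimate_interests"`) in the same structure makes explicit that granting access is not necessarily an act of consenting.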

An important point to consider in terms of behaviour is that organisations may readily share data with third-party organisations based on the third party’s (assessed and validated) legitimate interests or as part of a legal obligation (e.g. banks and Know-Your-Customer requirements). In the case of Solid, because the data is stored in a Pod and managed by a user, the responsibility of handling such requests also falls on the user. This represents an unexplored area of GDPR, for example in deciding what legal options are available when Solid Pod users receive a request to access data with legitimate interest or legal obligation as the legal basis instead of consent or contract. The answer may be simpler in cases where users are also data subjects, since they can then exercise the right to object (Art.21) to processing based on legitimate interests, whereas handling requests based on legal obligations can be challenging in terms of establishing their validity.

Fairness refers to the necessity for processing to be in line with the reasonable expectations of data subjects and to not have unjustified adverse effects on them. This principle provides data subjects with some degree of protection from being exploited, manipulated, or otherwise suffering detrimental impacts arising from the processing of their personal data. Considerations for assessing fairness include the specific justifications presented for the collection and use of data, their comprehension by the individuals and groups affected, whether there is discrimination, and whether there are detriments or obstacles for individuals in exercising their rights.

Transparency, also related to fairness, refers to the availability and comprehensibility of information regarding processing of data (e.g. what, why, where, how, who) - which is typically associated with a privacy notice (or privacy policies). Such notices may be presented to individuals as part of their contract (e.g. terms and conditions), or in the consenting dialogue. GDPR necessitates provision of such notices for transparency in cases where data is collected (Art.13) as well as not collected (Art.14) from the data subject.

Solid neither supports nor requires apps to use or provide information via notices when making requests, except for the ill-defined Access Needs descriptions. This means apps must use external mechanisms to provide such notices, such as their websites or other interfaces, and the details of these notices are neither accessible to nor recorded for users. While an app’s metadata can contain a link to a policy containing this information, Solid does not mandate the link to be present or resolvable, nor does it assist in ensuring this information is available and accessible to users. In addition, the lack of any indication of how further processing of data beyond the initial access should be declared or recorded as part of data requests compounds the detrimental impact. This makes it difficult to assess the transparency of data processing operations.

Purpose Limitation

GDPR’s Art.5-1b requires data to be “collected for specified, explicit and legitimate purposes and not further processed in a manner that is incompatible with those purposes”. This necessitates that all data requests be associated with a purpose that covers not only the initial justification but also any subsequent uses of that data. The resulting obligation requires Data Controllers to inform data subjects about specific purposes, typically through notices, and places constraints on what is considered a ‘valid purpose’ based on its clarity, comprehensibility, and specificity.

The association between purposes and legal bases requires every purpose and use of data to be covered by an appropriate legal basis, which is at odds with Solid’s focus on only considering data operations within the context of a Pod. For example, if a service provider requests access to data for the purpose of providing a specific service with contract as the lawful basis, it cannot subsequently collect and use that data (externally, outside the Pod) to perform other activities that are incompatible with the initial purpose. In such cases, the service provider is required to ask for consent (or choose another appropriate legal basis).

In order to assess whether purposes are valid, and whether they are being correctly used as required by GDPR, a Data Controller is obliged to keep a Record of Processing Activities (ROPA, Art. 30). In addition, purposes also have to be made available to data subjects - typically through privacy notices. In case of Solid, while there are no specific obligations to keep a record of such purposes within the Pod, the existing use of Access Needs and their groupings is similar to the use of purposes and purpose limitation principle.

Data Minimisation

GDPR’s Art.5-1c specifies the data minimisation principle, which requires data to be “adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed”. Along with the lawfulness and purpose limitation principles, the data minimisation principle also requires data to be specifically and explicitly declared and associated with purposes. On this basis, adequacy and relevance are assessed by whether the Data Controller is using the ‘minimum amount’ of data to fulfil a purpose, or is engaging in (unlawful) excessive collection of data that is neither justified nor required, or is based on presumptions of future usefulness. For Solid, there first needs to be explicit acknowledgement of which data categories are processed for which purposes (or, as currently used, Access Needs). Only then can it be assessed whether the data being processed is adequate, relevant, and necessary for the indicated purpose. As outlined earlier, data processing operations include those that occur within a Pod as well as those external to it, and the latter are not currently supported by Solid’s specifications.
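A minimal sketch of such an assessment, assuming each declared purpose (or Access Need) lists the data categories it requires: a request is flagged if it asks for categories not justified by any of its declared purposes. The purpose names and category labels below are hypothetical.

```python
# Hypothetical declarations: purpose -> data categories necessary for it.
NECESSARY = {
    "deliver_order": {"name", "postal_address"},
    "send_receipt": {"name", "email"},
}

def excessive_categories(purposes: list[str], requested: set[str]) -> set[str]:
    """Return requested categories not justified by any declared purpose."""
    unknown = [p for p in purposes if p not in NECESSARY]
    if unknown:
        raise ValueError(f"undeclared purposes: {unknown}")
    justified = set().union(*(NECESSARY[p] for p in purposes))
    return requested - justified
```

Such a check only captures necessity against declared purposes; whether the declarations themselves are adequate and relevant still requires human judgement.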

Accuracy

The principle of accuracy in Art.5-1d requires Data Controllers to ensure data is “accurate and, where necessary, kept up to date”. Through Solid Pods, in theory, data subjects having (edit) access to their own data also gain the ability to rectify any inaccuracies or deficiencies themselves. However, in practice this may be difficult in cases where data is obfuscated or not comprehensible by data subjects, or through lack of relevant tools, or because modifying such data may affect its integrity and validity for subsequent use (e.g. in official documents). In such cases, the ability for data subjects to exercise their right to rectify information (Art.16) can be facilitated by providing a communication mechanism which uses the Pod to store and share the rectified information with the Data Controller as well as any other parties (Art.19). Note that under GDPR, a Data Controller is obliged to accept a data subject’s changes to their data, which effectively makes the data within a Solid Pod that is under control of the data subject an authoritative representation of their data.

Storage Limitation

GDPR’s Art.5-1e stipulates personal data be “kept in a form which permits identification of data subjects for no longer than is necessary for the purposes for which the personal data are processed;”. This reflects the conventional interpretation where data is collected and retained by the Data Controller, which needs to be re-interpreted for Solid’s decentralised paradigm. First, the storage limitation principle emphasises the necessity to associate specific categories of data being processed with the purposes of processing. Second, it refers to the storage duration for data, which in context of Solid can be interpreted as both duration of access to data in a Pod as well as storage duration of data retained external to the Pod.

Currently, the Solid specifications do not support expressing a duration for data access authorisations, which means apps have access to data in perpetuity until access is revoked. This also deviates from ‘good practice’ guidelines regarding consent [38], which suggest limiting the duration of consent and/or re-affirming it periodically. While the Solid specifications provide a way to record the timestamp associated with authorisations, they lack the mechanisms needed to evaluate it periodically, as required by storage limitation and consent ‘refresh’ requirements. An important consideration for cases where data is not under the control of the user, such as that outlined in UC-L1, is that the storage limitation principle applies in full, as per existing conventions, to the entity that controls this data.
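Since access grants currently carry no duration, the sketch below adds a hypothetical expiry date and re-affirmation interval to a grant and evaluates it at a given time. The 365-day refresh interval is an arbitrary placeholder, not a value drawn from the guidelines cited above, and the class is our own illustration rather than part of Solid.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class TimedGrant:
    granted_at: datetime
    expires_at: Optional[datetime] = None            # hypothetical hard end of access
    reaffirmed_at: Optional[datetime] = None         # last time the user re-confirmed
    refresh_every: timedelta = timedelta(days=365)   # placeholder interval

    def status(self, now: datetime) -> str:
        """Evaluate the grant at `now`: 'expired', 'needs_reaffirmation', or 'active'."""
        if self.expires_at is not None and now >= self.expires_at:
            return "expired"
        last_confirmed = self.reaffirmed_at or self.granted_at
        if now - last_confirmed >= self.refresh_every:
            return "needs_reaffirmation"
        return "active"
```

A Pod could run such a check periodically and prompt the user, or suspend access, whenever a grant needs re-affirmation.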

In addition, while the reference to identification in the storage limitation principle does allow data to be retained after being made anonymous, this represents an exceedingly high bar, since under GDPR the criteria for anonymity cannot be satisfied by merely stripping identifiers [39]. To indicate this information, it is necessary to express both the duration of storage and the processing operation that will be applied after this period, i.e. deletion or anonymisation. Where the storage limitation principle is instead implemented by limiting access control, continued use of data through anonymisation may not be possible without the Pod’s and/or users’ permission and support. Of note, merely storing anonymised data within a Pod that is, or can be, associated with a data subject creates a strong possibility of the data being considered identifiable, and thus becoming non-anonymised personal data again, even if the app does not use this identifying information.

Integrity and Confidentiality

GDPR’s Art.5-1f states the principle of integrity and confidentiality, which requires data to be “processed in a manner that ensures appropriate security of the personal data, including protection against unauthorised or unlawful processing and against accidental loss, destruction or damage, using appropriate technical or organisational measures”. This means Data Controllers and Processors who process personal data (including collection and storage) should ensure that appropriate technical (e.g. encryption, access control) and organisational (e.g. training, codes of conduct) measures are utilised based on the specifics of processing operations, data categories involved, and other contextual information (Art.32, Rec.78, Rec.83). This includes performing appropriate risk assessments and implementing risk management and mitigation processes in a systematic manner integrated with the processing (Art.32, Rec.75, Rec.76, Rec.77). ‘Security measures’ is a common shorthand under GDPR for both technical and organisational measures, based on their intention to protect and safeguard data as well as to address impacts in the broadest sense.

While implementing security measures is typically the responsibility of a Data Controller, for Solid this changes based on who controls the implementation of Pods, resources, data management, and apps. For example, if the user provisions a Pod using SaaS, they have no control over the underlying infrastructure, whereas if they provision it using IaaS, they control more of the implementation, such as the software installed on the provisioned resources. The integrity and confidentiality principle is important given that it affects aspects such as backups, integrity controls, protection against attacks (e.g. hacking, DDoS), data breaches, maintaining logs, and ensuring the effectiveness of access control. It is also important for considerations related to risk management and impact assessments (Art.32, Art.35, Rec.75, Rec.84, Rec.90–93).

One important consideration for Solid is that the GDPR requires Data Controllers to make informed assessments of a Processor’s security measures before engaging them. Depending on who implements and controls what within a Solid-based use-case, the various responsibilities may need to be carefully considered and established to ensure appropriate safeguards are in place and ensure their legal relevance and compliance. While there are existing guidelines in place regarding implementations for resources (e.g. hardware, software), the new considerations relate to whether users will have the responsibilities to assess such measures for Pod and resource providers based on existing common practices for provisioning cloud services - especially in IaaS paradigms18.

Accountability

The last principle is accountability (Art.5-2), which states that “The controller shall be responsible for, and be able to demonstrate compliance” with the other principles. Depending on the role (Controller or Processor), the accountability principle requires planning and operational management along with documentation of specific obligations from GDPR. Examples include implementing policies for ‘data protection by design and by default’ (Art.25), appropriate contractual measures between Controllers and Processors (Art.28), ROPA records (Art.30), handling data breaches (Art.33, Art.34), Data Protection Impact Assessments (DPIA, Art.35), appointing a Data Protection Officer (Art.37–39), and utilising appropriate codes of conduct and certifications (Art.40–42). These obligations are required to be maintained and re-evaluated on an ongoing basis, and are typically integrated into an organisation’s management frameworks and policies. The records kept as part of these processes (e.g. ROPA) are utilised as the first avenues for inspections and audits into compliance practices.

The Solid specifications provide rudimentary measures for supporting such records through registries of access requests, but these do not satisfy the full extent of GDPR’s requirements (e.g. ROPA, Art.30). This is an important deficiency in cases where the user (or provider) has to maintain records associated with (other) resource providers and apps, either because they are the Controllers in such relationships or to audit and ensure the accountability of other entities. Since the specifications in their current state do not provide any means even to establish who the other entities are (i.e. their legal identities and applicable jurisdictions), investigations into legal compliance face a steep setback. In the ideal case, the required information is recorded or available through accessible links (e.g. an app’s metadata). However, if this is not the case and there is some mischief or malice being conducted, there is little potential for resolving it, since authorities may have to conduct lengthy investigations merely to establish the entities and their roles within a use-case. Pragmatism therefore requires that such identities and relationships be established and recorded upfront to ensure appropriate levels of accountability.
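As an illustration of this gap, the sketch below models a record of a processing activity with fields loosely following GDPR Art.30(1) and reports which required fields are empty. The class and field names are hypothetical, and the choice of which fields to treat as required is a simplifying assumption, since some Art.30(1) entries apply only “where applicable” or “where possible”.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessingRecord:
    """Hypothetical processing record, loosely following GDPR Art.30(1)."""
    controller_name: str = ""          # (a) legal identity of the controller
    controller_contact: str = ""       # (a) contact details
    purposes: list[str] = field(default_factory=list)                  # (b)
    data_subject_categories: list[str] = field(default_factory=list)   # (c)
    personal_data_categories: list[str] = field(default_factory=list)  # (c)
    recipient_categories: list[str] = field(default_factory=list)      # (d)
    third_country_transfers: list[str] = field(default_factory=list)   # (e) where applicable
    erasure_time_limits: str = ""      # (f) where possible
    security_measures: str = ""        # (g) where possible


# Fields this sketch treats as required; the others may legitimately be empty.
REQUIRED_FIELDS = (
    "controller_name",
    "controller_contact",
    "purposes",
    "data_subject_categories",
    "personal_data_categories",
)


def missing_fields(record: ProcessingRecord) -> list[str]:
    """Return the required fields that are empty in `record`."""
    return [name for name in REQUIRED_FIELDS if not getattr(record, name)]


# A record built only from what a typical access request conveys:
# a stated purpose, but no legal identity of the entity behind the app.
from_access_request = ProcessingRecord(purposes=["activity statistics"])
```

Here `missing_fields(from_access_request)` flags the controller’s name and contact details first, which is precisely the information authorities would need before any other investigation can begin.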

Controllers and Processors

GDPR Art.4 defines a controller as the entity that “alone or jointly with others, determines the purposes and means of the processing of personal data”, and a processor as the entity that “processes personal data on behalf of the controller”. Identifying the controller(s) in a given use-case is essential to establish who is responsible for fulfilling GDPR’s various obligations. As per the definitions, this requires identifying who determined the purposes, i.e. what the data should be processed for, and who determined the means, i.e. how that processing should be carried out.

The nuance between a controller and a processor is important given the additional obligations that come with being a controller. In most cases, even where a processor seems to be specifying the means of processing (e.g. use of a specific technology), such operational decisions are permitted for a processor as long as it limits its processing to what has been agreed in the contract with the controller [31]. Therefore, merely determining the technological implementation of processing is not sufficient to become a controller, provided such implementations are backed by appropriate contractual terms and the processor does not determine what data should be processed, its purposes, legal basis, etc., which are for the controller to decide.

Typically under GDPR, the controller is the entity that decides what data should be collected, maintains it, and operates on it for some purposes. In Solid, it is tempting to assume that because users control how their data is stored and can manage its access by external entities (such as apps), they automatically become full-fledged controllers under the GDPR [31]. However, this raises several important considerations and complications that require careful legal analysis, both to establish the exact responsibilities of users and to avoid placing unnecessary burdens on individuals through a presumption of controllership.

Further, GDPR Art.2 and Rec.18 specify exemptions for processing undertaken by individuals as a “purely personal or household activity and thus with no connection to a professional or commercial activity”. This means that a user maintaining their own data for personal reasons is exempt from GDPR’s obligations, but only to the extent that such activities are considered private. Existing case law [4][6] establishes that where such boundaries are crossed, e.g. by making data public or undertaking activities for commercial reasons (e.g. running a business), the exemption no longer applies and the user is considered a controller for the processing of their own data.

In Solid, the determination of the controller therefore depends on which entity determines how data should be used and how it should be processed. Where users control the implementation of Pods and resources, they determine the means of storage (as a processing operation), and should therefore be considered controllers for the purposes of storing their information. Conversely, when users do not control the Pod infrastructure and have no access to the underlying implementations, they should not be considered controllers; instead, the Pod and resource providers who retain control should be specified as controllers. Here, we presume that Pod users are the data subjects; if they are not, then depending on their relationship to the data subjects (e.g. parents), they may end up being (joint-)controllers irrespective of their control over the underlying infrastructure.
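The simplified reasoning of this and the preceding paragraph can be written down as a small decision function, which may help in applying it consistently across use-cases. This is a hypothetical Python sketch that only encodes the heuristics stated here; it is not a legal test, and the three boolean inputs are assumptions about what can be known for a given Pod.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PodContext:
    user_is_data_subject: bool          # the Pod user is the person the data is about
    user_controls_infrastructure: bool  # e.g. a self-hosted or IaaS-provisioned Pod
    purely_personal: bool               # not public, not commercial (Art.2, Rec.18)


def storage_controllers(ctx: PodContext) -> set[str]:
    """Who should be treated as controller(s) for storing data in the Pod.

    Encodes only the simplified heuristics of this section, not a legal test.
    """
    # Whoever controls the Pod infrastructure determines the means of storage.
    controllers = {"user" if ctx.user_controls_infrastructure else "provider"}
    if not ctx.purely_personal:
        # Publishing data or commercial use takes the user outside the
        # household exemption for the processing of their own data.
        controllers.add("user")
    if not ctx.user_is_data_subject:
        # e.g. a parent managing a child's data: possibly a (joint-)controller
        # irrespective of who controls the infrastructure.
        controllers.add("user")
    return controllers


def household_exemption_may_apply(ctx: PodContext) -> bool:
    """The user is a controller, but possibly exempt for purely personal activity."""
    return "user" in storage_controllers(ctx) and ctx.purely_personal
```

For a self-hosted Pod used privately, the user is the storage controller but the household exemption may apply; for a SaaS Pod used privately, only the provider is flagged; and for a SaaS Pod used for a business, both are flagged.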

The question of Pod-based controllers is entirely separate from the use of data by apps, which has its own considerations for determining controllers. Even if users (as data subjects) retain the ability to dictate which apps are permitted to access the data, and can revoke such access at any time, this is not sufficient for specifying them as controllers. This is because the purposes would typically be determined by the apps, with the users merely providing the required data. In cases where users are also involved in specifying the purpose and/or how it should be implemented, such as with the CaaS paradigm in UC-Ix, the determination of controllers depends on the degree of control that users and apps have in deciding what should be done, as well as by whom and where it is implemented.

Where users (as data subjects) are deemed to be controllers, the exact implications of such a role are ill-defined, because GDPR and its associated guidelines presume that the controller is an organisation. For example, it is entirely unfeasible for data subjects to be tasked with maintaining a ROPA or records of their own consent, and it is outright nonsensical to imply that they should serve themselves with notices to satisfy transparency and accountability obligations. Therefore, where data subjects do become controllers, this should be taken as indicating who is responsible for specific aspects of a processing operation, such as determining the storage or computation environments. This is no different from placing the responsibility for leaving a hotel room’s door unlocked on the person using the room.

While the arguments laid out here regarding controllers are simplistic and lack in-depth legal investigation, the intention is to show that determining controllers within Solid-based use-cases is a complex topic that warrants further research, in particular to avoid excessive burdens on data subjects in managing their own data, and to offer a clear path forward by using the identified use-cases to determine who should be the controllers and processors.

Exercising Rights

GDPR Art.12–22 set out several rights that a controller has to make available to the data subject. These relate to transparency regarding processing actors and specifics (Art.13, Art.14), access to data (Art.15) and data portability (Art.20), rectification (Art.16) and erasure (Art.17) of data, restriction of (Art.18) and objection to (Art.21) processing, and not being subject to automated decision-making (Art.22). Of these, the rights of access and data portability can be readily implemented using Solid since the Pod already stores the data. The rights to rectification and erasure can also be readily implemented by modifying the appropriate data within a Pod.

Exercising the rights related to information provision requires such information to be recorded and accessible, which is currently not possible using Solid. For example, the Solid specifications do not support indicating the source of data (e.g. user or third-party), the identities of apps that collect it, categories of data, who the data will be shared with, and several other key pieces of information required by the transparency obligation and typically covered by privacy notices. Similarly, exercising the rights to restriction of processing, to object to legitimate interests, or not to be subject to automated decision-making also requires corresponding support from Solid implementations.

While it is always possible to provide information and exercise these rights outside of the Pod, e.g. on an app’s website, these rights are closely tied to the access to data within a Pod. For example, when users exercise their right to restrict processing because they perceive an app’s processing as unlawful, this may amount to revoking all of the app’s access to data, even if the same app had other legitimate and lawful uses of that data. Unless such decisions are communicated (i.e. that access was revoked because a right has been exercised), the app (or more importantly, the app’s controller) may not be able to distinguish between the withdrawal of consent and the exercise of rights to objection or restriction.
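One way to carry the missing signal would be for each revocation to state its reason. The sketch below is hypothetical: Solid does not define a `RevocationNotice` or any reason codes, and the next steps are brief paraphrases of GDPR Art.7(3), Art.18(2), and Art.21(1) rather than text from any specification.

```python
from dataclasses import dataclass
from enum import Enum


class RevocationReason(Enum):
    CONSENT_WITHDRAWN = "consent_withdrawn"
    RESTRICTION = "restriction"    # Art.18
    OBJECTION = "objection"        # Art.21
    EXPIRED = "expired"            # e.g. a time-limited authorisation lapsed
    UNSPECIFIED = "unspecified"    # all that a bare access revocation conveys


@dataclass(frozen=True)
class RevocationNotice:
    app: str         # hypothetical identifier of the app, e.g. its WebID
    resource: str    # the resource (or container) whose access was revoked
    reason: RevocationReason


# Paraphrased next steps for the app's controller, for illustration only.
NEXT_STEP = {
    RevocationReason.CONSENT_WITHDRAWN:
        "stop consent-based processing; earlier processing stays lawful (Art.7(3))",
    RevocationReason.RESTRICTION:
        "apart from storage, process only as permitted by Art.18(2)",
    RevocationReason.OBJECTION:
        "stop legitimate-interest processing unless compelling grounds exist (Art.21(1))",
    RevocationReason.EXPIRED:
        "ask the user to renew the authorisation",
    RevocationReason.UNSPECIFIED:
        "cannot tell whether consent was withdrawn or a right was exercised",
}


def next_step(notice: RevocationNotice) -> str:
    """Look up what the app's controller should do after this revocation."""
    return NEXT_STEP[notice.reason]
```

The `UNSPECIFIED` entry corresponds to the situation described above: without a stated reason, the app cannot tell which obligation has been triggered.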

Cross-border Data Transfers

GDPR operates within the EU’s legal regime, where its territorial scope establishes the free flow of data within the EU. Any data transfer outside of this jurisdiction (to so-called ‘third countries’) requires a corresponding basis under Art.45 (adequacy decision) or Art.46 (appropriate safeguards), or, in their absence, one of the derogations in Art.49, such as explicit consent or contract. To determine whether these obligations apply, it is essential to identify and record the locations associated with data processing, including storage, computation, and communication.

In Solid, a Pod is a storage resource located at one or more locations and accessed by apps. If the Pod’s or an app’s resources involve locations outside the EU and do not utilise any of the Art.45, Art.46, or Art.49 provisions, the resulting situation can be problematic, with severe impacts on the ability of users to lodge an enforceable complaint (e.g. against a non-EU based entity) and to exercise their rights (e.g. where entities do not support EU rights). Such determinations require information about the locations associated with the controllers and processors of the Pod, resources, and apps, which is currently not supported by the Solid specifications.
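Once locations are recorded, the initial screening described here could be automated. The following Python sketch flags, per actor, countries outside the EEA for which no adequacy decision, safeguard, or derogation has been recorded. The actor names and inputs are hypothetical; the adequacy set is supplied by the caller because the Commission’s list changes over time, and the values used in any example are illustrative only.

```python
# EU member states plus Iceland, Liechtenstein, and Norway: transfers among these
# are not third-country transfers, since GDPR also applies in the EEA.
EEA = {
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR", "DE", "GR", "HU", "IE",
    "IT", "LV", "LT", "LU", "MT", "NL", "PL", "PT", "RO", "SK", "SI", "ES", "SE",
    "IS", "LI", "NO",
}


def transfers_needing_review(
    locations: dict[str, set[str]],
    adequate: set[str],
    covered: dict[str, set[str]],
) -> dict[str, list[str]]:
    """Flag, per actor, processing locations with no recorded transfer basis.

    locations: actor -> ISO country codes where it stores, computes, or communicates data
    adequate:  countries with an Art.45 adequacy decision (supplied by the caller)
    covered:   actor -> countries covered by Art.46 safeguards or an Art.49 derogation
    """
    flagged = {}
    for actor, countries in locations.items():
        uncovered = countries - EEA - adequate - covered.get(actor, set())
        if uncovered:
            flagged[actor] = sorted(uncovered)
    return flagged
```

For example, a Pod provider storing data in DE and US with recorded safeguards for US passes, while an app computing in IN with nothing recorded is flagged; the check cannot judge whether a recorded safeguard is actually valid, only whether one exists.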

Handling Data Breaches

GDPR Art.4-12 defines a data breach as “a breach of security leading to the accidental or unlawful destruction, loss, alteration, unauthorised disclosure of, or access to, personal data transmitted, stored or otherwise processed”. The EDPB’s guidelines on data breaches classify breaches into one or more of the following categories, based on whether there is an unauthorised or accidental action on data affecting: (1) confidentiality - disclosure or access; (2) integrity - alteration; and (3) availability - loss of access or destruction. GDPR Art.33 and Art.34 specify obligations for controllers and processors to notify data breaches to the relevant controllers, authorities, or data subjects without undue delay, along with information about the extent of the breach in terms of affected data categories, consequences, and mitigation measures being undertaken.
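The three categories can be expressed as a simple classification over the actions involved in an incident, noting that a single breach may fall into several categories at once. The action names below are illustrative assumptions; the 72-hour figure comes from Art.33(1), which requires notification to the supervisory authority without undue delay and, where feasible, not later than 72 hours after becoming aware of the breach.

```python
from datetime import datetime, timedelta
from enum import Enum


class BreachCategory(Enum):
    CONFIDENTIALITY = "confidentiality"  # unauthorised or accidental disclosure/access
    INTEGRITY = "integrity"              # unauthorised or accidental alteration
    AVAILABILITY = "availability"        # loss of access or destruction


# Illustrative action names mapped to the categories described above.
ACTION_CATEGORY = {
    "disclosure": BreachCategory.CONFIDENTIALITY,
    "access": BreachCategory.CONFIDENTIALITY,
    "alteration": BreachCategory.INTEGRITY,
    "loss_of_access": BreachCategory.AVAILABILITY,
    "destruction": BreachCategory.AVAILABILITY,
}


def classify_breach(actions: list[str]) -> set[BreachCategory]:
    """Return every category an incident falls into; there may be several."""
    return {ACTION_CATEGORY[action] for action in actions}


def authority_deadline(became_aware: datetime) -> datetime:
    """Latest time to notify the authority where feasible (Art.33(1): 72 hours)."""
    return became_aware + timedelta(hours=72)
```

For instance, a compromised Pod server from which an attacker reads and then deletes data would be classified as both a confidentiality and an availability breach.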

In the case of Solid Pods, a breach can happen at the underlying storage resource, the software environment hosting the Pod (e.g. operating system, web server), the processes implementing a Pod feature (e.g. database), or through access granted to users and apps. This applies both to resources managed by users (e.g. IaaS, CaaS) and to those managed without the users’ awareness (e.g. SaaS). Where the Pod is managed by a provider and the user has no control over the implementations, handling data breaches is the responsibility of the resource providers. However, where users manage their own infrastructure and software implementing a Pod, the responsibility for handling data breaches also rests with them.

It is important to note the distinction between GDPR’s notion of responsibility and that of liability. Under GDPR, a responsibility is the duty to carry out an activity, such as notifying the authorities when a data breach takes place. By contrast, liability concerns who is at fault and is typically a pecuniary obligation. The handling of data breaches also includes ensuring that appropriate technical and organisational measures are in place to prevent breaches, to minimise their consequences should they occur, and to address their impacts when they do.

For users implementing their own Pods (e.g. IaaS), this may mean additional duties to ensure that appropriate security measures are in place and to carefully assess their software’s known vulnerabilities. While this may seem burdensome, it is no different from managing a personal server, except that in the case of Solid new attack surfaces emerge, such as cases where an app or malicious actor is given access to all data on a Pod through technical bugs, system vulnerabilities, or by convincing the user via social engineering and phishing attacks. The current Solid specifications indicate some awareness of the technical risks associated with breaches and security in general, but do not similarly address the personal and social risks that lead to data breaches.

Existing Issues That Also Affect Solid

Transparency and Comprehension of Information

One of the biggest issues for privacy is that individuals do not understand who the actors are and how they are using their data [40]. The cliché of lengthy ‘privacy policies’ and privacy notices using complex legal language has been well-studied, with several approaches proposed to mitigate it through information extraction, summarisation, and visualisation [41]. In addition, laws such as the GDPR are increasingly influential in determining the contents and provision methods of such information. Researchers have taken advantage of this by developing methods that rely on legally required information being present within a document [41][43]. Communities have taken these approaches further by pooling and crowdsourcing information about privacy practices [44]. However, several issues persist, such as the information (still) being difficult to comprehend, the difficulty of understanding it in a contextually relevant manner [45], and its deviation from the actual data processing activities [45][47].

In the case of Solid, there is first no consistent or clear method for providing users with this information. Apps are supposed to have a profile that may contain a link to their policy, but there is no acknowledgement of what such policies should feature or how they relate to legal obligations. Combined with the lack of a record of the actual legally-accountable entity accessing their data, users are entirely at the mercy of potentially unknown entities, without transparency or accountability.

Where apps need to request access to data, it is unclear whether information must be provided as a precursor to the user’s decision. This is a vital requirement where the basis for such data requests is consent, which the Solid specifications also do not elaborate upon. The result is vague guidance that apps must ask permission to use data, without any oversight of how that permission is sought or of its validity as informed consent. Even if such information were provided to users via a website or other document in full compliance with the law, the Solid specifications neither acknowledge nor support providing this information in a manner that avoids repeating the existing issues of information overload. Solid also does not record information relevant to such informed decisions, such as where the user expressed their consent (e.g. a website), links to privacy notices, or receipts [48] that users and tools can utilise later.

Manipulative and Deceptive Practices in Consenting

Further to issues regarding transparency and comprehension of information, the existing issues of consenting being invalid, unfair, or outright deceptive also apply to Solid. For example, by keeping control of data with users but control of the requesting mechanism with apps, what is presented in requests and which choices are offered to users is entirely determined by external entities (e.g. app providers). This opens the door to existing widespread practices that exploit the power imbalance with users through various means, all stemming from control over consenting mechanisms. Well-studied, widespread, and demonstrably illegal examples of such practices are:

  • Dark Patterns: using UI/UX design to nudge, coerce, and manipulate users [46], [47], e.g. clicking ‘Accept All’ because other options are hidden and require time to exercise or configure, or using pre-configured choices, or nagging when consent is refused.

  • Consent Walls: withholding features and services until users give consent to excessive data use and sharing [47].

  • Hiding Impact and Extent: using UI/UX design along with user’s lack of domain knowledge to hide the true extent and impact of consent – such as where accepting will result in sharing of data and profiling by thousands of companies [45], [47], or that it involves sensitive or special data categories.

  • Assumption of Consent: where consent is either entirely assumed (i.e. users are not even asked), or where the choices made by the users differ from what the controller records as consent (e.g. controller may record ‘Accept All’ even when users have selected only some options) [45][47].

Since Pods do not keep records of the purposes, data, and context (e.g. duration, frequency) of consent, or of where users can find this information, users have no means to contact an application or exercise their rights, such as to withdraw their consent or to request restriction of processing. Here it is critical to understand that Solid’s use of access control authorisation cannot on its own be treated as a fair equivalent of consenting and withdrawal, for the following reasons:

  • access control only governs access to data, so the permission it expresses is restricted to accessing that data. The Solid specifications do not mention any requirement to interpret revocation of access as also restricting further processing of data by an app.

  • signalling consent withdrawal – revocation of access can be for several reasons, for example the user wants to stop providing access to that data, or they have identified a problem with an app, or some associated duration for the access has expired (e.g. access is valid for 3 months and must be confirmed periodically). By not distinguishing which of these decisions has resulted in the revocation of access, both users and apps are unaware of what has happened as well as what steps must be taken next. Therefore, where the user has decided to withdraw their consent, this should be a distinct action to enable the appropriate legal obligations to be triggered.

  • communicating withdrawal to third parties – users may give consent to more than just the app when sharing their data. For example, if the consent allows the app to access data and to further share it with others, merely revoking access will only be visible to the app. Since there is no record of which entities are involved in the consent, users can neither signal these entities themselves nor ask the app to communicate their withdrawal further.

  • granular withdrawal for some purposes – GDPR requires consent to be granular with regard to purposes, i.e. separate purposes must have separate consent, such that consenting and withdrawal can be managed in isolation for each purpose. In the case of Solid, there is no indication of purpose when accessing data. While separate Access Needs can be managed using separate identifiers for security, these concepts do not cover purposes for how the data may be used outside the Pod. Revoking an access need therefore may not result in the necessary corresponding withdrawal for other associated purposes the user agreed to when granting consent. Further, the specifications do not clarify the behaviour when two authorisations govern the same data and one is revoked.
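The behaviour argued for in these points could be sketched as a per-purpose consent ledger kept alongside access authorisations. In this hypothetical Python sketch (none of these structures exist in Solid), withdrawing consent for one purpose only revokes access to resources that no other active consent of the same app still covers, and returns the third parties that should be informed of the withdrawal.

```python
from dataclasses import dataclass


@dataclass
class ConsentRecord:
    """Hypothetical consent for a single purpose; Solid has no such record."""
    app: str
    purpose: str
    resources: frozenset[str]                 # Pod resources the purpose needs
    recipients: frozenset[str] = frozenset()  # third parties the data may be shared with
    active: bool = True


class ConsentLedger:
    def __init__(self) -> None:
        self.records: list[ConsentRecord] = []

    def give(self, record: ConsentRecord) -> None:
        self.records.append(record)

    def withdraw(self, app: str, purpose: str) -> tuple[set[str], set[str]]:
        """Withdraw consent for one purpose of one app.

        Returns (resources whose access can now be revoked, recipients to notify).
        A resource stays accessible while another active consent still needs it.
        """
        withdrawn = [r for r in self.records
                     if r.active and r.app == app and r.purpose == purpose]
        for record in withdrawn:
            record.active = False
        still_needed = set().union(
            *(r.resources for r in self.records if r.active and r.app == app))
        revoke = set().union(*(r.resources for r in withdrawn)) - still_needed
        notify = set().union(*(r.recipients for r in withdrawn))
        return revoke, notify
```

If an app holds consent for “statistics” over a step log and a movement log, and for “advertising” over the movement log only, withdrawing “advertising” revokes no access but yields the advertising recipient to notify; withdrawing “statistics” afterwards revokes both logs.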

GDPR’s legal bases are strictly and rigidly interpreted in terms of their validity for justifying processing. For example, it would violate GDPR if a Solid app relied on contract or legitimate interest where the legal basis should have been consent. In addition, a Solid app’s requests for consent should be separate from other matters and requests. This creates the potential for repeating existing problematic instances where controllers misuse legal bases (e.g. legitimate interest instead of consent [45][47], [49]), and where users are not aware of other uses of data because the notice for consent also contains information about other legal bases. In Solid, since there is no concept of legal basis or consent, and all access requests are treated as a single combined transaction, a consent request could specify that data will also be used for additional purposes under a separate legal basis, which users may not notice or may misinterpret because they understand Solid as enabling control of data regardless of legal basis.

Hidden Actors Exploitation

Even where entities on both sides, e.g. users and app developers, are well-behaved, their use of third-party vendors and services can be a source of unintended or negligent problems. For example, websites using a Consent Management Platform (CMP) to manage consent for their services have shown problematic uses where the CMPs decide what data is collected and use it for their own purposes [47], [49]. While this is manifestly illegal under GDPR, as seen prolifically in misuses arising from real-time advertising and IAB’s TCF framework [45], users and app developers may not be aware of such situations until the data has already been accessed and shared. In Solid, there are no checks and balances regarding what an ‘app’ actually is in terms of legal entities, nor a thorough analysis of the extent to which its design and mechanisms enable third parties to cause such mischief.

Risk Management - Data Sensitivity

In a Solid Pod, there are no distinctions between data in terms of origin, sensitivity, legal obligations (e.g. mandatory information), or association with specific categories (e.g. stating that data at a specific URL is health data). This is a problem because, when neither users nor apps have this information, an app that uses data that is knowingly or unknowingly sensitive or of a special category may cause security issues or legal compliance violations. Such lack of awareness regarding data also enables problematic actors to operate in a hidden manner. For example, hypothetically, if a fitness device stored movement logs in a Pod, an app accessing this data to show statistics on how active the user is may find and exploit the location data present in the logs to surveil and profile the user, and sell this information to third parties without the user’s knowledge [50], [51]; such practices are illegal under GDPR but not effectively enforced. The ambiguity over what access to data via a URL means in terms of data categories further tilts the situation in the app’s favour: it can claim that the user is responsible for ensuring only the needed data was transferred, since the Solid specifications neither restrict access based on specific data categories nor require data to be ‘cleaned’ or ‘filtered’ to ensure no additional information is provided. Combined with potential misuses from information obfuscation and dark patterns, the app may be able to further hide its activities and claim the user has consented to them [45].
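If resources carried category annotations, the exposure created by a URL-based grant could be computed before access is given. The sketch below assumes hypothetical per-resource category labels and a hypothetical Pod at `https://alice.example/` (Solid defines no such labels), and uses shorthand names for the special categories listed in GDPR Art.9(1). It reports everything a grant on a container prefix would expose, what exceeds the app’s stated need, and which special categories are involved.

```python
# Shorthand for the special categories of personal data in GDPR Art.9(1).
SPECIAL_CATEGORIES = {
    "racial_or_ethnic_origin", "political_opinions", "religious_or_philosophical_beliefs",
    "trade_union_membership", "genetic", "biometric", "health",
    "sex_life_or_sexual_orientation",
}


def grant_exposure(prefix: str, annotations: dict[str, set[str]], requested: set[str]):
    """Categories exposed by granting access to every resource under `prefix`.

    annotations: resource URL -> data categories (hypothetical labels; Solid has none)
    requested:   categories the app says it needs
    Returns (exposed, beyond_request, special).
    """
    exposed: set[str] = set()
    for url, categories in annotations.items():
        if url.startswith(prefix):
            exposed |= categories
    return exposed, exposed - requested, exposed & SPECIAL_CATEGORIES


# Hypothetical Pod of a fitness-device user, mirroring the example above.
pod = {
    "https://alice.example/fitness/steps.ttl": {"activity"},
    "https://alice.example/fitness/log-2024-05.ttl": {"activity", "location"},
    "https://alice.example/fitness/heart-rate.ttl": {"health"},
    "https://alice.example/photos/beach.jpg": {"image"},
}
exposed, extra, special = grant_exposure("https://alice.example/fitness/", pod, {"activity"})
```

Here an app that asks only for activity data, but is granted the whole `/fitness/` container, would also receive location and health data, the latter being a special category.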

Not knowing which data categories are involved and their sensitivity also has implications for risk management and mitigation for all entities involved. If a Pod is provisioned using PaaS or SaaS, the provider may be obliged to provide additional appropriate features if the use-case requires sensitive data (e.g. health data). Without such knowledge, providers only have to support the bare minimum of security and risk features, and may shift responsibilities and burdens onto users to manage their own data-related risks.

Tracking and Profiling

Tracking and profiling are severe problems for web-enabled platforms and devices, which provide convenient methods for information exchange. Techniques such as fingerprinting [52] are commonly used to identify individuals, and are associated with identifiers to profile them. The purposes of such activities can be justifiable, e.g. combating fraud, or insidious, e.g. selling profiled data to third parties. On the web, cookies enable sharing identifiers across providers, while on smartphones the device manufacturers themselves provide a unique identifier that enables such tracking and profiling.

In Solid, given that storage is offered without any oversight of what data is stored, who creates it, and why somebody uses it, applications and actors could develop new forms of fingerprinting and tracking techniques based on Solid’s functionalities. For example, applications or the advertising libraries they use may create a well-formed URL such as /userID to store and retrieve data associated with identifiers and profiles. This can be cleverly obfuscated so that it remains hidden from users. In addition, access to all kinds of data within a Pod can further enhance profiling activities.

Lack of Effective Control via Limitations on Data Exploitation

The mission of Solid is to give control of data back to individuals. But if apps do not store data in usable forms, e.g. by utilising proprietary formats, encrypting it, obfuscating it, or only storing partial data within the Pod, then users have the data but no ability to take advantage of it. While the GDPR tries to offset this through the right to data portability, which requires data to be provided in commonly used and machine-readable formats, in practice this right is not correctly respected [53], [54]. Solid Pods provide a convenient method to store data retrieved using GDPR rights [24], but without appropriate methods to ensure the data is actually useful and usable to the user and other apps, a Pod is reduced to the user acting as a controller for the storage of information, while only those apps able to understand the data benefit from it.

A limited variant of this is where market actors create ecosystems based on the availability of data, such as through APIs and restrictions on how data is accessed, which may not empower the user at all. For example, consider smartphone applications that are entirely hosted within the user’s personal device and also store their data on the device, but where the user can neither access that data nor enable other apps to use it. This is designed as a security measure, with the operating system determining limited forms of interoperability for data (e.g. contact book, messaging, camera). If applied to Solid, this would mean Pods that are entirely controlled by providers, with limitations on how apps can use the data implemented as part of the Pod (e.g. provider’s APIs), and with users only having limited capability to store their data and accept its use by applications – similar to how smartphones operate today.

The interpretation of GDPR roles and obligations is a complex affair that takes time to go through legal processes, since authorities require certainty in their investigations before announcing violations, and subsequent decision-making by courts (and higher courts) takes a long time. Combined with a lack of resources for authorities (e.g. insufficient funding) and of domain expertise for increasingly complex use-cases and technologies, this means GDPR enforcement has identifiable and systemic problems [55]. This is not to say the law is ineffective or should not be followed; on the contrary, GDPR has benefited individuals by raising the transparency of information and creating new rights that empower them [55], [56].

For Solid, the existing issues with enforcement are made more severe by its deviation from established domain and GDPR terminology, which first requires efforts such as this article to establish how GDPR should be interpreted for Solid, and further by the absence of systematic designs or implementations related to oversight, accountability, and technical/organisational measures, which means legal compliance has to be investigated and applied from first principles.

Burden on Users to Manage Privacy

Beyond all other issues, even where everything is valid in terms of laws and social norms, the resulting responsibilities for users to investigate each request and subsequent data use, as well as to manage their own data and its security and privacy, can become a burden. Such cases are common in the form of ‘decision overload’ [57], whereby users choose seemingly self-detrimental outcomes because of perceived non-contextuality and a lack of relevant controls. Solid’s inconsistent terminology, lack of user-side tools and services, and inability to have effective policies and agents that operate on behalf of the user increase the likelihood that users perceive managing Pods as a burden, and create opportunities for other issues to be exploited. This issue need not be solved only through technological means, such as an automated policy-reasoning service acting on the user’s behalf. Other socio-technical approaches that take advantage of the human-centricity of technologies should also be considered, for example crowd-sourced management of privacy concerns [44], [58], codes of conduct for Solid, and open app stores with vetting processes.

Path Forward Towards Responsible Innovation

Establish Consistent Vocabulary

Currently, the Solid specifications present new terms and concepts that do not align well with existing well-established domains and regulations. For example, Solid’s use of ‘owners’ is inconsistent with cloud services, where ownership and clients/customers are well-defined concepts with legal relevance and interpretations. Solid’s use of ‘Agent’ is ambiguous and does not relate Pods, resources, data, and apps to legal roles such as Controllers, Processors, and Data Subjects, which are necessary to understand and establish responsibilities and accountability within use-cases. Likewise, Solid’s ‘policies’ currently only contain rudimentary access control constraints, and lack the nuance and extent required for understanding and managing data practices based on legal (e.g. legal bases, purposes) and social norms (e.g. sensitivity, risks).

To resolve this disparity, we strongly recommend the establishment of a consistent vocabulary for expressing Solid’s use-cases in terms of actors, roles, processes – and to use these to define obligations, requirements, and other constraints on conformance. The outcome of this should offer clarity regarding implementations in terms of which entities are involved, how to hold them accountable, and to take advantage of existing legal obligations and rights. This certainty will assist all stakeholders (individuals, companies, authorities) in establishing how Solid should be used as a legally-compatible paradigm and enable realisation of its intended benefits.

Specifically, we envision the following concepts as needing consistent vocabularies:

  • Resources: Pods, storage (e.g. disks), computation, etc.

  • Entities: Providers, consumers, users, etc. for Pods, resources, data, apps, identity.

  • Legal Roles: Controllers, Processors, Data Subjects, Authorities.

  • Agreements: Consent, Contract; but also agreements associated with provisioning Pods, resources, data, and apps. This can also be extended beyond current conventions, such as by enabling users to have preferences and requirements, or using policies to establish agreements with apps and services on the use of data.

  • Notices: for Pods, resources, data, apps, services, user, along with information on context such as specifics, ex-ante or ex-post, provider and recipient.

  • Data: establishing data categories to clarify what data is actually being utilised; indicating the sensitivity of data to understand risks and the necessity of security; and establishing when special categories of personal data might be involved, to better understand impacts and legal obligations. In addition, data should be separated based on whether it relates to users, the use of apps (e.g. configurations), or Pod management (e.g. data registries, app authorisations).

  • Processes: related to management and use of Logging, Policy Management, Identity Management, Data Management, Network Management, Data Storage, Compute, Data Query, App Management, Management.

  • Security: related to what security measures are in place for Pods (e.g. firewalls), resources (e.g. access control), data (e.g. encryption), apps, and users.

  • Logs: maintaining logs related to data (e.g. store, access, modify, source), apps (requests, authorisations), policies (agreements, preferences), identity (e.g. users’ actions), and security.

In creating these concepts, existing vocabularies can be utilised or extended to avoid creating entirely new terms with unknown interpretations. Sources for these include ISO standards such as those related to cloud services, legal thesauri such as the one established within the EU and based on GDPR, and community efforts such as the Data Privacy Vocabulary19 (DPV) [17], [59], [60]. This step will enable common approaches to be discussed and developed by relevant communities, and is a precursor to the approaches discussed in the following paragraphs that require machine-readable information for automation.

Solid’s deviation from established legal terminology creates difficulties in interpreting and applying specific jurisdictional regulations, such as GDPR’s requirements to establish purposes and controllers. To prevent such differences from creating obstacles, a Solid implementation must, in addition to using established vocabularies, be capable of clarifying how it relates to specific legal compliance concepts and obligations. For example, when a user accepts the use of an app from a provider, the user should be able to understand the involved controllers and processors, as well as which rights are available and the information needed to exercise them. Similarly, it should be clear for which legal basis and purposes the app is requesting to use the data, what will happen once the data leaves the Pod (e.g. sharing with third-parties), and whether there are any sensitive categories that the user should be aware of.

For this information to be available and accessible, it should be mandatory for an app to provide it in machine-readable form, either with the request or through a linked resource. An app that does not satisfy this requirement should not be permitted to initiate a request, or should be flagged as untrustworthy. This information and other relevant records should be maintained as logs within the Solid Pod, for example, logs recording the creation of an authorisation decision along with the specifics of the entities involved and its scope (e.g. purposes, data categories).
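The gatekeeping and logging described above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not part of any Solid specification: the field names in `REQUIRED_FIELDS`, the dictionary shape of a request, and the `AuthorisationLog` class are all hypothetical, and a real implementation would express requests with a shared vocabulary rather than plain strings.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical minimum set of machine-readable fields an app request must declare.
REQUIRED_FIELDS = ("controller", "purposes", "legal_basis", "data_categories")


@dataclass
class AuthorisationLog:
    """Hypothetical append-only record of authorisation decisions kept in the Pod."""
    entries: list = field(default_factory=list)

    def record(self, request: dict, decision: str) -> None:
        self.entries.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "app": request.get("app"),
            "controller": request.get("controller"),
            "purposes": list(request.get("purposes", [])),
            "data_categories": list(request.get("data_categories", [])),
            "decision": decision,
        })


def missing_fields(request: dict) -> list:
    """Return the required fields that are absent or empty in the request."""
    return [name for name in REQUIRED_FIELDS if not request.get(name)]


def admit_request(request: dict, log: AuthorisationLog) -> bool:
    """Reject requests lacking the mandatory information; log every decision."""
    missing = missing_fields(request)
    log.record(request, "rejected" if missing else "admitted")
    return not missing
```

A request that names its controller, purposes, legal basis, and data categories is admitted, while one that only identifies the app is rejected; both outcomes end up in the Pod's log with the declared scope.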

The specification should clarify that an app’s request to use data and the corresponding authorisation is a form of consent, and therefore must satisfy the legally required criteria to be considered valid. Where this is not the case, the separate legal basis (e.g. contract) should be recorded and handled accordingly. To specify that a Pod or an app requires GDPR compliance, the vocabulary should enable this information to be supplied in machine-readable form, such as part of the app’s request or a Pod’s metadata (e.g. as region:EU). Where the legal basis is consent, the withdrawal of consent should be communicated to the app, either initiated by the Pod (e.g. sending a signal to the app’s specific URL) or indicated the next time the app accesses data (e.g. by reinterpreting HTTP status code 451 Unavailable for Legal Reasons). In either case, it is vital that the indication of consent withdrawal is unambiguous, so that the appropriate GDPR obligations are triggered regarding halting the processing of data outside the Pod (if any) and communicating the withdrawal to other parties.
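A minimal Python sketch of the second option shows how a Pod could answer a read request with status 451 once consent is withdrawn, rather than a generic 403 Forbidden, so that the app can distinguish withdrawal from an ordinary access denial. The `Pod` class, `ConsentState`, and the listed obligations are illustrative assumptions; Solid does not currently define this behaviour, and only the status codes themselves come from HTTP.

```python
from dataclasses import dataclass, field
from enum import Enum
from http import HTTPStatus


class ConsentState(Enum):
    GIVEN = "given"
    WITHDRAWN = "withdrawn"


@dataclass
class Pod:
    """Hypothetical Pod tracking consent per (app, resource) pair."""
    consents: dict = field(default_factory=dict)

    def withdraw(self, app: str, resource: str) -> None:
        self.consents[(app, resource)] = ConsentState.WITHDRAWN

    def read(self, app: str, resource: str) -> HTTPStatus:
        """Return the status code the Pod would send for an app's read request."""
        state = self.consents.get((app, resource))
        if state == ConsentState.GIVEN:
            return HTTPStatus.OK
        if state == ConsentState.WITHDRAWN:
            # 451 rather than 403: signals that consent was withdrawn,
            # not merely that access was never granted.
            return HTTPStatus.UNAVAILABLE_FOR_LEGAL_REASONS
        return HTTPStatus.FORBIDDEN


def obligations_for(status: HTTPStatus) -> list:
    """App-side reaction: a 451 triggers the GDPR follow-up obligations."""
    if status == HTTPStatus.UNAVAILABLE_FOR_LEGAL_REASONS:
        return [
            "halt processing of data held outside the Pod",
            "communicate the withdrawal to other parties",
        ]
    return []
```

Keeping the withdrawal signal distinct from 403 is the design point: an app that merely lacks access has no GDPR follow-up to perform, whereas a 451 tells it that previously lawful processing must stop.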

Enable Use of Policies

The current status quo is one where all decision-making power is concentrated with controllers and service providers, because they are the ones that decide what data will be processed and for which purposes. Individuals on the receiving side of requests only have the option to either agree to the given choices or lose out on whatever features are being provided. In addition, it is always the users who have to perform compatibility and risk assessments for a given request based on what is acceptable and necessary for them regarding privacy and security. It would be useful and empowering for users to have mechanisms that assist them in making such decisions by taking their pre-configured choices and matching them with an app’s request to determine compatibility [16], [17].

A hypothetical implementation of such an assistive decision-support system can be created using two kinds of policies represented in Solid: (1) the user’s preferences (that MAY be satisfied); and (2) the user’s requirements (that MUST be satisfied). When an app request is initiated, the Pod would check it for compatibility with requirements, where no deviation is possible, and with preferences, where some conditions may remain unsatisfied. The result (e.g. all requirements satisfied, some preferences not satisfied) can then be presented to the user (e.g. as part of the request, or in their dashboards) to indicate whether the app satisfies their privacy requirements and to ease the burden of checking the app’s data practices.
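The matching step can be sketched in Python as follows. This is a minimal illustration that assumes each policy rule restricts the values a request may declare for a single attribute (such as purposes or recipients); the `Constraint` class and the attribute and value names are hypothetical, and real policies (e.g. expressed in ODRL) would be considerably richer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    """Hypothetical policy rule: every value a request declares for
    `attribute` must be among the `allowed` values."""
    attribute: str
    allowed: frozenset

    def satisfied_by(self, request: dict) -> bool:
        # A request declaring no values for the attribute trivially satisfies
        # the rule; mandatory-information checks are assumed to happen earlier.
        return set(request.get(self.attribute, ())) <= self.allowed


def check_compatibility(request: dict, requirements: list, preferences: list) -> dict:
    """Requirements (MUST) decide compatibility; unmet preferences (MAY)
    are only reported so the user can take them into account."""
    unmet_requirements = [c for c in requirements if not c.satisfied_by(request)]
    unmet_preferences = [c for c in preferences if not c.satisfied_by(request)]
    return {
        "compatible": not unmet_requirements,
        "unmet_requirements": unmet_requirements,
        "unmet_preferences": unmet_preferences,
    }


# Illustrative policies: purposes are restricted (requirement), and the user
# would prefer no third-party recipients at all (preference).
requirements = [Constraint("purposes", frozenset({"ServiceProvision", "Personalisation"}))]
preferences = [Constraint("recipients", frozenset())]
request = {"purposes": ["ServiceProvision"], "recipients": ["AnalyticsCo"]}
result = check_compatibility(request, requirements, preferences)
```

Here the request is compatible because its only purpose is permitted, but the recipients preference is reported as unmet, which is the kind of summary the Pod could show alongside the request.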

For such policies to be effective, apps must also provide corresponding information as part of the request. The current norm is that an app provides a vague summary of information within a privacy notice [47], while the privacy policy page provides a large amount of information that is difficult to interpret for a specific request [42]. Both issues can be addressed by making it mandatory for an app to issue requests using machine-readable information, and by requiring that this information be limited to what the request is about. For example, rather than requiring the user to agree to all possible purposes that an app can implement across all of its services, the request must contain only the purposes and data required for the specific services associated with it. Such constraints can also encourage the use of dynamic or just-in-time consent, which does not overburden users with excessive consenting at the start, but instead asks for consent in a partial and modular manner based on triggers such as the start of a feature or service use [61].
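A minimal Python sketch of such scoped, just-in-time requests follows. It assumes a hypothetical app that declares which purposes and data each of its features needs (the `FEATURES` mapping and its values are illustrative), and builds a request only when a feature is triggered and only for purposes the user has not already agreed to.

```python
from typing import Optional

# Hypothetical declaration by an app of what each of its features needs.
FEATURES = {
    "sign_up": {"purposes": {"AccountManagement"}, "data": {"Email"}},
    "route_planning": {"purposes": {"ServiceProvision"}, "data": {"Location"}},
    "recommendations": {"purposes": {"Personalisation"}, "data": {"PurchaseHistory"}},
}


def build_request(feature: str, already_agreed: set) -> Optional[dict]:
    """Build a request scoped to one triggered feature, asking only for
    purposes not yet agreed. Returns None when nothing new is needed."""
    needs = FEATURES[feature]
    new_purposes = needs["purposes"] - already_agreed
    if not new_purposes:
        return None
    return {
        "feature": feature,
        "purposes": sorted(new_purposes),
        "data": sorted(needs["data"]),
    }
```

Starting route planning would therefore ask only about `ServiceProvision` and location data, and personalisation is not mentioned until the recommendations feature is actually used.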

The role of policies goes beyond assisting users in decision-making, as they also help manage access control authorisations, in terms of generating them and checking their validity, and help log data accesses in terms of entities and purposes. In the new model, where policies are used to express a user’s preferences and requirements and an app’s requests, access control authorisations are generated as the result of a successful agreement being reached between the user’s and app’s policies by their respective agents. Access control logs therefore only need to indicate the agreement under which the app is requesting data. More explicitly, the purposes for which data is accessed are indicated by referencing the agreement under which those purposes were agreed upon (note: this is possible because GDPR requires separate consent, and hence separate agreements, for distinct purposes).
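The following Python sketch illustrates agreement-referencing logs under simplifying assumptions: one agreement per purpose, string identifiers, and an in-memory ledger. The `Agreement` and `PodLedger` classes and their methods are hypothetical, not Solid APIs. The access log stores only the agreement identifier, and the purposes of past accesses are recovered by looking up those agreements.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Agreement:
    """One agreement per purpose, mirroring separate consent per purpose."""
    agreement_id: str
    app: str
    purpose: str
    data_categories: frozenset


@dataclass
class PodLedger:
    """Hypothetical Pod-side store of agreements and an access log."""
    agreements: dict = field(default_factory=dict)
    access_log: list = field(default_factory=list)

    def agree(self, app: str, purpose: str, data_categories) -> str:
        """Record a successful agreement; the authorisation is derived from it."""
        agreement_id = f"agreement-{len(self.agreements) + 1}"
        self.agreements[agreement_id] = Agreement(
            agreement_id, app, purpose, frozenset(data_categories))
        return agreement_id

    def access(self, app: str, agreement_id: str, category: str) -> bool:
        """Allow access only if the cited agreement covers this app and category."""
        agreement = self.agreements.get(agreement_id)
        allowed = (agreement is not None and agreement.app == app
                   and category in agreement.data_categories)
        # Only the agreement reference is logged; the purpose is looked up from it.
        self.access_log.append((app, agreement_id, category, allowed))
        return allowed

    def purposes_accessed_for(self, app: str) -> list:
        """Reconstruct for which purposes an app accessed data, via agreements."""
        return [self.agreements[aid].purpose
                for logged_app, aid, _, allowed in self.access_log
                if logged_app == app and allowed]
```

Because the log never duplicates purposes, there is a single source of truth: if an agreement is later revised or withdrawn, every logged access remains traceable to the exact terms under which it happened.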

The usefulness of policies over access control authorisations is also evident from their ability to express and check more complex conditions, such as requiring that any health data (category) is only used for medical research (purpose) by non-profit organisations (entities) within the EU (location, jurisdiction). Policies are also a more transparent and accountable form of authorisation that can be preserved for users to introspect at their leisure. For example, users can revisit their decision to share some data with a specific app when corresponding changes in their preferences and requirements are detected.
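The health-data condition above can be written as a single policy check. In this Python sketch, the category, purpose, and organisation-type labels are hypothetical strings standing in for vocabulary terms, and EU membership is approximated by a hard-coded set of ISO 3166-1 alpha-2 codes for the 27 member states; in practice, jurisdiction may depend on more than an organisation's country of establishment.

```python
from dataclasses import dataclass

# ISO 3166-1 alpha-2 codes of the 27 EU member states (hard-coded for illustration).
EU_MEMBER_STATES = frozenset({
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR", "DE", "GR", "HU", "IE",
    "IT", "LV", "LT", "LU", "MT", "NL", "PL", "PT", "RO", "SK", "SI", "ES", "SE",
})


@dataclass(frozen=True)
class UseRequest:
    data_category: str      # e.g. "Health" (hypothetical label)
    purpose: str            # e.g. "MedicalResearch" (hypothetical label)
    organisation_type: str  # e.g. "NonProfit" (hypothetical label)
    country: str            # ISO 3166-1 alpha-2 code of the organisation


def health_data_policy(req: UseRequest) -> bool:
    """Health data may only be used for medical research by non-profit
    organisations located in the EU; other categories are not restricted
    by this particular rule."""
    if req.data_category != "Health":
        return True
    return (
        req.purpose == "MedicalResearch"
        and req.organisation_type == "NonProfit"
        and req.country in EU_MEMBER_STATES
    )
```

An access control list could only say which agent may read which resource; the policy above additionally constrains purpose, organisation type, and location, which is the extra expressiveness the paragraph describes.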

Programmatic and Machine-readable Notices

One of the issues we highlighted that also applies to Solid use-cases is the misuse of notices for consenting and transparency obligations. This is possible because such notices are generated by controllers, who have incentives to maximise the user’s acceptance of choices regarding the collection and sharing of their data. To prevent such manipulation and exploitation, a radical solution is to shift control of notices away from service providers and onto the users who have to make decisions [62]. In such cases, notices are generated on the user- or client-side (i.e. by the Pod) using information provided by the controller (i.e. in machine-readable form). While this has ample benefits for users in terms of customisation of content and interface, personalisation using preferences and requirements, and recommendations based on community wisdom (e.g. community-voted guidelines), it is detrimental to companies that want to retain control of notices in order to tweak them for marketing purposes (i.e. for legitimate purposes). In such cases, a compromise would be to give controllers granular degrees of control, with the notice ultimately still being generated on the client-side [62].
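A minimal Python sketch of client-side notice generation under this compromise follows: the Pod renders the notice from the controller's machine-readable request and keeps only the presentation options it explicitly allows controllers to customise. The permitted option (`greeting`), the request shape, and the set of sensitive categories are assumptions made for illustration.

```python
from typing import Optional

# Hypothetical: the only presentation options a Pod lets controllers customise.
CONTROLLER_CUSTOMISABLE = frozenset({"greeting"})


def render_notice(request: dict, sensitive_categories: set,
                  controller_options: Optional[dict] = None) -> str:
    """Generate a notice on the client side (in the Pod) from the
    controller's machine-readable request."""
    # Discard any controller option the Pod does not permit, e.g. preselection.
    options = {k: v for k, v in (controller_options or {}).items()
               if k in CONTROLLER_CUSTOMISABLE}
    lines = []
    if "greeting" in options:
        lines.append(options["greeting"])
    lines.append(f"{request['controller']} requests access to your data.")
    for purpose in sorted(request["purposes"]):
        lines.append(f"  Purpose: {purpose}")
    for category in sorted(request["data_categories"]):
        marker = " [SENSITIVE]" if category in sensitive_categories else ""
        lines.append(f"  Data: {category}{marker}")
    # Both choices are presented side by side; neither is preselected.
    lines.append("[Accept] [Reject]")
    return "\n".join(lines)
```

The controller keeps a branded greeting, but the content, ordering, sensitivity markers, and choice layout are decided by the Pod, so an option such as preselecting "Accept" is simply ignored.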

User-side Risk Management

Another common norm that works against the interests of users is that they are unable to assess the risks and impacts associated with a given request, either because of a lack of knowledge or because the necessary information is complex or hidden from them. With decentralisation, machine-readable requests, and programmatic notices, it is also possible for Pods to provide risk assessment and management features for their users. For example, by detecting that a given request involves the use of sensitive data, the Pod can generate the notice with a corresponding indication of higher risk. In another example, the user-agent could assist with handling requests by performing risk assessments, identifying problematic patterns such as the final agreement involving excessive data collection or data being shared with too many third-parties, and suggesting corresponding changes to preferences and requirements to mitigate risks, such as deselecting purposes that involve excessive third-parties. These patterns can be established based on common interoperable vocabularies used in policies, using the existing state of the art as well as crowd-sourced lists as sources. In this manner, users get the convenience of not requiring excessive investigation and decision-making, with the assurance that their privacy does not suffer as a consequence.
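The user-side assessment can be sketched in Python as follows. Everything here is an assumption for illustration: the sensitive categories, the third-party threshold, the request shape (each purpose mapped to its data and recipients), the `expected_categories` baseline of what a feature plausibly needs, and the simple scoring rule. In practice these inputs could come from shared vocabularies and community-maintained lists.

```python
from dataclasses import dataclass, field

# Hypothetical inputs; in practice these could come from shared vocabularies
# and community-maintained (crowd-sourced) lists.
SENSITIVE_CATEGORIES = frozenset({"Health", "Biometric", "PoliticalOpinion"})
MAX_THIRD_PARTIES = 3


@dataclass
class RiskReport:
    level: str
    findings: list = field(default_factory=list)
    suggestions: list = field(default_factory=list)


def assess(request: dict, expected_categories: set) -> RiskReport:
    """Assess a request mapping each purpose to the data it uses and the
    third parties it shares with. `expected_categories` is a hypothetical
    baseline of what the requested feature plausibly needs."""
    findings, suggestions = [], []
    requested = set()
    for purpose, details in request["purposes"].items():
        requested |= set(details.get("data", ()))
        recipients = details.get("recipients", ())
        if len(recipients) > MAX_THIRD_PARTIES:
            findings.append(f"{purpose}: shared with {len(recipients)} third parties")
            suggestions.append(f"deselect purpose {purpose}")
    sensitive = requested & SENSITIVE_CATEGORIES
    if sensitive:
        findings.append("sensitive data: " + ", ".join(sorted(sensitive)))
    excessive = requested - expected_categories
    if excessive:
        findings.append("excessive data: " + ", ".join(sorted(excessive)))
    # Simple illustrative scoring: sensitive data or multiple findings => high.
    if sensitive or len(findings) > 1:
        level = "high"
    elif findings:
        level = "medium"
    else:
        level = "low"
    return RiskReport(level, findings, suggestions)
```

For a request whose advertising purpose shares location data with four third parties, the report is rated high and suggests deselecting that purpose, which mirrors the mitigation described above.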

Concluding Remarks

In this article, we established how to interpret Solid first as a cloud-based technology, and then in terms of its different functionalities. Through these, we provided a framework to assist in representing Solid-based use-cases and established how existing cloud services and their providers are associated with its implementations. This enables assessing the resulting behaviour of Solid Pods in terms of what resources are involved, which entities are associated with them, who has control, who is accountable, and what freedoms and capabilities a Pod exhibits. We explored the implications of some use-cases by considering different arrangements of entities and resources, and determined the resulting implications for the data control retained by the individual. This article thus establishes a framework for investigating whether a given use-case or implementation does indeed fulfil Solid’s vision to empower individuals through data sovereignty, or continues current practices and their problems by only marginally changing the data storage mechanism.

We then investigated how GDPR applies to Solid by interpreting its principles in terms of Solid’s concepts and implementations. Our analysis shows a clear and severe gap between Solid’s specifications and the concepts, compliance obligations, and enforcement envisioned by GDPR. More critically, we found potential difficulties in investigating GDPR compliance for Solid-based use-cases, because neither the Solid specifications nor their implementations consider the basic accountability and responsibility mechanisms afforded by legal processes. We also identified the need to further explore the determination of controllers and processors based on the roles of entities within a Solid use-case, regarding the determination of purposes and the retention of control. This is separate from the issue of apps being associated with controllers, which we highlight as a problem due to the current lack of requirements for apps to declare who operates them. We have also highlighted the impacts arising from Solid in terms of exercising rights, cross-border data transfers, data breaches, and enforcement of GDPR’s principles.

Finally, we found that severe problems present in conventional use-cases with known GDPR compliance violations are also applicable to Solid. Their severity is increased by the lack of certain processes within Solid, which makes it difficult, if not impossible, to apply known remedies. We discussed potential approaches that can resolve these problems using both known and novel research and developments, with a specific focus on how Solid itself can be further developed or extended, and through both technical and socio-technical innovations. To argue for their feasibility, we included relevant associations with existing efforts for each argument.

Thus, through this article, we have presented our findings on making sense of Solid: first understanding the use of Solid within a use-case and its implications for the abilities and control of various actors, and then investigating its legality in terms of GDPR. Going beyond merely pointing out problems, we also discussed what should be developed to address specific problems. Our arguments point towards the necessity of first developing interoperable vocabularies as the basis for enabling all other automated approaches, and then utilising these as conformance criteria that are mandatory for Solid’s resources to exhibit in implementations. Further, we identified specific tools and processes that can be developed to assist users in their decision-making and in inspecting their data use, while also suggesting the use of communities to mitigate burdens on individuals. Through these, we hope to have provided a path based on pragmatism and responsible development that can take Solid towards the realisation of its vision.

Funding Acknowledgements

This work has been made possible through a Short Term Scientific Mission (STSM) grant from COST ACTION CA19134 Distributed Knowledge Graphs (DKG), funded by the Horizon 2020 Framework Programme of the European Union. The author (Harshvardhan J. Pandit) has received funding from the ADAPT SFI Centre for Digital Media Technology, which is funded by Science Foundation Ireland through the SFI Research Centres Programme and is co-funded under the European Regional Development Fund (ERDF) through Grant 13/RC/2106_P2.

References

[1]
E. Mansour et al., “A Demonstration of the Solid Platform for Social Web Applications,” in Proceedings of the 25th International Conference Companion on World Wide Web - WWW ’16 Companion, 2016, pp. 223–226, doi: 10.1145/2872518.2890529.
[2]
“Solid Technical Reports.” https://solid.github.io/specification/.
[3]
“Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation),” Official Journal of the European Union, vol. L119, May 2016.
[4]
L. Edwards, M. Finck, M. Veale, and N. Zingales, “Data subjects as data controllers: A Fashion(able) concept?” Internet Policy Review, Jun. 2019.
[5]
H. Janssen, J. Cobbe, and J. Singh, “Personal information management systems: A user-centric privacy utopia?” Internet Policy Review, vol. 9, no. 4, Dec. 2020, doi: 10.14763/2020.4.1536.
[6]
H. Janssen, J. Cobbe, C. Norval, and J. Singh, “Decentralized data processing: Personal data stores and the GDPR,” International Data Privacy Law, vol. 10, no. 4, pp. 356–384, Jan. 2021, doi: 10.1093/idpl/ipaa016.
[7]
“Solid Protocol.” https://solidproject.org/TR/protocol.
[8]
“Solid WebID Profile.” https://solid.github.io/webid-profile/.
[9]
“Web Access Control.” https://solid.github.io/web-access-control-spec/.
[10]
“Access Control Policy (ACP).” https://solidproject.org/TR/acp.
[11]
“Solid Application Interoperability.” https://solid.github.io/data-interoperability-panel/specification/.
[12]
“The Flemish Data Utility Company,” www.vlaanderen.be. https://www.vlaanderen.be/digitaal-vlaanderen/het-vlaams-datanutsbedrijf/the-flemish-data-utility-company.
[13]
S. Van Damme, P. Mechant, E. Vlassenroot, M. Van Compernolle, R. Buyle, and D. Bauwens, “Towards a Research Agenda for Personal Data Spaces: Synthesis of a Community Driven Process,” in Electronic Government, 2022, pp. 563–577, doi: 10.1007/978-3-031-15086-9_36.
[14]
R. Buyle et al., “Streamlining governmental processes by putting citizens in control of their personal data,” in Proceedings of the International Conference on Electronic Governance and Open Society: Challenges in Eurasia, 2019.
[15]
S. Verbrugge, F. Vannieuwenborg, M. Van der Wee, D. Colle, R. Taelman, and R. Verborgh, “Towards a personal data vault society: An interplay between technological and business perspectives,” in 2021 60th FITCE Communication Days Congress for ICT Professionals: Industrial Data Cloud, Low Latency and Privacy (FITCE), 2021, pp. 1–6, doi: 10.1109/FITCE53297.2021.9588540.
[16]
G. Havur, M. Sande, and S. Kirrane, “Greater Control and Transparency in Personal Data Processing,” in Proceedings of the 6th International Conference on Information Systems Security and Privacy, 2020, pp. 655–662, doi: ggxswk.
[17]
B. Esteves, H. J. Pandit, and V. Rodríguez-Doncel, “ODRL Profile for Expressing Consent through Granular Access Control Policies in Solid,” in 2021 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), 2021, pp. 298–306, doi: gnck5x.
[18]
L. Debackere, P. Colpaert, R. Taelman, and R. Verborgh, “A Policy-Oriented Architecture for Enforcing Consent in Solid,” in Companion Proceedings of the Web Conference 2022, 2022, pp. 516–524, doi: 10.1145/3487553.3524630.
[19]
B. Esteves, V. Rodríguez-Doncel, H. J. Pandit, N. Mondada, and P. McBennett, “Using the ODRL Profile for Access Control for Solid Pod Resource Governance,” in The Semantic Web: ESWC 2022 Satellite Events, 2022, pp. 16–20, doi: 10.1007/978-3-031-11609-4_3.
[20]
I. Akaichi, “Semantic Technology based Usage Control for Decentralized Systems.” arXiv, Jun-2022 [Online]. Available: https://arxiv.org/abs/2206.04947
[21]
C. H.-J. Braun and T. Käfer, “Attribute-based access control on solid pods using privacy-friendly credentials,” in Proceedings of poster and demo track and workshop track of the 18th international conference on semantic systems co-located with 18th international conference on semantic systems (SEMANTiCS 2022) ed.: U. Şimşek, 2022, p. 5.
[22]
M. Jesús-Azabal, J. Berrocal, S. Laso, J. M. Murillo, and J. Garcia-Alonso, “SOLID and PeaaS: Your Phone as a Store for Personal Data,” in Current Trends in Web Engineering, 2020, pp. 5–10, doi: 10.1007/978-3-030-65665-2_1.
[23]
R. Dedecker, W. Slabbinck, J. Wright, P. Hochstenbach, P. Colpaert, and R. Verborgh, “What’s in a Pod?” Oct. 2022.
[24]
G. De Mulder, B. De Meester, P. Heyvaert, R. Taelman, A. Dimou, and R. Verborgh, “PROV4ITDaTa: Transparent and direct transfer of personal data to personal stores,” in Companion Proceedings of the Web Conference 2021, 2021, pp. 695–697, doi: gmjqg6.
[25]
B. Esteves, V. Rodriguez-Doncel, and R. Longares, “Automating the response to GDPR’s Right of Access,” in 35th International Conference on Legal Knowledge and Information Systems (JURIX 2022), 2022, p. 6.
[26]
D. De Bot and T. Haegemans, “Data Sharing Patterns as a Tool to Tackle Legal Considerations about Data Reuse with Solid: Theory and Applications in Europe,” Digita Research Report DGT_1. Digita, pp. 1–25, Jan-2021.
[27]
C. Esposito, O. Hartig, R. Horne, and C. Sun, “Assessing the Solid Protocol in Relation to Security & Privacy Obligations.” arXiv, Oct-2022 [Online]. Available: https://arxiv.org/abs/2210.08270
[28]
“TechDispatch #3/2020 - Personal Information Management Systems | European Data Protection Supervisor.” https://edps.europa.eu/data-protection/our-work/publications/techdispatch/techdispatch-32020-personal-information_en, 2020.
[29]
“ISO/IEC 17788:2014 Information technology – Cloud computing – Overview and vocabulary,” ISO. https://www.iso.org/cms/render/live/en/sites/isoorg/contents/data/standard/06/05/60544.html.
[30]
“Cloud Computing Risk Assessment,” ENISA. https://www.enisa.europa.eu/publications/cloud-computing-risk-assessment.
[31]
“EDPS Guidelines on the concepts of controller, processor and joint controllership under Regulation (EU) 2018/1725,” European Data Protection Supervisor (EDPS), Nov. 2019.
[32]
“ISO/IEC 22123-1:2021 Information technology – Cloud computing – Part 1: Vocabulary,” ISO. https://www.iso.org/cms/render/live/en/sites/isoorg/contents/data/standard/08/03/80350.html.
[33]
“ISO/IEC 19944-1:2020 Cloud computing and distributed platforms – Data flow, data categories and data use – Part 1: Fundamentals,” ISO. https://www.iso.org/cms/render/live/en/sites/isoorg/contents/data/standard/07/95/79573.html.
[34]
ISO/IEC, “ISO/IEC 29184:2020 Information technology – Online privacy notices and consent,” International Standards Organisation (ISO), Jun-2020.
[35]
“ISO/IEC 7498-1:1994 Information technology – Open Systems Interconnection – Basic Reference Model: The Basic Model,” ISO. https://www.iso.org/cms/render/live/en/sites/isoorg/contents/data/standard/02/02/20269.html.
[36]
R. Verborgh et al., “Triple Pattern Fragments: A low-cost knowledge graph interface for the Web,” Journal of Web Semantics, vol. 37–38, pp. 184–206, Mar. 2016, doi: 10.1016/j.websem.2016.03.003.
[37]
H. Janssen, J. Cobbe, C. Norval, and J. Singh, “Personal Data Stores and the GDPR’s lawful grounds for processing personal data,” in Data For Policy, 2019, doi: gj7g8j.
[38]
“Guidelines 05/2020 on consent under Regulation 2016/679,” European Data Protection Board (EDPB), May 2020.
[39]
M. Finck and F. Pallas, “They who must not be identified – distinguishing personal from non-personal data under the GDPR,” International Data Privacy Law, vol. 10, no. 1, pp. 11–36, Feb. 2020, doi: ggzp5p.
[40]
M. Veale and F. Z. Borgesius, “Adtech and Real-Time Bidding under European Data Protection Law,” German Law Journal, vol. 23, no. 2, pp. 226–256, Mar. 2022, doi: 10.1017/glj.2022.18.
[41]
H. Harkous, K. Fawaz, R. Lebret, F. Schaub, K. G. Shin, and K. Aberer, “Polisis: Automated analysis and presentation of privacy policies using deep learning,” in 27th USENIX Security Symposium (USENIX Security 18), 2018, pp. 531–548.
[42]
M. Kretschmer, J. Pennekamp, and K. Wehrle, “Cookie Banners and Privacy Policies: Measuring the Impact of the GDPR on the Web,” ACM Transactions on the Web, vol. 15, no. 4, pp. 1–42, Jul. 2021, doi: gmjqg4.
[43]
M. Degeling, C. Utz, C. Lentzsch, H. Hosseini, F. Schaub, and T. Holz, “We Value Your Privacy ... Now Take Some Cookies: Measuring the GDPR’s Impact on Web Privacy,” in Proceedings 2019 Network and Distributed System Security Symposium, 2019, doi: gfxgxm.
[44]
“Terms of Service; Didn’t Read.” https://tosdr.org/.
[45]
M. Veale, M. Nouwens, and C. Santos, “Impossible Asks: Can the Transparency and Consent Framework Ever Authorise Real-Time Bidding After the Belgian DPA Decision?” Technology and Regulation, vol. 2022, pp. 12–22, Feb. 2022, doi: 10.26116/techreg.2022.002.
[46]
M. Toth, N. Bielova, and V. Roca, “On dark patterns and manipulation of website publishers by CMPs,” Proceedings on Privacy Enhancing Technologies, vol. 2022, no. 3, pp. 478–497, Jul. 2022, doi: 10.56553/popets-2022-0082.
[47]
C. Santos, N. Bielova, and C. Matte, “Are cookie banners indeed compliant with the law? Deciphering EU legal requirements on consent and technical means to verify compliance of cookie banners,” Technology and Regulation, pp. 91–135, Dec. 2020, doi: ghtr3n.
[48]
V. Jesus and H. J. Pandit, “Consent Receipts for a Usable and Auditable Web of Personal Data,” IEEE Access, vol. 10, pp. 28545–28563, 2022, doi: 10.1109/ACCESS.2022.3157850.
[49]
C. Matte, C. Santos, and N. Bielova, “Purposes in IAB Europe’s TCF: Which legal basis and how are they used by advertisers?” in Annual Privacy Forum (APF 2020), 2020.
[50]
“Data brokers: A call for transparency and accountability,” Federal Trade Commission (FTC), Washington, USA, 2014.
[51]
T. Urban, D. Tatang, M. Degeling, T. Holz, and N. Pohlmann, “Measuring the Impact of the GDPR on Data Sharing in Ad Networks,” in ASIA CCS, 2020, p. 15, doi: ghd9kw.
[52]
P. Laperdrix, N. Bielova, B. Baudry, and G. Avoine, “Browser Fingerprinting: A Survey,” ACM Transactions on the Web, vol. 14, no. 2, pp. 1–33, Apr. 2020, doi: ggxj9v.
[53]
J. L. Kröger, J. Lindemann, and D. Herrmann, “How do app vendors respond to subject access requests? A longitudinal privacy study on iOS and Android Apps,” in Proceedings of the 15th International Conference on Availability, Reliability and Security, 2020, pp. 1–10, doi: gg6z2g.
[54]
T. Urban, D. Tatang, M. Degeling, T. Holz, and N. Pohlmann, “A Study on Subject Data Access in Online Advertising After the GDPR,” in Data Privacy Management, Cryptocurrencies and Blockchain Technology, 2019, pp. 61–79.
[55]
“Four Years Under the GDPR: How to fix its enforcement.” AccessNow, 2022.
[56]
P. Schütz, “Data Protection Authorities under the EU General Data Protection Regulation. A new global benchmark (extended version).” Fraunhofer Institute for Systems and Innovation Research ISI, Jan-2022.
[57]
H. Nissenbaum, “A contextual approach to privacy online,” Daedalus, vol. 140, no. 4, pp. 32–48, 2011, doi: fjkb7c.
[58]
S. Wilson et al., “Crowdsourcing Annotations for Websites’ Privacy Policies: Can It Really Work?” in Proceedings of the 25th International Conference on World Wide Web, 2016, pp. 133–143, doi: gfxvsd.
[59]
H. J. Pandit et al., “Creating A Vocabulary for Data Privacy,” in The 18th International Conference on Ontologies, DataBases, and Applications of Semantics (ODBASE2019), 2019, p. 17, doi: ggwx7x.
[60]
A. Kurteva, T. R. Chhetri, H. J. Pandit, and A. Fensel, “Consent through the lens of semantics: State of the art survey and best practices,” Semantic Web, vol. Preprint, no. Preprint, pp. 1–27, Jan. 2021, doi: gmsjzn.
[61]
L. Tauginienė, P. Hummer, A. Albert, A. Cigarini, and K. Vohland, “Ethical Challenges and Dynamic Informed Consent,” in The Science of Citizen Science, K. Vohland, A. Land-Zandstra, L. Ceccaroni, R. Lemmens, J. Perelló, M. Ponti, R. Samson, and K. Wagenknecht, Eds. Cham: Springer International Publishing, 2021, pp. 397–416.
[62]
H. J. Pandit, “Proposals for Resolving Consenting Issues with Signals and User-side Dialogues.” arXiv, Aug-2022 [Online]. Available: https://arxiv.org/abs/2208.05786

  1. https://solidproject.org/↩︎

  2. https://solidproject.org/developers/tools/↩︎

  3. https://solidproject.org/users/get-a-pod↩︎

  4. https://www.iso.org/ics/35.210/x/p/1/u/0/w/0/d/0↩︎

  5. https://solidproject.org//self-hosting/css↩︎

  6. https://start.inrupt.com/↩︎

  7. https://apps.nextcloud.com/apps/solid↩︎

  8. https://solidproject.org//users/get-a-pod↩︎

  9. ‘public access’ only refers to access mode for data, and doesn’t constitute a permission to use or further disseminate it↩︎

  10. Here, agreement is a document outlining an arrangement or understanding between entities, whereas a contract is a specific formal agreement between entities that is intended to be legally enforceable and is governed by relevant jurisdictional requirements and obligations regarding validity, enforcement, liabilities, and remedies.↩︎

  11. https://solidproject.org/apps↩︎

  12. https://solidproject.org/faqs↩︎

  13. https://nextcloud.com/blog/press_releases/pr20210414/↩︎

  14. https://aur.archlinux.org/↩︎

  15. https://solidproject.org/apps↩︎

  16. https://nextcloud.com/blog/press_releases/pr20210414/↩︎

  17. Abbreviated as Art. for Article and Rec. for Recital following common conventions for indicating clauses.↩︎

  18. Hetzner https://www.hetzner.com/legal/terms-and-conditions/, a Cloud service provider outlining their role as a Processor, and that the customer is the responsible party regarding processing of personal data within provisioned services↩︎

  19. https://w3id.org/dpv↩︎