The GDPR requires assessing the need for, and conducting, a Data Protection Impact Assessment (DPIA) for processing of personal data that may result in high risk and impact to the data subjects. Documenting this process requires information about processing activities, entities and their roles, risks, mitigations and resulting impacts, and consultations. Given these complexities, it is difficult for stakeholders to identify relevant risks and mitigations in impact assessments, especially for emerging technologies and the specific considerations of their use-cases, and to document outcomes in a consistent and reusable manner.

This work utilises linked data to represent DPIA-related information so that it can be better managed and shared in an interoperable manner. It is based on analysis of guidance documents produced by EU Data Protection Authorities (DPA) regarding DPIA and by ENISA regarding risk management.

It provides two extensions to the Data Privacy Vocabulary (DPV) - the first for documenting DPIAs, and the second for risk management based on the ISO 31000 family of standards. It also considers how shared impact assessments can be realised to reuse this DPIA work in other impact assessments, as well as for future regulations such as those for AI and Cybersecurity.

This specification is a proposal to the W3C Data Privacy Vocabularies and Controls CG (DPVCG).

For a complete discussion on the research aspects of this work, relation to state of the art, and discussion of its practicality and merit, please see the draft research article: Pandit, Harshvardhan J. (2022). A Semantic Specification for Data Protection Impact Assessments (DPIA). https://harshp.com/dpv-dpia/paper/paper

Introduction

The [[[DPV]]] currently (as of v0.7) provides the concept dpv:DPIA for representing Data Protection Impact Assessments (DPIA) as an organisational measure. It does not elaborate on how this concept should be used to indicate the specifics of a DPIA, such as which processing operations it relates to, who performs it, or what its outcomes are. This extension addresses this gap by identifying the additional concepts required and documenting how they can be applied alongside DPV to express DPIAs.

The document is structured in the following manner:

  1. DPIA Requirements provides a brief summary of information requirements for DPIA based on GDPR and various Data Protection Authorities (DPAs) guidance and templates.
  2. DPIA overview presents a high-level description of the DPIA process, its information contents, and outputs - based on the analysed requirements.
  3. Describing processing with DPV provides guidance on how the underlying information necessary for conducting a DPIA is expressed using DPV.
  4. DPIA necessity describes the conditions under which a DPIA is necessary, and how this information should be expressed.
  5. DPIA Risk Assessment describes the risk and impact assessments to be conducted as part of the DPIA.
  6. DPIA outcome describes the outcomes of the DPIA process, its information contents, and outputs to be documented.
  7. DPIA documentation describes the various actors, entities, and information that is typically associated with DPIAs and must be documented.
  8. Examples provides an overview of applying this extension to DPIA documents found "in the wild".
  9. DPIA concepts appendix provides a summarised table of concepts within the DPIA extension.
  10. Proposed Concepts for DPV appendix provides a list of concepts proposed for inclusion within DPV.
  11. Sources appendix provides a link to the GDPR, DPA guidelines and templates, and any other sources relevant to this work.

DPIA requirements

A simplified overview of DPIA being a 3-step process

Step 1: DPIA Necessity Determination

DPIA is described in [[GDPR]] Article 35 in terms of three steps or processes. The first (A.35-1) analyses the processing activities to determine whether a DPIA is required to be performed. The outcome of this step is the determination of whether a DPIA is needed or not needed, and the justification for this decision. Note that the decision criteria for this step focus solely on whether the processing "is likely to result in a high risk to the rights and freedoms of natural persons", and not on the residual risk after mitigation.

GDPR A.35-1 Where a type of processing in particular using new technologies, and taking into account the nature, scope, context and purposes of the processing, is likely to result in a high risk to the rights and freedoms of natural persons, the controller shall, prior to the processing, carry out an assessment of the impact of the envisaged processing operations on the protection of personal data.

GDPR itself specifies certain criteria, in terms of processing activities and their context, which must always be considered as likely to result in high risk and therefore always require carrying out a DPIA. These include kinds of processing operations such as profiling, systematic monitoring, and automated processing; whether the processing produces "legal effects"; whether it involves special categories of data on a large scale; and whether it takes place in a publicly accessible area subject to systematic monitoring.

GDPR A.35-3 A data protection impact assessment referred to in paragraph 1 shall in particular be required in the case of:

  1. a systematic and extensive evaluation of personal aspects relating to natural persons which is based on automated processing, including profiling, and on which decisions are based that produce legal effects concerning the natural person or similarly significantly affect the natural person;
  2. processing on a large scale of special categories of data referred to in Article 9(1), or of personal data relating to criminal convictions and offences referred to in Article 10; or
  3. a systematic monitoring of a publicly accessible area on a large scale.

Additionally, the GDPR empowers DPAs to establish additional criteria for which a DPIA would be necessary. This means that, depending on which DPA is applicable for given processing operations - for example based on the organisation's location or the involvement of specific data subjects - additional criteria for when to conduct a DPIA apply. In practice, it would be pragmatic to consider a DPIA necessary for any criterion mentioned by a DPA, regardless of whether it is the relevant authority for the processing activities.

GDPR A.35-4 The supervisory authority shall establish and make public a list of the kind of processing operations which are subject to the requirement for a data protection impact assessment pursuant to paragraph 1.

Of interest is also the ability of DPAs to publish criteria for which DPIAs are not necessary. Similar to the list where a DPIA is necessary, this list is specific to the DPA and may not be shared by other DPAs. Here, a criterion appearing on such a list does not automatically "exempt" the processing from a DPIA, but is to be taken as general guidance that operations of this type do not typically necessitate undertaking a DPIA.

GDPR A.35-5 The supervisory authority may also establish and make public a list of the kind of processing operations for which no data protection impact assessment is required.
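
The inputs and output of this first step can be recorded in a small structure. The following is a minimal sketch in Python and is not part of the specification: the criterion labels, the `NecessityDetermination` record, and the `determine_necessity` function are hypothetical names chosen for illustration, and the matching logic is a deliberate simplification of the legal test.

```python
from dataclasses import dataclass
from typing import List, Set

# Hypothetical labels for the three cases listed in GDPR A.35-3.
# They are illustrative only and are not DPV identifiers.
ART_35_3_CRITERIA: Set[str] = {
    "AutomatedEvaluationWithLegalEffects",
    "LargeScaleSpecialCategoryOrCriminalData",
    "LargeScaleSystematicMonitoringOfPublicArea",
}


@dataclass
class NecessityDetermination:
    """Output of Step 1: the decision plus its justification."""
    dpia_required: bool
    matched_criteria: List[str]
    justification: str


def determine_necessity(
    processing_traits: Set[str],
    dpa_required_list: Set[str],
    assessed_as_likely_high_risk: bool,
) -> NecessityDetermination:
    """Simplified sketch: a DPIA is required when any A.35-3 criterion or
    any DPA-listed (A.35-4) criterion matches, or when the controller's own
    analysis finds a likely high risk (A.35-1)."""
    matched = sorted(processing_traits & (ART_35_3_CRITERIA | dpa_required_list))
    if matched:
        reason = "Matched criteria: " + ", ".join(matched)
    elif assessed_as_likely_high_risk:
        reason = "Processing assessed as likely to result in high risk"
    else:
        reason = "No listed criteria matched and no likely high risk identified"
    return NecessityDetermination(
        dpia_required=bool(matched) or assessed_as_likely_high_risk,
        matched_criteria=matched,
        justification=reason,
    )
```

The sketch intentionally ignores A.35-5 lists, since appearing on such a list is guidance rather than an automatic exemption.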

Step 2: Carrying out a DPIA

The second step takes place where a DPIA is needed, and consists of carrying out the DPIA. This involves again analysing the processing activities and determining whether they are "necessary" and "proportional" for what is intended to be done (i.e. purpose), and to assess the risks to the "rights and freedoms" of data subjects in terms of: what risks exist, whether mitigating measures exist and their effectiveness, and the residual risks still in effect.

GDPR A.35-7 The assessment shall contain at least:

  1. a systematic description of the envisaged processing operations and the purposes of the processing, including, where applicable, the legitimate interest pursued by the controller;
  2. an assessment of the necessity and proportionality of the processing operations in relation to the purposes;
  3. an assessment of the risks to the rights and freedoms of data subjects referred to in paragraph 1; and
  4. the measures envisaged to address the risks, including safeguards, security measures and mechanisms to ensure the protection of personal data and to demonstrate compliance with this Regulation taking into account the rights and legitimate interests of data subjects and other persons concerned.

Step 3: Effect of DPIA Outcome on Processing

Based on the outcome of the previous step, where the DPIA was performed and the residual risk determined, the third and final step of the DPIA uses this information to make informed decisions regarding the carrying out of the processing activities the DPIA relates to. If the outcome of the DPIA was that there is still a (residual) risk to the rights and freedoms of individuals (i.e. high-risk), then processing should not take place as long as such high risks exist.

Here the controller has the option to go back to the drawing board and change their processing activities and/or risk mitigation measures until they are satisfied that there are no high risks applicable. Alternatively, the controller can consult the DPA to get an authoritative opinion on the assessment and the decision to be taken.

GDPR A.36-1 The controller shall consult the supervisory authority prior to processing where a data protection impact assessment under Article 35 indicates that the processing would result in a high risk in the absence of measures taken by the controller to mitigate the risk.
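
The options available in this step can be expressed as a small decision function. This is an illustrative sketch only: `Step3Action` and `step3_action` are hypothetical names, and the two boolean inputs stand in for assessments that in practice require legal and organisational judgement.

```python
from enum import Enum


class Step3Action(Enum):
    """Hypothetical labels for the options described in Step 3."""
    PROCEED = "proceed with processing"
    REVISE = "revise processing or mitigation measures and reassess"
    CONSULT_DPA = "consult the supervisory authority before processing"


def step3_action(residual_high_risk: bool, controller_will_revise: bool) -> Step3Action:
    """Pick the next action from the DPIA outcome.

    Processing proceeds only when no high residual risk remains; otherwise
    the controller either revises the processing or consults the DPA.
    """
    if not residual_high_risk:
        return Step3Action.PROCEED
    if controller_will_revise:
        return Step3Action.REVISE
    return Step3Action.CONSULT_DPA
```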

Summary of DPIA Requirements

From the interpretation of DPIA being a 3-step process, its requirements in terms of required information can be summarised as follows:

  1. General information about processing activities (i.e. purposes, processing, personal data, legal basis, technical and organisational measures, entities involved, etc.)
  2. Criteria under which a DPIA is necessary - and whether the processing satisfies any of these so that a DPIA is required
  3. Rights and freedoms relevant to the processing and data subjects
  4. Risks to rights and freedoms relevant to the processing and data subjects
  5. Effectiveness of measures in place to mitigate risks - and whether the residual risks still in effect constitute high-risk
  6. How the decision of the DPIA influenced the processing activities i.e. they were halted or changes made to risk mitigation measures or a DPA was consulted

In the above, the order follows a logical structure based on what information is required first and its use in following steps. However, the specifics of how the steps are executed or carried out, such as the determination of whether and how a certain criterion is met regarding a DPIA being mandatory, are not within the scope of this work. Here the only concern is the representation of the information involved, i.e. the inputs to such steps and the resulting outputs.

DPIA overview

Overview of the DPIA Specification

Describing processing using DPV

Scale and Scope are important concepts for consideration of risks and impacts in a DPIA. However, they are also relevant in other processes, such as other impact assessments, management of data and technologies, and so on. Therefore, these concepts should be provided as part of the main DPV vocabulary.

Scale refers to a measurement along some dimension (of another concept). While there can be absolute values for scale (e.g. 9001 as a number), qualitative labels are more common in DPIAs and other avenues. For this reason, some qualitative concepts would be useful to be provided as part of DPV. The proposal is to have these concepts, in order of larger scale to smaller: Massive, Huge, Large, Medium, Small, Sporadic, Singular.

Specific scales relevant in a DPIA include: personal data (DataVolume), data subjects (DataSubjectScale), and processing areas (GeographicScale). The property hasScale is needed to associate these as a context of activities, and therefore as a sub-property of hasContext. Each type of scale is specialised with the qualifiers (list above) to provide a convenient ability to refer to that concept, e.g. large scale of data subjects.

|          | DataVolume         | DataSubjectScale            | GeographicScale         |
| -------- | ------------------ | --------------------------- | ----------------------- |
| Massive  | MassiveDataVolume  | MassiveScaleOfDataSubjects  | MassiveGeographicScale  |
| Huge     | HugeDataVolume     | HugeScaleOfDataSubjects     | HugeGeographicScale     |
| Large    | LargeDataVolume    | LargeScaleOfDataSubjects    | LargeGeographicScale    |
| Medium   | MediumDataVolume   | MediumScaleOfDataSubjects   | MediumGeographicScale   |
| Small    | SmallDataVolume    | SmallScaleOfDataSubjects    | SmallGeographicScale    |
| Sporadic | SporadicDataVolume | SporadicScaleOfDataSubjects | SporadicGeographicScale |
| Singular | SingularDataVolume | SingularScaleOfDataSubjects | SingularGeographicScale |
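
The concept names in the table above follow a regular pattern, so they can be generated from the list of qualifiers. The sketch below is illustrative: `QUALIFIERS`, `NAME_PATTERNS`, `scale_concepts`, and `is_larger` are hypothetical helper names, and only the resulting concept names come from the proposal.

```python
from typing import Dict, List

# Qualifiers in order of larger scale to smaller, as proposed above.
QUALIFIERS: List[str] = [
    "Massive", "Huge", "Large", "Medium", "Small", "Sporadic", "Singular",
]

# Naming pattern for each scale type, matching the table above.
NAME_PATTERNS: Dict[str, str] = {
    "DataVolume": "{q}DataVolume",
    "DataSubjectScale": "{q}ScaleOfDataSubjects",
    "GeographicScale": "{q}GeographicScale",
}


def scale_concepts() -> Dict[str, List[str]]:
    """Return the specialised concept names for each scale type."""
    return {
        scale: [pattern.format(q=q) for q in QUALIFIERS]
        for scale, pattern in NAME_PATTERNS.items()
    }


def is_larger(first: str, second: str) -> bool:
    """True if qualifier `first` denotes a larger scale than `second`."""
    return QUALIFIERS.index(first) < QUALIFIERS.index(second)
```

For example, `scale_concepts()["DataSubjectScale"][2]` yields `LargeScaleOfDataSubjects`, which corresponds to "large scale of data subjects".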

A better alternative for GeographicScale would be to express coverage in terms of locations, as: Global, NearlyGlobal, MultiNational, National, Regional, Locality, WithinEnvironment. This is much clearer in terms of what the scale is, compared to labels like large and massive, which are context-dependent. Either set can be provided, or both. If both are provided, then GeographicCoverage could be a subclass of GeographicScale and the parent concept for these coverage concepts.
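
If both sets are provided, the resulting hierarchy could be written out as follows. This sketch generates Turtle text from Python using only string formatting. It assumes that the proposed concepts would live in the main DPV namespace and be related with `rdfs:subClassOf`; both are assumptions for illustration, since these concepts are proposals and not yet part of DPV. `coverage_turtle` is a hypothetical helper name.

```python
from typing import List

# Coverage concepts proposed above, from widest to narrowest.
COVERAGE: List[str] = [
    "Global", "NearlyGlobal", "MultiNational", "National",
    "Regional", "Locality", "WithinEnvironment",
]


def coverage_turtle() -> str:
    """Serialise the proposed hierarchy as Turtle.

    Assumption: proposed terms are placed under the dpv: prefix.
    """
    lines = [
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "@prefix dpv: <https://w3id.org/dpv#> .",
        "",
        "dpv:GeographicCoverage rdfs:subClassOf dpv:GeographicScale .",
    ]
    lines += [
        f"dpv:{name} rdfs:subClassOf dpv:GeographicCoverage ."
        for name in COVERAGE
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(coverage_turtle())
```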

Scope, in differentiating it from Scale, is defined as the variance of something i.e. how much of something is present or how different it is or what is included and what is not. Scale is more about measurement of something. Scope can include things such as specific data categories, or groups of data subjects, or areas - which would not be accurate to specify as being the scale of something. Therefore, the concept Scope and property hasScope are necessary to express this. Further specialisation is not advised as scope can vary (wildly at times) depending on what the use-case is.

There are existing concepts within DPV which fall under the category of Scale. These include dpv:Frequency, to be provided with the qualifiers or specialisations Continous, Often, Sporadic, Singular; and dpv:Duration, to be provided with the qualifiers Endless, TemporalDuration, UntilEvent, UntilTime, FixedOccurences. Both of these sets describe how these terms are used, and providing them would be beneficial for representing information. Note that while the frequency terms are consistent with describing scale, those for duration are not homogeneous and describe different types of information. This is intentional, as duration can be temporal, event-based, or iteration-based. Providing them all under scale despite this would be a good design choice for consistency with other similar concepts.

DPIA necessity conditions

TODO

DPIA Risk Assessment

Overview of the Risk Ontology

For expressing risks, mitigations, and impacts, DPV provides high-level concepts as:

For more specific risk assessment information, such as risk levels and severity, there is ongoing work on an ISO 31000 (series) based risk ontology.

Specific risk related concepts that are relevant here:

DPIA outcomes

The three parts of a DPIA (necessity, procedure, outcome) are separate in terms of what their outcomes can be and what needs to be documented. For this, the concept dpv:DPIA should be a subclass of Audit, representing any investigation, assessment, or audit. This permits reuse of dpv:hasStatus dpv:AuditStatus to indicate the status of any audit, in this case for DPIAs.

| Status ↓ DPIA →  | DPIANecessityAssessment | DPIAProcedure   | DPIAOutcome         |
|------------------|-------------------------|-----------------|---------------------|
| AuditRequired    | Check DPIA needed       | DPIA needed     | Outcome pending     |
| AuditAccepted    | Correct analysis        | DPIA accepted   | Outcome accepted    |
| AuditRefused     | Wrong analysis          | Incorrect DPIA  | Wrong analysis      |
| AuditApproved    | Approved analysis       | DPIA approved   | Outcome approved    |
| AuditRequested   | Request Checking        | DPIA requested  | Analysis requested  |
| AuditNotRequired | DPIA check not needed   | DPIA not needed | Analysis not needed |
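
The table above can be treated as a lookup from a status and a DPIA part to its interpretation. The following sketch transcribes the table into Python; `PARTS`, `ROWS`, and `status_meaning` are hypothetical names used only for illustration.

```python
from typing import Dict, Tuple

# Column order of the table above.
PARTS: Tuple[str, ...] = ("DPIANecessityAssessment", "DPIAProcedure", "DPIAOutcome")

# One row per audit status, transcribed from the table above.
ROWS: Dict[str, Tuple[str, str, str]] = {
    "AuditRequired": ("Check DPIA needed", "DPIA needed", "Outcome pending"),
    "AuditAccepted": ("Correct analysis", "DPIA accepted", "Outcome accepted"),
    "AuditRefused": ("Wrong analysis", "Incorrect DPIA", "Wrong analysis"),
    "AuditApproved": ("Approved analysis", "DPIA approved", "Outcome approved"),
    "AuditRequested": ("Request Checking", "DPIA requested", "Analysis requested"),
    "AuditNotRequired": ("DPIA check not needed", "DPIA not needed", "Analysis not needed"),
}


def status_meaning(status: str, part: str) -> str:
    """Return the interpretation of an audit status for a given DPIA part."""
    return ROWS[status][PARTS.index(part)]
```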

The dpv:AuditStatus only represents the overall status of that process/event/concept - so it will only inform whether something needs to be done or has been done. In DPIAs, it is also necessary to record the outcome of each part, e.g. the outcome of the necessity assessment would specify whether a DPIA needs to be conducted, and so on. To represent this information, a new property dpv:hasOutcome is proposed for addition to DPV.
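
The difference between status and outcome can be illustrated by recording both for a single necessity assessment. The sketch below builds triples as plain Python tuples and serialises them in N-Triples form. Several names are assumptions made for illustration: the `https://example.com/` namespace, the outcome concept `DPIARequired`, and the placement of `DPIANecessityAssessment` in the DPV namespace. `dpv:hasOutcome` is the property proposed above and is not yet part of DPV.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

DPV = "https://w3id.org/dpv#"
EX = "https://example.com/"  # hypothetical namespace for instance data
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def iri(namespace: str, local_name: str) -> str:
    """Wrap a namespace and local name as an N-Triples IRI."""
    return f"<{namespace}{local_name}>"


def necessity_assessment_triples(assessment_id: str, status: str, outcome: str) -> List[Triple]:
    """Describe one necessity assessment with its status and its outcome."""
    subject = iri(EX, assessment_id)
    return [
        # Assumption: the concept is placed in the DPV namespace.
        (subject, RDF_TYPE, iri(DPV, "DPIANecessityAssessment")),
        # Overall status of the assessment, reusing dpv:hasStatus.
        (subject, iri(DPV, "hasStatus"), iri(DPV, status)),
        # What the assessment determined, using the proposed property.
        (subject, iri(DPV, "hasOutcome"), iri(EX, outcome)),
    ]


def to_ntriples(triples: List[Triple]) -> str:
    """Serialise triples as one N-Triples statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(
        necessity_assessment_triples("dpia-necessity-001", "AuditApproved", "DPIARequired")
    ))
```

Here the status says the assessment itself was approved, while the outcome says what it concluded, which is the distinction the proposed property is meant to capture.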

Specific outcomes of each process are:

All of these statuses can also be used as annotations for other concepts, such as specific instances of dpv:PersonalDataHandling or dpv:Technology, to indicate their relation and relevance in terms of DPIAs.

Similar to Audit, other relevant processes that are involved in a DPIA but can be generalised include: Approval, Investigation, and Review. These concepts can be relevant as organisational measures, for example to specify there is a reviewing procedure or policy in place for processing activities. For representing various consultations, DPV provides the dpv:Consultation concept. For DPIAs, this needs to be extended as: ConsultationWithDataSubject and ConsultationWithDPO.

DPIA documentation

For describing the what/where/how/when type annotations associated with DPIA (and its parts), DCMI terms are reused as follows:

In DPIA documents, a large amount of information is expected to be recorded in the form of justifications for why something was or was not done regarding the requirements set out by GDPR or DPAs. This information would typically be indicated as a textual description (i.e. free-form text) accompanying some question or concept. Given the importance of this concept in legal compliance, and the necessity to record this information in a form more explicit than (mere) descriptions, the property hasJustification and the concept Justification are proposed for inclusion in DPV. The concept enables associating a textual statement, a document, or a specific concept as the justification for something's state or existence, and is also useful beyond DPIAs - such as for acknowledging legal compliance obligations or recording a DPO’s statements during an investigation.
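
The three forms a justification can take (text, document, or concept) can be modelled as a single record. This is a minimal sketch under stated assumptions: the `Justification` class, its field names, and the `has_justification` helper are hypothetical, and requiring at least one of the three forms is a simplifying choice that the proposal does not state.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Justification:
    """A justification given as free text, a document reference, or a concept."""
    text: Optional[str] = None
    document: Optional[str] = None  # e.g. an IRI or file reference
    concept: Optional[str] = None   # e.g. an identifier of another concept

    def __post_init__(self) -> None:
        # Assumption: an empty justification is not meaningful.
        if not any((self.text, self.document, self.concept)):
            raise ValueError("a justification needs text, a document, or a concept")


def has_justification(subject: str, justification: Justification) -> Tuple[str, str, Justification]:
    """Associate a justification with a subject as a (subject, property, value) tuple."""
    return (subject, "hasJustification", justification)
```

A necessity decision could then be annotated with, for instance, `has_justification("ex:necessity-check", Justification(text="Processing is small scale and not on any DPA list"))`, where the subject identifier is again hypothetical.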

Examples

TODO

Appendix: DPV proposal

For dpv:Processing with parent concept in brackets.

For dpv:DataSubject:

For dpv:TechnicalOrganisationalMeasure:

For dpv:Technology:

For dpv:ProcessingContext:

For dpv:Purpose:

For dpv:PersonalData in DPV-PD:

Appendix: DPIA concepts

TODO

Appendix: Sources of Information

TODO

Acknowledgements

Funding: This research has received funding from Uniphar PLC, and the ADAPT Centre for Digital Content Technology which is funded under the SFI Research Centres Programme (Grant 13/RC/2106_P2) and co-funded by the European Regional Development Fund. Harshvardhan J. Pandit has received funding under the Irish Research Council’s Government of Ireland Postdoctoral Fellowship Grant#GOIPD/2020/790.