The GDPR requires assessing the need for, and conducting, a Data Protection Impact Assessment (DPIA) for processing of personal data that may result in high risks and impacts for data subjects. Documenting this process requires information about processing activities, entities and their roles, risks, mitigations and resulting impacts, and consultations. Given these complexities, stakeholders find it difficult to identify relevant risks and mitigations in impact assessments, especially for emerging technologies and the specific considerations of their use-cases, and to document outcomes in a consistent and reusable manner.
This work utilises linked data to represent DPIA-related information so that it can be better managed and shared in an interoperable manner. It is based on an analysis of guidance documents on DPIAs produced by EU Data Protection Authorities (DPAs), and on risk management produced by ENISA.
It provides two extensions to the Data Privacy Vocabulary (DPV): the first for documenting DPIAs, and the second for risk management based on the ISO 31000 family of standards. It also considers how shared impact assessments can be realised so that this DPIA work can be reused in other impact assessments, including those required by future regulations such as for AI and cybersecurity.
This specification is a proposal to the W3C Data Privacy Vocabularies and Controls CG (DPVCG).
For a complete discussion on the research aspects of this work, relation to state of the art, and discussion of its practicality and merit, please see the draft research article: Pandit, Harshvardhan J. (2022). A Semantic Specification for Data Protection Impact Assessments (DPIA). https://harshp.com/dpv-dpia/paper/paper
The [[[DPV]]] currently (as of v0.7) provides the concept dpv:DPIA for the representation of Data Protection Impact Assessments (DPIA) as an organisational measure. It does not elaborate on how this concept should be used, e.g. how to indicate the specifics of a DPIA such as which processing operations it relates to, who performs it, or what its outcomes are. This extension addresses this gap by identifying the additional concepts required and documenting how they can be applied alongside DPV to express DPIAs.
The document is structured in the following manner:
DPIA is described in [[GDPR]] Article 35 in terms of three steps or processes. The first (A.35-1) analyses the processing activities to determine whether a DPIA is required. The outcome of this step is the determination of whether a DPIA is needed, together with the justification for this decision. Note that the decision criterion for this step focuses solely on whether the processing "is likely to result in a high risk to the rights and freedoms of natural persons", and not on the residual risk that remains after mitigation.
GDPR A.35-1 Where a type of processing in particular using new technologies, and taking into account the nature, scope, context and purposes of the processing, is likely to result in a high risk to the rights and freedoms of natural persons, the controller shall, prior to the processing, carry out an assessment of the impact of the envisaged processing operations on the protection of personal data.
The GDPR itself specifies certain criteria, in terms of processing activities and their context, which must always be considered as likely to result in a high risk and which therefore always require carrying out a DPIA. These include: kinds of processing operations such as profiling, systematic monitoring, and automated processing; whether the impacts of processing are "legal effects"; processing of special categories of data on a large scale; and processing located in a publicly accessible area where systematic monitoring takes place.
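GDPR A.35-3 A data protection impact assessment referred to in paragraph 1 shall in particular be required in the case of:
(a) a systematic and extensive evaluation of personal aspects relating to natural persons which is based on automated processing, including profiling, and on which decisions are based that produce legal effects concerning the natural person or similarly significantly affect the natural person;
(b) processing on a large scale of special categories of data referred to in Article 9(1), or of personal data relating to criminal convictions and offences referred to in Article 10; or
(c) a systematic monitoring of a publicly accessible area on a large scale.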
Additionally, the GDPR empowers DPAs to establish further criteria for which a DPIA is necessary. This means that, depending on which DPA is applicable for a given processing operation (e.g. based on the organisation's location or the involvement of specific data subjects), additional criteria for when to conduct a DPIA may apply. In practice, it may be pragmatic to treat a DPIA as necessary whenever a criterion listed by any DPA is met, regardless of whether that DPA is the relevant authority for the processing activities.
GDPR A.35-4 The supervisory authority shall establish and make public a list of the kind of processing operations which are subject to the requirement for a data protection impact assessment pursuant to paragraph 1.
Also of interest is the ability of DPAs to publish criteria for which DPIAs are not necessary. Like the list of operations for which a DPIA is required, this list is specific to each DPA and may not be shared by other DPAs. Here, a criterion appearing on such a list does not automatically "exempt" the processing from a DPIA, but is to be taken as general guidance that operations of this type do not typically necessitate undertaking a DPIA.
GDPR A.35-5 The supervisory authority may also establish and make public a list of the kind of processing operations for which no data protection impact assessment is required.
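The necessity assessment combines the GDPR's own criteria with DPA-published lists. The following Python sketch shows one way to record the inputs of this step and its output (a status plus justifications). The `ProcessingContext` fields and `assess_dpia_necessity` are hypothetical names; treating any A.35-4 match as taking precedence over an A.35-5 listing is an assumption based on the latter being guidance rather than an exemption. How each flag is determined is outside the scope of this work.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessingContext:
    """Hypothetical summary of a processing activity for the A.35 necessity check.

    Field names are illustrative only; they are not DPV concepts.
    """
    likely_high_risk: bool = False  # controller's general A.35-1 judgement
    automated_evaluation_with_legal_effects: bool = False  # A.35-3(a)
    large_scale_special_categories: bool = False  # A.35-3(b)
    large_scale_public_monitoring: bool = False  # A.35-3(c)
    # Criteria matched on DPA lists; lists from any DPA are considered
    dpa_required_matches: set = field(default_factory=set)  # A.35-4
    dpa_not_required_matches: set = field(default_factory=set)  # A.35-5


def assess_dpia_necessity(ctx: ProcessingContext):
    """Return (status, justifications); status is one of the proposed
    DPIANecessityStatus values 'DPIARequired' or 'DPIANotRequired'."""
    reasons = []
    if ctx.likely_high_risk:
        reasons.append("A.35-1: likely to result in a high risk")
    if ctx.automated_evaluation_with_legal_effects:
        reasons.append("A.35-3(a): automated evaluation with legal or similar effects")
    if ctx.large_scale_special_categories:
        reasons.append("A.35-3(b): large-scale special categories or criminal data")
    if ctx.large_scale_public_monitoring:
        reasons.append("A.35-3(c): large-scale monitoring of a public area")
    reasons += [f"A.35-4 DPA list: {c}" for c in sorted(ctx.dpa_required_matches)]
    if reasons:
        # Any required-criterion match takes precedence over an A.35-5 listing
        return "DPIARequired", reasons
    # An A.35-5 listing is guidance, not an exemption, so it is only recorded
    # as part of the justification for not requiring a DPIA.
    reasons += [f"A.35-5 DPA list (guidance): {c}" for c in sorted(ctx.dpa_not_required_matches)]
    return "DPIANotRequired", reasons or ["no high-risk criterion identified"]
```

Returning the justifications alongside the status mirrors the requirement that this step records both the decision and the reason for it.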
The second step takes place where a DPIA is needed, and consists of carrying out the DPIA. This involves again analysing the processing activities and determining whether they are "necessary" and "proportional" for what is intended to be done (i.e. purpose), and to assess the risks to the "rights and freedoms" of data subjects in terms of: what risks exist, whether mitigating measures exist and their effectiveness, and the residual risks still in effect.
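GDPR A.35-7 The assessment shall contain at least:
(a) a systematic description of the envisaged processing operations and the purposes of the processing, including, where applicable, the legitimate interest pursued by the controller;
(b) an assessment of the necessity and proportionality of the processing operations in relation to the purposes;
(c) an assessment of the risks to the rights and freedoms of data subjects referred to in paragraph 1; and
(d) the measures envisaged to address the risks, including safeguards, security measures and mechanisms to ensure the protection of personal data and to demonstrate compliance with this Regulation taking into account the rights and legitimate interests of data subjects and other persons concerned.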
Based on the outcome of the previous step, where the DPIA was performed and the residual risk determined, the third and final step uses this information to make informed decisions about carrying out the processing activities the DPIA relates to. If the outcome of the DPIA is that a high residual risk to the rights and freedoms of individuals remains, then processing should not take place as long as such high risks exist.
Here the controller has the option to go back to the drawing board and change their processing activities and/or risk mitigation measures until they are satisfied that no high risks remain. Alternatively, the controller can consult the DPA to obtain an authoritative opinion on the assessment and the decision to be taken.
GDPR A.36-1 The controller shall consult the supervisory authority prior to processing where a data protection impact assessment under Article 35 indicates that the processing would result in a high risk in the absence of measures taken by the controller to mitigate the risk.
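As a rough illustration of the third step, the sketch below takes per-risk residual levels produced by the DPIA procedure and either lets processing proceed or flags the need for prior consultation under A.36-1, optionally letting the controller revise processing or measures first. The risk labels (`"high"`, `"low"`, `"none"`), the `revise` hook, and the mapping onto the DPIAOutcomeStatus values proposed in this document (HighResidualRisk, ConsultationRequired, MitigatedRisk) are assumptions for illustration.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# Residual risk level per named risk after mitigation (hypothetical labels).
RiskMap = Dict[str, str]  # risk name -> "high" | "low" | "none"


def high_risks(residual: RiskMap) -> List[str]:
    return sorted(name for name, level in residual.items() if level == "high")


def decide_on_processing(
    residual: RiskMap,
    revise: Optional[Callable[[RiskMap], RiskMap]] = None,
    max_revisions: int = 3,
) -> Tuple[Set[str], bool, List[str]]:
    """Sketch of the third step: returns (outcome statuses, may_proceed, high risks).

    `revise` is a hypothetical hook standing in for the controller changing
    processing activities or mitigation measures and reassessing.
    """
    rounds = 0
    while high_risks(residual) and revise is not None and rounds < max_revisions:
        residual = revise(residual)
        rounds += 1
    remaining = high_risks(residual)
    if remaining:
        # A.36-1: consult the supervisory authority prior to processing
        return {"HighResidualRisk", "ConsultationRequired"}, False, remaining
    return {"MitigatedRisk"}, True, []
```

Bounding the number of revisions keeps the loop finite when the controller cannot remove a high risk, in which case consultation is the remaining path.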
From the interpretation of DPIA being a 3-step process, its requirements in terms of required information can be summarised as follows:
In the above, the order follows a logical structure based on what information is required first and how it is used in the following steps. However, how the steps are executed or carried out, such as determining whether and how a certain criterion for a mandatory DPIA is met, is not within the scope of this work. The only concern here is the representation of the information involved, i.e. the inputs to such steps and their resulting outputs.
Scale and Scope are important concepts for consideration of risks and impacts in a DPIA. However, they are also relevant in other processes, such as other impact assessments, management of data and technologies, and so on. Therefore, these concepts should be provided as part of the main DPV vocabulary.
Scale refers to a measurement along some dimension (of another concept). While scale can be expressed as an absolute value (e.g. 9001 as a number), qualitative labels are more common in DPIAs and elsewhere. For this reason, it would be useful for DPV to provide some qualitative concepts. The proposal is to have these concepts, in order from larger to smaller scale: Massive, Huge, Large, Medium, Small, Sporadic, Singular.
Specific scales relevant in a DPIA include those for personal data (DataVolume), data subjects (DataSubjectScale), and processing areas (GeographicScale). The property hasScale is needed to associate these as a context of activities, and should therefore be a sub-property of hasContext. Each type of scale is specialised with the qualifiers listed above to provide a convenient way to refer to that concept, e.g. a large scale of data subjects.
| | DataVolume | DataSubjectScale | GeographicScale |
| -------- | ------------------ | --------------------------- | ----------------------- |
| Massive | MassiveDataVolume | MassiveScaleOfDataSubjects | MassiveGeographicScale |
| Huge | HugeDataVolume | HugeScaleOfDataSubjects | HugeGeographicScale |
| Large | LargeDataVolume | LargeScaleOfDataSubjects | LargeGeographicScale |
| Medium | MediumDataVolume | MediumScaleOfDataSubjects | MediumGeographicScale |
| Small | SmallDataVolume | SmallScaleOfDataSubjects | SmallGeographicScale |
| Sporadic | SporadicDataVolume | SporadicScaleOfDataSubjects | SporadicGeographicScale |
| Singular | SingularDataVolume | SingularScaleOfDataSubjects | SingularGeographicScale |
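The table follows a regular naming pattern, so its concepts can be generated rather than written by hand. This Python sketch does so and emits Turtle. Declaring each specialisation with `rdfs:subClassOf` under the `dpv:` prefix is an assumption, since the text does not state how the qualified concepts relate to their parents; prefix declarations are omitted.

```python
# Qualifiers from the text, ordered from larger to smaller scale.
QUALIFIERS = ["Massive", "Huge", "Large", "Medium", "Small", "Sporadic", "Singular"]

# Naming pattern per scale type, read off the table above.
NAME_PATTERNS = {
    "DataVolume": "{q}DataVolume",
    "DataSubjectScale": "{q}ScaleOfDataSubjects",
    "GeographicScale": "{q}GeographicScale",
}


def scale_concepts():
    """Yield (concept, parent) for every cell of the table, row by row."""
    for q in QUALIFIERS:
        for parent, pattern in NAME_PATTERNS.items():
            yield pattern.format(q=q), parent


def to_turtle(pairs, prefix="dpv"):
    # Assumption: each specialisation is an rdfs:subClassOf its scale type.
    return "\n".join(f"{prefix}:{c} rdfs:subClassOf {prefix}:{p} ." for c, p in pairs)


if __name__ == "__main__":
    print(to_turtle(scale_concepts()))
```

Generating the 21 names from one list of qualifiers keeps the three scale types consistent if a qualifier is later added or renamed.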
A better alternative for GeographicScale would be to express coverage in terms of locations, as: Global, NearlyGlobal, MultiNational, National, Regional, Locality, WithinEnvironment. These make it much clearer what the scale is than labels such as large and massive, which are context-dependent. Either set, or both, can be provided. If both are provided, then GeographicCoverage could be a subclass of GeographicScale and the parent concept for these coverage concepts.
Scope, as distinguished from Scale, describes the extent or variety of something, i.e. how much of something is present, how varied it is, or what is included and what is not, whereas Scale is a measurement of something. Scope can include things such as specific data categories, groups of data subjects, or areas, which would not be accurate to specify as the scale of something. Therefore, the concept Scope and property hasScope are necessary to express this. Further specialisation is not advised, as scope can vary considerably depending on the use-case.
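A small sketch of the distinction, assuming the qualitative scale labels proposed above and plain sets of (hypothetical) data category names for scope: scale values can be ranked against each other, whereas two scopes can only be compared by what each includes.

```python
# Scale qualifiers from the text, reordered here from smaller to larger.
SCALE_ORDER = ["Singular", "Sporadic", "Small", "Medium", "Large", "Huge", "Massive"]


def compare_scale(a: str, b: str) -> int:
    """Scale is a measurement: negative if a < b, 0 if equal, positive if a > b."""
    return SCALE_ORDER.index(a) - SCALE_ORDER.index(b)


def scope_difference(a: set, b: set) -> dict:
    """Scope is about inclusion: there is no 'larger', only what each covers."""
    return {"only_in_first": sorted(a - b), "only_in_second": sorted(b - a)}
```

For example, two activities can both involve a Large scale of data subjects while covering different groups or data categories, which only a scope comparison reveals.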
There are existing concepts within DPV which fall under the category of Scale. These include dpv:Frequency, to be provided with the qualifiers or specialisations Continous, Often, Sporadic, Singular; and dpv:Duration, to be provided with the qualifiers Endless, TemporalDuration, UntilEvent, UntilTime, FixedOccurences. Both sets describe how these concepts are commonly used, and providing them would be beneficial for representing information. Note that while the frequency terms consistently describe scale, the duration terms are not homogeneous and describe different types of information. This is intentional, as duration can be temporal, event-based, or iteration-based. Providing them all under scale despite this would nonetheless be a good design choice for consistency with other similar concepts.
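To illustrate why the duration concepts are not homogeneous, the sketch below models each as its own Python type: deciding whether a duration has ended needs different evidence in each case (elapsed time, an observed event, a deadline, or a count of occurrences). The class shapes and the `has_ended` function are hypothetical and not part of DPV.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import FrozenSet, Union


@dataclass(frozen=True)
class Endless:
    pass


@dataclass(frozen=True)
class TemporalDuration:
    length: timedelta  # e.g. six months


@dataclass(frozen=True)
class UntilEvent:
    event: str  # e.g. "account closed"


@dataclass(frozen=True)
class UntilTime:
    deadline: datetime


@dataclass(frozen=True)
class FixedOccurences:  # spelling follows the concept name used in the text
    count: int


Duration = Union[Endless, TemporalDuration, UntilEvent, UntilTime, FixedOccurences]


def has_ended(d: Duration, start: datetime, now: datetime,
              events_seen: FrozenSet[str] = frozenset(), occurrences: int = 0) -> bool:
    """Each kind of duration needs different evidence to decide whether it has ended."""
    if isinstance(d, Endless):
        return False
    if isinstance(d, TemporalDuration):
        return now >= start + d.length
    if isinstance(d, UntilEvent):
        return d.event in events_seen
    if isinstance(d, UntilTime):
        return now >= d.deadline
    if isinstance(d, FixedOccurences):
        return occurrences >= d.count
    raise TypeError(f"unknown duration kind: {d!r}")
```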
TODO
For expressing risks, mitigations, and impacts, DPV provides the following high-level concepts:

- dpv:Risk, dpv:hasRisk
- dpv:RiskMitigationMeasure, dpv:mitigatesRisk
- dpv:Consequence, dpv:hasConsequence, ExclusionOfDataSubjects, DiscriminationOfDataSubjects
- dpv:Impact, dpv:hasImpact, dpv:hasImpactOn
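The following sketch represents a few of these concepts as plain (subject, predicate, object) triples and traverses them, e.g. to find risks that no measure mitigates. The `ex:` instances are hypothetical, the `dpv:` prefix on the proposed ExclusionOfDataSubjects is assumed, and the chain from activity to risk, consequence, impact, and affected entity is an assumed usage pattern rather than one defined by DPV.

```python
# Hypothetical instance data (ex: is an illustrative namespace).
GRAPH = {
    ("ex:Profiling", "dpv:hasRisk", "ex:UnfairDecisionRisk"),
    ("ex:Profiling", "dpv:hasRisk", "ex:DataBreachRisk"),
    ("ex:UnfairDecisionRisk", "rdf:type", "dpv:Risk"),
    ("ex:DataBreachRisk", "rdf:type", "dpv:Risk"),
    ("ex:Encryption", "rdf:type", "dpv:RiskMitigationMeasure"),
    ("ex:Encryption", "dpv:mitigatesRisk", "ex:DataBreachRisk"),
    ("ex:UnfairDecisionRisk", "dpv:hasConsequence", "ex:Exclusion"),
    ("ex:Exclusion", "rdf:type", "dpv:ExclusionOfDataSubjects"),  # prefix assumed
    ("ex:Exclusion", "dpv:hasImpact", "ex:LossOfService"),
    ("ex:LossOfService", "rdf:type", "dpv:Impact"),
    ("ex:LossOfService", "dpv:hasImpactOn", "dpv:Customer"),
}


def objects(graph, s, p):
    return {o for (s2, p2, o) in graph if s2 == s and p2 == p}


def subjects(graph, p, o):
    return {s for (s, p2, o2) in graph if p2 == p and o2 == o}


def unmitigated_risks(graph, activity):
    """Risks of an activity that no measure claims to mitigate."""
    return sorted(r for r in objects(graph, activity, "dpv:hasRisk")
                  if not subjects(graph, "dpv:mitigatesRisk", r))


def affected_by(graph, risk):
    """Follow hasConsequence, then hasImpact, then hasImpactOn from a risk."""
    affected = set()
    for consequence in objects(graph, risk, "dpv:hasConsequence"):
        for impact in objects(graph, consequence, "dpv:hasImpact"):
            affected |= objects(graph, impact, "dpv:hasImpactOn")
    return affected
```

In a real deployment the same queries would more naturally be written in SPARQL over an RDF store; plain tuples are used here only to keep the sketch self-contained.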
For more specific risk assessment information, such as risk levels and severity, there is ongoing work on an ISO 31000 (series) based risk ontology.
Specific risk-related concepts that are relevant here:
The three parts of a DPIA (necessity, procedure, outcome) are distinct in terms of what their outcomes can be, and each needs to be documented. For this, the concept dpv:DPIA should be a subclass of Audit, which represents any investigation, assessment, or audit. This permits reuse of dpv:hasStatus and dpv:AuditStatus to indicate the status of any audit, in this case DPIAs.
| Status ↓ DPIA → | DPIANecessityAssessment | DPIAProcedure | DPIAOutcome |
|------------------|-------------------------|-----------------|---------------------|
| AuditRequired | Check DPIA needed | DPIA needed | Outcome pending |
| AuditAccepted | Correct analysis | DPIA accepted | Outcome accepted |
| AuditRefused | Wrong analysis | Incorrect DPIA | Wrong analysis |
| AuditApproved | Approved analysis | DPIA approved | Outcome approved |
| AuditRequested | Request Checking | DPIA requested | Analysis requested |
| AuditNotRequired | DPIA check not needed | DPIA not needed | Analysis not needed |
The dpv:AuditStatus only represents the overall status of that process/event/concept, so it only indicates whether something needs to be done or has been done. In DPIAs, it is also necessary to record the outcome of each part, e.g. what was determined by the necessity assessment, which specifies whether a DPIA needs to be conducted, and so on. To represent this information, a new property dpv:hasOutcome is proposed for addition into DPV.
Specific outcomes of each process are:

- dpv:DPIANecessityAssessment: the concept DPIANecessityStatus with specialisations DPIARequired and DPIANotRequired
- dpv:DPIAProcedure: the concept DPIARiskStatus with specialisations HighRiskToRights, LowRiskToRights, and NoRiskToRights
- dpv:DPIAOutcome: the concept DPIAOutcomeStatus with specialisations HighResidualRisk, ConsultationRequired, and MitigatedRisk

All of these statuses can also be used as annotations for other concepts, such as specific instances of dpv:PersonalDataHandling or dpv:Technology, to indicate their relation and relevance to DPIAs.
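A sketch of how each DPIA part could carry both its audit status and its outcome, assuming the proposed concepts and `dpv:hasOutcome` are published under the `dpv:` prefix. The `DPIAPart` class and `to_turtle` helper are hypothetical; the check only enforces the outcome lists given above.

```python
from dataclasses import dataclass

# Outcome concepts permitted for each DPIA part, as listed in the text.
ALLOWED_OUTCOMES = {
    "DPIANecessityAssessment": {"DPIARequired", "DPIANotRequired"},
    "DPIAProcedure": {"HighRiskToRights", "LowRiskToRights", "NoRiskToRights"},
    "DPIAOutcome": {"HighResidualRisk", "ConsultationRequired", "MitigatedRisk"},
}


@dataclass
class DPIAPart:
    iri: str      # hypothetical instance IRI, e.g. ex:Check1
    kind: str     # one of the keys of ALLOWED_OUTCOMES
    status: str   # an AuditStatus value, e.g. AuditApproved
    outcome: str  # an outcome concept valid for this kind


def to_turtle(part: DPIAPart) -> str:
    if part.outcome not in ALLOWED_OUTCOMES.get(part.kind, set()):
        raise ValueError(f"{part.outcome} is not an outcome of {part.kind}")
    # Assumption: the proposed concepts and dpv:hasOutcome live under dpv:
    return (f"{part.iri} a dpv:{part.kind} ;\n"
            f"    dpv:hasStatus dpv:{part.status} ;\n"
            f"    dpv:hasOutcome dpv:{part.outcome} .")
```

Keeping status and outcome as separate properties lets a necessity assessment be, for instance, AuditApproved with the outcome DPIARequired, which a single status value could not express.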
Similar to Audit, other relevant processes that are involved in a DPIA but can be generalised include: Approval, Investigation, and Review. These concepts can be relevant as organisational measures, for example to specify that a reviewing procedure or policy is in place for processing activities.
For representing various consultations, DPV provides the dpv:Consultation concept. For DPIAs, this needs to be extended as: ConsultationWithDataSubject and ConsultationWithDPO.
For describing the what/where/how/when type annotations associated with a DPIA (and its parts), DCMI terms are reused as follows:

- dct:title: title of the DPIA
- dct:creator: creator of the DPIA
- dct:description: description of the DPIA
- dct:identifier: an identifier for the DPIA, which could be a unique reference and/or a version
- dct:created: date the DPIA was created or generated
- dct:modified: date the DPIA was modified
- dct:dateSubmitted: date the DPIA was submitted (e.g. for audit or approval)
- dct:dateAccepted: date the DPIA was accepted (e.g. after assessment)
- dct:temporal: other temporal information as needed
- dct:valid: the duration for which the DPIA is valid or in effect, or a reference to when the DPIA should be conducted again
- dct:conformsTo: whether the DPIA follows some guidelines or conforms to a code of conduct or a template/methodology/standard
- dct:isVersionOf: reference to a prior version of this DPIA
- dct:subject: references the 'topic' of the DPIA, i.e. what is being assessed, e.g. a product or a specific service. This would typically be a dpv:PersonalDataHandling or one of its subclasses within DPV, but a string description could also be used
- dct:coverage: indicates the scope of the DPIA, e.g. temporal, geographical, or jurisdictional

In DPIA documents, a large amount of information is expected to be recorded in the form of justifications for why something was or was not done regarding the requirements set out by the GDPR or DPAs. This information would typically be indicated as a textual description (i.e. free-form text) accompanying some question or concept. Given the importance of this information for legal compliance, and the need to record it in a form more explicit than mere descriptions, the property hasJustification and concept Justification are proposed for inclusion in DPV. The concept enables associating a textual statement, a document, or a specific concept as the justification for something's state or existence, and is also useful beyond DPIAs, such as for acknowledging legal compliance obligations or recording a DPO’s statements during an investigation.
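The sketch below serialises a DPIA's metadata as Turtle using a subset of the DCMI terms listed above plus the proposed `dpv:hasJustification`. The helper names, the `ex:` IRIs in the usage, and modelling the justification as a blank node described with `dct:description` are assumptions; prefix declarations are omitted for brevity.

```python
from datetime import date


def literal(text: str) -> str:
    """Turtle string literal with backslash, quote and newline escaped."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def dpia_metadata(iri, title, created, subject, conforms_to=None, justification=None):
    """Sketch of DPIA metadata using DCMI terms plus the proposed hasJustification."""
    props = [
        ("dct:title", literal(title)),
        # date.fromisoformat rejects malformed dates before they reach the output
        ("dct:created", f'"{date.fromisoformat(created).isoformat()}"^^xsd:date'),
        ("dct:subject", subject),
    ]
    if conforms_to:
        props.append(("dct:conformsTo", conforms_to))
    if justification:
        # Assumption: a Justification as a blank node with a textual description
        props.append(("dpv:hasJustification",
                      f"[ a dpv:Justification ; dct:description {literal(justification)} ]"))
    body = " ;\n    ".join(f"{p} {o}" for p, o in props)
    return f"{iri} a dpv:DPIA ;\n    {body} ."
```

For example, `dpia_metadata("ex:DPIA1", "Smart CCTV DPIA", "2022-05-01", "ex:SmartCCTV", justification="Large-scale monitoring")` yields a single subject block whose justification is explicit data rather than a free-text note.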
TODO
For dpv:Processing, with the parent concept in brackets:

- Access (dpv:Use)
- Assess (dpv:Use)
- Filter (dpv:Transform)
- Monitor (dpv:Consult)
- Modify (dpv:Alter)
- Observe (dpv:Obtain)
- Screen (dpv:Transform)

For dpv:DataSubject:

- MentallyVulnerable
- AsylumSeeker
- ElderlyDataSubject

For dpv:TechnicalOrganisationalMeasure:

- CredentialManagement (dpv:AuthorisationProcedure)
- DataBackupProtocols (dpv:TechnicalMeasure)
- PhysicalAccessControlMethod (dpv:AccessControlMethod)

For dpv:Technology:

- ConventionalTechnology
- NewTechnology

For dpv:ProcessingContext:

- AutomationOfProcessing
- FullyAutomatedProcessing
- AutomatedProcessingWithHumanVerification
- AutomatedProcessingWithHumanOversight
- AutomatedProcessingWithHumanInput
- PartiallyAutomatedProcessing
- CompletelyManualProcessing

For dpv:Purpose:

- MaintainCreditCheckingDatabase (dpv:CreditChecking)
- MaintainCreditRatingDatabase (dpv:CreditChecking)
- MaintainFraudDatabase (dpv:FraudPreventionAndDetection)

For dpv:PersonalData in DPV-PD:

- PerformanceAtWork (dpv-pd:Behavioral, dpv-pd:Professional)
- FinancialStatus (dpv-pd:Financial)
- Reliability (dpv-pd:Behavioral)
- Profile (dpv:PersonalData)
- WorkEnvironment (dpv-pd:Professional)
- BrowserHistory (dpv-pd:BrowsingBehavior)
- VehicleLicense (dpv-pd:Identifying)
- VehicleLicenseNumber (VehicleLicense)
- VehicleLicenseRegistration (VehicleLicense)
- FacialPrint (dpv-pd:Biometric)
- PersonalDocuments (dpv-pd:External)
- HouseholdData (dpv:PersonalData)
- SocialMediaData (dpv-pd:Communication), with existing categories under it
- PubliclyAvailableSocialMediaData (SocialMediaData)

TODO
TODO
Funding: This research has received funding from Uniphar PLC, and the ADAPT Centre for Digital Content Technology which is funded under the SFI Research Centres Programme (Grant 13/RC/2106_P2) and co-funded by the European Regional Development Fund. Harshvardhan J. Pandit has received funding under the Irish Research Council’s Government of Ireland Postdoctoral Fellowship Grant#GOIPD/2020/790.