Issues with Modelling Subjective Locations
published:
by Harshvardhan J. Pandit
is part of: Data Privacy Vocabulary (DPV)
DPV DPVCG semantic-web Working Note
This post continues the discussion from the previous post, where I proposed modelling concepts such as public/private and home/work as DPV locations. Here I summarise the discussions on subjective locations in the DPVCG meetings to identify the key issues that need to be discussed and resolved, along with potential solutions.
Introduction
In the last DPVCG meetings, we discussed the subjective location concepts, and identified five issues with their modelling that need to be addressed:
- Locality: An assessment of locality, i.e. whether something is here (local) or there (remote), most commonly used to distinguish data storage on premises from storage in the cloud;
- Public/Private: Locations categorised as public/private, with a taxonomy expanding upon these that includes hybrid mixtures of both;
- Subjective Labels: Use of subjective locations such as home and work, which are not fixed in geo-physical space by their label alone but rely on interpretation or application to be concretely resolved;
- Virtual Locations: Digital addresses and 'places' that are phrased as locations in language, e.g. data is located in the device, in your browser, on the home page of the website, or at the location of a file in a file system; and
- Backwards Compatibility: Where these concepts should be defined (in the DPV or LOC namespaces) and what to do with the existing concepts in DPV.
Locality of Locations
DPV has the concept dpv:LocationLocality to refer to whether locations are described as here or there, i.e. as local or remote. This concept was added because the locality of data storage is an important consideration in security and legal contexts. It provides a simple and convenient way to express where the data is in combination with other locations, e.g. to state that a server located in a country is a remote location (which cannot be inferred from the server's location alone). These concepts are also not suitable as organisational measures, as merely being local or remote is not a measure.
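To make this combination concrete, here is a minimal sketch that uses plain Python tuples as stand-ins for RDF triples. It assumes `dpv:LocalLocation` and `dpv:RemoteLocation` are the locality concepts under `dpv:LocationLocality`, that `loc:IE` denotes Ireland, and uses a hypothetical `ex:Server1`. The point it illustrates is that the country alone says nothing about locality, so the locality concept is asserted alongside it.

```python
# Each triple is a (subject, predicate, object) tuple of prefixed names.
# ex:Server1 is hypothetical; dpv:LocalLocation and dpv:RemoteLocation are
# assumed to be the locality concepts under dpv:LocationLocality.
LOCALITY = {"dpv:LocalLocation", "dpv:RemoteLocation"}

graph = {
    ("ex:Server1", "dpv:hasLocation", "loc:IE"),              # geographic location
    ("ex:Server1", "dpv:hasLocation", "dpv:RemoteLocation"),  # locality relative to us
}


def locations_of(subject, g):
    """Return every object asserted via dpv:hasLocation for the subject."""
    return {o for (s, p, o) in g if s == subject and p == "dpv:hasLocation"}


def locality_of(subject, g):
    """Return only the locality concepts; empty if locality was never stated."""
    return locations_of(subject, g) & LOCALITY


print(sorted(locations_of("ex:Server1", graph)))  # ['dpv:RemoteLocation', 'loc:IE']
print(locality_of("ex:Server1", graph))           # {'dpv:RemoteLocation'}
```

Without the second triple, `locality_of` returns an empty set: `loc:IE` by itself cannot tell a reader whether the server is local or remote.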
Other concepts present in DPV in addition to locality include public/private and location fixture (single, multiple, federated, etc.), which are similarly relevant. Therefore, the recommendation is to keep these core concepts in DPV, as they are useful in security and legal interpretations, and to move all other subjective locations to the LOC extension (see below).
Public/Private Locations
REQUIREMENT: Model locations as being public or private spaces in the sense of ownership and accessibility.
For public/private locations, the issue at hand is ensuring that our modelling of these concepts in DPV is aligned with their legal definitions and interpretations in a way that is sensible, useful, and comprehensible to others. Whereas public and private are broad labels used quite commonly, others, such as mixtures of the two, are not legal terms, although their descriptions are intended to support legal interpretations. For example, a private place that contains a publicly accessible portion has consequences to consider when installing CCTV to monitor specific areas. The proposed public/private taxonomy models such concepts so that it is explicit whether the space is publicly or privately owned, and whether it is accessible to the public or not.
The issue identified in the discussion was whether these concepts were correctly modelled and comprehensible. To address this, I have changed some of the labels so that they describe the category of location more explicitly. I have also changed their parent by removing subjective location, as these concepts are well defined and interpreted in law (as the modelling intends).
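The two dimensions the taxonomy makes explicit, ownership and accessibility, can be sketched as independent attributes. This is a minimal Python illustration and not the DPV/LOC concepts themselves: the enum names, the `Space` class, and the `is_hybrid` rule are all hypothetical. It shows why CCTV in a shop would need to treat the publicly accessible floor differently from the stockroom, even though both are privately owned.

```python
from dataclasses import dataclass
from enum import Enum


class Ownership(Enum):
    PUBLIC = "publicly owned"
    PRIVATE = "privately owned"


class Access(Enum):
    PUBLIC = "publicly accessible"
    RESTRICTED = "not publicly accessible"


@dataclass(frozen=True)
class Space:
    name: str
    ownership: Ownership
    access: Access

    def is_hybrid(self) -> bool:
        # Assumed definition: a space is hybrid when ownership and accessibility
        # point in different directions, e.g. privately owned but open to the public.
        return (self.ownership is Ownership.PRIVATE) != (self.access is Access.RESTRICTED)


# A privately owned shop: the floor is open to the public, the stockroom is not.
shop_floor = Space("shop floor", Ownership.PRIVATE, Access.PUBLIC)
stockroom = Space("stockroom", Ownership.PRIVATE, Access.RESTRICTED)

for area in (shop_floor, stockroom):
    print(f"{area.name}: {area.ownership.value}, {area.access.value}, hybrid={area.is_hybrid()}")
```

Keeping the two axes separate means no single public/private label has to carry both meanings at once.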
Subjective Labels as Locations
REQUIREMENT: Model commonly used 'places' such as 'home' and 'work'.
For subjective labels such as home and work used as locations, the first issue identified is that they are distinct from other geo-physical locations, such as cities and countries, which have a clear identity and presence. This raises the question of whether they should be modelled with the same category/type as a location. To address this, I have proposed the concept loc:SubjectiveLocation as the parent of all location concepts that require a subjective interpretation, such as home, which requires asking which home it is and where it is located.
REQUIREMENT: Model the distinction between being located inside and outside of that location.
The second issue is which prefix to use to refer to them, with the options suggested being AtHome and WithinHome, and it was discussed what each of these prefixes implies. Within refers to a boundary and being within that boundary, whereas At is more loosely defined and refers only to the overall space. Further, Within also implies the need to consider external spaces, i.e. if we want to say something is located within a specific place, then a distinction also needs to be made for the space outside of that place. This has implications for questions such as whether something was located within the workplace or outside of it.
It is important to highlight that when we refer to some location, it is always implied that we mean the boundary and the space within it. For example, if we are talking about home or work, we most definitely mean the space inside the home and the workplace. Therefore, if a distinction must be made for the space outside of these, then only that should be a separate concept. However, modelling this as a concept leads to unnecessary complications, and there is a better way: use a property such as dpv:isOutsideOfLocation, which by definition refers to places that are not the specified location. This removes the need to model each concept as insideX/outsideX.
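The property-based approach can be sketched as follows. This assumes a hypothetical mapping from subjective concepts (`ex:Home`, `ex:Work`) to the concrete places they resolve to for one person, and treats the proposed `dpv:isOutsideOfLocation` as a plain string. Each statement pairs a property with a concept, so the negation lives in the property rather than in a duplicated concept.

```python
# Hypothetical resolution of subjective concepts to concrete places for one person.
# In practice this mapping is supplied by the application or context.
EXTENT = {
    "ex:Home": {"12 Example Street"},
    "ex:Work": {"Example Ltd office"},
}


def satisfies(statement, place):
    """Check whether a concrete place satisfies a (property, concept) location statement."""
    prop, concept = statement
    inside = place in EXTENT[concept]
    if prop == "dpv:hasLocation":
        return inside          # the concept alone already means "within" it
    if prop == "dpv:isOutsideOfLocation":
        return not inside      # the property supplies the negation
    raise ValueError(f"unknown property: {prop}")


# One property covers every concept, instead of an insideX/outsideX pair per concept.
n = len(EXTENT)
print(f"{n} concepts + 1 property vs {2 * n} inside/outside concepts")
print(satisfies(("dpv:isOutsideOfLocation", "ex:Work"), "12 Example Street"))  # True
```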
If we choose to model the outside location as a distinct concept, it leads to the following issue. Assume we have the concept OutsideThisSpace as an abstract subjective concept, and ask users to combine it with other subjective concepts such as Home and Work to obtain the desired interpretation. Using a concept such as Home by itself already means within/at home, so a corresponding InsideThisSpace would be redundant. However, when combining Home with OutsideThisSpace, if the combination is read as an intersection of the places represented by inside home and outside home, the result is the empty set, i.e. it refers to no place at all. If the combination is instead read as a union, we get the universal set, i.e. it represents everything! Therefore, combining a concept with another concept that negates it is not a good idea. To highlight how this affects practicality: the accepted proposal for modelling inverted locations such as Non-EU uses a distinct concept to represent the external location. If we were instead to use EU and OutsideThisSpace together, the interpretation would be extremely fragile, as it relies on reading both concepts: if only EU is read, we get the opposite interpretation. Having Non-EU as a distinct concept that does not contain EU as a concept ensures, by definition and semantics, that such a misinterpretation cannot take place.
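The set-theoretic argument can be checked with a toy example. This sketch assumes a small, made-up universe of places and a hypothetical `resolve` function that, like a consumer unaware of some concepts, skips labels it does not understand. `OutsideThisSpace` is the abstract concept discussed above, not an existing DPV term.

```python
# A toy universe of places; all names are illustrative only.
UNIVERSE = frozenset({"home", "office", "park", "Dublin", "Paris", "New York"})
HOME = frozenset({"home"})
OUTSIDE_HOME = UNIVERSE - HOME

print(HOME & OUTSIDE_HOME)              # frozenset(): the intersection is empty
print(HOME | OUTSIDE_HOME == UNIVERSE)  # True: the union is everything

EU = frozenset({"Dublin", "Paris"})
CONCEPTS = {"EU": EU, "NonEU": UNIVERSE - EU}  # NonEU is its own concept


def resolve(annotation, understood):
    """Resolve labels in order, silently skipping labels the reader does not understand."""
    result = frozenset()
    for label in annotation:
        if label not in understood:
            continue                    # a partial reader drops unknown labels
        if label == "OutsideThisSpace":
            result = UNIVERSE - result  # negate whatever was resolved so far
        else:
            result = CONCEPTS[label]
    return result


full = {"EU", "NonEU", "OutsideThisSpace"}
partial = {"EU", "NonEU"}  # a reader that never learned OutsideThisSpace
print(resolve(["EU", "OutsideThisSpace"], full) == CONCEPTS["NonEU"])  # True
print(resolve(["EU", "OutsideThisSpace"], partial) == EU)              # True: opposite meaning
print(resolve(["NonEU"], partial) == CONCEPTS["NonEU"])                # True: cannot be misread
```

The pair `["EU", "OutsideThisSpace"]` means Non-EU only to a reader that understands both labels, whereas the single `NonEU` concept resolves the same way for every reader.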
REQUIREMENT: Clarify distinction and overlap between location and personal data
A third issue that arises when modelling such subjective locations is the need to distinguish them from similar, or sometimes identical, labels used to refer to personal data. For example, home can be a location and can also describe a person's home as their personal data. If we are modelling these as locations, how do we define such overlap, what are its implications, and, based on these, should they be modelled as personal data directly? An important observation here is that, in theory, all locations can be personal data, e.g. work (place) can refer to a person's or an employee's workplace, and a park (place) can refer to a place where the person spends time. However, they can also be used as non-personal data, e.g. work (place) can refer to the location of an organisation, and a park (place) can refer to a location used by a road planning committee, BUT such uses are less likely and are also not the main focus of the DPVCG.
As the PD taxonomy already provides concepts that model personal data, it should be the namespace in which a person's home, work, and other locations intended to be personal data are defined, with subjective location as their parent. Other places, which are unlikely to be personal data, should have their taxonomy in the LOC extension, e.g. beach, park, train station. This distinction between personal and non-personal spaces should also help with the DPVCG's objective of assisting with privacy categorisations (as we do in the PD extension by annotating concepts as sensitive data). In this manner, someone wanting to specify a location such as home has an easy check for whether it is likely to be personal data: the concept is in the PD extension.
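Under this proposal, the easy check amounts to looking at which extension a location concept comes from. A minimal sketch, assuming prefixed names and hypothetical concept IRIs (`pd:Home`, `pd:Workplace`, `loc:Beach`, `loc:Park`, `loc:TrainStation`) that are not confirmed DPV terms:

```python
def namespace(concept: str) -> str:
    """Return the prefix of a prefixed name such as 'pd:Home'."""
    prefix, sep, _ = concept.partition(":")
    if not sep:
        raise ValueError(f"not a prefixed name: {concept!r}")
    return prefix


def likely_personal_data(concept: str) -> bool:
    """Under the proposed split, location concepts defined in PD are likely personal data."""
    return namespace(concept) == "pd"


# Hypothetical concept names illustrating the proposed placement.
for c in ["pd:Home", "pd:Workplace", "loc:Beach", "loc:Park", "loc:TrainStation"]:
    print(c, likely_personal_data(c))
```

This is only a heuristic: as noted above, any location can be personal data in context, so a `loc:` concept used to describe a specific person's whereabouts still needs to be assessed.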
Virtual Locations
REQUIREMENT: Model virtual and digital spaces as locations
Another kind of subjective location is the phrasing used to refer to where data is stored. For example, 'your photos are stored only on your device', or 'all processing happens locally within the device'. Here, there is an underlying technology used to store the data (i.e. the device), and the reference to space is a boundary: being inside the device and not outside of it. Such distinctions are important for privacy as they imply that data is not transferred outside the device. They are also increasingly relevant for AI processing that happens on-device. The issue raised here was whether these are indeed location concepts or just facets of the technology, and therefore belong in the TECH extension (e.g. similar to defining a kind of access control).
The origin of these phrases is the need to define the location of data when no other location is available. For example, if an app stores data in its own namespace on the device and a location must be specified, the developer is likely to answer "within/on device". Note that this is a different question from how the data is saved, the answer to which is a form of data storage technology, and that both are necessary to get the complete picture. Thus, for digital technologies, the location used to store the data MUST by definition also be a technology. The question we are trying to answer here is whether we are referring to the data storage technology, or to its implications in terms of where the data is "located" and who can "access" it.
REQUIREMENT: Clarify overlap between location and technology concepts
A complication in creating technology-as-location concepts is that any technology can be a (data storage) location. For example, hardware such as a device, camera, or sensor, as well as software such as an app, can all be specified as being capable of storing data and thus being locations. Instead of creating a distinct taxonomy for each, these concepts should remain in the TECH extension (as they are now) and should not be repeated in other extensions (neither in DPV nor in LOC). Instead, we can interpret the requirement as wanting a way to specify that data is stored inside or outside that hardware/software. This makes the problem similar to the earlier requirement regarding 'inside/outside home', and thus we should be able to use the same solution. This means we use dpv:hasLocation tech:Something to mean data is stored in that technology and dpv:isOutsideOfLocation tech:Something to mean data is stored outside of that technology. Asserting that data is only stored in the technology is a technical/organisational measure and should be modelled as such, instead of as a location concept. This way, we don't need to duplicate technology concepts as locations, and it provides consistency when using technology concepts as location concepts. However, we should not define any or all technologies as instances of dpv:Location, as this won't always be true.
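Applied to technology, the same pattern needs no new location concepts. A minimal sketch with triples as Python tuples, assuming `tech:Device` as the TECH concept, `dpv:hasTechnicalMeasure` as the property for the "only on device" measure, and hypothetical `ex:` names:

```python
# Triples as (subject, predicate, object) tuples. tech:Device is assumed to be a
# TECH concept; ex:Photos, ex:Backups and ex:OnDeviceOnlyStorage are hypothetical.
graph = {
    ("ex:Photos", "dpv:hasLocation", "tech:Device"),           # stored in the device
    ("ex:Backups", "dpv:isOutsideOfLocation", "tech:Device"),  # stored outside it
    # "only stored on the device" is a measure, not a location
    ("ex:Photos", "dpv:hasTechnicalMeasure", "ex:OnDeviceOnlyStorage"),
}


def stored_in(subject, tech, g):
    """True if the data is asserted to be stored in the given technology."""
    return (subject, "dpv:hasLocation", tech) in g


def stored_outside(subject, tech, g):
    """True if the data is asserted to be stored outside the given technology."""
    return (subject, "dpv:isOutsideOfLocation", tech) in g


def tech_typed_as_location(g):
    """Flag TECH concepts asserted to be dpv:Location instances, which the proposal avoids."""
    return sorted(s for (s, p, o) in g
                  if s.startswith("tech:") and p == "rdf:type" and o == "dpv:Location")


print(stored_in("ex:Photos", "tech:Device", graph))        # True
print(stored_outside("ex:Backups", "tech:Device", graph))  # True
print(tech_typed_as_location(graph))                       # []
```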
Backwards Compatibility
REQUIREMENT: Ensure backwards compatibility
The DPV currently contains several location concepts under dpv:LocationLocality, such as dpv:WithinDevice and dpv:CloudLocation, that overlap incompatibly with the approaches proposed above. They would also fragment the location concepts across the DPV and LOC namespaces. The ideal fix would be to delete these concepts, as they are superseded by the new interpretation of directly using technology concepts as locations where needed; however, this breaks backwards compatibility for existing uses.
The criteria for deciding whether to break backwards compatibility are:
- How many uses that critically rely on the current modelling would be affected? We don't know the specifics of which implementations use these concepts, but it is reasonable to assume that they are in use somewhere, as they are relevant to several legal obligations. However, such uses are unlikely to be critically reliant on these concepts, as they are relatively minor compared to personal data and purposes in the broader scheme of things.
- Is there an alternative to the breaking concepts and is it comparable and compatible? Yes, we are proposing an alternative, though it is not a 1:1 replacement as the concepts are now spread across LOC and TECH extensions. However, all existing concepts will be covered i.e. the same information can be represented. Therefore, we are proposing a change rather than a complete removal.
- What is the cost of updating existing implementations to use the new concepts? If we only consider replacing concepts, the cost should be quite low as replacements exist. However, if the concepts were extended, i.e. a use-case has its own taxonomy expanding on the DPV concepts, then the cost could be higher. We do not know of any such cases at the moment.
Based on the above analysis, I propose that we break backwards compatibility and remove/replace the existing concepts, as the new model provides a better representation of concepts related to location, technology, and personal data without overlaps, while also addressing their intersection and how to expand their taxonomy in future.
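The cost argument can be made concrete: straight substitutions are mechanical, while extensions of the old concepts need manual attention. The following sketch assumes triples as Python tuples and a hypothetical replacement table (only `dpv:WithinDevice` is mapped, to `dpv:hasLocation tech:Device`, following the technology-as-location approach above). It is not an official migration path.

```python
# Deprecated DPV concepts named in this post.
DEPRECATED = {"dpv:WithinDevice", "dpv:CloudLocation"}

# Hypothetical mapping; tech:Device is assumed and the final replacements are for
# the DPVCG to decide. dpv:CloudLocation is deliberately left unmapped here.
REPLACEMENTS = {"dpv:WithinDevice": ("dpv:hasLocation", "tech:Device")}


def migrate(graph):
    """Return (new_graph, needs_review) after replacing deprecated location concepts."""
    new_graph, needs_review = set(), set()
    for s, p, o in graph:
        if p == "dpv:hasLocation" and o in REPLACEMENTS:
            new_graph.add((s, *REPLACEMENTS[o]))  # direct 1:1 replacement: low cost
            continue
        if o in DEPRECATED:
            needs_review.add((s, p, o))  # unmapped, or a local taxonomy extends it
        new_graph.add((s, p, o))
    return new_graph, needs_review


old = {
    ("ex:PhotoStore", "dpv:hasLocation", "dpv:WithinDevice"),
    ("ex:Backup", "dpv:hasLocation", "dpv:CloudLocation"),
    ("ex:SecureEnclave", "rdfs:subClassOf", "dpv:WithinDevice"),  # extended taxonomy
}
new, review = migrate(old)
print(sorted(new))
print(len(review), "triples need manual review")
```

The `rdfs:subClassOf` triple stands in for a use-case that built its own taxonomy on the old concepts: it is left intact and flagged rather than rewritten automatically.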