Student Project Ideas

Ideas for students to implement as projects
by Harshvardhan J. Pandit
tags: academic, projects, students

This page lists ideas for projects that relate to my research interests and for which I am willing to collaborate with and supervise students in their realisation. Each idea can be applied across multiple disciplines (e.g. computer science, law), and can be adapted to different skills and technologies (e.g. your favourite programming language) as well as expertise levels (e.g. Undergraduate, Masters, PhD, funded projects). Each project is expressed as multiple steps, each of which can be implemented individually for a smaller skillset (e.g. undergraduate) or combined for complex implementations (PhD or funded project).


Project Ideas

AI Risk Classification Tool

#AI #GDPR #law #python #javascript #DPV #RDF

tldr; you will develop a tool which will take some inputs (e.g. purpose, technology) describing the use of AI within a use-case, and will provide an output showing: risks in its use, legal implications (e.g. high-risk under GDPR, AI Act), as well as suggested measures to limit the risk (e.g. use specific ISO standards, improve accuracy of outputs, conduct a DPIA). You will extend existing prototype implementations.

Artificial Intelligence (AI) is progressing at an alarmingly rapid rate. As a technology with the potential to be applied and used everywhere, it comes with great risks which range from minor annoyances (e.g. a wrong spelling isn't detected) to major disruptions to society (e.g. humans are harmed, democratic elections are affected). One of the greatest challenges of our times, and what the AI Act primarily aims to address, is how to understand the risks of using AI within specific use-cases. To better explore the topic, we first need to understand: (1) What is meant by AI? (2) How can AI be used within specific use-cases? (3) How can using AI affect people, society, and organisations? To answer these, we researched and developed a 'framework' that uses 5 concepts: Domain (e.g. Education), Purpose (e.g. Identity Verification), Application or Capability (e.g. Facial Recognition), User or Operator (e.g. Lecturer), and Subject (e.g. Students). Using combinations of these, we express and identify risk categories, e.g. high-risk per the AI Act Annex III, or how a specific dataset or model might lead to high-risk uses. We also identify relevant risks for specific concepts within taxonomies created from the 5 concepts (e.g. what are the risks of facial recognition? what are the specific risks when facial recognition is used for identity verification?)

In this project, you will continue this work by further developing and extending the taxonomies and the tool used for risk classification. You can try a demo of the tool in action for the AI Act. The underlying combinations are expressed using the N3 language - see the GitHub repo. You will further develop the interface, create more rules, and add documentation. You will also make it easy for people to express their use-cases, for example, by providing options for popular uses such as using an LLM like ChatGPT to correct exam answers - and then explain the output (high-risk, impacts students) in terms of this use-case. If you do not have a sufficient programming background, you will instead conduct legal research, i.e. how the different use-cases from the AI Act and GDPR (DPIA high-risk categories) can be expressed in terms of these concepts, and assist in the documentation process, which does not require a programming background.
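The combination-based matching described above can be sketched in plain Python. The actual project expresses these rules in N3 over DPV concepts; every rule entry below is an illustrative placeholder, not part of the real taxonomy:

```python
# Illustrative sketch of combination-based risk classification using
# the 5 concepts (Domain, Purpose, Capability, User, Subject).
# The real project encodes such rules in N3; these entries are
# hypothetical examples, not the project's actual rules.

RULES = [
    # (required concept values) -> classification
    ({"purpose": "identity-verification", "capability": "facial-recognition"},
     "high-risk (AI Act Annex III: biometric identification)"),
    ({"domain": "education", "subject": "students"},
     "high-risk (AI Act Annex III: education and vocational training)"),
]

def classify(use_case: dict) -> list[str]:
    """Return every risk classification whose pattern the use-case satisfies."""
    matches = []
    for pattern, label in RULES:
        if all(use_case.get(k) == v for k, v in pattern.items()):
            matches.append(label)
    return matches or ["no specific risk category matched"]

use_case = {
    "domain": "education",
    "purpose": "identity-verification",
    "capability": "facial-recognition",
    "user": "lecturer",
    "subject": "students",
}
print(classify(use_case))
```

A use-case matching several patterns accumulates all of their classifications, which mirrors how a single deployment can be high-risk under multiple Annex III categories.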


Ethics Assessment / DPIA Support Tool

#ethics #GDPR #javascript #python #RDF #DPV

Description: Performing an ethics assessment or a Data Protection Impact Assessment (DPIA) is a key requirement whenever people and personal data are involved. Such assessments are increasingly being embedded into everyday workflows: students are required to assess their projects, researchers require ethics integration into their work as well as statements for publications, and organisations require it for GDPR. This project will develop a support tool to help students, researchers, and organisations conduct their ethics/GDPR applications more efficiently and to improve the process of identifying ethics / data protection risks as well as suitable measures for their mitigation.

The project involves work on building a web dashboard for assisting researchers with documentation for data protection and ethical clearance, and will provide an opportunity to participate in real-world applications of technology in the areas of data protection and ethics. The chief task would be to build an information system that will allow users to receive suggestions and guidance for addressing risks regarding ethics and privacy. The core idea of the project is based on the following steps.

  • An ethics assessment or a DPIA can be expressed as a form containing 'fields' which require specific information. The project requires identification of these fields - and a large body of work exists to do this in a machine-readable manner. For example, see A Semantic Specification for Data Protection Impact Assessments (DPIA) which expresses the information in a machine-readable form using the Data Privacy Vocabulary (DPV). The premise of these is that instead of entering textual information (i.e. sentences), semantic concepts can be used to ensure information is expressed in a form that can be programmatically checked and used.
  • Some information can be conditional based on other information, e.g. if the field 'will personal data be stored?' has the answer 'yes', then another field 'list categories of personal data' must have appropriate information. This requires checking that the necessary information exists.
  • Entered information is also required to be correct, e.g. 'encryption' is not a personal data category. Therefore, the entered information is also required to be checked for correctness.
  • Based on the correctly entered information, some inferences need to be drawn, e.g. consent is required, or a full DPIA or ethics assessment is required, which will determine the form of output provided back to the users. To do this, the concepts entered within the form need to be grouped into 'patterns' which when satisfied result in these inferences. For example, if personal data is collected - GDPR applies. Or, if consent is to be collected, consent records must be maintained. Based on these, a final outcome needs to be calculated for the form of assessment required, e.g. a self-assessment or a submission to a committee.
  • The information entered and the inferences drawn need to be recorded into logs recording the submission and decision.
  • Collecting and analysing the assessments will provide insight into how specific risks are associated with concepts. For example, the most common form of 'error' in the submission form might be incorrectly indicating personal data is not involved, or that consent is not required. Such analysis can be used to further refine the input form to ensure the inputs are correctly understood and entered. Further, specific risks for the most common uses can be identified and provided to assist in the form-filling process, e.g. if most submissions use an online service to conduct surveys - the form can specifically use this information to further ask if the online service is safe and appropriate to be used. Finally, the analysis should also provide a list of common concepts for inputs, e.g. 'how will personal data be stored?' can be accompanied with options for popular services such as Dropbox, Google Drive, OneDrive. This will simplify the form for most users and speed up the process of figuring out input information.
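The steps above can be sketched as a small Python prototype; the field names, vocabulary, and inference patterns below are all illustrative placeholders, not actual DPV terms or the tool's real logic:

```python
# Illustrative sketch of the form logic: conditional fields,
# vocabulary checks, and pattern-based inferences.
# All names and patterns are hypothetical placeholders.

PERSONAL_DATA_CATEGORIES = {"name", "email", "location", "health"}

CONDITIONAL = {
    # if this field has this value, these fields become required
    ("stores_personal_data", "yes"): ["personal_data_categories"],
}

INFERENCES = [
    (lambda f: f.get("stores_personal_data") == "yes", "GDPR applies"),
    (lambda f: f.get("legal_basis") == "consent",
     "consent records must be maintained"),
]

def validate(form: dict) -> list[str]:
    """Check conditional completeness and vocabulary correctness."""
    errors = []
    for (field, value), required in CONDITIONAL.items():
        if form.get(field) == value:
            for r in required:
                if not form.get(r):
                    errors.append(f"'{r}' is required when {field}={value}")
    for cat in form.get("personal_data_categories", []):
        if cat not in PERSONAL_DATA_CATEGORIES:
            errors.append(f"'{cat}' is not a known personal data category")
    return errors

def infer(form: dict) -> list[str]:
    """Draw inferences from the patterns the form satisfies."""
    return [outcome for test, outcome in INFERENCES if test(form)]

form = {"stores_personal_data": "yes",
        "personal_data_categories": ["email", "encryption"],
        "legal_basis": "consent"}
print(validate(form))  # flags 'encryption' as not a personal data category
print(infer(form))
```

The same validated form and its inferences would then be recorded into the submission log described in the steps above.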

Annotated Privacy Information Dataset of Apps from iOS and Android App-Stores

#privacy #smartphones #RDF #Python #JavaScript #BrowserAddon

Tldr; Scrape app information from the app stores and automatically annotate it with privacy-relevant information

Motivation: Apple’s iOS and Google’s Android are the dominant smartphone OSs, and through their respective App-Stores are responsible for providing the infrastructure and functionality to users for installing and managing applications on their devices. Increasingly, these companies are creating requirements for applications to declare information about their privacy practices, which not only include a privacy policy, but also information on what kinds of data the apps collect and how they use it [1]. Collecting this information in a machine-readable dataset can enable understanding current practices, pitfalls, and the privacy practices of apps - as well as tracking how they evolve with time.

Implementation: You will utilise a web-scraping [2] tool to parse the information from app-store pages. This can be written in any language of choice, though Python offers a large collection of ready-to-use frameworks. The information extracted from these pages will be defined using the Data Privacy Vocabulary (DPV) [3], a metadata specification for declaring how data is used. The formal representation used for DPV is the Resource Description Framework (RDF), which you will become familiar with in order to read and write it for generating the dataset / corpus. The actual data will be stored and managed using a relational database and SQL, such that it can be extracted and exported to RDF to provide interoperability.
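The storage-and-export step might look like the following sketch using stdlib sqlite3. The scraping itself is omitted, the rows are hand-written stand-ins for parsed app-store labels, and the DPV-style property names are simplified approximations of the vocabulary:

```python
import sqlite3

# Illustrative sketch: parsed app privacy labels go into a relational
# table, then get exported as RDF (Turtle). Rows and property names
# are simplified stand-ins, not real scraped data.

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE app_privacy (
    app_id TEXT, data_category TEXT, purpose TEXT)""")
db.executemany("INSERT INTO app_privacy VALUES (?, ?, ?)", [
    ("com.example.maps", "Location", "AppFunctionality"),
    ("com.example.maps", "Identifiers", "Advertising"),
])

def export_turtle(db) -> str:
    """Export each (app, data, purpose) row as Turtle statements."""
    lines = ["@prefix dpv: <https://w3id.org/dpv#> ."]
    for app_id, data, purpose in db.execute(
            "SELECT app_id, data_category, purpose FROM app_privacy"):
        lines.append(
            f"<urn:app:{app_id}> dpv:hasPersonalData dpv:{data} ;\n"
            f"    dpv:hasPurpose dpv:{purpose} .")
    return "\n".join(lines)

print(export_turtle(db))
```

Keeping SQL as the working store and exporting RDF on demand gives the interoperability described above without forcing every pipeline step to speak RDF.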

What you will learn: (i) How Apps utilise personal data; (ii) How to do web-scraping; (iii) How to work with concepts and semantics, e.g. when creating schemas; (iv) Creating and managing datasets as research resources; (v) What/Where more information is needed in App Stores

tldr; for a related idea: you will work on improving the 'transparency labels' shown in smartphone app stores (e.g. Apple's store provides 'Data used to Track you') by identifying additional information to be shown, and showing examples using labels developed from analysing selected applications. You will also create a 'machine-readable' version of this label so that it can be automatically detected and used programmatically, e.g. to filter all apps that do not track me.

For examples of what information can be put into a transparency label, see Data Privacy Vocabulary (DPV).

References:
[1] https://developer.apple.com/app-store/app-privacy-details/
[2] https://en.wikipedia.org/wiki/Web_scraping
[3] http://w3.org/ns/dpv

Automated Privacy Policy Generation Using Metadata and Templates

#GDPR #privacy #javascript #python #DPV #RDF

Tldr; Create different variations of privacy policies in terms of text and design by using templates and metadata containing the required information

Motivation: Privacy policies and Terms and Conditions, as they are presented on the web, are a long boring wall of legal text which is difficult to comprehend and use. There have been various avenues for making this simpler, such as summarising [1], or using machine learning to identify relevant information [2], and even alternatives such as visualising information [3]. However, instead of starting from a fixed given set of complicated text or a fixed layout, this research instead takes a different approach. It explores whether using metadata and a set of templates can make privacy policies easier to generate, offer different options to comprehend them, and enable alternative mediums such as visualisations to be easily implemented. It thus aims to show that metadata and automation can help people better understand privacy practices.

Implementation: You will analyse how privacy policies look, read, and affect comprehension of information through existing literature on these topics. You will then create ‘templates’ for policies - where a template is (simplified) some generic document with blanks that will be filled with use-case specific information. The template will be defined using a ‘templating library’ such as Jinja2 or Mustache. You will create different templates for rendering the same information in different sentences, layouts, and forms (e.g. visual, multimedia). The information used to fill in the template will be declared using the Data Privacy Vocabulary (DPV), a metadata specification for declaring how data is used. The formal representation used for DPV is the Resource Description Framework (RDF), which you will become familiar with in order to read and write it for generating the policies. The actual data will be stored and managed using JSON or JSON-LD, which makes it easier to use in the web browser and in javascript.
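The core idea can be sketched with Python's stdlib string.Template; a real implementation would use Jinja2 or Mustache and read the metadata from DPV-annotated JSON-LD. The metadata keys below are illustrative, not actual DPV terms:

```python
from string import Template

# Illustrative sketch: one set of metadata rendered through several
# templates to produce different wordings of the same policy clause.
# Keys and template texts are hypothetical placeholders.

metadata = {
    "data": "email address",
    "purpose": "sending our newsletter",
    "storage_period": "12 months",
}

templates = {
    "plain": Template(
        "We collect your $data for $purpose and keep it for $storage_period."),
    "first_person": Template(
        "You share your $data with us. We use it for $purpose, "
        "then delete it after $storage_period."),
}

# The same metadata renders into every template variant.
for name, tpl in templates.items():
    print(f"{name}: {tpl.substitute(metadata)}")
```

Because the metadata is separate from the wording, adding a visual or multimedia variant only means adding another template, not rewriting the policy.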

What you will learn: (i) Issues with existing privacy policies; (ii) How to automate documentation using templates; (iii) What information is relevant for understanding privacy and legal compliance; (iv) How people comprehend information; (v) How to query and use linked data in a web application.

References:
[1] https://tosdr.org/
[2] https://pribot.org/polisis
[3] Privacy CURE: Consent Comprehension Made Easy https://www.specialprivacy.eu/images/documents/IFIP_SEC_2020.pdf
[4] http://w3.org/ns/dpv

Tool to Report GDPR Violations in Online Services

#consent #GDPR #javascript #BrowserAddon

Motivation: The GDPR lays out specific requirements regarding how to inform people about e.g. data collection, how to exercise rights, and when requesting consent. Given that most of us interact with online services these days, it is difficult for authorities to manually inspect websites for compliance and produce the required documentation at a large scale. While technologies have been proposed to help automate the detection of issues - their outcomes need to be expressed with the correct legal terms and links back to specific GDPR clauses in order for them to be useful in legal investigations. This project will assist in these tasks by creating a browser extension which will enable any individual to highlight problematic parts of a website e.g. consent dialogues, and will automatically generate documentation containing links to appropriate legal clauses i.e. which specific parts of GDPR it violates.

Implementation: The project will involve understanding the requirements of GDPR and investigations into compliance violations. The project will create a browser extension - which requires knowledge of javascript and web development - to provide an interface to users to report and to capture evidence in the form of annotated screenshots e.g. problematic parts of a consent dialogue.

An early prototype has validated this idea; see this demo showing a browser extension that captures a screenshot of a consent dialogue and provides an interface where users can select their domain (e.g. lay person, HCI, law) to get a list of issues expressed in terms of their chosen domain. The tool then produces legally-worded documentation that lists those issues along with the specific clauses associated with each of them, with the intention that it be submitted to the Data Protection Officer (DPO) or the Authority. For more information, see Crowd-sourcing Multi-Domain Issues in Consent Dialogues for Automated Generation of Legal Complaints.

To implement these ideas, the project will involve the following steps:

  • Choosing a specific aspect of GDPR's requirements and the corresponding implementation in online services, e.g. consent and the consenting dialogue, or exercising of rights and the privacy policy or form. A list of 'issues' associated with the topic will be compiled from the state of the art.
  • The list of 'issues' will be expressed as lists for particular domains, e.g. a sentence for the lay person to express they don't understand what they are consenting to will be different from how a legal scholar would express it. Each such issue and sentence would be put into a database and linked with the relevant GDPR clauses.
  • A user interface needs to be developed for users to report the issues. This should involve capturing information automatically, e.g. the URL of the website they are on, the date and time, and their location based on IP address - which will be part of the final output document.
  • The tool should also provide the ability to capture an image of the website and to highlight specific parts of it in relation to an identified issue (see the demo mentioned earlier).
  • The tool should identify and insert relevant details such as the organisation name (e.g. Meta), service name (e.g. Facebook), and to whom the issue should be reported - i.e. the contact of the DPO as well as the Data Protection Authority (which can be 'local' e.g. Austria, or 'EU' e.g. Meta has its HQ in Ireland). Practically, this might mean manually creating a database containing this information for some selected popular services.
  • Based on the selected issues and the entered information, the tool should generate a document that contains the listed issues, relevant GDPR clauses, and annotated screenshots (if any). The document should be in a form that can be shared (e.g. PDF, copy text into an email) and exportable. It should provide suitable wording (e.g. To the DPO, please see below issues) with variations for different tasks (e.g. complaining to the DPO or an authority, assisting the DPO in identifying the issues, offering fixes, exercising rights under GDPR).
  • Based on the extent of this work, its feasibility and evaluation in terms of practicality, this can be turned into a community-initiative where the submissions are collected and shared with authorities or NGOs for further action.
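The document-generation step in these steps might be sketched as follows; the issue database, wording, and clause mappings are illustrative placeholders that a real tool would compile from proper legal analysis:

```python
# Illustrative sketch: selected issues map to GDPR clauses and are
# rendered into shareable complaint text. Issue entries, wording,
# and clause mappings are hypothetical examples.

ISSUE_DB = {
    "no-reject-button": {
        "lay": "There was no button to refuse consent.",
        "legal": "Consent cannot be freely given absent a refusal option.",
        "gdpr": ["Art. 4(11)", "Art. 7(3)"],
    },
    "pre-ticked-boxes": {
        "lay": "Options were already ticked for me.",
        "legal": "Pre-ticked boxes do not constitute valid consent.",
        "gdpr": ["Art. 4(11)", "Recital 32"],
    },
}

def generate_report(website: str, issue_ids: list[str],
                    audience: str = "legal") -> str:
    """Render selected issues, in the chosen domain's wording,
    with their linked GDPR clauses."""
    lines = [f"To the DPO: please see below issues observed on {website}."]
    for issue_id in issue_ids:
        issue = ISSUE_DB[issue_id]
        clauses = ", ".join(issue["gdpr"])
        lines.append(f"- {issue[audience]} (GDPR: {clauses})")
    return "\n".join(lines)

print(generate_report("example.com",
                      ["no-reject-button", "pre-ticked-boxes"]))
```

Switching the audience parameter between 'lay' and 'legal' illustrates the per-domain phrasing described in the second step, while the clause links supply the legally-worded output of the final step.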

Extracting Structured Metadata from Consent Dialogues

#privacy #consent #NLP #ML

Consent dialogue boxes are everywhere on the web - with the information geared towards making it easy for users to comprehend how their personal data is being used. However, this information is presented in a human-readable format with no way for machines to analyse it. The aim of this project is to extract this information and represent it as structured metadata to enable automation and analysis of privacy-based approaches. For example, the statement "We use your address to deliver goods you buy on our website" can be represented by: address - personal data, deliver goods - purpose, use - processing.

The project goal is to use NLP techniques to identify such categories by using classifiers and ML similar to existing work regarding privacy policies [1][2][3]. The extracted information would then be represented using vocabularies such as DPV [4].
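The extraction target can be illustrated with a naive keyword lookup; a real system would train NLP classifiers as in [1][2][3], and the category mappings here are illustrative rather than actual DPV terms:

```python
# Illustrative sketch of the desired output structure: phrases in a
# consent statement mapped to DPV-style categories. A real system
# would use trained classifiers; this lexicon is a hypothetical toy.

LEXICON = {
    "address": ("PersonalData", "Address"),
    "deliver goods": ("Purpose", "GoodsDelivery"),
    "use": ("Processing", "Use"),
}

def extract(statement: str) -> list[tuple[str, str, str]]:
    """Return (matched phrase, category, concept) triples."""
    text = statement.lower()
    return [(phrase, cat, concept)
            for phrase, (cat, concept) in LEXICON.items()
            if phrase in text]

stmt = "We use your address to deliver goods you buy on our website"
print(extract(stmt))
```

Whatever model produces them, triples in this shape are what would then be serialised using DPV [4] for downstream analysis.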

[1] Usable Privacy Project
[2] Pribot
[3] CLAUDETTE
[4] DPV

Programmatic Privacy Notices and Dialogues

#privacy #consent #cookies #Browser #signals #internet #GDPR #Javascript

Tldr; Creating APIs for programmatically generating notices and dialogues, such as those used for consent and cookies on websites, from metadata (e.g. JSON) to avoid known issues (e.g. dark patterns).

Motivation: Consent and cookie dialogues are a plague on the web - they’re there on every website, and most people click on the ‘Agree’ button without reading or understanding what they just agreed to. Even though this has been shown to violate data protection and privacy laws [1], enforcement takes time and is difficult to undertake at the scale of the web. It is difficult to create a singular solution that is acceptable to all parties (users, service providers, authorities), which makes it difficult to achieve common goals. This research explores how a web browser can provide a set of APIs to generate privacy notices and dialogues on the user side, with different methods or options offering various levels of control to websites and users over their interactions.

Implementation: You will understand how current privacy notices and consent dialogues function in terms of information, content, legal requirements, and technical implementations. You will then identify different types of APIs that can automate the generation of different components (e.g. API for notice, for showing options, for giving consent). You will implement these using CSS and JS, and test them using browser addons.

What you will learn: (i) What are privacy notices and consent dialogues; (ii) How to create and implement APIs based on different stakeholder requirements; (iii) Legal and Privacy implications of your developed technologies; (iv) Challenges associated with developing privacy solutions

Implementing ISO/IEC 29184 privacy notices

#privacy #consent #JavaScript #BrowserAddon #GDPR

tldr; The ISO/IEC 29184 is a standard specifying content of privacy notices, how they should be presented, and how they can be used - including machine-readable versions. This project will implement the standard to create privacy notices that conform with the standard, assess how the standard addresses existing issues regarding consent and privacy notices, and provide reusable components for others to improve their notices.

A Privacy Signal for Automating Consent Interactions

#privacy #browser #signaling #internet #GDPR #BrowserAddon #Javascript

tldr; Creating a browser signal that helps automate some of the interactions on notices and consent dialogues online

Motivation: Consent and cookie dialogues are a plague on the web - they’re there on every website, and most people click on the ‘Agree’ button without reading or understanding what they just agreed to. Even though this has been shown to violate data protection and privacy laws [1], enforcement takes time and is difficult to undertake at the scale of the web. Instead, there is a growing call for easier-to-use and easier-to-enforce ‘signals’ which indicate privacy preferences in a human-centric manner. Previously, Do Not Track (DNT) and Platform for Privacy Preferences (P3P) were two major efforts which failed to gain adoption. The current ones showing promise are Global Privacy Control (GPC) [2] - which can prohibit any further sharing of data - and the Advanced Data Protection Control (ADPC) [3], which enables machine-readable requests and indications of preferences to permit or prohibit certain actions. While uptake of both increases, there are two important areas of implementation and research: (1) what language to use within the signals so that both users and websites interpret it in the same manner; and (2) how to manage these within the browser.

Implementation: You will understand the DNT, P3P, GPC and ADPC specifications. They’re fairly technical, which means you will also learn how they work through HTTP communication protocols. You will then explore how ADPC can be implemented using HTTP. Within ADPC, a language is necessary for expressing whether something is permitted or prohibited and what it is (e.g. a purpose, some personal data). For expressing these concepts, you will use the Data Privacy Vocabulary (DPV) [4], a metadata specification for declaring how data is used. To test the developed signal, you will create browser addons and a (sample) backend server for creating and managing the preferences set within ADPC, and for testing the communication between users and websites. You will also investigate expressing consenting choices as rules or heuristics of the kind: only 1st party; only analytics; no third-party ads - and see whether this affects consenting behaviours or notices.
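The user-side rule logic might be sketched as follows; note that the header syntax below is a simplified illustration for discussion, not the exact ADPC wire format defined in the specification [3]:

```python
# Illustrative sketch: user rules like "only analytics" or
# "no third-party ads" evaluated against a website's requested
# purposes and serialised into an ADPC-style header. The rule names
# and header syntax are hypothetical simplifications.

USER_RULES = {
    "analytics": "allow",
    "third-party-ads": "deny",
    "personalisation": "deny",
}

def build_signal(requested_purposes: list[str]) -> str:
    """Split requested purposes into consented vs withdrawn,
    then serialise as a simplified ADPC-style header."""
    consented = [p for p in requested_purposes
                 if USER_RULES.get(p) == "allow"]
    withdrawn = [p for p in requested_purposes
                 if USER_RULES.get(p) != "allow"]
    parts = []
    if consented:
        parts.append("consent=" + " ".join(consented))
    if withdrawn:
        parts.append("withdraw=" + " ".join(withdrawn))
    return "ADPC: " + ", ".join(parts)

print(build_signal(["analytics", "third-party-ads"]))
```

The open research question (1) above is precisely what vocabulary replaces these hypothetical purpose strings - which is where DPV [4] comes in.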

What you will learn: (i) What information is relevant when making privacy decisions about personal data use and sharing; (ii) HTTP protocols and how the web functions; (iii) How to express permissions or prohibitions for data sharing; (iv) Legal and Privacy implications of permitting or prohibiting websites from using data; (v) The next generation of privacy signals within the browser.

References:
[1] Do Cookie Banners Respect my Choice?: Measuring Legal Compliance of Banners from IAB Europe’s Transparency and Consent Framework https://arxiv.org/pdf/1911.09964
[2] https://globalprivacycontrol.github.io/gpc-spec/
[3] https://www.dataprotectioncontrol.org/spec/
[4] https://w3id.org/dpv

Recording online consent via browser extension

#privacy #consent #javascript #GDPR

Description: The "I Agree" button has become inescapable while browsing the web. While it is present as a legal requirement for collecting consent, once we have clicked the button, we have no record of what we just agreed to. In this project, you will be creating a digital receipt to record the given consent and the information associated with it.

The goal is to create a browser extension that automatically captures the information in a consent dialogue box, and enables the user to later view it in a dashboard. It will use existing standards such as Consent Receipt [1] and Data Privacy Vocabulary [2] to record this information.
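The receipt the extension stores could be a simple JSON record; the field names below loosely follow the spirit of the Consent Receipt specification [1] but are simplified assumptions, not its actual schema:

```python
import json
from datetime import datetime, timezone

# Illustrative sketch of a stored consent receipt: what was agreed
# to, when, and where. Field names are simplified approximations,
# not the Consent Receipt specification's real schema.

def make_receipt(website: str, purposes: list[str],
                 data_categories: list[str]) -> str:
    """Serialise a consent event as a JSON receipt."""
    receipt = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "website": website,
        "purposes": purposes,
        "personal_data": data_categories,
        "action": "consent-given",
    }
    return json.dumps(receipt, indent=2)

print(make_receipt("example.com", ["analytics", "ads"],
                   ["cookies", "ip-address"]))
```

A dashboard can then simply list and filter these stored receipts, giving the user the record of "what did I agree to?" that the dialogue itself never provides.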

This project will provide exposure to front-end development in real-world websites, and an opportunity to increase transparency online regarding privacy. It will also provide a learning experience in the use of programming tools (e.g. git) and research-based workflows.

Pre-requisites: Good working knowledge of Javascript/CSS and its use in web-pages

[1] Consent Receipt https://kantarainitiative.org/confluence/display/infosharing/Consent+Receipt+Specification
[2] DPV http://w3.org/ns/dpv


Old Ideas that are no longer supported

These ideas were proposed earlier and I am currently not accepting these for supervising student projects. I will support their implementation by sharing knowledge and providing guidance - so if you are interested in any of these then you should email me.

Building a Registry of CCTV Notices and their Privacy Practices in Dublin

#privacy #surveillance #dashboard #GDPR #BrowserAddon #Javascript

Tldr; Building a crowdsourced registry and dashboard of CCTV notices in Dublin and the privacy practices they describe.

Motivation: GDPR requires CCTVs to be accompanied with notices explaining who is operating them, what kind of data is collected and retained, what it is used for, any possible use of techniques such as facial recognition, etc [1]. It is difficult to identify such notices for CCTV, and to keep track of them all together. It would be useful for the general public, authorities, and other CCTV users to understand the contents of these notices, and how they operate under GDPR.

Implementation: You will learn how CCTV notices are used in real life, what information they contain, and the kinds of technologies involved (e.g. types of cameras). You will collect examples of real-life CCTVs and their accompanying notices, store them in a database, and build a dashboard to provide convenient access to this information. The dashboard will enable users to see an overview of CCTV usage (data categories, purposes, controllers) and their locations (e.g. on a map). You will build a form or input provider for crowdsourcing this information. You will write your report based on this implementation and an analysis of the collected information (e.g. comments on the data being collected, technologies involved, difficulty of obtaining this information).

What you will learn: (i) How CCTVs work in real-life, and their requirements under GDPR regarding privacy, transparency, and notices; (ii) Information gathering and data modeling; (iii) Building a dashboard based on functional and non-functional requirements; (iv) Analysis of technologies regarding privacy risks and GDPR

References:
[1] https://www.dataprotection.ie/en/dpc-guidance/guidance-on-the-use-of-cctv

Extracting Structured Metadata From Privacy Policies

#privacy #privacy-policy #NLP #ML

Privacy policies are notoriously difficult to read. One of the challenges is the use of legal and intentionally obfuscating language. Though the GDPR has made it a legal requirement to use clear language in policies, there is still a barrier to effective transparency regarding the information presented in such policies. This project aims to extract information from the policy - such as sources of data, their use in processes, legal basis, and storage periods - and express it as structured metadata for use in research that aims to simplify privacy policies via techniques such as summarisation and visualisation.

The project goal is to use NLP techniques to identify relevant information by using classifiers and ML [1][2][3], which would enable extraction of information from the text of privacy policies, and to represent it using vocabularies such as DPV [4] and GDPRov [5].

[1] Usable Privacy Project
[2] Pribot
[3] CLAUDETTE
[4] DPV
[5] GDPRov

DPA Grace Periods (this is more lawyery)

#GDPR

Who gives grace periods for enforcement - table of DPAs. DPC has one for 5 Oct 2020. Why grace periods so late after GDPR? e.g. 2016 published, 2018 enforcement, then grace periods after that. Do DPAs have the authority to give grace periods? Rationale of grace periods could be to clarify unknown ambiguity; for known or repeated implementation issues, grace periods legitimise illegal data processing. Legality of data processing conducted up to or before grace periods - has there been an indication to cancel or stop that processing, e.g. to ask for consent again - has this been mentioned in the guidelines?

Crumple: Folding Privacy Policies via Summaries

#privacy #privacy-policy #python #NLP

Privacy policies can be made easier to digest if they are provided as efficient summaries which users can read and understand quickly. This project will attempt to assist in the understanding of a privacy policy by abstracting or folding larger sections into shorter summaries. This will be done by analysing the text using NLP and identifying relevant information to provide a summary.

An example of a crumpled summary is one stating that data will never be shared with third parties - where the expanded version describes that data will be shared only with data processors, and lists the purposes and technical measures that safeguard the transmissions. Another example is where data will only be shared for legally required obligations - with the expanded version describing what those legal requirements are and how the data will be shared.

Scoring Privacy Policies For Transparency and Readability

#privacy #privacy-policy #NLP #ML

Privacy policies are notoriously difficult to read and understand, chiefly because of the obfuscated legal language used and the confusing structure. Though the GDPR has strived to provide more transparency in the language used, there is no established way to evaluate such policies. The aim of this project is to identify metrics for transparency and readability in privacy policies and to score a given policy using them. An example metric could be the categorisation of information, where the policy has separate structures explaining data collection, sharing, etc. The project will use NLP to identify relevant clauses in the text of the policy, and ML to classify the policy using the generated metrics.
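A scoring pipeline might start from simple proxies like these; both metrics below are illustrative placeholders, and the real metrics would come out of the project's literature review:

```python
import re

# Illustrative sketch of two toy metrics: average sentence length
# (readability proxy) and coverage of expected topics (transparency
# proxy). The topic list and scoring are hypothetical placeholders.

EXPECTED_TOPICS = ["collect", "share", "retain", "rights"]

def score_policy(text: str) -> dict:
    """Score a policy on sentence length and topic coverage."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words_per_sentence = (sum(len(s.split()) for s in sentences)
                          / len(sentences))
    lower = text.lower()
    coverage = (sum(t in lower for t in EXPECTED_TOPICS)
                / len(EXPECTED_TOPICS))
    return {"avg_sentence_length": round(words_per_sentence, 1),
            "topic_coverage": coverage}

policy = ("We collect your email. We share data with partners. "
          "You may exercise your rights at any time.")
print(score_policy(policy))
```

Once such metrics are fixed, scores computed over a corpus of policies become the labels or features for the ML classification step described above.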

Accessibility-like requirements for Privacy

#privacy #JavaScript #BrowserAddon

There are accessibility features that are meant to help people with dis-abilities to see and view and hear and interpret and understand information and interface. These features also end up making the system better for use for the majority of users by virtue of better design and control. Can something similar be done for privacy based on vulnerability and special categories of data and inviduals?

Privacy Policy Visualisation

#GDPR #privacy #Python #Javascript

This project will research different visual mediums and modalities to allow the user to comprehend and interact with options instead of textual policies which are difficult to read. Use of visual information can include icons, diagrams, graphical interfaces, or different types of controls.

Privacy Policy Personalisation

#GDPR #privacy #Python #Javascript

Privacy policies are written in a way that makes it difficult to understand which conditions apply to the current user and what exactly is happening with their data. For example, using terms such as "may apply" is confusing. This project will research ways to personalise the privacy policy or notice to the user and generate text for what is precisely happening for that user. For example, "you used X service, for which we do Y things" and "you have not used X service, therefore the following Y things do not happen in your case".
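The personalisation idea can be sketched as follows; the service names and clause texts are illustrative stand-ins for a real policy's clauses:

```python
# Illustrative sketch: instead of "may apply" wording, notice text
# is generated from the user's actual service usage. Service names
# and clause texts are hypothetical examples.

CLAUSES = {
    "location-service": "we process your location to show nearby results",
    "newsletter": "we use your email to send you a weekly newsletter",
}

def personalised_policy(used_services: set[str]) -> str:
    """Render each clause as applying or explicitly not applying."""
    lines = []
    for service, clause in CLAUSES.items():
        if service in used_services:
            lines.append(f"You used {service}, so {clause}.")
        else:
            lines.append(f"You have not used {service}, so the following "
                         f"does not happen in your case: {clause}.")
    return "\n".join(lines)

print(personalised_policy({"newsletter"}))
```

Every clause is rendered either way, so the user sees explicitly which conditions do not apply to them rather than an ambiguous "may apply".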

Testing 'benefits' of consent

#consent #GDPR #privacy #ads

tldr; giving consent is assumed to provide some benefit, investigate what the benefit is in terms of extent and effectiveness

When an individual provides their consent to the collection, use, and sharing of their personal data, one question is what benefit or value they receive in return [3]. In the context of websites and consent, the purported benefit is commonly associated with improvements to the individual's experience, such as personalisation of the content or ads they see [1]. However, there is no tangible or demonstrable way to perceive the difference in such benefits when consent is given as compared to when it is refused. In addition, the process of giving consent is complicated and unclear to individuals [2] and relies on purpose descriptions [4] that further obscure its transparency and impact.

To investigate this situation further, this work requires the researcher(s) to visit websites and interact with consent dialogues to identify and record: (a) whether benefits exist in return for consenting; (b) whether they are clear and comprehensible; (c) whether they can be seen or demonstrated; and (d) what 'value' they provide to the individual. For example, when a website says that consent will enable personalisation of the ads shown, does it also clarify where the ads will be shown, in what form and manner, and whether the individual can identify ads being influenced by their consent? Through such analysis, the work seeks to determine what 'value' is obtained in return for consent, its scope and form, and whether the impact of consent is transparent to the individual. The primary articles guiding this work are [4] for describing the purposes of consent and [2] for establishing the human-centric view of understanding and comprehension when consenting.

References
[1] S. J. De and A. Imine, “Consent for targeted advertising: the case of Facebook,” AI & Soc, May 2020, doi: 10/ggzp38.
[2] S. Human and F. Cech, “A Human-centric Perspective on Digital Consenting: The Case of GAFAM,” presented at the Human Centred Intelligent Systems 2020, Split, Croatia, 2020, Accessed: Jun. 08, 2020. [Online]. Available: https://epub.wu.ac.at/7523/.
[3] G. Malgieri and B. Custers, “Pricing privacy – the right to know the value of your personal data,” Computer Law & Security Review, vol. 34, no. 2, pp. 289–303, Apr. 2018, doi: 10/gc7nbt.
[4] C. Matte, C. Santos, and N. Bielova, “Purposes in IAB Europe’s TCF: which legal basis and how are they used by advertisers?,” presented at the Annual Privacy Forum (APF 2020), Oct. 2020, Accessed: May 27, 2020. [Online]. Available: https://hal.inria.fr/hal-02566891.

Implementing Privacy Signals in Browsers

#privacy #browser #internet #signals #BrowserAddon #Javascript #RDF #GDPR

tldr; create and analyse different methods, and their impact, for using and managing ADPC signals that indicate a user's privacy preferences via basic HTTP communication

Motivation: Consent and cookie dialogues are a plague on the web - they're on every website, and most people click the 'Agree' button without reading or understanding what they just agreed to. Even though such practices have been shown to violate data protection and privacy laws [1], enforcement takes time and is difficult to undertake at the scale of the web. Instead, there is a growing call for privacy 'signals' that are easier to use and enforce, and that indicate privacy preferences in a human-centric manner. Two important ones are Global Privacy Control (GPC) [2], which can prohibit any further sharing of data, and Advanced Data Protection Control (ADPC) [3], which enables machine-readable requests and the indication of preferences to permit or prohibit certain actions. As uptake of both increases, there are two important areas of implementation and research: (1) what language to use within the signals so that users and websites interpret them in the same manner; and (2) how to manage these signals within the browser.

Implementation: You will study both the ADPC and GPC specifications. They are fairly technical, which means you will also learn HTTP communication protocols. You will then explore how ADPC can be implemented using HTTP. Within ADPC, a language is necessary for expressing whether something is permitted or prohibited and what that something is (e.g. a purpose, some personal data). For expressing these concepts, you will use the Data Privacy Vocabulary (DPV) [4], a metadata specification for declaring how data is used. The formal representation used for DPV is the Resource Description Framework (RDF), which you will become familiar with in order to read and write it. To test ADPC, you will create a simple browser add-on and backend server for creating and managing the preferences set within ADPC, and for testing the communication between users and websites. You will explore different ADPC iterations in terms of how they affect the size of the signal (HTTP signals are expected to be small), their impact in terms of preferences (how much can be expressed), and what kinds of information can be sent this way (e.g. a preference about personal data can be shared, but not who the data is shared with).
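The general shape of the exchange - preferences serialised into a small HTTP header value, parsed back on the other side - can be sketched as below. Note that the header member names (`consent`, `withdraw`) and their syntax here are illustrative; the exact header name and grammar must be taken from the ADPC specification [3] itself.

```python
# Illustrative sketch of a preference signal carried in an HTTP header,
# in the general style ADPC uses. Member names and syntax are assumed
# for illustration; consult the ADPC spec for the real grammar.
def build_preference_header(consented, withdrawn):
    # e.g. consented={"analytics"}, withdrawn={"ads"}
    #  -> 'consent="analytics", withdraw="ads"'
    parts = []
    if consented:
        parts.append(f'consent="{" ".join(sorted(consented))}"')
    if withdrawn:
        parts.append(f'withdraw="{" ".join(sorted(withdrawn))}"')
    return ", ".join(parts)

def parse_preference_header(value):
    # Invert build_preference_header: split members, strip quotes,
    # and return each member's identifiers as a set.
    prefs = {}
    for member in value.split(","):
        key, _, quoted = member.strip().partition("=")
        prefs[key] = set(quoted.strip('"').split())
    return prefs

header = build_preference_header({"analytics"}, {"ads"})
print(header)
print(len(header))  # size matters: HTTP signals are expected to be small
print(parse_preference_header(header))
```

Measuring `len(header)` across different vocabularies (e.g. DPV purpose IRIs versus short tokens) is one concrete way to compare ADPC iterations by signal size, as described above.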

What you will learn: (i) What information is relevant when making privacy decisions about personal data use and sharing; (ii) HTTP protocols and how the web functions; (iii) How to express permissions or prohibitions for data sharing; (iv) Legal and Privacy implications of permitting or prohibiting websites from using data; (v) The next generation of privacy signals within the browser.

References:
[1] Do Cookie Banners Respect my Choice?: Measuring Legal Compliance of Banners from IAB Europe’s Transparency and Consent Framework https://arxiv.org/pdf/1911.09964
[2] https://globalprivacycontrol.github.io/gpc-spec/
[3] https://www.dataprotectioncontrol.org/spec/
[4] http://w3.org/ns/dpv

An Ad-blocker for Cookie and Consent Dialogues online

#consent #GDPR #Javascript #BrowserAddon #RDF

Motivation: The GDPR’s extensive requirements for valid consent have filled the internet with consent dialogues that are a persistent annoyance on the web. Websites request consent because they are legally required to do so, and in the process many use deceptive and manipulative practices to push the individual into giving consent. This project involves creating a browser extension, similar to an ad-blocker, that blocks these consent requests. The project will also study the effect of such blocking on the actions of the website: in principle, websites are only supposed to collect and process data after consent, so blocking the consent dialogue should result in no data collection.

Implementation: The project will involve understanding how consent is requested on the internet and the mechanisms of consent dialogues on websites. The project will create a browser extension - which requires knowledge of JavaScript and web development - to capture the problematic parts of a consent dialogue and tag them with common violations linked to specific requirements in the GDPR.

Attaching Risk Factors to Consent Dialogues

#privacy #ethics #consent #javascript

The notion of informed consent in the GDPR requires the individual to also be notified about potential risks associated with the processing of their personal data. However, consent dialogues contain only information about the intended use of personal data and provide no information about the risks of sharing it. The aim of this project is to attach risk information to the relevant parts of a consent dialogue so that the user can make a balanced judgement about their consent and the use of their personal data. The project involves detecting categories of information within a consent dialogue and associating them with an existing corpus of privacy risks by visually annotating the dialogue.
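The core association step - once categories have been detected in the dialogue - is a lookup from category to risks. The sketch below uses a hypothetical risk corpus; the categories and risk labels are invented for illustration, and a real project would draw them from an established privacy-risk taxonomy.

```python
# Hypothetical corpus mapping data categories to privacy risks.
# In the project this would come from an existing risk taxonomy.
RISK_CORPUS = {
    "location": ["movement profiling", "physical stalking"],
    "email": ["spam", "phishing"],
    "browsing history": ["behavioural profiling", "price discrimination"],
}

def annotate(categories):
    # For each category detected in the dialogue, return the risks to
    # display alongside it in the visual annotation.
    return {c: RISK_CORPUS.get(c, ["no known risks recorded"])
            for c in categories}

print(annotate(["location", "email"]))
```

The harder parts of the project - detecting the categories in arbitrary dialogue text and rendering the annotations in the page - sit on either side of this lookup.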