Student Project Ideas

Ideas for students to implement as projects
by Harshvardhan J. Pandit
academic projects students

Consent Dashboard for the Browser

#consent #GDPR #privacy #BrowserAddon #Javascript

Privacy Policy Personalisation

#GDPR #privacy #Python #Javascript

Privacy Policy Visualisation

#GDPR #privacy #Python #Javascript

Analysis of GDPR Rights implementations in the Real-world

#GDPR #rights

Demonstrating Potential for Inferred Personal Data

#privacy #Python

Implementing ISO/IEC 29184 privacy notices

#privacy #consent #JavaScript #BrowserAddon #GDPR

Accessibility-like requirements for Privacy

#privacy #JavaScript #BrowserAddon

Accessibility features are meant to help people with disabilities to see, hear, interpret, and understand information and interfaces. These features also end up making the system better for the majority of users by virtue of better design and control. Can something similar be done for privacy, based on vulnerability and special categories of data and individuals?

Consent/Preferences as heuristics

#consent #privacy #JavaScript

Express consent choices as rules or heuristics such as: only first party; only analytics; no third-party ads. Then investigate whether this affects consenting behaviours or notices.
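
As a minimal sketch of how such heuristics could be expressed and evaluated: the rule names and the fields of the request below are hypothetical and only illustrate the idea, they are not taken from any existing tool or standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical description of what a website asks consent for."""
    party: str    # "first" or "third"
    purpose: str  # e.g. "analytics", "ads", "personalisation"


# Each heuristic is a named predicate over a request (names are assumptions).
HEURISTICS = {
    "only-first-party": lambda r: r.party == "first",
    "only-analytics": lambda r: r.purpose == "analytics",
    "no-third-party-ads": lambda r: not (r.party == "third" and r.purpose == "ads"),
}


def decide(request, active):
    """Consent only if every active heuristic accepts the request."""
    return all(HEURISTICS[name](request) for name in active)


prefs = ["no-third-party-ads"]
print(decide(Request("third", "ads"), prefs))        # False
print(decide(Request("first", "analytics"), prefs))  # True
```

Combining several rules (e.g. "only-first-party" together with "only-analytics") makes the decision stricter, which is one way to study how heuristics change consenting behaviour.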

DPA Grace Periods (this is more lawyery)

#GDPR

Which DPAs give grace periods for enforcement? Compile a table of DPAs, e.g. the DPC has one for 5 October 2020. Why are grace periods given so late after the GDPR, e.g. published in 2016, enforced from 2018, and grace periods after that? Do DPAs have the authority to give grace periods? The rationale for grace periods could be to clarify unknown ambiguity; for known or repeated implementations, grace periods legitimise illegal data processing. What is the legality of data processing conducted up to or before the grace periods: has there been any indication to cancel or stop that processing, e.g. to ask for consent again, and has this been mentioned in the guidelines?

DPV labels for Apps

#privacy #smartphones #RDF #Python #JavaScript #BrowserAddon

Provide more information within app stores and on smartphone devices instead of just the categories of data used for tracking. This information is based on DPV concepts, i.e. a DPV label for an app instead of ATT.

Building a Registry of CCTV Notices and their Privacy Practices in Dublin

#privacy #surveillance #dashboard #GDPR #BrowserAddon #Javascript

Tldr; Collecting real-life CCTV notices in Dublin into a registry, and building a dashboard that gives an overview of their privacy practices.

Motivation: GDPR requires CCTVs to be accompanied with notices explaining who is operating them, what kind of data is collected and retained, what it is used for, any possible use of techniques such as facial recognition, etc [1]. It is difficult to identify such notices for CCTV, and to keep track of them all together. It would be useful for the general public, authorities, and other CCTV users to understand the contents of these notices, and how they operate under GDPR.

Implementation: You will learn how CCTV notices are used in real-life, what their information is, and the kinds of technologies involved (e.g. types of cameras). You will collect examples of real-life CCTV and their accompanying notices, store them in a database, and build a dashboard to provide convenient access to this information. The dashboard will enable users to see an overview of CCTV usage (data categories, purposes, controllers) and their locations (e.g. on a map). You will build a form or input provider for crowdsourcing this information. You will write your report based on this implementation and analysis of collected information (e.g. comment on data being collected, technologies involved, difficulty of obtaining this information).
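
A minimal sketch of what the underlying data model could look like, assuming a single SQLite table. The column names and the sample entries are made up for illustration; a real registry would derive its fields from the notices actually collected.

```python
import sqlite3

# Hypothetical schema: one row per CCTV notice.
SCHEMA = """
CREATE TABLE IF NOT EXISTS notice (
    id INTEGER PRIMARY KEY,
    controller TEXT NOT NULL,
    purpose TEXT NOT NULL,
    retention_days INTEGER,
    facial_recognition INTEGER NOT NULL DEFAULT 0,
    latitude REAL NOT NULL,
    longitude REAL NOT NULL
);
"""


def open_registry(path=":memory:"):
    """Open (or create) the registry database."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_notice(conn, controller, purpose, retention_days,
               facial_recognition, latitude, longitude):
    """Store one notice, e.g. submitted through a crowdsourcing form."""
    cursor = conn.execute(
        "INSERT INTO notice (controller, purpose, retention_days,"
        " facial_recognition, latitude, longitude) VALUES (?, ?, ?, ?, ?, ?)",
        (controller, purpose, retention_days, int(facial_recognition),
         latitude, longitude),
    )
    conn.commit()
    return cursor.lastrowid


def purposes_overview(conn):
    """Count notices per purpose, as a dashboard overview might show."""
    rows = conn.execute("SELECT purpose, COUNT(*) FROM notice GROUP BY purpose")
    return dict(rows.fetchall())


# Made-up sample entries with coordinates in central Dublin.
registry = open_registry()
add_notice(registry, "Example Shop Ltd", "security", 30, False, 53.3498, -6.2603)
add_notice(registry, "Example Car Park", "security", 7, False, 53.3440, -6.2672)
add_notice(registry, "Example Arena", "crowd management", None, True, 53.3475, -6.2285)
print(purposes_overview(registry))
```

Storing latitude and longitude per notice keeps the map view simple; a retention period that is missing from a notice is recorded as NULL so that the gap itself can be analysed.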

What you will learn: (i) How CCTVs work in real-life, and their requirements under GDPR regarding privacy, transparency, and notices (ii) Information gathering and data modeling (iii) Building a dashboard based on functional and non-functional requirements (iv) Analysis of technologies regarding privacy risks and GDPR

References:
[1] https://www.dataprotection.ie/en/dpc-guidance/guidance-on-the-use-of-cctv

Browser extension for Documenting Legal Compliance Violations in Consent Dialogues

#consent #GDPR #Javascript #BrowserAddon #privacy #RDF

Motivation: The GDPR lays out specific requirements on how to request and collect consent - something most websites seem to get wrong. With limited human and technological resources, it is difficult for an investigation agency to manually inspect all the websites for compliance. Technologies can help solve this by automating the detection part, but need data to be effective in detection. Further, the technological findings need to be expressed using legal language and formal information in order for them to be used effectively in legal compliance. This project will assist in these tasks by creating a browser extension which will enable any individual to highlight problematic parts of a website and consent dialogue, and will automatically generate documentation containing links to appropriate legal clauses i.e. which specific parts of GDPR it violates.

Implementation: The project will involve understanding the requirements of consent under GDPR, and evaluating how they are being violated on the internet at large. The project will create a browser extension - which requires knowledge of javascript and web development - to capture the problematic parts of a consent dialogue and tag it with common violations that are linked to specific requirements in GDPR.

This project intends to address these by identifying the different mechanisms used to collect consent, establish their relation to existing consent requirements, and compile likely violations of legal requirements. Initially, researcher(s) will visit websites, identify elements used for consent, record their use, and establish which consent requirements in GDPR are applicable to the design element. Later, the researcher(s) will analyse the collected data for commonality in design elements, and report on likely violations of the legal requirements. The primary articles guiding this work are [4] and [8].
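
A minimal sketch of the documentation step, assuming the extension hands over the page URL, a CSS selector for the highlighted element, and a list of violation tags. The tag names and the mapping to GDPR provisions are illustrative assumptions and would need to be validated against the legal analysis in [8].

```python
import json

# Illustrative mapping from violation tags to GDPR provisions (an assumption,
# to be checked against a proper legal analysis).
VIOLATIONS = {
    "pre-ticked-box": ["Art. 4(11)", "Recital 32"],
    "no-reject-option": ["Art. 4(11)"],
    "withdrawal-harder-than-giving": ["Art. 7(3)"],
    "request-not-distinguishable": ["Art. 7(2)"],
}


def document_violation(url, selector, tags):
    """Build a report linking a highlighted element to GDPR provisions."""
    unknown = [tag for tag in tags if tag not in VIOLATIONS]
    if unknown:
        raise ValueError(f"unknown violation tags: {unknown}")
    return {
        "url": url,
        "element": selector,
        "violations": [{"tag": tag, "gdpr": VIOLATIONS[tag]} for tag in tags],
    }


report = document_violation(
    "https://example.org/",           # hypothetical website
    "#cookie-banner input.marketing",  # hypothetical selector
    ["pre-ticked-box"],
)
print(json.dumps(report, indent=2))
```

Keeping the mapping in one table makes it easy to review with a legal expert and to extend as new patterns are observed.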

References
[1] D. Machuletz and R. Böhme, “Multiple Purposes, Multiple Problems: A User Study of Consent Dialogs after GDPR,” arXiv:1908.10048 [cs], Aug. 2019, Accessed: Sep. 02, 2019. [Online]. Available: http://arxiv.org/abs/1908.10048.
[2] C. Utz, M. Degeling, S. Fahl, F. Schaub, and T. Holz, “(Un)informed Consent: Studying GDPR Consent Notices in the Field,” in ACM SIGSAC Conference on Computer and Communications Security (CCS’19), London, United Kingdom, Nov. 2019, p. 18.
[3] I. Fouad, C. Santos, F. A. Kassar, N. Bielova, and S. Calzavara, “On Compliance of Cookie Purposes with the Purpose Specification Principle,” in IWPE, 2020, p. 9.
[4] S. Human and F. Cech, “A Human-centric Perspective on Digital Consenting: The Case of GAFAM,” presented at the Human Centred Intelligent Systems 2020, Split, Croatia, 2020, Accessed: Jun. 08, 2020. [Online]. Available: https://epub.wu.ac.at/7523/.
[5] C. Matte, N. Bielova, and C. Santos, “Do Cookie Banners Respect my Choice?,” in 41st IEEE Symposium on Security and Privacy, 2020, p. 19, [Online]. Available: http://www-sop.inria.fr/members/Nataliia.Bielova/papers/Matt-etal-20-SP.pdf.
[6] C. Matte, C. Santos, and N. Bielova, “Purposes in IAB Europe’s TCF: which legal basis and how are they used by advertisers?,” presented at the Annual Privacy Forum (APF 2020), Oct. 2020, Accessed: May 27, 2020. [Online]. Available: https://hal.inria.fr/hal-02566891.
[7] M. Nouwens, I. Liccardi, M. Veale, D. Karger, and L. Kagal, “Dark Patterns after the GDPR: Scraping Consent Pop-ups and Demonstrating their Influence,” Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, pp. 1–13, Apr. 2020, doi: 10/ggx9vq.
[8] C. Santos, N. Bielova, and C. Matte, “Are cookie banners indeed compliant with the law? Deciphering EU legal requirements on consent and technical means to verify compliance of cookie banners,” HAL, p. 75, Jun. 2020, [Online]. Available: https://hal.inria.fr/hal-02875447.
[9] “Guidance note on cookies and other tracking technologies,” Data Protection Commission, Ireland, Dublin, Ireland, Apr. 2020. Accessed: Jun. 08, 2020. [Online]. Available: https://www.dataprotection.ie/sites/default/files/uploads/2020-04/Guidance%20note%20on%20cookies%20and%20other%20tracking%20technologies.pdf.
[10] “Report by the Data Protection Commission on the use of cookies and other tracking technologies (revised),” Data Protection Commission, Ireland, Dublin, Ireland, Apr. 2020. Accessed: Jun. 08, 2020. [Online]. Available: https://www.dataprotection.ie/sites/default/files/uploads/2020-04/Data%20Protection%20Commission%20cookies%20sweep%20REVISED%2015%20April%202020%20v.01.pdf.

Testing 'benefits' of consent

#consent #GDPR #privacy #ads

tldr; giving consent is assumed to provide some benefit, investigate what the benefit is in terms of extent and effectiveness

When an individual provides their consent to the collection, use, and sharing of their personal data - one question is regarding what benefit or value they receive in return [3]. In the context of websites and consent, the purported benefit is commonly associated with the individual receiving improvements to their experience such as personalisation in the content or ads they see [1]. However, there is no tangible or demonstrable way to perceive the difference in such benefits when consent is given as compared to when it is refused. In addition to these, the process of giving consent is complicated and unclear to individuals [2] and uses purpose descriptions [4] which further obscures its transparency and impact.

To investigate this situation further, this work requires the researcher(s) to visit websites and interact with consent dialogues to identify and record: (a) whether benefits exist in return for consenting; (b) whether they are clear and comprehensible; (c) whether they can be seen or demonstrated; and (d) what ‘value’ they provide to the individual. For example, when a website says that the consent will enable personalisation in ads shown, does it also clarify where the ads will be shown, in what form and manner, and whether the individual can identify ads being influenced by their consent? Through such analysis, the work seeks to determine what 'value' is obtained in return for consent, its scope and form, and whether the impact of consent is transparent to the individual. The primary articles guiding this work are [4] for describing the purposes of consent and [2] for establishing the human-centric view in terms of understanding and comprehension when consenting.

References
[1] S. J. De and A. Imine, “Consent for targeted advertising: the case of Facebook,” AI & Soc, May 2020, doi: 10/ggzp38.
[2] S. Human and F. Cech, “A Human-centric Perspective on Digital Consenting: The Case of GAFAM,” presented at the Human Centred Intelligent Systems 2020, Split, Croatia, 2020, Accessed: Jun. 08, 2020. [Online]. Available: https://epub.wu.ac.at/7523/.
[3] G. Malgieri and B. Custers, “Pricing privacy – the right to know the value of your personal data,” Computer Law & Security Review, vol. 34, no. 2, pp. 289–303, Apr. 2018, doi: 10/gc7nbt.
[4] C. Matte, C. Santos, and N. Bielova, “Purposes in IAB Europe’s TCF: which legal basis and how are they used by advertisers?,” presented at the Annual Privacy Forum (APF 2020), Oct. 2020, Accessed: May 27, 2020. [Online]. Available: https://hal.inria.fr/hal-02566891.

Annotated Privacy Information Dataset of Apps from iOS and Android App-Stores

#smartphone #web-scraping #semantics #python #java #RDF

Tldr; Scrape the app’s information from app store and automatically annotate it with information relevant for privacy

Motivation: Apple’s iOS and Google’s Android are the dominant smartphone OSs, and through their respective App-Stores are responsible for providing the infrastructure and functionality to users for installing and managing applications on their devices. Increasingly, these companies are creating requirements for applications to declare information about their privacy practices, which not only include a privacy policy, but also information on what kinds of data the apps collect and how they use it [1]. Collecting this information in a machine-readable dataset can enable understanding current practices, pitfalls, and the privacy practices of apps - as well as tracking how they evolve with time.

Implementation: You will utilise a web-scraping [2] tool to parse the information from app-store pages. This can be written in any language of choice, though python offers a large collection of ready-to-use frameworks. The information extracted from these pages will be defined using the Data Privacy Vocabulary (DPV) [3], a metadata specification for declaring how data is used. The formal representation used for DPV is the Resource Description Framework (RDF), which you will become familiar with in order to read and write it for generating the dataset / corpus. The actual data will be stored and managed using a relational database and SQL, such that it can be extracted and exported to RDF to provide interoperability.
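
A minimal sketch of the storage and export steps, assuming the scraper has already produced (app, data category, purpose) rows. The table layout, the example.org IRIs, and the sample row are hypothetical; the property names dpv:hasPersonalData and dpv:hasPurpose and the namespace IRI should be checked against the DPV specification [3] before use. The export is written as plain N-Triples lines so that no RDF library is needed.

```python
import sqlite3

DPV = "https://w3id.org/dpv#"  # assumed DPV namespace IRI
EX = "https://example.org/"    # placeholder namespace for apps and categories


def build_db(rows):
    """Store scraped (app_id, data_category, purpose) rows in SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE app_privacy (app_id TEXT, data_category TEXT, purpose TEXT)"
    )
    conn.executemany("INSERT INTO app_privacy VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def export_ntriples(conn):
    """Export the relational data as N-Triples, one statement per line."""
    query = (
        "SELECT app_id, data_category, purpose FROM app_privacy"
        " ORDER BY app_id, data_category, purpose"
    )
    lines = []
    for app_id, category, purpose in conn.execute(query):
        subject = f"<{EX}app/{app_id}>"
        lines.append(f"{subject} <{DPV}hasPersonalData> <{EX}data/{category}> .")
        lines.append(f"{subject} <{DPV}hasPurpose> <{EX}purpose/{purpose}> .")
    # Drop duplicate statements while keeping their order.
    return "\n".join(dict.fromkeys(lines))


# Made-up sample row standing in for scraper output.
db = build_db([("com.example.maps", "Location", "Analytics")])
print(export_ntriples(db))
```

Keeping the scraped values in SQL and generating RDF only at export time means the annotation vocabulary can be changed later without re-scraping.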

What you will learn: (i) How Apps utilise personal data; (ii) How to do web-scraping; (iii) How to work with concepts and semantics, e.g. when creating schemas; (iv) Creating and managing datasets as research resources; (v) What/Where more information is needed in App Stores

References:
[1] https://developer.apple.com/app-store/app-privacy-details/
[2] https://en.wikipedia.org/wiki/Web_scraping
[3] http://w3.org/ns/dpv

A Privacy Signal for Automating Consent Interactions

#privacy #browser #signaling #internet #GDPR #BrowserAddon #Javascript

tldr; Creating a browser signal that helps automate some of the interactions on notices and consent dialogues online

Motivation: Consent and cookie dialogues are a plague on the web - they’re there on every website, and most people click on the ‘Agree’ button without reading or understanding what they just agreed to. Even though this has been shown to violate data protection and privacy laws [1], enforcement takes time and is difficult to undertake at the scale of the web. Instead, there is a growing call for ‘signals’ that are easier to use and enforce, and which indicate privacy preferences in a human-centric manner. Previously, Do Not Track (DNT) and Platform for Privacy Preferences (P3P) were two major efforts which failed to gain adoption. The current ones showing promise are Global Privacy Control (GPC) [2], which can prohibit any further sharing of data, and the Advanced Data Protection Control (ADPC) [3], which can enable machine-readable requests and the indication of preferences to permit or prohibit certain actions. While uptake for both increases, there are two important areas of implementation and research - (1) what language to use within the signals so that both users and websites interpret it in the same manner; and (2) how to manage these within the browser.

Implementation: You will understand DNT, P3P, GPC and ADPC specifications. They’re fairly technical which means you will also learn about how they work through HTTP communication protocols. You will then explore how ADPC can be implemented using HTTP. Within ADPC, a language is necessary for expressing whether something is permitted or prohibited and what it is (e.g. a purpose, some personal data). For expressing these concepts, you will use the Data Privacy Vocabulary (DPV) [4], a metadata specification for declaring how data is used. To test the developed signal, you will create browser addons and a (sample) backend server for creating and managing the preferences set within ADPC, and testing the communication between users and websites.
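
A minimal sketch of how the two signals could be carried in HTTP request headers. GPC is sent as the Sec-GPC: 1 request header [2]. The ADPC header syntax below is modelled on the examples in the ADPC draft [3] and should be treated as an assumption to verify against the specification; the function names and the request identifiers are hypothetical.

```python
def build_signal_headers(gpc=False, consent=(), withdraw_all=False,
                         object_direct_marketing=False):
    """Build the request headers a browser addon could attach."""
    headers = {}
    if gpc:
        headers["Sec-GPC"] = "1"
    parts = []
    if withdraw_all:
        parts.append("withdraw=*")
    if consent:
        parts.append('consent="' + " ".join(consent) + '"')
    if object_direct_marketing:
        parts.append("object=direct-marketing")
    if parts:
        headers["ADPC"] = ", ".join(parts)
    return headers


def parse_adpc(value):
    """Server-side parsing of an ADPC header value into a dictionary.

    Simplified: assumes no commas appear inside quoted values.
    """
    result = {}
    for item in value.split(","):
        key, _, raw = item.strip().partition("=")
        result[key] = raw.strip('"').split()
    return result


headers = build_signal_headers(gpc=True, withdraw_all=True,
                               consent=["q1analytics", "q2recommendation"])
print(headers)
print(parse_adpc(headers["ADPC"]))
```

A sample backend server would apply parse_adpc to incoming requests and store the result as the user's current preferences.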

What you will learn: (i) What information is relevant when making privacy decisions about personal data use and sharing; (ii) HTTP protocols and how the web functions; (iii) How to express permissions or prohibitions for data sharing; (iv) Legal and Privacy implications of permitting or prohibiting websites from using data; (v) The next generation of privacy signals within the browser.

References:
[1] Do Cookie Banners Respect my Choice?: Measuring Legal Compliance of Banners from IAB Europe’s Transparency and Consent Framework https://arxiv.org/pdf/1911.09964
[2] https://globalprivacycontrol.github.io/gpc-spec/
[3] https://www.dataprotectioncontrol.org/spec/
[4] https://w3id.org/dpv

Programmatic Privacy Notices and Dialogues

#privacy #consent #cookies #Browser #signals #internet #GDPR #Javascript

Tldr; Creating APIs for programmatically generating notices and dialogues, as those used for consent and cookies on websites, to be developed from metadata (e.g. JSON) to avoid known issues (e.g. dark patterns).

Motivation: Consent and cookie dialogues are a plague on the web - they’re there on every website, and most people click on the ‘Agree’ button without reading or understanding what they just agreed to. Even though this has been shown to violate data protection and privacy laws [1], enforcement takes time and is difficult to undertake at the scale of the web. It is difficult to create a singular solution that is acceptable to all parties (users, service providers, authorities), which makes it difficult to achieve common goals. This research explores how a web browser can provide a set of APIs to generate privacy notices and dialogues on user-side, with different methods or options offering various levels of controls to the websites and users to control their interactions.

Implementation: You will understand how current privacy notices and consent dialogues function in terms of information, content, legal requirements, and technical implementations. You will then identify different types of APIs that can automate the generation of different components (e.g. API for notice, for showing options, for giving consent). You will implement these using CSS and JS, and test them using browser addons.
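
A minimal sketch of generating a dialogue from metadata. The project itself targets CSS and JS in the browser; Python is used here only to illustrate the idea, and the metadata fields (controller, purposes, id, label) are hypothetical. The generated markup avoids two known dark patterns by leaving every checkbox unticked and giving the accept and reject buttons the same form.

```python
import html
import json


def render_dialogue(metadata):
    """Render a consent dialogue as HTML from a metadata dictionary."""
    items = []
    for purpose in metadata["purposes"]:
        value = html.escape(purpose["id"])
        label = html.escape(purpose["label"])
        items.append(
            f'  <li><label><input type="checkbox" name="purpose" '
            f'value="{value}"> {label}</label></li>'
        )
    controller = html.escape(metadata["controller"])
    return "\n".join([
        '<form class="consent-dialogue">',
        f"<p>{controller} requests consent for:</p>",
        "<ul>",
        *items,
        "</ul>",
        '<button type="submit" name="decision" value="accept">Accept selected</button>',
        '<button type="submit" name="decision" value="reject">Reject all</button>',
        "</form>",
    ])


# Made-up metadata, as it might arrive from a website in JSON form.
metadata = json.loads("""
{
  "controller": "Example Shop",
  "purposes": [
    {"id": "analytics", "label": "Measuring how the site is used"},
    {"id": "ads", "label": "Showing personalised ads"}
  ]
}
""")
print(render_dialogue(metadata))
```

Because the website only supplies metadata, the layout and default states stay under the control of the user-side API rather than the website.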

What you will learn: (i) What privacy notices and consent dialogues are; (ii) How to create and implement APIs based on different stakeholder requirements; (iii) Legal and Privacy implications of your developed technologies; (iv) Challenges associated with developing privacy solutions

Implementing Privacy Signals in Browsers

#privacy #browser #internet #signals #BrowserAddon #Javascript #RDF #GDPR

Tldr; Create and analyse different methods and their impact for using and managing ADPC signals to indicate user’s privacy preferences using basic Web HTTP communication methods

Motivation: Consent and cookie dialogues are a plague on the web - they’re there on every website, and most people click on the ‘Agree’ button without reading or understanding what they just agreed to. Even though this has been shown to violate data protection and privacy laws [1], enforcement takes time and is difficult to undertake at the scale of the web. Instead, there is a growing call for ‘signals’ that are easier to use and enforce, and which indicate privacy preferences in a human-centric manner. The two important ones are Global Privacy Control (GPC) [2], which can prohibit any further sharing of data, and the Advanced Data Protection Control (ADPC) [3], which can enable machine-readable requests and the indication of preferences to permit or prohibit certain actions. While uptake for both increases, there are two important areas of implementation and research - (1) what language to use within the signals so that both users and websites interpret it in the same manner; and (2) how to manage these within the browser.

Implementation: You will understand both ADPC and GPC specifications. They’re fairly technical which means you will also learn HTTP communication protocols. You will then explore how ADPC can be implemented using HTTP. Within ADPC, a language is necessary for expressing whether something is permitted or prohibited and what it is (e.g. a purpose, some personal data). For expressing these concepts, you will use the Data Privacy Vocabulary (DPV) [4], a metadata specification for declaring how data is used. The formal representation used for DPV is the Resource Description Framework (RDF), which you will become familiar with in order to read and write it when expressing preferences. To test the ADPC, you will create simple addons and a backend server for creating and managing the preferences set within ADPC, and testing the communication between users and websites. You will explore different ADPC iterations in terms of how they affect the size (HTTP signals are expected to be small), their impact in terms of preferences (how much can we express), and what kinds of information can be sent this way (e.g. we can only share preference for personal data, but not who it is shared with).
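
A minimal sketch of the size comparison, measuring how many bytes a signal takes when preferences are written with short identifiers versus full DPV IRIs. The header syntax follows the examples in the ADPC draft [3]; whether full IRIs are acceptable as ADPC identifiers is an assumption made only for this comparison, and the DPV namespace IRI and concept names should be checked against [4].

```python
DPV = "https://w3id.org/dpv#"  # assumed DPV namespace IRI


def adpc_value(identifiers):
    """Build an ADPC consent value from a list of identifiers."""
    return 'consent="' + " ".join(identifiers) + '"'


def header_size(value, name="ADPC"):
    """Bytes on the wire for an HTTP/1.1 header line 'Name: value' plus CRLF."""
    return len(f"{name}: {value}\r\n".encode("utf-8"))


# Purposes named after DPV concepts (names to be verified against the spec).
purposes = ["ServiceProvision", "Marketing", "ResearchAndDevelopment"]

short_form = adpc_value(purposes)
full_form = adpc_value([DPV + purpose for purpose in purposes])

print("short identifiers:", header_size(short_form), "bytes")
print("full IRIs:", header_size(full_form), "bytes")
```

Repeating this measurement for each candidate design gives a simple, comparable number for the trade-off between expressiveness and signal size.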

What you will learn: (i) What information is relevant when making privacy decisions about personal data use and sharing; (ii) HTTP protocols and how the web functions; (iii) How to express permissions or prohibitions for data sharing; (iv) Legal and Privacy implications of permitting or prohibiting websites from using data; (v) The next generation of privacy signals within the browser.

References:
[1] Do Cookie Banners Respect my Choice?: Measuring Legal Compliance of Banners from IAB Europe’s Transparency and Consent Framework https://arxiv.org/pdf/1911.09964
[2] https://globalprivacycontrol.github.io/gpc-spec/
[3] https://www.dataprotectioncontrol.org/spec/
[4] http://w3.org/ns/dpv

Recording online consent via browser extension

#privacy #consent #javascript #GDPR

Description: The "I Agree" button has become inescapable while browsing the web. While it is present as a legal requirement for collecting consent, once we have clicked the button, we have no record of what we just agreed to. In this project, you will be creating a digital receipt to record the given consent and the information associated with it.

The goal is to create a browser extension that automatically captures the information in a consent dialogue box, and enables the user to later view it in a dashboard. It will use existing standards such as Consent Receipt [1] and Data Privacy Vocabulary [2] to record this information.
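
A minimal sketch of the kind of record the extension could store for each dialogue. The field names are loosely modelled on the Consent Receipt specification [1] and should be treated as assumptions to check against it; the function name and sample values are hypothetical.

```python
import json
import time
import uuid


def make_receipt(controller, policy_url, purposes, timestamp=None):
    """Create a consent receipt as a JSON-serialisable dictionary."""
    if timestamp is None:
        timestamp = time.time()
    return {
        "version": "KI-CR-v1.1.0",  # assumed version string
        "consentReceiptID": str(uuid.uuid4()),
        "consentTimestamp": int(timestamp),
        "collectionMethod": "web consent dialogue",
        "piiControllers": [{"piiController": controller}],
        "policyUrl": policy_url,
        "services": [{
            "service": controller,
            "purposes": [{"purpose": purpose} for purpose in purposes],
        }],
    }


receipt = make_receipt(
    "Example Shop",                    # made-up controller
    "https://example.org/privacy",     # made-up policy URL
    ["analytics", "personalised ads"],
)
print(json.dumps(receipt, indent=2))
```

The dashboard can then list stored receipts by controller and timestamp, and show the purposes that were agreed to.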

This project will provide exposure to front-end development on real-world websites, and an opportunity to increase transparency online regarding privacy. It will also provide a learning experience in the use of programming tools (e.g. git) and research-based workflows.

Pre-requisites: Good working knowledge of Javascript/CSS and its use in web-pages

[1] Consent Receipt https://kantarainitiative.org/confluence/display/infosharing/Consent+Receipt+Specification
[2] DPV http://w3.org/ns/dpv

Recording online agreement via browser extension

#privacy #privacy-policy #javascript #GDPR

Agreeing to a privacy policy legally implies having read it, but few take the time or opportunity to do so. Once clicked, the button conveys consent attached to the privacy policy, yet the person is offered no opportunity or proof to record what they have agreed to. In this project, you will be creating a digital receipt to record the acceptance of a privacy policy by capturing the context in which the button was clicked as well as the privacy policy associated with it.

The goal is to create a browser extension that can assist the person in recording their agreements and the related policies by saving them as a 'notice receipt' within the browser. The saved receipts can then be viewed and interacted with via a dashboard. The project will use existing vocabularies such as the DPV [1] and Notice Receipt Schema [2] to record the information.
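
A minimal sketch of one way to record which version of a policy was agreed to: store a hash of the policy text alongside the context of the click. The field names are hypothetical and are not taken from the Notice Receipt Schema [2].

```python
import hashlib
from datetime import datetime, timezone


def policy_hash(policy_text):
    """SHA-256 fingerprint of the policy text."""
    return hashlib.sha256(policy_text.encode("utf-8")).hexdigest()


def make_notice_receipt(page_url, policy_url, policy_text, button_label, when=None):
    """Record the context in which a policy was accepted."""
    when = when or datetime.now(timezone.utc)
    return {
        "pageUrl": page_url,
        "policyUrl": policy_url,
        "policySha256": policy_hash(policy_text),
        "buttonLabel": button_label,
        "agreedAt": when.isoformat(),
    }


def policy_changed(receipt, current_policy_text):
    """True if the policy text differs from the one that was agreed to."""
    return receipt["policySha256"] != policy_hash(current_policy_text)


receipt = make_notice_receipt(
    "https://example.org/signup",   # made-up page
    "https://example.org/privacy",  # made-up policy URL
    "We use your email to send receipts.",
    "I Agree",
)
print(policy_changed(receipt, "We use your email to send receipts."))  # False
print(policy_changed(receipt, "We also share your email with partners."))  # True
```

Storing only the hash keeps receipts small; storing the full text as well would let the dashboard show exactly what has changed.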

This project will provide exposure to front-end development on real-world websites, as well as an opportunity to work towards increased transparency and accountability in online transactions involving privacy. The project will also provide a learning experience in the use of programming tools (e.g. git) and in being involved in a research project.

[1] DPV http://w3.org/ns/dpv
[2] OPN: Open Notice Receipt Schema http://ceur-ws.org/Vol-2451/paper-21.pdf

Automated Privacy Policy Generation Using Metadata and Templates

#privacy #semantics #law #python #RDF #GDPR

Tldr; Create different variations of privacy policies in terms of text and design by using templates and metadata containing the required information

Motivation: Privacy policies and Terms and Conditions, as they are presented on the web, are a long boring wall of legal text which is difficult to comprehend and use. There have been various avenues for making this simpler, such as summarising [1], or using machine learning to identify relevant information [2], and even alternatives such as visualising information [3]. However, instead of starting from a fixed given set of complicated text or a fixed layout, this research instead takes a different approach. It explores whether using metadata and a set of templates can make privacy policies easier to generate, offer different options to comprehend them, and enable alternative mediums such as visualisations to be easily implemented. It thus aims to show that metadata and automation can help people better understand privacy practices.

Implementation: You will analyse how privacy policies look, read, and affect comprehension of information through existing literature on these topics. You will then create ‘templates’ for policies - where a template is (simplified) some generic document with blanks that will be filled with use-case specific information. The template will be defined using a ‘templating library’ such as Jinja2 or Mustache. You will create different templates for rendering the same information in different sentences, layouts, forms (e.g. visual, multimedia). The information to be used to fill in the template will be declared using the Data Privacy Vocabulary (DPV) [4], a metadata specification for declaring how data is used. The formal representation used for DPV is the Resource Description Framework (RDF), which you will become familiar with in order to read and write the metadata. The actual data will be stored and managed using JSON or JSON-LD which makes it easier to use in the web browser and in javascript.
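
A minimal sketch of rendering the same metadata through two different templates. Python's built-in string.Template is used here as a stand-in for a templating library such as Jinja2, and the metadata fields (controller, data, purpose, retention) are hypothetical rather than DPV terms.

```python
from string import Template

# Two templates for the same information: a sentence and a table row.
SENTENCE = Template(
    "$controller uses your $data for $purpose and keeps it for $retention."
)
TABLE_ROW = Template("| $data | $purpose | $retention |")


def render_policy(metadata, template):
    """Fill the template once per processing entry in the metadata."""
    return "\n".join(
        template.substitute(controller=metadata["controller"], **entry)
        for entry in metadata["processing"]
    )


# Made-up metadata, as it might be loaded from a JSON file.
metadata = {
    "controller": "Example Shop",
    "processing": [
        {"data": "address", "purpose": "delivery", "retention": "2 years"},
        {"data": "email", "purpose": "order updates", "retention": "1 year"},
    ],
}

print(render_policy(metadata, SENTENCE))
print(render_policy(metadata, TABLE_ROW))
```

Because the metadata is separate from the templates, adding a new presentation (e.g. a visual layout) only requires a new template, not new data.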

What you will learn: (i) Issues with existing privacy policies; (ii) How to automate documentation using templates; (iii) What information is relevant for understanding privacy and legal compliance; (iv) How people comprehend information; (v) How to query and use linked data in a web application.

References:
[1] https://tosdr.org/
[2] https://pribot.org/polisis
[3] Privacy CURE: Consent Comprehension Made Easy https://www.specialprivacy.eu/images/documents/IFIP_SEC_2020.pdf
[4] http://w3.org/ns/dpv

An Ad-blocker for Cookie and Consent Dialogues online

#consent #GDPR #Javascript #BrowserAddon #RDF

Motivation: The GDPR’s extensive requirements for valid consent have filled the internet with consent dialogues that are a persistent annoyance to the web. The websites try to request consent because they are legally required to do so, and in the process use several deceptive and manipulative practices to force the individual to give consent. This project involves creating a browser extension similar to an ad-blocker that will block the consent-requests. The project will also study the effect of such blocking on the actions of the website. In principle, the websites are only supposed to collect and process data after consent. Therefore, the blocking of consent dialogues should result in no data collection.

Implementation: The project will involve understanding the requirements of how consent is requested on the internet and the mechanisms of consent dialogues on websites. The project will create a browser extension - which requires knowledge of javascript and web development - to detect and block consent dialogues, and will study whether websites still collect data once the dialogue has been blocked.

Developing schemas for data interoperability between Facebook and Twitter

#GDPR #Facebook #Twitter #schemas #ontology #Java #Python #Javascript #RDF

Motivation: The GDPR has provided the Right to Data Portability - under which a user can request a copy of their provided data and can also request that data to be transferred from one service to another. This was intended to foster interoperability and data exchange between different services, thereby fostering more innovation through competition. However, to date there has been no significant progress in this effort. The Data Transfer project started by IT giants (Google, Facebook, Apple, Twitter, Microsoft) in 2018 has yet to bear fruit. With this as the backdrop, the aim of this project is to create a tool that will allow importing/exporting data - with Facebook and Twitter used as specific use-cases.

Implementation: The project involves creating ‘schemas’ for Facebook and Twitter, and developing a tool to convert data between them. This is based on utilising existing research efforts within the ADAPT Centre which use semantic web technologies (RDF, SPARQL, Mappings) to enable this work: see https://doi.org/10.5281/zenodo.4029338. Apart from developing the schemas, the project will also develop the tool that transforms data between ‘Facebook’ and ‘Twitter’ schemas, and create a data pipeline to automate this task and show its feasibility at large scales. The technologies involved in this are Java, Python, Javascript (web-development) with knowledge of Semantic Web and schemas an added bonus.
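
A minimal sketch of schema-based conversion through a shared intermediate schema. The project itself relies on semantic web technologies (RDF, SPARQL, Mappings); plain dictionaries are used here only to illustrate the idea, and every field name below is hypothetical rather than taken from the actual Facebook or Twitter export formats.

```python
# Hypothetical field mappings via a common intermediate schema.
FACEBOOK_TO_COMMON = {
    "message": "text",
    "created_time": "published",
    "from_name": "author",
}
COMMON_TO_TWITTER = {
    "text": "full_text",
    "published": "created_at",
    "author": "screen_name",
}


def remap(record, mapping):
    """Rename the fields of a record; fields without a mapping are dropped."""
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def facebook_to_twitter(post):
    """Convert a post by going through the common schema."""
    return remap(remap(post, FACEBOOK_TO_COMMON), COMMON_TO_TWITTER)


post = {
    "message": "Hello",
    "created_time": "2020-10-05T10:00:00",
    "from_name": "alice",
    "likes": 3,
}
print(facebook_to_twitter(post))
```

Mapping each service to a common schema, rather than directly to each other, means that adding a third service needs one new mapping instead of one per existing service.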

Assisting with Ethical Clearance in Universities

#ethics #javascript

Description: Modern organisations must take into account GDPR concerns when conducting data collection, and in universities this role is taken by the Ethics committee and its sub-committees. This project will develop a new stand-alone web-based tool to help students complete Ethics/GDPR applications for their projects. It is planned to create a highly usable web interface that simplifies the current form and known patterns of use, e.g. a simple survey, to guide students through the form generation process. The outcome will be a transportable JSON object that expresses the student’s wishes in a machine-readable way for further stages in an approval workflow.

The project involves work on building a web dashboard for assisting researchers with documentation for data protection and ethical clearance, and will provide an opportunity to participate in real-world applications of technology in the areas of data protection and ethics. The chief task would be to build an information system that will allow users to receive suggestions and guidance for addressing risks regarding ethics and privacy.

Extracting Structured Metadata From Privacy Policies

#privacy #privacy-policy #NLP #ML

Privacy policies are notoriously difficult to read. One of the challenges is the use of legal and intentionally obfuscating language. Though the GDPR has made it a legal requirement to use clear language in policies, there is still a barrier to effective transparency regarding the information presented in such policies. This project aims to extract information from the policy, such as sources of data, their requirement in processes, legal basis, and storage periods, and express it as structured metadata for use in research that aims to simplify privacy policies via techniques such as summarisation and visualisation.

The project goal is to use NLP techniques to identify relevant information by using classifiers and ML [1][2][3], which would enable extraction of information from the text of privacy policies, and to represent it using vocabularies such as DPV [4] and GDPRov [5].
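
A minimal rule-based sketch of the extraction step, which could serve as a baseline before applying classifiers and ML. The regular expression, the list of legal basis phrases, and the output field names are assumptions made for illustration, and the sample text is made up.

```python
import re

# Matches storage periods such as "6 years" or "30 days".
PERIOD = re.compile(r"\b(\d+)\s+(day|week|month|year)s?\b", re.IGNORECASE)

# Phrases hinting at a legal basis (illustrative, not exhaustive).
LEGAL_BASES = [
    "consent",
    "contract",
    "legal obligation",
    "vital interests",
    "public interest",
    "legitimate interests",
]


def extract(policy_text):
    """Extract storage periods and legal basis mentions from policy text."""
    lowered = policy_text.lower()
    periods = [(int(n), unit.lower()) for n, unit in PERIOD.findall(policy_text)]
    bases = [basis for basis in LEGAL_BASES if basis in lowered]
    return {"storage_periods": periods, "legal_bases": bases}


sample = (
    "We keep order records for 6 years on the basis of a legal obligation. "
    "Analytics data is deleted after 30 days."
)
print(extract(sample))
# {'storage_periods': [(6, 'year'), (30, 'day')], 'legal_bases': ['legal obligation']}
```

A baseline like this also produces rough labels that can help when building training data for a classifier.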

[1] Usable Privacy Project
[2] Pribot
[3] CLAUDETTE
[4] DPV
[5] GDPRov

Extracting Structured Metadata from Consent Dialogues

#privacy #consent #NLP #ML

Consent dialogue boxes are everywhere on the web - with the information geared towards making it easy for users to comprehend how their personal data is being used. However, this information is presented in a human-readable format with no way for machines to analyse it. The aim of this project is to extract this information and represent it as structured metadata to enable automation and analysis of privacy-based approaches. For example, the statement "We use your address to deliver goods you buy on our website" can be represented by: address - personal data, deliver goods - purpose, use - processing.

The project goal is to use NLP techniques to identify such categories by using classifiers and ML similar to existing work regarding privacy policies [1][2][3]. The extracted information would then be represented using vocabularies such as DPV [4].

[1] Usable Privacy Project
[2] Pribot
[3] CLAUDETTE
[4] DPV
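
The example statement from the description above can be turned into structured metadata with a very small rule-based sketch. The `LEXICON` below is a hand-written, hypothetical stand-in for the classifiers the project would actually train, and the category labels are the plain ones used in the example rather than DPV identifiers.

```python
# Hypothetical lexicon mapping phrases to the three categories used in the
# example statement; a real system would learn these with a classifier.
LEXICON = {
    "address": "personal data",
    "email": "personal data",
    "deliver goods": "purpose",
    "marketing": "purpose",
    "use": "processing",
    "share": "processing",
}


def annotate(statement: str) -> dict:
    """Return {phrase: category} for every lexicon phrase found in the statement."""
    words = statement.lower().replace(",", " ").replace(".", " ").split()
    text = " " + " ".join(words) + " "
    found = {}
    for phrase, category in LEXICON.items():
        # Pad with spaces so "use" does not match inside "user".
        if f" {phrase} " in text:
            found[phrase] = category
    return found


statement = "We use your address to deliver goods you buy on our website"
print(annotate(statement))
# {'address': 'personal data', 'deliver goods': 'purpose', 'use': 'processing'}
```

A lexicon lookup like this only works for phrases it has seen, which is why the project proposes NLP and ML for the general case.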

Categorising News by Topic

#news #NLP #ML

When a new product is launched, such as the latest Apple iPhone, the news is dominated by articles targeting one product. This project investigates combining such related news into one coherent summary or article for better presentation to the reader. It will use NLP to identify categories and topics in a news article, and combine related news articles together in a corpus. It will then extract novel details from different articles and combine the results into a single summary.

Finding Related News Articles For Issues on Privacy

#news #NLP #ML

Every issue raised about privacy has several related news articles which offer evidence or commentary on it. These are often isolated, and collecting them requires significant effort and time. The aim of this project is to assist in finding relevant news articles for a given privacy issue. For example, given the topic 'cloud data storage', relevant news articles include those about 'data breaches', 'cloud security', and 'legal obligations in a jurisdiction'. The identification requires creating a taxonomy of privacy issues and using NLP to identify and categorise news articles based on concepts within the taxonomy.

Scoring Privacy Policies For Transparency and Readability

#privacy #privacy-policy #NLP #ML

Privacy policies are notoriously difficult to read and understand, chiefly because of the obfuscated legal language used and the confusing structure. Though the GDPR has strived to provide more transparency in the language used, there is no established way to measure and evaluate such policies. The aim of this project is to identify metrics for transparency and readability in the privacy policy and to score a given policy using them. An example metric could be categorisation of information, where the policy has separate structures explaining data collection, sharing, etc. The project will use NLP to identify relevant clauses in the text of the policy, and ML to classify the policy using generated metrics.
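
As a starting point for discussion, the sketch below computes a few crude indicators of the kind such a score might combine. The specific metrics (average sentence length, share of long words, and whether a few expected topics are mentioned) and the 10-character threshold are assumptions chosen for illustration, not metrics the project prescribes.

```python
import re


def readability_metrics(policy_text: str) -> dict:
    """Compute two simple, illustrative readability indicators."""
    sentences = [s for s in re.split(r"[.!?]+", policy_text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", policy_text)
    if not sentences or not words:
        return {"avg_sentence_length": 0.0, "long_word_ratio": 0.0}
    long_words = [w for w in words if len(w) >= 10]
    return {
        "avg_sentence_length": len(words) / len(sentences),
        "long_word_ratio": len(long_words) / len(words),
    }


def section_coverage(policy_text: str, topics=("collect", "share", "retain")) -> float:
    """Fraction of expected topics mentioned at all (a crude proxy for structure)."""
    text = policy_text.lower()
    return sum(topic in text for topic in topics) / len(topics)


plain = "We collect your email. We share it with no one."
print(readability_metrics(plain), section_coverage(plain))
```

Indicators like these are easy to compute but say little about whether a policy is actually transparent, which is where the clause-level NLP and ML classification come in.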

autoDIXIT: Generating Clues for DIXIT from Image Analysis

#ML #NLP #ImageAnalysis #python

DIXIT is a game where cards containing images are displayed to the players along with a clue and the players have to identify the correct card matching the clue. The aim of the project is to generate such clues for a given set of images using image analysis. This involves identification of the contents of an image, matching it with popular trivia such as a song or a movie, and generating the clue using NLP.

Generating a Privacy Policy Corpus

#python #privacy #privacy-policy

The analysis of privacy on the web is highly dependent on the privacy policies made available on websites. However, such policies have neither a fixed address nor a fixed structure. Therefore, research involving them first needs to identify the privacy policies and collect them, which is a time-consuming task. This project will create a web crawler that identifies and saves privacy policies from websites. The crawler will generate a corpus of privacy policies from the web by identifying the URL of a privacy policy in a given website, extracting and cleaning its text, and saving it in an interoperable format for future use.
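
The first step, identifying the URL of a privacy policy on a given page, might be sketched as follows using only the Python standard library. The heuristic (look for "privacy" in a link's address or text) is an assumption for illustration; the class and function names are hypothetical, and fetching pages, cleaning text, and saving the corpus are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PolicyLinkFinder(HTMLParser):
    """Collect links whose href or anchor text mentions 'privacy'."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.candidates = []
        self._current_href = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current_href = dict(attrs).get("href")

    def handle_data(self, data):
        # Only text inside an <a> element is inspected; image-only links
        # are missed by this simple heuristic.
        href = self._current_href
        if href and ("privacy" in href.lower() or "privacy" in data.lower()):
            url = urljoin(self.base_url, href)
            if url not in self.candidates:
                self.candidates.append(url)

    def handle_endtag(self, tag):
        if tag == "a":
            self._current_href = None


def find_policy_links(html: str, base_url: str) -> list:
    """Return absolute URLs of links that look like privacy policies."""
    finder = PolicyLinkFinder(base_url)
    finder.feed(html)
    return finder.candidates


page = '<a href="/about">About</a> <a href="/legal/privacy-policy">Privacy</a>'
print(find_policy_links(page, "https://example.org"))
# ['https://example.org/legal/privacy-policy']
```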

Summarising Event Coverage from Tweets

#twitter #NLP #ML

It has become nearly customary to tweet when an event is going on. Such tweets provide coverage of the event and offer valuable crowd-sourced insights. The aim of the project is to collect and analyse tweets for a given event. An example could be a conference, where attendees tweet about the speakers, venue, food, coffee, and also their opinions. Therefore, the project will also involve identification and categorisation of the tweets along these topics. These tweets will then be summarised in a single article offering an overview of the event.

Evaluating TRI over Tweets

#twitter #NLP #ML

Temporal Random Indexing (TRI) [1] is a technique that maps words into vectors (word spaces) along different time periods to enable analysis of how the meaning of words changes over time. This project will apply TRI to a selected corpus of tweets to identify how words and concepts change in use on a popular social platform. One application is the identification of culturally relevant words or topics which spike in usage over a period by being associated with different contexts, such as the use of words in Twitter trends [2][3].

[1] "Analysing Word Meaning over Time by Exploiting Temporal Random Indexing" P. Basile, A. Caputo, and G. Semeraro. CLiC-it 2014 http://clic.humnet.unipi.it/proceedings/Proceedings-CLICit-2014.pdf
[2] https://trends24.in/
[3] https://trendogate.com/
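
The following is a toy sketch of the general random indexing idea applied per time period, not the implementation from [1]. It assumes that each word gets one fixed sparse random vector shared across periods, and that a word's vector in a period is the sum of its neighbours' random vectors in that period. The dimensions, window size, and example tweets are all made up for illustration.

```python
import math
import random

DIM = 256         # vector size (illustrative; real word spaces are much larger)
NONZERO = 8       # number of +1/-1 entries in each random index vector


def index_vector(word: str) -> list:
    """Sparse random vector for a word; seeding by the word keeps it fixed across periods."""
    rng = random.Random(word)
    vector = [0] * DIM
    for position in rng.sample(range(DIM), NONZERO):
        vector[position] = rng.choice((-1, 1))
    return vector


def word_space(tweets: list, window: int = 2) -> dict:
    """Build word vectors for one time period by summing neighbours' index vectors."""
    space = {}
    for tweet in tweets:
        tokens = tweet.lower().split()
        for i, word in enumerate(tokens):
            vector = space.setdefault(word, [0] * DIM)
            neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            for neighbour in neighbours:
                for k, value in enumerate(index_vector(neighbour)):
                    vector[k] += value
    return space


def cosine(a: list, b: list) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


# Hypothetical periods: the word "corona" keeps its context in period_3
# but appears in a different context in period_2.
period_1 = ["cold corona beer", "corona beer tonight"]
period_2 = ["corona virus cases", "corona virus lockdown"]
period_3 = ["corona beer tonight", "cold corona beer"]

base = word_space(period_1)["corona"]
same = cosine(base, word_space(period_3)["corona"])
changed = cosine(base, word_space(period_2)["corona"])
print(f"same context: {same:.2f}, changed context: {changed:.2f}")
```

Because the random vectors are shared between periods, the vectors of one word in two periods can be compared directly, and a drop in cosine similarity hints at a change in how the word is used.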

Unfolding summarised text to generate articles

#NLP #ML #python

Text analysis techniques can now summarise an article with satisfactory efficiency. The advantages of summarised text are its short reading time and concentration of important details. This project investigates the creation of larger articles from a given summary by identifying the context of the article and retrieving additional information about it from the web. This requires the use of NLP to identify topics of relevance within the summary, and ML to retrieve relevant information from an existing corpus or source such as news articles.

Crumple: Folding Privacy Policies via Summaries

#privacy #privacy-policy #python #NLP

Privacy policies can be made easier to digest if they are provided as efficient summaries which users can read and understand quickly. This project will attempt to assist in the understanding of a privacy policy by abstracting or folding larger sections into shorter summaries. This will be done by analysing the text using NLP and identifying relevant information to provide a summary.

Using AI to detect AI biases

#ethics #ML #python

Bias in the use of AI is a rising cause of concern and one of the major ethical hurdles in adoption of new technologies. While bias can occur in many forms, one specific form is skewing the outcome with a positive or negative bias - such as reducing the likelihood of women getting jobs because of their sex. While such biases are difficult to detect and address due to their invisibility and the complexities of a system, one possible answer would be to represent a system as a black box which takes an input and produces some output. This project investigates the approach of detecting biases in an algorithm by providing inputs and measuring the statistical likelihood of outcomes. The outcome of the project will be a tool for analysing system logs consisting of inputs and outputs and detecting potential biases encoded within them.
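
The black-box approach described above can be sketched as a comparison of outcome rates across groups in a log of inputs and outputs. The log format, the field names, and the use of a simple ratio between the lowest and highest rate are assumptions for illustration; the project would need to choose its own statistical tests and thresholds.

```python
from collections import defaultdict


def positive_rates(log, group_key, outcome_key):
    """Share of positive outcomes per group in a list of input/output records."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for record in log:
        group = record[group_key]
        totals[group] += 1
        positives[group] += 1 if record[outcome_key] else 0
    return {group: positives[group] / totals[group] for group in totals}


def disparity_ratio(rates):
    """Lowest group rate divided by highest; 1.0 means equal rates."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest else 1.0


# Hypothetical log of a hiring system treated as a black box.
log = [
    {"sex": "female", "hired": False},
    {"sex": "female", "hired": True},
    {"sex": "female", "hired": False},
    {"sex": "female", "hired": False},
    {"sex": "male", "hired": True},
    {"sex": "male", "hired": True},
    {"sex": "male", "hired": False},
    {"sex": "male", "hired": True},
]
rates = positive_rates(log, "sex", "hired")
print(rates, disparity_ratio(rates))
```

A low ratio only flags a potential bias; with a log this small the difference could easily be due to chance, so a real tool would also report sample sizes or a significance test.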

Cookie Monster: Detecting Privacy-invasive cookies in browser

#privacy #javascript

Cookies are tiny little packets of data stored in the browser for a variety of reasons - from preserving your login session to saving your form data and shopping cart. However, they have also been used nefariously to track users across the web, which has given them a negative reputation. Laws such as the ePrivacy Directive and GDPR require such cookies to be transparent about their purpose, mainly through the cookie notice. However, the user has neither the technological capabilities nor the expertise to inspect whether such notices are consistent with the use of cookies. This project will target user privacy in a browser by providing a way for users to analyse the different cookies stored and investigate their applications and purposes. This will involve building a browser extension that collects cookies and analyses them to identify known trackers and bad actors and produce a report in a dashboard for the user.
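
The analysis step could look roughly like the sketch below. The extension itself would be written in JavaScript; Python is used here only to illustrate the logic. The prefix table is a tiny illustrative sample (`_ga` and `_gid` are set by Google Analytics, `_fbp` by the Meta Pixel), and a real tool would rely on a maintained tracker list instead.

```python
from http.cookies import SimpleCookie

# Illustrative sample only: cookie-name prefixes associated with analytics
# or advertising scripts. A real tool would use a maintained list.
KNOWN_TRACKER_PREFIXES = {"_ga": "analytics", "_gid": "analytics", "_fbp": "advertising"}


def classify_cookies(cookie_header: str) -> dict:
    """Map each cookie name in a Cookie header to a suspected purpose."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    report = {}
    for name in jar:
        purpose = "unknown"
        for prefix, label in KNOWN_TRACKER_PREFIXES.items():
            if name.startswith(prefix):
                purpose = label
                break
        report[name] = purpose
    return report


header = "sessionid=abc123; _ga=GA1.2.111.222; _fbp=fb.1.333.444"
print(classify_cookies(header))
```

The resulting report could then be compared against the purposes declared in the site's cookie notice to highlight inconsistencies.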

Attaching Risk Factors to Consent Dialogues

#privacy #ethics #consent #javascript

The notion of informed consent in GDPR requires the individual to also be notified about potential risks associated with the processing of their personal data. However, consent dialogues contain only information associated with the use of personal data and do not provide any information about the risks of sharing that information. The aim of this project is to attach risks to the relevant information within a consent dialogue to enable the user to make a balanced judgement about their consent and use of personal data. The project involves detecting categories of information within a consent dialogue and associating them with an existing corpus of privacy risks by visually annotating the consent dialogue.

Python Engine for Prose Objects

#python

Prose Objects [1], part of the Common Accord project [2], are a form of templates for legal documents that provide a consistent and machine-readable representation for policies and contracts. The aim is to assist in the creation of legal documents in a consistent manner. In this approach, a legal document is represented as a form, which can be populated using metadata defined externally, such as in a JSON file. This project involves the construction of a Python engine for generating prose documents. This will involve reading metadata from files in JSON and Markdown format, and applying them to legal documents represented using template systems such as Jinja2.

[1] Prose Objects https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2925871
[2] Common Accord http://www.commonaccord.org/
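
The core of such an engine, filling a document template from external JSON metadata, can be sketched in a few lines. To keep the example dependency-free it uses `string.Template` from the Python standard library as a stand-in for Jinja2; the template text, placeholder names, and metadata are invented for illustration and do not follow the Prose Objects format.

```python
import json
from string import Template


def render_document(template_text: str, metadata_json: str) -> str:
    """Fill $placeholders in a document template from JSON metadata."""
    metadata = json.loads(metadata_json)
    # substitute() raises KeyError if the metadata misses a placeholder,
    # which surfaces incomplete documents early.
    return Template(template_text).substitute(metadata)


template_text = "This agreement is between $party_a and $party_b, effective $date."
metadata_json = '{"party_a": "Acme Ltd", "party_b": "Jane Doe", "date": "2020-10-05"}'
rendered = render_document(template_text, metadata_json)
print(rendered)
# This agreement is between Acme Ltd and Jane Doe, effective 2020-10-05.
```

Failing loudly on a missing value is a deliberate choice here: for legal documents, a silently blank party name or date is worse than an error.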