Student Project Ideas
by Harshvardhan J. Pandit
Recording online consent via browser extension
Description: The "I Agree" button has become inescapable while browsing the web. While it is present as a legal requirement for collecting consent, once we have clicked the button, we have no record of what we just agreed to. In this project, you will be creating a digital receipt to record the given consent and the information associated with it.
The goal is to create a browser extension that automatically captures the information in a consent dialogue box and lets the user view it later in a dashboard. It will use existing standards such as Consent Receipt and Data Privacy Vocabulary (DPV) to record this information.
This project will provide exposure to front-end development on real-world websites, and an opportunity to increase transparency online regarding privacy. It will also provide a learning experience in the use of programming tools (e.g. git) and research-based workflows.
 Consent Receipt https://kantarainitiative.org/confluence/display/infosharing/Consent+Receipt+Specification
 DPV http://w3.org/ns/dpv
Recording online agreement via browser extension
The goal is to create a browser extension that can assist the person in recording their agreements and the related policies by saving them as a 'notice receipt' within the browser. The saved receipts can then be viewed and interacted with via a dashboard. The project will use existing vocabularies such as the DPV and Notice Receipt Schema to record the information.
This project will provide exposure to front-end development on real-world websites, as well as an opportunity to work towards increased transparency and accountability in online transactions involving privacy. The project will also provide a learning experience in the use of programming tools (e.g. git) and in being involved in a research project.
#privacy #privacy-policy #python #GDPR
Description: Privacy Policies are too long, difficult to read, and complex to understand. There is a lot of ongoing research for using AI to make privacy policies easier to comprehend by identifying relevant information. This project approaches the problem from the other end - generating privacy policies for given information. Such a tool is useful for researchers to investigate the effect of layouts, language used, as well as for organisations to create simpler and more effective policies.
The goal would be to accept information in a structured format (e.g. via CSV, JSON, or an old-fashioned form) and generate privacy policies for different use-cases (e.g. online website, shopping store). It will use existing vocabularies to represent the information, including Data Privacy Vocabulary, GDPRov, and GDPRtEXT.
The project will provide exposure to text-based programming techniques such as templating engines, as well as to the state of privacy online. It will also provide a learning experience in the use of programming tools (e.g. git) and research-based workflows.
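As a rough illustration of the templating approach, the sketch below renders policy sentences from JSON records using Python's standard `string.Template`. The template wording, the field names (`controller`, `data`, `purpose`, `retention`), and the two use-case keys are all assumptions made up for this example; a real tool would draw its terms from the vocabularies mentioned above and would likely use a fuller templating engine.

```python
import json
from string import Template

# Hypothetical templates, one per use-case; the wording is illustrative only.
TEMPLATES = {
    "website": Template(
        "$controller collects your $data when you visit our website "
        "in order to $purpose. We keep it for $retention."
    ),
    "shop": Template(
        "When you place an order, $controller uses your $data "
        "to $purpose. We keep it for $retention."
    ),
}


def generate_policy(structured_input: str, use_case: str) -> str:
    """Render one policy sentence per record in a JSON array."""
    records = json.loads(structured_input)
    template = TEMPLATES[use_case]
    # substitute() raises KeyError if a record lacks a required field,
    # which surfaces incomplete input instead of emitting a broken policy.
    return "\n".join(template.substitute(record) for record in records)


# Invented example input; the field names are assumptions of this sketch.
EXAMPLE_INPUT = json.dumps([
    {"controller": "Example Ltd", "data": "email address",
     "purpose": "send order confirmations", "retention": "12 months"},
])

if __name__ == "__main__":
    print(generate_policy(EXAMPLE_INPUT, "shop"))
```

Keeping the templates separate from the data is what would let a researcher swap layouts or wording while holding the underlying information fixed.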
Assisting with Ethical Clearance in Universities
Description: Modern organisations must take into account GDPR concerns when conducting data collection, and in universities this role is taken by the Ethics committee and its sub-committees. This project will develop a new stand-alone web-based tool to help students complete Ethics/GDPR applications for their projects. The plan is to create a highly usable web interface that simplifies the current form around known patterns of use, e.g. a simple survey, and guides students through the form generation process. The outcome will be a transportable JSON object that expresses the student’s wishes in a machine-readable way for further stages in an approval workflow.
The project involves work on building a web dashboard for assisting researchers with documentation for data protection and ethical clearance, and will provide an opportunity to participate in real-world applications of technology in the areas of data protection and ethics. The chief task would be to build an information system that will allow users to receive suggestions and guidance for addressing risks regarding ethics and privacy.
Extracting Structured Metadata From Privacy Policies
#privacy #privacy-policy #NLP #ML
Privacy policies are notoriously difficult to read. One of the challenges is the use of legal and intentionally obfuscating language. Though the GDPR has made it a legal requirement to use clear language in policies, there is still a barrier to effective transparency regarding the information presented in such policies. This project aims to extract information from the policy (such as sources of data, their requirement in processes, legal basis, and storage periods) and express it as structured metadata for use in research that aims to simplify privacy policies via techniques such as summarisation and visualisation.
The project goal is to use NLP techniques to identify relevant information by using classifiers and ML, which would enable extraction of information from the text of privacy policies, and to represent it using vocabularies such as DPV and GDPRov.
 Usable Privacy Project
Extracting Structured Metadata from Consent Dialogues
#privacy #consent #NLP #ML
Consent dialogue boxes are everywhere on the web, with the information geared towards making it easy for users to comprehend how their personal data is being used. However, this information is presented in a human-readable format with no way for machines to analyse it. The aim of this project is to extract this information and represent it as structured metadata to enable automation and analysis of privacy-based approaches. For example, the statement "We use your address to deliver goods you buy on our website" can be represented by: address - personal data, deliver goods - purpose, use - processing.
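The example statement above can be expressed as structured records along the following lines. This is only a sketch: the `LEXICON` lookup is a hypothetical stand-in for the classifiers the project would actually train, and the `Annotation` type and category labels are names chosen for illustration rather than DPV terms.

```python
import re
from dataclasses import dataclass

# Hypothetical phrase-to-category lexicon; a real system would replace this
# lookup with trained classifiers and map categories to vocabulary concepts.
LEXICON = {
    "address": "personal data",
    "deliver goods": "purpose",
    "use": "processing",
}

STATEMENT = "We use your address to deliver goods you buy on our website"


@dataclass(frozen=True)
class Annotation:
    phrase: str
    category: str


def annotate(statement: str) -> list:
    """Return an Annotation for each lexicon phrase found, in order of appearance."""
    found = []
    for phrase, category in LEXICON.items():
        # Whole-word match so that "use" does not fire inside "user".
        match = re.search(r"\b" + re.escape(phrase) + r"\b", statement, re.IGNORECASE)
        if match:
            found.append((match.start(), Annotation(phrase, category)))
    return [annotation for _, annotation in sorted(found, key=lambda item: item[0])]


if __name__ == "__main__":
    for annotation in annotate(STATEMENT):
        print(f"{annotation.phrase} - {annotation.category}")
```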
The project goal is to use NLP techniques to identify such categories by using classifiers and ML similar to existing work regarding privacy policies. The extracted information would then be represented using vocabularies such as DPV.
 Usable Privacy Project
Categorising News by Topic
#news #NLP #ML
When a new product is launched, such as the latest Apple iPhone, the news is dominated by articles targeting one product. This project investigates combining such related news into one coherent summary or article for better presentation to the reader. It will use NLP to identify categories and topics in a news article, and combine related news articles together in a corpus. It will then extract novel details from different articles and combine the results into a single summary.
Finding Related News Articles For Issues on Privacy
#news #NLP #ML
Every issue raised about privacy has several related news articles which offer evidence or commentary on it. These are often isolated, and collecting them requires significant effort and time. The aim of this project is to assist in finding relevant news articles for a given privacy issue. For example, given the topic 'cloud data storage', relevant news articles include those about 'data breaches', 'cloud security', and 'legal obligations in a jurisdiction'. The identification requires creating a taxonomy of privacy issues and using NLP to identify and categorise news articles based on concepts within the taxonomy.
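A minimal sketch of the taxonomy idea follows, using the 'cloud data storage' example from the description. The keyword lists, the article texts, and the function names are invented for illustration, and plain substring matching stands in for the NLP categorisation the project would actually develop.

```python
# Hypothetical taxonomy: each privacy issue lists related concepts and the
# keywords assumed to signal them. Concept names echo the example in the text.
TAXONOMY = {
    "cloud data storage": {
        "data breaches": ["breach", "leak"],
        "cloud security": ["cloud security", "encryption"],
        "legal obligations in a jurisdiction": ["regulation", "jurisdiction"],
    },
}

# Invented article snippets, keyed by title.
ARTICLES = {
    "Retailer leak exposes customer records":
        "A breach at a retailer exposed records held by a cloud provider.",
    "Local team wins cup":
        "The final ended two goals to one.",
    "New rules for hosting providers":
        "The regulation sets duties for providers in each jurisdiction.",
}


def matched_concepts(issue, text):
    """List the taxonomy concepts for an issue whose keywords occur in the text."""
    lowered = text.lower()
    return [concept for concept, keywords in TAXONOMY[issue].items()
            if any(keyword in lowered for keyword in keywords)]


def related_articles(issue, articles):
    """Map each article title to its matched concepts, skipping articles with none."""
    results = {}
    for title, text in articles.items():
        concepts = matched_concepts(issue, text)
        if concepts:
            results[title] = concepts
    return results


if __name__ == "__main__":
    for title, concepts in related_articles("cloud data storage", ARTICLES).items():
        print(title, "->", ", ".join(concepts))
```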
Scoring Privacy Policies For Transparency and Readability
#privacy #privacy-policy #NLP #ML
autoDIXIT: Generating Clues for DIXIT from Image Analysis
#ML #NLP #ImageAnalysis #python
DIXIT is a game where cards containing images are displayed to the players along with a clue and the players have to identify the correct card matching the clue. The aim of the project is to generate such clues for a given set of images using image analysis. This involves identification of the contents of an image, matching it with popular trivia such as a song or a movie, and generating the clue using NLP.
#python #privacy #privacy-policy
Summarising Event Coverage from Tweets
#twitter #NLP #ML
It has become nearly customary to tweet when an event is going on. Such tweets provide coverage of the event and offer valuable crowd-sourced insights. The aim of the project is to collect and analyse tweets for a given event. An example could be a conference, where attendees tweet about the speakers, venue, food, coffee, and also their opinions. Therefore, the project will also involve identification and categorisation of the tweets along these topics. These tweets will then be summarised in a single article offering an overview of the event.
Evaluating TRI over Tweets
#twitter #NLP #ML
Temporal Random Indexing (TRI) is a technique that maps words into vectors (word spaces) along different time periods to enable analysis of how the meaning of words changes over time. This project will apply TRI to a selective corpus of tweets to identify how words and concepts change in use over a popular social platform. An application is identification of culturally relevant words or topics which spike in usage over a period by being associated with different contexts, such as the use of words in Twitter trends.
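The idea of comparable word spaces across time periods can be sketched as follows. This is a simplified toy version written for this page, not the authors' implementation: it assumes each word is given one fixed sparse random vector that is reused in every period, so that a word's context vectors from different periods live in the same space and can be compared with cosine similarity. The dimensions, window size, and tweets are all invented.

```python
import math
import random
from collections import defaultdict

DIM = 200      # dimensionality of the word space (assumed value)
NONZERO = 10   # non-zero entries per random vector (assumed value)


def random_vector(word):
    """Sparse ternary vector; seeding by the word keeps it identical in every period."""
    rng = random.Random(word)
    vec = [0] * DIM
    for i, pos in enumerate(rng.sample(range(DIM), NONZERO)):
        vec[pos] = 1 if i % 2 == 0 else -1
    return vec


def build_space(tweets, window=2):
    """Word vector = sum of the random vectors of its neighbours within the window."""
    space = defaultdict(lambda: [0] * DIM)
    for tokens in tweets:
        for i, word in enumerate(tokens):
            neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            for other in neighbours:
                for k, value in enumerate(random_vector(other)):
                    space[word][k] += value
    return space


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy tokenised tweets, invented for illustration, one list per time period.
PERIOD_1 = [["apple", "pie", "recipe"], ["bake", "apple", "tart"],
            ["hot", "coffee", "morning"]]
PERIOD_2 = [["apple", "iphone", "launch"], ["new", "apple", "phone"],
            ["hot", "coffee", "morning"]]

space_1, space_2 = build_space(PERIOD_1), build_space(PERIOD_2)


def shift(word):
    """Semantic shift between periods: 0 means unchanged context, near 1 means unrelated."""
    return 1.0 - cosine(space_1[word], space_2[word])


if __name__ == "__main__":
    for word in ("apple", "coffee"):
        print(word, round(shift(word), 3))
```

In this toy data "coffee" appears in the same context in both periods, so its shift is zero, while "apple" moves from baking to phones and scores much higher.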
 "Analysing Word Meaning over Time by Exploiting Temporal Random Indexing" P. Basile, A. Caputo, and G. Semeraro. CLiC-it 2014 http://clic.humnet.unipi.it/proceedings/Proceedings-CLICit-2014.pdf
Unfolding summarised text to generate articles
#NLP #ML #python
Text analysis techniques can now summarise an article with satisfactory efficiency. The advantages of summarised text are its short reading time and concentration of important details. This project investigates the creation of larger articles from a given summary by identifying the context of the article and retrieving additional information about it from the web. This requires the use of NLP to identify topics of relevance within the summary, and ML to retrieve relevant information from an existing corpus or source such as news articles.
Crumple: Folding Privacy Policies via Summaries
#privacy #privacy-policy #python #NLP
Using AI to detect AI biases
#ethics #ML #python
Bias in the use of AI is a rising cause of concern and one of the major ethical hurdles in adoption of new technologies. While bias can occur in many forms, one specific form is skewing the outcome with a positive or negative bias - such as reducing the likelihood of women getting jobs because of their sex. While such biases are difficult to detect and address due to their invisibility and the complexities of a system, one possible answer would be to represent a system as a black box which takes an input and produces some output. This project investigates the approach of detecting biases in an algorithm by providing inputs and measuring the statistical likelihood of outcomes. The outcome of the project will be a tool for analysing system logs consisting of inputs and outputs and detecting potential biases encoded within them.
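One simple way to treat a system as a black box is to compare how often each group receives a positive outcome in its logs. The sketch below is a minimal illustration under stated assumptions: the log format of `(group, outcome)` pairs, the toy data, and the function names are invented, and the 0.8 threshold is borrowed from the "four-fifths" rule of thumb rather than prescribed by the project.

```python
from collections import Counter

# Toy log of (group, outcome) pairs, invented for illustration.
# outcome is True when the system produced the favourable result.
LOG = (
    [("men", True)] * 6 + [("men", False)] * 4
    + [("women", True)] * 3 + [("women", False)] * 7
)


def positive_rates(log):
    """Fraction of favourable outcomes per group."""
    totals, positives = Counter(), Counter()
    for group, outcome in log:
        totals[group] += 1
        positives[group] += int(outcome)
    return {group: positives[group] / totals[group] for group in totals}


def disparity_ratio(rates):
    """Lowest group rate divided by the highest; 1.0 means equal treatment."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest else 1.0


def flag_bias(log, threshold=0.8):
    """Flag the log when the disparity ratio falls below the assumed threshold."""
    return disparity_ratio(positive_rates(log)) < threshold


if __name__ == "__main__":
    rates = positive_rates(LOG)
    print(rates, disparity_ratio(rates), flag_bias(LOG))
```

A ratio alone does not establish bias; a fuller tool would add a significance test and control for legitimate input differences between groups.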
Cookie Monster: Detecting Privacy-invasive cookies in browser
Attaching Risk Factors to Consent Dialogues
The notion of informed consent in GDPR requires the individual to also be notified about potential risks associated with the processing of their personal data. However, consent dialogues contain only information associated with use of personal data and do not provide any information about the risks of sharing that information. The aim of this project is to attach risks to the relevant information within a consent dialogue to enable the user to make a balanced judgement about their consent and use of personal data. The project involves detecting categories of information within a consent dialogue and associating them with an existing corpus of privacy risks by visually annotating the consent dialogue.
Python Engine for Prose Objects
Prose Objects, part of the Common Accord project, are a form of templates for legal documents that provide a consistent and machine-readable representation for policies and contracts. The aim is to assist in the creation of legal documents in a consistent manner. Here, a legal document is represented as a form, which can be populated using metadata defined externally, such as in a JSON file. This project involves the construction of a Python engine for generating prose documents. This will involve reading metadata from files in JSON and Markdown format, and applying them to legal documents represented using template systems such as Jinja2.
 Prose Objects https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2925871
 Common Accord http://www.commonaccord.org/
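To give a feel for the Python engine described for Prose Objects, the sketch below fills a Markdown document from JSON metadata, expanding values that themselves contain placeholders so that clauses can be nested. The `{Key}` placeholder syntax, the `render` function, and the sample metadata are assumptions made for this illustration; they are not the Prose Objects format, and the standard-library regex substitution merely stands in for a template system such as Jinja2.

```python
import json
import re

PLACEHOLDER = re.compile(r"\{(\w+)\}")  # hypothetical {Key} syntax


def render(text, metadata, depth=0, max_depth=10):
    """Replace {Key} placeholders, expanding values that contain placeholders."""
    if depth > max_depth:
        raise ValueError("placeholder nesting too deep (possible cycle)")

    def expand(match):
        key = match.group(1)
        if key not in metadata:
            return match.group(0)  # leave unknown placeholders visible
        return render(str(metadata[key]), metadata, depth + 1, max_depth)

    return PLACEHOLDER.sub(expand, text)


# Invented metadata and document, standing in for files read from disk.
METADATA = json.loads("""
{
  "Party": "Example Ltd",
  "Term": "12 months",
  "Clause": "{Party} shall retain data for {Term}."
}
""")

DOCUMENT = "# Agreement\n\n{Clause}"

if __name__ == "__main__":
    print(render(DOCUMENT, METADATA))
```

Leaving unknown placeholders in the output, rather than silently dropping them, makes missing metadata easy to spot in a generated legal document.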