From Legal Codes to Coding: Computer Scientists Tackle Data Privacy Law
In 2016, the European Union adopted the General Data Protection Regulation, or GDPR, which sets privacy and security standards for how organizations collect and use personal data; it became enforceable in 2018. That same year, California passed the California Consumer Privacy Act, or CCPA, which gives consumers more control over the personal information businesses collect about them.
While these regulations are essential for protecting consumers online, they pose serious challenges for developers, who must update websites and the data systems behind them to meet the laws' requirements.
With compliance now required, Mohammad Sadoghi, associate professor of computer science at the University of California, Davis, is collaborating with Faisal Nawab, assistant professor of computer science at the University of California, Irvine, to help systems developers make sense of these laws. Their goals are to translate the laws into terms developers can understand, build compliance into new systems and find ways to adapt older systems without dismantling them entirely.
Sadoghi and Nawab have started tackling this issue with a $1.2 million grant from the National Science Foundation. They are partnering with the New Jersey Institute of Technology, or NJIT.
The Privacy Problem
Bringing current systems into compliance with regulations such as the GDPR and CCPA is important because these laws govern personal data: who collects it, who uses it and what is done with it.
Sadoghi puts it into a sobering perspective:
"If you were to ask people, 'Would you get your journal that is full of your intimate thoughts and photocopy it and give it to everybody in your classroom?' Everyone would say, 'No,'" he said. "But without even knowing it, every single thing we do is out there, and it's being collected and used in any imaginable and unimaginable ways."
Nearly everything is tracked: the social media posts used to predict behavior, the watch monitoring your breathing, the smart thermostat logging the temperature in your house and the algorithms on your streaming service recording what you watch. Before these regulations, data collection resembled the Wild West: largely lawless.
"Imagine that this amount of information is being tracked without any clear guidelines or without proper awareness," said Sadoghi. "Who has the right to collect that information? Who has the right to analyze that information? And who has the right to derive things from that information? That's the challenge we are living with right now."
A Legal Translation
Laws like the GDPR and the CCPA are setting a precedent for protecting people's privacy and data. Understanding and interpreting them can be tricky, however. In fact, many early violations of the regulations occurred because the technical teams handling compliance did not fully understand what the regulations required from a legal perspective.
Sadoghi and Nawab saw a need for a clear account of what the regulations mean and how they should be interpreted, so that computer scientists are not left to draw their own conclusions.
The first part of this project aims to turn these legal concepts into ideas that can be written as code.
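As a rough illustration of what writing a legal concept as code can look like, the sketch below encodes one deliberately simplified rule: data may be processed only for a purpose the user agreed to, and not at all once consent is withdrawn. The `ConsentRecord` type and `may_process` function are hypothetical and are not the project's actual encoding; real GDPR and CCPA obligations include other legal bases and exceptions that this toy rule ignores.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ConsentRecord:
    """Hypothetical record of the purposes a user has agreed to."""
    user_id: str
    purposes: set[str] = field(default_factory=set)
    withdrawn: bool = False  # True once the user opts out entirely


def may_process(record: ConsentRecord | None, purpose: str) -> bool:
    """Simplified rule: process data only for a purpose the user consented
    to, and never after consent is withdrawn. No record means no consent."""
    if record is None or record.withdrawn:
        return False
    return purpose in record.purposes


alice = ConsentRecord("alice", {"order_fulfillment"})
print(may_process(alice, "order_fulfillment"))  # True
print(may_process(alice, "ad_targeting"))       # False
```

The value of an encoding like this is that a lawyer can check the rule once, and every system that calls it then applies the same interpretation instead of each team inventing its own.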
Sadoghi and Nawab approached Stacy-Ann Elvy, a UC Davis law professor who specializes in the commercial law of privacy and its relationship to emergent technology and human rights law, to help them understand exactly what the regulations require and how to meet those requirements technically.
During the first phase of the project, which began in September 2021, Elvy met frequently with Sadoghi, Nawab and their team of researchers, supplementing their interpretations and guiding the project toward accurate ways of translating legal ideas into ones and zeroes.
This was the first time Elvy, Sadoghi and Nawab had a reason to work across the disciplines of law and computer science, but Elvy expects such collaborations to become more common as new legislation regulating internet privacy is introduced.
"I believe privacy law scholars in the legal academy are increasingly aware of the importance of collaborating with computer scientists," she said. "Both camps can learn from each other."
A Sidecar Approach to System Compliance
The second part of the project is figuring out how to build compatible components that can be added to existing infrastructure to ensure data protection regulations are met. This is difficult, Nawab explains, because modern data systems are extremely complex.
"Data systems are supporting billions of users, so information is not just stored in one place," he said. "It is usually stored with different levels of hierarchy in different machines and different regions of the world and in different formats."
For instance, a piece of collected data may be stored in its raw format in one location and also used to train a machine-learning model somewhere else. So while a regulation like "delete the data if the user requests it to be deleted" may seem simple, it has many implications because of how data is stored and reused.
Take what is called "the right to be forgotten," formalized in the GDPR as the right to erasure. The concept holds that even if users have consented to having their data processed and stored, they still have the right to opt out of its use and to request that their data be deleted from all services. To an outsider, this may seem black-and-white: the data is either there or it is deleted. But, Nawab says, the reality is much grayer.
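One way to see why a single deletion request fans out is to track data lineage: which copies and derived artifacts came from which source. The sketch below is a hypothetical illustration; the `LINEAGE` map, its paths and `affected_by_deletion` are invented for this example and do not describe any particular system. It walks the lineage graph breadth-first to list everything a deletion request touches.

```python
from __future__ import annotations

from collections import deque

# Hypothetical lineage map: each stored item -> items derived from it.
LINEAGE = {
    "raw/user42/profile.json": [
        "replica-eu/user42/profile.json",  # copy in another region
        "features/user42.parquet",         # reformatted for analytics
    ],
    "features/user42.parquet": ["models/recommender-v3"],  # used in training
    "replica-eu/user42/profile.json": [],
    "models/recommender-v3": [],
}


def affected_by_deletion(item: str, lineage: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk over lineage edges. Returns the item itself plus
    every copy or artifact derived from it, in the order they are found."""
    order = [item]
    seen = {item}
    queue = deque([item])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


for target in affected_by_deletion("raw/user42/profile.json", LINEAGE):
    print(target)
```

Note that the trained model shows up in the list even though it is not a copy of the raw file; removing one user's influence from a model that has already been trained is a much harder problem than deleting a record.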
"Does deleting it mean that I don't have the ability to access the data? Does it mean that it's deleted from the storage unit? Does it mean that it needs to be deleted from all copies around the world that contain different copies of this data?"
To be compliant in this case, systems need the ability to delete every copy of the user's data. However, many companies built their business models on acquiring as much user data as possible for later use, whether to gain more insight or to sell to interested parties. That level of access to data, the researchers say, is no longer appropriate, and existing infrastructure needs to be brought into compliance with current regulations.
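Nawab's three questions correspond to three different interpretations of "delete." The toy model below makes the difference concrete. It is a minimal sketch under simplifying assumptions: an in-memory store with one primary copy and a few regional replicas, and hypothetical names (`DeletionLevel`, `ToyStore`, `forget`) that do not come from the project.

```python
from enum import Enum


class DeletionLevel(Enum):
    REVOKE_ACCESS = 1      # data stays stored, but reads are blocked
    DELETE_PRIMARY = 2     # removed from the main storage unit only
    DELETE_ALL_COPIES = 3  # removed from the primary and every replica


class ToyStore:
    """Hypothetical store with one primary copy and regional replicas."""

    def __init__(self, replica_regions):
        self.primary = {}
        self.replicas = {region: {} for region in replica_regions}
        self.blocked = set()

    def put(self, key, value):
        self.primary[key] = value
        for copy in self.replicas.values():
            copy[key] = value

    def get(self, key):
        if key in self.blocked:
            return None
        return self.primary.get(key)

    def forget(self, key, level):
        self.blocked.add(key)  # every interpretation at least blocks reads
        if level in (DeletionLevel.DELETE_PRIMARY, DeletionLevel.DELETE_ALL_COPIES):
            self.primary.pop(key, None)
        if level is DeletionLevel.DELETE_ALL_COPIES:
            for copy in self.replicas.values():
                copy.pop(key, None)

    def copies_remaining(self, key):
        in_primary = int(key in self.primary)
        return in_primary + sum(key in copy for copy in self.replicas.values())


for level in DeletionLevel:
    store = ToyStore(["us", "eu"])
    store.put("user42", {"email": "user42@example.com"})
    store.forget("user42", level)
    print(level.name, store.get("user42"), store.copies_remaining("user42"))
```

With two replicas, every level makes `get` return `None`, yet the number of stored copies left behind is 3, 2 and 0 respectively. From the outside all three look like deletion, which is exactly why the legal meaning has to be pinned down first.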
Sadoghi, Nawab and their collaborators (UCI professors Sharad Mehrotra and Nalini Venkatasubramanian and NJIT assistant professor Shantanu Sharma, all in computer science) are working on a "sidecar" approach to this problem. Instead of building a new system or completely overhauling an existing one, they aim to add a layer on top of existing databases that ensures compliance.
The main challenge for this sidecar is building a framework that can interact with many different data processing ecosystems and provide solutions that work across that diversity. One possible solution is to use a blockchain as a governance platform for the sidecar. A blockchain is an append-only, tamper-resistant ledger that is replicated and shared across a decentralized network, so records cannot be quietly altered once they are written.
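The general idea of a sidecar can be sketched as a wrapper that sits in front of an unmodified store and enforces rules on the way in and out. The example below is hypothetical: `KeyValueStore`, `LegacyStore` and `ComplianceSidecar` are illustrative names, and the only rule enforced is a simple opt-out. It is not the researchers' design, just a way to show how a layer can add guarantees without touching the legacy code.

```python
from typing import Any, Optional, Protocol


class KeyValueStore(Protocol):
    """Any existing backend exposing get/put; the sidecar never modifies it."""
    def get(self, key: str) -> Optional[Any]: ...
    def put(self, key: str, value: Any) -> None: ...


class LegacyStore:
    """Stand-in for an existing system with no privacy logic at all."""
    def __init__(self) -> None:
        self._data: dict = {}

    def get(self, key: str) -> Optional[Any]:
        return self._data.get(key)

    def put(self, key: str, value: Any) -> None:
        self._data[key] = value


class ComplianceSidecar:
    """Hypothetical layer in front of an unmodified store. Writes must name
    an owner; reads through the sidecar are refused once that owner opts out."""
    def __init__(self, backend: KeyValueStore) -> None:
        self.backend = backend
        self.owner_of: dict = {}
        self.opted_out: set = set()

    def put(self, key: str, value: Any, owner: str) -> None:
        if owner in self.opted_out:
            raise PermissionError(f"{owner} has opted out of data collection")
        self.owner_of[key] = owner
        self.backend.put(key, value)

    def get(self, key: str) -> Optional[Any]:
        if self.owner_of.get(key) in self.opted_out:
            return None
        return self.backend.get(key)

    def opt_out(self, owner: str) -> None:
        self.opted_out.add(owner)


legacy = LegacyStore()
sidecar = ComplianceSidecar(legacy)
sidecar.put("order:1", {"item": "book"}, owner="alice")
sidecar.opt_out("alice")
print(sidecar.get("order:1"))  # None
print(legacy.get("order:1"))   # {'item': 'book'}
```

Because the sidecar depends only on a `get`/`put` interface, the same layer could in principle wrap several different backends. The last line also shows the limitation: the legacy store still holds the data, so blocking access is only a first step, and actual deletion and an auditable record of what happened are still needed.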
Sadoghi has been working on an open-source blockchain platform called Apache ResilientDB (Incubating) since 2018. In this design, the blockchain platform, ResilientDB in this case, would serve as a secure log that maintains the lineage and provenance of the data, as well as accountability, while the sidecar software administers changes to the underlying systems.
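The property that makes a blockchain useful as a provenance log is that past entries cannot be edited without detection. The sketch below shows only that core idea, a hash chain in which each entry stores the hash of the one before it. `ProvenanceLog` and its fields are hypothetical, and this is not ResilientDB's API: a single-process chain like this gives tamper evidence, while a blockchain platform adds replication and consensus across many nodes so that no single party controls the log.

```python
import hashlib
import json
import time


class ProvenanceLog:
    """Append-only, hash-chained log: each entry records the hash of the
    previous entry, so editing any past entry breaks verification."""
    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(body):
        encoded = json.dumps(body, sort_keys=True).encode("utf-8")
        return hashlib.sha256(encoded).hexdigest()

    def append(self, actor, action, item):
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        body = {"actor": actor, "action": action, "item": item,
                "prev": prev, "ts": time.time()}
        entry = dict(body, hash=self._digest(body))
        self.entries.append(entry)
        return entry

    def verify(self):
        prev = self.GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev or self._digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


log = ProvenanceLog()
log.append("sidecar", "collect", "user42/profile")
log.append("sidecar", "derive", "features/user42")
log.append("sidecar", "erase", "user42/profile")
print(log.verify())  # True
log.entries[1]["item"] = "features/someone-else"  # tamper with history
print(log.verify())  # False
```

A log like this records that an erasure happened and what it affected, without needing to keep the erased personal data itself, which is what lets it provide lineage and accountability.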
"There is so much software and hardware out there, you cannot just disrupt everything you have in legacy and start from scratch," said Sadoghi. "This is decades of development. With this sidecar approach, we can start with one corner and slowly start augmenting and supplementing the system with this additional layer that allows turning an infrastructure that had no guarantees in terms of privacy into a fully compliant system without complete destruction of the ecosystem."
Law-Abiding Systems
These new regulations, which give users more ownership over how their data is collected and used and require data systems to respect users' privacy, may be vague and "in the white space of interpretation," according to Sadoghi, but they are a step in the right direction because they bring data privacy to people's attention. However, establishing the law is only half the battle.
"It's great to have regulations and it's great to have this legal framework, but if systems cannot achieve these regulations, then they have no use," said Nawab. "We need to take those legal regulations and put the framework and the ecosystem together to make them a reality."