Atlanta | Aug. 16, 2018
A team of students from the Colleges of Computing, Engineering, and Sciences is asking for your vote. They’re concerned about privacy – but not at the polls. They’re campaigning to protect personal data from unnecessary disclosure during research and also to protect researchers’ access to meaningful data.
They’re one of four national finalists for the $5,000 People’s Choice Award in “The Unlinkable Data Challenge: Advancing Methods in Differential Privacy,” a contest held by the National Institute of Standards and Technology (NIST) Public Safety Communications Research Division (PSCR) and TopCoder.
The big problem? Dramatic increases in computing power make it possible to combine and use data from multiple sources that may contain sensitive information about individuals. Once unrelated datasets are combined, previously “private” facts can be linked to easily identify the person behind them. A 2002 study by Latanya Sweeney of Harvard’s Data Privacy Lab found that the combination of just three “quasi-identifiers” (date of birth, five-digit postal code, and gender) could uniquely identify 87 percent of the U.S. population. Making minor changes to birth dates and other personally identifiable information still does not provide adequate protection against linkage attacks.
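To see how little it takes, here is a minimal sketch of a linkage attack in Python. Every record, name, and field in it is made up for illustration, and the `link` helper is hypothetical; it simply joins a "de-identified" dataset to a public roster on the three quasi-identifiers mentioned above.

```python
# Toy illustration of a linkage attack; all records below are made up.

# "De-identified" medical records: names removed, quasi-identifiers kept.
medical = [
    {"dob": "1975-03-02", "zip": "30332", "sex": "F", "diagnosis": "asthma"},
    {"dob": "1982-11-19", "zip": "30309", "sex": "M", "diagnosis": "diabetes"},
]

# A public roster (for example, a voter list) with names and the same fields.
roster = [
    {"name": "Alice Example", "dob": "1975-03-02", "zip": "30332", "sex": "F"},
    {"name": "Bob Example", "dob": "1982-11-19", "zip": "30309", "sex": "M"},
    {"name": "Carol Example", "dob": "1990-07-08", "zip": "30332", "sex": "F"},
]

QUASI_IDENTIFIERS = ("dob", "zip", "sex")


def link(records, public):
    """Join two datasets on quasi-identifiers and return unique matches."""
    # Index the public roster by its quasi-identifier combination.
    index = {}
    for person in public:
        key = tuple(person[q] for q in QUASI_IDENTIFIERS)
        index.setdefault(key, []).append(person["name"])

    matches = []
    for rec in records:
        key = tuple(rec[q] for q in QUASI_IDENTIFIERS)
        names = index.get(key, [])
        if len(names) == 1:  # a unique match re-identifies the record
            matches.append((names[0], rec["diagnosis"]))
    return matches


linked = link(medical, roster)
for name, diagnosis in linked:
    print(name, "->", diagnosis)
```

Neither dataset pairs a name with a diagnosis on its own; the link appears only once the two are combined.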
Contest organizers say this valid privacy concern is unfortunately limiting the use of data for research. Privacy engineers have been asked to create a new solution.
“Our proposed solution is to generate differentially private synthetic data using Generative Adversarial Networks (GANs),” explains Rachel Cummings (pictured - L), assistant professor in the H. Milton Stewart School of Industrial and Systems Engineering at Georgia Tech, and organizer of the Georgia Tech team. “This synthetic data can then be used for a variety of analysis tasks, including classification, regression, clustering, and answering unknown research questions. If the synthetic data are statistically similar to the original (sensitive) data, then analysis on the synthetic data should be accurate with respect to the original database.”
She says this can be achieved by privately training the neural networks inside a GAN so that they generate new data points drawn from the same distribution as the original data.
“By generating synthetic data privately, any future analysis on the data also would be private, due to the post-processing guarantees of differential privacy,” she says.
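The post-processing idea can be illustrated with a much simpler stand-in than a GAN. The sketch below is not the team's method: it assumes a single categorical column, releases its counts through the standard Laplace mechanism, and then samples synthetic records from the noisy counts alone. The function names, the toy data, and the choice of epsilon are all assumptions made for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_histogram(values, categories, epsilon, rng):
    """Release category counts with epsilon-differential privacy.

    Adding or removing one person's record changes one count by 1, so
    Laplace noise with scale 1/epsilon suffices (the Laplace mechanism).
    """
    counts = {c: 0 for c in categories}
    for v in values:
        counts[v] += 1
    return {c: counts[c] + laplace_noise(1.0 / epsilon, rng) for c in categories}


def synthesize(noisy_counts, n, rng):
    """Post-processing: sample synthetic records using only the noisy counts."""
    categories = list(noisy_counts)
    weights = [max(noisy_counts[c], 0.0) for c in categories]  # clip negatives
    if sum(weights) == 0:
        weights = [1.0] * len(categories)  # fall back to uniform sampling
    return rng.choices(categories, weights=weights, k=n)


# Seeded only to make the sketch repeatable; a real deployment would need a
# carefully chosen noise source rather than Python's default generator.
rng = random.Random(0)
sensitive = ["A"] * 60 + ["B"] * 30 + ["C"] * 10  # made-up sensitive column
noisy = private_histogram(sensitive, ["A", "B", "C"], epsilon=1.0, rng=rng)
synthetic = synthesize(noisy, n=100, rng=rng)
```

Because `synthesize` never touches the sensitive column, anything computed from `synthetic` carries the same privacy guarantee as the noisy counts. A differentially private GAN aims for the same property on far richer data by making the training process itself private.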
The idea builds on previous work on differentially private GANs and adds further optimizations intended to enhance performance across a wide variety of data types and analysis tasks.
The team includes Digvijay Boob (PhD ISyE-ACO), Uthaipon Tantitongpipat (PhD CS-ACO), Kyle Zimmerman (MS Cybersecurity), Dhamma Kimpara (pictured - R, BS Math), and Chris Waites (BS CS). The students are members of Cummings’ weekly privacy reading group, and they worked together over the summer to submit their idea to the NIST contest.
In addition to the $5,000 People's Choice Award, the Georgia Tech team is eligible for a $40,000 Judges' Choice Award and, if successful, would advance to future rounds with the potential to win an additional $140,000 toward their research.