Are you a mathematician or data scientist interested in a new challenge? Then join this exciting data privacy competition with up to $150,000 in prizes, where participants will create new or improved differentially private synthetic data generation tools.
Some data sets have important public value but contain sensitive personal information, so they can’t be shared directly with the public. Privacy-preserving synthetic data tools address this by producing new, artificial data that can serve as a practical replacement for the original sensitive data in common analytics tasks such as clustering, classification and regression.
By mathematically proving that a synthetic data generator satisfies the rigorous Differential Privacy guarantee, we can be confident that the synthetic data it produces won’t contain any information that can be traced back to specific individuals in the original data.
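For readers new to the topic, the guarantee can be stated formally. The block below gives the standard textbook definition of ε-differential privacy; the announcement does not say which variant contestants must prove, so treating pure ε-differential privacy as the target is an assumption. The symbols M (the randomized generator), D and D' (neighboring data sets), S (a set of outputs) and ε (the privacy parameter) are notation introduced here for illustration.

```latex
% A randomized algorithm M satisfies \varepsilon-differential privacy if,
% for all data sets D and D' that differ in one individual's record
% and for every set S of possible outputs,
\[
  \Pr[\,M(D) \in S\,] \;\le\; e^{\varepsilon}\,\Pr[\,M(D') \in S\,].
\]
% Smaller \varepsilon means the output distribution changes less when any
% single individual's data is added or removed, i.e. stronger privacy.
```

Read this way, the guarantee limits how much any one person's record can influence the synthetic data, with ε controlling how tight that limit is.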
The “Differential Privacy Synthetic Data Challenge” consists of a sequence of three marathon matches run on the Topcoder platform. Contestants design and implement their own synthetic data generation algorithms, mathematically prove that those algorithms satisfy differential privacy, and then enter them to compete against others’ algorithms on empirical accuracy over real data, with the prospect of advancing research in the field of differential privacy.
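As a rough illustration of what a differentially private synthetic data generator can look like, the sketch below builds a noisy histogram with the Laplace mechanism and samples artificial records from it. This is a simple textbook baseline, not the method used or required by the challenge. The function names, the single categorical attribute, the fixed public `domain` and the add-or-remove-one-record notion of neighboring data sets are all assumptions made for the example.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_synthetic_records(records, domain, epsilon, n_out, seed=None):
    """Hypothetical helper: sample n_out synthetic values from a noisy histogram.

    `domain` must be a fixed, public list of possible values chosen without
    looking at the data, and `n_out` must not depend on the data either.
    """
    rng = random.Random(seed)
    counts = Counter(records)
    # Adding or removing one record changes exactly one count by 1, so the
    # histogram has L1 sensitivity 1 and Laplace noise of scale 1/epsilon
    # gives epsilon-differential privacy under that neighboring relation.
    noisy = [counts.get(v, 0) + laplace_noise(1.0 / epsilon, rng) for v in domain]
    # Clipping and sampling are post-processing and do not weaken the guarantee.
    weights = [max(0.0, c) for c in noisy]
    if sum(weights) == 0:
        weights = [1.0] * len(domain)  # fall back to uniform sampling
    return rng.choices(domain, weights=weights, k=n_out)


# Example usage with made-up data.
ages = ["18-29", "30-49", "30-49", "50+"]
synthetic = dp_synthetic_records(
    ages, ["18-29", "30-49", "50+"], epsilon=1.0, n_out=5, seed=0
)
```

A real entry would need to handle many attributes and their correlations, and a careful implementation would also account for floating-point issues in noise sampling, which this sketch ignores.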
If you’re not a differential privacy expert but you’d like to learn, join the Topcoder community for tutorials to help you catch up and compete! Join, learn, and compete for up to $150,000 in prizes!
How Important Is This?
This challenge is focused on proactively protecting individual privacy while allowing public safety data to be used by researchers for beneficial purposes and outcomes. NIST’s PSCR (Public Safety Communications Research) has strong commitments to both public safety research and the preservation of security and privacy, including the use of de-identification.
There is no absolute guarantee that data will not be misused. Even a dataset that protects individual identities may be used for ill purposes if it falls into the wrong hands. Weaknesses in the security of the original data can also threaten the privacy of individuals.
Privacy in data release is an important concern for the Federal Government (which has an Open Data Policy), state governments, the public safety sector and many commercial non-governmental organizations. Developments coming out of this competition are intended to drive major advances in the practical application of differential privacy for these organizations.
The purpose of this series of competitions is to provide a platform for researchers to develop more advanced differentially private methods that can substantially improve the privacy protection and utility of the resulting datasets.
Get Involved – How to Participate
Note: All submissions for this challenge are being collected through the Topcoder website.
The Differential Privacy Synthetic Data Challenge is phase 2 of The Unlinkable Data Challenge: Advancing Methods in Differential Privacy, in which competitors wrote concept papers to identify new approaches to de-identification and inform the final coding design of this challenge. Participants in this challenge will create an algorithm and compete in a sequence of marathon matches. In each marathon match, participants will design and implement their own differentially private synthetic data generation algorithm, mathematically prove that their algorithm satisfies differential privacy, and enter it to compete against others’ algorithms on empirical accuracy over real data, with the prospect of advancing understanding in the field of differential privacy.
This is a multi-phase contest with three marathon matches. Competitors may enter the contest at any point between November 2018 and April 2019. Topcoder will carry registrations from earlier matches forward to later ones.
The marathon format gives participants immediate feedback on the quality of their submissions through an online leaderboard, which allows teams to repeatedly improve and validate their algorithms through several phases of increasing difficulty. In each marathon match, participants can change their algorithm, team up with other Topcoder members and watch their opponents move up and down the leaderboard. The final stage of each marathon match is a sequestered stage in which participants submit their final code to be rigorously tested and evaluated. The winners of each marathon match are determined by where each competitor’s algorithm falls with respect to the utility-privacy frontier curve.
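One way to picture a utility-privacy frontier is as the set of submissions that no other submission beats on both privacy and accuracy at once. The announcement does not describe the actual scoring rules, so the sketch below is only an illustration under assumed inputs: each submission is a hypothetical `(name, epsilon, accuracy)` tuple, where lower epsilon means stronger privacy and higher accuracy means better utility.

```python
def pareto_frontier(submissions):
    """Return the submissions not dominated on (epsilon, accuracy).

    A submission is dominated if another one has epsilon no larger and
    accuracy no smaller, and is strictly better on at least one of the two.
    Exact duplicates on both measures keep only the first one seen.
    """
    # Sort by epsilon ascending; for equal epsilon, best accuracy first.
    ordered = sorted(submissions, key=lambda s: (s[1], -s[2]))
    frontier = []
    best_accuracy = float("-inf")
    for name, epsilon, accuracy in ordered:
        # A weaker privacy level only earns a place on the frontier
        # if it buys strictly higher accuracy than anything before it.
        if accuracy > best_accuracy:
            frontier.append((name, epsilon, accuracy))
            best_accuracy = accuracy
    return frontier


# Example usage with made-up leaderboard entries.
entries = [
    ("team_a", 1.0, 0.60),
    ("team_b", 1.0, 0.50),
    ("team_c", 2.0, 0.55),
    ("team_d", 2.0, 0.80),
]
frontier = pareto_frontier(entries)
```

In this made-up example, `team_b` and `team_c` fall behind the frontier because `team_a` achieves higher accuracy at an equal or smaller epsilon.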