Anonymous Data Might Not Be So Anonymous After All

A recent study found that with just four data points, 90 percent of individuals could be accurately re-identified

Big data is often viewed as a panacea: it can fix the healthcare industry, it can provide marketers with the best data, and it can allow social networks to grow exponentially. But given all the data leaks in recent years, of credit card data in particular, big data has also proven invasive, making it possible to track people like never before, even in anonymized data sets.

Yves-Alexandre de Montjoye, a PhD student at MIT, performed a study on an anonymized set of credit card data, which represented three months of credit card records for 1.1 million people who shopped in 10,000 stores. The study, published in the journal Science, found that with just four data points — location, time, price, and one piece of outside information — 90 percent of individuals could be accurately re-identified.

Making the data less specific, by widening price bands or expanding the number of days considered as one data point, does little to prevent re-identification in anonymous data sets. At the coarsest resolution, only 15 percent of users could be identified, but with 10 data points the rate climbed back up to 80 percent.
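To make the idea concrete, here is a minimal sketch of how the share of re-identifiable users might be measured on a toy data set. It is not the study's code: the names `toy_traces`, `coarsen`, and `unicity` are hypothetical, the three-user data set is invented for illustration, and the sketch checks every possible combination of known purchases rather than sampling from millions of records.

```python
from itertools import combinations

# Hypothetical toy data: user -> list of (shop_id, day, price) purchases.
toy_traces = {
    "a": [(1, 0, 5), (2, 1, 7)],
    "b": [(1, 0, 5), (3, 1, 7)],
    "c": [(1, 0, 6), (2, 1, 7)],
}

def coarsen(record, price_band=1, day_window=1):
    """Blur one purchase by bucketing its day and its price."""
    shop, day, price = record
    return (shop, day // day_window, price // price_band)

def unicity(traces, k, price_band=1, day_window=1):
    """Fraction of k-purchase combinations that match exactly one user."""
    coarse = {
        user: {coarsen(r, price_band, day_window) for r in records}
        for user, records in traces.items()
    }
    unique = total = 0
    for points in coarse.values():
        # Pretend an outsider knows these k purchases of one user.
        for known in combinations(sorted(points), k):
            known = set(known)
            # Count how many users' records contain all the known purchases.
            matches = sum(1 for pts in coarse.values() if known <= pts)
            unique += matches == 1
            total += 1
    return unique / total if total else 0.0

print(unicity(toy_traces, k=1))                 # one known purchase
print(unicity(toy_traces, k=2))                 # two known purchases
print(unicity(toy_traces, k=2, price_band=10))  # two purchases, blurred prices
```

In this toy set, one known purchase is usually shared with another shopper, while two known purchases single out every user. Lumping prices into bands of 10 makes users "a" and "c" indistinguishable, which is the kind of protection the study found to fade as more data points become known.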

Even a little information can betray a user in an anonymous data set. Consider a trip to the mall: just posting a selfie with your coffee on Facebook could reveal location, time, and maybe even price. Pair that data with a credit card leak, and maybe some phishing, and more than half of Americans could be at risk of having their financial data compromised.

The concern is that companies could use similar methods to gain knowledge of a user without their permission, which could result in consequences users couldn’t possibly anticipate.

Rebecca Herold, privacy consultant and author, told CNBC:

"This also raises the question of how such data would be used within insurance actuarial calculations, insurance claims and adjustments, loan and mortgage application considerations, divorce proceedings."

Given the recent allegations that is leaking information about users’ tobacco use, marital status, and ZIP code, it’s not far-fetched that anonymous data can be used to re-identify users. Users are also taking to healthcare apps, which could store even more intimate data about their lives and share it with third parties.

Users have been struggling to attain anonymity online, in one form or another, and are constantly thwarted by poorly designed experiences and leaky metadata. And as long as companies remain in love with exchanging huge troves of metadata, users won’t ever really be able to adequately protect themselves.
