There’s an old story that a reporter would always know when the US military was planning a big operation because they would order pizzas in the evening to support their secret late-night planning sessions. Similarly, during the Cold War, Soviet intelligence agent Yuri Totrov could distinguish diplomats from CIA agents using data like pay scale, recruitment age, education, naturalization, and where they worked when they returned home from foreign postings.
What do these stories have in common? “Aggregating” or combining data from multiple sources can actually reveal surprisingly specific information. You might not work for the Pentagon, but your data can be aggregated in the same way to de-anonymize you. Here’s a small collection of these surprising privacy failures:
- The Classic Paper – “Simple Demographics Often Identify People Uniquely” shows that knowing just birth date, gender, and 5-digit ZIP code is enough to uniquely identify most people in the United States (an estimated 87%, according to the paper).
- Netflix Debacle – A more recent example that has also become a classic cautionary tale: An “anonymous” Netflix dataset was de-anonymized by correlating it with the IMDb database. This is discussed in the paper “Robust De-anonymization of Large Sparse Datasets” and of course the related lawsuit. Arvind Narayanan and his co-researchers also did work on social networks and writing style, discussed next.
- Social Exposure – “De-anonymizing social networks” demonstrates that an “anonymous” Twitter graph can be re-identified using Flickr for auxiliary information. (Also by Arvind Narayanan.)
- Your Words Betray You – Your choice of words in writing can be analyzed to uniquely identify you, according to the paper “On the Feasibility of Internet-Scale Author Identification”: “Consider two words that are nearly interchangeable, say ‘since’ and ‘because’. Different people use the two words in a differing proportion. By comparing the relative frequency of the two words, you get a little bit of information about a person, typically under 1 bit. But by putting together enough of these ‘markers’, you can construct a profile.” (Also by Arvind Narayanan.)
- Picture’s Worth 1000 Words – A supposedly anonymous NYC Taxi Cab Database nevertheless included enough information that researchers could determine celebrities’ destinations based on photographs of where they got into taxis, as well as the home addresses of frequent visitors to “gentlemen’s clubs”.
- Just One Night – Uber once published a blog post (now removed) that demonstrated it could detect when its riders had one-night stands.
- Strike at the Heart – Fitbit’s heart rate monitor collects data about its users and stores it in Fitbit’s cloud. The company recently published the results of its analysis of the Super Bowl, which showed heart rate spikes when the big plays happened. This didn’t identify any users, but we note that if someone had access to an individual’s heart rate data, they could determine (based on Fitbit’s graph) whether that person was watching the Super Bowl. What else could be aggregated from your heart rate?
- Location, Location, Location – Got any apps that collect GPS data? The traces of your location, even your approximate location, are pretty unique. This is outlined in the paper “Unique in the Crowd: The privacy bounds of human mobility”.
- Show Me Your Money – Bitcoin is often thought of as an anonymous currency, but it’s surprisingly non-anonymous, considering its reputation. This is because a lot of information is contained in the “public ledger” that records all transactions. See also the paper “An Analysis of Anonymity in the Bitcoin System”.
- You Think You’re Safe – Think you increase your privacy by blocking cookies? Browser fingerprinting circumvents that protection by noting unique attributes about your browser. Ironically, installing privacy software like AdBlock is rare enough that it actually might make it easier to uniquely identify you. Test your browser here.
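To make the aggregation idea from the list above concrete, here is a minimal Python sketch of the birth date, gender, and zip code effect. Everything in it is made up for illustration: the records, the names, and the helper functions `unique_fraction` and `link` are hypothetical and are not taken from any of the papers mentioned. It shows two things: how the share of uniquely identifiable rows grows as attributes are combined, and how an “anonymous” table can be joined against a public list that carries names.

```python
from collections import Counter

# Hypothetical "anonymized" records: names removed, quasi-identifiers kept.
records = [
    {"birth_date": "1980-03-14", "gender": "F", "zip": "97201", "diagnosis": "asthma"},
    {"birth_date": "1980-03-14", "gender": "M", "zip": "97201", "diagnosis": "flu"},
    {"birth_date": "1975-11-02", "gender": "F", "zip": "97201", "diagnosis": "diabetes"},
    {"birth_date": "1975-11-02", "gender": "F", "zip": "97201", "diagnosis": "migraine"},
    {"birth_date": "1990-07-30", "gender": "M", "zip": "97209", "diagnosis": "fracture"},
]

# Hypothetical public list (think voter roll) that includes names.
public_list = [
    {"name": "Alice Example", "birth_date": "1980-03-14", "gender": "F", "zip": "97201"},
    {"name": "Bob Example", "birth_date": "1990-07-30", "gender": "M", "zip": "97209"},
]


def key_of(row, keys):
    """Build the tuple of attribute values used to compare rows."""
    return tuple(row[k] for k in keys)


def unique_fraction(rows, keys):
    """Fraction of rows whose combination of `keys` appears exactly once."""
    counts = Counter(key_of(r, keys) for r in rows)
    unique_rows = sum(1 for r in rows if counts[key_of(r, keys)] == 1)
    return unique_rows / len(rows)


def link(rows, aux, keys):
    """Attach names from `aux` to rows that are unique on `keys`."""
    counts = Counter(key_of(r, keys) for r in rows)
    unique_index = {
        key_of(r, keys): r for r in rows if counts[key_of(r, keys)] == 1
    }
    return {
        person["name"]: unique_index[key_of(person, keys)]
        for person in aux
        if key_of(person, keys) in unique_index
    }


if __name__ == "__main__":
    for keys in (["gender"], ["gender", "zip"], ["birth_date", "gender", "zip"]):
        print(keys, unique_fraction(records, keys))
    for name, row in link(records, public_list, ["birth_date", "gender", "zip"]).items():
        print(name, "->", row["diagnosis"])
```

In this toy data no single attribute gives much away, but the three together single out most rows, and each singled-out row can then be matched to a name. Real attacks work on far larger and messier data, so treat this as a sketch of the principle rather than a description of any specific study’s method.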
These are just some examples of the data aggregation threats being talked about – imagine the ones that aren’t. It’s stories like these that motivate us here at Tozny to create a world with better privacy and security for users. Sign up to be one of the first testers of Tozny’s new privacy-protection product if you feel the same way.