(Originally published January 19, 2017, updated July 24, 2017)
Apple has traditionally distinguished itself from rivals like Google and Facebook by emphasizing its respect for user privacy. It has taken deliberate steps to avoid vacuuming up all of its users’ data, providing encryption at the device level as well as during data transmission. It has done so, however, at the cost of forgoing the benefits that pervasive data collection and analysis have to offer. Such benefits include improving the increasingly popular on-demand search and recommendation services, such as Google Now, Microsoft’s Cortana, and Amazon’s Echo. Like Apple’s Siri technology, these services act as digital assistants, responding to search requests and making recommendations. Now Apple, pushing to remain competitive in this line of business, is taking a new approach to privacy, in the form of differential privacy (DP).
Announced in June 2016 during Apple’s Worldwide Developers Conference in San Francisco, DP is, as Craig Federighi, senior vice president of software engineering, stated, “a research topic in the area of statistics and data analytics that uses hashing, subsampling and noise injection to enable … crowdsourced learning while keeping the data of individual users completely private.” More simply put, DP is the statistical science of attempting to learn as much as possible about a group while learning as little as possible about any individual in it.
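Apple has not published the exact mechanisms it uses, but the core idea of noise injection can be illustrated with randomized response, a classic technique from the differential privacy literature. Each user flips a coin before answering a sensitive yes/no question, so any individual report is plausibly deniable, yet the aggregate rate can still be recovered. Everything below (the 30% rate, the number of users) is an invented example, not Apple data:

```python
import random

def randomized_response(true_answer: bool) -> bool:
    """Report the true answer only half the time; otherwise report
    a fair coin flip. Any single report is deniable."""
    if random.random() < 0.5:
        return true_answer          # tell the truth
    return random.random() < 0.5    # random coin flip

def estimate_rate(reports: list) -> float:
    """Invert the noise: the observed "yes" rate p satisfies
    p = 0.5 * true_rate + 0.25, so true_rate = 2p - 0.5."""
    p = sum(reports) / len(reports)
    return 2 * p - 0.5

# Simulate 100,000 users, 30% of whom truly answer "yes".
random.seed(0)
truth = [random.random() < 0.3 for _ in range(100_000)]
reports = [randomized_response(t) for t in truth]
print(round(estimate_rate(reports), 2))  # close to 0.30
```

No individual’s report identifies them, yet the population estimate lands near the true 30% — exactly the “learn about the group, not the individual” property described above.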
Following the announcement, Matt Green, a cryptographer and professor at Johns Hopkins University, published a blog post detailing the motivation behind DP and its implementation. Green identified two challenges with differential privacy:
- The more information you seek from the data, the more “noise” must be injected to preserve privacy, underscoring the fundamental tradeoff between privacy and accuracy.
- Once you’ve mined the noise-injected data to a certain point, it cannot be mined further without risking users’ privacy.
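Both challenges can be made concrete with the Laplace mechanism, a standard DP technique (again, an illustration of the research area, not Apple’s disclosed implementation). The privacy parameter epsilon controls the tradeoff: smaller epsilon means stronger privacy but noisier answers, and because privacy loss accumulates across queries, a dataset’s “privacy budget” is eventually exhausted:

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Sample Laplace(0, scale) noise via the inverse-CDF method."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def private_count(true_count: int, epsilon: float) -> float:
    """Release a count with Laplace noise of scale 1/epsilon; a
    counting query changes by at most 1 per individual."""
    return true_count + laplace_noise(1.0 / epsilon)

# Challenge 1: smaller epsilon (stronger privacy) = noisier answers.
random.seed(0)
for eps in (1.0, 0.1, 0.01):
    answers = [round(private_count(5000, eps)) for _ in range(3)]
    print(f"epsilon={eps}: {answers}")

# Challenge 2: answering k queries at epsilon each consumes a total
# budget of roughly k * epsilon, so repeated mining eventually
# spends the privacy the noise was meant to protect.
```

At epsilon = 1.0 the noisy counts stay close to the true 5,000; at epsilon = 0.01 they can be off by hundreds — the accuracy cost of privacy Green describes.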
Ultimately, Green was split on Apple’s announcement. He applauded Apple for making an honest attempt at improving users’ privacy, though he cautioned that, given the planned increase in collection of personal user data, Apple should be transparent about how the technology is deployed and allow some public scrutiny to confirm it is done properly. The question remains whether Apple will succeed in implementing DP in a way that gleans useful information while protecting the privacy of its users.
More recently, Google published papers covering concepts it calls “federated learning” and “secure aggregation.” Similar to Apple’s purpose in implementing differential privacy, Google seeks to improve the utility and power of on-demand search and recommendation services on its Android devices. But as Jordan Novet writes in VentureBeat, Google’s “secure aggregation model … depends heavily on small encrypted summaries of data instead of actual data, whereas [Apple’s] differential privacy entails incorporating random noise into calculations with data” (emphasis added). That would seem to address the first of Green’s challenges noted above, but not necessarily. Novet adds: “The actual data stays local, and Google can only decrypt its average update if there are hundreds or thousands of users contributing to it.”
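The intuition behind secure aggregation can be sketched with pairwise masking: each pair of users shares a random mask that one adds and the other subtracts, so every mask cancels in the sum and the server learns only the aggregate. This is a toy sketch — real protocols like Google’s derive the masks cryptographically via key agreement and handle dropouts; here a shared seeded RNG and made-up per-user updates stand in for all of that:

```python
import random

def pairwise_masks(n_users: int, seed: int = 42) -> list:
    """For each pair (i, j), draw a random mask that user i adds and
    user j subtracts, so all masks cancel when the reports are summed.
    (A shared RNG stands in for cryptographic key agreement.)"""
    rng = random.Random(seed)
    masks = [0] * n_users
    for i in range(n_users):
        for j in range(i + 1, n_users):
            m = rng.randrange(1_000_000)
            masks[i] += m
            masks[j] -= m
    return masks

# Hypothetical per-user model updates the server should only see in aggregate.
updates = [3, 7, 2, 9, 4]
masks = pairwise_masks(len(updates))
masked = [u + m for u, m in zip(updates, masks)]  # what the server receives

# Individual masked reports look like random numbers, but the sum is exact:
print(sum(masked))  # 25, equal to sum(updates)
```

This is the contrast Novet draws: secure aggregation keeps each contribution hidden behind encryption-style masking while the aggregate stays exact, whereas differential privacy deliberately perturbs the values themselves.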