MediaTech Law

By MIRSKY & COMPANY, PLLC

Legal Issues in Ad Tech: De-Identified vs. Anonymized in a World of Big Data

In the booming world of Big Data, consumers, governments, and even companies are rightfully concerned about the protection and security of their data, and about keeping the personal and potentially embarrassing details of one’s life from falling into nefarious hands. At the same time, most would recognize that Big Data can serve valuable purposes, such as lifesaving medical research and the improvement of commercial products. A question at the center of this discussion is therefore whether, and how, data can be effectively “de-identified” or even “anonymized” to limit privacy concerns – and whether the distinction between the two terms is more theoretical than practical. (As I mentioned in a prior post, “de-identified” data can potentially be re-identified, while anonymized data, at least in theory, cannot be.)

Privacy of health data is particularly important, and so the U.S. Health Insurance Portability and Accountability Act (HIPAA) includes strict rules on the use and disclosure of protected health information. These privacy constraints do not apply if the health data has been de-identified – either through a safe harbor-blessed process that removes 18 key identifiers or through a formal determination by a qualified expert – in either case presumably because these mechanisms are seen as a reasonable way to make re-identifying the data difficult.
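To make the safe harbor mechanics concrete, here is a minimal Python sketch – an illustration, not legal advice – of stripping direct identifiers from a hypothetical patient record. The field names are invented, and a real pipeline would need to address all 18 identifier categories (or rely on the expert-determination route instead):

```python
# A minimal sketch of HIPAA Safe Harbor-style de-identification.
# Field names are hypothetical; only a subset of the 18 identifier
# categories is shown.

SAFE_HARBOR_FIELDS = {
    "name", "street_address", "phone", "fax", "email", "ssn",
    "medical_record_number", "health_plan_id", "account_number",
    "license_number", "vehicle_id", "device_id", "url", "ip_address",
    "biometric_id", "photo",
}

def deidentify(record: dict) -> dict:
    """Drop direct identifiers and generalize dates and ZIP codes."""
    clean = {k: v for k, v in record.items() if k not in SAFE_HARBOR_FIELDS}
    # Dates more specific than the year must go; keep only the year.
    if "birth_date" in clean:
        clean["birth_year"] = clean.pop("birth_date")[:4]
    # ZIP codes may keep only their first three digits (and must be
    # zeroed out entirely for sparsely populated areas).
    if "zip" in clean:
        clean["zip"] = clean["zip"][:3] + "00"
    return clean

record = {"name": "Jane Doe", "ssn": "123-45-6789",
          "birth_date": "1984-07-02", "zip": "20740",
          "diagnosis": "hypertension"}
print(deidentify(record))
# {'zip': '20700', 'diagnosis': 'hypertension', 'birth_year': '1984'}
```

Note that even this scrubbed record is only de-identified, not anonymized: as discussed below, the surviving quasi-identifiers can still support re-identification.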


Legal Issues in Ad Tech: Anonymized and De-Identified Data

Recently, in reviewing a contract with a demand-side platform (DSP), I came across this typical language in a “Data Ownership” section:

“All Performance Data shall be considered Confidential Information of Advertiser, provided that [VENDOR] may use such Performance Data … to create anonymized aggregated data, industry reports, and/or statistics (“Aggregated Data”) for its own commercial purposes, provided that Aggregated Data will not contain any information that identifies the Advertiser or any of its customers and does not contain the Confidential Information of the Advertiser or any intellectual property of the Advertiser or its customers.” (emphasis added).

I was curious what makes data “anonymized”, and even more curious whether the term was being used casually and improperly. I’ve seen the same language used elsewhere with “de-identified” substituted for “anonymized”. Looking into this opened a can of worms ….

What are Anonymized and De-Identified Data – and Are They the Same?

Here’s how Gregory Nelson described it in his casually titled “Practical Implications of Sharing Data: A Primer on Data Privacy, Anonymization, and De-Identification”:

“De-identification of data refers to the process of removing or obscuring any personally identifiable information from individual records in a way that minimizes the risk of unintended disclosure of the identity of individuals and information about them. Anonymization of data refers to the process of data de-identification that produces data where individual records cannot be linked back to an original as they do not include the required translation variables to do so.” (emphasis added)

In other words, both methods have the same purpose, and both technically remove personally identifiable information (PII) from the data set. But while de-identified data can be re-identified, anonymized data cannot be. To use a simple example, if a column containing Social Security numbers is removed from an Excel spreadsheet and discarded, the data would be “anonymized”.
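To make that example concrete, here is a short hypothetical Python sketch using pandas; the data and column names are invented:

```python
import pandas as pd

# Invented toy data set; column names are hypothetical.
df = pd.DataFrame({
    "ssn":       ["123-45-6789", "987-65-4321"],
    "zip":       ["20740", "20854"],
    "gender":    ["F", "M"],
    "diagnosis": ["asthma", "diabetes"],
})

# Drop (and discard) the direct identifier. On Nelson's definitions,
# if no key or "translation variable" linking rows back to individuals
# is kept anywhere, the result is anonymized; if a lookup table is
# retained elsewhere, the data is merely de-identified.
anonymized = df.drop(columns=["ssn"])
print(anonymized)
```

As the next section shows, though, dropping the obvious identifier is rarely the end of the story.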

But first: what aspects or portions of data must be removed in order to either de-identify or anonymize a data set?

But What Makes Data “De-Identified” or “Anonymous” in the First Place?

Daniel Solove has written that, under the European Union’s Data Directive 95/46/EC, “Even if the data alone cannot be linked to a specific individual, if it is reasonably possible to use the data in combination with other information to identify a person, then the data is PII.” This complicates things in a hurry. After all, in the above example where Social Security numbers are removed, the remaining columns might include normally non-PII information such as zip code, birth date, or gender. But the Harvard researchers Olivia Angiuli, Joe Blitzstein, and Jim Waldo show how even these three data points in an otherwise “de-identified” data set (the “medical data” in the image below) can be used to re-identify individuals when combined with an outside data source that shares those same points (the “voter list” in the image below):

[Figure: two overlapping data sets – a “medical data” set and a “voter list” – sharing zip code, birth date, and gender]

(Source: “How to De-Identify Your Data,” by Olivia Angiuli, Joe Blitzstein, and Jim Waldo, http://queue.acm.org/detail.cfm?id=2838930)
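The attack in the figure is, at bottom, a database join. Here is a hypothetical Python sketch of that linkage using invented toy data – neither table alone names the patient, but joining on the shared quasi-identifiers does:

```python
import pandas as pd

# Invented toy data illustrating the linkage attack in the figure.
medical = pd.DataFrame({   # "de-identified" medical data
    "zip": ["20740"], "birth_date": ["1984-07-02"], "gender": ["F"],
    "diagnosis": ["hypertension"],
})
voters = pd.DataFrame({    # public voter list
    "name": ["Jane Doe"], "zip": ["20740"],
    "birth_date": ["1984-07-02"], "gender": ["F"],
})

# Join on the shared quasi-identifiers; a unique match re-identifies
# the supposedly anonymous medical record.
reidentified = medical.merge(voters, on=["zip", "birth_date", "gender"])
print(reidentified[["name", "diagnosis"]])
#        name     diagnosis
# 0  Jane Doe  hypertension
```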

That helps explain the Advocate General’s opinion recently issued in a case before the European Court of Justice (ECJ), finding that dynamic IP addresses can, under certain circumstances, be “personal data” under the European Union’s Data Directive 95/46/EC. The case turns on the same point made by Daniel Solove above, namely the scope of the “personal data” definition, including this formulation in Recital 26 of the Directive:

“(26) … whereas, to determine whether a person is identifiable, account should be taken of all the means likely reasonably to be used either by the controller or by any other person to identify the said person …”

EU countries have been inconsistent on the level of proactivity required of a data controller before an IP address is rendered “personal data”. Take, for example, the United Kingdom’s definition of “personal data”: “data which relate to a living individual who can be identified – (a) from those data, or (b) from those data and other information which is in the possession of, or is likely to come into the possession of, the data controller” (emphasis added). Not so in Germany, and, according to a White & Case report on the ECJ case, not so according to the Advocate General, whose position was that “the mere possibility that such a request [for further identifying information] could be made is sufficient.”

Which circles back to the question at the top: Are anonymized and de-identified data the same? They are not. That part is easy to say. The harder part is determining which is which, especially given the ease of re-identifying presumably scrubbed data sets. More on this topic shortly.


What’s Behind the Decline in Internet Privacy Litigation?

The number of privacy lawsuits filed against big tech companies has significantly dropped in recent years, according to a review of court filings conducted by The Recorder, a California business journal.

According to The Recorder, the period from 2010 to 2012 saw a dramatic spike in cases filed against Google, Apple, or Facebook (as measured by filings in the Northern District of California naming one of the three as defendants). The peak year was 2012, with 30 cases filed against the three tech giants, followed by a dramatic drop-off in 2014 and 2015, with only five privacy cases filed across those two years combined. So what explains the sudden drop-off in privacy lawsuits?

One theory, according to privacy litigators interviewed for The Recorder article, is that the decline reflects the difficulty of applying federal privacy statutes to modern methods of collecting, monetizing, and disclosing online data. Many privacy class action claims are based on statutes passed in the 1980s: the Electronic Communications Privacy Act (ECPA) and the Stored Communications Act (SCA), both passed in 1986, and the Video Privacy Protection Act (VPPA), passed in 1988. These statutes were originally written to address specific privacy intrusions like government wiretaps or disclosures of video rental history.


Privacy: Consent to Collecting Personal Information

Gonzalo Mon writes in Mashable that “Although various bills pending in Congress would require companies to get consent before collecting certain types of information, outside of COPPA, getting consent is not a uniformly applicable legal requirement yet. Nevertheless, there are some types of information (such as location-based data) for which getting consent may be a good idea. Moreover, it may be advisable to get consent at the point of collection when sensitive personal data is in play.”

First, what current requirements – laws, agency regulations and quasi-laws – require obtaining consent, even if not “uniformly applicable”?

1. Government Enforcement. The Federal Trade Commission’s November 2011 consent decree with Facebook required express user consent to the sharing of nonpublic user information that “materially exceeds” the user’s privacy settings. The FTC was acting under its Section 5 authority under the FTC Act against an “unfair and deceptive trade practice”, an authority the FTC has used liberally in enforcement actions involving not just claimed breaches of privacy policies but also data security cases involving the handling of personal data without adequate security.

2. User Expectations Established by Actual Practice. The mobile space offers some of the most progressive (and aggressive) examples of privacy rights seemingly established by practice rather than stated policy. For example, on the PrivacyChoice blog, the CEO of PlaceIQ explained that “Apple and Android have already established user expectations about [obtaining] consent. Location-based services in the operating system provide very precise location information, but only through a user-consent framework built-in to the OS. This creates a baseline user expectation about consent for precise location targeting.” (emphasis added)
