How to Wrestle Your Data From Data Brokers, Silicon Valley - and Cambridge Analytica
Cambridge Analytica thinks that I’m a “Very Unlikely Republican.” Another political data firm, ALC Digital, has concluded I’m a “Socially Conservative,” Republican, “Boomer Voter.” In fact, I’m a 27-year-old millennial with no set party allegiance.
For all the fanfare, the burgeoning field of mining our personal data remains an inexact art.
One thing is certain: My personal data, and likely yours, is in more hands than ever. Tech firms, data brokers and political consultants build profiles of what they know — or think they can reasonably guess — about your purchasing habits, personality, hobbies and even what political issues you care about.
You can find out what those companies know about you, but be prepared to be stubborn. Very stubborn. To demonstrate how this works, we’ve chosen a couple of representative companies from each of three major categories: data brokers, big tech firms and political data consultants.
Few of them make it easy. Some will show you on their websites; others will make you ask for your digital profile via the U.S. mail. And then there’s Cambridge Analytica, the controversial Trump campaign vendor that has come under intense fire in light of a https://www.facebook.com/settings on your computer.
What You Might Get Back From Facebook
Facebook designed its archive to first show you your profile information. That’s all information you typed into Facebook and that you probably intended to be shared with your friends. It’s no surprise that Facebook knows what city I live in or what my AIM screen name was — I told Facebook those things so that my friends would know.
But it’s a bit of a surprise that they decided to feature a list of my ex-girlfriends — what they blandly termed “Previous Relationships” — so prominently.
As you dig deeper in your archive, you’ll find more information that you gave Facebook, but that you might not have expected the social network to keep hold of for years: if you’re me, that’s the Nickelback concert I apparently RSVPed to, posts about switching high schools and instant messages from my freshman year in college.
But finally, you’ll find the creepier information: what Facebook knows about you that you didn’t tell it, on the “Ads” page. You’ll find “Ads Topics” that Facebook decided you were interested in, like Housing, ESPN or the town of Ellijay, Georgia. And, you’ll find a list of advertisers who have obtained your contact information and uploaded it to Facebook, as part of a so-called Custom Audience of specific people to whom they want to show their ads.
You’ll find more of that creepy information on your Ads Preferences page. Despite Mark Zuckerberg telling Rep. Jerry McNerney, D-Calif., in a hearing earlier this month that “all of your information is included in your ‘download your information,’” my archive didn’t include that list of ad categories that can be used to target ads to me. (Some other types of information aren’t included in the download, like other people’s posts you’ve liked. Those are listed here, along with where to find them — which, for most, is in your Activity Log.)
This area may include Facebook’s guesses about who you are, boiled down from some of your activities. Most Americans will have a guess about their politics (Facebook says I’m a “moderate” about US Politics), and some will have a guess about so-called “multicultural affinity,” which Facebook insists is not a guess about your ethnicity, but rather what sorts of content “you are interested in or will respond well to.” For instance, Facebook recently added that I have a “Multicultural Affinity: African American.” (I’m white, though because Facebook’s definition of “multicultural affinity” is so strange, it’s hard to tell if this is an error on Facebook’s part.)
Facebook also doesn’t include your browsing history, the subject of back-and-forths between Mark Zuckerberg and several members of Congress. The company says it keeps that history just long enough to boil it down into those “Ad Topics.”
For people without Facebook accounts, Facebook says to email firstname.lastname@example.org or fill out an online form to download what Facebook knows about you. One puzzle here is how Facebook gathers data on people whose identities it may not know. It may know that a person using a phone from Atlanta, Georgia, has accessed a Facebook site and that the same person was last week in Austin, Texas, and before that Cincinnati, but it may not know that that person is me. It’s in principle difficult for the company to give the data it collects about logged-out users if it doesn’t know exactly who they are.
Like Facebook, Google will give you a zip archive of your data. Google’s can be much bigger, because you might have stored gigabytes of files in Google Drive or years of emails in Gmail.
But like Facebook, Google does not provide its guesses about your interests, which it uses to target ads. Those guesses are available elsewhere.
How You Can Request Your Data From Google:
- Visit https://takeout.google.com/settings/takeout/ to use Google’s cutely named Takeout service.
- You’ll have to pick which data you want to download and examine. You should definitely select My Activity, Location History and Searches. If you use Gmail, you may not want to download gigabytes of emails, since that uses a lot of space and may take a while. (That’s also information you shouldn’t be surprised that Google keeps: you left it with Gmail so that you could use Google’s search expertise to hold on to your emails.)
- Google will present you with a few options for how to get your archive. The defaults are fine.
- Within a few hours, you should get an email with the subject “Your Google data archive is ready.” Click Download Archive and log in again. That should start the download of a file named something like “takeout-20180412T193535.zip.”
- Unzip the folder; depending on your computer’s operating system, this might be called uncompressing or “expanding.”
- You’ll get a folder called Takeout. Open the file inside it called “index.html” in your web browser to explore your archive.
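If you’d rather script the unzipping step than click through it, here is a minimal Python sketch. It assumes the layout described above: a Takeout folder with an index.html inside it. The function names and the output folder are made up for illustration. The idea that each subfolder of Takeout corresponds to one product you selected is a guess, not something Google documents here.

```python
import zipfile
from pathlib import Path


def extract_takeout(archive_path, dest_dir):
    """Unzip a Takeout archive and return the paths of any index.html files found."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(dest)
    # index.html is expected inside the "Takeout" folder, but we search
    # recursively in case the archive is laid out differently.
    return sorted(dest.rglob("index.html"))


def list_sections(takeout_dir):
    """List the top-level folders inside Takeout (assumed: one per product selected)."""
    return sorted(p.name for p in Path(takeout_dir).iterdir() if p.is_dir())
```

Calling extract_takeout("takeout-20180412T193535.zip", "my_google_data") should unpack the archive into a folder named my_google_data and hand back the location of index.html, which you can then open in your web browser as in the last step.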
What You Might Get Back From Google:
Once you open the index.html file, you’ll see icons for the data you chose in step 2. Try exploring “Ads” under “My Activity” — you’ll see a list of times you saw Google Ads, including on apps on your phone.
Google also includes your search history, under “Searches” — in my case, going back to 2013. Google knows what I had forgotten: I Googled a bunch of dinosaurs around Valentine’s Day that year … And it’s not just web searches: the Sound Search history reminded me that at some point, I used that service to identify Natalie Imbruglia’s song “Torn.”
Android phone users might want to check the “Android” folder: Google keeps a list of each app you’ve used on your phone.
Most of the data contained here are records of ways you’ve directly interacted with Google, and the company really does use those to improve how its services work for me. I’m glad to see my searches auto-completed, for instance.
But the company also creates data about you: Visit the company’s Ads Settings page to see some of the “topics” Google guesses you’re interested in, and which it uses to personalize the ads you see. Those topics are fairly general — it knows I’m interested in “Politics” — but the company says it has more granular classifications that it doesn’t include on the list. Those more granular, hidden classifications are on various topics, from sports to vacations to politics, where Google does generate a guess whether some people are politically “left-leaning” or “right-leaning.”
Here’s who really does sell your data: data brokers like the credit reporting agency Experian and a firm named Epsilon.
These sometimes-shady firms are middlemen who buy your data from tracking firms, survey marketers and retailers, slice and dice the data into “segments,” then sell those on to advertisers.
Experian is best known as a credit reporting firm, but your credit cards aren’t all they keep track of. They told me that they “firmly believe people should be made aware of how their data is being used” — so if you print and mail them a form, they’ll tell you what data they have on you.
“Educated consumers,” they said, “are better equipped to be effective, successful participants in a world that increasingly relies on the exchange of information to efficiently deliver the products and services consumers demand.”
How You Can Request Your Data From Experian:
- Visit Experian’s Marketing Data Request site and print the Marketing Data Report Request form.
- Print a copy of your ID and proof of address.
- Mail it all to Experian at: Experian Marketing Services, PO Box 40, Allen, TX 75013.
- Wait for them to mail you something back.
What You Might Get Back From Experian:
Expect to wait a while. I’ve been waiting almost a month.
They also come up with a guess about your political views that’s integrated with Facebook — our Facebook Political Ad Collector project has found that many political candidates use Experian’s data to target their Facebook ads to likely supporters.
You should hope to find a guess about your political views that’d be useful to those candidates — as well as categories derived from your purchasing data.
Experian told me they generate the data they have about you from a long list of sources, including public records and “historical catalog purchase information” — as well as calculating it from predictive models.
How You Can Request Your Data From Epsilon:
- Visit Epsilon’s Marketing Data Summary Request form.
- After you enter your name and address, Epsilon will ask you to answer some of those identity-verification questions that quiz you about your old addresses and cars. If your identity can’t be verified with those, Epsilon will ask you to mail in a form.
- Wait for Epsilon to mail you your data; it took about a week for me.
What You Might Get Back From Epsilon:
Epsilon has information on “demographics” and “lifestyle interests” — at the household level. It also includes a list of “household purchases.”
It also has data that political candidates use to target their Facebook ads, including Randy Bryce, a Wisconsin Democrat who’s seeking his party’s nomination to run for retiring Speaker Paul Ryan’s seat, and Rep. Tulsi Gabbard, D-Hawaii.
In my case, Epsilon knows I buy clothes, books and home office supplies, among other things — but isn’t any more specific. They didn’t tell me what political beliefs they believe I hold. The company didn’t respond to a request for comment.
Oracle’s Data Cloud aggregates data about you from Oracle itself, as well as so-called third-party data from other companies.
How You Can Request Your Data From Oracle:
- Visit http://www.bluekai.com/registry/. If you use an ad blocker, there may not be much to see here.
- Explore each tab, from “Basic Info” to “Hobbies & Interests” and “Partner Segment.”
Not fun scrolling through all those pages? I have 84 pages of four pieces of data each.
You can’t search. All the text is actually images of text. Oracle declined to say why it chose to make its site so hard to use.
What You Might Get Back From Oracle:
My Oracle profile includes nearly 1,500 data points, covering all aspects of my life, from my age to my car to how old my children are to whether I buy eggs. These profiles can even say if you’re likely to dress your pet in a costume for Halloween. But many of them are off-base or contradictory.
Many companies in Oracle’s data, besides ALC Digital, offer guesses about my political views: Data from one company uploaded by AcquireWeb says that my political affiliations are as a Democrat and an Independent … but also that I’m a “Mild Republican.” Another company, an Oracle subsidiary called AddThis, says that I’m a “Liberal.” Cuebiq, which calls itself a “location intelligence” company, says I’m in a subset of “Democrats” called “Liberal Professions.”
If an advertiser wants to show an ad to Spring Break Enthusiasts, Oracle can enable that. I’m apparently a Spring Break Enthusiast. Do I buy eggs? I sure do. Data on Oracle’s site associated with AcquireWeb says I’m a cat owner …
But it also “knows” I’m a dog owner, which I’m not.
Al Gadbut, the CEO of AcquireWeb, explained that the guesses associated with his company weren’t based on my personal data, but rather the tendencies of people in my geographical area — hence the seemingly contradictory political guesses. He said his firm doesn’t generate the data, but rather uploaded it on behalf of other companies. Cuebiq’s guess was a “probabilistic inference” they drew from location data submitted to them by some app on my phone. Valentina Marastoni-Bieser, Cuebiq’s senior vice president of marketing, wouldn’t tell me which app it was, though.
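Gadbut’s explanation suggests how a single profile can end up with contradictory labels. The Python sketch below is purely illustrative: the neighborhood shares, the threshold and the function name are all invented for the example, and nothing here is AcquireWeb’s or Oracle’s actual method. It tags everyone in an area with every label whose local share clears a threshold, without looking at any individual’s own data.

```python
def area_based_segments(area_shares, threshold=0.25):
    """Return every label whose share of the area's population meets the threshold.

    No individual-level data is consulted, so mutually exclusive labels
    can all be attached to the same person.
    """
    return sorted(label for label, share in area_shares.items() if share >= threshold)


# Hypothetical shares for one neighborhood (invented numbers).
neighborhood = {
    "Democrat": 0.40,
    "Independent": 0.30,
    "Mild Republican": 0.27,
    "Libertarian": 0.03,
}
segments = area_based_segments(neighborhood)
```

With these made-up numbers, a resident of a politically mixed neighborhood comes out labeled a Democrat, an Independent and a “Mild Republican” all at once, much like my own profile.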
Data for sale here includes a long list of what TV shows I supposedly watch.
But it’s not all wrong. AddThis can tell that I’m “Young & Hip.”
The above list is just a sampling of the firms that collect your data and try to draw conclusions about who you are — not just sites you visit like Facebook and controversial firms like Cambridge Analytica.
You can make some guesses as to where this data comes from — especially the more granular consumer data from Oracle. For each data point, it’s worth considering: Who’d be in a position to sell a list of what TV shows I watch, or, at least, a list of what TV shows people demographically like me watch? Who’d be in a position to sell a list of what groceries I, or people similar to me in my area, buy? Some of those companies — companies who you’re likely paying, and for whom the internet adage that “if you’re not paying, you’re the product” doesn’t hold — are likely selling data about you without your knowledge. Other data points, like the location data used by Cuebiq, can come from any number of apps or websites, so it may be difficult to figure out exactly which one has passed it on.
Companies like Google and Facebook often say that they’ll let you “correct” the data that they hold on you, tacitly acknowledging that they sometimes get it wrong. But if receiving relevant ads is not important to you, they’ll let you opt out entirely or, presumably, “correct” your data to something false.
An upcoming European Union rule called the General Data Protection Regulation portends a dramatic change to how data is collected and used on the web — if only for Europeans. No such law seems likely to be passed in the U.S. in the near future.
ProPublica is a Pulitzer Prize-winning investigative newsroom.