My first class of graduate school was Organization of Information. For our first assignment, the class had to map out entity relationships of various works of art. During this project, I got an unintentional introduction to how my search data is aggregated and used. One question on the assignment was to search a certain online retailer for a DVD copy of The Wiz and map out its relationship to L. Frank Baum’s The Wizard of Oz. (Short answer: derivative. Longish answer: derivative of the 1939 film, which was the “child” to the “parent” work: Baum’s original novel.) I visited the site, answered the question, and proceeded to spend the rest of the weekend on recreational Web time (probably YouTube videos of dogs falling down). However, my assignment had a way of following me. Not the assignment, no, but the data collected by the online retailer’s robots and sold as paid, targeted advertisements on nearly every site I visited for the next week. Yes, every paid ad on the sites I visited showed me one thing only: ads for The Wiz.
It just goes to show how much of your data is attached to you. My browser cache has no way of knowing the context of my search history. The data only showed what I searched for, not the why. The mere fact that The Wiz was so far outside what I would typically shop for made me constantly aware of those ads. Often, when targeted ads feature sites I regularly visit or purchase from, I don’t even notice. And that’s how I came to pay attention to the data–to my data.
In Dataclysm: Who We Are (When We Think No One’s Looking), Christian Rudder delivers a simple but powerful primer on data science for the social networker. Rudder, co-founder and president of OkCupid, examines user-supplied data to showcase how we think versus how we behave. The interesting launchpad for this work comes from OkCupid (and other aggregate sources around the Web as well)–Rudder uses the data not to sell or surveil, but to examine human perception and behavior: where they reflect and where they splinter.
Perception and behavior. These are what make human interaction, well, human. Rudder’s unique position as head of a dating site gives him access to examine:
- the things that white, Asian, Latino/a, and black men and women talk about; and the things they don’t talk about (which turns out to be a rather interesting metric for gauging valid scripts)
- whether couples with more Facebook friends in common are more or less likely to split up
- how black communities are using Twitter differently–and more effectively–than any other community
- what women have to lose in profile photos on employment/job searching sites
- what porn searches reveal about the overall geographic dispersion of gay populations
Rudder does have a tendency to oversimplify his findings: by presenting overwhelmingly white, male data (because that’s what he understands); by focusing on male-female sexual dynamics and ignoring same-sex dynamics; by erasing handicapped or otherwise impaired users and ethnicities that don’t fit the four groups who are his focus (it’s possible that the data samples are not significant, but Rudder’s bias is in not mentioning that fact); and by putting age caps of 50 on all of his samples.
Flaws aside, Dataclysm is a strong, popularly accessible foundation for the use of data science to flesh out human perception and behaviors, and I look forward to seeing new uses that Rudder finds for the data. Rudder touches briefly upon the idea of compensation: should we–the data producers–be compensated for the uses of our data? Personally, if data scientists were willing to use the data to tell us more about how we think and behave–to reveal more of our humanity and chart the course of our stories as they become our histories–I would consider myself very well compensated, indeed.
Also, if you think Google doesn’t know that you’re racist, think again.
You can get a copy of Dataclysm: Who We Are (When We Think No One’s Looking) by Christian Rudder on Tuesday, September 9 at your local independent bookstore or library. Here’s one!