The Virtuous Cycle of Existing Theory and Big Data

In recent decades a large body of research has led to the conclusion that while some aspects of our personalities change over time, others are remarkably stable. There are now pencil-and-paper tests you can take that will give a pretty accurate (we think) measure of how extraverted you are, how neurotic, and so on.

I just learned via Facebook and Business Insider (!?!?) about a fantastic study, “Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach,” that used the Facebook statuses of about 75,000 people to demonstrate just how accurate those assessments are. Here, for example, are status word clouds for the extraverts in the group vs. the introverts (as measured by existing tests), and the neurotics vs. people with high emotional stability (contains R-rated language):

Word Clouds for personality types

Aren’t these amazing? I could look at them all day. The researchers, led by Hansen Andrew Schwartz at Penn, didn’t know in advance what the stream of status updates from their subjects looked like. All they knew when they started constructing the word clouds above was how their subjects had scored on a standard personality test.

And those results are clearly meaningful, since the status updates from extraverts and introverts are totally different, as are those from neurotics vs. the emotionally stable. And each personality type is clearly visible from its word cloud: extraverts talk about parties, introverts about computers, and so on.
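
For the curious, here is a minimal sketch of how word clouds like these could be built from nothing but status text and test scores. It is a toy illustration, not the study's actual pipeline: the function names, the four made-up "users," and their scores are all hypothetical, and a plain Pearson correlation stands in for whatever statistics the researchers really used. The idea is simply to ask, for every word, whether people who use it more tend to score higher or lower on a trait.

```python
from collections import Counter
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def word_trait_correlations(statuses, scores):
    """Correlate each word's relative frequency per user with a trait score.

    statuses: one string per user (all of that user's status text).
    scores:   one trait score per user, in the same order.
    """
    counts = [Counter(text.lower().split()) for text in statuses]
    totals = [sum(c.values()) or 1 for c in counts]  # avoid dividing by zero
    vocabulary = set().union(*counts)
    return {
        word: pearson([c[word] / t for c, t in zip(counts, totals)], scores)
        for word in vocabulary
    }


# Made-up data: four hypothetical users and hypothetical extraversion scores.
statuses = [
    "party party friends",
    "party weekend",
    "computer internet",
    "computer anime computer",
]
extraversion = [4.5, 4.0, 2.0, 1.5]

correlations = word_trait_correlations(statuses, extraversion)
for word, r in sorted(correlations.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word:10s} {r:+.2f}")
```

Words at the top of the printed list would go in the "extravert" cloud and words at the bottom in the "introvert" cloud, with the strength of the correlation setting the font size. With real data you would also want many more users per word and some control for age and gender before trusting any single word.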

The cloud for emotional stability surprised me. I was expecting to see lots of the words that appear there, but not such large clusters related to sports, exercise, and getting outdoors. It’s not clear yet what this implies — whether stability comes from getting out, or whether stable people just get out more — but I suspect the causality runs in both directions. We’re learning how much exercise benefits the brain, so it seems like a good and safe prescription to me that if you’re feeling lousy, get up, get out, and move.

This is really clever and important work because it uses new and big data to test and extend an existing body of theory in the social sciences. We’ll see a lot more work like this in the near future, and I can’t wait to see how well current thinking stands up to new data. Stay tuned; this is going to be wild.

Here’s one more picture from the study. It does not reflect well at all on my gender, but I have to show it anyway:

word clouds for women vs men