
Crowdsourcing: Song Lyrics and Twitter Help Chart Public Mood

Trying to divine the mood of a group of people is hard and requires trust in their answers. A new method has researchers whistling a happier tune.
 
 

Social scientists seeking to assess the collective mood of large groups of people traditionally have relied on slow, laborious sampling methods that usually entail some form of self-reporting.

Peter Dodds and Chris Danforth, mathematicians at the University of Vermont, dreamed up an ingenious way to sample the feelings of many more people much more quickly.

They downloaded the lyrics to 232,000 popular songs composed between 1960 and 2007 and calculated how often emotion-laden words like “love,” “hate,” “pain” and “baby” occurred in each.

Then they graphed their results, averaging over the emotional valence of individual words. A clearly negative trend emerged over the 47-year period, from bright and happy (think Pat Boone) to dark and depressive (death metal and industrial music come to mind).

The pair has used similar methods to analyze millions of sentences downloaded from blogs, as well as the text of every U.S. State of the Union address and a vast trove of Twitter tweets.

They see distinctive patterns emerging in how collective moods shift over time. The Internet, with its ability to transmit vast amounts of data, is the key.

“People have been trying to take a picture of what’s happening on the Web in real time and feed it into essentially another dial, like the Consumer Confidence Index or the gross domestic product,” Danforth explains. “That would help decision-makers decide what it is that people are feeling at the moment or how well social programs are working.”

Other researchers are onto the same idea. A team at Indiana University has shown that how calm the public mood is — as measured by the language used in millions of 140-character Twitter tweets — accurately predicts how well the stock market will do in the following few days.

Recently, scientists have even shown they could predict movie box office receipts based solely on Twitter chatter and the number of theaters in which a film is showing.

This new field of looking for hidden patterns in vast quantities of text or other user-generated information — variously called “sociotechnical data mining” or “computational social science” — is deceptively simple: just add together the numerical values assigned to various emotionally positive or negative words in a sentence and take their average.
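Written out as a formula, that averaging step is a frequency-weighted mean. This is a sketch that assumes every occurrence of a scored word counts once and that words missing from the list are skipped. Here $v_i$ is the valence score of word $i$ in the list, $f_i$ is the number of times that word appears in the text, and $N$ is the number of words in the list:

$$
v_{\text{text}} = \frac{\sum_{i=1}^{N} f_i \, v_i}{\sum_{i=1}^{N} f_i}
$$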

The method starts with established lists of commonly used words that have been ranked according to their emotional valence.

For the song lyrics experiment, Danforth and Dodds used the Affective Norms for English Words list, developed from a 1999 study in which participants graded their reactions to 1,034 words on a 1-9 scale (in which 9 is “happy, pleased, satisfied [and] contented”). On this scale, “triumphant” scores an 8.82, for example, while “suicide” comes in at 1.25.
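To make the arithmetic concrete, here is a minimal Python sketch of scoring a text against a valence list. It is an illustration, not the Vermont team's actual code. The function name `average_valence` and the simple lowercase-word tokenizer are assumptions. Only the two scores quoted above ("triumphant" at 8.82 and "suicide" at 1.25) come from the article; a real analysis would load the full 1,034-word ANEW list.

```python
import re

# Valence scores on ANEW's 1-9 scale. Only these two entries are taken from
# the article; a real run would load the full 1,034-word list instead.
VALENCE = {
    "triumphant": 8.82,
    "suicide": 1.25,
}


def average_valence(text, lexicon=VALENCE):
    """Return the mean valence of every scored word occurrence in `text`.

    Averaging over each occurrence is the same as weighting each word's
    score by its frequency. Words not in the lexicon are ignored, and None
    is returned if the text contains no scored words at all.
    """
    # Assumed tokenizer: lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[w] for w in words if w in lexicon]
    if not scores:
        return None
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # One high-valence and one low-valence word average to about 5.04.
    print(average_valence("A triumphant return, then talk of suicide"))
```

Because unscored words such as "return" are simply dropped, the result depends heavily on how much of a text the word list covers. That is one reason results become more meaningful when many texts are pooled together.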

Song lyrics — which presumably reflect audience taste — were analyzed mostly to prove that the data-mining technique worked, Danforth says. In breaking out the results, he and Dodds also classified lyrics by genres and individual artists. Not surprisingly, gospel music ranked as the genre having the most positive lyrics.

“One of the things that had surprised us was that we had expected rap and hip-hop to be down near the bottom — but it’s really not, it’s actually sort of in the middle,” Danforth says. “It’s metal, industrial music and punk at the bottom, at least in the lyrics.”

The method can misread the meaning of an individual text. For example, The Beatles’ “Maxwell’s Silver Hammer” recounts acts of violence (“Bang! Bang! Maxwell’s silver hammer came down upon her head/Bang! Bang! Maxwell’s silver hammer made sure that she was dead.”), but most listeners would understand the song’s lyrics to be comical. Yet when the technique is applied to thousands of song lyrics, such differences in intended meaning tend to average out, Danforth says.