Published: July 14, 2016

When disaster strikes, those affected often turn to social media to request aid, offer assistance, or share other information in real time. In recent years, data scientists have begun analyzing millions of Facebook posts and tweets in order to study the collective response before, during and after a crisis.

In the face of this mountain of information, however, it can be hard to identify the most relevant posts and trends. But thanks to a close collaboration between social science and software engineering, CU Boulder researchers Leysia Palen and Kenneth Anderson are developing new ways to find the underlying human behaviors hidden within noisy data.

“The trick is understanding the potential of large-volume social media information along with its limits,” says Palen, a department chair at CU Boulder. “Just because we have a lot of data doesn’t mean that we have all the answers.”

Palen’s research centers on “crisis informatics,” a relatively new interdisciplinary field that investigates how people use technology to communicate and coordinate during natural disasters and other upheavals. The work combines intensive qualitative and quantitative analysis to sort through the staggering amounts of data posted to social media accounts in the wake of a crisis.

“In these disaster scenarios, the volume of information is far too large to sift through manually,” says Anderson, a professor of computer science and associate dean for education at CU Boulder. “The Hurricane Sandy data set from 2012, for example, consisted of approximately 220 million tweets, and we used that core data set to study many aspects of that disaster.”

Palen and Anderson write that data sets often have to expand (e.g., by incorporating a person’s last hundred tweets rather than just the one they tweeted with a keyword of interest) before they can be sampled and studied accordingly. In other words, the haystack has to get bigger in order to truly locate the needle.

The reason for that, says Palen, is to fill in contextual gaps in the data. Researchers can home in on relevant posts by searching for particular terms, but can often miss other important information that way. A person might tweet once about “Hurricane Sandy,” for instance, but then continue their train of thought across several subsequent tweets without mentioning “hurricane” or “sandy” again. A keyword search would overlook that potentially crucial follow-up.
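To make the contrast concrete, here is a minimal Python sketch of keyword filtering versus contextual expansion. It is an illustration only and not the researchers' actual pipeline: the record fields (`user`, `time`, `text`), the function names, and the toy tweets are all assumptions made for the example, and it works on a small in-memory list rather than on any platform's API.

```python
from collections import defaultdict


def keyword_matches(tweets, keywords):
    """Return tweets whose text contains any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [t for t in tweets if any(k in t["text"].lower() for k in lowered)]


def expand_with_context(tweets, keywords, window=100):
    """For every user with at least one keyword match, return up to
    `window` of that user's most recent tweets, matched or not."""
    by_user = defaultdict(list)
    for t in tweets:
        by_user[t["user"]].append(t)

    matched_users = {t["user"] for t in keyword_matches(tweets, keywords)}

    expanded = []
    for user in sorted(matched_users):
        timeline = sorted(by_user[user], key=lambda t: t["time"])
        expanded.extend(timeline[-window:])  # keep the last `window` tweets
    return expanded


# Hypothetical toy data: one user mentions the storm once, then follows up.
tweets = [
    {"user": "ana", "time": 1, "text": "Hurricane Sandy is getting close"},
    {"user": "ana", "time": 2, "text": "Power is out on our block"},
    {"user": "ana", "time": 3, "text": "Heading to my sister's place inland"},
    {"user": "ben", "time": 1, "text": "Great coffee this morning"},
]

keywords = ["hurricane", "sandy"]
narrow = keyword_matches(tweets, keywords)
broad = expand_with_context(tweets, keywords)
print(len(narrow), len(broad))
```

In this toy data the keyword search finds only the first tweet, while the expanded set also picks up the two follow-ups about the power outage and the move inland. The unrelated user is left out of both.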

The researchers note that viewing the data through a social science lens can shine a light on interesting corners of disaster response, such as the people who donated cell phone minutes to first responders on the ground during the 2010 Haiti earthquake, or those who set up a Facebook group to help families reunite with lost pets after Hurricane Sandy. A series of geotagged posts might identify people who have evacuated, even if they never use the word “evacuate” in any of their posts.

“You might start down the road with one question in mind and then realize halfway through that the more interesting behavioral aspect is completely different, so then you recalibrate,” says Anderson.

The researchers have also found that social media analysis without a social science context tends to erroneously smooth over key differences between various types of disasters, such as a hurricane versus a terrorist attack.

“One big mistake is conflating different types of disasters and assuming that the social media response will be the same,” says Palen. “The way that people interact online is very different for different events, which is especially important to know for law enforcement purposes.”

Parsing this amount of data will always be challenging, notes Anderson, for a variety of reasons that include a relative lack of geotagged tweets and posts (due to default platform settings); people using older versions of social media apps (which send information in disparate formats); and the difficulty of identifying a truly representative sample of the population involved in any given disaster.

Going forward, Palen and Anderson plan to continue improving the predictive capabilities of their model, shrinking the time window between the event and the social media analysis so that public safety officials and emergency response personnel can respond faster and more effectively.

Overall, both researchers agree that those affected by disasters demonstrate consistent innovation and resourcefulness.

“I’m constantly surprised by people’s ingenuity in these situations,” says Palen. “You can really see people coordinating volunteer efforts and inventing solutions to problems in real time. That’s how you really know that people are altruistic, smart, and always trying to help themselves and each other in times of need.”