Academia.eduAcademia.edu

Discovering Social Events through Online Attention

2014, PLoS ONE

https://doi.org/10.1371/JOURNAL.PONE.0102001

Abstract

Twitter is a major social media platform in which users send and read messages (''tweets'') of up to 140 characters. In recent years this communication medium has been used by those affected by crises to organize demonstrations or find relief. Because traffic on this media platform is extremely heavy, with hundreds of millions of tweets sent every day, it is difficult to differentiate between times of turmoil and times of typical discussion. In this work we present a new approach to addressing this problem. We first assess several possible ''thermostats'' of activity on social media for their effectiveness in finding important time periods. We compare methods commonly found in the literature with a method from economics. By combining methods from computational social science with methods from economics, we introduce an approach that can effectively locate crisis events in the mountains of data generated on Twitter. We demonstrate the strength of this method by using it to locate the social events relating to the Occupy Wall Street movement protests at the end of 2011.

Discovering Social Events through Online Attention Dror Y. Kenett1*, Fred Morstatter2, H. Eugene Stanley1, Huan Liu2 1 Center for Polymer Studies and Department of Physics, Boston University, Boston, Massachusetts, United States of America, 2 School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, Arizona, United States of America Abstract Twitter is a major social media platform in which users send and read messages (‘‘tweets’’) of up to 140 characters. In recent years this communication medium has been used by those affected by crises to organize demonstrations or find relief. Because traffic on this media platform is extremely heavy, with hundreds of millions of tweets sent every day, it is difficult to differentiate between times of turmoil and times of typical discussion. In this work we present a new approach to addressing this problem. We first assess several possible ‘‘thermostats’’ of activity on social media for their effectiveness in finding important time periods. We compare methods commonly found in the literature with a method from economics. By combining methods from computational social science with methods from economics, we introduce an approach that can effectively locate crisis events in the mountains of data generated on Twitter. We demonstrate the strength of this method by using it to locate the social events relating to the Occupy Wall Street movement protests at the end of 2011. Citation: Kenett DY, Morstatter F, Stanley HE, Liu H (2014) Discovering Social Events through Online Attention. PLoS ONE 9(7): e102001. doi:10.1371/journal.pone. 0102001 Editor: Matjaz Perc, University of Maribor, Slovenia Received May 14, 2014; Accepted June 13, 2014; Published July 30, 2014 Copyright: ß 2014 Kenett et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Data Availability: The authors confirm that, for approved reasons, some access restrictions apply to the data underlying the findings. Data is collected from twitter API study whose authors may be contacted at [email protected] Funding: HES and DYK thank the Office of Naval Research (ONR, Grant N00014-09-1-0380, Grant N00014-12-1- 0548), Keck Foundation, and the National Science Foundation for support. FM and HL thank the support of the Office of Naval Research (ONR, Grant N000141010091 and N000141110527). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing Interests: The authors have declared that no competing interests exist. * Email: [email protected] Introduction transmitted words are prefixed with a ‘‘#’’ sign. Every hashtag has a page showing the history of all the tweets containing that Over the past several years various Internet social media hashtag in the text, and this creates a community of users platforms have enabled people to communicate, locate resources, discussing that particular hashtag [24]. This encourages users and disseminate information during times of turmoil, e.g., natural interested in the topic to use the associated hashtag in their tweets disasters, health epidemics, or social unrest. Twitter, one major to increase the audience of their tweet, and the study of this social media platform, has emerged as a leading social media tagging behavior in Twitter has become an extremely active area outlet. With 200 million users sharing 140-character text messages of research [25–27]. In addition to text, users can also annotate (‘‘tweets’’) over 400 million times each day [1], Twitter’s their tweet with their current location, adding what is called a popularity and influence on world events have made it a hot ‘‘geotag.’’ Only about one percent of all tweets are geotagged, yet topic for social media research [2]. Research on Twitter began in they still provide background information about an event. Recent 2010 when researchers saw its potential for rapid communication work has focused on combining location with textual content to and information diffusion. The field of computational social detect topics more relevant to specific regions [28–30]. Because science has been rapidly expanding in response to the influence of geotags are so sparse, recent work has also focused on associating Twitter and other online social platforms [3,4], and new insights non-geotagged tweets with a location to better understand the into social structure and social dynamics are emerging [5–15]. context of the tweet [31,32]. Twitter has also been a focus in studies of humanitarian Social media platforms now strongly factor in the spreading of assistance/disaster relief (HA/DR) efforts [16–18] and in the ideas and the organization of social movements. Over the past few tracking of disease epidemics [19]. Because Twitter enables the years, social media has played a key role in such significant events real-time propagation of information to large groups of users, it is as the Arab Spring uprisings and the violent demonstrations an ideal environment for the dissemination of breaking news from organized in London. Twitter is popular with users seeking to news gatherers and from on-site locations where events are taking spread information about a cause. Because each message can be place. no longer than 140 characters, communication spreading infor- Twitter has several features of interest to the research mation concerning protest gatherings, earthquake relief, or the community. Twitter’s ‘‘retweet’’ feature, which allows users to location of aid stations is extremely rapid [33,34]. Participants in push content through the network by forwarding it to their the Arab Spring used Twitter to quickly coordinate protests followers, has elicited much research on how information [35,36]. Occupy Wall Street, a movement protesting the wealth propagates in social media [20,21], how retweets facilitate online disparity in the United States, was largely organized on Twitter conversation [22], and how retweets factor in times of crisis [23]. under the hashtag ‘‘#OccupyWallStreet.’’ As the movement Twitter uses a special text feature (a ‘‘hashtag’’) in which spread and authorities began to retaliate, protesters used Twitter PLOS ONE | www.plosone.org 1 July 2014 | Volume 9 | Issue 7 | e102001 Kenett, Morstatter, Stanley, Liu to report abuses by police, thus bringing more attention to their activity, is a more general pattern of discussion centered around cause. Social media became so central during the Arab Spring the protests in Zuccotti Park. protests that the regimes in such countries as Egypt and Syria cut the protesters’ access to the Internet. During Hurricane Sandy, Measures of Social Attention authorities used Twitter to spread news of power outages and the locations of resources for those affected by the storm. The Herfindahl-Hirschman index (also known as the Herfin- Because Twitter provides rapid communication and informa- dahl index, or HHI) is a measure of the size of firms in relation to tion diffusion, millions of people use it to keep up with current an industry and indicates the degree of competition among them. events and create their own discussion threads. Because activity on Named after economists Orris C. Herfindahl and Albert O. the Twitter site is huge, it is difficult to differentiate periods of Hirschman, it is an economic concept widely applied in focused discussion from periods of casual chatter. How do we competition law, antitrust law, and technology management. identify the key periods of discussion? How do we filter out the The measure is also used by the United States Department of noise and locate the main issues of discussion people are discussing Justice when evaluating mergers (see http://www.justice.gov/atr/ at any given time? public/guidelines/hhi.html). The result is proportional to the We will first attempt to locate the periods where tweets reflect average market share, weighted by market share. As such, it can actual events on the ground. To harness the abundance of data range from 0 to 1, moving from a huge number of very small firms produced by Twitter, we need a highly-scalable method to find key (with a value reaching zero) to a single monopolistic producer time periods of big events in social media. We focus on the Twitter (with a value reaching 1). Increases in HHI generally indicate a activity surrounding Occupy Wall Street–the vast Twitter decrease in competition and an increase of market power, whereas discussion of that event worldwide–and compare several methods decreases indicate the opposite. of quantifying social communication. We use a normalized HHI [42], H*, which is defined as Occupy Wall Street Movement H{1=N H : ð1Þ 1{1=N The Occupy Wall Street movement began on 17 September 2011 in New York City. The movement was largely promoted on where social media, and many hashtags were used to discuss the event. The chief driving force behind this movement was the growing X N wealth disparity between rich and poor in the United States [37]. H: s2i ð2Þ As the movement gained attention, other Occupy movements i~1 emerged in cities across the US. As citizens in other countries identified with the core concerns of the movement, similar N is the number of hashtags, and s is the percentage of the P actitivies spread across the globe. By 15 October 2011, 951 aggregate measure ( N 1 si ~1). similar protests had occurred in 82 countries [38]. As the We utilize the HHI as a ‘‘thermostat’’ of social attention. Each movement continued to grow it was officially endorsed by a hashtag represents a ‘‘firm’’ and the number of users tweeting this number of city governments and labor unions [39]. hashtag relative to the total number of users in a given time period In this study we collected tweet data from 14 September 2011 represents the hashtag’s ‘‘market cap.’’ This enables us to examine through 3 April 2012 using the parameters shown in Table 1 and the HHI value of different hashtags for a given time period. High encompassing 15,736,835 tweets with 402,758 unique hashtags HHI values indicate a strong focus on a specific topic, and low and 6,967,392 retweets. We used Twitter’s free, publicly-available HHI values indicate a diffused focus among a wide variety of data source, the Streaming API (see https://dev.twitter.com/ topics. docs/streaming-apis) to collect the data, in which three parameters We use HHI analysis to study the OWS dataset and calculate are supported: keywords (which can be supplied in the form of the HHI value for a time horizon of a single day, using the number words, phrases, or hashtags), locations (supplied as a geographic of users and hashtags. One concern of the HHI is that it is bounding box), and users. Every parameter is treated as an ‘‘OR’’ dependent on the number of tweets produced in a given time condition. That is, a tweet will be returned from the Streaming interval. Figure 2 shows the time evolution of the HHI. Figure 3 API if it contains at least one of the keywords, if it is produced compares the HHI with its underlying parameters: the number of from within the bounding box using a ‘‘geotag’’, or if it is authored users and the number of hashtags. Here the diagonal figures by one of the users specified in the parameters. When a user represent the histogram of values for each of these three geotags their tweet, their location is provided as part of the parameters, whereas the off-diagonal panels represent a compar- metadata using the GPS sensor on their device (for more ison of the values of two different parameters. Studying this figure, information see http://support.twitter.com/articles/78525-faqs- it is clear that the HHI is not merely a function of either of these about-the-tweet-location-feature). All parameters supplied to (and two parameters. tweets returned by) the Streaming API were managed using Another attention-based measure of social attention is the TweetTracker [40]. entropy [43] of the hashtags over a given time period. We here Many of the tweets collected were geotagged, with a large consider the hashtag probability to be the number of times the number of the geotagged tweets coming from New York City. hashtag is used over the number of times all hashtags are used in a Figure 1 shows a heatmap of the tweets produced on different days given time interval. The hashtag entropy is calculated by first and we can see the extreme cases of geotagged tweets. Figure 1(a) assigning the probability of a given hashtag, pi, using the fraction shows the tweets for 15 November 2011, when the New York of users who tweeted this hashtag in the given time horizon, Police Department attempted to remove protesters from Zuccotti summing over all hashtags such that: Park. Figure 1(b) shows the tweets for 26 December 2011, when protesting had dwindled. In between these two extremes of PLOS ONE | www.plosone.org 2 July 2014 | Volume 9 | Issue 7 | e102001 Kenett, Morstatter, Stanley, Liu Table 1. Parameters supplied to the Streaming API for each of the data sources. Data Set Keywords Geoboxes User Timelines Occupy Wall Street #occupywallstreet, #ows, #occupyboston, #p2, #occupywallst, #occupy, None None #tcot, #occupytogether, #teaparty, #99percent, #nypd, #takewallstreet, #occupydc, #occupyla, #usdor, #occupysf, #solidarity, #15o, #anonymous, #citizenradio, #gop, #sep17, #occupychicago, #occupyphoenix, #occupyoakland Coordinates below the boundary box indicate the Southwest and Northeast corner, respectively. No users were tracked during the course of data collection. doi:10.1371/journal.pone.0102001.t001 measure of the area under the ROC curve. The ROC AUC X N varies from 0.50 (totally random classification) to 1.0 (perfect SHashtag ~{ pi log( pi ), ð3Þ classification). i We vary the measurement threshold to identify important days, and compare the results with the ground truth. The true positive where N is the number of hashtags in the given time horizon. In rate is defined as the fraction of the actual significant days, as listed evaluating the effectiveness of our HHI-based approach, we by the ground truth, that are also identified by the measure. The compare its performance as a classifier of the ground truth relative false positive rate is the fraction of days that are not identified in to that of the other three models. the ground truth, but are identified as significant by the measure. Each point in the ROC curve corresponds to one selection Indicators of Activity in Social Media threshold. A random classifier yields a diagonal line (AUC = 0.50) To search for periods of focused discussion, we locate time from the bottom-left to the top-right corner. The greater the periods with a large number of tweets or time periods with a large curve’s distance above the diagonal line, the stronger the model’s number of unique hashtags and test whether these two simple predictive power. To obtain ground truth, we extract dates from measures can enable us to identify the focused discussion periods the Wikipedia timeline of the OWS protests (see http://en. in the dataset. We quantitatively test the two simple measures by wikipedia.org/wiki/Timeline_of_Occupy_Wall_Street). Next, by performing a receiver operating characteristic (ROC) curve varying the threshold that indicates ‘‘important’’ days, we find the analysis. The ROC curve plots the fraction of true positives out ROC curve, shown in Figure 4(a). The ROC AUC of the top of the positives and the fraction of false positives out of the hashtags is 0.36 and the ROC AUC of the top tweets is 0.42, both negatives for a binary classifier system. ROC curve analysis is a scoring worse than a perfectly random classifier. standard method in signal detection theory as well as in Although we can mitigate the poor results obtained in the psychology, medicine, and biometrics [41]. One key measure experiment by inverting the class labels–giving the inverted from the ROC curve is the area-under-curve (AUC) score, the hashtag and tweet indicators ROC AUCs of 0.64 and 0.58, Figure 1. Heatmap of geotagged Twitter activity. Twitter activity related to the Occupy Wall-Street (OWS) Movement, collected for hashtags, or topics, used by protests or members of the movement. The ‘‘redder’’ areas indicate regions with more tweets. Here we see two extremes of geotagging behavior. Panel (a) shows the tweets for 15 November 2011, when the New York Police Department attempted to remove protesters from Zuccotti Park. Panel (b) shows the tweets for 26 December 2011, when protesting had dwindled. In between these two extremes of activity, is a more general pattern of discussion centered around the protests in Zuccotti Park. doi:10.1371/journal.pone.0102001.g001 PLOS ONE | www.plosone.org 3 July 2014 | Volume 9 | Issue 7 | e102001 Kenett, Morstatter, Stanley, Liu Figure 2. Time evolution of the number of tweets (top), number of hashtags (middle), and Herfindahl-Hirsch Index (HHI) parameter (bottom) for the OWS dataset, on a daily time horizon. The HHI calculates how diverse the discussion is on Twitter, by calculating how many messages are associated with a given hashtag, and ranges from a value of 0, for highly diverse discussion, to 1, when all messages are focused on only one hashtag. doi:10.1371/journal.pone.0102001.g002 Figure 3. Comparison of the HHI to its underlying parameters: the number of tweets, and number of hashtags. Here, the diagonal figures represent the histogram of values for each of these three parameters, whereas the off diagonal panels represent a comparison of the values of two different parameters. It is clear by studying these figures that the HHI is not merely a function of either the number of tweets or number of users. doi:10.1371/journal.pone.0102001.g003 PLOS ONE | www.plosone.org 4 July 2014 | Volume 9 | Issue 7 | e102001 Kenett, Morstatter, Stanley, Liu Figure 4. HHI ROC analysis. (a) ROC curve of number tweets and number unique hashtags as classifiers for finding significant dates in the dataset. Number of tweets AUC = 0.42 and number of unique hashtags AUC = 0.36. (b) ROC curve of the HHI and Entropy classifiers. HHI AUC = 0.79, entropy AUC = 0.72. The focus-based classifiers provide the best classification when compared with the other methods, with the HHI being the best predictor. (c) ROC curve of the four classifiers - one minus number of tweets, one minus number of hashtags, and hashtag entropy - and their performance in identifying the ground truth. This is done as a below-random (,0.50) AUC means that the class labels should be inverted. (d) Distribution of the HHI AUC values for prediction of the ground truth for many random samples of the OWS dataset. The arrow in this figure represents the measure of the unshuffled data. doi:10.1371/journal.pone.0102001.g004 respectively–this approach has intuitive problems. Predicting Figure 4(b) and Figure 4(c) shows the results of performing all periods with few unique hashtags and few tweets is not relevant four indicators on the OWS dataset, with HHI and entropy to the problem of finding periods of intense discussion. Therefore, attaining ROC values of 0.79 and 0.72, respectively. The there is a need for a measure of social attention that focuses not attention-based indicators provide the best classification when only on the number of tweets or unique hashtags, but also on their compared with the other methods, with the HHI being the best ‘‘attention’’–the degree to which users congregate around them. predictor. To confirm that the classification accuracy of the HHI comes Social Attention as a Detector of Real-World from the hashtag selection made by the users and is not merely an Events artifact of the volume of tweets, we randomly shuffle the tweets based on the time they were produced. If the effectiveness of the We next use the HHI as a thermostat for social focus during HHI is due to the volume of tweets, then there should be no times of crisis. Alternate approaches would be to use the number significant difference between the initial AUC and those from the of tweets, the number of unique hashtags produced in a given day, datasets with the randomly shuffled timestamps. or the entropy of the hashtags used in the time period. PLOS ONE | www.plosone.org 5 July 2014 | Volume 9 | Issue 7 | e102001 Kenett, Morstatter, Stanley, Liu To this end, we create a unique set, T, of all the timestamps demonstrated through the Herfindahl index. In terms of classical from tweets in the dataset. For each tweet we then randomly information theory, this can be conversely related to a measure of choose a timestamp from T and assign it to the tweet, without entropy of the discussion topics, where our results show that replacement. Using this shuffled dataset we calculate the ROC significant events are related to drops in the entropy (or high HHI). AUC score. We repeat this process 100 times to determine the Entropy has been used in the past to study traditional media and distribution of the shuffled tweets. Finally, we compare the AUC online media [44–46]. Our results show that while the two score of the original data with the shuffled data to see if it differs measures are closely related, the HHI outperforms entropy as a significantly (m63s) from the center of the random shuffles. detector of significant events. This work presents a first use of the Figure 4(d) shows the distribution of ROC AUC scores of the HHI to study social attention on Twitter. randomly shuffled data. The Z–score of the original data, Although discussions in Twitter and in digital social media in calculated as general are extremely heterogeneous, when a significant event occurs discussions converge to the event and become extremely AUC{m homogeneous. The point at which this switching occurs indicates Zscore ~ , ð4Þ the magnitude of the event. Because of this, the proposed s Herfindahl index provides a means of detecting significant events, is +12.77, significantly outside of the control bounds. and provides a simple measure to filter significant events and centers of attention in the social online media. This simple yet Summary sophisticated measure can provide important insights to people of In this work we investigate the problem of finding real-world different background and needs, such as scientists, social-media events quickly as they unfold in large, noisy social media data. We based marketing professionals, policy and decision makers, and a seek to find a measure of attention in social media. The naive multitude of relief agency workers. choice for this aim is to investigate the number of tweets and number of unique hashtags, and we find that this approach is Acknowledgments unsatisfactory. One possible explanation for the poor performance We wish to thank Shlomo Havlin for all of his comments and suggestions of these measures could be that extraneous conversation on for this work. Twitter leads to spikes in activity not relevant to the event. We investigate two additional methods, HHI and entropy, and find Author Contributions that they are successful detectors of these periods of intense discussion. HHI, a measure borrowed from the economics Conceived and designed the experiments: DYK FM HES HL. Performed literature adapted for use in social media, yields the best results the experiments: DYK FM HES HL. Analyzed the data: DYK FM HES HL. Contributed reagents/materials/analysis tools: DYK FM HES HL. for identifying the times of intense discussion. Contributed to the writing of the manuscript: DYK FM HES HL. Our results indicate that significant social events cause the discussion on Twitter to move from many subjects to a few, as References 1. Tsukayama H (2013) Twitter turns 7: Users send over 400 million tweets per 14. Preis T, Moat HS, Stanley HE (2013) Quantifying trading behavior in financial day. The Washington Post. Available’’ http://www.washingtonpost.com/ markets using google trends. Scientific Reports 3: 1684. business/technology/twitter-turns-7-users-send-over-400-million-tweets-per- 15. Moat HS, Preis T, Olivola CY, Liu C, Chater N (2014) Using big data to predict day/2013/03/21/2925ef60-9222-11e2-bdea-e32ad90da239_story.html. Ac- collective behavior in the real world. Behavioral and Brain Sciences 37: 92–93. cessed 2014 Jul 4. 16. De Longueville B, Smith RS, Luraschi G (2009) ‘‘OMG, from here, I can see the 2. Kumar S, Morstatter F, Liu H (2014) Twitter Data Analytics. Springer. flames!’’: A use case of mining location based social networks to acquire spatio- 3. Lazer D, Pentland AS, Adamic L, Aral S, Barabasi AL, et al. (2009) Life in the temporal data on forest fires. In: Proceedings of the 2009 International network: the coming age of computational social science. Science (New York, Workshop on Location Based Social Networks. New York, NY, USA: ACM, NY) 323: 721. LBSN’09, pp. 73–80. doi:10.1145/1629890.1629907. URL http://doi.acm. 4. Conte R, Gilbert N, Bonelli G, Cioffi-Revilla C, Deffuant G, et al. (2012) org/10.1145/1629890.1629907. Manifesto of computational social science. The European Physical Journal 17. Sakaki T, Okazaki M, Matsuo Y (2010) Earthquake shakes twitter users: real- Special Topics 214: 325–346. time event detection by social sensors. In: Proceedings of the 19th international 5. Rybski D, Buldyrev SV, Havlin S, Liljeros F, Makse HA (2012) Communication conference on World wide web. New York, NY, USA: ACM, WWW’10, pp. activity in a social network: relation between long-term correlations and inter- 851–860. doi:10.1145/1772690.1772777. Available: http://doi.acm.org/10. event clustering. Scientific reports 2. 1145/1772690.1772777. Accessed 2014 Jul 4. 6. Gallos LK, Rybski D, Liljeros F, Havlin S, Makse HA (2012) How people 18. Morstatter F, Lubold N, Pon-Barry H, Pfeffer J, Liu H (2014) Finding eyewitness interact in evolving online affiliation networks. Physical Review X 2: 031014. tweets during crises. In: Association of Computational Lingustics Workshop on 7. Ciulla F, Mocanu D, Baronchelli A, Gonc¸alves B, Perra N, et al. (2012) Beating Language Technologies and Association of Computational Lingustics Workshop the news using social media: the case study of american idol. EPJ Data Science 1: on Language Technologies and Computational Social Science. 1–11. 19. Pastor-Satorras R, Vespignani A (2001) Epidemic spreading in scale-free 8. Gonzalez MC, Hidalgo CA, Barabasi AL (2008) Understanding individual networks. Phys Rev Lett 86: 3200–3203. human mobility patterns. Nature 453: 779–782. 20. Nagarajan M, Purohit H, Sheth A (2010) A qualitative examination of topical 9. Eagle N, Pentland AS, Lazer D (2009) Inferring friendship network structure by tweet and retweet practices. In: Fourth International AAAI Conference on using mobile phone data. Proceedings of the National Academy of Sciences 106: Weblogs and Social Media. AAAI. 15274–15278. 21. Kwak H, Lee C, Park H, Moon S (2010) What is twitter, a social network or a 10. Havlin S, Kenett DY, Ben-Jacob E, Bunde A, Cohen R, et al. (2012) Challenges news media? In: Proceedings of the 19th international conference on World wide in network science: Applications to infrastructures, climate, social systems and web. New York, NY, USA: ACM, WWW’10, pp. 591–600. doi:10.1145/ economics. European Physical Journal-Special Topics 214: 273. 1772690.1772751. 11. Gao J, Hu J, Mao X, Perc M (2012) Culturomics meets random fractal theory: 22. Boyd D, Golder S, Lotan G (2010) Tweet, tweet, retweet: Conversational aspects insights into long-range correlations of social and natural phenomena over the of retweeting on twitter. In: System Sciences (HICSS), 2010 43rd Hawaii past two centuries. Journal of The Royal Society Interface 9: 1956–1964. International Conference on. pp. 1–10. doi:10.1109/HICSS.2010.412. 12. Kenett DY, Portugali J (2012) Population movement under extreme events. 23. Mendoza M, Poblete B, Castillo C (2010) Twitter under crisis: can we trust what Proceedings of the National Academy of Sciences 109: 11472–11473. we RT? In: Proceedings of the First Workshop on Social Media Analytics. New 13. Moat H, Curme C, Avakian A, Kenett DY, Stanley HE, et al. (2013) York, NY, USA: ACM, SOMA ’10, pp. 71–79. doi:10.1145/1964858.1964869. Quantifying wikipedia usage patterns quantifying wikipedia usage patterns Available: http://doi.acm.org/10.1145/1964858.1964869. Accessed 2014 Jul4. before stock market moves. Scientific Reports 3: 1801. PLOS ONE | www.plosone.org 6 July 2014 | Volume 9 | Issue 7 | e102001 Kenett, Morstatter, Stanley, Liu 24. Yang L, Sun T, Zhang M, Mei Q (2012) We know what @you #tag: does the on Knowledge discovery and data mining. New York, NY, USA: ACM, KDD dual role affect hashtag adoption? In: Proceedings of the 21st international ’12, pp. 1023–1031. doi:10.1145/2339530.2339692. conference on World Wide Web. New York, NY, USA: ACM, WWW’12, pp. 33. Bennett S (2011). Twitter: Faster than earthquakes. Media Bistro. Available: 261–270. doi:10.1145/2187836.2187872. Available: http://doi.acm.org/10. http://www.mediabistro.com/alltwitter/twitter-earthquake-video_b13147. Ac- 1145/2187836.2187872. Accessed 2014 Jul 4. cessed 2014 Jul 4. 25. Romero DM, Meeder B, Kleinberg J (2011) Differences in the mechanics of 34. Mourtada R, Salem F (2011) Civil movements: The impact of facebook and information diffusion across topics: idioms, political hashtags, and complex twitter. Arab Social Media Report 1. contagion on twitter. In: Proceedings of the 20th international conference on 35. Huang C (2011) Facebook and twitter key to arab spring uprisings: report. The World wide web. New York, NY, USA: ACM, WWW’11, pp. 695–704. doi: National Abu Dhabi Media 6. 10.1145/1963405.1963503. URL. 36. Campbell DG (2011) Egypt Unshackled: Using Social Media to @#:) the 26. EfronM(2010) Hashtag retrieval in a microblogging environment. In: Proceed- System. Amherst, NY: Cambria Books. ings of the 33rd international ACM SIGIR conference on Research and 37. Berrett D (2011) Intellectual roots of wall st. protest lie in academe. The development in information retrieval. New York, NY, USA: ACM, SIGIR ’10, Chronicle of Higher Education. Available: http://chronicle.com/article/ pp. 787–788. doi:10.1145/1835449.1835616. Intellectual-Roots-of-Wall/129428/. Accessed 2014 Jul 4. 27. Weng J, Lim EP, He Q, Leung CK (2010) What do people want in microblogs? 38. Chappell B (2011). Occupy wall street: From a blog post to a movement. http:// measuring interestingness of hashtags in twitter. In: Data Mining (ICDM), 2010 www.npr.org/2011/10/20/141530025/occupy-wall-street-from-a-blog-post-to- IEEE 10th International Conference on. pp. 1121–1126. doi:10.1109/ a-movement. Accessed 2014 Jul 4. ICDM.2010.34. 39. (2011) Occupy wall street gets union support. United Press International. 28. Yin Z, Cao L, Han J, Zhai C, Huang T (2011) Geographical topic discovery and Available: http://www.upi.com/Top_News/US/2011/09/30/Occupy-Wall- comparison. In: Proceedings of the 20th international conference on World wide Street-gets-union-support/UPI-89641317369600/. Accessed 2014 Jul 4. web. New York, NY, USA: ACM, WWW’11, pp. 247–256. doi:10.1145/ 40. Kumar S, Barbier G, Abbasi MA, Liu H (2011) Tweettracker: An analysis tool 1963405.1963443. for humanitarian and disaster relief. In: Fifth International AAAI Conference on 29. Pozdnoukhov A, Kaiser C (2011) Space-time dynamics of topics in streaming Weblogs and Social Media, ICWSM. text. In: Proc. of the 3rd ACM SIGSPATIAL Int’l Workshop on Location-Based 41. Swets JA (1996) Signal detection theory and ROC analysis in psychology and Social Networks. New York, NY, USA: ACM, LBSN’11, pp. 1–8. doi:10.1145/ diagnostics: Collected papers. Lawrence Erlbaum Associates Mahwah, NJ. 2063212.2063223. 42. Rhoades SA (1993) The herfindahl-hirschman index. Fed Res Bull 79: 188. 30. Morstatter F, Pfeffer J, Liu H, Carley KM (2013) Is the sample good enough? 43. Cover TM, Thomas JA (2012) Elements of information theory. Wiley- comparing data from twitters streaming api with twitters firehose. In: Interscience. International Conference on Weblogs and Social Media. pp. 400–408. 44. McClelland CA (1961) The acute international crisis. World Politics 14: 182– 31. Cheng Z, Caverlee J, Lee K (2010) You Are Where You Tweet: A Content- 204. Based Approach to Geo-locating Twitter Users. In: Proceedings of The 19th 45. McClelland CA (1968). Access to berlin: the quantity and variety of events, ACM International Conference on Information and Knowledge Management. 1948-1963. Available: http://www.econbiz.de/Record/access-to-berlin-the- Toronto, Ontario, Canada: International Conference on Information and quantity-and-variety-of-events-1948-1963-mcclelland-charles/10002418818. Knowledge Management, pp. 759–768. doi:10.1145/1871437.1871535. Accessed 2014 Jul 4. 32. Li R, Wang S, Deng H, Wang R, Chang KCC (2012) Towards social user 46. Boydstun AE, Bevan DS, Thomas HF (2014) The importance of attention profiling: unified and discriminative influence model for inferring home diversity and how to measure it. Public Policy and Administration 42(2): 173– locations. In: Proceedings of the 18th ACM SIGKDD international conference 196. PLOS ONE | www.plosone.org 7 July 2014 | Volume 9 | Issue 7 | e102001

References (46)

  1. Tsukayama H (2013) Twitter turns 7: Users send over 400 million tweets per day. The Washington Post. Available'' http://www.washingtonpost.com/ business/technology/twitter-turns-7-users-send-over-400-million-tweets-per- day/2013/03/21/2925ef60-9222-11e2-bdea-e32ad90da239_story.html. Ac- cessed 2014 Jul 4.
  2. Kumar S, Morstatter F, Liu H (2014) Twitter Data Analytics. Springer.
  3. Lazer D, Pentland AS, Adamic L, Aral S, Barabasi AL, et al. (2009) Life in the network: the coming age of computational social science. Science (New York, NY) 323: 721.
  4. Conte R, Gilbert N, Bonelli G, Cioffi-Revilla C, Deffuant G, et al. (2012) Manifesto of computational social science. The European Physical Journal Special Topics 214: 325-346.
  5. Rybski D, Buldyrev SV, Havlin S, Liljeros F, Makse HA (2012) Communication activity in a social network: relation between long-term correlations and inter- event clustering. Scientific reports 2.
  6. Gallos LK, Rybski D, Liljeros F, Havlin S, Makse HA (2012) How people interact in evolving online affiliation networks. Physical Review X 2: 031014.
  7. Ciulla F, Mocanu D, Baronchelli A, Gonc ¸alves B, Perra N, et al. (2012) Beating the news using social media: the case study of american idol. EPJ Data Science 1: 1-11.
  8. Gonzalez MC, Hidalgo CA, Barabasi AL (2008) Understanding individual human mobility patterns. Nature 453: 779-782.
  9. Eagle N, Pentland AS, Lazer D (2009) Inferring friendship network structure by using mobile phone data. Proceedings of the National Academy of Sciences 106: 15274-15278.
  10. Havlin S, Kenett DY, Ben-Jacob E, Bunde A, Cohen R, et al. (2012) Challenges in network science: Applications to infrastructures, climate, social systems and economics. European Physical Journal-Special Topics 214: 273.
  11. Gao J, Hu J, Mao X, Perc M (2012) Culturomics meets random fractal theory: insights into long-range correlations of social and natural phenomena over the past two centuries. Journal of The Royal Society Interface 9: 1956-1964.
  12. Kenett DY, Portugali J (2012) Population movement under extreme events. Proceedings of the National Academy of Sciences 109: 11472-11473.
  13. Moat H, Curme C, Avakian A, Kenett DY, Stanley HE, et al. (2013) Quantifying wikipedia usage patterns quantifying wikipedia usage patterns before stock market moves. Scientific Reports 3: 1801.
  14. Preis T, Moat HS, Stanley HE (2013) Quantifying trading behavior in financial markets using google trends. Scientific Reports 3: 1684.
  15. Moat HS, Preis T, Olivola CY, Liu C, Chater N (2014) Using big data to predict collective behavior in the real world. Behavioral and Brain Sciences 37: 92-93.
  16. De Longueville B, Smith RS, Luraschi G (2009) ''OMG, from here, I can see the flames!'': A use case of mining location based social networks to acquire spatio- temporal data on forest fires. In: Proceedings of the 2009 International Workshop on Location Based Social Networks. New York, NY, USA: ACM, LBSN'09, pp. 73-80. doi:10.1145/1629890.1629907. URL http://doi.acm. org/10.1145/1629890.1629907.
  17. Sakaki T, Okazaki M, Matsuo Y (2010) Earthquake shakes twitter users: real- time event detection by social sensors. In: Proceedings of the 19th international conference on World wide web. New York, NY, USA: ACM, WWW'10, pp. 851-860. doi:10.1145/1772690.1772777. Available: http://doi.acm.org/10. 1145/1772690.1772777. Accessed 2014 Jul 4.
  18. Morstatter F, Lubold N, Pon-Barry H, Pfeffer J, Liu H (2014) Finding eyewitness tweets during crises. In: Association of Computational Lingustics Workshop on Language Technologies and Association of Computational Lingustics Workshop on Language Technologies and Computational Social Science.
  19. Pastor-Satorras R, Vespignani A (2001) Epidemic spreading in scale-free networks. Phys Rev Lett 86: 3200-3203.
  20. Nagarajan M, Purohit H, Sheth A (2010) A qualitative examination of topical tweet and retweet practices. In: Fourth International AAAI Conference on Weblogs and Social Media. AAAI.
  21. Kwak H, Lee C, Park H, Moon S (2010) What is twitter, a social network or a news media? In: Proceedings of the 19th international conference on World wide web. New York, NY, USA: ACM, WWW'10, pp. 591-600. doi:10.1145/ 1772690.1772751.
  22. Boyd D, Golder S, Lotan G (2010) Tweet, tweet, retweet: Conversational aspects of retweeting on twitter. In: System Sciences (HICSS), 2010 43rd Hawaii International Conference on. pp. 1-10. doi:10.1109/HICSS.2010.412.
  23. Mendoza M, Poblete B, Castillo C (2010) Twitter under crisis: can we trust what we RT? In: Proceedings of the First Workshop on Social Media Analytics. New York, NY, USA: ACM, SOMA '10, pp. 71-79. doi:10.1145/1964858.1964869. Available: http://doi.acm.org/10.1145/1964858.1964869. Accessed 2014 Jul4.
  24. Yang L, Sun T, Zhang M, Mei Q (2012) We know what @you #tag: does the dual role affect hashtag adoption? In: Proceedings of the 21st international conference on World Wide Web. New York, NY, USA: ACM, WWW'12, pp. 261-270. doi:10.1145/2187836.2187872. Available: http://doi.acm.org/10. 1145/2187836.2187872. Accessed 2014 Jul 4.
  25. Romero DM, Meeder B, Kleinberg J (2011) Differences in the mechanics of information diffusion across topics: idioms, political hashtags, and complex contagion on twitter. In: Proceedings of the 20th international conference on World wide web. New York, NY, USA: ACM, WWW'11, pp. 695-704. doi: 10.1145/1963405.1963503. URL.
  26. EfronM(2010) Hashtag retrieval in a microblogging environment. In: Proceed- ings of the 33rd international ACM SIGIR conference on Research and development in information retrieval. New York, NY, USA: ACM, SIGIR '10, pp. 787-788. doi:10.1145/1835449.1835616.
  27. Weng J, Lim EP, He Q, Leung CK (2010) What do people want in microblogs? measuring interestingness of hashtags in twitter. In: Data Mining (ICDM), 2010 IEEE 10th International Conference on. pp. 1121-1126. doi:10.1109/ ICDM.2010.34.
  28. Yin Z, Cao L, Han J, Zhai C, Huang T (2011) Geographical topic discovery and comparison. In: Proceedings of the 20th international conference on World wide web. New York, NY, USA: ACM, WWW'11, pp. 247-256. doi:10.1145/ 1963405.1963443.
  29. Pozdnoukhov A, Kaiser C (2011) Space-time dynamics of topics in streaming text. In: Proc. of the 3rd ACM SIGSPATIAL Int'l Workshop on Location-Based Social Networks. New York, NY, USA: ACM, LBSN'11, pp. 1-8. doi:10.1145/ 2063212.2063223.
  30. Morstatter F, Pfeffer J, Liu H, Carley KM (2013) Is the sample good enough? comparing data from twitters streaming api with twitters firehose. In: International Conference on Weblogs and Social Media. pp. 400-408.
  31. Cheng Z, Caverlee J, Lee K (2010) You Are Where You Tweet: A Content- Based Approach to Geo-locating Twitter Users. In: Proceedings of The 19th ACM International Conference on Information and Knowledge Management. Toronto, Ontario, Canada: International Conference on Information and Knowledge Management, pp. 759-768. doi:10.1145/1871437.1871535.
  32. Li R, Wang S, Deng H, Wang R, Chang KCC (2012) Towards social user profiling: unified and discriminative influence model for inferring home locations. In: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. New York, NY, USA: ACM, KDD '12, pp. 1023-1031. doi:10.1145/2339530.2339692.
  33. Bennett S (2011). Twitter: Faster than earthquakes. Media Bistro. Available: http://www.mediabistro.com/alltwitter/twitter-earthquake-video_b13147. Ac- cessed 2014 Jul 4.
  34. Mourtada R, Salem F (2011) Civil movements: The impact of facebook and twitter. Arab Social Media Report 1.
  35. Huang C (2011) Facebook and twitter key to arab spring uprisings: report. The National Abu Dhabi Media 6.
  36. Campbell DG (2011) Egypt Unshackled: Using Social Media to @#:) the System. Amherst, NY: Cambria Books.
  37. Berrett D (2011) Intellectual roots of wall st. protest lie in academe. The Chronicle of Higher Education. Available: http://chronicle.com/article/ Intellectual-Roots-of-Wall/129428/. Accessed 2014 Jul 4.
  38. Chappell B (2011). Occupy wall street: From a blog post to a movement. http:// www.npr.org/2011/10/20/141530025/occupy-wall-street-from-a-blog-post-to- a-movement. Accessed 2014 Jul 4.
  39. 2011) Occupy wall street gets union support. United Press International. Available: http://www.upi.com/Top_News/US/2011/09/30/Occupy-Wall- Street-gets-union-support/UPI-89641317369600/. Accessed 2014 Jul 4.
  40. Kumar S, Barbier G, Abbasi MA, Liu H (2011) Tweettracker: An analysis tool for humanitarian and disaster relief. In: Fifth International AAAI Conference on Weblogs and Social Media, ICWSM.
  41. Swets JA (1996) Signal detection theory and ROC analysis in psychology and diagnostics: Collected papers. Lawrence Erlbaum Associates Mahwah, NJ.
  42. Rhoades SA (1993) The herfindahl-hirschman index. Fed Res Bull 79: 188.
  43. Cover TM, Thomas JA (2012) Elements of information theory. Wiley- Interscience.
  44. McClelland CA (1961) The acute international crisis. World Politics 14: 182- 204.
  45. McClelland CA (1968). Access to berlin: the quantity and variety of events, 1948-1963. Available: http://www.econbiz.de/Record/access-to-berlin-the- quantity-and-variety-of-events-1948-1963-mcclelland-charles/10002418818. Accessed 2014 Jul 4.
  46. Boydstun AE, Bevan DS, Thomas HF (2014) The importance of attention diversity and how to measure it. Public Policy and Administration 42(2): 173- 196.
About the author
Boston University, Department Member

For over a decade I have been performing data driven quantitative research on financial markets, uncovering patterns, structure, dynamics, and sources of risk within and between different markets. I have been developing econometric and network based models to quantify patterns in different markets, in different data sampling frequencies. I have investigated different markets in the U.S. financial sector (equity, CDS, bonds, repo), as well as equity markets in other countries (for example Israel, U.K., Germany), as well as the banking sector in the U.S., Venezuela, Mexico and israel. My most recent position has been with the U.S. Department of the Treasury, Office of Financial Research, where I modeled and monitored changes, threats and vulnerabilities to the financial system. I have been studying cross border spillovers and vulnerabilities, and have developed new models to account for spillover and contagion channels. Some of the projects I worked on include developing models and tools to understand patterns of comovement in financial markets, model interconnectedness and transmission channels for cross asset classes and cross border spillover and contagion effects, develop new stress test models, develop models to study the interaction between operational and cyber risks with financial stability risks, and new measures of liquidity in equity markets. Prior to that I was a researcher at Boston University, and a visiting scientist in both the Israel Securities Authority, and the Bank of Israel.

Papers
53
Followers
148
View all papers from Dror Kenettarrow_forward