Measuring objective and subjective well-being: dimensions and data sources

Lorenzo Gabrielli

doi:10.1007/S41060-020-00224-2

International Journal of Data Science and Analytics

Abstract

Well-being is an important value for people's lives, and it can be considered an index of societal progress. Researchers have suggested two main approaches to the overall measurement of well-being: objective well-being and subjective well-being. Both approaches, as well as their relevant dimensions, have traditionally been captured with surveys. Over the last decades, new data sources have been suggested as an alternative or complement to traditional data. This paper presents the theoretical background of well-being by distinguishing between the objective and subjective approaches, their relevant dimensions, the new data sources used for their measurement, and relevant studies. We also intend to shed light on still largely unexplored dimensions and data sources that could become key inputs for public policy-making and social development.

Keywords: Well-being · Objective well-being · Subjective well-being · Well-being dimensions · New data sources · Data science for social good · Artificial intelligence for social good

Review · Received: 18 July 2019 / Accepted: 8 May 2020 · © The Author(s) 2020

Authors: Vasiliki Voukelatou (1) · Lorenzo Gabrielli (2) · Ioanna Miliou (3) · Stefano Cresci (4) · Rajesh Sharma (5) · Maurizio Tesconi (4) · Luca Pappalardo (2, corresponding author)
(1) Scuola Normale Superiore and ISTI-CNR, Pisa, Italy; (2) ISTI-CNR, Pisa, Italy; (3) University of Pisa, Pisa, Italy; (4) IIT-CNR, Pisa, Italy; (5) University of Tartu, Tartu, Estonia

1 Introduction

Economists and policy-makers have traditionally considered gross domestic product (GDP) a good indicator of well-being in society, mainly because it is strongly linked with standard-of-living indicators [1]. However, GDP has been criticized as a weak indicator of well-being and, therefore, as a misleading tool for public policy [2]. In 2009, the Stiglitz Commission [3] observed that other statistical tools should be used, complementary to GDP, to measure well-being. Because well-being is difficult to capture with GDP alone, researchers from various backgrounds, from economists to psychologists, have suggested two main approaches to measuring overall well-being: objective well-being and subjective well-being.

Defining objective well-being has always been considered a challenging task, so researchers have focused on exploring its dimensions rather than its definition [4,5]. Because of its objective nature, one could claim that objective well-being can be measured in terms of GDP. However, it must reflect both people's material living conditions and the quality of their lives. Building on the frameworks of the Organisation for Economic Co-operation and Development (OECD) [6], the United Nations Development Programme (UNDP) [7] and the Italian Statistics Bureau (ISTAT) [8], six major objective and observable dimensions can be identified for its measurement: health, job opportunities, socioeconomic development, environment, safety, and politics. Together, these dimensions represent objective well-being, which is assessed through the extent to which these "needs" are satisfied.

The objective approach investigates the objective dimensions of a good life, whereas the subjective approach examines people's subjective evaluations of their own lives. In 2013, the OECD [9] recognized the importance of taking people's perceived well-being, labeled subjective well-being, into consideration when investigating overall well-being. Subjective well-being, also called happiness, has been defined by Veenhoven [10] as the degree to which an individual judges the overall quality of her life-as-a-whole favorably. It may therefore differ from what GDP suggests, since GDP cannot represent societal happiness: GDP explains only a small proportion of the variation in happiness [11], and it may differ from people's perceptions of their own well-being [12]. For this reason, subjective well-being has traditionally been captured through studies based on self-reported data. These studies highlight five main dimensions of subjective well-being: the role of human genes, since subjective well-being appears to be fairly heritable [13–21]; universal needs, meaning basic and psychological needs [22–24]; the social environment, such as education and health [25–29]; the economic environment, including a large body of research on income [30–34]; and the political environment, such as democracy and political freedom [35,36].

Traditionally, both objective and subjective well-being are measured through surveys of household income and consumption [37]. Although these surveys are considered accurate and valid, they have considerable disadvantages. For example, they cannot provide policy-makers with constant updates on well-being, and they are costly to conduct, making it difficult for many developing countries to estimate well-being frequently. The last few years have witnessed a drastic change in the approaches used to measure well-being. Researchers from different disciplines have proposed several innovative data sources and methods that could potentially overcome the limitations of traditional methods for measuring both objective and subjective well-being, at the individual and at the collective level.

To support research in this direction, the European project SoBigData [38] has created a virtual environment within a research infrastructure that provides theoretical knowledge, data, and innovative methods to scholars who want to address challenging questions involving both objective and subjective well-being.

In line with these purposes and with the support of SoBigData, the aim of this paper is to provide the theoretical background on objective and subjective well-being, including their relevant dimensions. Additionally, the article presents to researchers the new data sources used for capturing well-being and discusses indicative existing studies.

We believe that this study offers value to the scientific community, especially to researchers interested in "Data Science for Social Good" (DS4SG) or, similarly, "Artificial Intelligence for Social Good" (AI4SG) [39], since it could serve as a reference point for measuring well-being with innovative data sources and tools. In particular, at a time when global society is facing financial and political crisis and instability, policy-makers need frequent updates on well-being. Such updates would help them react promptly, apply the right policies to prevent detrimental societal effects, and contribute effectively to societal progress.

The remainder of this paper is divided into two main sections, following the distinction made in the literature between objective and subjective well-being: Sect. 2 is dedicated to objective well-being and Sect. 3 to subjective well-being. For each, we provide the theoretical background and the relevant dimensions, and we then describe the data sources used for monitoring well-being. We also present key studies on well-being. To present them in an organized way, we group the studies by matching each well-being dimension with each data source. Finally, Sect. 4 discusses the study and highlights opportunities for future research on well-being.

2 Measuring objective well-being

Suggesting a single definition of objective well-being is a substantial challenge, mainly because of its multi-dimensionality. Therefore, researchers have focused on carefully specifying its objectively measurable dimensions [4,5]. Objective well-being is traditionally captured through surveys, such as household income and consumption surveys [37]. However, such surveys are usually very costly and time-consuming [40], making it difficult for many countries and global institutions to update their estimates frequently. As a result, the last few years have witnessed a change in the way objective well-being is measured. In particular, researchers from various disciplines propose several methodologies to measure individual and collective objective well-being based on a combination of new data sources and traditional surveys [41–44]. The United Nations also encourages this change in two recent reports, which promote the use of new, mostly big, data sources to investigate patterns of phenomena related to people's health and well-being [45,46].

2.1 The dimensions of objective well-being

In recent years, public institutions and non-governmental organizations have worked on identifying dimensions that are considered essential for improving societal well-being and for comparing it across countries and years.
Fig. 1 The figure relates the data sources (left) to the dimensions of objective well-being (right)

For example, the Organisation for Economic Co-operation and Development (OECD) has identified 11 essential topics, labeled the OECD well-being framework [6]; the United Nations Development Programme (UNDP) has identified 17 sustainable development goals, labeled SDGs [7]; and the Italian Statistics Bureau (ISTAT) has created an ambitious project named "Benessere Equo e Sostenibile" (BES), which stands for "Fair and Sustainable Well-being" [8]. These initiatives show that well-being dimensions may differ across institutions and are sometimes vague and statistically hard to capture. Therefore, based on these official authorities, we suggest the following concrete and measurable dimensions of well-being (Fig. 1).

2.1.1 Health

Health status is an essential factor for people's well-being, as shown at the global level by the WHO Commission on Macroeconomics and Health in 2001 [47] and by the Lisbon Strategy for Growth and Jobs in 2000 [48]. Health brings many other benefits, from job opportunities to social relationships, and from reduced health care costs to increased life expectancy. Indeed, there have been remarkable gains in life expectancy in OECD countries over the past 50 years [49], due to growth in health care spending and to lifestyle, educational, and environmental changes. Chronic (non-communicable) diseases, such as cancer, diabetes, and chronic respiratory conditions, are nowadays the primary causes of disability and mortality in OECD countries. Fortunately, some indicators can help prevent these conditions. For example, the numbers of people who drive carefully, who do not smoke, or who do not drink large amounts of alcohol are risk indicators which, if taken into consideration, could contribute to improving the health status of a territory.

2.1.2 Job opportunities

This is a crucial dimension of well-being, since it has obvious economic and societal benefits, contributing to people's health and to societal, political, and economic stability. The job opportunities dimension is composed of three main determinants: employment rate, quality of work, and work–life balance. The employment rate is crucial because individuals in countries with a high level of employment are well connected in society. In particular, it is a proxy that policy-makers use in their efforts to prevent poverty and social exclusion.

The second determinant is the quality of work, in terms of objective job stability, pay, skills, and safety at work, which may differ across working environments. Finally, work–life balance captures the balance between work and personal life. In OECD countries, a full-time worker devotes on average 62% of the day (15 hours) to personal care (e.g., eating, sleeping) and leisure (e.g., socializing with friends and family, hobbies) [50]. This determinant is mainly designed to capture women's work–life balance. Indeed, the quality of a country's employment is assessed through the balance women have between family care and paid work.

2.1.3 Socioeconomic development

While socioeconomic indicators alone do not suffice to represent societal well-being, they undoubtedly influence it positively. The variables that contribute to its measurement are income, wealth, consumption expenditure, housing conditions, and possession of consumer durables, and they can implicitly influence access to university, health care, and more. In particular, the OECD [6] and ISTAT [8] suggest two main determinants that constitute overall economic well-being: available income and wealth, and consumption expenditure.

In a market economy, income measures the purchasing capacity of individuals, and it is thus an essential predictor of economic well-being. Wealth, on the other hand, takes into account savings, monetary gold, stocks, securities, and loans [51]. Wealth can therefore be considered an essential source of revenue that makes people less vulnerable to difficult economic situations that might affect their lives. Additionally, consumption expenditure is a direct estimate of the goods and services that determine individuals' living conditions. Unlike income, consumption expenditure allows interpersonal comparisons, since it captures whether each individual can acquire her desired goods and services.

2.1.4 Environment

A healthy natural environment is essential for the well-being of all individuals in society. Clean water, clean air, and uncontaminated food are examples of goods that are only possible in an environmental context where human productive and social activities respect the environment and its natural resources. For these reasons, and because of the recent environmental crisis, the United Nations has set sustainable environmental goals [7], such as Clean Water and Sanitation, Climate Action, and more. Similarly, ISTAT [8] suggests five interconnected determinants for describing the interactions between society and the environment: quality of the water; quality of the air; quality of the soil and the land; biodiversity; and matter, energy, and climate change. Finally, the "OECD Environmental Outlook to 2050" projects that the number of premature deaths associated with exposure to PM10 and PM2.5 will increase from just over 1 million worldwide in 2000 to about 3.5 million in 2050 [52]. Therefore, the more policy-makers and citizens take these determinants into consideration in their activities, the more citizens can contribute to radical changes for the protection of societal well-being.

2.1.5 Safety

This dimension includes the risk of people being physically assaulted or falling victim to other crimes, and of suffering their consequences, such as economic loss, physical damage, and post-traumatic stress. Reducing violent crime, sex trafficking, forced labor, and child abuse are clear global goals, as suggested by the United Nations [7]. In addition, the Italian BES project [8] suggests that safety is characterized by two determinants: criminality and violence.

Criminality is one of the most common security threats in developed and emerging countries, and it has both a direct and an indirect impact on people. It directly affects individuals' health (physical and mental) and economic situation. According to the latest OECD data, the average homicide rate in the OECD is 3.6 murders per 100,000 inhabitants [53]. Indirectly, criminality affects the well-being of non-victims when victims belong to their social network, or through news spread on (social) media.

The other determinant is violence suffered inside and outside the family, which also has both a direct and an indirect impact on people. Victims suffer from the direct effects, which can last for long periods, if not for their whole life, and which affect their ability to manage their daily life, their medical expenses, their dependence on others, and their capacity to achieve happiness. Indirectly, violence causes insecurity and anxiety, which create difficulties in people's daily activities [54].

2.1.6 Politics

This dimension is also essential for objective well-being. Today, partly because of the economic crisis, citizens demand greater transparency from their governments and public institutions more than ever. Fair civic and political participation, as well as transparency, contribute to well-being not only directly but also indirectly, since they allow greater efficiency of public policies, lower transaction costs, and a lower risk of fraud. Two determinants therefore fall under this category, both associated with the public sphere as a driver of individuals' well-being at the local or national level: civic and political engagement, and trust and social cohesion. Voter turnout is considered the best existing means of measuring civic and political engagement; it is measured as the percentage of the registered population that voted in an election. According to OECD data, voter turnout averages 69% in OECD countries, which shows that not everyone exercises the right to vote [55]. Regarding trust and social cohesion, the OECD suggests public engagement (e.g., stakeholder engagement) in developing regulations [55]. If citizens can participate in the development of laws and regulations, they are more likely to trust government institutions and to comply with societal rules.

Table 1 Pros and cons of each data source used for the measurement of objective well-being

Data source | Pros | Cons
CDRs | Temporal and social dimensions, worldwide diffusion, repeatability | Not publicly available, sparsity, geographically imprecise
GPS and transportation | Coverage of rural areas, unbiased and no misclassification, real-time monitoring | Privacy issues, indoor spatial inaccuracy
Social media | Measuring social dynamics, publicly available | Privacy issues, overrepresentation, social desirability bias
Health and fitness | Cost effective, applicable to multiple studies, prediction of near-term risk of events | Not publicly available, not necessarily representative of the population, limited time slots
News | Variety of subject domains, range of targets, updated 24/h, archived historical news | Gatekeeping bias, coverage bias, statement bias
Retail scanners | Modeling of dynamic household behavior, control of time-invariant characteristics, long-term coverage, quality improvement of HICP | Dependency on retailer's permission, legal constraints
Web search | Publicly available, speed, convenience, flexibility, ease of analysis | Population size varies across domains, hard to identify relevant queries
Crowdsourcing | Large amount of data, speed, relatively low cost | Risk of low-quality results, trade-off between quality and cost

Table 2 Example of Call Detail Records (CDRs). Every time a user makes a call, a record is created with the timestamp, the phone tower serving the call, the caller identifier and the callee identifier (a). For each tower, the latitude and longitude coordinates are available to map the tower on the territory (b)

(a)
Timestamp | Tower | Caller | Callee
2007/09/10 23:34 | 36 | 4F80460 | 4F80331
2007/10/10 01:12 | 36 | 2B01359 | 9H80125
2007/10/10 01:43 | 38 | 2B19935 | 6W1199
... | ... | ... | ...

(b)
Tower | Latitude | Longitude
36 | 49.54 | 3.64
37 | 48.28 | 1.258
38 | 48.22 | -1.52
... | ... | ...

2.2 Data sources for monitoring the dimensions of objective well-being

Figure 1 shows the new data sources (left) that have been used to estimate one or more dimensions of objective well-being (right). A link in Fig. 1 between a data source and a dimension indicates that there are papers in the literature that monitor that dimension with that data source. In this section, we describe, for each data source, its features (e.g., the data collection process, its biases and limitations) and the main works in the literature that use it to measure dimensions of objective well-being. Table 1 summarizes the data sources, highlighting the pros and cons of each one. We refer to a link between a data source and a dimension using a letter-number notation, where the letter identifies the data source and the number identifies the dimension, numbered as in Sect. 2.1. For example, B3 indicates the link between GPS data (B) and socioeconomic development (3).

2.2.1 CDRs

Many works in the literature are based on the analysis of mobile phone data, the so-called Call Detail Records (CDRs) of users' calling and texting activity, because the worldwide diffusion of mobile phones makes it possible to repeat experiments in different countries and at different scales [56]. CDRs collect geographical, temporal, and interaction information on mobile phone use [57–62], hence providing a comprehensive picture of human behavior at a societal scale. Each time an individual makes a call, the mobile phone operator registers the connection between the caller and the callee, the duration of the call, and the coordinates of the phone tower serving the phone. Table 2 illustrates an example of the structure of CDRs.

Note that CDRs suffer from different types of bias [63,64]. For example, the position of a user is known only at the granularity of phone towers, and only when they make a phone call. Moreover, phone calls are sparse in time, i.e., the time between consecutive calls follows a heavy-tailed distribution [65,66]. In other words, since users are inactive most of the time, CDRs allow reconstructing only a subset of a user's behavior.

CDRs are used to monitor several dimensions of well-being, notably health (A1), job opportunities (A2), socioeconomic development (A3), environment (A4), and safety (A5).

CDRs provide one of today's most exciting opportunities to study human mobility and its influence on disease dynamics (A1). Many researchers use mobile phone data for public health, as the analysis of individual and population mobility patterns is more objective and has a finer spatiotemporal resolution than traditional methods. Furthermore, mobile network data can provide insights into human behavior that support the assessment and monitoring of the health of specific communities at risk, paving the way toward improved health promotion and prevention [67]. Since the spatiotemporal evolution of human mobility and the related fluctuations of population density are essential drivers of disease outbreaks, Finger et al. [68] use CDRs to track the 2005 cholera outbreak in Senegal. Their findings show that a mass gathering taking place during the initial phase of the outbreak had an essential impact on the course of the disease. Besides, Kafsi et al. [69] contribute to the fight against epidemics of infectious diseases using CDRs provided by France Telecom-Orange. They use 2.5 billion calls made by 5 million users in the Ivory Coast, recorded over 5 months, from December 2011 to April 2012, to study and model behavioral patterns of the affected population, and they propose several strategies for personalized behavioral recommendations to reduce infections. Lima et al. [70] use the same data set to build a model that describes how diseases circulate in the country as people move between regions, and they enhance the model with a concurrent process of relevant information spreading. This process corresponds to people disseminating disease prevention information, e.g., hygiene practices and vaccination campaign notices, within their social network. Finally, Madan et al. [71] use CDRs and mobile phone-based co-location sensing to measure characteristic behavior changes in symptomatic individuals. These behavior changes are reflected in their total communication, their interactions with respect to time of day, and the diversity and entropy of their face-to-face interactions and movement. Using these mobile features, they manage to predict the health status of an individual without actual health measurements from the subject.

Researchers also use CDRs to study job opportunities (A2). Pappalardo et al. [72] use CDRs to study the link between human mobility and the employment rate of French cities, finding a strong correlation between measures of mobility entropy and the unemployment rate in urban environments. Toole et al. [73] show that changes in the calling behavior of individuals, aggregated at the regional level, can improve forecasts of macro unemployment rates. Sundsøy et al. [74] use CDRs to create a model that predicts unemployment with 70.4% accuracy. They also provide promising support for collecting data on populations in developing countries, which are often under-represented in official surveys.

Most researchers use CDRs to investigate socioeconomic development (A3). A seminal work analyzes landline calls and a nationwide mobile phone data set to show that, in the UK, regional communication diversity is positively associated with a socioeconomic ranking [75]. Other works address the issue of mapping poverty [76] and other socioeconomic determinants [77] with mobile phone communication data, combined with airtime credit purchase data in the Ivory Coast [78]. Blumenstock et al. [79,80] show preliminary evidence of a relationship between individual wealth and the history of mobile phone transactions. Frias-Martinez et al. [81–84] analyze the relationship between human mobility and the socioeconomic status of urban zones, showing which mobility indicators correlate best with socioeconomic levels and building a model to predict the socioeconomic level from mobile phone traces. Pappalardo et al. [85] analyze mobile phone data and extract meaningful mobility measures for cities, discovering an interesting correlation between aspects of human mobility and socioeconomic determinants. Lotero et al. [86] analyze the architecture of urban mobility networks in two Latin-American cities from a multiplex perspective. They discover that the socioeconomic characteristics of the population have a strong impact on the layer organization of these multiplex systems. In a subsequent work, Lotero et al. [86] analyze urban mobility in Colombia, representing cities as mobility networks. They encode the origin–destination trips performed by the subset of the population corresponding to a particular socioeconomic status, and they show that spatial and temporal patterns vary across these socioeconomic groups. Amini et al. [87] use mobile phone data to compare the human mobility patterns of a developing country (the Ivory Coast) and a developed country (Portugal). They show that cultural diversity in developing regions can challenge mobility models defined in less culturally diverse regions. Smith-Clarke et al. [88] analyze the aggregated mobile phone data of two developing countries and extract features that are strongly correlated with poverty indexes derived from official census data.

Moreover, researchers use CDRs to monitor the quality of the environment and its impact on people's lives (A4). For example, in a recently published study, Picornell et al. [89] evaluate population exposure to NO2. They use CDRs from one of the three largest Spanish mobile network operators (MNOs), with around 30% market share. The analysis is conducted for Madrid, the capital of Spain, on 17 November 2014, a typical day in terms of population mobility and NO2 levels. Comparing the results with traditional census-based methods, they demonstrate relevant discrepancies at disaggregated levels and underline the importance of integrating CDR data when evaluating population exposure to NO2. Lu et al. [90] study how climate stress affects people's behavior. In particular, by exploring individuals' behavioral response to Cyclone Mahasen, which struck Bangladesh in May 2013, they find that anomalous patterns of mobility and calling frequency correlate with rainfall intensity, revealing which regions are affected as the storm moves. Lu and Bengtsson [91,92] analyze the movements of 1.9 million mobile phone users before and after the 2010 Haiti earthquake, and they show that CDRs can be a valid data source for estimating population movements during disasters. Wilson et al. [93] build a tool within nine days of the 2015 Nepal earthquake to provide spatiotemporally detailed estimates of population displacement from CDRs, based on the movements of 12 million mobile phone users. Nyarku et al. [94] explore whether mobile phones could be reliably used to monitor individual exposure to selected air pollutants when moving between indoor and outdoor microenvironments.

Table 3 Example of GPS records. The collected GPS data consist of the sequence of space-time detections of vehicles on which the positioning device is installed. Every time a vehicle switches on, a record is created consisting of the vehicle identifier, the timestamp, and the latitude and longitude coordinates.

Vid | Timestamp | Latitude | Longitude
63 | 2014-06-18 06:31:24 | 43.557703 | 10.337913
63 | 2014-06-18 06:31:26 | 43.557725 | 10.33794
63 | 2014-06-18 06:31:27 | 43.557735 | 10.337955
... | ... | ... | ...
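The displacement analyses of Lu and Bengtsson [91,92] and Wilson et al. [93] start from records shaped like Table 2. The Python sketch below shows one simple way to turn such records into before/after flows between towers. It is not the method of those studies. Several choices are illustrative assumptions: locating each user by the tower they call from most often (the "modal tower"), the function names `modal_tower`, `displacement_flows` and `flows_as_coordinates`, and the toy records, which are invented. Only the tower coordinates are copied from Table 2(b).

```python
from collections import Counter
from datetime import datetime

# Tower coordinates copied from Table 2(b): tower id -> (latitude, longitude).
TOWER_COORDS = {36: (49.54, 3.64), 37: (48.28, 1.258), 38: (48.22, -1.52)}

# Invented toy CDRs in the layout of Table 2(a): (timestamp, tower, caller, callee).
# As in Table 2, the tower locates the caller only.
RECORDS = [
    (datetime(2010, 1, 5, 10, 0), 36, "4F80460", "4F80331"),
    (datetime(2010, 1, 6, 11, 0), 36, "4F80460", "2B01359"),
    (datetime(2010, 1, 15, 9, 0), 38, "4F80460", "4F80331"),
    (datetime(2010, 1, 16, 9, 0), 38, "4F80460", "6W1199"),
    (datetime(2010, 1, 5, 12, 0), 37, "2B01359", "9H80125"),
    (datetime(2010, 1, 14, 12, 0), 37, "2B01359", "9H80125"),
]


def modal_tower(records, user, start, end):
    """Return the tower `user` calls from most often in [start, end), or None.

    Ties go to the tower seen first, because Counter.most_common keeps
    first-encountered order for equal counts.
    """
    counts = Counter(
        tower for ts, tower, caller, _ in records
        if caller == user and start <= ts < end
    )
    return counts.most_common(1)[0][0] if counts else None


def displacement_flows(records, event, start, end):
    """Count callers whose modal tower before `event` differs from the one after.

    Callers with no calls on one side of the event are skipped, a direct
    consequence of the temporal sparsity of CDRs.
    """
    flows = Counter()
    for user in {caller for _, _, caller, _ in records}:
        before = modal_tower(records, user, start, event)
        after = modal_tower(records, user, event, end)
        if before is not None and after is not None and before != after:
            flows[(before, after)] += 1
    return flows


def flows_as_coordinates(flows, coords):
    """Replace tower ids with their (latitude, longitude) pairs."""
    return {(coords[a], coords[b]): n for (a, b), n in flows.items()}


if __name__ == "__main__":
    flows = displacement_flows(
        RECORDS,
        event=datetime(2010, 1, 12),
        start=datetime(2010, 1, 1),
        end=datetime(2010, 2, 1),
    )
    print(dict(flows))  # {(36, 38): 1}
    print(flows_as_coordinates(flows, TOWER_COORDS))  # {((49.54, 3.64), (48.22, -1.52)): 1}
```

In this toy data, caller 4F80460 calls from tower 36 before the event date and from tower 38 afterwards, so exactly one flow is reported. Caller 2B01359 stays at tower 37, so no flow is recorded for that caller. Any estimate built this way inherits the tower-level spatial granularity and the sparsity of CDRs described in this section.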
Nyarku et al. [94] collect their data from two BROAD life mobile phones, which are equipped with sensors for direct measurements of air pollutants. The two phones give similar results for both particles and formaldehyde, making them potentially suitable for applications in polluted environments, although in some cases the readings of the two phones do not correspond well to each other. Liu et al. [95] map personal trajectories using mobile phones in an urban environment to assess the impact of traffic-related air pollution on society. Using modeling tools, they estimate individuals' exposure to traffic pollution along their trajectories within estimated pollution concentration fields, and they manage to identify the trajectory patterns of particularly exposed groups. In addition, Decuyper et al. [96] use CDRs to study food security indicators, finding a strong correlation between the consumption of vitamin-rich vegetables and airtime purchases.

Other studies focus on the safety dimension (A5). Bogomolov et al. [97] use CDRs covering 3 weeks, from 9 to 15 December 2012 and from 23 December 2012 to 5 January 2013, in combination with demographic data from December 2012 to January 2013, to predict crime in the city of London. Experimental results show 70% accuracy in predicting whether an area is a crime hotspot. Similarly, Ferrara et al. [98] study criminal networks reconstructed from CDRs to detect and characterize criminal organizations. They also introduce an expert system to support law enforcement agencies in unveiling the underlying structure of criminal networks.

2.2.2 GPS and transportation data

Since the 1990s, the Global Positioning System (GPS) has been used to track the movements of individuals [99–102]. In particular, GPS data provide time and location coordinates, which can be used to link locations with environments and to calculate the speed of movements [103]. For insurance reasons, some vehicles have a black box installed. The device records the position of the vehicle at regular intervals and sends it to a database. Table 3 illustrates an example of the structure of GPS records.

GPS data can also cover rural areas, whereas other data are mostly collected among citizens of urban areas [104]. Compared to traditional ways of measuring mobility, usually self-reports collected with questionnaires, GPS does not introduce bias or misclassification [104,105], as it eliminates the social desirability bias usually introduced by self-reporting participants [106,107]. Another advantage of GPS data is that they allow real-time monitoring. However, while some studies based on GPS data cover hundreds of thousands of individuals [108], most GPS studies are conducted with fewer than 300 participants [104,109], usually because of privacy issues. Apart from this drawback, when a GPS device is used indoors, the spatial accuracy of the measurements degrades [110], which creates problems in specific fields, such as epidemiological research.

GPS data are used to explore several dimensions of objective well-being, notably health (B1), socioeconomic development (B3), and safety (B5).

Health (B1) has also attracted the interest of researchers. For example, Saelens et al. [111] track the movements of individuals through GPS devices and add to the growing evidence that transit users are more physically active than non-transit users, which could potentially improve the former's health. Similarly, Rundle et al. [112] explore health in terms of physical activity, and conclude that neighborhood walkability influences how residents use space and is associated with higher weekly physical activity. Additionally, Sadler et al. [113] use GPS data to understand children's exposure to junk food in Canada and compare the results to a validated food environment database. They demonstrate that official results underestimate exposure to junk food by up to 68%, which policy-makers should take into consideration. Finally, Canzian and Musolesi [114] analyze mobility patterns from GPS traces to determine whether mobile phones can be used to monitor individuals affected by depressive mood disorders. They develop a smartphone application that periodically collects users' locations and their answers to daily questionnaires that quantify their depressive mood. They find a significant correlation between mobility trace characteristics and the depressive moods of individuals.

Some of the works using GPS data focus on socioeconomic development (B3). Marchetti et al. [115] perform a study at the regional level, analyzing GPS tracks from cars in Tuscany to extract measures of human mobility at the province and municipality level.

Table 4 A subset of the information returned by the Twitter API. If the user activates a localization system, the tweet also contains information on the position (longitude, latitude or city) from which it is sent. Each tweet contains the user profile information and the mentions or hashtags used in the text

Id | Coordinates | Hashtags | Mentions | Text | Profile info | …
240556 | null | #ny #dinner | [10214;452879] | ….. | {…..} | …
4261063 | NY | null | null | ….. | {…..} | …
72096 | 42.10;10.2 | #wellbeing | [964215] | ….. | {…..} | …

[…] interactions with other users, or tags inserted in the tweet. Twitter also returns some information about the user profile. Table 4 illustrates an example of the structure of Twitter records.

Despite their undoubted usefulness, social media data also raise some concerns [121]. First of all, they
They find that there is a strong cor- may reflect social desirability biases, since individuals man- relation between the mobility measures and a poverty index age their online profiles [122]. Besides, social media users independently surveyed by the Italian official statistics insti- may not be as representative of the general population as tra- tute. Smith et al. [116] use an automated fare collection data ditional anonymized self-reports conducted through a chosen set of journeys made on the London rail system to build a representative sample [123]. classifier that identifies areas of the city with high economic All dimensions of objective well-being are monitored deprivation. They highlight that, given its high precision, the through social media data, i.e., health (C1), job opportunities classifier provides potential benefits for city planning and (C2), socioeconomic development (C3), environment (C4), policy-making. Lathia et al. [117] use the same data set to find safety (C5) and politics (C6). that more deprived areas tend to receive passenger flow from Several studies provide valuable insights into how the a higher number of other areas compared to less deprived analysis of social media data can lead to next-generation areas, also uncovering some evidence of social segregation. automated methodologies for public health (C1). As an Another objective well-being dimension that is explored example, Eichstaedt et al. [123] use Twitter data, in com- with GPS data is safety (B5). Robinson et al. [118] com- bination with atherosclerotic heart disease (AHD) mortality pare the spatial distribution of objective crime incidents and rates and country-level socioeconomic variables. 
They pre- self-reported physical activity among adolescents in Mas- dict country-level heart disease mortality since the language sachusetts, between 2011 and 2012, and show that there is a expressed on Twitter reveals important psychological char- positive association between them (r = 0.72, p < 0.0001). acteristics that are significantly associated with heart disease Ariel et al. [119] use GPS data to replicate findings pub- mortality risk. Besides, De Choudhury et al. [124] use Twit- lished from US official research on the effect of hot spots ter data in combination with traditional depression screening policing for the prevention of crime in England and Wales test data for the detection and diagnose of the individuals’ and demonstrate that victim-generated crimes (the primary major depressive disorders and even to predict the likelihood outcome measured in previous studies) increase in both the of depression of individuals. Signorini et al. [125] use data near vicinity and in catchment areas. from Twitter to track rapidly-evolving public sentiment con- cerning H1N1 and to measure actual disease activity. They show that Twitter can be used as a measure of public interest 2.2.3 Social media data or concern about health-related events and that estimates of influenza-like illness derived from Twitter chatter accurately Social media, such as Twitter, Facebook, and Instagram, can track reported disease levels. Paul et al. [126] incorporate be considered as a digital database of information about in their forecasting models the historical influenza data and online users, hence rendering individuals’ online activi- Twitter data. Lampos et al. [127] measure the prevalence ties accessible for analysis. 
Given this enormous potential, of flu-like symptoms in the general UK population, based researchers, governments, and corporations are turning their on the contents of Twitter, searching for symptom-related interest on social media to understand human behavior and statements, turning this information into a flu-score and they interactions better [120]. Among all social media, Twitter obtain on average a statistically significant linear correlation is the most popular, since it provides public access to data which is higher than 95%. In a later work, the authors [128] through APIs with the least restrictive policy. The Twitter instead of choosing the keywords and phrases themselves, APIs return information about locations, date of the event, 123 International Journal of Data Science and Analytics they use machine learning algorithms to find out which words or nowcasting the damage produced by earthquakes by ana- in the database of tweets occurred more often at times of ele- lyzing social media communications in the aftermath of the vated levels of flu, and they obtained very positive results. event. The results of these models can also be displayed in They claim that flu epidemics can be detected based on Twit- real-time, interactive maps that highlight stricken areas and ter content. Chen and Yang [129] use individuals’ tweets with provide support to emergency responders. Notable examples spatiotemporally tagged information to demonstrate that peo- of this kind are the systems developed by Avvenuti et al. ple’s healthy diet is elicited by exposure to their immediate [141,142]. Preis et al. [143] find that the number of pho- food environment. tos taken and subsequently uploaded to Flickr with titles, Regarding the monitoring of job opportunities (C2), descriptions, or tags related to Hurricane Sandy bears a strik- Llorente et al. 
[130] quantify the extent to which deviations in ing correlation to the atmospheric pressure in the US state diurnal rhythm, mobility patterns, and communication styles New Jersey. They claim that appropriate leverage of such across regions relate to unemployment. For this purpose, they information could be useful to policy-makers and emergency examine country-wide Twitter data describing 19 million crisis managers. geo-located messages and find that the regions exhibiting Safety is another dimension that can be monitored using more diverse mobility fluxes, earlier diurnal rhythms, and data from social media (C5). For example, Chen et al. [144] more correct grammatical styles display lower unemploy- use Twitter data and create a model that predicts the specific ment rates. Antenucci et al. [131] use data from Twitter, time and location a crime occurs. This model combines ker- from July 2011 to early November 2013, to create indexes nel density estimation based on historical crime incidents and of job loss, job search, and job posting. They derive signals prediction via linear modeling with sentiment and weather by counting job-related phrases in tweets such as “lost my predictors. By adding the latter determinants, they show that job”. They construct social media indexes from the principal their model improves significantly with respect to existing components of these signals and manage to track events that models. Similarly, Boni et al. [145] use spatio-temporally affect the job market in real-time, such as Hurricane Sandy tagged tweets and create a model for crime prediction. In and the federal government shutdown. particular, they combine real crime data with individuals’ A large number of works in the literature focus on mon- micro-level movement patterns extracted from Twitter and itoring socioeconomic development from social media data demonstrate improved predictions. Likewise, Kadar et al. (C3). Bollen et al. 
[132], in a further study, analyze data [146] describe urban crime by using Foursquare and consid- from Twitter and consider the emotions of traders, rather ering these data as a measurement for the ambient population than their information gathering processes, suggesting that of a neighborhood, to further describe crime levels. They changes in the calmness of Twitter messages could be linked also confirm that such models improve the traditional mod- to changes in stock market prices. Still, regarding socioeco- els, based on census data. Additionally, the city of Chicago nomic development, social media data are also extensively applies text analytics on Twitter and 311 (the local emer- used to nowcast and forecast stock market prices and traded gency number) records to detect and prevent phenomena like volumes. Seminal works in this field leverage information rat infestations and to track civil unrest and violent crimes contained within investment discussion boards and blogs. (CrimeScan and CityScan software) [147–149]. For example, Bar-Haim et al. [133] use StockTwits data to Finally, the politics dimension (C6) is extensively stud- uncover relevant correlations between Web-derived indica- ied, in particular, during the last years with the rise of the tors and the stock market. In detail, they leverage sentiment political crisis across the world. Colleoni et al. [150] inves- scores of messages shared in the Yahoo message boards to tigate the political homophily on Twitter to classify users as find correlations with the stock market. In a different web Democrats or as Republicans based on their tweets. They platform study, De Choudhury et al. [134] try to find corre- show that, in general, the former exhibit higher levels of lations between the stock market and blog communications. political homophily than the latter. Goh et al. [151] use Face- Last, Cresci et al. 
[135,136] assess the risks and vulnerability book pages of a group of 12 politicians and demonstrate of stock markets to automation, manipulation, and disin- that political engagement can be achieved by creating social formation, with the ultimate goal of safeguarding people’s media consumption habits, as supported by the habit forma- investments. tion in consumption from macroeconomics. Similarly to the Researchers also use social media for the exploration of field of socioeconomic and financial analyses, social media the environment dimension (C4). Avvenuti et al. [137] claim data can be easily manipulated also for achieving political that the analysis of social media proves valuable for quickly goals [152,153]. As such, results of political analyses based acquiring situational awareness and estimates of the impact on social media should be carefully weighed to minimize of disasters. As an example of the predictive power of social issues related to biases and manipulations. media, Kryvasheyeu et al. [138], Avvenuti et al. [139] and Mendoza et al. [140] demonstrate the viability of predicting 123 International Journal of Data Science and Analytics Table 5 The table shows an example of clinical records, including the levels based on Continuous Glucose Monitoring (CGM). In pathology for which a patient is admitted to the hospital, the duration particular, they use data from the DirecNet Central Labo- of hospitalization and the medicines she/he took ratory, containing time series for 25 patients, who are less In date Out date Pathology Medicines than 18 years old. By training a deep learning model on a data set designed to explore the performance of CGM 01/02/2019 01/02/2019 Asthma m1,m2,m3 devices in children with Type I diabetes, they demonstrate 03/02/2019 08/03/2019 Head trauma m5 how deep neural networks can outperform shallow networks on this task. In addition, Santillana et al. 
[160] use a clin- ician’s database, named as UpToDate, to predict influenza 2.2.4 Health and fitness data epidemics in the United States promptly. They show that digital disease surveillance tools based on experts’ databases These data mainly consist of Electronic Health Records may be able to provide an alternative, reliable, and stable sig- (EHRs) and mobile application data that are mainly used nal for accurate predictions of influenza outbreaks. Besides for monitoring the health dimension (D1). EHRs, initially EHRs, mobile app data, such as lifestyle habits data concern- created for the facilitation of the billing and patient care, are ing eating and physical activity behaviors, are used for the widely used for clinical studies and clinical risk prediction. monitoring of objective well-being in terms of health (D1). Table 5 reports an example of clinical records concerning the These data demonstrate for once more that smartphones can hospitalization of some patients. contribute to research with valuable new insights, although Out of a systematic review, Goldstein et al. [154] demon- they might apply biases towards people with lower socioe- strate both opportunities and challenges of EHRs. On the conomic status or towards people who are more interested one hand, compared to the traditionally used cohort data in their health. In addition, such data collected through web developed and collected for research purposes (such as the surveys for research purposes might bring the disadvantages Framingham Heart Study [155]), EHRs are cost-effective. discussed before. A critical study using mobile app data is In contrast with cohort data, EHRs can indeed be used for conducted by Althoff et al.[161]. They use a data set consisted multiple health studies and, since they are collected at a high of physical activity for 717,527 Apple iPhone smartphone frequency, they allow a better prediction of near-term risk of users of the Azumio Argus app, which tracks users’ diet events. 
On the other hand, EHRs include only individuals that and fitness and other healthy behaviors, between July 2013 have been ill or at least have had a clinic visit, which could and December 2014. They demonstrate inequality in how the generate a problem of representativeness. Moreover, they are activity is distributed within countries and that this inequal- not publicly available and might include limited time slots. ity is a better predictor of obesity than average activity level. Researchers use EHRs to monitor several aspects of per- Similarly, Hayeri [162] uses continuous glucose monitors sonal health (D1). For example, Sultana et al. [156] use the (CGM) and fitness wearables (Fitbit) to predict blood glucose Integrated Primary Care Information (IPCI) database to look values. The study uses data gathered from each participant for elements that could contribute to traditional methodolo- for 60 days, where the data from the first 30 days are used to gies. For example, multimorbidity and polypharmacy are train the algorithm and the remaining 30 days to test the pre- elements that could help in identifying frailty methodologies. dictions. On average, the software is able to predict a user’s They demonstrate that the Mini-Mental State Examination future glucose values with a 93% accuracy rate for 60-mins score, which is the most commonly recorded data item, could ahead of time. be potentially used as a frailty identifier. Ghaderighahfarokhi et al. [157] use medical records of newborns in the educa- 2.2.5 News tional Hospitals affiliated to the Ilam University of Medical Sciences (from April 2015 to April 2016) to identify accurate News data sources, such as the GDELT database [163], predictors of Low Birth Weight (LBW). They demonstrate contain information extracted from the news of newspapers that LBW is a multi-factorial condition requiring a system- around the world. 
News records generally describe a variety atic and accurate program to be reduced, such as education of subject domains (e.g., economic events, political events), through mass media, repeated monitoring of pregnancy, and represent a wide range of targets (e.g., opposing politicians) others. Metzger et al. [158] use EHRs with Emergency [164] and are continuously updated, containing even archived Department patient visits in 2012, from Lyon University Hos- historical news of the last decades. Nevertheless, such data pital, to demonstrate that machine learning can contribute to contain three main biases [165]: the gatekeeping bias, i.e., more accurate estimations of suicide attempts in France, in the editors or the journalists decide on which event to pub- relation to the current national surveillance system based on lish; the coverage bias, related to the coverage of an event manual coding by emergency practitioners. Mhaskar et al. (e.g., western countries are over-covered, whereas African [159] investigate the 30 minutes prediction of blood glucose countries are under-covered); the statement bias, when the 123 International Journal of Data Science and Analytics Table 6 Subset of the main fields provided by GDELT platform EventCode EventCategory EventTone Date Country code Url 815176338 Arrest, detain − 70 20180110 US http://tiny.cc/s5s16y 815176339 Use conventional military force − 30 20180110 UK … 815176340 Consider policy option + 25 20180110 IT … content written by the journalist, even if tried to be objective, US news broadcasts (e.g., ABC World News Tonight) for is favorable or unfavorable towards certain events. Table 6 the period between 1995 and 2004. He demonstrates that shows an example of news records. 70% of the US television news provide balanced coverage News records are used to measure health (E1), socioe- on anthropogenic contributions to climate change compared conomic development (E3), environment (E4), and politics to natural radiative forcing. 
He also shows that there is a (E6) dimensions of objective well-being. significant difference between this television coverage and Emerging infectious diseases and the rise of modern tech- scientific consensus on the topic. nology have generated new demands and possibilities for News records are also used to understand the coverage disease surveillance and response (E1). Growing numbers of political issues (E6). Van Aelst and De Swert [175] use of outbreak reports must be assessed rapidly so that control daily news of politics of campaign periods, extracted from efforts can be initiated. For example, the World Health Orga- the Electronical News Archive over the 2003 to 2006 period, nization (WHO) sets up a process for timely disease outbreak and show that campaign periods have a high impact on the verification to convert large amounts of data from some 600 amount, style and actors of the political news in Belgium. sources, including all major news wires, newspapers, and To the best of our knowledge, the dimension politics (E6) biomedical journals, into accurate information for suitable has not yet been adequately explored through news data and action [166,167]. Brownstein et al. [168] in a similar effort, constitutes inspiration for future research. create HealthMap, a freely accessible, automated real-time system that monitors, organizes, integrates, filters, visual- izes, and disseminates online information about emerging 2.2.6 Scanner data diseases. Wilson et al. [169] use the HealthMap project to monitor listeriosis. Chunara et al. [170] use social and news Scanner data are generated by point-of-sales terminals in media to validly estimate the 2010 Haitian cholera outbreak. shops and provide information at the level of the single prod- News records on financial affairs and financial markets uct. Sales terminals record each transaction, and the resultant are intrinsically interlinked (E3). Alanyali et al. 
[171] quan- data can provide considerable insights into consumer pur- tify the relation between movements in financial news and chasing patterns. They can be obtained from a wide variety movements in financial markets by exploiting a corpus of six of retailers: supermarkets, pharmacies, do-it-yourself stores, years of financial news from 2007 to 2012 from the Finan- home electronics or clothing shops, and many others [176]. cial Times. Their results suggest that greater interest in a Scanner data are used from social researchers, as they can company in the news is related to greater interest in the cor- offer useful detailed information and the possibility to model responding company in stock markets. Lillo et al. [172] show the dynamic behavior of households, as well as to control that the flux of news of the previous day affects the trading for unobservable time-invariant characteristics [177]. Also, activity of companies, households, and foreign investors and scanner data provide information over long periods of time the dynamics of volatility. than only one day or a couple of weeks. This happens because News can also help capturing the environmental dimen- the final data used are produced from customers that purchase sion of well-being (E4). As an example, Kleinschmit et al. several items on each store visit, for several store visits, over [173] investigate 394 articles on forest and climate change a period of time [178,179]. It is also worth mentioning that published in the Swedish newspaper Dagens Nyheter from scanner data can contribute to the improvement of the quality 1992 to 2009. They show that there has been an increas- of the Harmonized Index of Consumer Prices (HICP) [180]. ing discussion on forests in a changing climate over the last However, using scanner data is challenging since researchers 18 years from both scientists and politicians. 
The increased are dependent on the retailer’s permission, and they should number of these news events correlate with real environ- also overcome the legal constraints in order to obtain them mental events happening internationally. Similarly, Boykoff [179]. Table 7 shows an example of supermarket records. [174] uses data extracted from the Vanderbilt University Tele- Scanner data are used to measure health (F1), socioeco- vision News Archive, consisting of television news from nomic development (F3), and environment (F4) dimensions of objective well-being. 123 International Journal of Data Science and Analytics Table 7 Subset of the main fields provided by a supermarket database for the purchases in different shops Id Customer Timestamp Place Receipt Items 2018020156287 109745368 2018-02-01 17:30:14 Pisa, Italy 2018020101567 [bread, milk, eggs, tissues] 2018020578256 104827423 2018-02-05 10:14:57 Torino, Italy 2018020500234 … 2018020743624 012753862 2018-02-07 19:57:00 Florence, Italy 2018020721987 … To begin with, researchers use scanner data to monitor other hand, low-ranked, low purchase volume customers tend several aspects of public health (F1). For example, phar- to buy only high-ranked products, very popular products that maceutical sales may be used to predict changes in clinical everyone buys. In addition, Sobolevsky et al. [189] use a conditions with a useful time lead. Magruder et al. [181] find a complete set of bank card transactions in 2011 in Spain and 90% correlation between flu-related drug sales and physician demonstrate that there is a clear correlation between individ- diagnoses of acute respiratory conditions, at several subre- ual spending behavior and official socioeconomic indexes gions of the National Capital Area. They show that these sales denoting the quality of life. occur approximately three days before the physician-patient Finally, researchers use scanner data to monitor the impact encounter. 
Scanner data are also used to study the nutrients of humans on the environment (F4). Panzone et al. [190] use and saturated fat of several food categories and their implica- scanner data from the largest UK food retailer for the creation tions on personal health. For example, Griffith et al. [177] use of an Environmentally Sensitive Shopper (ESS) index mea- supermarket scanner data from the UK to study the nutrients suring the environmental sustainability of food consumption in foods. They show that there is a lot of variation in nutri- at household level. In addition, Gadema et al. [191] use data ents at individual product level, even with food categories from UK supermarket shoppers to examine whether carbon such as butter, which are very narrow. Bonnet et al. [182] use footprinting and labeling food products are tools that could data from French supermarkets to explore consumer behav- facilitate consumers to make greener purchasing decisions. ior with respect to the consumption of saturated fat, while They claim that this could be a sensible way to potentially Griffith et al.[183] model the potential impact of a tax on sat- achieve a low carbon future. Food waste is a significant urated fats. Finally, Janssen et al. [184] use scanner data from problem in modern society and carries considerable social, the Nielsen Consumer Panel data set that covers the years economic, and environmental costs. For example, Brancoli from 2004 to 2017. They aim to identify households with a et al. [192] use scanner data to analyze the impacts of food pregnant household member and also to estimate the effect waste at a supermarket in Sweden. They discover the impor- during and after pregnancy on alcohol purchases and rela- tance of not only measuring food waste in terms of mass but tive expenditure on fruit and vegetables. Results show that also in terms of environmental impacts and economic costs. 
during and after pregnancy, households reduce their alcohol purchases by 22–27%. In contrast, the relative expenditure on fruit and vegetables does not increase during pregnancy but decreases post-pregnancy by 19%.

The majority of studies with scanner data focus on exploring socioeconomic development (F3). Van der et al. [186] introduce a new method for computing the Dutch Consumer Price Index (CPI) based on supermarket scanner data. Meanwhile, in 2017, Eurostat issued a practical guide for processing supermarket scanner data to calculate the CPIs of EU countries, in order to ensure the comparability of the values across Europe and to modernize official statistics [179]. Silver et al. [187] outline the potential use of scanner data from retailers for the measurement of inflation. They use monthly scanner data for television sets in the UK in 1998 to study the two primary forms of bias in CPIs. Moreover, Pennacchioli et al. [188] study the retail activity of a subset of the customers of an Italian supermarket chain. They discover that highly ranked customers, with more sophisticated needs, tend to buy niche products, i.e., low-ranked products. On the

They also show that meat and bread waste contribute the most to the environmental footprint of the supermarket. Last, Scholz et al. [193] analyze food waste data of six Swedish supermarkets from 2010 to 2012 in terms of mass and carbon footprint. They calculate the wastage carbon footprint for fresh products such as meat, deli, cheese, dairy, and fruits and vegetables.

2.2.7 Web search queries

Web search query data report the frequency, over time, of specific terms entered into a web search engine by users to satisfy their information needs. Since the data are represented as frequency time series, we do not provide an example of search query records in this paper.

Compared with other data sources that require customized and often complicated collection strategies, search data can be collected for many domains simultaneously. They can also be easily analyzed across several countries or regions in real time. Search data are often helpful in making forecasts. However, their utility for predicting real-world events rests on convenience, speed, and flexibility and has less to do with any superiority over other data sources. Goel et al. [194] provide a useful survey in this area and describe some of the limitations of this data source. First, the size of the relevant population varies considerably across domains, and relevant queries can be difficult to identify. Additionally, in some domains, searching may be more closely tied to the measured outcomes than in others.

Web search query data are used to measure the health (G1), job opportunities (G2), socioeconomic development (G3), safety (G5), and politics (G6) dimensions of objective well-being.

Public health is a dimension of well-being that is explored through web search queries (G1). To improve early detection, researchers monitor health-seeking behavior in the form of web search queries, which are submitted by millions of users around the world every day. For example, Cooper et al. [195] study Yahoo! search activity related to cancer in the USA. They find that Yahoo! search activity associated with cancer correlates with the estimated cancer incidence and estimated cancer mortality. Polgreen et al. [196] show that the search volume for handpicked influenza-related queries is correlated with the reported number of cases over the period from 2004 to 2008. Hulth et al. [197] find similar results in a study of search queries submitted to a Swedish medical website. Yuan et al. [198] monitor influenza epidemics in China with search queries from Baidu. Additionally, an automated procedure for identifying informative queries is described by Ginsberg et al. [199]. Based on that procedure, Google Flu Trends [200] was introduced by Google in 2008 to provide real-time estimates of flu incidence for more than 25 countries and to help predict flu outbreaks. Nsoesie et al. [201] present a framework for near-real-time forecasting of influenza epidemics using web-based estimates of influenza activity from Google Flu Trends for the 2004–2005, 2007–2008, and 2012–2013 flu seasons. Yang et al. [202] use Google Flu Trends and historical data to infer the evolving epidemiological features of influenza and its impact on a large population during 2003–2013, including the 2009 pandemic. Wilson et al. [203] use data from Google Flu Trends to study the spread of pandemic H1N1 influenza in New Zealand during 2009. Furthermore, Google queries are used by Chan and Althouse [204,205] to monitor dengue epidemics, by Dukic et al. [206] to predict hospitalizations for methicillin-resistant Staphylococcus aureus infections, and by Ocampo et al. [207] for malaria surveillance. Moreover, Yang et al. [208] evaluate the association between suicide and Google search trends for 37 suicide-related terms representing major known risks of suicide in Taipei City, Taiwan, from 2004 to 2009. Their results show that a set of suicide-related search terms, whose trends either temporally coincided with or preceded the trends in suicide data, are associated with suicide death. Searches for "major depression" and "divorce", for example, account for up to 30.2% of the variance in suicide data. McCarthy [209] uses annually averaged Google search activity for "suicide" from the same period (2004 to 2009) to study suicide rate data in the United States. The study shows that searches for most medical, familial, and socioeconomic terms precede suicide deaths, whereas most searches for psychiatric-related terms coincide with suicide data. In later work, Kristoufek et al. [210] use Google data from 2004 to 2013, combined with data on suicide occurrences, to estimate the number of suicides in England. Finally, Adler et al. [211] combine official statistics on demographic information with data generated through Bing search queries between November 2016 and February 2017 to gain insight into suicide rates per state in India. In this way, their search data act as a proxy for unmeasured (hidden) factors related to suicide rates.

The first to explore the job opportunities dimension (G2) are Ettredge et al. [212], who find that counts of the top 300 search terms from 2001 to 2003 are correlated with US Bureau of Labor Statistics unemployment figures. Later on, Askitas et al. [213], D'Amuri et al. [214], and Suhoy et al. [215] confirm the value of search data in forecasting unemployment in the US, Germany, and Israel. Baker et al. [216] use Google search data to examine how job search responds to extensions of unemployment payments. Finally, McLaren et al. [217] summarise how online search data can be used for economic nowcasting by central banks. They show that the volume of online searches can be used as an indicator of economic activity, more specifically for the unemployment and housing markets in the United Kingdom.

Researchers use search queries to monitor socioeconomic development (G3) as well. Choi and Varian [218,219] consider Google Trends as a source of data on real-time economic activity, and they show that its query indices allow accurate predictions, for example, in the retail and automotive sectors, which could be helpful for short-term economic prediction or nowcasting. Koop and Onorante [220] use Dynamic Model Selection (DMS) methods, which allow for switching between time-varying parameter regression models. They extend the DMS methodology by allowing Google variables to determine the nowcasting model to be used at each point in time. Guzman [221] examines Google data as a predictor of inflation. Additionally, Preis et al. [222] provide evidence that search engine query data and US stock market fluctuations are correlated. In a later work [223], they analyze changes in Google query volumes for search terms related to finance and find patterns that may be interpreted as "early warning signs" of stock market moves. Furthermore, Curme et al. [224] present a method for identifying topics for which levels of online interest change before large movements of the Standard & Poor's 500 index (S&P 500). They find that Google search volumes related to politics and business can be linked to subsequent stock market moves. The connection between stock market transaction volume and search volume is also replicated with Yahoo! data: Bordino et al. [225] show that query volumes in many cases precede peaks of trading by one day or more. Finally, Moat et al. [226] show that data on views of Wikipedia pages can also be related to market movements, providing evidence that increases in the number of views of financially related Wikipedia pages can be detected before stock market falls.

Search data are also used for the exploration of safety (G5). Qi et al. [227] show that a simple low-level indicator of civil unrest can be obtained from aggregate online data through Google Trends or similar tools. The study covers countries across Latin America from 2011 to 2014, a period in which diverse civil unrest events took place. In each case, they find that the combination of the volume and momentum of Google Trends searches for pairs of simple keywords, tailored to the specific cultural setting, provides useful indicators of periods of civil unrest. Qi et al. [228] study online search activity from Google Trends surrounding the topics of social unrest in several Latin American countries from 2011 to 2014. They find that the volume and momentum of searches surrounding mass protest language can detect, and may even pre-empt, the macroscopic on-street activity. They also find that the most crucial search keywords differ subtly from country to country, even when the language is the same. They explain this by the fact that civil unrest is a time-varying coordinated interaction between individuals, groups, or populations within a given cultural and socioeconomic setting.

Finally, the politics dimension is explored with search data (G6). Chykina et al. [229] study how Google Trends can be used to examine issue salience for hard-to-survey mass populations in the US from 2010 to 2017. They apply this method to immigrants' concerns over deportation and show that anxieties over removal increase in response to (potential) policy changes, such as the immigration policies considered in the wake of Donald Trump's election. Reilly et al. [230] use Google search activity for ballot measures' names and topics in a state one week before the 2008 presidential election, and they find that it correlates with actual participation in those ballot measures. Their result demonstrates that the more Internet searches there are for a ballot measure, the less likely voters are to roll off (i.e., leave the question unanswered), and it establishes the validity of these data for a critical topic in state politics research.

2.2.8 Crowdsourced data

Kleemann and Rieder [231], in 2008, defined crowdsourcing as "the intentional mobilization for commercial exploitation of creative ideas and other forms of work performed by consumers". In other words, crowdsourcing involves obtaining work, information, or opinions from a large group of people who submit their data via the Internet, smartphone apps, etc. Crowdsourcing brings several advantages. It can provide researchers with a huge amount of data, which can be accessed quickly and at a relatively low cost. Moreover, compared with traditional research (such as studies using traditional surveys), crowdsourcing can provide researchers with data from more diverse samples [232]. However, crowdsourcing also poses various challenges. First, it may yield relatively low-quality results; e.g., a participant in a crowdsourced study may intentionally give wrong answers. Second, mobile platforms pose new challenges for crowdsourced data management. Table 8 shows an example of crowdsourced data.

Table 8 Example of the information provided by users of Influenzanet

User | Age | Gender | Date | Highest temperature | Symptoms
784590 | 35 | M | 2017-12-03 | 38.0 °C | [cough, sore throat]
275173 | 28 | F | 2018-01-05 | 36.6 °C | [no symptoms]
428415 | 64 | M | 2018-04-13 | 38.2 °C | [tired, runny nose]

Crowdsourced data are used to capture all dimensions of objective well-being, i.e., health (H1), job opportunities (H2), socioeconomic development (H3), environment (H4), safety (H5), and politics (H6).

To improve early detection, researchers started monitoring the health of individuals (H1) through crowdsourced self-reporting mobile apps, such as Influenzanet (Europe) [233], Flutracking (Australia) [234], and Flu Near You (United States) [235]. Hashemian et al. [236] introduce iEpi, an end-to-end system that lets epidemiologists and public health workers collect, visualize, and analyze contextual microdata through smartphones. Additionally, Madan et al. [237] use data from a smartphone application provided to university students to study their health state. Participants fill out self-report surveys on their health habits, diet, exercise, weight changes, and daily symptoms related to common colds, fever, influenza, and mental health. The researchers find that phone-based features can be used to predict changes in health, such as common colds, influenza, and stress. For longer-term health outcomes such as obesity, they find that the weight changes of participants are correlated with exposure to peers who gain weight in the same period. Finally, Martinucci et al. [238] study Gastroesophageal Reflux Disease (GERD) symptoms among Italian university students using a data set collected from a web app. The data set combines users' self-diagnoses of gastrointestinal disturbances, obtained through a simple questionnaire, with data about the students' food consumption at the university canteen. They show that 792 students reported typical GERD symptoms occurring at least weekly. Among all users, female students, smokers, and students with a high BMI tend to show increased GERD values.

Researchers use crowdsourced data to explore the job opportunities dimension (H2) and the direct socioeconomic benefits associated with it. For example, Green et al. [239] use Glassdoor, an online crowdsourced employer review and employer branding platform, to explore employees' satisfaction and work–life balance. This exploration prepares the ground for the study's most important finding, which concerns a direct economic benefit: companies experiencing improvements in employer ratings show significantly higher future stock returns than companies with declines in employer ratings. Similarly, Dabirian et al. [240] analyze reviews of the highest- and lowest-ranked employers on Glassdoor. Using IBM Watson to analyze the data, they show how employers could use crowdsourced employer branding intelligence to become a workplace that attracts highly qualified employees. Furthermore, Könsgen et al. [241] analyze employee review data from the German employee review site Kununu.de, combined with a 2×2×2 between-subjects experimental design. Results show that such studies can complement research on online reputation by underlining the relevance of discrepant reviews for job candidates' application intentions.

Crowdsourced data are also used to estimate socioeconomic (H3) well-being. For example, Tingzon et al. [242] show that poverty can be mapped by combining crowdsourced geospatial information with nighttime lights, daytime satellite imagery, and human settlement data. In particular, they use the popular crowdsourced geospatial data platform OpenStreetMap [243] to map poverty in the Philippines. Similarly, Piaggesi et al. [244] use OpenStreetMap [243] crowdsourced data merged with official data at a city scale. They demonstrate that the socioeconomic conditions of different neighborhoods can be estimated in five cities in North and South America. To increase the efficiency of direct money transfers to impoverished villages in Kenya and Uganda, Abelson et al. [245] develop and deploy a crowdsourcing interface to obtain labeled satellite imagery training data. They train and deploy a predictive model for detecting impoverished villages. Their estimates are used to build a fine-scale heat map of poverty, which in turn is used to recommend donations to the most impoverished villages.

Crowdsourcing is also used to capture the environmental dimension of well-being (H4). There are plenty of examples of crowdsourcing platforms for emergency management, such as Ushahidi [246], where volunteers provide updated environmental information in the aftermath of mass emergencies. These platforms have been shown to contribute significantly to organizing a prompt emergency response [247]. Another category of crowdsourced platforms is the so-called citizens' observatories [248], community-based networks of environmental monitoring and information systems. On these platforms, volunteers monitor and provide data about a plethora of environmental dimensions, comprising water availability and water quality, air pollution, land use, and flood risk management [249]. As an example, Schneider et al. [250] combine crowdsourced air-quality measurements from the EU-funded CITI-SENSE project with data obtained from statistical or deterministic air quality models. Their goal is to present a novel data fusion technique for combining real-time crowdsourced observations with model output, producing detailed maps of urban air quality. This could help users find the least polluted routes or control their exposure to pollution while moving around the city. Moreover, Meier et al. [251] use crowdsourced atmospheric data from Netatmo weather stations in the city of Berlin, together with available metadata, to explore the urban atmosphere. The results show a distinctive urban heat island pattern in Berlin during the night and are also validated, confirming that crowdsourced atmospheric data can contribute to advances in climate research. Similarly, Chapman et al. [252] use crowdsourced data from Netatmo weather stations to quantify the urban heat island in the city of London over the summer of 2015. Their results are similar to those of previous studies based on official data and are therefore validated.

Crowdsourced data are considered an important data source for studying safety (H5). Suzanne Goodney et al. [253] map violence against women with Safecity.in, a crowdsourced app for anonymously reporting violence against women. The goal of the study is to highlight the importance of crowd mapping violence, as it can make women aware of potentially dangerous locales, encourage the reporting of violence, and provide advice on practical solutions for navigating street harassment and assault in public buses. Furthermore, Gosselt et al. [254] use the Internet Movie Database (www.imdb.com) to study the violent behavior and victimization of male and female film characters over time in the United States. In particular, they analyze reviewers' movie descriptions in IMDb synopsis texts. They demonstrate that both perpetrators and victims are mainly male, and that violence becomes less severe and more often non-deadly over the years. The researchers underline the potential of using such data sources in the future to compare their results with actual crime figures. Additionally, Ozkan et al. [255] use crowdsourced data on police-involved killings from FatalEncounters.org, as well as media data, to check whether police killings are counted and reported correctly in these unofficial data, compared with official data for the city of Dallas. The results mostly show consistency across all data sources. Combining social media and crowdsourced data, and covering both the environmental and the safety dimensions, Avvenuti et al. [256] collect targeted and detailed information from people involved in natural disasters through crowdsourced surveys distributed via social media. These data are used to better monitor unfolding disasters and their consequences (i.e., the damage caused).

Last, crowdsourced data are also used to study the politics dimension (H6) of objective well-being. For instance, NGOs have used crowdsourced data to set strategic priorities and decide on their involvement in referendum activities based on participants' responses to a survey [257]. Yasseri and Bright [258] use Wikipedia traffic data for electoral prediction. In particular, they gain insights into changes in overall turnout at elections and changes in vote share for certain parties. Furthermore, Gellers [259] explores whether crowdsourcing can overcome the democratic deficit in global environmental governance. He uses data from the United Nations MY World survey, a multi-year (2012–2015) global poll designed to identify post-2015 development priorities, as well as data from e-discussions organized by the UNDG within the thematic consultation on environmental sustainability, which ran from November 2012 to July 2013. The results suggest that, although crowdsourcing may be an attractive technological approach to enhancing participation in global governance, the representativeness of this participation and the legitimacy of the resulting policies ultimately depend on how international organizations seek and filter the contributions.

3 Measuring subjective well-being

"Subjective well-being", the scientific term for happiness, is a central value in people's lives, and reflections on its definition date back to antiquity. Aristotle expressed his interest in the topic, claiming that human well-being, labeled eudaimonia (εὐδαιμονία: eu = good, daimon = spirit), is an activity of the soul expressing complete virtue [260]. During the last decades, researchers have focused on identifying the critical dimensions and the relevant determinants that can positively or negatively affect human well-being, thus providing a perspective different from the philosophical definition that Aristotle contemplated. Since humans are conscious beings, they can subjectively evaluate their appreciation of life, labeled "subjective well-being" or happiness. In particular, happiness can be defined as satisfaction with life in general or, as sociologist Veenhoven (1984) suggests, as the degree to which an individual judges the overall quality of her life-as-a-whole favorably. Similarly, psychologist Diener [261] defines happiness as people's affective and cognitive evaluations of life. Veenhoven [262] shows that people use two sources of information to evaluate their appreciation of life-as-a-whole: affects and thoughts. The first source of information captures people's feelings, emotions, and moods, the so-called hedonic level of affect (or, simply, the emotional component). In particular, he underlines that, to avoid neglecting crucial information about precedent and subsequent events, researchers should distinguish between positive and negative affects. The second source of information is the contentment component (or, simply, the structural component), which concerns people's thoughts and captures whether their life expectations have been fulfilled according to their cultural or societal standards, leading them to evaluate their life satisfaction. These two components, the hedonic level of affect and the contentment component, together determine overall happiness.

Compared with traditional macroeconomic measurements such as GDP, inflation, and national income (see, e.g., Alesina et al. [263]), this concept of happiness can capture the variations in people's perceived well-being [11,12]. It is also worth mentioning the controversy, identified by Easterlin [30], surrounding the relationship between national income and national happiness. According to the Easterlin paradox, temporary changes in income both within and between nations directly affect happiness, but over time happiness does not trend upward as income continues to grow.

Given its subjective nature, researchers frequently measure happiness with self-report rating scales. The most widely used are global reports, based on single-item scales or on short multi-item scales such as the Positive and Negative Affect Schedule (PANAS) [264,265]. Self-report measures are considered reliable because they are accurate and temporally stable; they are valid for community surveys and cross-cultural comparisons; and they can capture happiness with life-as-a-whole as well as domain satisfactions [266–269]. Examples of self-report surveys are the Gallup World Poll (e.g., the study by Deaton [270]) and the World Values Survey (e.g., the study by Easterlin et al. [271]), which capture happiness worldwide, and the Gallup-Healthways Well-Being Index (e.g., the study by Kahneman and Deaton [272]), the British Household Panel Survey (e.g., the study by Frijters et al. [273]), and the Eurobarometer (e.g., the study by Stevenson et al. [274]), which capture happiness at the local level. Although self-report surveys are widely used for the measurement of happiness, some factors might influence the results. For example, the type of questions asked before the happiness questions, as well as the individuals' mood at the time of the well-being rating, might bias the results. Deaton and Stone [275] demonstrate a strong item-order effect caused by political questions placed before the happiness questions. Also, weather conditions generate substantial current-mood effects on happiness judgments, since they affect people's thoughts, feelings, and behavior [276,277]. Finally, because global reports are abstract, general ratings of happiness over a long period, they neglect temporal resolution.

Alternatively, researchers use the Ecological Momentary Assessment (EMA) and the Day Reconstruction Method (DRM), which are momentary, diary-based self-report measures of happiness. They are designed to capture the affective components of happiness and to reduce recall biases and heuristics [269]. In particular, EMA is a longitudinal research methodology that asks participants to report their feelings, thoughts, and emotions at the moment of, or right after, each of their activities, avoiding retrospective biases and maximizing the accuracy of the assessments [278]. Similarly, DRM asks participants to systematically reconstruct their daily activities and their experiences of the preceding days. It does not capture the moment-to-moment variation of emotions, as EMA does, but it avoids disturbing normal activities, imposes less respondent burden [279], and captures time-budget information more efficiently [280]. Shiffman et al. [281] show that global reports of happiness are more predictive of future behaviors than momentary methodologies. Therefore, taking into consideration the pros and cons discussed above, researchers suggest a multi-method assessment, combining global and momentary methods, to reach valid and accurate results [269,282].

The first rows of Table 9 summarize the traditional data sources and their pros and cons, as discussed previously. The remaining rows are explained later.

Table 9 Pros and cons for each traditional data source and new data source used for the measurement of subjective well-being

Data source | Pros | Cons
Surveys (traditional data source) | Accurate, temporal stability, valid for community surveys and cross-cultural comparisons, valid for capturing happiness as-a-whole and satisfaction domains | Item-order effect bias, current-mood effects, neglected temporal resolution
Ecological Momentary Assessment (EMA) (traditional data source) | Measurement of the affective component, reduced retrospective biases, measurement of moment-to-moment variation of emotions | Disturbance of normal activities
Day Reconstruction Method (DRM) (traditional data source) | Measurement of the affective component, time-budget information, reduced respondent burden | Neglected moment-to-moment variation of emotions
Social media (Twitter, etc.) (new data source) | Continuously updated user-generated content, elimination of social desirability effect, few barriers in data extraction (Twitter) | Social desirability biases, not population-representative
Google Trends (new data source) | Timeliness, observation of people's behavior | Interpretability of the value of the series, comparability of time series of different terms on a given day
Crowdsourcing (new data source) | Measurement of daily behavior and activity | Use of self-reports, paid participation of users
News (new data source) | Variety of data (e.g., text data), variety of subject domains, range of targets, archived historical news | Gatekeeping bias, coverage bias, statement bias

3.1 The dimensions of subjective well-being

Over the years, researchers have studied subjective well-being and identified the dimensions and the relevant determinants that can positively or negatively affect human well-being. Some studies rely on small data sets (e.g., the review by Diener and Seligman [283]), reflecting psychologists' interests, such as personality, while others use larger data sets, such as panel data (e.g., the review by Dolan et al. [29]), reflecting economists' interests. These studies, conducted with traditional data sources and in particular with surveys, have shed more light on the determinants of happiness, which we divide into the five main dimensions explained below.

3.1.1 Human genes

Evidence shows that genes are one of the most important predictors of happiness: happiness is fairly heritable, with heritability estimates ranging from 30% to 50% across studies [13–20]. On average, therefore, about 40% of the variance of individual differences in happiness scores is accounted for by genes. Personality, which is part of our genetic makeup, can distinguish between happy and unhappy individuals. For example, extraverted individuals are happier than anxious and worried ones [284]. People higher in self-esteem are less likely to suffer from depression [29]. In addition, studies using data from different countries and periods of time report the following results: age has a U-shaped effect on happiness, with the highest levels of happiness among the youngest and the oldest and the lowest in middle age, between 32 and 50 years [29]; women are either happier than men, or there is no significant difference between them, in almost all of the 73 countries investigated [285]. However, these results should be interpreted carefully. For example, Deaton and Tortora [286] show that the U-shaped relationship between happiness and age found in Western countries turns into a linear relationship in sub-Saharan countries, where social services for older people are unavailable.

3.1.2 Universal needs

According to evolutionary theory [22] and to humans' inherent growth tendencies [23], basic and psychological needs play an important role in happiness and are considered to be universal. In fact, Tay et al. [24], in a study conducted across 123 countries, show that life evaluation is associated with having basic and psychological needs, such as food and shelter, met (r = 0.31)¹; positive affects are associated with the fulfillment of social needs (r = 0.29)¹ and the respect gained from other people (r = 0.36)¹; negative affects are associated with the fulfillment of basic needs (r = −0.17)¹, respect gained from others (r = −0.20)¹, and autonomy needs (r = −0.18)¹, in terms of the degree of freedom in life. Accordingly, following Veenhoven and Ehrhardt's Livability theory [287], some societies have a better quality of life because they satisfy these universal needs to a high degree. It should be noted that these basic and psychological needs are independent of one another, meaning that each of them influences happiness beyond the effects of the others.

¹ Zero-order correlations of needs and subjective well-being for the world.

3.1.3 Social environment

Many determinants fall under this dimension and can explain changes in the reported level of happiness. To begin with, education is an important determinant, which needs to be studied carefully since the evidence on its effects on happiness is controversial. Some studies in happiness economics suggest an insignificant relationship between higher education and happiness, whereas others show a negative relationship between them [25–28]. On the other hand, other studies show that educated individuals tend to report more positive emotions and fewer negative ones, as well as more satisfaction with most domains of their life, such as financial situation, employment opportunities, etc., even when controlling for non-economic factors such as marriage [288,289]. Besides, studies show that health is an important determinant, with psychological health more strongly correlated with happiness than physical health (e.g., the review by Dolan et al. [29]). Climate is another determinant that appears to affect happiness: the study by Rehdanz and Maddison [290] gives a reasonable indication that extreme weather is damaging to happiness. Moreover, living in an urban or rural area seems to influence happiness. In particular, living in big cities negatively affects happiness, whereas living in rural areas positively affects it (e.g., Hudson [291] for Europe; Hayo [292] for Eastern Europe). In contrast, Rehdanz and Maddison [290] report different results on urbanization and ruralization, showing that population density does not affect happiness. Another important determinant is exercise: Ferrer-i-Carbonell and Gowdy [293] show that people who exercise tend to have higher levels of happiness.

3.1.4 Economic environment

Income is one of the most discussed economic determinants of happiness. Easterlin [30] argues that while happiness and income show a positive relationship within nations, they show weak or no association between nations. He also shows that, across countries, although a relationship between happiness and income holds in the short run, this is not the case over time. However, time-series and panel analyses across countries show a positive relationship between income and happiness in the long run as well [32–34]. Veenhoven [31] challenges Easterlin's findings by arguing that people's happiness depends highly on the satisfaction of the basic and psychological needs covered by income, which is an absolute rather than a relative standard. To the present day, studies support both arguments. Another source contributing to this debate is studies of individual-level income or wealth and happiness. For example, a longitudinal study with a sample of about 33,000 individuals shows that, after two years, lottery winners reported higher happiness than non-winners [294]. These contradictory findings confirm how complex it is to interpret the role of income and wealth in happiness, since the potential positive relationship between them may be moderated by other factors. For example, in the case of natural disasters, wealthier countries are more economically capable of providing financial aid to the people affected by the event [295]. Employment, like income, falls under the economic dimension (see, e.g., [296,297]). Evidence shows that unemployed individuals report lower happiness than employed individuals [298]. In particular, Knabe et al. [299] demonstrate that unemployment has a substantial relationship with diminished cognitive well-being but does not decrease affective well-being.

3.1.5 Political environment

As discussed previously, there are also political determinants associated with happiness. For example, Radcliff et al. [35] examine the effect of direct democracy, and in particular the effect of the use of initiatives, on happiness. They show that an individual's happiness is higher in states where initiatives are not only permitted but also relied upon by policy-makers to shape the political system. Political freedom falls under this dimension as well. Veenhoven [36] shows that political freedom is highly correlated with happiness in developed countries. Another political determinant associated with happiness is social hierarchy, in terms of differences in power and prestige. Brule and Veenhoven [300] show that, in northern and southern European countries, people are less happy in hierarchical societies. Last, social trust [301] and government quality [302] are political determinants that are substantially associated with happiness.

3.2 Data sources for monitoring the dimensions of subjective well-being

Similarly to Fig. 1 on objective well-being, Fig. 2 describes the new data sources (left) that have been used to estimate one or more dimensions of subjective well-being (right). The presence of a link in Fig. 2 between a data source and a dimension indicates that there are papers in the literature monitoring that dimension with that data source. For example, b4 indicates the link between Google Trends data (b) and economic environment (4).

Fig. 2 The figure relates the sources of data (left) with the dimensions of subjective well-being (right)

In this section, we describe, for each data source, its features (e.g., the process of data collection, its biases and limitations) and the main works in the literature that use it to measure several dimensions of subjective well-being. Table 9 provides a summary of the new data sources used to explore happiness, including the traditional data sources discussed previously. We aim to highlight the advantages and disadvantages of using each data source as a useful guide for future research on happiness.

With the growth of technology, researchers are inclined to use more innovative approaches for the measurement of happiness. In fact, over the last years, researchers have used novel methodologies and data sources, which offer new opportunities to study happiness and to circumvent the limitations of traditional methodologies and data sources. The study by Fowler and Christakis [303] is one of the first and most important in the transition of happiness research from the traditional to the innovative era. The researchers computerised information from archived handwritten administrative tracking sheets of the Framingham Heart Study. They study happiness as a network phenomenon, using data on 4739 people from 1983 to 2003. Compared with previous traditional work on happiness, whose main focus is on socioeconomic, political, and genetic factors, this study is the first to study happiness as a spreading phenomenon and to characterize it. In particular, they suggest that happiness is a network phenomenon, which clusters happy and unhappy people and spreads across various social relationships (e.g., relatives, friends) up to three degrees of separation (e.g., to one's friends' friends' friends). Additionally, individuals who are central in the network are more likely to be happy in the future.

Beyond the study mentioned above, there are more studies in the innovative era, predominantly using innovative big data sources. Although measuring happiness with new data approaches appears to be adequate for predicting the emotional component of happiness, most studies seem to neglect its structural component [304]. Below, the new data sources are described and the relevant studies are presented. We would like to underline that, in comparison with objective well-being studies, researchers of subjective well-being usually explore more than one dimension.

Table 10 The table contains a subset of the information returned by a Twitter API. Each tweet contains the information of the user profile and the mentions or hashtags used in the text

Id | Hashtags | Mentions | Text | Profile info
240556 | #dinner #ny | [10214] | #dinner bihday your majesty @user #ny | {…..}
4261063 | #lyft | [964215] | @user thanks for #lyft credit | {…..}
72096 | null | null | factsguide society now | {…..}

3.2.1 Social media

awareness of the influenced individuals.
Indeed, by reduc- ing the amount of emotional content in the Facebook News Nowadays, people are highly involved in social media, and Feed on an experiment conducted on Facebook users, they they are motivated to share their emotions and thoughts demonstrate that emotional contagion can also happen with- online, leaving a large and continuously updated user- out direct interaction between the users and even without generated content. Studying happiness from users’ posts may non-verbal cues. eliminate the social desirability effect that traditional self- Social media is also used for the exploration of happi- reports bring, due to participants’ inaccurate and dishonest ness as influenced by the social environment dimension (a3). evaluation of happiness [305]. Thus, researchers and policy- For example, Lim et al. [311] collect a set of geotagged makers are attracted by these intellectual opportunities to tweets, of users in Melbourne, Australia, between the period explore happiness, with wider use of Twitter data accessed of November 2016 to January 2017. They use sentiment anal- through Twitter’s public API. Twitter has the least barriers ysis to demonstrate that people show more positive emotions in data extraction, while the other social media have strict and less negative emotions in green spaces or close to them. policies, and the acquisition of data has turned to be diffi- This could potentially be taken into consideration by policy- cult. Social media data may also encounter some concerns. makers aiming to improve the societal well-being by urban They may reflect social desirability biases since individu- greening interventions. Besides, Mitchell et al. [312] use als manage their online profiles [122]. Also, Twitter users Twitter data to study happiness and the 2010 United States may not be as representative of the general population [123] Census Bureau’s MAF/TIGER database to define the urban as anonymized self-reports conducted through a chosen rep- areas. 
They use the Language Assessment by Mechanical resentative sample. Table 10 illustrates an example of the Turk (labMT) sentiment analysis tool to study the similari- structure of Twitter records. ties in word use in urban areas in the United States, to map There are several studies on social media (mostly on Twit- areas according to the happiness level and score individual ter) showing the variations on happiness as influenced by the states and cities for average word happiness. Golder and universal needs (a2), and in particular, the interaction with Macy [313] identify individual-level diurnal and seasonal other people. For example, Quercia et al. [306] use Twit- mood rhythms in cultures across the globe, using data from ter data in order to monitor the gross community happiness Twitter between February 2008 and January 2010. They find in the city of London. In particular, they suggest that Twit- that people like the weekend as people are much happier on ter friends, on average, have similar sentiment. They also Saturdays and Sundays. They also find that even individuals’ show that the relationship between sentiment and well-being good mood deteriorates as the day progresses, which is con- can hold at individual and community level. Bollen et al. sistent with the effects of sleep and circadian rhythm. They [307] use the OpinionFinder (OF) subjectivity lexicon [308] also show that seasonal change in baseline positive affect in order to analyze the sentiment of an online social network varies with change in day length. Landsdall et al. [314] turn of 39,110 Twitter users. They show the first direct observation their attention to the issue of the public mood or sentiment— of a significant Happiness Paradox, meaning that on average the mood of the nation. They use tweets sampled from the 54 most of the individuals are less happy than their friends are. largest cities in the UK from July 2009 to January 2012, and Similarly, by using the OF, Bollen et al. 
[309] analyze the they associate each of the basic emotions (fear, joy, anger, emotional content of a set of Twitter users over 6 months, to sadness) with a list of words. They find out that each of the examine whether happiness is assortative in online social net- four key emotions changes over time in a manner that is partly works. They find significant levels of happiness assortativity predictable (or at least interpretable). Joy rises in Christmas, across Twitter, since users might be propense to connect to fear in Halloween, and especially negative mood started in users with similar happiness values (homophilic attachment) October 2010, where massive cuts were announced in the or converge on their friends’ happiness level (contagion). UK. Cresci et al. [315] use Instagram data to explore, among This result suggests that real social networks may work sim- others, the differences that the cultural and social environ- ilarly. With the use of Facebook, Kramer et al. [310] test ment bring on people’s smiles. They perform face recognition whether emotions are contagious between users without the in a case study of over 2 million selfies shared from January 123 International Journal of Data Science and Analytics to February 2015. In particular, they use a Face++ algorithm 3.2.2 Google trends function to measure the smiling degree of the individuals in their selfies. Results reveal that El Salvador, Brazil, and Another new data source is Google Trends, which provides Panama have the highest smiling average. data on the frequency of specific search terms over time. Other researchers use social media to study the variations Algan et al. [324] present Google Trends as a new data of happiness as influenced by more than one dimension. For source for exploring happiness and its relevant dimensions. example, Bollen et al. [316] conduct sentiment analysis on They consider it a promising data source for its timeli- Twitter data from 2008. 
They find that events in the social and ness, since it provides computational social scientists with cultural (a3), political (a5), and economic sphere (a4) have a immediate data, as well as offers the possibility to observe significant effect on happiness. Dodds et al. [317] construct people’s behavior, as compared to analyzing textual opinions. the Hedonometer to measure temporal patterns of societal On the other hand, working with Google Trends challenges happiness, as influenced by basic needs (a2), as well as by researchers since the value of the series obtained directly various social (a3), economic (a4) and political (a5) deter- from Google Trends is difficult to interpret, and this value on minants. For indicating happiness using Hedonometer, they a given day cannot be compared between terms since they create a data set of users’ tweets over 3 years (from September are normalized to the maximum value by term. In this study 2008 to September 2011 approximately). The results show [324], researchers cover 300 weeks from January 6, 2008, to that in general, at an annual level, the average happiness January 4, 2014. Results reveal that happiness is associated appears to increase till April 2009 and then to decrease grad- with job security, financial security (b4), family life (b2), and ually. On a weekly basis, the average happiness peaks during leisure determinants (b3). An example of Google Trends data the weekend and on an hourly basis, the happiest hour of the set is not provided since data are represented as time series day is between 5 to 6 a.m. (US local time). Another example of the frequency. is Iacus et al. [318], who analyze tweets from Italy, written in the Italian language. In particular, they use the iSA (inte- 3.2.3 Crowdsourced data grated Sentiment Analysis) method [319,320] to capture a set of determinants that influence happiness, such as self-esteem Crowdsourcing, as discussed in Sect. 
2.2, involves obtaining (a1) and family relationships (a2), and aggregate them into work, information, or opinions from a large group of people an index labeled SWBI (Social Well Being Index). Results who submit their data via the Internet, smartphone apps, etc. suggest that the environmental and health conditions (a3) In particular, smartphones are lately appealing to happiness anticipate several determinants of happiness as measured by researchers since they give access to previously inaccessible SWBI. This study is one of the few to study both the emo- data related to daily social behavior [325,326]. Innovative tional and structural components of happiness. Curini et al. smartphone sensor technology, such as accelerometers, GPS, [321] use tweets posted in 2012 in Italy to build a happi- and Bluetooth, are used in combination with self-reports, ness index, labeled iHappy. They demonstrate that variables such as mood tracking self-reports, in the form of EMA. such as the overall quality of institutions (a5) seem to have a However, such methodologies bring the limitations of the minor effect on the average level of happiness of the Italian traditional data sources (see the first rows of Table 9), since provinces. In contrast, meteorological variables, such as rain happiness fluctuations are collected through self-reports. and snow (a3), as well as events related to specific days, such Moreover, when hiring individuals to participate in crowd- as the payday (a4), have a stronger impact on happiness. Fur- sourcing platforms, the crowd is not anymore for free, and the thermore, Durahim et al. [322] use Twitter data to create the study might result in high costs. It is, therefore, hard to keep a Gross National Happiness (GNH) for the country of Turkey. trade-off between initial objectives with results of quality and The GNH created measures people’s happiness as varied due cost [327]. 
Additionally, some studies are conducted with a to specific events, such as Saint Valentine’s Day (a2), Start- small number of data and might need to be replicated. Table ing day of Gezi Park Protests (#occupygezi), and Day of 11 shows an example of crowdsourced data. Ergenekon lawsuit verdict (a5). Last, Coviello et al. [323] For example, Lathia et al. [328] collect data of over 10,000 compare what people post on Facebook to data they have on individuals, by combining smartphone-based self-reports (in the weather (a3), specifically the rainfall amount. They find the form of EMA) and the accelerator in the smartphones, to that people tend to post less happy messages on Facebook if investigate the relationship between happiness and physical it rains. This emotion seems to pass along their network (a2). activity (c3). Results show that there is indeed a relation- For example, if a friend on Facebook is in a rainy area and ship between happiness and physical activities, including the this affects the emotional content of her posts on Facebook, non-exercise ones, such as standing and walking. Asai et al. then more likely, her friends might post a sadder message, [329] study 100,000 happy moments from HappyDB over even though where they are the weather is better. 3 months, to find which are the short and long term deter- minants of happiness. In particular, HappyDB is a database 123 International Journal of Data Science and Analytics Table 11 The table contains a Id Reflection period Text Num. sentences subset of the information returned by HappyDB, a 28775 24 h Donated blood. Painful 2 crowdsourced database capturing happy moments 32612 24 h Morning yoga class 1 42663 24 h Children with butterflies 1 created through Amazon Mechanical Turk, for capturing peo- being or happiness, as well as their relevant dimensions ple’s happy moments by asking every 24 h and once over 3 necessary for the conduction of a meaningful study. 
In addi- months, people’s happiness status, and analyzing with NLP tion, we present a review of the data sources used for the people’s responses. Results show that exercise, nature, and exploration of well-being, and we discuss existing related leisure (c3) are short-term determinants, whereas social rela- studies. More specifically, we present the structure and the tionships with loved ones (c2) and achievements (c3) are opportunities that each data source offers and the problems long-term determinants. Bogomolov et al. [330] exploit a that researchers might encounter when working with these data set of 117 individuals, who are equipped with a sens- data. ing software between 2010 and 2011. This software collects The paper is primarily targeted at researchers interested smartphone activity data of call logs, SMS and proximity in “Data Science for Social Good” (DS4SG) or similarly data (acquired by scanning nearby phones and other Blue- “Artificial Intelligence for Social Good” (AI4SG). Harnessed tooth devices every five minutes). It also collects personality correctly, artificial intelligence can inform and empower the traits (the “Big Five” [331]) and daily happiness data by social good decision-making [335,336]. DS4SG or AI4SG is self-report questionnaires. Results demonstrate that by using a vague concept, and there is not an adequate definition yet. mobile phone data reflecting social interactions (c2), infor- However, Shi et al. [39] propose several societal application mation concerning weather conditions (c3), and personality domains to shed light on this concept, such as healthcare traits (c1), individuals’ daily happiness can be predicted. and well-being. In this study, we specifically aim to con- tribute to the exploration of well-being through data science. 
Researchers from various disciplines, from social science to 3.2.4 News data computer science, could use this paper to understand data sci- ence for well-being better and make a positive and tangible Similarly to objective well-being, news data are a new social impact. promising data source for the further exploration of sub- We would like to underline that this is not a complete jective well-being. Its advantages and its disadvantages, as review of studies conducted on well-being with the use of well as a data set example, are discussed and presented in innovative data sources. We aim to provide some examples of Sect. 2.2. Carlquist et al. [332] study happiness with the the most important evidence on these data sources and well- use of news data. In particular, they study the concept of being dimensions so that this study works as a reference point well-being in Norwegian society by examining word use for future research. We do not fully cover existing research patterns in four electronically archived Norwegian newspa- on a given link that is present in Figs. 1 and 2, but to the pers media from 1992 to 2014. They demonstrate that about best of our knowledge, a missing link entails that there is no half of the words referring to affective approaches, cognitive existing study connecting the two nodes. For example, there or life satisfaction approaches, eudaimonic and humanistic is no adequate literature on news data for the exploration of approaches, and character strengths show systematic and sta- the safety dimension (E5) of objective well-being. Therefore, tistically significant patterns of change. The most notable rise since nowadays, safety is an important dimension, due to concerns the eudaimonic words (related to mastery, motiva- constant conflicts around the world (e.g., political instability, tion, and self-development), which show increasing trends terrorist attacks), it shows great potential for future research. in all newspapers. 
The authors state that certain happiness Moreover, new data sources seem to be particularly terms appearing more frequently could be interpreted as an promising for a more in-depth exploration of subjective well- increased and liberating focus on individual opportunity (d1) being. Taking into consideration the subjective nature of [333] or could demonstrate neoliberal ideology (d5) [334]. happiness, it has been traditionally measured through self- reports. Although they have been proved to be valid, they are very costly, and depending on the study might neglect to 4 Discussion capture either the emotional or the structural component of well-being. Therefore, new data sources could be used, and In this study, we provide researchers with the theoretical innovative methodologies, such as text analysis, could be background on both the objective and the subjective well- 123 International Journal of Data Science and Analytics applied for a complete, according to its definition, measure- 2. Fleurbaey, M.: Beyond gdp: the quest for a measure of social ment of subjective well-being. Still, most studies using new welfare. J. Econ. Lit. 47(4), 1029–75 (2009) 3. Stiglitz, J.E., Sen, A., Fitoussi, J.P.: Report by the Commission on data sources tap into the emotional component of subjective the Measurement of Economic Performance and Social Progress. well-being and neglect the structural component. Conse- The Commission Paris (2009) quently, we suggest further exploration of the novel data 4. Dodge, R., Daly, A.P., Huyton, J., Sanders, L.D.: The challenge sources for the measurement of subjective well-being, cap- of defining wellbeing. Int. J. Wellbeing 2(3), 11 (2012) 5. Alkire, S.: Dimensions of human development. World Dev. 30(2), turing both components. 181–205 (2002) Undoubtedly, the research opportunities opened up by the 6. Organisation for Economic Co-operation and Development innovative data sources discussed in this paper are plenty. How’s life? Measuring Well-Being. 
OECD, Paris (2011) However, with the use of these data sources, researchers are 7. UNDP Sustainable Development Goals. https:// sustainabledevelopment.un.org/sdgs. Accessed Oct 2019 called to deal with new challenges comparing to traditional (2015) research. Since, usually, the data used are personal, if not sen- 8. Rapporto, BES Il benessere equo e sostenibile in Italia. ISTAT sitive, and are analyzed to shape policy and to make decisions (2015) [337,338], ethical concerns may arise, such as privacy and 9. Organisation for Economic Co-operation and Development (OECD) OECD Guidelines on Measuring Subjective Well-Being. respect to human rights. In the European Union, additional OECD Publishing (2013) attention to the topic has been brought after the implemen- 10. Veenhoven, R.: Conditions of Happiness, Reidel. Springer, Dor- tation of the General Data Protection Regulation (GDPR). drecht (1984) Researchers need to take into consideration the ethical chal- 11. Frey, B.S., Stutzer, A.: What can economists learn from happiness research? J. Econ. Lit. 40(2), 402–435 (2002) lenges and not overlook them but address them successfully. 12. Stiglitz, J.E., Sen, A., Fitoussi, J.P.: Measurement of economic Only by facing ethical problems, researchers can maximize performance and social progress. Online document. http://www. the contributing value of data science studies for society. bitly/JTwmG Accessed 26 June 2012 (2009) 13. Bartels, M., Boomsma, D.I.: Born to be happy? The etiology of Acknowledgements This work was supported by the European Com- subjective well-being. Behav. Genet. 39(6), 605 (2009) mission through the Horizon2020 European project “SoBigData Resea- 14. Bartels, M., Saviouk, V., De Moor, M.H., Willemsen, G., van rch Infrastructure—Big Data and Social Mining Ecosystem” (Grant Beijsterveldt, T.C., Hottenga, J.J., De Geus, E.J., Boomsma, D.I.: Agreement 654024). 
We would like to thank Daniele Fadda for support Heritability and genome-wide linkage scan of subjective happi- on data visualization. ness. Twin Res. Hum. Genet. 13(2), 135–142 (2010) 15. Nes, R.B., Røysamb, E.: The heritability of subjective well-being: Author contributions VV: conceptualization, writing, tables and fig- review and meta-analysis. In: The Genetics of Psychological ures, LG: conceptualization and writing, IM: writing, tables and figures, Well-Being: The Role of Heritability and Genetics in Positive SC: writing, RS: writing, MT: writing, LP: conceptualization, writing Psychology, pp. 75–96 (2015) and managing. 16. Nes, R.B., Czajkowski, N., Tambs, K.: Family matters: happi- ness in nuclear families and twins. Behav. Genet. 40(5), 577–590 (2010) Compliance with ethical standards 17. Nes, R., Røysamb, E., Tambs, K., Harris, J., Reichborn- Kjennerud, T.: Subjective well-being: genetic and environmental contributions to stability and change. Psychol. Med. 36(7), 1033– Conflict of interest On behalf of all authors, the corresponding author 1042 (2006) states that there is no conflict of interests. 18. Røysamb, E., Harris, J.R., Magnus, P., Vittersø, J., Tambs, K.: Subjective well-being. Sex-specific effects of genetic and environ- Open Access This article is licensed under a Creative Commons mental factors. Personal. Individ. Differ. 32(2), 211–223 (2002) Attribution 4.0 International License, which permits use, sharing, adap- 19. Røysamb, E., Tambs, K., Reichborn-Kjennerud, T., Neale, M.C., tation, distribution and reproduction in any medium or format, as Harris, J.R.: Happiness and health: environmental and genetic long as you give appropriate credit to the original author(s) and the contributions to the relationship between subjective well-being, source, provide a link to the Creative Commons licence, and indi- perceived health, and somatic illness. J. Pers. Soc. Psychol. 85(6), cate if changes were made. 
The images or other third party material 1136 (2003) in this article are included in the article’s Creative Commons licence, 20. Schnittker, J.: Happiness and success: genes, families, and the psy- unless indicated otherwise in a credit line to the material. If material chological effects of socioeconomic position and social support. is not included in the article’s Creative Commons licence and your Am. J. Sociol. 114(S1), S233–S259 (2008) intended use is not permitted by statutory regulation or exceeds the 21. Pleeging, E., Burger, M., van Exel, J.: The relations between hope permitted use, you will need to obtain permission directly from the copy- and subjective well-being: a literature overview and empirical right holder. To view a copy of this licence, visit http://creativecomm analysis. Appl. Res. Qual. Life 1, 1–23 (2020) ons.org/licenses/by/4.0/. 22. Kenrick, D.T., Griskevicius, V., Neuberg, S.L., Schaller, M.: Ren- ovating the pyramid of needs: contemporary extensions built upon ancient foundations. Perspect. Psychol. Sci. 5(3), 292–314 (2010) 23. Ryan, R.M., Deci, E.L.: Self-determination theory and the facili- tation of intrinsic motivation, social development, and well-being. References Am. Psychol. 55(1), 68 (2000) 24. Tay, L., Diener, E.: Needs and subjective well-being around the 1. Reinhart, C.M., Reinhart, V.R.: After the fall. Technical report. world. J. Pers. Soc. Psychol. 101(2), 354 (2011) National Bureau of Economic Research (2010) 123 International Journal of Data Science and Analytics 25. Clark, A.E., Oswald, A.J.: Satisfaction and comparison income. 50. OECD.: OECD Better Life Index: Jobs. http://www. J. Public Econ. 61(3), 359–381 (1996) oecdbetterlifeindex.org/topics/jobs/. Accessed Oct 2019 (2011a) 26. Shields, M.A., Price, S.W., Wooden, M.: Life satisfaction and the 51. OECD.: OECD Better Life Index: Income. http://www. economic and social characteristics of neighbourhoods. J. Popul. oecdbetterlifeindex.org/topics/income/. 
Accessed Oct 2019 Econ. 22(2), 421–443 (2009) (2011b) 27. Powdthavee, N.: How much does money really matter? Estimating 52. OECD.: OECD Better Life Index: Environment. http://www. the causal effects of income on happiness. Empir. Econ. 39(1), oecdbetterlifeindex.org/topics/environment/. Accessed Oct 2019 77–92 (2010) (2011c) 28. Nikolaev, B.: Living with mom and dad and loving it... or are you? 53. OECD.: OECD Better Life Index: Safety. http://www. J. Econ. Psychol. 51, 199–209 (2015) oecdbetterlifeindex.org/topics/safety/. Accessed Oct 2019 29. Dolan, P., Peasgood, T., White, M.: Do we really know what makes (2011d) us happy? A review of the economic literature on the factors asso- 54. Amerio, P., Roccato, M.: Psychological reactions to crime in Italy: ciated with subjective well-being. J. Econ. Psychol. 29(1), 94–122 2002–2004. J. Commun. Psychol. 35(1), 91–102 (2007) (2008) 55. OECD.: OECD Better Life Index: Civic Engagement. http://www. 30. Easterlin, R.A.: Does economic growth improve the human lot? oecdbetterlifeindex.org/topics/civic-engagement/. Accessed Oct Some empirical evidence. In: Nations and Households in Eco- 2019 (2011) nomic Growth, pp 89–125. Elsevier (1974) 56. Blondel, V.D., Decuyper, A., Krings, G.: A survey of results on 31. Veenhoven, R.: Is happiness relative? Soc. Indic. Res. 24(1), 1–34 mobile phone datasets analysis. EPJ Data Sci. 4(1), 10 (2015) (1991) 57. Eagle, N., Pentland, A.S.: Eigenbehaviors: identifying structure 32. Diener, E., Tay, L., Oishi, S.: Rising income and the subjective in routine. Behav. Ecol. Sociobiol. 63(7), 1057–1066 (2009) well-being of nations. J. Pers. Soc. Psychol. 104(2), 267 (2013) 58. Pappalardo, L., Simini, F., Rinzivillo, S., Pedreschi, D., Giannotti, 33. Veenhoven, R., Vergunst, F.: The Easterlin illusion: economic F., Barabási, A.L.: Returners and explorers dichotomy in human growth does go with greater happiness. Int. J. Happiness Dev. mobility. Nat. Commun. 6, 8166 (2015) 1(4), 311–343 (2014) 59. 
Watson, D., Clark, L.A.: The Panas-x: Manual for the Positive SIGKDD International Conference on Knowledge Discovery and and Negative Affect Schedule-Expanded Form. Psychology Pub- Data Mining, pp. 1563–1572. ACM (2014) lications, New York (1999) 246. Hersman, E., Okolloh, O., Rotich, J., Kobia, D.: Ushahidi. https:// 266. Diener, E., Oishi, S., Tay, L.: Advances in subjective well-being www.ushahidi.com. Accessed Oct 2019 (2008) research. Nat. Hum. Behav. 2, 1 (2018) 247. Meier, P.: Digital Humanitarians: How Big Data is Changing the 267. Hudson, N.W., Anusic, I., Lucas, R.E., Donnellan, M.B.: Com- Face of Humanitarian Response. Routledge, London (2015) paring the reliability and validity of global self-report measures of 248. European Commission Citizens’ Observatories. https://www. subjective well-being with experiential day reconstruction mea- ushahidi.com. Accessed Oct 2019 (2016) sures. Assessment 2, 26 (2017) 123 International Journal of Data Science and Analytics 268. Anusic, I., Schimmack, U.: Stability and change of personality 291. Hudson, J.: Institutional trust and subjective well-being across the traits, self-esteem, and well-being: introducing the meta-analytic eu. Kyklos 59(1), 43–62 (2006) stability and change model of retest correlations. J. Pers. Soc. 292. Hayo, B. Happiness in Eastern Europe. Marburg Economic Work- Psychol. 110(5), 766 (2016) ing Paper No 12 (2004) 269. Tay, L., Chan, D., Diener, E.: The metrics of societal happiness. 293. Ferrer-i Carbonell, A., Gowdy, J.M.: Environmental degradation Soc. Indic. Res. 117(2), 577–600 (2014) and happiness. Ecol. Econ. 60(3), 509–516 (2007) 270. Deaton, A.: Income, health, and well-being around the world: 294. Gardner, J., Oswald, A.J.: Money and mental wellbeing: a longi- evidence from the gallup world poll. J. Econ. Perspect. 22(2), tudinal study of medium-sized lottery wins. J. Health Econ. 26(1), 53–72 (2008) 49–60 (2007) 271. Easterlin, R.A., Angelescu, L.: Happiness and growth the world 295. 
Tay, L., Zyphur, M., Batz, C.: Income and Subjective Well-Being: over: time series evidence on the happiness-income paradox. Review, Synthesis, and Future Research. Handbook of Well- Technical report. Institute of Labor Economics (IZA) (2009) Being. DEF Publishers, Salt Lake City (2017) 272. Kahneman, D., Deaton, A.: High income improves evaluation of 296. Wijngaards, I., Hendriks, M., Burger, M.J.: Steering towards hap- life but not emotional well-being. Proc. Nat. Acad. Sci. 107(38), piness: an experience sampling study on the determinants of 16489–16493 (2010) happiness of truck drivers. Transp. Res. Part A Policy Pract. 128, 273. Frijters, P., Beatton, T.: The mystery of the u-shaped relationship 131–148 (2019) between happiness and age. J. Econ. Behav. Organ. 82(2–3), 525– 297. van der Zwan, P., Hessels, J., Burger, M.: Happy free willies? 542 (2012) Investigating the relationship between freelancing and subjective 274. Stevenson, B., Wolfers, J.: The paradox of declining female hap- well-being. Small Bus. Econ. 8, 1–17 (2019) piness. Am. Econ. J. Econ. Policy 1(2), 190–225 (2009) 298. Blanchflower, D.G., Bell, D.N., Montagnoli, A., Moro, M.: 275. Deaton, A., Stone, A.A.: Understanding context effects for a mea- The happiness trade-off between unemployment and inflation. J. sure of life evaluation: how responses matter. Oxf. Econ. Pap. Money Credit Bank. 46(S2), 117–141 (2014) 68(4), 861–870 (2016) 299. Knabe, A., Schöb, R., Weimann, J.: Partnership, gender, and the 276. Yap, S.C., Wortman, J., Anusic, I., Baker, S.G., Scherer, L.D., well-being cost of unemployment. Soc. Indic. Res. 129(3), 1255– Donnellan, M.B., Lucas, R.E.: The effect of mood on judgments 1275 (2016) of subjective well-being: nine tests of the judgment model. J. Pers. 300. Brulé, G., Veenhoven, R.: Why are Latin Europeans less happy? Soc. Psychol. 113(6), 939 (2017) Polyphonic Anthropology-Theoretical and Empirical Cross- 277. 
Lucas, R.E., Lawless, N.M.: Does life seem better on a sunny Cultural Fieldwork. The Impact of Hierarchy. InTech (2012) day? Examining the association between daily weather conditions 301. Bartolini, S., Mikucka, M., Sarracino, F.: Money, trust and happi- and life satisfaction judgments. J. Pers. Soc. Psychol. 104(5), 872 ness in transition countries: evidence from time series. Soc. Indic. (2013) Res. 130(1), 87–106 (2017) 278. Kahneman, D., Diener, E., Schwarz, N.: Well-Being: Founda- 302. Ott, J.C.: Good governance and happiness in nations: technical tions of Hedonic Psychology. Russell Sage Foundation, New York quality precedes democracy and quality beats size. J. Happiness (1999) Stud. 11(3), 353–368 (2010) 279. Kahneman, D., Krueger, A.B., Schkade, D.A., Schwarz, N., Stone, 303. Fowler, J.H., Christakis, N.A.: Dynamic spread of happiness in A.A.: A survey method for characterizing daily life experience: a large social network: longitudinal analysis over 20 years in the the day reconstruction method. Science 306(5702), 1776–1780 framingham heart study. BMJ 337, a2338 (2008) (2004) 304. Luhmann, M.: Using big data to study subjective well-being. Curr. 280. Courvoisier, D.S., Eid, M., Lischetzke, T.: Compliance to a cell Opin. Behav. Sci. 18, 28–33 (2017) phone-based ecological momentary assessment study: the effect 305. Nederhof, A.J.: Methods of coping with social desirability bias: of time and personality characteristics. Psychol. Assess. 24(3), a review. Eur. J. Soc. Psychol. 15(3), 263–280 (1985) 713 (2012) 306. Quercia, D., Ellis, J., Capra, L., Crowcroft, J.: Tracking gross 281. Shiffman, S., Stone, A.A., Hufford, M.R.: Ecological momentary community happiness from tweets. In: Proceedings of the ACM assessment. Annu. Rev. Clin. Psychol. 4, 1–32 (2008) 2012 Conference on Computer Supported Cooperative Work, pp. 282. Eid, M.E., Diener, E.E.: Handbook of Multimethod Measurement 965–968. ACM (2012) in Psychology. 
American Psychological Association, New York 307. Bollen, J., Gonçalves, B., van de Leemput, I., Ruan, G.: The hap- (2006) piness paradox: your friends are happier than you. EPJ Data Sci. 283. Diener, E., Seligman, M.E.: Beyond money: toward an economy 6(1), 4 (2017) of well-being. Psychol. Sci. Public Interest 5(1), 1–31 (2004) 308. Wilson, T., Hoffmann, P., Somasundaran, S., Kessler, J., Wiebe, 284. Costa, P.T., McCrae, R.R.: Influence of extraversion and neuroti- J., Choi, Y., Cardie, C., Riloff, E., Patwardhan, S.: OpinionFinder: cism on subjective well-being: happy and unhappy people. J. Pers. a system for subjectivity analysis. In: Proceedings of hlt/emnlp on Soc. Psychol. 38(4), 668 (1980) Interactive Demonstrations. Association for Computational Lin- 285. Zweig, J.S.: Are women happier than men? Evidence from the guistics, pp. 34–35 (2005) Gallup World Poll. J. Happiness Stud. 16(2), 515–541 (2015) 309. Bollen, J., Gonçalves, B., Ruan, G., Mao, H.: Happiness is assor- 286. Deaton, A.S., Tortora, R.: People in Sub-Saharan Africa rate their tative in online social networks. Artif. Life 17(3), 237–251 (2011) health and health care among the lowest in the world. Health Aff. 310. Kramer, A.D., Guillory, J.E., Hancock, J.T.: Experimental evi- 34(3), 519–527 (2015) dence of massive-scale emotional contagion through social net- 287. Veenhoven, R., Ehrhardt, J.: The cross-national pattern of hap- works. In: Proceedings of the National Academy of Sciences, p. piness: test of predictions implied in three theories of happiness. 201320040 (2014) Soc. Indic. Res. 34(1), 33–68 (1995) 311. Lim, K.H., Lee, K.E., Kendal, D., Rashidi, L., Naghizade, E., 288. Cuñado, J., de Gracia, F.P.: Does education affect happiness? Evi- Winter, S., Vasardani, M.: The grass is greener on the other side: dence for spain. Soc. Indic. Res. 108(1), 185–196 (2012) Understanding the effects of green spaces on twitter user senti- 289. 
Nikolaev, B.: Does higher education increase hedonic and eudai- ments. In: Companion of the The Web Conference 2018 on The monic happiness? J. Happiness Stud. 19(2), 483–504 (2018) Web Conference 2018, International World Wide Web Confer- 290. Rehdanz, K., Maddison, D.: Climate and happiness. Ecol. Econ. ences Steering Committee, pp. 275–282 (2018) 52(1), 111–125 (2005) 123 International Journal of Data Science and Analytics 312. Mitchell, L., Frank, M.R., Harris, K.D., Dodds, P.S., Danforth, 327. Li, G., Zheng, Y., Fan, J., Wang, J., Cheng, R.: Crowdsourced C.M.: The geography of happiness: connecting twitter sentiment data management: overview and challenges. In: Proceedings of and expression, demographics, and objective characteristics of the 2017 ACM International Conference on Management of Data, place. PLoS ONE 8(5), e64417 (2013) pp. 1711–1716. ACM (2017) 313. Golder, S.A., Macy, M.W.: Diurnal and seasonal mood vary 328. Lathia, N., Sandstrom, G.M., Mascolo, C., Rentfrow, P.J.: Happier with work, sleep, and daylength across diverse cultures. Science people live more active lives: using smartphones to link happiness 333(6051), 1878–1881 (2011) and physical activity. PLoS ONE 12(1), e0160589 (2017) 314. Lansdall-Welfare, T., Lampos, V., Cristianini, N.: Nowcasting the 329. Asai, A., Evensen, S., Golshan, B., Halevy, A., Li, V., Lopatenko, mood of the nation. Significance 9(4), 26–28 (2012) A., Stepanov, D., Suhara, Y., Tan, W.C., Xu, Y. Happydb: a cor- 315. Cresci, S., La Polla, M.N., Mazza, M., Tesconi, M., Del Vigna, pus of 100,000 crowdsourced happy moments. arXiv preprint F.: #selfie: mapping the phenomenon. Consiglio Nazioonale delle arXiv:1801.07746 (2018) Ricerche IIT TR-08/2016 Technical Report (2016) 330. Bogomolov, A., Lepri, B., Pianesi, F.: Happiness recognition 316. Bollen, J., Mao, H., Pepe, A.: Modeling public mood and emotion: from mobile phone data. In: Social Computing (SocialCom), twitter sentiment and socio-economic phenomena. 
ICWSM 11, 2013 International Conference on Social Computing, pp. 790– 450–453 (2011) 795. IEEE (2013) 317. Dodds, P.S., Harris, K.D., Kloumann, I.M., Bliss, C.A., Dan- 331. Goldberg, L.R.: An alternative “description of personality”: the forth, C.M.: Temporal patterns of happiness and information in big-five factor structure. J. Pers. Soc. Psychol. 59(6), 1216 (1990) a global social network: hedonometrics and twitter. PLoS ONE 332. Carlquist, E., Nafstad, H.E., Blakar, R.M., Ulleberg, P., Delle 6(12), e26752 (2011) Fave, A., Phelps, J.M.: Well-being vocabulary in media language: 318. Iacus, S.M., Porro, G., Salini, S., Siletti, E.: Social networks, an analysis of changing word usage in Norwegian newspapers. J. happiness and health: from sentiment analysis to a multidi- Positive Psychol. 12(2), 99–109 (2017) mensional indicator of subjective well-being. arXiv preprint 333. Seligman, M.E.: Flourish: A New Understanding of Happiness arXiv:1512.01569 (2015) and Well-Being and How to Achieve Them. Nicholas Brealey, 319. Ceron, A., Curini, L., Iacus, S.M.: Social Media e Sentiment Anal- Boston (2011) ysis: L’evoluzione dei fenomeni sociali attraverso la Rete, vol. 9. 334. Greco, M., Stenner, P.: Happiness and the art of life: diagnosing the Springer, New York (2014) psychopolitics of wellbeing. Health Cult. Soc. 5(1), 1–19 (2013) 320. Ceron, A., Curini, L., Iacus, S.M.: ISA: a fast, scalable and accu- 335. Coulton, C.J., Goerge, R., Putnam-Hornstein, E., de Haan, B.: rate algorithm for sentiment analysis of social media content. Inf. Harnessing Big Data for Social Good: A Grand Challenge for Sci. 367, 105–124 (2016) Social Work, pp. 1–20. American Academy of Social Work and 321. Curini, L., Iacus, S., Canova, L.: Measuring idiosyncratic happi- Social Welfare, Cleveland (2015) ness through the analysis of twitter: an application to the italian 336. Lepri, B., Staiano, J., Sangokoya, D., Letouzé, E., Oliver, N.: The case. Soc. Indic. Res. 
121(2), 525–542 (2015) tyranny of data? The bright and dark sides of data-driven decision- 322. Durahim, A.O., Coşkun, M.: # iamhappybecause: gross national making for social good. In: Transparent Data Mining for Big and happiness through twitter analysis and big data. Technol. Forecast. Small Data, pp. 3–24. Springer (2017) Soc. Change 99, 92–105 (2015) 337. Floridi, L., Taddeo, M.: What is data ethics? The Royal Society 323. Coviello, L., Sohn, Y., Kramer, A.D., Marlow, C., Franceschetti, (2016) M., Christakis, N.A., Fowler, J.H.: Detecting emotional contagion 338. Hand, D.J.: Aspects of data ethics in a changing world: where are in massive social networks. PLoS ONE 9(3), e90315 (2014) we now? Big Data 6(3), 176–190 (2018) 324. Algan, Y., Beasley, E., Guyot, F., Higa, K., Murtin, F., Senik, C., et al. Big Data Measures of Well-Being: Evidence from a Google Well-Being Index in the United States. OECD Statistics Working Publisher’s Note Springer Nature remains neutral with regard to juris- Papers 2016 (2016) dictional claims in published maps and institutional affiliations. 325. Lane, N.D., Miluzzo, E., Lu, H., Peebles, D., Choudhury, T., Campbell, A.T.: A survey of mobile phone sensing. IEEE Com- mun. Mag. 48(9), 140–150 (2010) 326. Staiano, J., Lepri, B., Aharony, N., Pianesi, F., Sebe, N., Pentland, A.: Friends don’t lie: inferring personality traits from social net- work structure. In: Proceedings of the 2012 ACM Conference on Ubiquitous Computing, pp. 321–330. ACM (2012) 123

References (336)

Reinhart, C.M., Reinhart, V.R.: After the fall. Technical report. National Bureau of Economic Research (2010)
Fleurbaey, M.: Beyond gdp: the quest for a measure of social welfare. J. Econ. Lit. 47(4), 1029-75 (2009)
Stiglitz, J.E., Sen, A., Fitoussi, J.P.: Report by the Commission on the Measurement of Economic Performance and Social Progress. The Commission Paris (2009)
Dodge, R., Daly, A.P., Huyton, J., Sanders, L.D.: The challenge of defining wellbeing. Int. J. Wellbeing 2(3), 11 (2012)
Alkire, S.: Dimensions of human development. World Dev. 30(2), 181-205 (2002)
Rapporto, BES Il benessere equo e sostenibile in Italia. ISTAT (2015)
Organisation for Economic Co-operation and Development (OECD) OECD Guidelines on Measuring Subjective Well-Being. OECD Publishing (2013)
Veenhoven, R.: Conditions of Happiness, Reidel. Springer, Dordrecht (1984)
Frey, B.S., Stutzer, A.: What can economists learn from happiness research? J. Econ. Lit. 40(2), 402-435 (2002)
Stiglitz, J.E., Sen, A., Fitoussi, J.P.: Measurement of economic performance and social progress. Online document. http://www. bitly/JTwmG Accessed 26 June 2012 (2009)
Bartels, M., Boomsma, D.I.: Born to be happy? The etiology of subjective well-being. Behav. Genet. 39(6), 605 (2009)
Bartels, M., Saviouk, V., De Moor, M.H., Willemsen, G., van Beijsterveldt, T.C., Hottenga, J.J., De Geus, E.J., Boomsma, D.I.: Heritability and genome-wide linkage scan of subjective happiness. Twin Res. Hum. Genet. 13(2), 135-142 (2010)
Nes, R.B., Røysamb, E.: The heritability of subjective well-being: review and meta-analysis. In: The Genetics of Psychological Well-Being: The Role of Heritability and Genetics in Positive Psychology, pp. 75-96 (2015)
Nes, R.B., Czajkowski, N., Tambs, K.: Family matters: happiness in nuclear families and twins. Behav. Genet. 40(5), 577-590 (2010)
Nes, R., Røysamb, E., Tambs, K., Harris, J., Reichborn-Kjennerud, T.: Subjective well-being: genetic and environmental contributions to stability and change. Psychol. Med. 36(7), 1033-1042 (2006)
Røysamb, E., Harris, J.R., Magnus, P., Vittersø, J., Tambs, K.: Subjective well-being. Sex-specific effects of genetic and environmental factors. Personal. Individ. Differ. 32(2), 211-223 (2002)
Røysamb, E., Tambs, K., Reichborn-Kjennerud, T., Neale, M.C., Harris, J.R.: Happiness and health: environmental and genetic contributions to the relationship between subjective well-being, perceived health, and somatic illness. J. Pers. Soc. Psychol. 85(6), 1136 (2003)
Schnittker, J.: Happiness and success: genes, families, and the psychological effects of socioeconomic position and social support. Am. J. Sociol. 114(S1), S233-S259 (2008)
Pleeging, E., Burger, M., van Exel, J.: The relations between hope and subjective well-being: a literature overview and empirical analysis. Appl. Res. Qual. Life 1, 1-23 (2020)
Kenrick, D.T., Griskevicius, V., Neuberg, S.L., Schaller, M.: Renovating the pyramid of needs: contemporary extensions built upon ancient foundations. Perspect. Psychol. Sci. 5(3), 292-314 (2010)
Ryan, R.M., Deci, E.L.: Self-determination theory and the facilitation of intrinsic motivation, social development, and well-being. Am. Psychol. 55(1), 68 (2000)
Tay, L., Diener, E.: Needs and subjective well-being around the world. J. Pers. Soc. Psychol. 101(2), 354 (2011)
Clark, A.E., Oswald, A.J.: Satisfaction and comparison income. J. Public Econ. 61(3), 359-381 (1996)
Shields, M.A., Price, S.W., Wooden, M.: Life satisfaction and the economic and social characteristics of neighbourhoods. J. Popul. Econ. 22(2), 421-443 (2009)
Powdthavee, N.: How much does money really matter? Estimating the causal effects of income on happiness. Empir. Econ. 39(1), 77-92 (2010)
Nikolaev, B.: Living with mom and dad and loving it... or are you? J. Econ. Psychol. 51, 199-209 (2015)
Dolan, P., Peasgood, T., White, M.: Do we really know what makes us happy? A review of the economic literature on the factors associated with subjective well-being. J. Econ. Psychol. 29(1), 94-122 (2008)
Easterlin, R.A.: Does economic growth improve the human lot? Some empirical evidence. In: Nations and Households in Economic Growth, pp. 89-125. Elsevier (1974)
Veenhoven, R.: Is happiness relative? Soc. Indic. Res. 24(1), 1-34 (1991)
Diener, E., Tay, L., Oishi, S.: Rising income and the subjective well-being of nations. J. Pers. Soc. Psychol. 104(2), 267 (2013)
Veenhoven, R., Vergunst, F.: The Easterlin illusion: economic growth does go with greater happiness. Int. J. Happiness Dev. 1(4), 311-343 (2014)
Sacks, D.W., Stevenson, B., Wolfers, J.: The new stylized facts about income and subjective well-being. Emotion 12(6), 1181 (2012)
Radcliff, B., Shufeldt, G.: Direct democracy and subjective well-being: the initiative and life satisfaction in the American states. Soc. Indic. Res. 128(3), 1405-1423 (2016)
Veenhoven, R.: Social conditions for human happiness: a review of research. Int. J. Psychol. 50(5), 379-391 (2015)
Deaton, A.: The Analysis of Household Surveys: A Microeconometric Approach to Development Policy. The World Bank (1997)
European Project.: SoBigData. http://sobigdata.eu/index. Accessed Oct 2019 (2015)
Shi, Z.R., Wang, C., Fang, F.: Artificial Intelligence for Social Good: A Survey. arXiv preprint arXiv:2001.01818 (2020)
Solomon, D.J.: Conducting web-based surveys. Pract. Assess. Res. Eval. 7(19), 12 (2001)
Daas, P.J., Puts, M.J., Buelens, B., Van den Hurk, P.A.: Big data and official statistics. In: Proceedings of the NTTS, pp. 5-7. New Techniques and Technologies for Statistics (2013)
Struijs, P., Daas, P.: Quality approaches to big data in official statistics. In: European Conference on Quality in Official Statistics (2014)
Jahani, E., Sundsøy, P., Bjelland, J., Bengtsson, L., de Montjoye, Y.A., et al.: Improving official statistics in emerging markets using machine learning and mobile phone data. EPJ Data Sci. 6(1), 3 (2017)
Blumenstock, J.E.: Fighting poverty with data. Science 353(6301), 753-754 (2016)
United Nations.: A world that counts: mobilizing the data revolution for sustainable development. Technical report (2014)
Sustainable Development Solutions Network: Indicators and a Monitoring Framework for the Sustainable Development Goals. Launching a Data Revolution for the SDGs, United Nations, New York (2015)
WHO, World Health Organization: Geneva Macroeconomics and health: investing in health for economic development-report of the commission on macroeconomics and health. Commission on Macroeconomics and Health (2001)
European Commission: The Lisbon strategy for growth and jobs (2000)
OECD.: OECD Better Life Index: Health. http://www.oecdbetterlifeindex.org/topics/health/. Accessed Oct 2019 (2011)
OECD.: OECD Better Life Index: Jobs. http://www.oecdbetterlifeindex.org/topics/jobs/. Accessed Oct 2019 (2011a)
OECD.: OECD Better Life Index: Income. http://www.oecdbetterlifeindex.org/topics/income/. Accessed Oct 2019 (2011b)
OECD.: OECD Better Life Index: Environment. http://www.oecdbetterlifeindex.org/topics/environment/. Accessed Oct 2019 (2011c)
OECD.: OECD Better Life Index: Safety. http://www.oecdbetterlifeindex.org/topics/safety/. Accessed Oct 2019 (2011d)
Amerio, P., Roccato, M.: Psychological reactions to crime in Italy: 2002-2004. J. Commun. Psychol. 35(1), 91-102 (2007)
OECD.: OECD Better Life Index: Civic Engagement. http://www.oecdbetterlifeindex.org/topics/civic-engagement/. Accessed Oct 2019 (2011)
Blondel, V.D., Decuyper, A., Krings, G.: A survey of results on mobile phone datasets analysis. EPJ Data Sci. 4(1), 10 (2015)
Eagle, N., Pentland, A.S.: Eigenbehaviors: identifying structure in routine. Behav. Ecol. Sociobiol. 63(7), 1057-1066 (2009)
Pappalardo, L., Simini, F., Rinzivillo, S., Pedreschi, D., Giannotti, F., Barabási, A.L.: Returners and explorers dichotomy in human mobility. Nat. Commun. 6, 8166 (2015)
Pappalardo, L., Rinzivillo, S., Simini, F.: Human mobility modelling: exploration and preferential return meet the gravity model. Proc. Comput. Sci. 83, 934-939 (2016). https://doi.org/10.1016/j.procs.2016.04.188
Pellungrini, R., Pappalardo, L., Pratesi, F., Monreale, A.: A data mining approach to assess privacy risk in human mobility data. ACM Trans. Intell. Syst. Technol. 9(3), 31:1-31:27 (2017). https://doi.org/10.1145/3106774
Pappalardo, L., Simini, F.: Data-driven generation of spatio-temporal routines in human mobility. Data Min. Knowl. Disc. 32(3), 787-829 (2018)
Giannotti, F., Pappalardo, L., Pedreschi, D., Wang, D.: A Complexity Science Perspective on Human Mobility, pp. 297-314. Cambridge University Press, Cambridge (2013). https://doi.org/10.1017/CBO9781139128926.016
Ranjan, G., Zang, H., Zhang, Z.L., Bolot, J.: Are call detail records biased for sampling human mobility? ACM SIGMOBILE Mob. Comput. Commun. Rev. 16(3), 33-44 (2012)
Iovan, C., Olteanu-Raimond, A.M., Couronné, T., Smoreda, Z.: Moving and calling: mobile phone data quality measurements and spatiotemporal uncertainty in human mobility studies. In: Geographic Information Science at the Heart of Europe, pp. 247-265. Springer (2013)
Gonzalez, M.C., Hidalgo, C.A., Barabasi, A.L.: Understanding individual human mobility patterns. Nature 453(7196), 779 (2008)
Barabasi, A.L.: The origin of bursts and heavy tails in human dynamics. Nature 435(7039), 207 (2005)
Oliver, N., Matic, A., Frias-Martinez, E.: Mobile network data for public health: opportunities and challenges. Front. Public Health 3, 189 (2015)
Finger, F., Genolet, T., Mari, L., de Magny, G.C., Manga, N.M., Rinaldo, A., Bertuzzo, E.: Mobile phone data highlights the role of mass gatherings in the spreading of cholera outbreaks. Proc. Nat. Acad. Sci. 113(23), 6421-6426 (2016)
Kafsi, M., Kazemi, E., Maystre, L., Yartseva, L., Grossglauser, M., Thiran, P.: Mitigating epidemics through mobile micro-measures. arXiv preprint arXiv:1307.2084 (2013)
Lima, A., De Domenico, M., Pejovic, V., Musolesi, M.: Disease containment strategies based on mobility and information dissemination. Sci. Rep. 5, 10650 (2015)
Madan, A., Cebrian, M., Lazer, D., Pentland, A.: Social sensing for epidemiological behavior change. In: Proceedings of the 12th International Conference on Ubiquitous Computing, pp. 291-300. ACM (2010)
Pappalardo, L., Pedreschi, D., Smoreda, Z., Giannotti, F.: Using big data to study the link between human mobility and socio-economic development. In: 2015 IEEE International Conference on Big Data (Big Data), pp. 871-78 (2015). https://doi.org/10.1109/BigData.2015.7363835
Toole, J.L., Lin, Y.R., Muehlegger, E., Shoag, D., González, M.C., Lazer, D.: Tracking employment shocks using mobile phone data. J. R. Soc. Interface 12(107), 20150185 (2015)
Sundsøy, P., Bjelland, J., Reme, B.A., Jahani, E., Wetter, E., Bengtsson, L.: Towards real-time prediction of unemployment and profession. In: International Conference on Social Informatics, pp. 14-23. Springer (2017)
Eagle, N., Macy, M., Claxton, R.: Network diversity and economic development. Science 328(5981), 1029-1031 (2010)
Steele, J.E., Sundsøy, P.R., Pezzulo, C., Alegana, V.A., Bird, T.J., Blumenstock, J., Bjelland, J., Engø-Monsen, K., de Montjoye, Y.A., Iqbal, A.M., et al.: Mapping poverty using mobile phone and satellite data. J. R. Soc. Interface 14(127), 20160690 (2017)
Mao, H., Shuai, X., Ahn, Y.Y., Bollen, J.: Quantifying socio-economic indicators in developing countries from mobile phone communication data: applications to côte d'ivoire. EPJ Data Sci. 4(1), 15 (2015)
Gutierrez, T., Krings, G., Blondel, V.D.: Evaluating socio-economic state of a country analyzing airtime credit and mobile phone datasets. arXiv preprint arXiv:1309.4496 (2013)
Blumenstock, J.: Calling for better measurement: estimating an individual's wealth and well-being. ACM KDD (Data Mining for Social Good) (2014)
Blumenstock, J., Cadamuro, G., On, R.: Predicting poverty and wealth from mobile phone metadata. Science 350(6264), 1073-1076 (2015)
Frias-Martinez, V., Virseda, J.: On the relationship between socio-economic factors and cell phone usage. In: Proceedings of the Fifth International Conference on Information and Communication Technologies and Development, pp. 76-84. ACM (2012)
Soto, V., Frias-Martinez, V., Virseda, J., Frias-Martinez, E.: Prediction of socioeconomic levels using cell phone records. In: International Conference on User Modeling, Adaptation, and Personalization, pp. 377-388. Springer (2011)
Frias-Martinez, V., Soguero-Ruiz, C., Frias-Martinez, E., Josephidou, M.: Forecasting socioeconomic trends with cell phone records. In: Proceedings of the 3rd ACM Symposium on Computing for Development, p. 15. ACM (2013)
Hernandez, M., Hong, L., Frias-Martinez, V., Frias-Martinez, E.: Estimating poverty using cell phone data: evidence from Guatemala. The World Bank (2017)
Pappalardo, L., Vanhoof, M., Gabrielli, L., Smoreda, Z., Pedreschi, D., Giannotti, F.: An analytical framework to nowcast well-being using mobile phone data. Int. J. Data Sci. Anal. 2(1), 75-92 (2016). https://doi.org/10.1007/s41060-016-0013-2
Lotero, L., Cardillo, A., Hurtado, R., Gómez-Gardeñes, J.: Several multiplexes in the same city: the role of socioeconomic differences in urban mobility. In: Interconnected Networks, pp. 149-164. Springer (2016)
Amini, A., Kung, K., Kang, C., Sobolevsky, S., Ratti, C.: The impact of social segregation on human mobility in developing and industrialized regions. EPJ Data Sci. 3(1), 6 (2014)
Smith-Clarke, C., Mashhadi, A., Capra, L.: Poverty on the cheap: estimating poverty maps using aggregated mobile communication networks. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 511-520. ACM (2014)
Picornell, M., Ruiz, T., Borge, R., García-Albertos, P., de la Paz, D., Lumbreras, J.: Population dynamics based on mobile phone data to improve air pollution exposure assessments. J. Expos. Sci. Environ. Epidemiol. 29(2), 278 (2019)
Lu, X., Wrathall, D.J., Sundsøy, P.R., Nadiruzzaman, M., Wetter, E., Iqbal, A., Qureshi, T., Tatem, A.J., Canright, G.S., Engø-Monsen, K., et al.: Detecting climate adaptation with mobile network data in bangladesh: anomalies in communication, mobility and consumption patterns during cyclone mahasen. Clim. Change 138(3-4), 505-519 (2016)
Lu, X., Bengtsson, L., Holme, P.: Predictability of population displacement after the 2010 haiti earthquake. Proc. Nat. Acad. Sci. 109(29), 11576-11581 (2012)
Bengtsson, L., Lu, X., Thorson, A., Garfield, R., Von Schreeb, J.: Improved response to disasters and outbreaks by tracking population movements with mobile phone network data: a post-earthquake geospatial study in haiti. PLoS Med. 8(8), e1001083 (2011)
Wilson, R., Zu Erbach-Schoenberg, E., Albert, M., Power, D., Tudge, S., Gonzalez, M., Guthrie, S., Chamberlain, H., Brooks, C., Hughes, C., et al.: Rapid and near real-time assessments of population displacement using mobile phone data following disasters: the 2015 Nepal earthquake. PLoS Curr. 8, 1 (2016)
Nyarku, M., Mazaheri, M., Jayaratne, R., Dunbabin, M., Rahman, M.M., Uhde, E., Morawska, L.: Mobile phones as monitors of personal exposure to air pollution: Is this the future? PLoS ONE 13(2), e0193150 (2018)
Liu, H.Y., Skjetne, E., Kobernus, M.: Mobile phone tracking: in support of modelling traffic-related air pollution contribution to individual exposure and its implications for public health impact assessment. Environ. Health 12(1), 93 (2013)
Decuyper, A., Rutherford, A., Wadhwa, A., Bauer, J.M., Krings, G., Gutierrez, T., Blondel, V.D., Luengo-Oroz, M.A.: Estimating food consumption and poverty indices with mobile phone data. arXiv preprint arXiv:1412.2595 (2014)
Bogomolov, A., Lepri, B., Staiano, J., Oliver, N., Pianesi, F., Pentland, A.: Once upon a crime: towards crime prediction from demographics and mobile data. In: Proceedings of the 16th International Conference on Multimodal Interaction, pp. 427-434. ACM (2014)
Ferrara, E., De Meo, P., Catanese, S., Fiumara, G.: Detecting criminal organizations in mobile phone networks. Expert Syst. Appl. 41(13), 5733-5750 (2014)
Elgethun, K., Fenske, R.A., Yost, M.G., Palcisko, G.J.: Time-location analysis for exposure assessment studies of children using a novel global positioning system instrument. Environ. Health Perspect. 111(1), 115-122 (2003)
Dias, D., Tchepel, O.: Modelling of human exposure to air pollution in the urban environment: a GPS-based approach. Environ. Sci. Pollut. Res. 21(5), 3558-3571 (2014)
Beekhuizen, J., Kromhout, H., Huss, A., Vermeulen, R.: Performance of GPS-devices for environmental exposure assessment. J. Expos. Sci. Environ. Epidemiol. 23(5), 498 (2013)
Pappalardo, L., Simini, F., Barlacchi, G., Pellungrini, R.: Scikit-mobility: a Python library for the analysis, generation and risk assessment of mobility data. arXiv:1907.07062 (2019)
Jankowska, M.M., Schipperijn, J., Kerr, J.: A framework for using GPS data in physical activity and sedentary behavior studies. Exerc. Sport Sci. Rev. 43(1), 48 (2015)
Kelly, P., Krenn, P., Titze, S., Stopher, P., Foster, C.: Quantifying the difference between self-reported and global positioning systems-measured journey durations: a systematic review. Transp. Rev. 33(4), 443-459 (2013)
Meurs, H., Haaijer, R.: Spatial structure and mobility. Transp. Res. Part D Transp. Environ. 6(6), 429-446 (2001)
Oliver, M., Badland, H., Mavoa, S., Duncan, M.J., Duncan, S.: Combining GPS, GIS, and accelerometry: methodological issues in the assessment of location and intensity travel behaviors. J. Phys. Activity Health 7(1), 102-108 (2010)
Adams, S.A., Matthews, C.E., Ebbeling, C.B., Moore, C.G., Cunningham, J.E., Fulton, J., Hebert, J.R.: The effect of social desirability and social approval on self-reports of physical activity. Am. J. Epidemiol. 161(4), 389-398 (2005)
Pappalardo, L., Rinzivillo, S., Qu, Z., Pedreschi, D., Giannotti, F.: Understanding the patterns of car travel. Eur. Phys. J. Spec. Top. 215(1), 61-73 (2013). https://doi.org/10.1140/epjst/e2013-01715-5
Chaix, B., Kestens, Y., Duncan, D.T., Brondeel, R., Méline, J., El Aarbaoui, T., Pannier, B., Merlo, J.: A GPS-based methodology to analyze environment-health associations at the trip level: case-crossover analyses of built environments and walking. Am. J. Epidemiol. 184(8), 579-589 (2016)
Kerr, J., Duncan, S., Schipperijn, J.: Using global positioning systems in health research: a practical approach to data collection and processing. Am. J. Prev. Med. 41(5), 532-540 (2011)
Saelens, B.E., Vernez Moudon, A., Kang, B., Hurvitz, P.M., Zhou, C.: Relation between higher physical activity and public transit use. Am. J. Public Health 104(5), 854-859 (2014)
Rundle, A.G., Sheehan, D.M., Quinn, J.W., Bartley, K., Eisenhower, D., Bader, M.M., Lovasi, G.S., Neckerman, K.M.: Using GPS data to study neighborhood walkability and physical activity. Am. J. Prev. Med. 50(3), e65-e72 (2016)
Sadler, R.C., Gilliland, J.A.: Comparing children's GPS tracks with geospatial proxies for exposure to junk food. Spat. Spatio-temporal Epidemiol. 14, 55-61 (2015)
Canzian, L., Musolesi, M.: Trajectories of depression: unobtrusive monitoring of depressive states by means of smartphone mobility traces analysis. In: Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing, pp. 1293-1304. ACM (2015)
Marchetti, S., Giusti, C., Pratesi, M., Salvati, N., Giannotti, F., Pedreschi, D., Rinzivillo, S., Pappalardo, L., Gabrielli, L.: Small area model-based estimators using big data sources. J. Off. Stat. 31(2), 263-281 (2015)
Smith, C., Quercia, D., Capra, L.: Finger on the pulse: identifying deprivation using transit flow analysis. In: Proceedings of the 2013 Conference on Computer Supported Cooperative Work, pp. 683-692. ACM (2013)
Lathia, N., Quercia, D., Crowcroft, J.: The hidden image of the city: sensing community well-being from urban mobility. In: International Conference on Pervasive Computing, pp. 91-98. Springer (2012)
Robinson, A.I., Carnes, F., Oreskovic, N.M.: Spatial analysis of crime incidence and adolescent physical activity. Prev. Med. 85, 74-77 (2016)
Ariel, B., Partridge, H.: Predictable policing: measuring the crime control benefits of hotspots policing at bus stops. J. Quant. Criminol. 33(4), 809-833 (2017)
Spinsanti, L., Berlingerio, M., Pappalardo, L.: Mobility and Geo-Social Networks, pp. 315-333. Cambridge University Press, Cambridge (2013). https://doi.org/10.1017/CBO9781139128926.017
Olteanu, A., Castillo, C., Diaz, F., Kiciman, E.: Social data: biases, methodological pitfalls, and ethical boundaries. Front. Big Data 2, 13 (2019)
Rost, M., Barkhuus, L., Cramer, H., Brown, B.: Representation and communication: challenges in interpreting large social media datasets. In: Proceedings of the 2013 Conference on Computer Supported Cooperative Work, pp. 357-362. ACM (2013)
Eichstaedt, J.C., Schwartz, H.A., Kern, M.L., Park, G., Labarthe, D.R., Merchant, R.M., Jha, S., Agrawal, M., Dziurzynski, L.A., Sap, M., et al.: Psychological language on Twitter predicts county-level heart disease mortality. Psychol. Sci. 26(2), 159-169 (2015)
De Choudhury, M., Gamon, M., Counts, S., Horvitz, E.: Predicting depression via social media. ICWSM 13, 1-10 (2013)
Signorini, A., Segre, A.M., Polgreen, P.M.: The use of Twitter to track levels of disease activity and public concern in the US during the influenza A H1N1 pandemic. PLoS ONE 6(5), e19467 (2011)
Paul, M.J., Dredze, M., Broniatowski, D.: Twitter improves influenza forecasting. PLoS Curr. 6, 12 (2014)
Lampos, V., Cristianini, N.: Tracking the flu pandemic by monitoring the social web. In: 2010 2nd International Workshop on Cognitive Information Processing, pp. 411-416. IEEE (2010)
Lampos, V., Cristianini, N.: Nowcasting events from the social web with statistical learning. ACM Trans. Intell. Syst. Technol. 3(4), 72 (2012)
Chen, X., Yang, X.: Does food environment influence food choices? A geographical analysis through "tweets". Appl. Geogr. 51, 82-89 (2014)
Llorente, A., Garcia-Herranz, M., Cebrian, M., Moro, E.: Social media fingerprints of unemployment. PLoS ONE 10(5), e0128692 (2015)
Antenucci, D., Cafarella, M., Levenstein, M., Ré, C., Shapiro, M.D.: Using social media to measure labor market flows. Technical report. National Bureau of Economic Research (2014)
Bollen, J., Mao, H., Zeng, X.: Twitter mood predicts the stock market. J. Comput. Sci. 2(1), 1-8 (2011)
Bar-Haim, R., Dinur, E., Feldman, R., Fresko, M., Goldstein, G.: Identifying and following expert investors in stock microblogs. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 1310-1319. Association for Computational Linguistics (2011)
De Choudhury, M., Sundaram, H., John, A., Seligmann, D.D.: Can blog communication dynamics be correlated with stock market activity? In: Proceedings of the Nineteenth ACM Conference on Hypertext and Hypermedia, pp. 55-60. ACM (2008)
Cresci, S., Lillo, F., Regoli, D., Tardelli, S., Tesconi, M.: $FAKE: Evidence of spam and bot activity in stock microblogs on Twitter. In: Proceedings of the 12th International Conference on Web and Social Media (ICWSM'18), pp. 580-583. AAAI (2018)
Cresci, S., Lillo, F., Regoli, D., Tardelli, S., Tesconi, M.: Cashtag piggybacking: uncovering spam and bot activity in stock microblogs on Twitter. ACM Trans. Web (TWEB) 13(2), 11 (2019)
Avvenuti, M., Cresci, S., Marchetti, A., Meletti, C., Tesconi, M.: Predictability or early warning: using social media in modern emergency response. IEEE Internet Comput. 20(6), 4-6 (2016)
Kryvasheyeu, Y., Chen, H., Obradovich, N., Moro, E., Van Hentenryck, P., Fowler, J., Cebrian, M.: Rapid assessment of disaster damage using social media activity. Sci. Adv. 2(3), e1500779 (2016)
Avvenuti, M., Cresci, S., La Polla, M.N., Meletti, C., Tesconi, M.: Nowcasting of earthquake consequences using big social data. IEEE Internet Comput. 6, 37-45 (2017)
Mendoza, M., Poblete, B., Valderrama, I.: Nowcasting earthquake damages with Twitter. EPJ Data Sci. 8(1), 3 (2019)
Avvenuti, M., Cresci, S., Del Vigna, F., Tesconi, M.: Impromptu crisis mapping to prioritize emergency response. Computer 49(5), 28-37 (2016)
Avvenuti, M., Cresci, S., Del Vigna, F., Fagni, T., Tesconi, M.: CrisMap: a big data crisis mapping system based on damage detection and geoparsing. Inf. Syst. Front. 1, 1-19 (2018)
Preis, T., Moat, H.S., Bishop, S.R., Treleaven, P., Stanley, H.E.: Quantifying the digital traces of Hurricane Sandy on Flickr. Sci. Rep. 3, 3141 (2013)
Chen, X., Cho, Y., Jang, S.Y.: Crime prediction using Twitter sentiment and weather. In: 2015 Systems and Information Engineering Design Symposium, pp. 63-68. IEEE (2015)
Al Boni, M., Gerber, M.S.: Predicting crime with routine activity patterns inferred from social media. In: 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 001233-001238. IEEE (2016)
Kadar, C., Brüngger, R.R., Pletikosa, I.: Measuring ambient population from location-based social networks to describe urban crime. In: International Conference on Social Informatics, pp. 521-535. Springer (2017)
Chen, F., Neill, D.B.: Non-parametric scan statistics for event detection and forecasting in heterogeneous social media graphs. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1166-1175. ACM (2014)
Nobles, M., Neill, D.B., Flaxman, S.: Predicting and Preventing Emerging Outbreaks of Crime (2014)
Neill, D.B., Gorr, W.L.: Detecting and preventing emerging epidemics of crime. Adv. Dis. Surveill. 4(13), 18 (2007)
Colleoni, E., Rozza, A., Arvidsson, A.: Echo chamber or public sphere? Predicting political orientation and measuring political homophily in Twitter using big data. J. Commun. 64(2), 317-332 (2014)
Goh, T.T., Xin, Z., Jin, D.: Habit formation in social media consumption: a case of political engagement. Behav. Inf. Technol. 38(3), 273-288 (2019)
Ferrara, E.: Manipulation and abuse on social media. ACM SIGWEB Newsl. 2015(Spring), 4 (2015)
Cresci, S., Di Pietro, R., Petrocchi, M., Spognardi, A., Tesconi, M.: The paradigm-shift of social spambots: evidence, theories, and tools for the arms race. In: Proceedings of the 26th International Conference on World Wide Web Companion, International World Wide Web Conferences Steering Committee, pp. 963-972 (2017)
Goldstein, B.A., Navar, A.M., Pencina, M.J., Ioannidis, J.: Opportunities and challenges in developing risk prediction models with electronic health records data: a systematic review. J. Am. Med. Inform. Assoc. 24(1), 198-208 (2017)
Wilson, P.W., D'Agostino, R.B., Levy, D., Belanger, A.M., Silbershatz, H., Kannel, W.B.: Prediction of coronary heart disease using risk factor categories. Circulation 97(18), 1837-1847 (1998)
Sultana, J., Leal, I., de Wilde, M., de Ridder, M., van der Lei, J., Sturkenboom, M., et al.: Identifying data elements to measure frailty in a Dutch nationwide electronic medical record database for use in postmarketing safety evaluation: an exploratory study. Drug Saf. 12, 1-7 (2019)
Ghaderighahfarokhi, S., Sadeghifar, J.: A model to predict low birth weight infants and affecting factors using data mining techniques. J. Basic Res. Med. Sci. 5(3), 1-8 (2018)
Metzger, M.H., Tvardik, N., Gicquel, Q., Bouvry, C., Poulet, E., Potinet-Pagliaroli, V.: Use of emergency department electronic medical records for automated epidemiological surveillance of suicide attempts: a French pilot study. Int. J. Methods Psychiatric Res. 26(2), e1522 (2017)
Mhaskar, H.N., Pereverzyev, S.V., van der Walt, M.D.: A deep learning approach to diabetic blood glucose prediction. Front. Appl. Math. Stat. 3, 14 (2017)
Santillana, M., Nsoesie, E.O., Mekaru, S.R., Scales, D., Brownstein, J.S.: Using clinicians' search query data to monitor influenza epidemics. Clin. Infect. Dis. Off. Publ. Infect. Dis. Soc. Am. 59(10), 1446 (2014)
Althoff, T., Hicks, J.L., King, A.C., Delp, S.L., Leskovec, J., et al.: Large-scale physical activity data reveal worldwide activity inequality. Nature 547(7663), 336 (2017)
Hayeri, A.: Predicting future glucose fluctuations using machine learning and wearable sensor data. Diabetes (2018). https://doi.org/10.2337/db18-738-P
Leetaru, K.: The GDELT Project. https://www.gdeltproject.org/. Accessed Oct 2019 (2013)
Balahur, A., Steinberger, R., Kabadjov, M., Zavarella, V., Van Der Goot, E., Halkia, M., Pouliquen, B., Belyaeva, J.: Sentiment analysis in the news. arXiv preprint arXiv:1309.6202 (2013)
Dehghan, A., Montgomery, L., Arciniegas-Mendez, M., Ferman-Guerra, M.: Predicting News Bias (2016)
Grein, T.W., Kamara, K., Rodier, G., Plant, A.J., Bovier, P., Ryan, M.J., Ohyama, T., Heymann, D.L.: Rumors of disease in the global village: outbreak verification. Emerg. Infect. Dis. 6(2), 97 (2000)
Heymann, D.L., Rodier, G.R., et al.: Hot spots in a wired world: WHO surveillance of emerging and re-emerging infectious diseases. Lancet Infect. Dis. 1(5), 345-353 (2001)
Brownstein, J.S., Freifeld, C.C., Reis, B.Y., Mandl, K.D.: Surveillance sans frontières: Internet-based emerging infectious disease intelligence and the HealthMap project. PLoS Med. 5(7), e151 (2008)
Wilson, K., Brownstein, J.S.: Early detection of disease outbreaks using the internet. CMAJ 180(8), 829-831 (2009)
Chunara, R., Andrews, J.R., Brownstein, J.S.: Social and news media enable estimation of epidemiological patterns early in the 2010 Haitian cholera outbreak. Am. J. Trop. Med. Hyg. 86(1), 39-45 (2012)
Alanyali, M., Moat, H.S., Preis, T.: Quantifying the relationship between financial news and the stock market. Sci. Rep. 3, 3578 (2013)
Lillo, F., Miccichè, S., Tumminello, M., Piilo, J., Mantegna, R.N.: How news affects the trading behaviour of different categories of investors in a financial market. Quant. Finance 15(2), 213-229 (2015)
Kleinschmit, D., Sjöstedt, V.: Between science and politics: Swedish newspaper reporting on forests in a changing climate. Environ. Sci. Policy 35, 117-127 (2014)
Boykoff, M.T.: Lost in translation? United States television news coverage of anthropogenic climate change, 1995-2004. Clim. Change 86(1-2), 1-11 (2008)
Van Aelst, P., De Swert, K.: Politics in the News: Do Campaigns Matter? A Comparison of Political News During Election Periods and Routine Periods in Flanders (Belgium). Walter de Gruyter GmbH & Co, KG, Belgium (2009)
Eurostat: Practical Guide for Processing Supermarket Scanner Data (2017)
Griffith, R., O'Connell, M.: The use of scanner data for research into nutrition. Fiscal Stud. 30(3-4), 339-365 (2009)
Baron, S., Lock, A.: The challenges of scanner data. J. Oper. Res. Soc. 46(1), 50-61 (1995)
Eurostat: Practical Guide for Processing Supermarket Scanner Data. https://circabc.europa.eu/sd/a/8e1333df-ca16-40fc-bc6a-1ce1be37247c/Practical-Guide-Supermarket. Accessed Oct 2019 (2017)
Diewert, W.E.: Harmonized indexes of consumer prices: their conceptual foundations (2002)
Magruder, S.: Evaluation of over-the-counter pharmaceutical sales as a possible early warning indicator of human disease. Johns Hopkins Univ. APL Tech. Dig. 24(4), 349-353 (2003)
Bonnet, C., Dubois, P., Réquillart, V.: The dynamics of saturated fat consumption in France. Technical report. Toulouse mimeo (2008)
Griffith, R., Leibtag, E., Leicester, A., Nevo, A.: Consumer shopping behavior: how much do consumers save? J. Econ. Perspect. 23(2), 99-120 (2009)
Janssen, A., Parslow, E.: Pregnancy and alcohol purchases: evidence from scanner data. Avail. SSRN 3446559, 12 (2019)
Rider, J., Berck, P., Villas-Boas, S.B.: Eating Healthy in Lean Times: The Relationship Between Unemployment and Grocery Purchasing Patterns (2012)
Van der Grient, H.A., de Haan, J.: The use of supermarket scanner data in Dutch CPI. In: Joint ECE/ILO Workshop on Scanner Data, vol. 10 (2010)
Silver, M., Heravi, S.: Scanner data and the measurement of inflation. Econ. J. 111(472), 383-404 (2001)
Pennacchioli, D., Coscia, M., Rinzivillo, S., Giannotti, F., Pedreschi, D.: The retail market as a complex system. EPJ Data Sci. 3(1), 33 (2014)
Sobolevsky, S., Massaro, E., Bojic, I., Arias, J.M., Ratti, C.: Predicting regional economic indices using big data of individual bank card transactions. In: 2017 IEEE International Conference on Big Data (Big Data), pp. 1313-1318. IEEE (2017)
Panzone, L.A., Wossink, A., Southerton, D.: The design of an environmental index of sustainable food consumption: a pilot study using supermarket data. Ecol. Econ. 94, 44-55 (2013)
Gadema, Z., Oglethorpe, D.: The use and usefulness of carbon labelling food: a policy perspective from a survey of UK supermarket shoppers. Food Policy 36(6), 815-822 (2011)
Brancoli, P., Rousta, K., Bolton, K.: Life cycle assessment of supermarket food waste. Resour. Conserv. Recycl. 118, 39-46 (2017)
Scholz, K., Eriksson, M., Strid, I.: Carbon footprint of supermarket food waste. Resour. Conserv. Recycl. 94, 56-65 (2015)
Goel, S., Hofman, J.M., Lahaie, S., Pennock, D.M., Watts, D.J.: Predicting consumer behavior with web search. Proc. Nat. Acad. Sci. 107(41), 17486-17490 (2010)
Cooper, C.P., Mallon, K.P., Leadbetter, S., Pollack, L.A., Peipins, L.A.: Cancer internet search activity on a major search engine, United States 2001-2003. J. Med. Internet Res. 7(3), e36 (2005)
Polgreen, P.M., Chen, Y., Pennock, D.M., Nelson, F.D., Weinstein, R.A.: Using internet searches for influenza surveillance. Clin. Infect. Dis. 47(11), 1443-1448 (2008)
Hulth, A., Rydevik, G., Linde, A.: Web queries as a source for syndromic surveillance. PLoS ONE 4(2), e4378 (2009)
Yuan, Q., Nsoesie, E.O., Lv, B., Peng, G., Chunara, R., Brownstein, J.S.: Monitoring influenza epidemics in China with search query from Baidu. PLoS ONE 8(5), e64323 (2013)
Ginsberg, J., Mohebbi, M.H., Patel, R.S., Brammer, L., Smolinski, M.S., Brilliant, L.: Detecting influenza epidemics using search engine query data. Nature 457(7232), 1012 (2009)
Google: Google Flu Trends. http://www.google.org/flutrends. Accessed Oct 2019 (2008)
Nsoesie, E., Marathe, M., Brownstein, J.: Forecasting peaks of seasonal influenza epidemics. PLoS Curr. 5, 8 (2013)
Yang, W., Lipsitch, M., Shaman, J.: Inference of seasonal and pandemic influenza transmission dynamics. Proc. Nat. Acad. Sci. 112(9), 2723-2728 (2015)
Wilson, N., Mason, K., Tobias, M., Peacey, M., Huang, Q., Baker, M.: Interpreting "Google Flu Trends" data for pandemic H1N1 influenza: the New Zealand experience. Eurosurveillance 14(44), 19386 (2009)
Chan, E.H., Sahai, V., Conrad, C., Brownstein, J.S.: Using web search query data to monitor dengue epidemics: a new model for neglected tropical disease surveillance. PLoS Neglect. Trop. Dis. 5(5), e1206 (2011)
Althouse, B.M., Ng, Y.Y., Cummings, D.A.: Prediction of dengue incidence using search query surveillance. PLoS Neglect. Trop. Dis. 5(8), e1258 (2011)
Dukic, V.M., David, M.Z., Lauderdale, D.S.: Internet queries and methicillin-resistant Staphylococcus aureus surveillance. Emerg. Infect. Dis. 17(6), 1068 (2011)
Ocampo, A.J., Chunara, R., Brownstein, J.S.: Using search queries for malaria surveillance, Thailand. Malaria J. 12(1), 390 (2013)
Yang, A.C., Tsai, S.J., Huang, N.E., Peng, C.K.: Association of internet search trends with suicide death in Taipei City, Taiwan, 2004-2009. J. Affect. Disord. 132(1-2), 179-184 (2011)
McCarthy, M.J.: Internet monitoring of suicide risk in the population. J. Affect. Disord. 122(3), 277-279 (2010)
Kristoufek, L., Moat, H.S., Preis, T.: Estimating suicide occurrence statistics using Google Trends. EPJ Data Sci. 5(1), 32 (2016)
Adler, N., Cattuto, C., Kalimeri, K., Paolotti, D., Tizzoni, M., Verhulst, S., Yom-Tov, E., Young, A.: How search engine data enhance the understanding of determinants of suicide in India and inform prevention: observational study. J. Med. Internet Res. 21(1), e10179 (2019). https://doi.org/10.2196/10179
Ettredge, M., Gerdes, J., Karuga, G.: Using web-based search data to predict macroeconomic statistics. Commun. ACM 48(11), 87-92 (2005)
Askitas, N., Zimmermann, K.: Google econometrics and unemployment forecasting. Appl. Econ. Quart. 55(2), 107-120 (2009)
D'Amuri, F., Marcucci, J.: "Google it!" Forecasting the US unemployment rate with a Google job search index. MPRA Paper. University Library of Munich, Germany. https://EconPapers.repec.org/RePEc:pra:mprapa:18248 (2009)
Suhoy, T., et al.: Query indices and a 2008 downturn: Israeli data. Technical report. Bank of Israel (2009)
Baker, S., Fradkin, A., et al.: What drives job search? Evidence from Google search data. Discussion Papers, pp. 10-20 (2011)
McLaren, N., Shanbhogue, R.: Using internet search data as economic indicators. Bank Engl. Quart. Bull. 51(2), 134-140 (2011)
Choi, H., Varian, H.: Predicting initial claims for unemployment benefits. Google Inc, pp. 1-5 (2009)
Choi, H., Varian, H.: Predicting the present with Google Trends. Econ. Rec. 88, 2-9 (2012)
Koop, G., Onorante, L.: Macroeconomic nowcasting using Google probabilities. In: First International Conference on Advanced Research Methods and Analytics, CARMA2016. https://doi.org/10.4995/CARMA2016.2016.4213 (2016)
Guzman, G.: Internet search behavior as an economic forecasting tool: the case of inflation expectations. J. Econ. Soc. Meas. 36(3), 119-167 (2011)
Preis, T., Reith, D., Stanley, H.E.: Complex dynamics of our economic life on different scales: insights from search engine query data. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 368(1933), 5707-5719 (2010). https://doi.org/10.1098/rsta.2010.0284
Preis, T., Moat, H.S., Stanley, H.E.: Quantifying trading behavior in financial markets using Google Trends. Sci. Rep. (2013). https://doi.org/10.1038/srep01684
Curme, C., Preis, T., Stanley, H.E., Moat, H.S.: Quantifying the semantics of search behavior before stock market moves. Proc. Natl. Acad. Sci. 111(32), 11600-11605 (2014). https://doi.org/10.1073/pnas.1324054111
Bordino, I., Battiston, S., Caldarelli, G., Cristelli, M., Ukkonen, A., Weber, I.: Web search queries can predict stock market volumes. PLoS ONE 7(7), e40014 (2012)
Moat, H.S., Curme, C., Avakian, A., Kenett, D.Y., Stanley, H.E., Preis, T.: Quantifying Wikipedia usage patterns before stock market moves. Sci. Rep. 3, 1801 (2013)
Qi, H., Manrique, P., Johnson, D., Restrepo, E., Johnson, N.F.: Open source data reveals connection between online and on-street protest activity. EPJ Data Sci. 5(1), 18 (2016a)
Qi, H., Manrique, P., Johnson, D., Restrepo, E., Johnson, N.F.: Association between volume and momentum of online searches and real-world collective unrest. Results Phys. 6, 414-419 (2016b)
Chykina, V., Crabtree, C.: Using Google Trends to measure issue salience for hard-to-survey populations. Socius 4, 2378023118760414 (2018)
Reilly, S., Richey, S., Taylor, J.B.: Using Google search data for state politics research: an empirical validity test using roll-off data. State Polit. Policy Quart. 12(2), 146-159 (2012)
Kleemann, F., Voß, G.G., Rieder, K.: Un(der)paid innovators: the commercial utilization of consumer work through crowdsourcing. Sci. Technol. Innov. Stud. 4(1), 5-26 (2008)
Behrend, T.S., Sharek, D.J., Meade, A.W., Wiebe, E.N.: The viability of crowdsourcing for survey research. Behav. Res. Methods 43(3), 800 (2011)
Paolotti, D., Carnahan, A., Colizza, V., Eames, K., Edmunds, J., Gomes, G., Koppeschaar, C., Rehn, M., Smallenburg, R., Turbelin, C., et al.: Web-based participatory surveillance of infectious diseases: the InfluenzaNet participatory surveillance experience. Clin. Microbiol. Infect. 20(1), 17-21 (2014)
Dalton, C., Durrheim, D., Fejsa, J., Francis, L., Carlson, S., d'Espaignet, E.T., Tuyl, F., et al.: Flutracking: a weekly Australian community online survey of influenza-like illness in 2006, 2007 and 2008. Commun. Dis. Intell. Quart. Rep. 33(3), 316 (2009)
Smolinski, M.S., Crawley, A.W., Baltrusaitis, K., Chunara, R., Olsen, J.M., Wójcik, O., Santillana, M., Nguyen, A., Brownstein, J.S.: Flu Near You: crowdsourced symptom reporting spanning 2 influenza seasons. Am. J. Public Health 105(10), 2124-2130 (2015)
Hashemian, M., Knowles, D., Calver, J., Qian, W., Bullock, M.C., Bell, S., Mandryk, R.L., Osgood, N., Stanley, K.G.: iEpi: an end to end solution for collecting, conditioning and utilizing epidemiologically relevant data. In: Proceedings of the 2nd ACM International Workshop on Pervasive Wireless Healthcare, pp. 3-8. ACM (2012)
Madan, A., Cebrian, M., Moturu, S., Farrahi, K., et al.: Sensing the "health state" of a community. IEEE Pervasive Comput. 11(4), 36-45 (2011)
Martinucci, I., Natilli, M., Lorenzoni, V., Pappalardo, L., Monreale, A., Turchetti, G., Pedreschi, D., Marchi, S., Barale, R., de Bortoli, N.: Gastroesophageal reflux symptoms among Italian university students: epidemiology and dietary correlates using automatically recorded transactions. BMC Gastroenterol. 18(1), 116 (2018)
Green, T.C., Huang, R., Wen, Q., Zhou, D.: Crowdsourced employer reviews and stock returns. J. Financ. Econ. 2, 18 (2019)
Dabirian, A., Kietzmann, J., Diba, H.: A great place to work!? understanding crowdsourced employer branding. Bus. Horiz. 60(2), 197-205 (2017)
Könsgen, R., Schaarschmidt, M., Ivens, S., Munzel, A.: Finding meaning in contradiction on employee review sites: effects of discrepant online reviews on job application intentions. J. Interact. Mark. 43, 165-177 (2018)
Tingzon, I., Orden, A., Sy, S., Sekara, V., Weber, I., Fatehkia, M., Herranz, M.G., Kim, D.: Mapping Poverty in the Philippines Using Machine Learning, Satellite Imagery, and Crowd-sourced Geospatial Information (missing year)
OpenStreetMap Community: OpenStreetMap. https://www.openstreetmap.org/#map=5/42.088/12.564. Accessed Oct 2019 (2004)
Piaggesi, S., Gauvin, L., Tizzoni, M., Cattuto, C., Adler, N., Verhulst, S., Young, A., Price, R., Ferres, L., Panisson, A.: Predicting city poverty using satellite imagery. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 90-96 (2019)
Abelson, B., Varshney, K.R., Sun, J.: Targeting direct cash transfers to the extremely poor. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1563-1572. ACM (2014)
Hersman, E., Okolloh, O., Rotich, J., Kobia, D.: Ushahidi. https://www.ushahidi.com. Accessed Oct 2019 (2008)
Meier, P.: Digital Humanitarians: How Big Data is Changing the Face of Humanitarian Response. Routledge, London (2015)
Grainger, A.: Citizen observatories and the new earth observation science. Remote Sens. 9(2), 153 (2017)
Schneider, P., Castell, N., Vogt, M., Lahoz, W., Bartonova, A.: Making sense of crowdsourced observations: data fusion techniques for real-time mapping of urban air quality. In: EGU General Assembly Conference Abstracts, p. 17 (2015)
Meier, F., Fenner, D., Grassmann, T., Jänicke, B., Otto, M., Scherer, D.: Challenges and benefits from crowd sourced atmospheric data for urban climate research using Berlin, Germany, as testbed. In: ICUC9-9th International Conference on Urban Climate jointly with 12th Symposium on the Urban Environment (2015)
Chapman, L., Bell, C., Bell, S.: Can the crowdsourcing data paradigm take atmospheric science to a new level? A case study of the urban heat island of London quantified using Netatmo weather stations. Int. J. Climatol. 37(9), 3597-3605 (2017)
Lea, S.G., D'Silva, E., Asok, A.: Women's strategies addressing sexual harassment and assault on public buses: an analysis of crowdsourced data. Crime Prev. Commun. Saf. 19(3-4), 227-239 (2017)
Gosselt, J.F., Van Hoof, J.J., Gent, B.S., Fox, J.P.: Violent frames: analyzing Internet Movie Database reviewers' text descriptions of media violence and gender differences from 39 years of US action, thriller, crime, and adventure movies. Int. J. Commun. 9, 547-567 (2015)
Ozkan, T., Worrall, J.L., Zettler, H.: Validating media-driven and crowdsourced police shooting data: a research note. J. Crime Justice 41(3), 334-345 (2018)
Avvenuti, M., Bellomo, S., Cresci, S., La Polla, M.N., Tesconi, M.: Hybrid crowdsensing: a novel paradigm to combine the strengths of opportunistic and participatory crowdsensing. In: Proceedings of the 26th International Conference on World Wide Web Companion, International World Wide Web Conferences Steering Committee, pp. 1413-1421 (2017)
Dennis, J.: United by what divides us: 38 Degrees and the EU referendum. In: EU Referendum Analysis 2016: Media, Voters and the Campaign. Bournemouth University, p. 100 (2016)
Yasseri, T., Bright, J.: Wikipedia traffic data and electoral prediction: towards theoretically informed models. EPJ Data Sci. 5(1), 22 (2016)
Gellers, J.C.: Crowdsourcing global governance: sustainable development goals, civil society, and the pursuit of democratic legitimacy. Int. Environ. Agreements Polit. Law Econ. 16(3), 415-432 (2016)
Burger, R.: Aristotle's Dialogue with Socrates: On the "Nicomachean Ethics". University of Chicago Press, Chicago (2009)
Diener, E.: Subjective well-being. Psychol. Bull. 95(3), 542 (1984)
Veenhoven, R.: How do we assess how happy we are? Tenets, implications and tenability of three theories. Happiness Econ. Polit. 25, 45-69 (2009)
Alesina, A., Di Tella, R., MacCulloch, R.: Inequality and happiness: are Europeans and Americans different? J. Public Econ. 88(9-10), 2009-2042 (2004)
Watson, D., Clark, L.A., Tellegen, A.: Development and validation of brief measures of positive and negative affect: the PANAS scales. J. Pers. Soc. Psychol. 54(6), 1063 (1988)
Watson, D., Clark, L.A.: The PANAS-X: Manual for the Positive and Negative Affect Schedule-Expanded Form. Psychology Publications, New York (1999)
Diener, E., Oishi, S., Tay, L.: Advances in subjective well-being research. Nat. Hum. Behav. 2, 1 (2018)
Hudson, N.W., Anusic, I., Lucas, R.E., Donnellan, M.B.: Comparing the reliability and validity of global self-report measures of subjective well-being with experiential day reconstruction measures. Assessment 2, 26 (2017)
Anusic, I., Schimmack, U.: Stability and change of personality traits, self-esteem, and well-being: the meta-analytic stability and change model of retest correlations. J. Pers. Soc. Psychol. 110(5), 766 (2016)
Tay, L., Chan, D., Diener, E.: The metrics of societal happiness. Soc. Indic. Res. 117(2), 577-600 (2014)
Deaton, A.: Income, health, and well-being around the world: evidence from the Gallup World Poll. J. Econ. Perspect. 22(2), 53-72 (2008)
Easterlin, R.A., Angelescu, L.: Happiness and growth the world over: time series evidence on the happiness-income paradox. Technical report. Institute of Labor Economics (IZA) (2009)
Kahneman, D., Deaton, A.: High income improves evaluation of life but not emotional well-being. Proc. Nat. Acad. Sci. 107(38), 16489-16493 (2010)
Frijters, P., Beatton, T.: The mystery of the u-shaped relationship between happiness and age. J. Econ. Behav. Organ. 82(2-3), 525- 542 (2012)
Stevenson, B., Wolfers, J.: The paradox of declining female hap- piness. Am. Econ. J. Econ. Policy 1(2), 190-225 (2009)
Deaton, A., Stone, A.A.: Understanding context effects for a measure of life evaluation: how responses matter. Oxf. Econ. Pap. 68(4), 861-870 (2016)
Yap, S.C., Wortman, J., Anusic, I., Baker, S.G., Scherer, L.D., Donnellan, M.B., Lucas, R.E.: The effect of mood on judgments of subjective well-being: nine tests of the judgment model. J. Pers. Soc. Psychol. 113(6), 939 (2017)
Lucas, R.E., Lawless, N.M.: Does life seem better on a sunny day? Examining the association between daily weather conditions and life satisfaction judgments. J. Pers. Soc. Psychol. 104(5), 872 (2013)
Kahneman, D., Diener, E., Schwarz, N.: Well-Being: Foundations of Hedonic Psychology. Russell Sage Foundation, New York (1999)
Kahneman, D., Krueger, A.B., Schkade, D.A., Schwarz, N., Stone, A.A.: A survey method for characterizing daily life experience: the day reconstruction method. Science 306(5702), 1776-1780 (2004)
Courvoisier, D.S., Eid, M., Lischetzke, T.: Compliance to a cell phone-based ecological momentary assessment study: the effect of time and personality characteristics. Psychol. Assess. 24(3), 713 (2012)
Shiffman, S., Stone, A.A., Hufford, M.R.: Ecological momentary assessment. Annu. Rev. Clin. Psychol. 4, 1-32 (2008)
Eid, M., Diener, E. (eds.): Handbook of Multimethod Measurement in Psychology. American Psychological Association, New York (2006)
Diener, E., Seligman, M.E.: Beyond money: toward an economy of well-being. Psychol. Sci. Public Interest 5(1), 1-31 (2004)
Costa, P.T., McCrae, R.R.: Influence of extraversion and neuroticism on subjective well-being: happy and unhappy people. J. Pers. Soc. Psychol. 38(4), 668 (1980)
Zweig, J.S.: Are women happier than men? Evidence from the Gallup World Poll. J. Happiness Stud. 16(2), 515-541 (2015)
Deaton, A.S., Tortora, R.: People in Sub-Saharan Africa rate their health and health care among the lowest in the world. Health Aff. 34(3), 519-527 (2015)
Veenhoven, R., Ehrhardt, J.: The cross-national pattern of happiness: test of predictions implied in three theories of happiness. Soc. Indic. Res. 34(1), 33-68 (1995)
Cuñado, J., de Gracia, F.P.: Does education affect happiness? Evidence for Spain. Soc. Indic. Res. 108(1), 185-196 (2012)
Nikolaev, B.: Does higher education increase hedonic and eudaimonic happiness? J. Happiness Stud. 19(2), 483-504 (2018)
Rehdanz, K., Maddison, D.: Climate and happiness. Ecol. Econ. 52(1), 111-125 (2005)
Hudson, J.: Institutional trust and subjective well-being across the EU. Kyklos 59(1), 43-62 (2006)
Hayo, B.: Happiness in Eastern Europe. Marburg Economic Working Paper No 12 (2004)
Ferrer-i-Carbonell, A., Gowdy, J.M.: Environmental degradation and happiness. Ecol. Econ. 60(3), 509-516 (2007)
Gardner, J., Oswald, A.J.: Money and mental wellbeing: a longitudinal study of medium-sized lottery wins. J. Health Econ. 26(1), 49-60 (2007)
Tay, L., Zyphur, M., Batz, C.: Income and Subjective Well-Being: Review, Synthesis, and Future Research. Handbook of Well-Being. DEF Publishers, Salt Lake City (2017)
Wijngaards, I., Hendriks, M., Burger, M.J.: Steering towards happiness: an experience sampling study on the determinants of happiness of truck drivers. Transp. Res. Part A Policy Pract. 128, 131-148 (2019)
van der Zwan, P., Hessels, J., Burger, M.: Happy free willies? Investigating the relationship between freelancing and subjective well-being. Small Bus. Econ. 8, 1-17 (2019)
Blanchflower, D.G., Bell, D.N., Montagnoli, A., Moro, M.: The happiness trade-off between unemployment and inflation. J. Money Credit Bank. 46(S2), 117-141 (2014)
Knabe, A., Schöb, R., Weimann, J.: Partnership, gender, and the well-being cost of unemployment. Soc. Indic. Res. 129(3), 1255-1275 (2016)
Brulé, G., Veenhoven, R.: Why are Latin Europeans less happy? Polyphonic Anthropology-Theoretical and Empirical Cross- Cultural Fieldwork. The Impact of Hierarchy. InTech (2012)
Bartolini, S., Mikucka, M., Sarracino, F.: Money, trust and happiness in transition countries: evidence from time series. Soc. Indic. Res. 130(1), 87-106 (2017)
Ott, J.C.: Good governance and happiness in nations: technical quality precedes democracy and quality beats size. J. Happiness Stud. 11(3), 353-368 (2010)
Fowler, J.H., Christakis, N.A.: Dynamic spread of happiness in a large social network: longitudinal analysis over 20 years in the Framingham Heart Study. BMJ 337, a2338 (2008)
Luhmann, M.: Using big data to study subjective well-being. Curr. Opin. Behav. Sci. 18, 28-33 (2017)
Nederhof, A.J.: Methods of coping with social desirability bias: a review. Eur. J. Soc. Psychol. 15(3), 263-280 (1985)
Quercia, D., Ellis, J., Capra, L., Crowcroft, J.: Tracking gross community happiness from tweets. In: Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work, pp. 965-968. ACM (2012)
Bollen, J., Gonçalves, B., van de Leemput, I., Ruan, G.: The happiness paradox: your friends are happier than you. EPJ Data Sci. 6(1), 4 (2017)
Wilson, T., Hoffmann, P., Somasundaran, S., Kessler, J., Wiebe, J., Choi, Y., Cardie, C., Riloff, E., Patwardhan, S.: OpinionFinder: a system for subjectivity analysis. In: Proceedings of HLT/EMNLP on Interactive Demonstrations. Association for Computational Linguistics, pp. 34-35 (2005)
Bollen, J., Gonçalves, B., Ruan, G., Mao, H.: Happiness is assortative in online social networks. Artif. Life 17(3), 237-251 (2011)
Kramer, A.D., Guillory, J.E., Hancock, J.T.: Experimental evidence of massive-scale emotional contagion through social networks. In: Proceedings of the National Academy of Sciences, p. 201320040 (2014)
Lim, K.H., Lee, K.E., Kendal, D., Rashidi, L., Naghizade, E., Winter, S., Vasardani, M.: The grass is greener on the other side: understanding the effects of green spaces on Twitter user sentiments. In: Companion of The Web Conference 2018, International World Wide Web Conferences Steering Committee, pp. 275-282 (2018)
Mitchell, L., Frank, M.R., Harris, K.D., Dodds, P.S., Danforth, C.M.: The geography of happiness: connecting Twitter sentiment and expression, demographics, and objective characteristics of place. PLoS ONE 8(5), e64417 (2013)
Golder, S.A., Macy, M.W.: Diurnal and seasonal mood vary with work, sleep, and daylength across diverse cultures. Science 333(6051), 1878-1881 (2011)
Lansdall-Welfare, T., Lampos, V., Cristianini, N.: Nowcasting the mood of the nation. Significance 9(4), 26-28 (2012)
Cresci, S., La Polla, M.N., Mazza, M., Tesconi, M., Del Vigna, F.: #selfie: mapping the phenomenon. Consiglio Nazionale delle Ricerche IIT TR-08/2016 Technical Report (2016)
Bollen, J., Mao, H., Pepe, A.: Modeling public mood and emotion: Twitter sentiment and socio-economic phenomena. ICWSM 11, 450-453 (2011)
Dodds, P.S., Harris, K.D., Kloumann, I.M., Bliss, C.A., Danforth, C.M.: Temporal patterns of happiness and information in a global social network: hedonometrics and Twitter. PLoS ONE 6(12), e26752 (2011)
Iacus, S.M., Porro, G., Salini, S., Siletti, E.: Social networks, happiness and health: from sentiment analysis to a multidimensional indicator of subjective well-being. arXiv preprint arXiv:1512.01569 (2015)
Ceron, A., Curini, L., Iacus, S.M.: Social Media e Sentiment Analysis: L'evoluzione dei fenomeni sociali attraverso la Rete, vol. 9. Springer, New York (2014)
Ceron, A., Curini, L., Iacus, S.M.: ISA: a fast, scalable and accurate algorithm for sentiment analysis of social media content. Inf. Sci. 367, 105-124 (2016)
Curini, L., Iacus, S., Canova, L.: Measuring idiosyncratic happiness through the analysis of Twitter: an application to the Italian case. Soc. Indic. Res. 121(2), 525-542 (2015)
Durahim, A.O., Coşkun, M.: #iamhappybecause: gross national happiness through Twitter analysis and big data. Technol. Forecast. Soc. Change 99, 92-105 (2015)
Coviello, L., Sohn, Y., Kramer, A.D., Marlow, C., Franceschetti, M., Christakis, N.A., Fowler, J.H.: Detecting emotional contagion in massive social networks. PLoS ONE 9(3), e90315 (2014)
Algan, Y., Beasley, E., Guyot, F., Higa, K., Murtin, F., Senik, C., et al.: Big Data Measures of Well-Being: Evidence from a Google Well-Being Index in the United States. OECD Statistics Working Papers 2016 (2016)
Lane, N.D., Miluzzo, E., Lu, H., Peebles, D., Choudhury, T., Campbell, A.T.: A survey of mobile phone sensing. IEEE Commun. Mag. 48(9), 140-150 (2010)
Staiano, J., Lepri, B., Aharony, N., Pianesi, F., Sebe, N., Pentland, A.: Friends don't lie: inferring personality traits from social network structure. In: Proceedings of the 2012 ACM Conference on Ubiquitous Computing, pp. 321-330. ACM (2012)
Li, G., Zheng, Y., Fan, J., Wang, J., Cheng, R.: Crowdsourced data management: overview and challenges. In: Proceedings of the 2017 ACM International Conference on Management of Data, pp. 1711-1716. ACM (2017)
Lathia, N., Sandstrom, G.M., Mascolo, C., Rentfrow, P.J.: Happier people live more active lives: using smartphones to link happiness and physical activity. PLoS ONE 12(1), e0160589 (2017)
Asai, A., Evensen, S., Golshan, B., Halevy, A., Li, V., Lopatenko, A., Stepanov, D., Suhara, Y., Tan, W.C., Xu, Y.: HappyDB: a corpus of 100,000 crowdsourced happy moments. arXiv preprint arXiv:1801.07746 (2018)
Bogomolov, A., Lepri, B., Pianesi, F.: Happiness recognition from mobile phone data. In: Social Computing (SocialCom), 2013 International Conference on Social Computing, pp. 790-795. IEEE (2013)
Goldberg, L.R.: An alternative "description of personality": the big-five factor structure. J. Pers. Soc. Psychol. 59(6), 1216 (1990)
Carlquist, E., Nafstad, H.E., Blakar, R.M., Ulleberg, P., Delle Fave, A., Phelps, J.M.: Well-being vocabulary in media language: an analysis of changing word usage in Norwegian newspapers. J. Positive Psychol. 12(2), 99-109 (2017)
Seligman, M.E.: Flourish: A New Understanding of Happiness and Well-Being and How to Achieve Them. Nicholas Brealey, Boston (2011)
Greco, M., Stenner, P.: Happiness and the art of life: diagnosing the psychopolitics of wellbeing. Health Cult. Soc. 5(1), 1-19 (2013)
Coulton, C.J., Goerge, R., Putnam-Hornstein, E., de Haan, B.: Harnessing Big Data for Social Good: A Grand Challenge for Social Work, pp. 1-20. American Academy of Social Work and Social Welfare, Cleveland (2015)
Lepri, B., Staiano, J., Sangokoya, D., Letouzé, E., Oliver, N.: The tyranny of data? The bright and dark sides of data-driven decision-making for social good. In: Transparent Data Mining for Big and Small Data, pp. 3-24. Springer (2017)
Floridi, L., Taddeo, M.: What is data ethics? The Royal Society (2016)
Hand, D.J.: Aspects of data ethics in a changing world: where are we now? Big Data 6(3), 176-190 (2018)
