International Journal of Data Science and Analytics
https://doi.org/10.1007/s41060-020-00224-2
REVIEW
Measuring objective and subjective well-being: dimensions and data
sources
Vasiliki Voukelatou1 · Lorenzo Gabrielli2 · Ioanna Miliou3 · Stefano Cresci4 · Rajesh Sharma5 · Maurizio Tesconi4 ·
Luca Pappalardo2
Received: 18 July 2019 / Accepted: 8 May 2020
© The Author(s) 2020
Abstract
Well-being is an important value for people’s lives, and it could be considered as an index of societal progress. Researchers have
suggested two main approaches for the overall measurement of well-being, the objective and the subjective well-being. Both
approaches, as well as their relevant dimensions, have been traditionally captured with surveys. During the last decades, new
data sources have been suggested as an alternative or complement to traditional data. This paper aims to present the theoretical
background of well-being, by distinguishing between objective and subjective approaches, their relevant dimensions, the new
data sources used for their measurement and relevant studies. We also intend to shed light on still barely explored dimensions
and data sources that could potentially serve as a key for public policy-making and social development.
Keywords Well-being · Objective well-being · Subjective well-being · Well-being dimensions · New data sources · Data
science for social good · Artificial intelligence for social good
Correspondence: Luca Pappalardo
1 Scuola Normale Superiore and ISTI-CNR, Pisa, Italy
2 ISTI-CNR, Pisa, Italy
3 University of Pisa, Pisa, Italy
4 IIT-CNR, Pisa, Italy
5 University of Tartu, Tartu, Estonia
1 Introduction
Economists and policy-makers have traditionally considered gross domestic product (GDP) as a good indicator of well-being in society, mainly because it is strongly linked with the standard of living indicators [1]. However, GDP has been criticized as a weak indicator of well-being and, therefore, a misleading tool for public policies [2]. The Stiglitz Commission [3] in 2009 observed that other statistical tools should be used, complementary to GDP, for the measurement of well-being. Therefore, considering that well-being is difficult to capture with GDP alone, researchers with various backgrounds, from economists to psychologists, suggested two main approaches to measuring overall well-being: objective well-being and subjective well-being.
Defining objective well-being has always been considered a challenging task, and therefore researchers have focused on exploring its dimensions rather than its definition [4,5]. Because of its objective nature, one could claim that objective well-being could be measured in terms of GDP. However, it must reflect both people’s material living conditions and the quality of their lives. In fact, the Organisation for Economic Co-operation and Development (OECD) [6], the United Nations Development Programme (UNDP) [7] and the Italian Statistics Bureau (ISTAT) [8] have identified six major objective and observable dimensions for its measurement: health, job opportunities, socioeconomic development, environment, safety, and politics. All these dimensions together represent the objective well-being, which is assessed through the extent
to which these “needs” are satisfied. The objective approach investigates the objective dimensions of a good life, whereas the subjective approach examines people’s subjective evaluations of their own lives. In 2013, the OECD [9] recognized the importance of taking into consideration people’s perceived well-being, labeled as subjective well-being, when investigating the overall well-being. Subjective well-being, also called happiness, has been defined by Veenhoven [10] as the degree to which an individual assesses the overall quality of her life-as-a-whole favorably. This assessment may well differ from what GDP suggests, since GDP cannot be representative of societal happiness. Indeed, GDP explains only a small proportion of its variations on humans [11], and it might be different from people’s perceptions of their well-being [12]. Therefore, subjective well-being has been traditionally captured through studies based on data collected by self-reports. These studies highlight five main dimensions of subjective well-being: the role of human genes, which seem to be fairly heritable [13–21], universal needs, meaning basic and psychological needs [22–24], social environment, such as education and health [25–29], economic environment, including a lot of research on income [30–34], and political environment, such as democracy and political freedom [35,36].
Traditionally, both objective and subjective well-being are measured through surveys of household income and consumption [37]. Although these surveys have been considered accurate and valid, they bring some considerable disadvantages. For example, they cannot provide constant updates of well-being to policy-makers, and they are costly to conduct, making it difficult for many developing countries to estimate well-being frequently. The last few years have witnessed a drastic change in the approaches used to measure well-being. Researchers of different disciplines propose several innovative data sources and methods, which could potentially overcome the limitations of the traditional methods for the individual and collective well-being measurement, both objective and subjective.
To support research in this direction, the European project SoBigData [38] has created a virtual environment within a research infrastructure that provides theoretical knowledge, data, and innovative methods to scholars that want to address challenging questions involving both objective and subjective well-being.
Therefore, in line with the purposes mentioned above and the support of SoBigData, the aim of this paper is to provide the theoretical background on objective and subjective well-being, including their relevant dimensions. Additionally, the article seeks to present to researchers the new data sources used for capturing well-being, as well as discuss indicative existing studies.
We believe that this study offers great value to the scientific community and especially to researchers interested in “Data Science for Social Good” (DS4SG) or similarly “Artificial Intelligence for Social Good” (AI4SG) [39], since it could work as a reference point for adequate measurement of well-being with the use of innovative data sources and tools. In particular, at this critical moment, when the global society is under financial and political crisis and instability, policy-makers need frequent updates of well-being. This could help them react in time by applying the right policies to prevent detrimental societal effects and contribute effectively to societal progress.
The remainder of this paper is organized as follows: It is divided into two main sections, as suggested by the literature, i.e. objective and subjective well-being. In particular, Sect. 2 is dedicated to objective well-being and Sect. 3 is dedicated to subjective well-being. For both sections, we provide a theoretical background on objective and subjective well-being and their dimensions respectively. We then provide the data sources used for monitoring well-being. Besides, we present essential studies on well-being; to present them in an organized flow, we categorize the presentation of the studies by matching each well-being dimension separately with each data source. Finally, in Sect. 4, we provide a discussion on the study, highlighting the opportunities for future research on well-being.
2 Measuring objective well-being
Suggesting a single definition of objective well-being is a substantial challenge, mainly due to its multi-dimensionality. Therefore, researchers have focused on carefully specifying its objectively measurable dimensions [4,5]. Objective well-being is traditionally captured through surveys, such as household income and consumption surveys [37]. However, such surveys are usually very costly and time-consuming [40], making it difficult for many countries and global institutes to update their estimates frequently. Therefore, the last few years have witnessed a change in the way of measuring objective well-being. In particular, researchers of various disciplines propose several methodologies to measure individual and collective objective well-being, based on a combination of new data sources and traditional surveys [41–44]. The United Nations also stimulate this change of studying well-being in two recent reports, where the usage of new, mostly big, data sources is encouraged for the investigation of patterns of phenomena related to people’s health and well-being [45,46].
2.1 The dimensions of objective well-being
During the last years, public institutions and non-governmental companies have worked on identifying dimensions that are considered essential for the improvement of the societal well-being and its comparison between countries and years.
For example, the Organisation for Economic Co-operation and Development (OECD) has identified 11 essential topics labeled as OECD well-being framework [6]; the United Nations Development Programme (UNDP) has identified 17 sustainable development goals, labeled as SDGs [7]; and the Italian Statistics Bureau (ISTAT) has created an ambitious project named “Benessere Equo e Sostenibile” (BES) that stands for “Fair and Sustainable Well-being” [8]. From the initiatives mentioned above, it is evident that for different institutions, well-being dimensions might be different, sometimes vague, and statistically hard to capture. Therefore, based on the aforementioned official authorities, we suggest the following concrete and measurable dimensions of well-being (Fig. 1).
Fig. 1 The figure relates the sources of data (left) with the dimensions of the objective well-being (right)
2.1.1 Health
Health status represents an essential factor for people’s well-being, as shown by the WHO Commission on Macroeconomics and Health in 2001 at global level [47], and by the Lisbon Strategy for Growth and Jobs in 2000 [48]. Health brings together many other benefits, from job opportunities to social relationships, from reduced health care costs to an increased life expectancy. Indeed, there have been remarkable gains in life expectancy over the past 50 years in OECD countries [49], due to the health care spending growth, lifestyle, educational, and environmental changes. Chronic (non-communicable) diseases, such as cancer, diabetes, and chronic respiratory conditions, are nowadays the primary disability and mortality factors in OECD countries. Fortunately, some indicators can help prevent the diseases mentioned above. For example, the number of people who drive carefully, who are non-smokers or who do not drink a large amount of alcohol are risk indicators, which, if taken into consideration, could contribute to an improvement in the health status of a territory.
2.1.2 Job opportunities
This is a crucial dimension of well-being since it has obvious economic and societal benefits, contributing to people’s health and societal, political, and economic stability. The job opportunities dimension is composed of three main determinants: employment rate, quality of work, and work–life balance. The employment rate is a crucial aspect since individuals in countries with a high level of employment are well connected in society. In particular, it is a proxy used by policy-makers to avoid poverty and social exclusion. The second determinant is the quality of work, in terms of objective working stability, remuneration, skills, and safety at work, which might show some differences between different working environments. Moreover, work–life balance is the determinant that mainly aims to capture the balance between work and life. In the OECD countries, a full-time worker devotes 62% of the day on average (15 hours) to personal care (e.g., eating, sleeping) and leisure (e.g., socializing with friends and family, hobbies) [50]. This determinant is mainly created to capture women’s work–life balance. Indeed, the
quality of a country’s employment is measured by the balance women have between family care and paid work.
2.1.3 Socioeconomic development
While socioeconomic indicators alone do not suffice to represent societal well-being, it cannot be doubted that they positively influence it. The variables that contribute to its measurement are income, wealth, consumption expenditure, housing conditions, and possession of consumer durables, and it can implicitly influence access to university, health care, and more. In particular, the Organisation for Economic Co-operation and Development (OECD) [6] and the Italian Statistics Bureau (ISTAT) [8] suggest two main determinants that constitute the overall economic well-being: available income and wealth, and consumption expenditure.
In a market economy, income measures the purchasing capacity of individuals, and it is thus an essential predictor of economic well-being. Wealth, on the other hand, takes into account savings, monetary gold, stocks, securities, and loans [51]. Therefore, wealth could be considered an essential source of revenue, which could make people less vulnerable to difficult economic situations that might affect their life. Additionally, consumption expenditure is a direct estimate of the goods and services that contribute to determining the living conditions of individuals. Unlike income, consumption expenditure can contribute to making interpersonal comparisons, since it captures whether each individual can acquire her desired goods and services.
2.1.4 Environment
A healthy natural environment is essential for all individuals’ well-being in society. Clean water, clean air, and uncontaminated food are examples of goods that are only possible in an environmental context where humans’ productive and social activities respect the environment and its natural resources. For the reasons mentioned above and due to the recent environmental crisis, the United Nations set sustainable environmental goals [7], such as Clean Water and Sanitation, Climate Action, and more. Similarly, ISTAT [8] suggests five determinants for describing how society and the environment interact. These determinants are quality of the water, quality of the air, quality of the soil and the land, biodiversity, and matter, energy, and climate change. Finally, the “OECD Environmental Outlook to 2050” projects the number of premature deaths associated with exposure to PM10 and PM2.5 to increase from just over 1 million worldwide in 2000 to about 3.5 million in 2050 [52]. Therefore, the more these determinants are taken into consideration by policy-makers and by citizens’ activities, the more the citizens can contribute to radical changes for the protection of societal well-being.
2.1.5 Safety
It includes the risk of people being physically assaulted or falling victim to other crimes, and of suffering consequences such as economic loss, physical damage, and post-traumatic stress. Reducing violent crime, sex trafficking, forced labor, and child abuse are clear global goals, as suggested by the United Nations [7]. Besides, the Italian BES project [8] suggests that safety is characterized by two determinants: criminality and violence.
Criminality is one of the most common security threats in developed and emerging countries, and it has both a direct and indirect impact on people. It directly influences individuals’ health (physical and mental) and economic situation. According to the latest OECD data, the average homicide rate in the OECD is 3.6 murders per 100,000 inhabitants [53]. Indirectly, criminality has an impact on non-victims’ well-being when they belong to victims’ social networks or through news spread on (social) media.
Another determinant is violence suffered inside and outside the family, and it has both a direct and indirect impact on people. In particular, victims suffer from the direct effects, which can last for long periods, if not for the whole life, depending on individuals’ ability to manage their daily life, medical expenses, dependence on others, and capacity to achieve happiness. Indirectly, it causes insecurity and anxiety, which brings difficulties in their daily activities [54].
2.1.6 Politics
This dimension is also essential for objective well-being. Today, due to the economic crisis, more than ever, citizens demand greater transparency from their Governments and the Public Institutions. Fair civic and political participation, as well as transparency, do not only contribute directly to well-being but also indirectly since they allow greater efficiency of public policies, a lower cost of transactions, and the minimization of the risk of fraud. Therefore, two determinants fall under this category, which are associated with the Public Sphere as a driver of the individuals’ well-being, on either local or national level: civic and political engagement, and trust and social cohesion. Voter turnout is the best existing means of measuring civic and political engagement, and is measured as the percentage of the registered population that voted during elections. According to OECD data, voter turnout averages 69% in OECD countries, which shows that not everyone exercises the voting right [55]. Regarding trust and social cohesion, OECD suggests public engagement (e.g., stakeholder engagement) for developing regulations [55]. If citizens have the possibility to participate in the development of laws and regulations, it is more likely that they will trust the government institutions and they will comply with the societal rules.
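As a small illustration of the voter turnout indicator defined above (the percentage of the registered population that voted), the following Python sketch computes turnout per region and for the pooled population. The function name `voter_turnout`, the region labels, and all figures are hypothetical and are not taken from OECD data.

```python
def voter_turnout(votes_cast: int, registered_voters: int) -> float:
    """Return turnout as a percentage of the registered population."""
    if registered_voters <= 0:
        raise ValueError("registered_voters must be positive")
    if not 0 <= votes_cast <= registered_voters:
        raise ValueError("votes_cast must lie between 0 and registered_voters")
    return 100.0 * votes_cast / registered_voters


# Hypothetical (votes cast, registered voters) per region; illustrative only.
regions = {
    "region_a": (6_900, 10_000),
    "region_b": (4_500, 9_000),
}

# Turnout of each region taken separately.
turnout_by_region = {
    name: voter_turnout(votes, registered)
    for name, (votes, registered) in regions.items()
}

# Pooled turnout: sum votes and registrations first, so that larger
# regions weigh more than they would in a plain average of percentages.
pooled_turnout = voter_turnout(
    sum(votes for votes, _ in regions.values()),
    sum(registered for _, registered in regions.values()),
)
```

Pooling the counts before dividing is one possible aggregation choice; a simple mean of regional percentages would treat every region equally regardless of its size, and the two figures generally differ.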
2.2 Data sources for monitoring the dimensions of objective well-being
Figure 1 describes the new data sources (left) that have been used to estimate one or more dimensions of objective well-being (right). The presence of a link in Fig. 1 between a data source and a dimension indicates that there are papers in the literature on monitoring that dimension with that data source. In this section, we describe, for each data source, its features (e.g., the process of data collection, its biases and limitations) and the main works in the literature that use it to measure several dimensions of objective well-being. Table 1 provides a summary of the data sources used, highlighting the pros and cons of each one. We refer to a link between a data source and a dimension using a letter-number notation. For example, B3 indicates the link between GPS data (B) and socioeconomic development (3).
Table 1 Pros and cons for each data source used for the measurement of objective well-being
Data source | Pros | Cons
CDRs | Temporal and social dimensions, worldwide diffusion, repeatability | Not publicly available, sparsity, geographically imprecise
GPS and transportation | Coverage of rural areas, unbiased and classified, real-time monitoring | Privacy issues, indoor spatial inaccuracy
Social Media | Measuring social dynamics, publicly available | Privacy issues, overrepresentation, social desirability bias
Health and Fitness | Cost effective, applicable for multiple studies, prediction of near-term risk of events | Not publicly available, not necessarily representative of the population, limited time slots
News | Variety of subject domains, range of targets, 24/h updated, archived historical news | Gatekeeping bias, coverage bias, statement bias
Retail Scanners | Modeling of dynamic household behavior, control time-invariant characteristics, long term coverage, quality improvement of HICP | Dependency on retailer’s permission, legal constraints
Web Search | Publicly available, speed, convenience, flexibility, ease of analysis | Population size varies across domains, hard identifying relevant queries
Crowdsourcing | Large number of data, speed, relative low cost | Risk of low-quality results, trade-off between quality and cost
2.2.1 CDRs
Many works in the literature are based on the analysis of mobile phone data, the so-called CDRs (Call Detail Records) of calling and texting activity of users, because they guarantee the repeatability of experiments in different countries and on different scales given the worldwide diffusion of mobile phones [56].
CDRs collect geographical, temporal, and interaction information on mobile phone use [57–62], hence providing a comprehensive picture of human behavior at a societal scale. Each time an individual makes a call, the mobile phone operator registers the connection between the caller and the callee, the duration of the call, and the coordinates of the phone tower communicating with the served phone. Table 2 illustrates an example of the structure of CDRs.
Table 2 Example of Call Detail Records (CDRs). Every time a user makes a call, a record is created with timestamp, the phone tower serving the call, the caller identifier and the callee identifier (a). For each tower, the latitude and longitude coordinates are available to map the tower on the territory (b)
(a)
Timestamp | Tower | Caller | Callee
2007/09/10 23:34 | 36 | 4F80460 | 4F80331
2007/10/10 01:12 | 36 | 2B01359 | 9H80125
2007/10/10 01:43 | 38 | 2B19935 | 6W1199
... | ... | ... | ...
(b)
Tower | Latitude | Longitude
36 | 49.54 | 3.64
37 | 48.28 | 1.258
38 | 48.22 | -1.52
... | ... | ...
Note that CDRs suffer from different types of bias [63,64]. For example, the position of a user is known at the granularity level of phone towers, and only when they make a phone call. Moreover, phone calls are sparse in time, i.e., the time between consecutive calls follows a heavy tail distribution [65,66]. In other words, since users are inactive most of their time, CDRs allow reconstructing only a subset of a user’s behavior.
CDRs are used to monitor several dimensions of well-being, notably health (A1), job opportunities (A2), socioeconomic development (A3), environment (A4), and safety (A5).
CDRs provide one of today’s most exciting opportunities to study human mobility and its influence on disease dynamics (A1). Many researchers use mobile phone data for public health, as the analysis of individual and population mobility patterns is more objective and with finer spatiotemporal resolution in comparison to traditional methods. Furthermore, mobile network data can also provide insights into human behavior that can support the assessment and monitoring of the health of specific communities at risk, thus paving the way toward improved health promotion and prevention [67]. Taking into consideration that the spatiotemporal evolution of human mobility and the related fluctuations of population density are essential drivers of disease outbreaks, Finger et al. [68] use CDRs to track the cholera outbreak in 2005 in Senegal. Findings show that a mass gathering taking place during the initial phase of the outbreak has an essential impact on the course of the disease. Besides, Kafsi et al. [69] contribute to the fight against epidemics of infectious diseases using CDRs provided by France Telecom-Orange. They use 2.5 billion calls made by 5 million users in the Ivory Coast, recorded over 5 months, from December 2011 to April 2012, to study and model behavioral patterns of the affected population and propose several strategies for personalized behavioral recommendations to reduce the infections. Lima et al. [70] use the same data set to build a model that describes how diseases circulate around the country as people move between regions, and they enhance the model with a concurrent process of relevant information spreading. This process corresponds to people disseminating disease prevention information, e.g., hygiene practices, vaccination campaign notices, and others, within their social network. Finally, Madan et al. [71] use CDRs and mobile phone-based co-location sensing to measure characteristic behavior changes in symptomatic individuals. These behavior changes are reflected in their total communication, interactions with respect to time of day, diversity, and entropy of face-to-face interactions and movement. Using these extracted mobile features, they manage to predict the health status of an individual, without having actual health measurements from the subject.
Besides, researchers use CDRs to study job opportunities (A2). Pappalardo et al. [72] use CDRs to study the link between human mobility and the employment rate of French cities, finding a strong correlation between measures of mobility entropy and the unemployment rate in urban environments. Toole et al. [73] show that changes in the calling behaviors of individuals, aggregated at regional level, can improve forecasts of macro unemployment rates. Sundsøy et al. [74] use CDRs to create a model which predicts unemployment with an accuracy of 70.4%. They also provide promising support to the collection of data for populations in developing countries, which are often under-represented in official surveys.
Most researchers use CDRs to investigate socioeconomic development (A3). A seminal work analyzes landline calls and a nationwide mobile phone data set to show that, in the UK, regional communication diversity is positively associated with a socioeconomic ranking [75]. Other works address the issue of mapping poverty [76] and other socioeconomic determinants [77] with mobile phone communication data, combined with airtime credit purchases data in the Ivory Coast [78]. Blumenstock et al. [79,80] show preliminary evidence of a relationship between individual wealth and the history of mobile phone transactions. Frias-Martinez et al. [81–84] analyze the relationship between human mobility and the socioeconomic status of urban zones, presenting which mobility indicators correlate best with socioeconomic levels and building a model to predict the socioeconomic level from mobile phone traces. Pappalardo et al. [85] analyze mobile phone data and extract meaningful mobility measures for cities, discovering an interesting correlation between human mobility aspects and socioeconomic determinants. Lotero et al. [86] analyze the architecture of urban mobility networks in two Latin-American cities from the multiplex perspective. They discover that the socioeconomic characteristics of the population have an extraordinary impact on the layer organization of these multiplex systems. In a successive work, Lotero et al. [86] analyze urban mobility in Colombia representing cities by mobility networks. They encode the origin-destination trips performed by a subset of the population corresponding to a particular socioeconomic status and they show that spatial and temporal patterns vary across these socioeconomic groups. Amini et al. [87] use mobile phone data to compare the human mobility patterns of a developing country (the Ivory Coast) and a developed country (Portugal). They show that cultural diversity in developing regions can present challenges to mobility models defined in less culturally diverse regions. Smith-Clarke et al. [88] analyze the aggregated mobile phone data of two developing countries and extract features that are strongly correlated with poverty indexes derived from official statistics census data.
Moreover, researchers use CDRs to monitor the quality of the environment and its impact on people’s lives (A4). For example, Picornell et al. [89] evaluate the population exposure to NO2 in a recently published study. They use CDRs from one of the three most important Spanish mobile phone network operators (MNOs), with around 30% market share. The analysis is conducted for the capital of Spain, Madrid, for the 17th of November 2014, as a typical day in terms of population mobility and NO2 levels. Comparing the results with traditional census-based methods, they demonstrate relevant discrepancies at disaggregated levels and underline the importance of integrating CDRs data for the evaluation of population exposure to NO2. Lu et al. [90] study people’s behavior affected by climate stress. In particular, by exploring the individuals’ behavioral response to the Cyclone Mahasen, which struck Bangladesh in May 2013, they find out that anomalous patterns of mobility and calling frequency correlate with rainfall intensity, showing the
affected regions and when the storm moves. Lu and Bengts- Table 3 Example of GPS records
son [91,92] analyze the movement of 1.9 million mobile Vid Timestamp Latitude Longitude
phone users before and after the 2010 Haiti earthquake, and
they show that CDRs can be a valid data source for estimates 63 2014-06-18 06:31:24 43.557703 10.337913
of population movements during disasters. Wilson et al. [93] 63 2014-06-18 06:31:26 43.557725 10.33794
build a tool within nine days of the Nepal earthquake of 2015, 63 2014-06-18 06:31:27 43.557735 10.337955
to provide spatiotemporally detailed estimates of population .. .. .. ..
. . . .
displacements from CDRs based on movements of 12 million
mobile phones users. Nyarku et al. [94] use CDRs to explore The collected GPS data consist of the sequence of space-time detec-
whether mobile phones could be reliably used to monitor tions of vehicles on which the positioning device is installed. Every
time a vehicle switches on, a record is created consisting of the vehicle
individual exposure to selected air pollutants when moving identifier, timestamp, the latitude and longitude coordinates
between indoor and outdoor microenvironments. In particular, data are collected from two BROAD life
mobile phones, which are equipped with sensors for direct measurements of air pollutants. The two
phones yield similar results, both for particles and formaldehyde, making them potentially suitable
for applications in polluted environments, even if there seem to be some exceptions where the
readings of the two phones do not correspond well to each other. Liu et al. [95] map personal
trajectories using mobiles in an urban environment to assess the impact of traffic-related air
pollution in society. They estimate traffic pollution exposure to individuals based on the exposure
along the individual human trajectories in the estimated pollution concentration fields by utilizing
modeling tools and manage to identify trajectory patterns of particularly exposed human groups. In
addition, Decuyper et al. [96] use CDRs to study food security indicators, finding a strong
correlation between the consumption of vegetables rich in vitamins and airtime purchase.

Other studies focus on the safety dimension (A5). Bogomolov et al. [97] use CDRs for 3 weeks, from
the 9th to the 15th of December 2012 and from the 23rd of December 2012 to the 5th of January 2013,
in combination with demographic data from December 2012 to January 2013, to predict crime in the
city of London. Experimental results show 70% accuracy in predicting whether an area could be a
crime hotspot or not. Similarly, Ferrara et al. [98] study criminal networks to detect and
characterize criminal organizations in networks reconstructed from the CDRs. They also introduce an
expert system to support law enforcement agencies in unveiling the underlying structure of criminal
networks.

2.2.2 GPS and transportation data

Since the 1990s, Global Positioning Systems (GPS) have been used for tracking the movements of
individuals [99–102]. In particular, GPS data provide time and location coordinates, which can be
used to link locations with environments and to calculate the speed of movements [103]. For
insurance reasons, some vehicles have a black box installed. The device records the position of the
vehicle at regular intervals and sends it to the database. Table 3 illustrates an example of the
structure of GPS records.

GPS data can also cover rural areas, as opposed to other data, mostly collected among citizens of
urban areas [104]. Compared with the traditional ways of measuring mobility, usually self-reports
assessed with questionnaires, GPS does not introduce biases and misclassification [104,105], as it
eliminates the social desirability bias usually introduced by self-report participants [106,107].
Another advantage of GPS data is that they provide real-time monitoring. However, while there are
studies based on GPS data covering hundreds of thousands of individuals [108], most GPS studies are
conducted with fewer than 300 participants [104,109], usually due to privacy issues. Apart from
this drawback, when a GPS is used indoors, the spatial accuracy of the measurements is reduced
[110], which creates problems in specific fields, such as epidemiological research.

GPS data are used to explore several dimensions of objective well-being, notably health (B1),
socioeconomic development (B3), and safety (B5).

Health (B1) exploration has also attracted the interest of researchers. For example, Saelens et al.
[111] track the movements of individuals through GPS devices and bring to the surface growing
evidence that transit users are more physically active than non-transit users, which could
potentially lead to health improvements for the former. Similarly, Rundle et al. [112] explore
health in terms of physical activity, and conclude that neighborhood walkability influences other
residents’ choice of space utility and is also associated with higher weekly physical activity.
Additionally, Sadler et al. [113] use GPS data to understand children’s exposure to junk food in
Canada and compare the results to a validated food environment database. They demonstrate that
official results underestimate exposure to junk food by up to 68%, which should be taken into
consideration by policy-makers. Finally, Canzian and Musolesi [114] analyze mobility patterns from
GPS traces to answer whether mobile phones can be used to monitor individuals affected by
depressive mood disorders. They develop a smartphone application that periodically collects the
locations of the users and the answers to daily questionnaires that quantify their depressive mood.
They find
Table 4 The table contains a subset of the information returned by a Twitter API

Id       Coordinates  Hashtags     Mentions        Text   Profile info  …
240556   null         #ny #dinner  [10214;452879]  …..    {…..}         …
4261063  NY           null         null            …..    {…..}         …
72096    42.10;10.2   #wellbeing   [964215]        …..    {…..}         …

If the user activates a localization system, the tweet also contains information on the position (longitude,
latitude or city) from which the tweet is sent. Each tweet contains the information of the user profile and
mentions or hashtags used in the text
a significant correlation between mobility trace characteristics and the depressive moods of
individuals.

Some of these works using GPS data focus on exploring socioeconomic development (B3). Marchetti et
al. [115] perform a study at regional level, analyzing GPS tracks from cars in Tuscany to extract
measures of human mobility at province and municipality level. They find that there is a strong
correlation between the mobility measures and a poverty index independently surveyed by the Italian
official statistics institute. Smith et al. [116] use an automated fare collection data set of
journeys made on the London rail system to build a classifier that identifies areas of the city
with high economic deprivation. They highlight that, given its high precision, the classifier
provides potential benefits for city planning and policy-making. Lathia et al. [117] use the same
data set to find that more deprived areas tend to receive passenger flow from a higher number of
other areas compared to less deprived areas, also uncovering some evidence of social segregation.

Another objective well-being dimension that is explored with GPS data is safety (B5). Robinson et
al. [118] compare the spatial distribution of objective crime incidents and self-reported physical
activity among adolescents in Massachusetts, between 2011 and 2012, and show that there is a
positive association between them (r = 0.72, p < 0.0001). Ariel et al. [119] use GPS data to
replicate findings published in official US research on the effect of hot spots policing for the
prevention of crime in England and Wales and demonstrate that victim-generated crimes (the primary
outcome measured in previous studies) increase in both the near vicinity and in catchment areas.

2.2.3 Social media data

Social media, such as Twitter, Facebook, and Instagram, can be considered as a digital database of
information about online users, hence rendering individuals’ online activities accessible for
analysis. Given this enormous potential, researchers, governments, and corporations are turning
their attention to social media to better understand human behavior and interactions [120]. Among
all social media, Twitter is the most widely used, since it provides public access to data through
APIs with the least restrictive policy. The Twitter APIs return information about locations, date
of the event, interactions with other users, or tags inserted in the tweet. Twitter also returns
some information about the user profile. Table 4 illustrates an example of the structure of Twitter
records.

Despite their indubitable usefulness, social media data also raise some concerns [121]. First of
all, they may reflect social desirability biases, since individuals manage their online profiles
[122]. Besides, social media users may not be as representative of the general population as
traditional anonymized self-reports conducted through a chosen representative sample [123].

All dimensions of objective well-being are monitored through social media data, i.e., health (C1),
job opportunities (C2), socioeconomic development (C3), environment (C4), safety (C5) and politics
(C6).

Several studies provide valuable insights into how the analysis of social media data can lead to
next-generation automated methodologies for public health (C1). As an example, Eichstaedt et al.
[123] use Twitter data, in combination with atherosclerotic heart disease (AHD) mortality rates and
county-level socioeconomic variables. They predict county-level heart disease mortality, since the
language expressed on Twitter reveals important psychological characteristics that are
significantly associated with heart disease mortality risk. Besides, De Choudhury et al. [124] use
Twitter data in combination with traditional depression screening test data for the detection and
diagnosis of individuals’ major depressive disorders and even to predict the likelihood of
depression of individuals. Signorini et al. [125] use data from Twitter to track rapidly-evolving
public sentiment concerning H1N1 and to measure actual disease activity. They show that Twitter can
be used as a measure of public interest or concern about health-related events and that estimates
of influenza-like illness derived from Twitter chatter accurately track reported disease levels.
Paul et al. [126] incorporate historical influenza data and Twitter data in their forecasting
models. Lampos et al. [127] measure the prevalence of flu-like symptoms in the general UK
population based on the contents of Twitter, searching for symptom-related statements and turning
this information into a flu-score, and they obtain on average a statistically significant linear
correlation which is higher than 95%. In a later work, the authors [128] instead of choosing the
keywords and phrases themselves,
they use machine learning algorithms to find out which words in the database of tweets occur more
often at times of elevated levels of flu, and they obtain very positive results. They claim that
flu epidemics can be detected based on Twitter content. Chen and Yang [129] use individuals’ tweets
with spatiotemporally tagged information to demonstrate that people’s healthy diet is elicited by
exposure to their immediate food environment.

Regarding the monitoring of job opportunities (C2), Llorente et al. [130] quantify the extent to
which deviations in diurnal rhythm, mobility patterns, and communication styles across regions
relate to unemployment. For this purpose, they examine country-wide Twitter data describing 19
million geo-located messages and find that the regions exhibiting more diverse mobility fluxes,
earlier diurnal rhythms, and more correct grammatical styles display lower unemployment rates.
Antenucci et al. [131] use data from Twitter, from July 2011 to early November 2013, to create
indexes of job loss, job search, and job posting. They derive signals by counting job-related
phrases in tweets such as “lost my job”. They construct social media indexes from the principal
components of these signals and manage to track events that affect the job market in real-time,
such as Hurricane Sandy and the federal government shutdown.

A large number of works in the literature focus on monitoring socioeconomic development from social
media data (C3). Bollen et al. [132], in a further study, analyze data from Twitter and consider
the emotions of traders, rather than their information gathering processes, suggesting that changes
in the calmness of Twitter messages could be linked to changes in stock market prices. Still
regarding socioeconomic development, social media data are also extensively used to nowcast and
forecast stock market prices and traded volumes. Seminal works in this field leverage information
contained within investment discussion boards and blogs. For example, Bar-Haim et al. [133] use
StockTwits data to uncover relevant correlations between Web-derived indicators and the stock
market. In detail, they leverage sentiment scores of messages shared in the Yahoo message boards to
find correlations with the stock market. In a study of a different web platform, De Choudhury et
al. [134] try to find correlations between the stock market and blog communications. Last, Cresci
et al. [135,136] assess the risks and vulnerability of stock markets to automation, manipulation,
and disinformation, with the ultimate goal of safeguarding people’s investments.

Researchers also use social media for the exploration of the environment dimension (C4). Avvenuti
et al. [137] claim that the analysis of social media proves valuable for quickly acquiring
situational awareness and estimates of the impact of disasters. As an example of the predictive
power of social media, Kryvasheyeu et al. [138], Avvenuti et al. [139] and Mendoza et al. [140]
demonstrate the viability of predicting or nowcasting the damage produced by earthquakes by
analyzing social media communications in the aftermath of the event. The results of these models
can also be displayed in real-time, interactive maps that highlight stricken areas and provide
support to emergency responders. Notable examples of this kind are the systems developed by
Avvenuti et al. [141,142]. Preis et al. [143] find that the number of photos taken and subsequently
uploaded to Flickr with titles, descriptions, or tags related to Hurricane Sandy bears a striking
correlation to the atmospheric pressure in the US state of New Jersey. They claim that appropriate
leverage of such information could be useful to policy-makers and emergency crisis managers.

Safety is another dimension that can be monitored using data from social media (C5). For example,
Chen et al. [144] use Twitter data and create a model that predicts the specific time and location
at which a crime occurs. This model combines kernel density estimation based on historical crime
incidents and prediction via linear modeling with sentiment and weather predictors. By adding the
latter determinants, they show that their model improves significantly with respect to existing
models. Similarly, Boni et al. [145] use spatio-temporally tagged tweets and create a model for
crime prediction. In particular, they combine real crime data with individuals’ micro-level
movement patterns extracted from Twitter and demonstrate improved predictions. Likewise, Kadar et
al. [146] describe urban crime by using Foursquare and considering these data as a measurement for
the ambient population of a neighborhood, to further describe crime levels. They also confirm that
such models improve on the traditional models based on census data. Additionally, the city of
Chicago applies text analytics on Twitter and 311 (the local non-emergency service number) records
to detect and prevent phenomena like rat infestations and to track civil unrest and violent crimes
(CrimeScan and CityScan software) [147–149].

Finally, the politics dimension (C6) is extensively studied, particularly in recent years with the
rise of political crises across the world. Colleoni et al. [150] investigate political homophily on
Twitter to classify users as Democrats or as Republicans based on their tweets. They show that, in
general, the former exhibit higher levels of political homophily than the latter. Goh et al. [151]
use Facebook pages of a group of 12 politicians and demonstrate that political engagement can be
achieved by creating social media consumption habits, as supported by the habit formation in
consumption from macroeconomics. As in the field of socioeconomic and financial analyses, social
media data can also be easily manipulated for achieving political goals [152,153]. As such, results
of political analyses based on social media should be carefully weighed to minimize issues related
to biases and manipulations.
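
As a rough illustration of the phrase-counting idea behind the job-market indexes discussed earlier in this section [131], the sketch below counts, per day, the tweets that contain job-related phrases. The phrase lists, the sample tweets, and the function name `daily_phrase_counts` are hypothetical and are not taken from the cited study, which additionally combines several such signals through principal components; that step is not shown here.

```python
from collections import Counter, defaultdict

# Hypothetical phrase lists; the cited study uses its own sets of phrases.
SIGNALS = {
    "job_loss": ["lost my job", "laid off"],
    "job_search": ["looking for a job", "job hunting"],
}


def daily_phrase_counts(tweets, signals=SIGNALS):
    """Count, per day and per signal, the tweets containing any listed phrase.

    `tweets` is an iterable of (date_string, text) pairs.
    Returns a dict mapping each date to a Counter of signal names.
    """
    counts = defaultdict(Counter)
    for date, text in tweets:
        lowered = text.lower()
        for name, phrases in signals.items():
            if any(phrase in lowered for phrase in phrases):
                counts[date][name] += 1
    return dict(counts)


# Made-up sample data, for illustration only.
sample = [
    ("2012-10-29", "Just lost my job because of the storm"),
    ("2012-10-29", "Got laid off today"),
    ("2012-10-30", "Job hunting again, any leads?"),
    ("2012-10-30", "Nice weather today"),
]

result = daily_phrase_counts(sample)
for day in sorted(result):
    print(day, dict(result[day]))
```

Simple substring matching is used here only to keep the sketch short; a real pipeline would need tokenization, deduplication, and some normalization by daily tweet volume before the counts could be read as an index.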
Table 5 The table shows an example of clinical records, including the pathology for which a patient
is admitted to the hospital, the duration of hospitalization and the medicines she/he took

In date     Out date    Pathology    Medicines
01/02/2019  01/02/2019  Asthma       m1,m2,m3
03/02/2019  08/03/2019  Head trauma  m5

2.2.4 Health and fitness data

These data mainly consist of Electronic Health Records (EHRs) and mobile application data that are
mainly used for monitoring the health dimension (D1). EHRs, initially created to facilitate billing
and patient care, are widely used for clinical studies and clinical risk prediction. Table 5
reports an example of clinical records concerning the hospitalization of some patients.

Based on a systematic review, Goldstein et al. [154] demonstrate both opportunities and challenges
of EHRs. On the one hand, compared to the traditionally used cohort data developed and collected
for research purposes (such as the Framingham Heart Study [155]), EHRs are cost-effective. In
contrast with cohort data, EHRs can indeed be used for multiple health studies and, since they are
collected at a high frequency, they allow a better prediction of near-term risk of events. On the
other hand, EHRs include only individuals that have been ill or at least have had a clinic visit,
which could generate a problem of representativeness. Moreover, they are not publicly available and
might include limited time slots.

Researchers use EHRs to monitor several aspects of personal health (D1). For example, Sultana et
al. [156] use the Integrated Primary Care Information (IPCI) database to look for elements that
could contribute to traditional methodologies. For example, multimorbidity and polypharmacy are
elements that could help in identifying frailty. They demonstrate that the Mini-Mental State
Examination score, which is the most commonly recorded data item, could potentially be used as a
frailty identifier. Ghaderighahfarokhi et al. [157] use medical records of newborns in the
educational hospitals affiliated to the Ilam University of Medical Sciences (from April 2015 to
April 2016) to identify accurate predictors of Low Birth Weight (LBW). They demonstrate that LBW is
a multi-factorial condition whose reduction requires a systematic and accurate program, such as
education through mass media, repeated monitoring of pregnancy, and others. Metzger et al. [158]
use EHRs with Emergency Department patient visits in 2012, from Lyon University Hospital, to
demonstrate that machine learning can contribute to more accurate estimations of suicide attempts
in France, in relation to the current national surveillance system based on manual coding by
emergency practitioners. Mhaskar et al. [159] investigate the 30-minute prediction of blood glucose
levels based on Continuous Glucose Monitoring (CGM). In particular, they use data from the DirecNet
Central Laboratory, containing time series for 25 patients, who are less than 18 years old. By
training a deep learning model on a data set designed to explore the performance of CGM devices in
children with Type I diabetes, they demonstrate how deep neural networks can outperform shallow
networks on this task. In addition, Santillana et al. [160] use a clinician’s database, named
UpToDate, to promptly predict influenza epidemics in the United States. They show that digital
disease surveillance tools based on experts’ databases may be able to provide an alternative,
reliable, and stable signal for accurate predictions of influenza outbreaks. Besides EHRs, mobile
app data, such as lifestyle habits data concerning eating and physical activity behaviors, are used
for the monitoring of objective well-being in terms of health (D1). These data demonstrate once
more that smartphones can contribute to research with valuable new insights, although they might be
biased with respect to people with lower socioeconomic status or people who are more interested in
their health. In addition, such data collected through web surveys for research purposes might
bring the disadvantages discussed before. A key study using mobile app data is conducted by Althoff
et al. [161]. They use a data set consisting of physical activity for 717,527 Apple iPhone
smartphone users of the Azumio Argus app, which tracks users’ diet and fitness and other healthy
behaviors, between July 2013 and December 2014. They demonstrate inequality in how the activity is
distributed within countries and that this inequality is a better predictor of obesity than average
activity level. Similarly, Hayeri [162] uses continuous glucose monitors (CGM) and fitness
wearables (Fitbit) to predict blood glucose values. The study uses data gathered from each
participant for 60 days, where the data from the first 30 days are used to train the algorithm and
the remaining 30 days to test the predictions. On average, the software is able to predict a user’s
future glucose values 60 min ahead of time with a 93% accuracy rate.

2.2.5 News

News data sources, such as the GDELT database [163], contain information extracted from newspapers
around the world. News records generally describe a variety of subject domains (e.g., economic
events, political events), represent a wide range of targets (e.g., opposing politicians) [164] and
are continuously updated, containing even archived historical news of the last decades.
Nevertheless, such data contain three main biases [165]: the gatekeeping bias, i.e., the editors or
the journalists decide on which event to publish; the coverage bias, related to the coverage of an
event (e.g., western countries are over-covered, whereas African countries are under-covered); the
statement bias, when the
Table 6 Subset of the main fields provided by GDELT platform

EventCode  EventCategory                    EventTone  Date      Country code  Url
815176338  Arrest, detain                   −70        20180110  US            http://tiny.cc/s5s16y
815176339  Use conventional military force  −30        20180110  UK            …
815176340  Consider policy option           +25        20180110  IT            …
content written by the journalist, even when intended to be objective, is favorable or unfavorable
towards certain events. Table 6 shows an example of news records.

News records are used to measure health (E1), socioeconomic development (E3), environment (E4), and
politics (E6) dimensions of objective well-being.

Emerging infectious diseases and the rise of modern technology have generated new demands and
possibilities for disease surveillance and response (E1). Growing numbers of outbreak reports must
be assessed rapidly so that control efforts can be initiated. For example, the World Health
Organization (WHO) sets up a process for timely disease outbreak verification to convert large
amounts of data from some 600 sources, including all major news wires, newspapers, and biomedical
journals, into accurate information for suitable action [166,167]. Brownstein et al. [168], in a
similar effort, create HealthMap, a freely accessible, automated real-time system that monitors,
organizes, integrates, filters, visualizes, and disseminates online information about emerging
diseases. Wilson et al. [169] use the HealthMap project to monitor listeriosis. Chunara et al.
[170] use social and news media to validly estimate the 2010 Haitian cholera outbreak.

News records on financial affairs and financial markets are intrinsically interlinked (E3).
Alanyali et al. [171] quantify the relation between movements in financial news and movements in
financial markets by exploiting a corpus of six years of financial news from 2007 to 2012 from the
Financial Times. Their results suggest that greater interest in a company in the news is related to
greater interest in the corresponding company in stock markets. Lillo et al. [172] show that the
flux of news of the previous day affects the trading activity of companies, households, and foreign
investors and the dynamics of volatility.

News can also help capture the environmental dimension of well-being (E4). As an example,
Kleinschmit et al. [173] investigate 394 articles on forest and climate change published in the
Swedish newspaper Dagens Nyheter from 1992 to 2009. They show that there has been an increasing
discussion on forests in a changing climate over the last 18 years from both scientists and
politicians. The increased number of these news events correlates with real environmental events
happening internationally. Similarly, Boykoff [174] uses data extracted from the Vanderbilt
University Television News Archive, consisting of television news from US news broadcasts (e.g.,
ABC World News Tonight) for the period between 1995 and 2004. He demonstrates that 70% of the US
television news provides balanced coverage on anthropogenic contributions to climate change
compared to natural radiative forcing. He also shows that there is a significant difference
between this television coverage and scientific consensus on the topic.

News records are also used to understand the coverage of political issues (E6). Van Aelst and De
Swert [175] use daily news of politics of campaign periods, extracted from the Electronical News
Archive over the 2003 to 2006 period, and show that campaign periods have a high impact on the
amount, style and actors of the political news in Belgium. To the best of our knowledge, the
dimension politics (E6) has not yet been adequately explored through news data and constitutes
inspiration for future research.

2.2.6 Scanner data

Scanner data are generated by point-of-sale terminals in shops and provide information at the
level of the single product. Sales terminals record each transaction, and the resultant data can
provide considerable insights into consumer purchasing patterns. They can be obtained from a wide
variety of retailers: supermarkets, pharmacies, do-it-yourself stores, home electronics or clothing
shops, and many others [176]. Scanner data are used by social researchers, as they can offer useful
detailed information and the possibility to model the dynamic behavior of households, as well as to
control for unobservable time-invariant characteristics [177]. Also, scanner data provide
information over longer periods of time, rather than only one day or a couple of weeks. This
happens because the final data used are produced from customers that purchase several items on each
store visit, for several store visits, over a period of time [178,179]. It is also worth mentioning
that scanner data can contribute to the improvement of the quality of the Harmonized Index of
Consumer Prices (HICP) [180]. However, using scanner data is challenging since researchers are
dependent on the retailer’s permission, and they should also overcome the legal constraints in
order to obtain them [179]. Table 7 shows an example of supermarket records.

Scanner data are used to measure health (F1), socioeconomic development (F3), and environment (F4)
dimensions of objective well-being.
Table 7 Subset of the main fields provided by a supermarket database for the purchases in different shops

Id             Customer   Timestamp            Place            Receipt        Items
2018020156287  109745368  2018-02-01 17:30:14  Pisa, Italy      2018020101567  [bread, milk, eggs, tissues]
2018020578256  104827423  2018-02-05 10:14:57  Torino, Italy    2018020500234  …
2018020743624  012753862  2018-02-07 19:57:00  Florence, Italy  2018020721987  …
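
To illustrate how records shaped like those in Table 7 could be turned into simple purchasing-pattern summaries, the following sketch groups transactions by customer and counts store visits and item frequencies. The `Transaction` class, its field names, the customer identifiers, and the sample rows are all hypothetical; they are loosely modeled on the table's columns and do not reproduce any real data set. Treating each record as one store visit is also an assumption.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Transaction:
    # Field names loosely follow the columns of Table 7 (an assumption).
    customer: str
    timestamp: datetime
    place: str
    items: list


def summarize(transactions):
    """Return a dict mapping customer -> (number_of_visits, Counter_of_items)."""
    summary = {}
    for t in transactions:
        visits, items = summary.get(t.customer, (0, Counter()))
        items.update(t.items)  # add this visit's items to the running tally
        summary[t.customer] = (visits + 1, items)
    return summary


# Hypothetical records, for illustration only.
records = [
    Transaction("c1", datetime(2018, 2, 1, 17, 30, 14), "Pisa, Italy",
                ["bread", "milk", "eggs"]),
    Transaction("c1", datetime(2018, 2, 5, 10, 14, 57), "Pisa, Italy",
                ["bread", "tissues"]),
    Transaction("c2", datetime(2018, 2, 7, 19, 57, 0), "Florence, Italy",
                ["milk"]),
]

summary = summarize(records)
for customer, (visits, items) in sorted(summary.items()):
    print(customer, visits, items.most_common(1))
```

Per-customer tallies of this kind are the sort of input from which household-level behavior over several store visits could be modeled.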
To begin with, researchers use scanner data to monitor several aspects of public health (F1). For
example, pharmaceutical sales may be used to predict changes in clinical conditions with a useful
time lead. Magruder et al. [181] find a 90% correlation between flu-related drug sales and
physician diagnoses of acute respiratory conditions, in several subregions of the National Capital
Area. They show that these sales occur approximately three days before the physician-patient
encounter. Scanner data are also used to study the nutrients and saturated fat of several food
categories and their implications on personal health. For example, Griffith et al. [177] use
supermarket scanner data from the UK to study the nutrients in foods. They show that there is a lot
of variation in nutrients at individual product level, even within very narrow food categories such
as butter. Bonnet et al. [182] use data from French supermarkets to explore consumer behavior with
respect to the consumption of saturated fat, while Griffith et al. [183] model the potential impact
of a tax on saturated fats. Finally, Janssen et al. [184] use scanner data from the Nielsen
Consumer Panel data set that covers the years from 2004 to 2017. They aim to identify households
with a pregnant household member and also to estimate the effect during and after pregnancy on
alcohol purchases and relative expenditure on fruit and vegetables. Results show that during and
after pregnancy, households reduce their alcohol purchases by 22–27%. In contrast, the relative
expenditure on fruit and vegetables does not increase during pregnancy but decreases post-pregnancy
by 19%.

The majority of studies with scanner data focus on exploring socioeconomic development (F3). Van
der et al. [186] introduce a new method for computing the Dutch Consumer Price Index (CPI) based on
supermarket scanner data. Meanwhile, in 2017, Eurostat issued a practical guide for processing
supermarket scanner data to calculate the CPIs of EU countries in order to ensure the comparability
of the values across Europe, as well as to modernize the official statistics [179]. Silver et al.
[187] outline the potential use of scanner data from retailers for the measurement of inflation.
They use monthly scanner data for television sets in 1998 in the UK to study the two primary forms
of bias in CPIs. Moreover, Pennacchioli et al. [188] study the retail activity of the customer
subset of an Italian supermarket chain. They discover that highly ranked customers, with more
sophisticated needs, tend to buy niche products, i.e., low-ranked products. On the other hand,
low-ranked, low purchase volume customers tend to buy only high-ranked products, very popular
products that everyone buys. In addition, Sobolevsky et al. [189] use a complete set of bank card
transactions in 2011 in Spain and demonstrate that there is a clear correlation between individual
spending behavior and official socioeconomic indexes denoting the quality of life.

Finally, researchers use scanner data to monitor the impact of humans on the environment (F4).
Panzone et al. [190] use scanner data from the largest UK food retailer for the creation of an
Environmentally Sensitive Shopper (ESS) index measuring the environmental sustainability of food
consumption at household level. In addition, Gadema et al. [191] use data from UK supermarket
shoppers to examine whether carbon footprinting and labeling food products are tools that could
help consumers make greener purchasing decisions. They claim that this could be a sensible way to
potentially achieve a low carbon future. Food waste is a significant problem in modern society and
carries considerable social, economic, and environmental costs. For example, Brancoli et al. [192]
use scanner data to analyze the impacts of food waste at a supermarket in Sweden. They discover the
importance of not only measuring food waste in terms of mass but also in terms of environmental
impacts and economic costs. They also show that meat and bread waste contribute the most to the
environmental footprint of the supermarket. Last, Scholz et al. [193] analyze food waste data of
six Swedish supermarkets from 2010 to 2012 in terms of mass and carbon footprint. They calculate
the wastage carbon footprint for fresh products such as meat, deli, cheese, dairy, and fruits and
vegetables.

2.2.7 Web search queries

Web search queries data report the frequency over time of specific terms entered into a web search
engine by users to satisfy their information needs. Data are represented as time series of the
frequency, and therefore we do not provide an example of search queries records in this paper.

Compared with other data sources that require customized and often complicated collection
strategies, search data can be collected for many domains simultaneously. They can
also be easily analyzed across several countries or regions
in real-time. Search data are often helpful in making fore-
casts. However, their utility for predicting real-world events Searches for “major depression” and “divorce”, for exam-
is based on convenience, speed, and flexibility and has less ple, account for at most, 30.2% of the variance in suicide
to do with their superiority over other data sources. Goel et data. McCarthy [209] uses annually-averaged Google search
al. [194] provide a useful survey in this area and describe activity for “suicide” from the same period, from 2004 to
some of the limitations of this data source. First, for different 2009 to study suicide rate data in the United States. The study
domains, the size of the relevant population varies consid- shows that searches for most medical, familial, and socioe-
erably, along with difficulty in identifying relevant queries. conomic terms precede suicide deaths, and most searches for
Additionally, in specific domains, searching may be more psychiatric-related terms coincide with suicide data. In a later
closely tied to the measured outcomes than in others. work, Kristoufek et al. [210], use Google data from 2004 to
Web search queries data are used to measure health (G1), 2013 in combination with suicide occurrences data to esti-
job opportunities (G2), socioeconomic development (G3), mate the number of suicide occurrences in England. Finally,
safety (G5), and politics (G6) dimensions of objective well- Adler et al. [211] combine official statistics on demographic
being. information with data generated through search queries from
Public health is a dimension of well-being that is explored Bing, between November 2016 and February 2017, to gain
through web search queries (G1). In order to improve early insight into suicide rates per state in India. In this way, their
detection, researchers monitor health-seeking behavior in the search data work as a proxy for unmeasured (hidden) factors
form of web search queries, which are submitted by millions corresponding to suicide rates.
of users around the world every day. For example, Cooper The first to explore the job opportunities dimension (G2),
et al. [195] study Yahoo! search activity related to cancer are Ettredge et al. [212] as they find that counts of the top 300
in the USA. They find out that the Yahoo! search activity search terms during from 2001 to 2003 are correlated with US
associated with cancer correlates with the estimated can- Bureau of Labor Statistics unemployment figures. Later on,
cer incidence and estimated cancer mortality. Polgreen et Askitas et al. [213], D’Amuri et al. [214], Suhoy et al. [215]
al. [196] show that search volume for handpicked influenza- confirm the value of search data in forecasting unemployment
related queries is correlated with the reported number of cases in the US, Germany, and Israel. Baker et al. [216] use Google
over the period from 2004 to 2008. Hulth et al. [197] find search data to examine how job search responds to extensions
similar results in a study of search queries submitted on a of unemployment payments. Finally, McLaren et al. [217]
Swedish medical Web site. Yuan et al.[198] monitor influenza summarise how online search data can be used for economic
epidemics in China with search queries from Baidu. Addi- nowcasting by central banks. They show that the volume
tionally, an automated procedure for identifying informative of online searches can be used as indicators of economic
queries is described by Ginsberg et al. [199]. Based on that, activity, more specifically for unemployment and housing
Google Flu Trends [200] was introduced by Google in 2008 markets in the United Kingdom.
to provide real-time estimates of flu incidence for more than Researchers use search queries to monitor socioeco-
25 countries and to help predict outbreaks of flu. Nsoesie et nomic development (G3) as well. Choi and Varian [218,219]
al. [201] present a framework for near real-time forecast of consider Google Trends as a source of data on real-time eco-
influenza epidemics using web-based estimates of influenza nomic activity, and they show that by using its query indices
activity from Google Flu Trends for 2004–2005, 2007–2008, accurate predictions can, for example, be made for retail,
and 2012–2013 flu seasons. Yang et al. [202] use Google Flu automotive, etc., and could be helpful for short-term eco-
Trends and historical data to infer the evolving epidemio- nomic prediction or nowcasting. Koop and Onorante [220]
logical features of influenza and its impacts among the large use Dynamic Model Selection (DMS) methods, which allow
population during 2003–2013, including the 2009 pandemic. for model switching between time-varying parameter regres-
Wilson et al. [203] use data from Google Flu Trends to study sion models. They extend the DMS methodology by allowing
the spread of the pandemic H1N1 influenza in New Zealand Google variables to determine the nowcasting model to be
during 2009. Furthermore, Chan and Althouse [204,205] used at each point in time. Guzman [221] examines Google
use Google queries to monitor Dengue epidemics, Dukic et data as a predictor of inflation. Additionally, Preis et al.
al. [206] to predict hospitalizations for methicillin-resistant [222] provide evidence that search engine query data and
Staphylococcus aureus infections and Ocampo et al. [207] US stock market fluctuations are correlated. In a later [223]
for malaria surveillance. Moreover, Yang et al. [208] eval- work, they analyze changes in Google query volumes for
uate the association between suicide and Google searches search terms related to finance, and they find patterns that
trends for 37 suicide-related terms representing major known may be interpreted as “early warning signs” of stock market
risks of suicide in Taipei City, Taiwan, from 2004 to 2009. moves. Furthermore, Curme et al. [224] present a method that
Their results show that a set of suicide-related search terms, allows identifying topics for which levels of online interest
the trends of which either temporally coincided or preceded change before large movements of the Standard & Poor’s 500
trends of suicide data, are associated with suicide death. index (S&P 500). They find that search volumes from Google
related to politics and business can be linked to subsequent stock market moves. This demonstration of a connection between stock market transaction volume and search volume is also replicated using Yahoo! data, where Bordino et al. [225] show that query volumes precede in many cases peaks of trading by one day or more. Finally, Moat et al. [226] show that data on views of Wikipedia pages can also be related to market movements, providing evidence that increases in the number of views of financially related pages on Wikipedia can be detected before stock market falls.
Search data are also used for the exploration of safety (G5). Qi et al. [227] show that a simple low-level indicator of civil unrest can be obtained from online data at an aggregate level through Google Trends or similar tools. The study covers countries across Latin America from 2011 to 2014 in which diverse civil unrest events took place. In each case, they find that the combination of the volume and momentum of searches from Google Trends surrounding pairs of simple keywords, tailored for the specific cultural setting, provides useful indicators of periods of civil unrest. Qi et al. [228] study online search activity from Google Trends surrounding the topics of social unrest over several countries in Latin America from 2011 to 2014. They find that the volume and momentum of searches surrounding mass protest language can detect, and may even pre-empt, the macroscopic on-street activity. They also find that the most crucial search keywords differ subtly from country to country, even though the language may be the same. They explain this by the fact that civil unrest is a time-varying coordinated interaction between individuals, groups, or populations within a given cultural and socioeconomic setting.
Finally, the politics dimension is explored with search data (G6). Chykina et al. [229] study how Google Trends can be used to examine issue salience for hard-to-survey mass populations in the US, from 2010 to 2017. They apply this method to immigrant concerns over deportation. They show that anxieties over removal increase in response to (potential) policy changes, such as immigration policies that are considered in the wake of Donald Trump’s election. Reilly et al. [230] use Google search activity for ballot measures’ names and topics in a state one week before the 2008 Presidential election, and they find that they correlate with actual participation on those ballot measures. Their results demonstrate that the more Internet searches there are for a ballot measure, the less likely voters are to roll off (not answering the question), and establish the validity of these data for a critical topic in state politics research.

2.2.8 Crowdsourced data

Kleemann and Rieder [231] defined crowdsourcing in 2008 as “the intentional mobilization for commercial exploitation of creative ideas and other forms of work performed by consumers”. In other words, crowdsourcing involves obtaining work, information, or opinions from a large group of people who submit their data via the Internet, smartphone apps, etc. Naturally, crowdsourcing brings several advantages. Crowdsourcing can provide researchers with a huge amount of data, which can be accessed quickly and at a relatively low cost. Besides, compared with traditional research (such as studies using traditional surveys), the use of crowdsourcing can provide researchers with data from samples that are more diverse [232]. However, crowdsourcing poses various challenges, as well. Firstly, crowdsourcing may bring relatively low-quality results, e.g., a participant of a crowdsourced study may intentionally give wrong answers. Secondly, mobile platforms pose new challenges for crowdsourced data management. Table 8 shows an example of crowdsourced data.

Table 8 Example of the information provided by users of Influenzanet
User | Age | Gender | Date | Highest temperature | Symptoms
784590 | 35 | M | 2017-12-03 | 38.0 °C | [cough, sore throat]
275173 | 28 | F | 2018-01-05 | 36.6 °C | [no symptoms]
428415 | 64 | M | 2018-04-13 | 38.2 °C | [tired, runny nose]

Crowdsourced data are used to capture all dimensions of objective well-being, i.e., the health (H1), job opportunities (H2), socioeconomic development (H3), environment (H4), safety (H5), and politics (H6) dimensions.
To improve early detection, researchers started monitoring the health of individuals (H1) through crowdsourced self-reporting mobile apps, such as Influenzanet (Europe) [233], Flutracking (Australia) [234], and Flu Near You (United States) [235]. Hashemian et al. [236] introduce iEpi, an end-to-end system for epidemiologists and public health workers to collect, visualize, and analyze contextual microdata through smartphones. Additionally, Madan et al. [237] use data from a smartphone application provided to university students to study their health state. Participants fill out self-report surveys related to their health habits, diet, exercise, weight changes, daily symptoms related to common colds, fever, influenza, and mental health. The researchers find that phone-based features can be used to predict changes in health, such as common colds, influenza, and stress. For longer-term health outcomes such as obesity, they find that weight changes of participants are correlated with exposure
to peers who gain weight in the same period. Finally, Martinucci et al. [238] study Gastroesophageal Reflux Disease (GERD) symptoms among Italian university students from a data set collected from a web-app. The app allows users to self-diagnose gastrointestinal disturbances through a simple questionnaire and data about the students’ food consumption at the university canteen. They show that 792 students reported typical GERD symptoms occurring at least weekly. Among all users, female students, smokers, and students with a high BMI tend to show increased GERD values.
Researchers use crowdsourced data to explore the job opportunities dimension (H2) and the direct socioeconomic benefits associated with it. For example, Green et al. [239] use Glassdoor, an online crowdsourced employer review and employer branding platform, to explore employees’ satisfaction and work–life balance. This exploration is preliminary to the most important finding of the study, a direct economic benefit: companies experiencing improvements in employer ratings show a significant association with future stock returns, compared with companies with declines in employer ratings. Similarly, Dabirian et al. [240] analyze reviews of the highest and lowest-ranked employers on Glassdoor. Using IBM Watson to analyze the data, they show how employers could use crowdsourced employer branding intelligence to turn into a workplace that attracts highly qualified employees. Furthermore, Könsgen et al. [241] analyze employee review data listed on the German employee review site Kununu.de, combined with a 2×2×2 between-subjects experimental design. Results show that such studies can complement the research on online reputation by underlining the relevance of discrepant reviews for job candidates’ application intentions.
Crowdsourced data are also used to estimate socioeconomic well-being (H3). For example, Tingzon et al. [242] show the feasibility of mapping poverty by combining crowdsourced geospatial information with nighttime lights, daytime satellite imagery, and human settlement data. In particular, they use the popular geospatial data crowdsourcing platform OpenStreetMap [243] to map poverty in the Philippines. Similarly, Piaggesi et al. [244] use OpenStreetMap [243] crowdsourced data merged with official data at a city scale. They demonstrate the possibility of estimating the socioeconomic conditions of different neighborhoods of five different cities in North and South America. In order to increase the efficiency of direct money transfers to impoverished villages in Kenya and Uganda, Abelson et al. [245] develop and deploy a crowdsourcing interface to obtain labeled satellite imagery training data. They train and deploy a predictive model for detecting impoverished villages. Their estimations are leveraged to build a fine-scale heat map of poverty that is used to recommend donations to the most impoverished villages.
Crowdsourcing is also used to capture the environmental dimension of well-being (H4). There are plenty of examples of crowdsourcing platforms for emergency management, such as Ushahidi [246], where volunteers provide updated environmental information in the aftermath of mass emergencies. These platforms are shown to contribute significantly to organizing a prompt emergency response [247]. Another category of crowdsourced platforms is the so-called citizens’ observatories [248], a community-based network of environmental monitoring and information systems. On these platforms volunteers monitor and provide data about a plethora of environmental dimensions, such as water availability and water quality, air pollution, land use, and flood risk management [249]. As an example, Schneider et al. [250] combine crowdsourced air-quality measurements from the EU-funded CITI-SENSE project with data obtained from statistical or deterministic air quality models. Their goal is to present a novel data fusion-based technique for combining real-time crowdsourced observations with model output that maps the urban air quality in detail. This could help users find the least polluted routes or control their exposure to pollution while moving around the city. Besides, Meier et al. [251] use crowdsourced atmospheric data from Netatmo weather stations in the city of Berlin, as well as available metadata, to explore the urban atmosphere. Results show a distinctive urban heat island pattern in Berlin during the night and are also validated, confirming that crowdsourced atmospheric data can contribute to advancement in climate research. Similarly, Chapman et al. [252] use Netatmo weather station crowdsourced data to quantify the urban heat island in the city of London over the summer of 2015. Their results are similar to previous studies with official data and are therefore validated.
Crowdsourced data are considered an important data source for studying safety (H5). Suzanne Goodney et al. [253] map violence against women with the use of a crowdsourced app named Safecity.in, which includes anonymous reporting of violence against women. The goal of the study is to highlight the importance of crowd mapping violence, as it can make women aware of potentially dangerous locales, encourage violence reporting, and provide advice on practical solutions for navigating street harassment and assault in public buses. Furthermore, Gosselt et al. [254] use the Internet Movie Database (www.imdb.com) to study the violent behavior and victimization of male and female film characters over time in the United States. In particular, using IMDb synopsis texts, they analyze reviewers’ movie descriptions. They demonstrate that both perpetrators and victims are mainly male, and that violence becomes less severe and more often non-deadly over the years. The researchers underline the future potential of using such data sources to explore matching results with actual crime figures. Additionally, Ozkan et al. [255] use crowdsourced police-involved
killings data from FatalEncounters.org, as well as media data, to check whether police killings are counted and reported correctly in the aforementioned unofficial data, as compared to official data in the city of Dallas. Results mostly show consistency between all data sources. Combining social media and crowdsourcing data sources, as well as the environmental and safety dimensions, Avvenuti et al. [256] collect targeted and detailed information from people involved in natural disasters through crowdsourcing surveys via social media. These data are used to better monitor unfolding disasters and their consequences (i.e., damage caused).
Last, crowdsourced data are also used to study the politics dimension (H6) of objective well-being. For instance, crowdsourced data have been used within NGOs to set strategic priorities and involvement in the referendum activities based on participants’ responses to a survey [257]. Yasseri and Bright [258] use Wikipedia traffic data for electoral prediction. In particular, they get insights about changes in overall turnout at elections and changes in vote share for certain parties. Furthermore, Gellers [259] explores whether crowdsourcing can overcome the democratic deficit in global environmental governance. He uses data from the United Nations MY World survey, a multi-year (2012–2015) global poll designed to identify post-2015 development priorities, as well as data from e-discussions organized by the UNDG and from the thematic consultation on environmental sustainability that ran from November 2012 to July 2013. Results suggest that although crowdsourcing may present an attractive technological approach to enhance participation in global governance, ultimately, the representativeness of this participation and the legitimacy of the policy results depend on the way the contributions are sought and filtered by international organizations.

3 Measuring subjective well-being

“Subjective well-being”, the scientific term for happiness, is a central value in people’s lives, and reflections on its definition have arisen ever since antiquity. Aristotle expressed his interest in the topic, claiming that human well-being, labeled as eudaimonia (εὐδαιμονία: eu = good, daimon = spirit), is an activity of the soul expressing complete virtue [260]. During the last decades, researchers have focused on identifying the critical dimensions and the relevant determinants that can positively or negatively affect human well-being, hence providing a perspective different from the philosophical definition that Aristotle contemplated. Since humans are conscious beings, they can subjectively evaluate their appreciation of life, labeled “subjective well-being” or happiness. In particular, happiness can be defined as satisfaction with life in general, or as sociologist Veenhoven (1984) suggests, as the degree to which an individual judges the overall quality of her life-as-a-whole favorably. Similarly, psychologist Diener [261] defines happiness as people’s affective and cognitive evaluations of life. Veenhoven [262] shows that people use two sources of information to evaluate their appreciation of life-as-a-whole: affects and thoughts. The first source of information captures people’s feelings, emotions, and moods, the so-called hedonic level of affect (or simply the emotional component). In particular, he underlines that to avoid neglecting crucial information about precedent and subsequent events, researchers should distinguish between positive and negative affects. On the other hand, the second source of information is the contentment component (or simply the structural component), which concerns people’s thoughts and captures whether their life expectations have been fulfilled, according to their cultural or societal standards, leading them to evaluate their life satisfaction. These two components, the hedonic level of affect and the contentment component, determine the overall happiness.
This concept of happiness, compared to the traditional macroeconomic measurements, such as GDP, inflation and national income (see, e.g., Alesina et al. [263]), can capture the variations of people’s perceived well-being [11,12]. It is also worth mentioning the controversy surrounding the relationship between national income and national happiness, identified by Easterlin [30]. According to the Easterlin paradox, temporary changes in income both within and between nations directly affect happiness, but over time happiness does not trend upward as income continues to grow.
Considering its subjective nature, researchers frequently measure happiness by self-report rating scales. Nevertheless, the most widely used are global reports, using the single-item scale, such as the Positive And Negative Affect Scale (PANAS) [264,265]. Self-report measures are reliable since they provide accuracy and temporal stability, they are valid for community surveys and cross-cultural comparisons, and they can capture happiness as life-as-a-whole, as well as domain satisfactions [266–269]. Examples of self-reported surveys are the Gallup World Poll (e.g., study by Deaton [270]) and the World Values survey (e.g., study by Easterlin et al. [271]), which capture worldwide happiness; and the Gallup-Healthways Well-being index (e.g., study by Kahneman and Deaton [272]), the British Household Panel Survey (e.g., study by Frijters et al. [273]) and the Eurobarometer (e.g., study by Stevenson et al. [274]), which capture happiness at the local level. Although self-report surveys are widely used for the measurement of happiness, some factors might influence the results. For example, the type of questions asked before the happiness questions, as well as the individuals’ mood at the time of the well-being rating, might disturb the results. Deaton and Stone [275] demonstrate a high item-order effect because of political questions coming before happiness questions. Also, substantial current-mood effects on happiness judgments are generated because of weather conditions, since they affect people’s thoughts, feelings, and behavior [276,277]. Finally, because global reports are abstract general ratings of happiness over a long period, they neglect temporal resolution.
Alternatively, researchers use the Ecological Momentary Assessment (EMA) and the Day Reconstruction Method (DRM), which are momentary diary self-report measures of happiness. They are designed to capture the affective components of happiness and reduce recall biases and heuristics [269]. In particular, EMA is a longitudinal research methodology that asks participants to report their feelings, thoughts, and emotions at the moment or right after each of their activities, avoiding retrospective biases and maximizing the accuracy of the assessments [278]. Similarly, DRM asks participants to systematically reconstruct their daily life activities and their experiences of the preceding days. It does not capture the moment-to-moment variation of emotions, as EMA does, but it avoids disturbing normal activities, requires less respondent burden [279] and captures time-budget information more efficiently [280]. Shiffman et al. [281] show that global reports of happiness are more predictive of future behaviors than momentary methodologies. Therefore, taking into consideration the pros and cons discussed above, researchers suggest a multi-method assessment, combining both global and momentary methods, to reach valid and accurate results [269,282].
The first rows of Table 9 provide a summary of the traditional data sources, as well as their pros and their cons, as discussed previously. The remaining rows are explained later.

Table 9 Pros and cons for each traditional data source and new data source used for the measurement of subjective well-being
Data source | Pros | Cons
Surveys (traditional data source) | Accurate, temporal stability, valid for community surveys and cross-cultural comparisons, valid for capturing happiness as-a-whole and satisfaction domains | Item-order effect bias, current-mood effects, neglected temporal resolution
Ecological Momentary Assessment (EMA) (traditional data source) | Measurement of the affective component, reduced retrospective biases, measurement of moment-to-moment variation of emotions | Disturbance of normal activities
Day Reconstruction Method (DRM) (traditional data source) | Measurement of the affective component, time-budget information, reduced respondent burden | Neglected moment-to-moment variation of emotions
Social Media (Twitter, etc.) (new data source) | Continuously updated user-generated content, elimination of social desirability effect, few barriers in data extraction (Twitter) | Social desirability biases, non-population representative
Google Trends (new data source) | Timeliness, observation of people’s behavior | Interpretability of the value of the series, comparability of time series of different terms on a given day
Crowdsourcing (new data source) | Measurement of daily behavior and activity | Use of self-reports, paid participation of users
News (new data source) | Variety of data (e.g., text data), variety of subject domains, range of targets, archived historical news | Gatekeeping bias, coverage bias, statement bias

3.1 The dimensions of subjective well-being

Over the years, researchers have studied subjective well-being and have identified the dimensions and the relevant determinants that can positively or negatively affect human well-being. Some studies rely on small data sets (e.g., review by Diener and Seligman [283]) reflecting the psychologists’ interest, such as personality, and some others use larger data sets, such as panel data (e.g., review by Dolan et al. [29]) reflecting the economists’ interest. These studies, conducted with the use of traditional data sources, and in particular with surveys, have shed more light on identifying in detail the determinants of happiness, which we divide into five main dimensions explained below:

3.1.1 Human genes

Evidence shows that one of the most important predictors of happiness is human genes: happiness is fairly heritable, with estimates ranging from 30% to 50% across studies [13–20]. Therefore, on average, about 40% of the variance of individual differences in happiness scores is accounted for by genes. Personality, which falls under our genetic makeup, can distinguish between happy and unhappy personalities. For example, extraverted individuals are happier than anxious and worried ones [284]. People higher in self-esteem are less likely to suffer from depression [29]. In addition, studies undertaken with data across different countries and periods of time find influences of the
following results: age has a U-shaped effect on happiness, with the highest levels of happiness at the youngest and oldest ages and the lowest level in middle age, between 32 and 50 years [29]; women are either happier than men, or there is no significant difference between them in almost all 73 countries investigated [285]. However, these results should be carefully interpreted. For example, Deaton and Tortora [286] show that the U-shaped relationship between happiness and age in Western countries turns into a linear relationship in sub-Saharan countries, where social services for older people are unavailable.

3.1.2 Universal needs

According to the evolutionary theory [22] and humans’ inherent growth tendencies [23], basic and psychological needs play an important role in happiness and are considered to be universal. In fact, Tay et al. [24], in research conducted across 123 countries, show that life evaluation is associated with having basic and psychological needs, such as food and shelter, met (r = 0.31)¹; positive affects are associated with the fulfillment of social needs (r = 0.29)¹ and the respect gained from other people (r = 0.36)¹; negative affects are associated with the fulfillment of basic needs (r = −0.17)¹, respect gained from others (r = −0.20)¹ and autonomy needs (r = −0.18)¹ in terms of the degree of freedom in life. Therefore, according to Veenhoven and Ehrhardt’s Livability theory [287], some societies have a better quality of life because they highly satisfy the aforementioned universal needs. It should be noted that each of these basic and psychological needs is independent of the others, meaning that each of them influences happiness beyond the effects of the others.

3.1.3 Social environment

Many determinants fall under this dimension and can explain changes in the reported level of happiness. To begin with, education is an important determinant, which needs to be carefully studied since there is controversial evidence of its effects on happiness. Some studies of happiness economics suggest an insignificant relationship between higher education and happiness, whereas some others show a negative relationship between them [25–28]. On the other hand, other studies show that educated individuals tend to report more positive emotions and fewer negative ones, as well as more satisfaction with most domains of their life, such as financial, employment opportunities, etc., even when controlling for non-economic factors, such as marriage [288,289]. Besides, studies show that health is an important determinant, with psychological health more strongly correlated with happiness than physical health (e.g., review by Dolan et al. [29]). Climate is another determinant, which appears to have effects on happiness. Rehdanz and Maddison’s [290] study gives a reasonable indication that extreme weather is damaging to happiness. Moreover, living in an urban or rural area seems to influence happiness. In particular, living in big cities negatively affects happiness, whereas living in rural areas positively affects it (e.g., Hudson and Kyklos [291] for Europe; Hayo [292] for Eastern Europe). On the contrary, Rehdanz and Maddison [290] show different results on urbanization and ruralization. They demonstrate that population density does not affect happiness. Another important determinant is exercising. Naturally, Ferrer-i-Carbonell and Gowdy [293] show that people who exercise tend to have higher levels of happiness.

3.1.4 Economic environment

Income is one of the most discussed economic determinants of happiness. Easterlin [30] argues that while happiness and income show a positive relationship within nations, they show weak or no association between nations. He also shows that, across countries, although a relationship between happiness and income holds in the short-run, this is not the case over time. However, time-series and panel analyses across countries show that there is a positive relationship between income and happiness also in the long-run [32–34]. Veenhoven [31] challenges Easterlin’s findings by arguing that people’s happiness highly depends on the satisfaction of basic and psychological needs covered by income, which is more an absolute than a relative standard. To the present day, studies support both arguments. Another source contributing to this debate is studies of individual-level income or wealth and happiness data. For example, a longitudinal study with a sample of about 33,000 individuals shows that after two years, lottery winners reported higher happiness than non-lottery winners [294]. These contradicting findings confirm the complexity of interpreting the role of income and wealth in happiness, since the potential positive relationship between them may be moderated by other factors. For example, in the case of natural disasters, wealthier countries are more economically capable of providing financial aid to the people affected by the event [295]. Employment, like income, falls under the economic dimension (see, e.g., [296,297]). Evidence shows that unemployed individuals report lower happiness than employed individuals [298]. In particular, Knabe et al. [299] demonstrate that unemployment has a substantial relationship with diminished cognitive well-being, but does not decrease affective well-being.

¹ Zero-order correlations of needs and subjective well-being for the world.
Fig. 2 The figure relates the sources of data (left) with the dimensions of the subjective well-being (right)
3.1.5 Political environment

As discussed previously, there are also political determinants associated with happiness. For example, Radcliff and Shufeldt [35] examine the effect of direct democracy, and in particular, the effect of the use of initiatives on happiness. They show that an individual’s happiness is higher in states where not only initiatives are permitted, but also policy-makers depend on these initiatives to form the political system. Political freedom falls under this dimension, as well. Veenhoven [36] shows that political freedom is highly correlated with happiness in developed countries. Another political determinant associated with happiness is social hierarchy in terms of the differences in power and prestige. Brule and Veenhoven [300] show that in northern and southern European countries, people are less happy in hierarchical societies. Last, social trust [301] and government quality [302] are political determinants that are substantially associated with happiness.

3.2 Data sources for monitoring the dimensions of subjective well-being

Similarly to Fig. 1 on objective well-being, Fig. 2 describes the new data sources (left) that have been used to estimate one or more dimensions of subjective well-being (right). The presence of a link in Fig. 2 between a data source and a dimension indicates that there are papers in the literature on monitoring that dimension with that data source. For example, b4 indicates the link between Google Trends data (b) and economic environment (4).

In this section, we describe, for each data source, its features (e.g., the process of data collection, its biases and limitations) and the main works in the literature that use it to measure several dimensions of subjective well-being. Table 9 provides a summary of the new data sources used to explore happiness, including the traditional data sources, as discussed previously. We aim to highlight the advantages and disadvantages of using each data source as a useful guide for future research on happiness.

With the growth of technology, researchers are inclined to use more innovative approaches for the measurement of happiness. In fact, over the last years, researchers have used novel methodologies and data sources, which offer new opportunities to study happiness and to circumvent the limitations carried from traditional methodologies and data sources. The study by Fowler and Christakis [303] is one of the first and most important to help the transition of happiness research from the traditional to the innovative era. The researchers computerised information from archived handwritten administrative tracking sheets from the Framingham Heart Study. They study happiness as a network phenomenon, by using data of 4739 people, from 1983 to 2003. Compared with previous traditional work on happiness, whose main focus is on socioeconomic, political, and genetic factors, this study is the first one to study happiness as a spreading phenomenon and its characteristics. In particular, they suggest that happiness is a network phenomenon, which clusters happy and unhappy people and spreads across various social relationships (e.g., relatives, friends) up to three degrees of separation (e.g., to one’s friends’ friends’ friends). Additionally, individuals that are central in the network are more likely to be happy in the future.

Beyond the study mentioned above, there are further studies in the innovative era, predominantly with the use of innovative big data sources. Although measuring happiness with new data approaches appears to be adequate in predicting the emotional component of happiness, most studies seem to neglect the structural component of happiness [304]. Below new data sources are described, and relevant studies are provided. We would like to underline that in comparison to objective well-being studies, researchers of subjective well-being usually explore more than one dimension.
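The notion of spreading "up to three degrees of separation" reported for the Framingham network [303] can be made concrete with a small sketch. It is a toy illustration only, not the method used in [303]: the friendship graph `toy`, the names in it, and the helper `within_degrees` are hypothetical, and the function simply lists who lies within a given number of hops of one person using breadth-first search.

```python
from collections import deque


def within_degrees(graph, source, max_degree=3):
    """Return {person: degree} for everyone at most `max_degree` hops from `source`.

    `graph` is an adjacency mapping (person -> list of direct contacts).
    The source itself is excluded from the result.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_degree:
            continue  # do not expand beyond the requested degree
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    del dist[source]
    return dist


# Hypothetical chain of friendships: ann - bob - cat - dan - eve
toy = {
    "ann": ["bob"],
    "bob": ["ann", "cat"],
    "cat": ["bob", "dan"],
    "dan": ["cat", "eve"],
    "eve": ["dan"],
}

# bob is a friend (1), cat a friend's friend (2), dan a friend's friend's friend (3);
# eve is four steps away and therefore outside the three-degree horizon.
reach = within_degrees(toy, "ann")
```

In a real study the graph would come from the recorded social ties, and the happiness measure of each person within the horizon would then be compared with that of the focal individual.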
Table 10 The table contains a subset of the information returned by a Twitter API

Id       Hashtags     Mentions  Text                                   Profile info
240556   #dinner #ny  [10214]   #dinner bihday your majesty @user #ny  {…..}
4261063  #lyft        [964215]  @user thanks for #lyft credit          {…..}
72096    null         null      factsguide society now                 {…..}

Each tweet contains the information of the user profile and mentions or hashtags used in the text
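The record layout of Table 10 can be approximated with a short sketch. It assumes that only the tweet text and, optionally, the numeric mention ids are available, and it derives the Hashtags column from the text with a regular expression. The class `TweetRecord`, the helper `parse_tweet`, and all field names are illustrative assumptions; they do not correspond to any actual Twitter API object.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# A hashtag is taken to be '#' followed by word characters (an assumption).
HASHTAG_RE = re.compile(r"#(\w+)")


@dataclass
class TweetRecord:
    """Hypothetical container mirroring the columns of Table 10."""
    tweet_id: int
    text: str
    hashtags: Optional[List[str]] = None   # None plays the role of 'null'
    mentions: Optional[List[int]] = None   # numeric ids of mentioned users
    profile_info: dict = field(default_factory=dict)


def parse_tweet(tweet_id, text, mention_ids=None, profile_info=None):
    """Build a TweetRecord, extracting unique hashtags in order of appearance."""
    tags = ["#" + tag for tag in HASHTAG_RE.findall(text)]
    tags = list(dict.fromkeys(tags))  # drop duplicates, keep order
    return TweetRecord(
        tweet_id=tweet_id,
        text=text,
        hashtags=tags or None,
        mentions=list(mention_ids) if mention_ids else None,
        profile_info=profile_info or {},
    )


# The three rows of Table 10; mention ids are passed in because the text
# itself only carries the anonymised placeholder '@user'.
row1 = parse_tweet(240556, "#dinner bihday your majesty @user #ny", [10214])
row2 = parse_tweet(4261063, "@user thanks for #lyft credit", [964215])
row3 = parse_tweet(72096, "factsguide society now")
```

Keeping absent hashtags and mentions as `None` rather than empty lists mirrors the `null` entries of the third row.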
3.2.1 Social media

Nowadays, people are highly involved in social media, and they are motivated to share their emotions and thoughts online, leaving a large and continuously updated body of user-generated content. Studying happiness from users’ posts may eliminate the social desirability effect that traditional self-reports bring, due to participants’ inaccurate and dishonest evaluation of happiness [305]. Thus, researchers and policy-makers are attracted by these intellectual opportunities to explore happiness, with wider use of Twitter data accessed through Twitter’s public API. Twitter has the fewest barriers in data extraction, while the other social media have strict policies, and the acquisition of data has turned out to be difficult. Social media data may also raise some concerns. They may reflect social desirability biases since individuals manage their online profiles [122]. Also, Twitter users may not be as representative of the general population [123] as anonymized self-reports conducted through a chosen representative sample. Table 10 illustrates an example of the structure of Twitter records.

There are several studies on social media (mostly on Twitter) showing the variations in happiness as influenced by the universal needs (a2), and in particular, the interaction with other people. For example, Quercia et al. [306] use Twitter data in order to monitor the gross community happiness in the city of London. In particular, they suggest that Twitter friends, on average, have similar sentiment. They also show that the relationship between sentiment and well-being can hold at individual and community level. Bollen et al. [307] use the OpinionFinder (OF) subjectivity lexicon [308] in order to analyze the sentiment of an online social network of 39,110 Twitter users. They show the first direct observation of a significant Happiness Paradox, meaning that on average most of the individuals are less happy than their friends are. Similarly, by using the OF, Bollen et al. [309] analyze the emotional content of a set of Twitter users over 6 months, to examine whether happiness is assortative in online social networks. They find significant levels of happiness assortativity across Twitter, since users might be inclined to connect to users with similar happiness values (homophilic attachment) or converge on their friends’ happiness level (contagion). This result suggests that real social networks may work similarly. With the use of Facebook, Kramer et al. [310] test whether emotions are contagious between users without the awareness of the influenced individuals. Indeed, by reducing the amount of emotional content in the Facebook News Feed in an experiment conducted on Facebook users, they demonstrate that emotional contagion can also happen without direct interaction between the users and even without non-verbal cues.

Social media is also used for the exploration of happiness as influenced by the social environment dimension (a3). For example, Lim et al. [311] collect a set of geotagged tweets of users in Melbourne, Australia, between November 2016 and January 2017. They use sentiment analysis to demonstrate that people show more positive emotions and fewer negative emotions in green spaces or close to them. This could potentially be taken into consideration by policy-makers aiming to improve societal well-being through urban greening interventions. Besides, Mitchell et al. [312] use Twitter data to study happiness and the 2010 United States Census Bureau’s MAF/TIGER database to define the urban areas. They use the Language Assessment by Mechanical Turk (labMT) sentiment analysis tool to study the similarities in word use in urban areas in the United States, to map areas according to the happiness level and score individual states and cities for average word happiness. Golder and Macy [313] identify individual-level diurnal and seasonal mood rhythms in cultures across the globe, using data from Twitter between February 2008 and January 2010. They find that people are much happier on Saturdays and Sundays. They also find that even individuals’ good mood deteriorates as the day progresses, which is consistent with the effects of sleep and circadian rhythm. They also show that seasonal change in baseline positive affect varies with change in day length. Lansdall-Welfare et al. [314] turn their attention to the issue of the public mood or sentiment, that is, the mood of the nation. They use tweets sampled from the 54 largest cities in the UK from July 2009 to January 2012, and they associate each of the basic emotions (fear, joy, anger, sadness) with a list of words. They find that each of the four key emotions changes over time in a manner that is partly predictable (or at least interpretable). Joy rises at Christmas and fear at Halloween, and an especially negative mood started in October 2010, when massive cuts were announced in the UK. Cresci et al. [315] use Instagram data to explore, among others, the differences that the cultural and social environment bring on people’s smiles. They perform face recognition in a case study of over 2 million selfies shared from January
to February 2015. In particular, they use a Face++ algorithm function to measure the smiling degree of the individuals in their selfies. Results reveal that El Salvador, Brazil, and Panama have the highest smiling average.

Other researchers use social media to study the variations of happiness as influenced by more than one dimension. For example, Bollen et al. [316] conduct sentiment analysis on Twitter data from 2008. They find that events in the social and cultural (a3), political (a5), and economic sphere (a4) have a significant effect on happiness. Dodds et al. [317] construct the Hedonometer to measure temporal patterns of societal happiness, as influenced by basic needs (a2), as well as by various social (a3), economic (a4) and political (a5) determinants. To indicate happiness with the Hedonometer, they create a data set of users’ tweets over 3 years (from September 2008 to September 2011 approximately). The results show that in general, at an annual level, the average happiness appears to increase till April 2009 and then to decrease gradually. On a weekly basis, the average happiness peaks during the weekend, and on an hourly basis, the happiest hour of the day is between 5 and 6 a.m. (US local time). Another example is Iacus et al. [318], who analyze tweets from Italy, written in the Italian language. In particular, they use the iSA (integrated Sentiment Analysis) method [319,320] to capture a set of determinants that influence happiness, such as self-esteem (a1) and family relationships (a2), and aggregate them into an index labeled SWBI (Social Well Being Index). Results suggest that the environmental and health conditions (a3) anticipate several determinants of happiness as measured by SWBI. This study is one of the few to study both the emotional and structural components of happiness. Curini et al. [321] use tweets posted in 2012 in Italy to build a happiness index, labeled iHappy. They demonstrate that variables such as the overall quality of institutions (a5) seem to have a minor effect on the average level of happiness of the Italian provinces. In contrast, meteorological variables, such as rain and snow (a3), as well as events related to specific days, such as the payday (a4), have a stronger impact on happiness. Furthermore, Durahim et al. [322] use Twitter data to create the Gross National Happiness (GNH) for the country of Turkey. The resulting GNH measures how people’s happiness varies with specific events, such as Saint Valentine’s Day (a2), the starting day of the Gezi Park protests (#occupygezi), and the day of the Ergenekon lawsuit verdict (a5). Last, Coviello et al. [323] compare what people post on Facebook to data they have on the weather (a3), specifically the rainfall amount. They find that people tend to post less happy messages on Facebook if it rains. This emotion seems to pass along their network (a2). For example, if a friend on Facebook is in a rainy area and this affects the emotional content of her posts on Facebook, her friends are more likely to post a sadder message, even though the weather is better where they are.

3.2.2 Google Trends

Another new data source is Google Trends, which provides data on the frequency of specific search terms over time. Algan et al. [324] present Google Trends as a new data source for exploring happiness and its relevant dimensions. They consider it a promising data source for its timeliness, since it provides computational social scientists with immediate data and offers the possibility to observe people’s behavior, as compared to analyzing textual opinions. On the other hand, working with Google Trends challenges researchers since the value of the series obtained directly from Google Trends is difficult to interpret, and this value on a given day cannot be compared between terms since each series is normalized to the maximum value of its own term. In this study [324], researchers cover 300 weeks from January 6, 2008, to January 4, 2014. Results reveal that happiness is associated with job security, financial security (b4), family life (b2), and leisure determinants (b3). An example of a Google Trends data set is not provided since data are represented as time series of the frequency.

3.2.3 Crowdsourced data

Crowdsourcing, as discussed in Sect. 2.2, involves obtaining work, information, or opinions from a large group of people who submit their data via the Internet, smartphone apps, etc. In particular, smartphones have lately become appealing to happiness researchers since they give access to previously inaccessible data related to daily social behavior [325,326]. Innovative smartphone sensor technology, such as accelerometers, GPS, and Bluetooth, is used in combination with self-reports, such as mood tracking self-reports, in the form of EMA. However, such methodologies bring the limitations of the traditional data sources (see the first rows of Table 9), since happiness fluctuations are collected through self-reports. Moreover, when hiring individuals to participate in crowdsourcing platforms, the crowd is no longer free, and the study might result in high costs. It is, therefore, hard to strike a trade-off between the initial objectives, the quality of the results, and the cost [327]. Additionally, some studies are conducted with a small amount of data and might need to be replicated. Table 11 shows an example of crowdsourced data.

For example, Lathia et al. [328] collect data of over 10,000 individuals, by combining smartphone-based self-reports (in the form of EMA) and the accelerometer in the smartphones, to investigate the relationship between happiness and physical activity (c3). Results show that there is indeed a relationship between happiness and physical activities, including the non-exercise ones, such as standing and walking. Asai et al. [329] study 100,000 happy moments from HappyDB over 3 months, to find which are the short and long term determinants of happiness. In particular, HappyDB is a database
Table 11 The table contains a subset of the information returned by HappyDB, a crowdsourced database capturing happy moments

Id     Reflection period  Text                       Num. sentences
28775  24 h               Donated blood. Painful     2
32612  24 h               Morning yoga class         1
42663  24 h               Children with butterflies  1
created through Amazon Mechanical Turk, for capturing people’s happy moments by asking people for their happiness status every 24 h and once over 3 months, and by analyzing their responses with NLP. Results show that exercise, nature, and leisure (c3) are short-term determinants, whereas social relationships with loved ones (c2) and achievements (c3) are long-term determinants. Bogomolov et al. [330] exploit a data set of 117 individuals, who were equipped with sensing software between 2010 and 2011. This software collects smartphone activity data of call logs, SMS and proximity data (acquired by scanning nearby phones and other Bluetooth devices every five minutes). It also collects personality traits (the “Big Five” [331]) and daily happiness data by self-report questionnaires. Results demonstrate that by using mobile phone data reflecting social interactions (c2), information concerning weather conditions (c3), and personality traits (c1), individuals’ daily happiness can be predicted.

3.2.4 News data

Similarly to objective well-being, news data are a new promising data source for the further exploration of subjective well-being. Its advantages and its disadvantages, as well as a data set example, are discussed and presented in Sect. 2.2. Carlquist et al. [332] study happiness with the use of news data. In particular, they study the concept of well-being in Norwegian society by examining word use patterns in four electronically archived Norwegian newspapers from 1992 to 2014. They demonstrate that about half of the words referring to affective approaches, cognitive or life satisfaction approaches, eudaimonic and humanistic approaches, and character strengths show systematic and statistically significant patterns of change. The most notable rise concerns the eudaimonic words (related to mastery, motivation, and self-development), which show increasing trends in all newspapers. The authors state that certain happiness terms appearing more frequently could be interpreted as an increased and liberating focus on individual opportunity (d1) [333] or could demonstrate neoliberal ideology (d5) [334].

4 Discussion

In this study, we provide researchers with the theoretical background on both the objective and the subjective well-being or happiness, as well as their relevant dimensions necessary for conducting a meaningful study. In addition, we present a review of the data sources used for the exploration of well-being, and we discuss existing related studies. More specifically, we present the structure and the opportunities that each data source offers and the problems that researchers might encounter when working with these data.

The paper is primarily targeted at researchers interested in “Data Science for Social Good” (DS4SG) or similarly “Artificial Intelligence for Social Good” (AI4SG). Harnessed correctly, artificial intelligence can inform and empower decision-making for social good [335,336]. DS4SG or AI4SG is a vague concept, and there is not an adequate definition yet. However, Shi et al. [39] propose several societal application domains to shed light on this concept, such as healthcare and well-being. In this study, we specifically aim to contribute to the exploration of well-being through data science. Researchers from various disciplines, from social science to computer science, could use this paper to understand data science for well-being better and make a positive and tangible social impact.

We would like to underline that this is not a complete review of studies conducted on well-being with the use of innovative data sources. We aim to provide some examples of the most important evidence on these data sources and well-being dimensions so that this study works as a reference point for future research. We do not fully cover existing research on a given link that is present in Figs. 1 and 2, but to the best of our knowledge, a missing link entails that there is no existing study connecting the two nodes. For example, there is no adequate literature on news data for the exploration of the safety dimension (E5) of objective well-being. Therefore, since safety is nowadays an important dimension, due to constant conflicts around the world (e.g., political instability, terrorist attacks), it shows great potential for future research.

Moreover, new data sources seem to be particularly promising for a more in-depth exploration of subjective well-being. Taking into consideration the subjective nature of happiness, it has been traditionally measured through self-reports. Although they have been proved to be valid, they are very costly, and depending on the study might neglect to capture either the emotional or the structural component of well-being. Therefore, new data sources could be used, and innovative methodologies, such as text analysis, could be
applied for a complete, according to its definition, measurement of subjective well-being. Still, most studies using new data sources tap into the emotional component of subjective well-being and neglect the structural component. Consequently, we suggest further exploration of the novel data sources for the measurement of subjective well-being, capturing both components.

Undoubtedly, the research opportunities opened up by the innovative data sources discussed in this paper are plenty. However, with the use of these data sources, researchers are called to deal with new challenges compared with traditional research. Since, usually, the data used are personal, if not sensitive, and are analyzed to shape policy and to make decisions [337,338], ethical concerns may arise, such as privacy and respect for human rights. In the European Union, additional attention to the topic has been brought after the implementation of the General Data Protection Regulation (GDPR). Researchers need to take into consideration the ethical challenges and not overlook them but address them successfully. Only by facing ethical problems can researchers maximize the contributing value of data science studies for society.

Acknowledgements This work was supported by the European Commission through the Horizon2020 European project “SoBigData Research Infrastructure: Big Data and Social Mining Ecosystem” (Grant Agreement 654024). We would like to thank Daniele Fadda for support on data visualization.

Author contributions VV: conceptualization, writing, tables and figures, LG: conceptualization and writing, IM: writing, tables and figures, SC: writing, RS: writing, MT: writing, LP: conceptualization, writing and managing.

Compliance with ethical standards

Conflict of interest On behalf of all authors, the corresponding author states that there is no conflict of interests.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

1. Reinhart, C.M., Reinhart, V.R.: After the fall. Technical report. National Bureau of Economic Research (2010)
2. Fleurbaey, M.: Beyond gdp: the quest for a measure of social welfare. J. Econ. Lit. 47(4), 1029–75 (2009)
3. Stiglitz, J.E., Sen, A., Fitoussi, J.P.: Report by the Commission on the Measurement of Economic Performance and Social Progress. The Commission Paris (2009)
4. Dodge, R., Daly, A.P., Huyton, J., Sanders, L.D.: The challenge of defining wellbeing. Int. J. Wellbeing 2(3), 11 (2012)
5. Alkire, S.: Dimensions of human development. World Dev. 30(2), 181–205 (2002)
6. Organisation for Economic Co-operation and Development How’s life? Measuring Well-Being. OECD, Paris (2011)
7. UNDP Sustainable Development Goals. https://sustainabledevelopment.un.org/sdgs. Accessed Oct 2019 (2015)
8. Rapporto, BES Il benessere equo e sostenibile in Italia. ISTAT (2015)
9. Organisation for Economic Co-operation and Development (OECD) OECD Guidelines on Measuring Subjective Well-Being. OECD Publishing (2013)
10. Veenhoven, R.: Conditions of Happiness, Reidel. Springer, Dordrecht (1984)
11. Frey, B.S., Stutzer, A.: What can economists learn from happiness research? J. Econ. Lit. 40(2), 402–435 (2002)
12. Stiglitz, J.E., Sen, A., Fitoussi, J.P.: Measurement of economic performance and social progress. Online document. http://www.bitly/JTwmG Accessed 26 June 2012 (2009)
13. Bartels, M., Boomsma, D.I.: Born to be happy? The etiology of subjective well-being. Behav. Genet. 39(6), 605 (2009)
14. Bartels, M., Saviouk, V., De Moor, M.H., Willemsen, G., van Beijsterveldt, T.C., Hottenga, J.J., De Geus, E.J., Boomsma, D.I.: Heritability and genome-wide linkage scan of subjective happiness. Twin Res. Hum. Genet. 13(2), 135–142 (2010)
15. Nes, R.B., Røysamb, E.: The heritability of subjective well-being: review and meta-analysis. In: The Genetics of Psychological Well-Being: The Role of Heritability and Genetics in Positive Psychology, pp. 75–96 (2015)
16. Nes, R.B., Czajkowski, N., Tambs, K.: Family matters: happiness in nuclear families and twins. Behav. Genet. 40(5), 577–590 (2010)
17. Nes, R., Røysamb, E., Tambs, K., Harris, J., Reichborn-Kjennerud, T.: Subjective well-being: genetic and environmental contributions to stability and change. Psychol. Med. 36(7), 1033–1042 (2006)
18. Røysamb, E., Harris, J.R., Magnus, P., Vittersø, J., Tambs, K.: Subjective well-being. Sex-specific effects of genetic and environmental factors. Personal. Individ. Differ. 32(2), 211–223 (2002)
19. Røysamb, E., Tambs, K., Reichborn-Kjennerud, T., Neale, M.C., Harris, J.R.: Happiness and health: environmental and genetic contributions to the relationship between subjective well-being, perceived health, and somatic illness. J. Pers. Soc. Psychol. 85(6), 1136 (2003)
20. Schnittker, J.: Happiness and success: genes, families, and the psychological effects of socioeconomic position and social support. Am. J. Sociol. 114(S1), S233–S259 (2008)
21. Pleeging, E., Burger, M., van Exel, J.: The relations between hope and subjective well-being: a literature overview and empirical analysis. Appl. Res. Qual. Life 1, 1–23 (2020)
22. Kenrick, D.T., Griskevicius, V., Neuberg, S.L., Schaller, M.: Renovating the pyramid of needs: contemporary extensions built upon ancient foundations. Perspect. Psychol. Sci. 5(3), 292–314 (2010)
23. Ryan, R.M., Deci, E.L.: Self-determination theory and the facilitation of intrinsic motivation, social development, and well-being. Am. Psychol. 55(1), 68 (2000)
24. Tay, L., Diener, E.: Needs and subjective well-being around the world. J. Pers. Soc. Psychol. 101(2), 354 (2011)
25. Clark, A.E., Oswald, A.J.: Satisfaction and comparison income. 50. OECD.: OECD Better Life Index: Jobs. http://www.
J. Public Econ. 61(3), 359–381 (1996) oecdbetterlifeindex.org/topics/jobs/. Accessed Oct 2019 (2011a)
26. Shields, M.A., Price, S.W., Wooden, M.: Life satisfaction and the 51. OECD.: OECD Better Life Index: Income. http://www.
economic and social characteristics of neighbourhoods. J. Popul. oecdbetterlifeindex.org/topics/income/. Accessed Oct 2019
Econ. 22(2), 421–443 (2009) (2011b)
27. Powdthavee, N.: How much does money really matter? Estimating 52. OECD.: OECD Better Life Index: Environment. http://www.
the causal effects of income on happiness. Empir. Econ. 39(1), oecdbetterlifeindex.org/topics/environment/. Accessed Oct 2019
77–92 (2010) (2011c)
28. Nikolaev, B.: Living with mom and dad and loving it... or are you? 53. OECD.: OECD Better Life Index: Safety. http://www.
J. Econ. Psychol. 51, 199–209 (2015) oecdbetterlifeindex.org/topics/safety/. Accessed Oct 2019
29. Dolan, P., Peasgood, T., White, M.: Do we really know what makes (2011d)
us happy? A review of the economic literature on the factors asso- 54. Amerio, P., Roccato, M.: Psychological reactions to crime in Italy:
ciated with subjective well-being. J. Econ. Psychol. 29(1), 94–122 2002–2004. J. Commun. Psychol. 35(1), 91–102 (2007)
(2008) 55. OECD.: OECD Better Life Index: Civic Engagement. http://www.
30. Easterlin, R.A.: Does economic growth improve the human lot? oecdbetterlifeindex.org/topics/civic-engagement/. Accessed Oct
Some empirical evidence. In: Nations and Households in Eco- 2019 (2011)
nomic Growth, pp 89–125. Elsevier (1974) 56. Blondel, V.D., Decuyper, A., Krings, G.: A survey of results on
31. Veenhoven, R.: Is happiness relative? Soc. Indic. Res. 24(1), 1–34 mobile phone datasets analysis. EPJ Data Sci. 4(1), 10 (2015)
(1991) 57. Eagle, N., Pentland, A.S.: Eigenbehaviors: identifying structure
32. Diener, E., Tay, L., Oishi, S.: Rising income and the subjective in routine. Behav. Ecol. Sociobiol. 63(7), 1057–1066 (2009)
well-being of nations. J. Pers. Soc. Psychol. 104(2), 267 (2013) 58. Pappalardo, L., Simini, F., Rinzivillo, S., Pedreschi, D., Giannotti,
33. Veenhoven, R., Vergunst, F.: The Easterlin illusion: economic F., Barabási, A.L.: Returners and explorers dichotomy in human
growth does go with greater happiness. Int. J. Happiness Dev. mobility. Nat. Commun. 6, 8166 (2015)
1(4), 311–343 (2014) 59. Pappalardo, L., Rinzivillo, S., Simini, F.: Human mobility mod-
34. Sacks, D.W., Stevenson, B., Wolfers, J.: The new stylized facts elling: exploration and preferential return meet the gravity model.
about income and subjective well-being. Emotion 12(6), 1181 Proc. Comput. Sci. 83, 934–939 (2016). https://doi.org/10.1016/
(2012) j.procs.2016.04.188
35. Radcliff, B., Shufeldt, G.: Direct democracy and subjective well- 60. Pellungrini, R., Pappalardo, L., Pratesi, F., Monreale, A.: A
being: the initiative and life satisfaction in the American states. data mining approach to assess privacy risk in human mobility
Soc. Indic. Res. 128(3), 1405–1423 (2016) data. ACM Trans. Intell. Syst. Technol. 9(3), 31:1–31:27 (2017).
36. Veenhoven, R.: Social conditions for human happiness: a review https://doi.org/10.1145/3106774
of research. Int. J. Psychol. 50(5), 379–391 (2015) 61. Pappalardo, L., Simini, F.: Data-driven generation of spatio-
37. Deaton, A.: The Analysis of Household Surveys: A Microecono- temporal routines in human mobility. Data Min. Knowl. Disc.
metric Approach to Development Policy. The World Bank (1997) 32(3), 787–829 (2018)
38. European Project.: SoBigData. http://sobigdata.eu/index. 62. Giannotti, F., Pappalardo, L., Pedreschi, D., Wang, D.: A Com-
Accessed Oct 2019 (2015) plexity Science Perspective on Human Mobility, pp. 297–314.
39. Shi, Z.R., Wang, C., Fang, F.: Artificial Intelligence for Social Cambridge University Press, Cambridge (2013). https://doi.org/
Good: A Survey. arXiv preprint arXiv:2001.01818 (2020) 10.1017/CBO9781139128926.016
40. Solomon, D.J.: Conducting web-based surveys. Pract. Assess. 63. Ranjan, G., Zang, H., Zhang, Z.L., Bolot, J.: Are call detail records
Res. Eval. 7(19), 12 (2001) biased for sampling human mobility? ACM SIGMOBILE Mob.
41. Daas, P.J., Puts, M.J., Buelens, B., Van den Hurk, P.A.: Big data Comput. Commun. Rev. 16(3), 33–44 (2012)
and official statistics. In: Proceedings of the NTTS, pp. 5–7. New 64. Iovan, C., Olteanu-Raimond, A.M., Couronné, T., Smoreda, Z.:
Techniques and Technologies for Statistics (2013) Moving and calling: mobile phone data quality measurements and
42. Struijs, P., Daas, P.: Quality approaches to big data in official spatiotemporal uncertainty in human mobility studies. In: Geo-
statistics. In: European Conference on Quality in Official Statistics graphic Information Science at the Heart of Europe, pp. 247–265.
(2014) Springer (2013)
43. Jahani, E., Sundsøy, P., Bjelland, J., Bengtsson, L., de Montjoye, 65. Gonzalez, M.C., Hidalgo, C.A., Barabasi, A.L.: Understand-
Y.A., et al.: Improving official statistics in emerging markets using ing individual human mobility patterns. Nature 453(7196), 779
machine learning and mobile phone data. EPJ Data Sci. 6(1), 3 (2008)
(2017) 66. Barabasi, A.L.: The origin of bursts and heavy tails in human
44. Blumenstock, J.E.: Fighting poverty with data. Science dynamics. Nature 435(7039), 207 (2005)
353(6301), 753–754 (2016) 67. Oliver, N., Matic, A., Frias-Martinez, E.: Mobile network data for
45. United Nations.: A world that counts: mobilizing the data revolu- public health: opportunities and challenges. Front. Public Health
tion for sustainable development. Technical report (2014) 3, 189 (2015)
46. Sustainable Development Solutions Network: Indicators and a 68. Finger, F., Genolet, T., Mari, L., de Magny, G.C., Manga, N.M.,
Monitoring Framework for the Sustainable Development Goals. Rinaldo, A., Bertuzzo, E.: Mobile phone data highlights the role
Launching a Data Revolution for the SDGs, United Nations, New of mass gatherings in the spreading of cholera outbreaks. Proc.
York (2015) Nat. Acad. Sci. 113(23), 6421–6426 (2016)
47. WHO, World Health Organization: Geneva Macroeconomics and 69. Kafsi, M., Kazemi, E., Maystre, L., Yartseva, L., Grossglauser, M.,
health: investing in health for economic development-report of Thiran, P.: Mitigating epidemics through mobile micro-measures.
the commission on macroeconomics and health. Commission on arXiv preprint arXiv:1307.2084 (2013)
Macroeconomics and Health (2001) 70. Lima, A., De Domenico, M., Pejovic, V., Musolesi, M.: Disease
48. European Commission: The Lisbon strategy for growth and jobs containment strategies based on mobility and information dissem-
(2000) ination. Sci. Rep. 5, 10650 (2015)
49. OECD.: OECD Better Life Index: Health. http://www. 71. Madan, A., Cebrian, M., Lazer, D., Pentland, A.: Social sensing
oecdbetterlifeindex.org/topics/health/. Accessed Oct 2019 (2011) for epidemiological behavior change. In: Proceedings of the 12th
ACM International Conference on Ubiquitous Computing, pp. 291–300. ACM (2010)
72. Pappalardo, L., Pedreschi, D., Smoreda, Z., Giannotti, F.: Using big data to study the link between human mobility and socio-economic development. In: 2015 IEEE International Conference on Big Data (Big Data), pp. 871–878 (2015). https://doi.org/10.1109/BigData.2015.7363835
73. Toole, J.L., Lin, Y.R., Muehlegger, E., Shoag, D., González, M.C., Lazer, D.: Tracking employment shocks using mobile phone data. J. R. Soc. Interface 12(107), 20150185 (2015)
74. Sundsøy, P., Bjelland, J., Reme, B.A., Jahani, E., Wetter, E., Bengtsson, L.: Towards real-time prediction of unemployment and profession. In: International Conference on Social Informatics, pp. 14–23. Springer (2017)
75. Eagle, N., Macy, M., Claxton, R.: Network diversity and economic development. Science 328(5981), 1029–1031 (2010)
76. Steele, J.E., Sundsøy, P.R., Pezzulo, C., Alegana, V.A., Bird, T.J., Blumenstock, J., Bjelland, J., Engø-Monsen, K., de Montjoye, Y.A., Iqbal, A.M., et al.: Mapping poverty using mobile phone and satellite data. J. R. Soc. Interface 14(127), 20160690 (2017)
77. Mao, H., Shuai, X., Ahn, Y.Y., Bollen, J.: Quantifying socio-economic indicators in developing countries from mobile phone communication data: applications to côte d’ivoire. EPJ Data Sci. 4(1), 15 (2015)
78. Gutierrez, T., Krings, G., Blondel, V.D.: Evaluating socio-economic state of a country analyzing airtime credit and mobile phone datasets. arXiv preprint arXiv:1309.4496 (2013)
79. Blumenstock, J.: Calling for better measurement: estimating an individual’s wealth and well-being. ACM KDD (Data Mining for Social Good) (2014)
80. Blumenstock, J., Cadamuro, G., On, R.: Predicting poverty and wealth from mobile phone metadata. Science 350(6264), 1073–1076 (2015)
81. Frias-Martinez, V., Virseda, J.: On the relationship between socio-economic factors and cell phone usage. In: Proceedings of the Fifth International Conference on Information and Communication Technologies and Development, pp. 76–84. ACM (2012)
82. Soto, V., Frias-Martinez, V., Virseda, J., Frias-Martinez, E.: Prediction of socioeconomic levels using cell phone records. In: International Conference on User Modeling, Adaptation, and Personalization, pp. 377–388. Springer (2011)
83. Frias-Martinez, V., Soguero-Ruiz, C., Frias-Martinez, E., Josephidou, M.: Forecasting socioeconomic trends with cell phone records. In: Proceedings of the 3rd ACM Symposium on Computing for Development, p. 15. ACM (2013)
84. Hernandez, M., Hong, L., Frias-Martinez, V., Frias-Martinez, E.: Estimating poverty using cell phone data: evidence from Guatemala. The World Bank (2017)
85. Pappalardo, L., Vanhoof, M., Gabrielli, L., Smoreda, Z., Pedreschi, D., Giannotti, F.: An analytical framework to nowcast well-being using mobile phone data. Int. J. Data Sci. Anal. 2(1), 75–92 (2016). https://doi.org/10.1007/s41060-016-0013-2
86. Lotero, L., Cardillo, A., Hurtado, R., Gómez-Gardeñes, J.: Several multiplexes in the same city: the role of socioeconomic differences in urban mobility. In: Interconnected Networks, pp. 149–164. Springer (2016)
87. Amini, A., Kung, K., Kang, C., Sobolevsky, S., Ratti, C.: The impact of social segregation on human mobility in developing and industrialized regions. EPJ Data Sci. 3(1), 6 (2014)
88. Smith-Clarke, C., Mashhadi, A., Capra, L.: Poverty on the cheap: estimating poverty maps using aggregated mobile communication networks. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 511–520. ACM (2014)
89. Picornell, M., Ruiz, T., Borge, R., García-Albertos, P., de la Paz, D., Lumbreras, J.: Population dynamics based on mobile phone data to improve air pollution exposure assessments. J. Expos. Sci. Environ. Epidemiol. 29(2), 278 (2019)
90. Lu, X., Wrathall, D.J., Sundsøy, P.R., Nadiruzzaman, M., Wetter, E., Iqbal, A., Qureshi, T., Tatem, A.J., Canright, G.S., Engø-Monsen, K., et al.: Detecting climate adaptation with mobile network data in bangladesh: anomalies in communication, mobility and consumption patterns during cyclone mahasen. Clim. Change 138(3–4), 505–519 (2016)
91. Lu, X., Bengtsson, L., Holme, P.: Predictability of population displacement after the 2010 haiti earthquake. Proc. Nat. Acad. Sci. 109(29), 11576–11581 (2012)
92. Bengtsson, L., Lu, X., Thorson, A., Garfield, R., Von Schreeb, J.: Improved response to disasters and outbreaks by tracking population movements with mobile phone network data: a post-earthquake geospatial study in haiti. PLoS Med. 8(8), e1001083 (2011)
93. Wilson, R., Zu Erbach-Schoenberg, E., Albert, M., Power, D., Tudge, S., Gonzalez, M., Guthrie, S., Chamberlain, H., Brooks, C., Hughes, C., et al.: Rapid and near real-time assessments of population displacement using mobile phone data following disasters: the 2015 Nepal earthquake. PLoS Curr. 8, 1 (2016)
94. Nyarku, M., Mazaheri, M., Jayaratne, R., Dunbabin, M., Rahman, M.M., Uhde, E., Morawska, L.: Mobile phones as monitors of personal exposure to air pollution: Is this the future? PLoS ONE 13(2), e0193150 (2018)
95. Liu, H.Y., Skjetne, E., Kobernus, M.: Mobile phone tracking: in support of modelling traffic-related air pollution contribution to individual exposure and its implications for public health impact assessment. Environ. Health 12(1), 93 (2013)
96. Decuyper, A., Rutherford, A., Wadhwa, A., Bauer, J.M., Krings, G., Gutierrez, T., Blondel, V.D., Luengo-Oroz, M.A.: Estimating food consumption and poverty indices with mobile phone data. arXiv preprint arXiv:1412.2595 (2014)
97. Bogomolov, A., Lepri, B., Staiano, J., Oliver, N., Pianesi, F., Pentland, A.: Once upon a crime: towards crime prediction from demographics and mobile data. In: Proceedings of the 16th International Conference on Multimodal Interaction, pp. 427–434. ACM (2014)
98. Ferrara, E., De Meo, P., Catanese, S., Fiumara, G.: Detecting criminal organizations in mobile phone networks. Expert Syst. Appl. 41(13), 5733–5750 (2014)
99. Elgethun, K., Fenske, R.A., Yost, M.G., Palcisko, G.J.: Time-location analysis for exposure assessment studies of children using a novel global positioning system instrument. Environ. Health Perspect. 111(1), 115–122 (2003)
100. Dias, D., Tchepel, O.: Modelling of human exposure to air pollution in the urban environment: a GPS-based approach. Environ. Sci. Pollut. Res. 21(5), 3558–3571 (2014)
101. Beekhuizen, J., Kromhout, H., Huss, A., Vermeulen, R.: Performance of gps-devices for environmental exposure assessment. J. Expos. Sci. Environ. Epidemiol. 23(5), 498 (2013)
102. Pappalardo, L., Simini, F., Barlacchi, G., Pellungrini, R.: Scikit-mobility: a python library for the analysis, generation and risk assessment of mobility data. arXiv:1907.07062 (2019)
103. Jankowska, M.M., Schipperijn, J., Kerr, J.: A framework for using GPS data in physical activity and sedentary behavior studies. Exerc. Sport Sci. Rev. 43(1), 48 (2015)
104. Kelly, P., Krenn, P., Titze, S., Stopher, P., Foster, C.: Quantifying the difference between self-reported and global positioning systems-measured journey durations: a systematic review. Transp. Rev. 33(4), 443–459 (2013)
105. Meurs, H., Haaijer, R.: Spatial structure and mobility. Transp. Res. Part D Transp. Environ. 6(6), 429–446 (2001)
106. Oliver, M., Badland, H., Mavoa, S., Duncan, M.J., Duncan, S.: Combining GPS, GIS, and accelerometry: methodological issues
in the assessment of location and intensity of travel behaviors. J. Phys. Activity Health 7(1), 102–108 (2010)
107. Adams, S.A., Matthews, C.E., Ebbeling, C.B., Moore, C.G., Cunningham, J.E., Fulton, J., Hebert, J.R.: The effect of social desirability and social approval on self-reports of physical activity. Am. J. Epidemiol. 161(4), 389–398 (2005)
108. Pappalardo, L., Rinzivillo, S., Qu, Z., Pedreschi, D., Giannotti, F.: Understanding the patterns of car travel. Eur. Phys. J. Spec. Top. 215(1), 61–73 (2013). https://doi.org/10.1140/epjst/e2013-01715-5
109. Chaix, B., Kestens, Y., Duncan, D.T., Brondeel, R., Méline, J., El Aarbaoui, T., Pannier, B., Merlo, J.: A GPS-based methodology to analyze environment-health associations at the trip level: case-crossover analyses of built environments and walking. Am. J. Epidemiol. 184(8), 579–589 (2016)
110. Kerr, J., Duncan, S., Schipperijn, J.: Using global positioning systems in health research: a practical approach to data collection and processing. Am. J. Prev. Med. 41(5), 532–540 (2011)
111. Saelens, B.E., Vernez Moudon, A., Kang, B., Hurvitz, P.M., Zhou, C.: Relation between higher physical activity and public transit use. Am. J. Public Health 104(5), 854–859 (2014)
112. Rundle, A.G., Sheehan, D.M., Quinn, J.W., Bartley, K., Eisenhower, D., Bader, M.M., Lovasi, G.S., Neckerman, K.M.: Using GPS data to study neighborhood walkability and physical activity. Am. J. Prev. Med. 50(3), e65–e72 (2016)
113. Sadler, R.C., Gilliland, J.A.: Comparing children’s GPS tracks with geospatial proxies for exposure to junk food. Spat. Spat. Temp. Epidemiol. 14, 55–61 (2015)
114. Canzian, L., Musolesi, M.: Trajectories of depression: unobtrusive monitoring of depressive states by means of smartphone mobility traces analysis. In: Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing, pp. 1293–1304. ACM (2015)
115. Marchetti, S., Giusti, C., Pratesi, M., Salvati, N., Giannotti, F., Pedreschi, D., Rinzivillo, S., Pappalardo, L., Gabrielli, L.: Small area model-based estimators using big data sources. J. Off. Stat. 31(2), 263–281 (2015)
116. Smith, C., Quercia, D., Capra, L.: Finger on the pulse: identifying deprivation using transit flow analysis. In: Proceedings of the 2013 Conference on Computer Supported Cooperative Work, pp. 683–692. ACM (2013)
117. Lathia, N., Quercia, D., Crowcroft, J.: The hidden image of the city: sensing community well-being from urban mobility. In: International Conference on Pervasive Computing, pp. 91–98. Springer (2012)
118. Robinson, A.I., Carnes, F., Oreskovic, N.M.: Spatial analysis of crime incidence and adolescent physical activity. Prev. Med. 85, 74–77 (2016)
119. Ariel, B., Partridge, H.: Predictable policing: measuring the crime control benefits of hotspots policing at bus stops. J. Quant. Criminol. 33(4), 809–833 (2017)
120. Spinsanti, L., Berlingerio, M., Pappalardo, L.: Mobility and Geo-Social Networks, pp. 315–333. Cambridge University Press, Cambridge (2013). https://doi.org/10.1017/CBO9781139128926.017
121. Olteanu, A., Castillo, C., Diaz, F., Kiciman, E.: Social data: biases, methodological pitfalls, and ethical boundaries. Front. Big Data 2, 13 (2019)
122. Rost, M., Barkhuus, L., Cramer, H., Brown, B.: Representation and communication: challenges in interpreting large social media datasets. In: Proceedings of the 2013 Conference on Computer Supported Cooperative Work, pp. 357–362. ACM (2013)
123. Eichstaedt, J.C., Schwartz, H.A., Kern, M.L., Park, G., Labarthe, D.R., Merchant, R.M., Jha, S., Agrawal, M., Dziurzynski, L.A., Sap, M., et al.: Psychological language on twitter predicts county-level heart disease mortality. Psychol. Sci. 26(2), 159–169 (2015)
124. De Choudhury, M., Gamon, M., Counts, S., Horvitz, E.: Predicting depression via social media. ICWSM 13, 1–10 (2013)
125. Signorini, A., Segre, A.M., Polgreen, P.M.: The use of Twitter to track levels of disease activity and public concern in the US during the influenza A H1N1 pandemic. PLoS ONE 6(5), e19467 (2011)
126. Paul, M.J., Dredze, M., Broniatowski, D.: Twitter improves influenza forecasting. PLoS Curr. 6, 12 (2014)
127. Lampos, V., Cristianini, N.: Tracking the flu pandemic by monitoring the social web. In: 2010 2nd International Workshop on Cognitive Information Processing, pp. 411–416. IEEE (2010)
128. Lampos, V., Cristianini, N.: Nowcasting events from the social web with statistical learning. ACM Trans. Intell. Syst. Technol. 3(4), 72 (2012)
129. Chen, X., Yang, X.: Does food environment influence food choices? A geographical analysis through “tweets”. Appl. Geogr. 51, 82–89 (2014)
130. Llorente, A., Garcia-Herranz, M., Cebrian, M., Moro, E.: Social media fingerprints of unemployment. PLoS ONE 10(5), e0128692 (2015)
131. Antenucci, D., Cafarella, M., Levenstein, M., Ré, C., Shapiro, M.D.: Using social media to measure labor market flows. Technical report. National Bureau of Economic Research (2014)
132. Bollen, J., Mao, H., Zeng, X.: Twitter mood predicts the stock market. J. Comput. Sci. 2(1), 1–8 (2011)
133. Bar-Haim, R., Dinur, E., Feldman, R., Fresko, M., Goldstein, G.: Identifying and following expert investors in stock microblogs. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 1310–1319. Association for Computational Linguistics (2011)
134. De Choudhury, M., Sundaram, H., John, A., Seligmann, D.D.: Can blog communication dynamics be correlated with stock market activity? In: Proceedings of the Nineteenth ACM Conference on Hypertext and Hypermedia, pp. 55–60. ACM (2008)
135. Cresci, S., Lillo, F., Regoli, D., Tardelli, S., Tesconi, M.: $FAKE: Evidence of spam and bot activity in stock microblogs on Twitter. In: Proceedings of the 12th International Conference on Web and Social Media (ICWSM’18), pp. 580–583. AAAI (2018)
136. Cresci, S., Lillo, F., Regoli, D., Tardelli, S., Tesconi, M.: Cashtag piggybacking: uncovering spam and bot activity in stock microblogs on twitter. ACM Trans. Web (TWEB) 13(2), 11 (2019)
137. Avvenuti, M., Cresci, S., Marchetti, A., Meletti, C., Tesconi, M.: Predictability or early warning: using social media in modern emergency response. IEEE Internet Comput. 20(6), 4–6 (2016)
138. Kryvasheyeu, Y., Chen, H., Obradovich, N., Moro, E., Van Hentenryck, P., Fowler, J., Cebrian, M.: Rapid assessment of disaster damage using social media activity. Sci. Adv. 2(3), e1500779 (2016)
139. Avvenuti, M., Cresci, S., La Polla, M.N., Meletti, C., Tesconi, M.: Nowcasting of earthquake consequences using big social data. IEEE Internet Comput. 6, 37–45 (2017)
140. Mendoza, M., Poblete, B., Valderrama, I.: Nowcasting earthquake damages with twitter. EPJ Data Sci. 8(1), 3 (2019)
141. Avvenuti, M., Cresci, S., Del Vigna, F., Tesconi, M.: Impromptu crisis mapping to prioritize emergency response. Computer 49(5), 28–37 (2016)
142. Avvenuti, M., Cresci, S., Del Vigna, F., Fagni, T., Tesconi, M.: CrisMap: a big data crisis mapping system based on damage detection and geoparsing. Inf. Syst. Front. 1, 1–19 (2018)
143. Preis, T., Moat, H.S., Bishop, S.R., Treleaven, P., Stanley, H.E.: Quantifying the digital traces of hurricane sandy on flickr. Sci. Rep. 3, 3141 (2013)
144. Chen, X., Cho, Y., Jang, S.Y.: Crime prediction using twitter sentiment and weather. In: 2015 Systems and Information Engineering Design Symposium, pp. 63–68. IEEE (2015)
145. Al Boni, M., Gerber, M.S.: Predicting crime with routine activity patterns inferred from social media. In: 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 001233–001238. IEEE (2016)
146. Kadar, C., Brüngger, R.R., Pletikosa, I.: Measuring ambient population from location-based social networks to describe urban crime. In: International Conference on Social Informatics, pp. 521–535. Springer (2017)
147. Chen, F., Neill, D.B.: Non-parametric scan statistics for event detection and forecasting in heterogeneous social media graphs. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1166–1175. ACM (2014)
148. Nobles, M., Neill, D.B., Flaxman, S.: Predicting and Preventing Emerging Outbreaks of Crime (2014)
149. Neill, D.B., Gorr, W.L.: Detecting and preventing emerging epidemics of crime. Adv. Dis. Surveill. 4(13), 18 (2007)
150. Colleoni, E., Rozza, A., Arvidsson, A.: Echo chamber or public sphere? Predicting political orientation and measuring political homophily in Twitter using big data. J. Commun. 64(2), 317–332 (2014)
151. Goh, T.T., Xin, Z., Jin, D.: Habit formation in social media consumption: a case of political engagement. Behav. Inf. Technol. 38(3), 273–288 (2019)
152. Ferrara, E.: Manipulation and abuse on social media. ACM SIGWEB Newsl. 2015(Spring), 4 (2015)
153. Cresci, S., Di Pietro, R., Petrocchi, M., Spognardi, A., Tesconi, M.: The paradigm-shift of social spambots: evidence, theories, and tools for the arms race. In: Proceedings of the 26th International Conference on World Wide Web Companion, International World Wide Web Conferences Steering Committee, pp. 963–972 (2017)
154. Goldstein, B.A., Navar, A.M., Pencina, M.J., Ioannidis, J.: Opportunities and challenges in developing risk prediction models with electronic health records data: a systematic review. J. Am. Med. Inform. Assoc. 24(1), 198–208 (2017)
155. Wilson, P.W., D’Agostino, R.B., Levy, D., Belanger, A.M., Silbershatz, H., Kannel, W.B.: Prediction of coronary heart disease using risk factor categories. Circulation 97(18), 1837–1847 (1998)
156. Sultana, J., Leal, I., de Wilde, M., de Ridder, M., van der Lei, J., Sturkenboom, M., et al.: Identifying data elements to measure frailty in a dutch nationwide electronic medical record database for use in postmarketing safety evaluation: an exploratory study. Drug Saf. 12, 1–7 (2019)
157. Ghaderighahfarokhi, S., Sadeghifar, J.: A model to predict low birth weight infants and affecting factors using data mining techniques. J. Basic Res. Med. Sci. 5(3), 1–8 (2018)
158. Metzger, M.H., Tvardik, N., Gicquel, Q., Bouvry, C., Poulet, E., Potinet-Pagliaroli, V.: Use of emergency department electronic medical records for automated epidemiological surveillance of suicide attempts: a french pilot study. Int. J. Methods Psychiatric Res. 26(2), e1522 (2017)
159. Mhaskar, H.N., Pereverzyev, S.V., van der Walt, M.D.: A deep learning approach to diabetic blood glucose prediction. Front. Appl. Math. Stat. 3, 14 (2017)
160. Santillana, M., Nsoesie, E.O., Mekaru, S.R., Scales, D., Brownstein, J.S.: Using clinicians’ search query data to monitor influenza epidemics. Clin. Infect. Dis. Off. Publ. Infect. Dis. Soc. Am. 59(10), 1446 (2014)
161. Althoff, T., Hicks, J.L., King, A.C., Delp, S.L., Leskovec, J., et al.: Large-scale physical activity data reveal worldwide activity inequality. Nature 547(7663), 336 (2017)
162. Hayeri, A.: Predicting future glucose fluctuations using machine learning and wearable sensor data. Diabetes (2018). https://doi.org/10.2337/db18-738-P
163. Leetaru, K.: The GDELT Project. https://www.gdeltproject.org/. Accessed Oct 2019 (2013)
164. Balahur, A., Steinberger, R., Kabadjov, M., Zavarella, V., Van Der Goot, E., Halkia, M., Pouliquen, B., Belyaeva, J.: Sentiment analysis in the news. arXiv preprint arXiv:1309.6202 (2013)
165. Dehghan, A., Montgomery, L., Arciniegas-Mendez, M., Ferman-Guerra, M.: Predicting News Bias (2016)
166. Grein, T.W., Kamara, K., Rodier, G., Plant, A.J., Bovier, P., Ryan, M.J., Ohyama, T., Heymann, D.L.: Rumors of disease in the global village: outbreak verification. Emerg. Infect. Dis. 6(2), 97 (2000)
167. Heymann, D.L., Rodier, G.R., et al.: Hot spots in a wired world: WHO surveillance of emerging and re-emerging infectious diseases. Lancet Infect. Dis. 1(5), 345–353 (2001)
168. Brownstein, J.S., Freifeld, C.C., Reis, B.Y., Mandl, K.D.: Surveillance sans frontieres: Internet-based emerging infectious disease intelligence and the healthmap project. PLoS Med. 5(7), e151 (2008)
169. Wilson, K., Brownstein, J.S.: Early detection of disease outbreaks using the internet. CMAJ 180(8), 829–831 (2009)
170. Chunara, R., Andrews, J.R., Brownstein, J.S.: Social and news media enable estimation of epidemiological patterns early in the 2010 haitian cholera outbreak. Am. J. Trop. Med. Hyg. 86(1), 39–45 (2012)
171. Alanyali, M., Moat, H.S., Preis, T.: Quantifying the relationship between financial news and the stock market. Sci. Rep. 3, 3578 (2013)
172. Lillo, F., Miccichè, S., Tumminello, M., Piilo, J., Mantegna, R.N.: How news affects the trading behaviour of different categories of investors in a financial market. Quant. Finance 15(2), 213–229 (2015)
173. Kleinschmit, D., Sjöstedt, V.: Between science and politics: Swedish newspaper reporting on forests in a changing climate. Environ. Sci. Policy 35, 117–127 (2014)
174. Boykoff, M.T.: Lost in translation? united states television news coverage of anthropogenic climate change, 1995–2004. Clim. Change 86(1–2), 1–11 (2008)
175. Van Aelst, P., De Swert, K.: Politics in the News: Do Campaigns Matter? A Comparison of Political News During Election Periods and Routine Periods in Flanders (Belgium). Walter de Gruyter GmbH & Co, KG, Belgium (2009)
176. Eurostat Practical Guide for Processing Supermarket Scanner Data (2017)
177. Griffith, R., O’Connell, M.: The use of scanner data for research into nutrition. Fiscal Stud. 30(3–4), 339–365 (2009)
178. Baron, S., Lock, A.: The challenges of scanner data. J. Oper. Res. Soc. 46(1), 50–61 (1995)
179. Eurostat Practical Guide for Processing Supermarket Scanner Data. https://circabc.europa.eu/sd/a/8e1333df-ca16-40fc-bc6a-1ce1be37247c/Practical-Guide-Supermarket. Accessed Oct 2019 (2017)
180. Diewert, W.E.: Harmonized indexes of consumer prices: their conceptual foundations (2002)
181. Magruder, S.: Evaluation of over-the-counter pharmaceutical sales as a possible early warning indicator of human disease. Johns Hopkins Univ. APL Tech. Dig. 24(4), 349–353 (2003)
182. Bonnet, C., Dubois, P., Réquillart, V.: The dynamics of saturated fat consumption in france. Technical report. Toulouse mimeo (2008)
183. Griffith, R., Leibtag, E., Leicester, A., Nevo, A.: Consumer shopping behavior: how much do consumers save? J. Econ. Perspect. 23(2), 99–120 (2009)
184. Janssen, A., Parslow, E.: Pregnancy and alcohol purchases: evidence from scanner data. Avail. SSRN 3446559, 12 (2019)
185. Rider, J., Berck, P., Villas-Boas, S.B.: Eating Healthy in Lean Times: The Relationship Between Unemployment and Grocery Purchasing Patterns (2012)
186. Van der Grient, H.A., de Haan, J.: The use of supermarket scanner data in the dutch cpi. In: Joint ECE/ILO Workshop on Scanner Data, vol. 10 (2010)
187. Silver, M., Heravi, S.: Scanner data and the measurement of inflation. Econ. J. 111(472), 383–404 (2001)
188. Pennacchioli, D., Coscia, M., Rinzivillo, S., Giannotti, F., Pedreschi, D.: The retail market as a complex system. EPJ Data Sci. 3(1), 33 (2014)
189. Sobolevsky, S., Massaro, E., Bojic, I., Arias, J.M., Ratti, C.: Predicting regional economic indices using big data of individual bank card transactions. In: 2017 IEEE International Conference on Big Data (Big Data), pp. 1313–1318. IEEE (2017)
190. Panzone, L.A., Wossink, A., Southerton, D.: The design of an environmental index of sustainable food consumption: a pilot study using supermarket data. Ecol. Econ. 94, 44–55 (2013)
191. Gadema, Z., Oglethorpe, D.: The use and usefulness of carbon labelling food: a policy perspective from a survey of uk supermarket shoppers. Food Policy 36(6), 815–822 (2011)
192. Brancoli, P., Rousta, K., Bolton, K.: Life cycle assessment of supermarket food waste. Resour. Conserv. Recycl. 118, 39–46 (2017)
193. Scholz, K., Eriksson, M., Strid, I.: Carbon footprint of supermarket food waste. Resour. Conserv. Recycl. 94, 56–65 (2015)
194. Goel, S., Hofman, J.M., Lahaie, S., Pennock, D.M., Watts, D.J.: Predicting consumer behavior with web search. Proc. Nat. Acad. Sci. 107(41), 17486–17490 (2010)
195. Cooper, C.P., Mallon, K.P., Leadbetter, S., Pollack, L.A., Peipins, L.A.: Cancer internet search activity on a major search engine, united states 2001–2003. J. Med. Internet Res. 7(3), e36 (2005)
196. Polgreen, P.M., Chen, Y., Pennock, D.M., Nelson, F.D., Weinstein, R.A.: Using internet searches for influenza surveillance. Clin. Infect. Dis. 47(11), 1443–1448 (2008)
197. Hulth, A., Rydevik, G., Linde, A.: Web queries as a source for syndromic surveillance. PLoS ONE 4(2), e4378 (2009)
198. Yuan, Q., Nsoesie, E.O., Lv, B., Peng, G., Chunara, R., Brownstein, J.S.: Monitoring influenza epidemics in china with search query from baidu. PLoS ONE 8(5), e64323 (2013)
199. Ginsberg, J., Mohebbi, M.H., Patel, R.S., Brammer, L., Smolinski, M.S., Brilliant, L.: Detecting influenza epidemics using search engine query data. Nature 457(7232), 1012 (2009)
200. Google: Google Flu Trends. http://www.google.org/flutrends. Accessed Oct 2019 (2008)
201. Nsoesie, E., Marathe, M., Brownstein, J.: Forecasting peaks of seasonal influenza epidemics. PLoS Curr. 5, 8 (2013)
202. Yang, W., Lipsitch, M., Shaman, J.: Inference of seasonal and pandemic influenza transmission dynamics. Proc. Nat. Acad. Sci. 112(9), 2723–2728 (2015)
203. Wilson, N., Mason, K., Tobias, M., Peacey, M., Huang, Q., Baker, M.: Interpreting “google flu trends” data for pandemic h1n1 influenza: the new zealand experience. Eurosurveillance 14(44), 19386 (2009)
204. Chan, E.H., Sahai, V., Conrad, C., Brownstein, J.S.: Using web search query data to monitor dengue epidemics: a new model for neglected tropical disease surveillance. PLoS Neglect. Trop. Dis. 5(5), e1206 (2011)
205. Althouse, B.M., Ng, Y.Y., Cummings, D.A.: Prediction of dengue incidence using search query surveillance. PLoS Neglect. Trop. Dis. 5(8), e1258 (2011)
206. Dukic, V.M., David, M.Z., Lauderdale, D.S.: Internet queries and methicillin-resistant staphylococcus aureus surveillance. Emerg. Infect. Dis. 17(6), 1068 (2011)
207. Ocampo, A.J., Chunara, R., Brownstein, J.S.: Using search queries for malaria surveillance, Thailand. Malaria J. 12(1), 390 (2013)
208. Yang, A.C., Tsai, S.J., Huang, N.E., Peng, C.K.: Association of internet search trends with suicide death in taipei city, taiwan, 2004–2009. J. Affect. Disord. 132(1–2), 179–184 (2011)
209. McCarthy, M.J.: Internet monitoring of suicide risk in the population. J. Affect. Disord. 122(3), 277–279 (2010)
210. Kristoufek, L., Moat, H.S., Preis, T.: Estimating suicide occurrence statistics using google trends. EPJ Data Sci. 5(1), 32 (2016)
211. Adler, N., Cattuto, C., Kalimeri, K., Paolotti, D., Tizzoni, M., Verhulst, S., Yom-Tov, E., Young, A.: How search engine data enhance the understanding of determinants of suicide in india and inform prevention: observational study. J. Med. Internet Res. 21(1), e10179 (2019). https://doi.org/10.2196/10179
212. Ettredge, M., Gerdes, J., Karuga, G.: Using web-based search data to predict macroeconomic statistics. Commun. ACM 48(11), 87–92 (2005)
213. Askitas, N., Zimmermann, K.: Google econometrics and unemployment forecasting. Appl. Econ. Quart. 55(2), 107–120 (2009)
214. D’Amuri, F., Marcucci, J.: “google it!” forecasting the us unemployment rate with a google job search index. MPRA paper. University Library of Munich, Germany. https://EconPapers.repec.org/RePEc:pra:mprapa:18248 (2009)
215. Suhoy, T., et al.: Query indices and a 2008 downturn: Israeli data. Technical report. Bank of Israel (2009)
216. Baker, S., Fradkin, A., et al.: What drives job search? evidence from google search data. Discussion Papers, pp. 10–20 (2011)
217. McLaren, N., Shanbhogue, R.: Using internet search data as economic indicators. Bank Engl. Quart. Bull. 51(2), 134–140 (2011)
218. Choi, H., Varian, H.: Predicting initial claims for unemployment benefits. Google Inc, pp. 1–5 (2009)
219. Choi, H., Varian, H.: Predicting the present with google trends. Econ. Rec. 88, 2–9 (2012)
220. Koop, G., Onorante, L.: Macroeconomic nowcasting using google probabilities. In: First International Conference on Advanced Research Methods and Analytics, CARMA2016. https://doi.org/10.4995/CARMA2016.2016.4213 (2016)
221. Guzman, G.: Internet search behavior as an economic forecasting tool: the case of inflation expectations. J. Econ. Soc. Meas. 36(3), 119–167 (2011)
222. Preis, T., Reith, D., Stanley, H.E.: Complex dynamics of our economic life on different scales: insights from search engine query data. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 368(1933), 5707–5719 (2010). https://doi.org/10.1098/rsta.2010.0284
223. Preis, T., Moat, H.S., Stanley, H.E.: Quantifying trading behavior in financial markets using google trends. Sci. Rep. (2013). https://doi.org/10.1038/srep01684
224. Curme, C., Preis, T., Stanley, H.E., Moat, H.S.: Quantifying the semantics of search behavior before stock market moves. Proc. Natl. Acad. Sci. 111(32), 11600–11605 (2014). https://doi.org/10.1073/pnas.1324054111
225. Bordino, I., Battiston, S., Caldarelli, G., Cristelli, M., Ukkonen, A., Weber, I.: Web search queries can predict stock market volumes. PLoS ONE 7(7), e40014 (2012)
226. Moat, H.S., Curme, C., Avakian, A., Kenett, D.Y., Stanley, H.E., Preis, T.: Quantifying wikipedia usage patterns before stock market moves. Sci. Rep. 3, 1801 (2013)
227. Qi, H., Manrique, P., Johnson, D., Restrepo, E., Johnson, N.F.: Open source data reveals connection between online and on-street protest activity. EPJ Data Sci. 5(1), 18 (2016a)
228. Qi, H., Manrique, P., Johnson, D., Restrepo, E., Johnson, N.F.: Association between volume and momentum of online searches and real-world collective unrest. Results Phys. 6, 414–419 (2016b)
229. Chykina, V., Crabtree, C.: Using google trends to measure issue salience for hard-to-survey populations. Socius 4, 2378023118760414 (2018)
230. Reilly, S., Richey, S., Taylor, J.B.: Using google search data for state politics research: an empirical validity test using roll-off data. State Polit. Policy Quart. 12(2), 146–159 (2012)
231. Kleemann, F., Voß, G.G., Rieder, K.: Un (der) paid innovators: the commercial utilization of consumer work through crowdsourcing. Sci. Technol. Innov. Stud. 4(1), 5–26 (2008)
232. Behrend, T.S., Sharek, D.J., Meade, A.W., Wiebe, E.N.: The viability of crowdsourcing for survey research. Behav. Res. Methods 43(3), 800 (2011)
233. Paolotti, D., Carnahan, A., Colizza, V., Eames, K., Edmunds, J., Gomes, G., Koppeschaar, C., Rehn, M., Smallenburg, R., Turbelin, C., et al.: Web-based participatory surveillance of infectious diseases: the influenzanet participatory surveillance experience. Clin. Microbiol. Infect. 20(1), 17–21 (2014)
234. Dalton, C., Durrheim, D., Fejsa, J., Francis, L., Carlson, S., d’Espaignet, E.T., Tuyl, F., et al.: Flutracking: a weekly australian community online survey of influenza-like illness in 2006, 2007 and 2008. Commun. Dis. Intell. Quart. Rep. 33(3), 316 (2009)
235. Smolinski, M.S., Crawley, A.W., Baltrusaitis, K., Chunara, R., Olsen, J.M., Wójcik, O., Santillana, M., Nguyen, A., Brownstein, J.S.: Flu near you: crowdsourced symptom reporting spanning 2 influenza seasons. Am. J. Public Health 105(10), 2124–2130 (2015)
236. Hashemian, M., Knowles, D., Calver, J., Qian, W., Bullock, MC., Bell, S., Mandryk, R.L., Osgood, N., Stanley, K.G.: iepi: an end to end solution for collecting, conditioning and utilizing epidemiologically relevant data. In: Proceedings of the 2nd ACM International Workshop on Pervasive Wireless Healthcare. pp. 3–8. ACM (2012)
237. Madan, A., Cebrian, M., Moturu, S., Farrahi, K., et al.: Sensing the “health state” of a community. IEEE Pervasive Comput. 11(4), 36–45 (2011)
238. Martinucci, I., Natilli, M., Lorenzoni, V., Pappalardo, L., Monreale, A., Turchetti, G., Pedreschi, D., Marchi, S., Barale, R., de Bortoli, N.: Gastroesophageal reflux symptoms among italian university students: epidemiology and dietary correlates using automatically recorded transactions. BMC Gastroenterol. 18(1), 116 (2018)
239. Green, T.C., Huang, R., Wen, Q., Zhou, D.: Crowdsourced employer reviews and stock returns. J. Financ. Econ. 2, 18 (2019)
240. Dabirian, A., Kietzmann, J., Diba, H.: A great place to work!? understanding crowdsourced employer branding. Bus. Horiz. 60(2), 197–205 (2017)
241. Könsgen, R., Schaarschmidt, M., Ivens, S., Munzel, A.: Finding meaning in contradiction on employee review sites-effects of discrepant online reviews on job application intentions. J. Interact. Mark. 43, 165–177 (2018)
242. Tingzon, I., Orden, A., Sy, S., Sekara, V., Weber, I., Fatehkia, M., Herranz, M.G., Kim, D.: Mapping Poverty in the Philippines Using Machine Learning, Satellite Imagery, and Crowd-sourced Geospatial Information (missing year)
243. OpenStreetMap Community Openstreetmap. https://www.openstreetmap.org/#map=5/42.088/12.564. Accessed Oct 2019 (2004)
244. Piaggesi, S., Gauvin, L., Tizzoni, M., Cattuto, C., Adler, N., Verhulst, S., Young, A., Price, R., Ferres, L., Panisson, A.: Predicting city poverty using satellite imagery. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 90–96 (2019)
245. Abelson, B., Varshney, K.R., Sun, J.: Targeting direct cash transfers to the extremely poor. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1563–1572. ACM (2014)
246. Hersman, E., Okolloh, O., Rotich, J., Kobia, D.: Ushahidi. https://www.ushahidi.com. Accessed Oct 2019 (2008)
247. Meier, P.: Digital Humanitarians: How Big Data is Changing the Face of Humanitarian Response. Routledge, London (2015)
248. European Commission Citizens’ Observatories. https://www.ushahidi.com. Accessed Oct 2019 (2016)
249. Grainger, A.: Citizen observatories and the new earth observation science. Remote Sens. 9(2), 153 (2017)
250. Schneider, P., Castell, N., Vogt, M., Lahoz W., Bartonova A.: Making sense of crowdsourced observations: data fusion techniques for real-time mapping of urban air quality. In: EGU General Assembly Conference Abstracts, p. 17 (2015)
251. Meier, F., Fenner, D., Grassmann, T., Jänicke, B., Otto, M., Scherer, D.: Challenges and benefits from crowd sourced atmospheric data for urban climate research using Berlin, Germany, as testbed. In: ICUC9–9th International Conference on Urban Climate jointly with 12th Symposium on the Urban Environment (2015)
252. Chapman, L., Bell, C., Bell, S.: Can the crowdsourcing data paradigm take atmospheric science to a new level? a case study of the urban heat island of london quantified using netatmo weather stations. Int. J. Climatol. 37(9), 3597–3605 (2017)
253. Lea, S.G., D’Silva, E., Asok, A.: Women’s strategies addressing sexual harassment and assault on public buses: an analysis of crowdsourced data. Crime Prev. Commun. Saf. 19(3–4), 227–239 (2017)
254. Gosselt, J.F., Van Hoof, J.J., Gent, B.S., Fox, J.P.: Violent frames: analyzing internet movie database reviewers’ text descriptions of media violence and gender differences from 39 years of us action, thriller, crime, and adventure movies. Int. J. Commun. 9, 547–567 (2015)
255. Ozkan, T., Worrall, J.L., Zettler, H.: Validating media-driven and crowdsourced police shooting data: a research note. J. Crime Justice 41(3), 334–345 (2018)
256. Avvenuti, M., Bellomo, S., Cresci, S., La Polla, M.N., Tesconi, M.: Hybrid crowdsensing: A novel paradigm to combine the strengths of opportunistic and participatory crowdsensing. In: Proceedings of the 26th International Conference on World Wide Web Companion, International World Wide Web Conferences Steering Committee, pp. 1413–1421 (2017)
257. Dennis, J.: United by what divides us: 38 degrees and the eu referendum. In: EU Referendum Analysis 2016: Media, Voters and the Campaign. Bournemouth University, p. 100 (2016)
258. Yasseri, T., Bright, J.: Wikipedia traffic data and electoral prediction: towards theoretically informed models. EPJ Data Sci. 5(1), 22 (2016)
259. Gellers, J.C.: Crowdsourcing global governance: sustainable development goals, civil society, and the pursuit of democratic legitimacy. Int. Environ. Agreements Polit. Law Econ. 16(3), 415–432 (2016)
260. Burger, R.: Aristotle’s Dialogue with Socrates: On the “Nicomachean Ethics”. University of Chicago Press, Chicago (2009)
261. Diener, E.: Subjective well-being. Psychol. Bull. 95(3), 542 (1984)
262. Veenhoven, R.: How do we assess how happy we are? tenets, implications and tenability of three theories. Happiness Econ. Polit. 25, 45–69 (2009)
263. Alesina, A., Di Tella, R., MacCulloch, R.: Inequality and happiness: are europeans and americans different? J. Public Econ. 88(9–10), 2009–2042 (2004)
264. Watson, D., Clark, L.A., Tellegen, A.: Development and validation of brief measures of positive and negative affect: the PANAS scales. J. Pers. Soc. Psychol. 54(6), 1063 (1988)
265. Watson, D., Clark, L.A.: The Panas-x: Manual for the Positive and Negative Affect Schedule-Expanded Form. Psychology Publications, New York (1999)
266. Diener, E., Oishi, S., Tay, L.: Advances in subjective well-being research. Nat. Hum. Behav. 2, 1 (2018)
267. Hudson, N.W., Anusic, I., Lucas, R.E., Donnellan, M.B.: Comparing the reliability and validity of global self-report measures of subjective well-being with experiential day reconstruction measures. Assessment 2, 26 (2017)
268. Anusic, I., Schimmack, U.: Stability and change of personality traits, self-esteem, and well-being: introducing the meta-analytic stability and change model of retest correlations. J. Pers. Soc. Psychol. 110(5), 766 (2016)
269. Tay, L., Chan, D., Diener, E.: The metrics of societal happiness. Soc. Indic. Res. 117(2), 577–600 (2014)
270. Deaton, A.: Income, health, and well-being around the world: evidence from the gallup world poll. J. Econ. Perspect. 22(2), 53–72 (2008)
271. Easterlin, R.A., Angelescu, L.: Happiness and growth the world over: time series evidence on the happiness-income paradox. Technical report. Institute of Labor Economics (IZA) (2009)
272. Kahneman, D., Deaton, A.: High income improves evaluation of life but not emotional well-being. Proc. Nat. Acad. Sci. 107(38), 16489–16493 (2010)
273. Frijters, P., Beatton, T.: The mystery of the u-shaped relationship between happiness and age. J. Econ. Behav. Organ. 82(2–3), 525–542 (2012)
274. Stevenson, B., Wolfers, J.: The paradox of declining female happiness. Am. Econ. J. Econ. Policy 1(2), 190–225 (2009)
275. Deaton, A., Stone, A.A.: Understanding context effects for a measure of life evaluation: how responses matter. Oxf. Econ. Pap. 68(4), 861–870 (2016)
276. Yap, S.C., Wortman, J., Anusic, I., Baker, S.G., Scherer, L.D., Donnellan, M.B., Lucas, R.E.: The effect of mood on judgments of subjective well-being: nine tests of the judgment model. J. Pers. Soc. Psychol. 113(6), 939 (2017)
277. Lucas, R.E., Lawless, N.M.: Does life seem better on a sunny day? Examining the association between daily weather conditions and life satisfaction judgments. J. Pers. Soc. Psychol. 104(5), 872 (2013)
278. Kahneman, D., Diener, E., Schwarz, N.: Well-Being: Foundations of Hedonic Psychology. Russell Sage Foundation, New York (1999)
279. Kahneman, D., Krueger, A.B., Schkade, D.A., Schwarz, N., Stone, A.A.: A survey method for characterizing daily life experience: the day reconstruction method. Science 306(5702), 1776–1780 (2004)
280. Courvoisier, D.S., Eid, M., Lischetzke, T.: Compliance to a cell phone-based ecological momentary assessment study: the effect of time and personality characteristics. Psychol. Assess. 24(3), 713 (2012)
281. Shiffman, S., Stone, A.A., Hufford, M.R.: Ecological momentary assessment. Annu. Rev. Clin. Psychol. 4, 1–32 (2008)
282. Eid, M.E., Diener, E.E.: Handbook of Multimethod Measurement in Psychology. American Psychological Association, New York (2006)
283. Diener, E., Seligman, M.E.: Beyond money: toward an economy of well-being. Psychol. Sci. Public Interest 5(1), 1–31 (2004)
284. Costa, P.T., McCrae, R.R.: Influence of extraversion and neuroticism on subjective well-being: happy and unhappy people. J. Pers. Soc. Psychol. 38(4), 668 (1980)
285. Zweig, J.S.: Are women happier than men? Evidence from the Gallup World Poll. J. Happiness Stud. 16(2), 515–541 (2015)
286. Deaton, A.S., Tortora, R.: People in Sub-Saharan Africa rate their health and health care among the lowest in the world. Health Aff. 34(3), 519–527 (2015)
287. Veenhoven, R., Ehrhardt, J.: The cross-national pattern of happiness: test of predictions implied in three theories of happiness. Soc. Indic. Res. 34(1), 33–68 (1995)
288. Cuñado, J., de Gracia, F.P.: Does education affect happiness? Evidence for spain. Soc. Indic. Res. 108(1), 185–196 (2012)
289. Nikolaev, B.: Does higher education increase hedonic and eudaimonic happiness? J. Happiness Stud. 19(2), 483–504 (2018)
290. Rehdanz, K., Maddison, D.: Climate and happiness. Ecol. Econ. 52(1), 111–125 (2005)
291. Hudson, J.: Institutional trust and subjective well-being across the eu. Kyklos 59(1), 43–62 (2006)
292. Hayo, B.: Happiness in Eastern Europe. Marburg Economic Working Paper No 12 (2004)
293. Ferrer-i Carbonell, A., Gowdy, J.M.: Environmental degradation and happiness. Ecol. Econ. 60(3), 509–516 (2007)
294. Gardner, J., Oswald, A.J.: Money and mental wellbeing: a longitudinal study of medium-sized lottery wins. J. Health Econ. 26(1), 49–60 (2007)
295. Tay, L., Zyphur, M., Batz, C.: Income and Subjective Well-Being: Review, Synthesis, and Future Research. Handbook of Well-Being. DEF Publishers, Salt Lake City (2017)
296. Wijngaards, I., Hendriks, M., Burger, M.J.: Steering towards happiness: an experience sampling study on the determinants of happiness of truck drivers. Transp. Res. Part A Policy Pract. 128, 131–148 (2019)
297. van der Zwan, P., Hessels, J., Burger, M.: Happy free willies? Investigating the relationship between freelancing and subjective well-being. Small Bus. Econ. 8, 1–17 (2019)
298. Blanchflower, D.G., Bell, D.N., Montagnoli, A., Moro, M.: The happiness trade-off between unemployment and inflation. J. Money Credit Bank. 46(S2), 117–141 (2014)
299. Knabe, A., Schöb, R., Weimann, J.: Partnership, gender, and the well-being cost of unemployment. Soc. Indic. Res. 129(3), 1255–1275 (2016)
300. Brulé, G., Veenhoven, R.: Why are Latin Europeans less happy? Polyphonic Anthropology-Theoretical and Empirical Cross-Cultural Fieldwork. The Impact of Hierarchy. InTech (2012)
301. Bartolini, S., Mikucka, M., Sarracino, F.: Money, trust and happiness in transition countries: evidence from time series. Soc. Indic. Res. 130(1), 87–106 (2017)
302. Ott, J.C.: Good governance and happiness in nations: technical quality precedes democracy and quality beats size. J. Happiness Stud. 11(3), 353–368 (2010)
303. Fowler, J.H., Christakis, N.A.: Dynamic spread of happiness in a large social network: longitudinal analysis over 20 years in the framingham heart study. BMJ 337, a2338 (2008)
304. Luhmann, M.: Using big data to study subjective well-being. Curr. Opin. Behav. Sci. 18, 28–33 (2017)
305. Nederhof, A.J.: Methods of coping with social desirability bias: a review. Eur. J. Soc. Psychol. 15(3), 263–280 (1985)
306. Quercia, D., Ellis, J., Capra, L., Crowcroft, J.: Tracking gross community happiness from tweets. In: Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work, pp. 965–968. ACM (2012)
307. Bollen, J., Gonçalves, B., van de Leemput, I., Ruan, G.: The happiness paradox: your friends are happier than you. EPJ Data Sci. 6(1), 4 (2017)
308. Wilson, T., Hoffmann, P., Somasundaran, S., Kessler, J., Wiebe, J., Choi, Y., Cardie, C., Riloff, E., Patwardhan, S.: OpinionFinder: a system for subjectivity analysis. In: Proceedings of hlt/emnlp on Interactive Demonstrations. Association for Computational Linguistics, pp. 34–35 (2005)
309. Bollen, J., Gonçalves, B., Ruan, G., Mao, H.: Happiness is assortative in online social networks. Artif. Life 17(3), 237–251 (2011)
310. Kramer, A.D., Guillory, J.E., Hancock, J.T.: Experimental evidence of massive-scale emotional contagion through social networks. In: Proceedings of the National Academy of Sciences, p. 201320040 (2014)
311. Lim, K.H., Lee, K.E., Kendal, D., Rashidi, L., Naghizade, E., Winter, S., Vasardani, M.: The grass is greener on the other side: Understanding the effects of green spaces on twitter user sentiments. In: Companion of the The Web Conference 2018 on The Web Conference 2018, International World Wide Web Conferences Steering Committee, pp. 275–282 (2018)
312. Mitchell, L., Frank, M.R., Harris, K.D., Dodds, P.S., Danforth, C.M.: The geography of happiness: connecting twitter sentiment and expression, demographics, and objective characteristics of place. PLoS ONE 8(5), e64417 (2013)
313. Golder, S.A., Macy, M.W.: Diurnal and seasonal mood vary with work, sleep, and daylength across diverse cultures. Science 333(6051), 1878–1881 (2011)
314. Lansdall-Welfare, T., Lampos, V., Cristianini, N.: Nowcasting the mood of the nation. Significance 9(4), 26–28 (2012)
315. Cresci, S., La Polla, M.N., Mazza, M., Tesconi, M., Del Vigna, F.: #selfie: mapping the phenomenon. Consiglio Nazionale delle Ricerche IIT TR-08/2016 Technical Report (2016)
316. Bollen, J., Mao, H., Pepe, A.: Modeling public mood and emotion: twitter sentiment and socio-economic phenomena. ICWSM 11, 450–453 (2011)
317. Dodds, P.S., Harris, K.D., Kloumann, I.M., Bliss, C.A., Danforth, C.M.: Temporal patterns of happiness and information in a global social network: hedonometrics and twitter. PLoS ONE 6(12), e26752 (2011)
318. Iacus, S.M., Porro, G., Salini, S., Siletti, E.: Social networks, happiness and health: from sentiment analysis to a multidimensional indicator of subjective well-being. arXiv preprint arXiv:1512.01569 (2015)
319. Ceron, A., Curini, L., Iacus, S.M.: Social Media e Sentiment Analysis: L’evoluzione dei fenomeni sociali attraverso la Rete, vol. 9. Springer, New York (2014)
320. Ceron, A., Curini, L., Iacus, S.M.: ISA: a fast, scalable and accurate algorithm for sentiment analysis of social media content. Inf. Sci. 367, 105–124 (2016)
321. Curini, L., Iacus, S., Canova, L.: Measuring idiosyncratic happiness through the analysis of twitter: an application to the italian case. Soc. Indic. Res. 121(2), 525–542 (2015)
322. Durahim, A.O., Coşkun, M.: #iamhappybecause: gross national happiness through twitter analysis and big data. Technol. Forecast. Soc. Change 99, 92–105 (2015)
323. Coviello, L., Sohn, Y., Kramer, A.D., Marlow, C., Franceschetti, M., Christakis, N.A., Fowler, J.H.: Detecting emotional contagion in massive social networks. PLoS ONE 9(3), e90315 (2014)
324. Algan, Y., Beasley, E., Guyot, F., Higa, K., Murtin, F., Senik, C., et al.: Big Data Measures of Well-Being: Evidence from a Google Well-Being Index in the United States. OECD Statistics Working Papers 2016 (2016)
325. Lane, N.D., Miluzzo, E., Lu, H., Peebles, D., Choudhury, T., Campbell, A.T.: A survey of mobile phone sensing. IEEE Commun. Mag. 48(9), 140–150 (2010)
326. Staiano, J., Lepri, B., Aharony, N., Pianesi, F., Sebe, N., Pentland, A.: Friends don’t lie: inferring personality traits from social network structure. In: Proceedings of the 2012 ACM Conference on Ubiquitous Computing, pp. 321–330. ACM (2012)
327. Li, G., Zheng, Y., Fan, J., Wang, J., Cheng, R.: Crowdsourced data management: overview and challenges. In: Proceedings of the 2017 ACM International Conference on Management of Data, pp. 1711–1716. ACM (2017)
328. Lathia, N., Sandstrom, G.M., Mascolo, C., Rentfrow, P.J.: Happier people live more active lives: using smartphones to link happiness and physical activity. PLoS ONE 12(1), e0160589 (2017)
329. Asai, A., Evensen, S., Golshan, B., Halevy, A., Li, V., Lopatenko, A., Stepanov, D., Suhara, Y., Tan, W.C., Xu, Y.: Happydb: a corpus of 100,000 crowdsourced happy moments. arXiv preprint arXiv:1801.07746 (2018)
330. Bogomolov, A., Lepri, B., Pianesi, F.: Happiness recognition from mobile phone data. In: Social Computing (SocialCom), 2013 International Conference on Social Computing, pp. 790–795. IEEE (2013)
331. Goldberg, L.R.: An alternative “description of personality”: the big-five factor structure. J. Pers. Soc. Psychol. 59(6), 1216 (1990)
332. Carlquist, E., Nafstad, H.E., Blakar, R.M., Ulleberg, P., Delle Fave, A., Phelps, J.M.: Well-being vocabulary in media language: an analysis of changing word usage in Norwegian newspapers. J. Positive Psychol. 12(2), 99–109 (2017)
333. Seligman, M.E.: Flourish: A New Understanding of Happiness and Well-Being and How to Achieve Them. Nicholas Brealey, Boston (2011)
334. Greco, M., Stenner, P.: Happiness and the art of life: diagnosing the psychopolitics of wellbeing. Health Cult. Soc. 5(1), 1–19 (2013)
335. Coulton, C.J., Goerge, R., Putnam-Hornstein, E., de Haan, B.: Harnessing Big Data for Social Good: A Grand Challenge for Social Work, pp. 1–20. American Academy of Social Work and Social Welfare, Cleveland (2015)
336. Lepri, B., Staiano, J., Sangokoya, D., Letouzé, E., Oliver, N.: The tyranny of data? The bright and dark sides of data-driven decision-making for social good. In: Transparent Data Mining for Big and Small Data, pp. 3–24. Springer (2017)
337. Floridi, L., Taddeo, M.: What is data ethics? The Royal Society (2016)
338. Hand, D.J.: Aspects of data ethics in a changing world: where are we now? Big Data 6(3), 176–190 (2018)

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.