CHAPTER 5
Getting a Sense of Big Data and Well-being
5.1 What Even Is ‘Big Data’?
Big data generally capture what is easy to ensnare—data that are openly
expressed (what is typed, swiped, scanned, sensed, etc.; people’s actions and
behaviours; the movement of things)—as well as data that are the ‘exhaust’,
a by-product … It takes these data at face value, despite the fact that they
may not have been designed to answer specific questions and the data pro-
duced might be messy and dirty. (Kitchin 2014, Chap. 2, p. 3 of individual
chapter version)
Rob Kitchin is possibly one of the most cited definers of ‘Big Data’,
opening books and dissertations up and down the land. Yet, as we are
about to discover, Kitchin himself tells us that while the term ‘Big Data’ is
repeatedly defined (Kitchin 2014, Chap. 2, p. 3), big data themselves defy
categorical labelling. So, it is not clear-cut, because differentiating what
‘it’ is and what they are not is often side-stepped, or comes with caveats.1
We encountered something similar before, if you remember, in Chap. 2.
When it comes to understanding what well-being is, those inclined to
measure are sometimes keen to measure well-being to understand it,
rather than define what it is that is being measured. In a similar way, those
describing Big Data are often more concerned with what Big Data does (or
do), rather than what Big Data is, or are.
© The Author(s) 2021 175
S. Oman, Understanding Well-being Data,
New Directions in Cultural Policy Research,
https://doi.org/10.1007/978-3-030-72937-0_5
176 S. OMAN
In this chapter on Big Data, we will discover that how they are used
can defy some of the old definitions of how to use data or what data are
for. So, let us start with some definitions and what is different. For
Kitchin, the lack of ‘ontological clarity’ of Big Data (as the individual
concepts and categories of Big Data and the relations between them)
means the term acts as a vague, catch-all label for a wide selection of data
(Kitchin 2014, Chap. 2, p. 3). Despite this, he has reviewed how other
people define it and proposes the key traits of Big Data. These qualities
are outlined in Table 5.1. Given the word ‘big’, it is probably no surprise
that volume is one of ‘the 3Vs’ identified by Doug Laney back in 2001,
the other two being velocity and variety. Other qualities include
exhaustivity, resolution, indexicality, relationality, extensionality and
scalability (Kitchin and McArdle 2016; Kitchin 2014). But what does
this mean? How do these characteristics help us understand the data?

Table 5.1 Ways that Big Data are different
| Label/definition | Origin | Meaning | Pre Big Data | Big Data |
|---|---|---|---|---|
| Volume | Laney (2001) | Consisting of enormous quantities of data | Limited to large | Very large |
| Velocity | Laney (2001) | Created in real-time | Slow, freeze-framed/bundled | Fast, continuous |
| Variety | Laney (2001) | Being structured, semi-structured and unstructured | Narrow2 | Wide |
| Exhaustivity | Mayer-Schönberger and Cukier (2013) | An entire system is captured, rather than being sampled | Samples | Entire populations |
| Resolution and identification | Dodge and Kitchin (2005) | Fine-grained (in resolution) and uniquely indexical (in identification) | Coarse and weak to tight and strong | Tight and strong |
| Relationality | Boyd and Crawford (2012) | Containing common fields that enable the conjoining of different datasets | Weak to strong | Strong |
| Flexible and scalable | Marz and Warren (2012) | Can add/change new fields easily and can expand in size rapidly | Low to middling | High |
Adapted from tables in Kitchin (2014) and Kitchin and McArdle (2016)

Having established a series of classifications for Big Data, Kitchin tested
his taxonomy of traits with co-author McArdle a few years later (Kitchin
and McArdle 2016). They applied the categories to 26 datasets which are
widely considered Big Data and drawn from across seven sources: mobile
communication, websites, social media/crowdsourcing, sensors,
cameras/lasers, transaction process generated data and administrative
data. The authors find that only ‘a handful’ of these datasets exhibit all
seven traits in Table 5.1 (Kitchin and McArdle 2016, 9). This shows
how difficult it is to diagnose what Big Data actually are. Rather than the
qualities of the data themselves, it might be more useful to instead turn to
thinking about the contexts of data again: where they come from, and
what they do (Oman n.d.).
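
To make the logic of this kind of test concrete, the check can be sketched in a few lines of Python. This is only an illustration of the reasoning, not Kitchin and McArdle's method: the three datasets and the traits assigned to them are invented, and the trait names are simply shorthand for the rows of Table 5.1.

```python
# The seven traits, named after the rows of Table 5.1 (shorthand labels).
TRAITS = (
    "volume",
    "velocity",
    "variety",
    "exhaustivity",
    "resolution_identification",
    "relationality",
    "flexible_scalable",
)

# Hypothetical datasets: names and observed traits are invented for illustration.
datasets = {
    "hypothetical_sensor_feed": set(TRAITS),
    "hypothetical_admin_register": {
        "exhaustivity",
        "resolution_identification",
        "relationality",
    },
    "hypothetical_web_archive": {
        "volume",
        "variety",
        "relationality",
        "flexible_scalable",
    },
}


def missing_traits(observed):
    """Return the traits from TRAITS that a dataset does not exhibit."""
    return [trait for trait in TRAITS if trait not in observed]


# Datasets that exhibit every trait, in the order they were listed.
fully_big = [name for name, observed in datasets.items() if not missing_traits(observed)]

for name, observed in datasets.items():
    print(name, "is missing:", missing_traits(observed) or "nothing")
```

In this invented example only one of the three datasets passes on every trait, which mirrors the authors' point: a strict checklist rules out many datasets that are nonetheless widely treated as Big Data.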
The key difference in the characteristics of Big Data is context, which
is often missing when the data are presented. Table 5.2 shows how
difficult it is to diagnose what Big Data actually are without considering
the qualities that affect their use. It introduces additional Vs: veracity,
value and variability. These are concerned with how well the data suit
their re-purposing. Given the multiple insights and applications of data
outside of their original setting, it can be even more difficult to find
certainty from them. This is because the data were collected, generated
and produced for a specific reason, or as a by-product, that differs from
how they are re-used.
The value of Big Data is the variety of insights that are possible and that
can be used for other purposes. However, there are many things in the
data that may not be useful. This also means using Big Data can increase
the risk of confounding more traditional causal explanations. Instead, the
mess of Big Data lends them to correlation with many insights, which can
be used to enable prediction of well-being for individuals and society. We
shall return to correlations and well-being in our case studies later in this
chapter.

Table 5.2 Some qualities of Big Data
| Label/definition | Origin | Qualities of data that affect their use |
|---|---|---|
| Veracity | Marr (2014) | The data can be messy, noisy and contain uncertainty and error. |
| Value | Marr (2014) | Many insights can be extracted and the data repurposed. |
| Variability | McNulty (2014) | Data whose meaning can be constantly shifting in relation to the context in which they are generated. |
Synthesised from Kitchin and McArdle (2016)

Table 5.3 looks at sources of different kinds of data typically used to
predict well-being, along with their pros and cons. These sources were
drawn from an article in a data science and analytics journal (Voukelatou
et al. 2020), and I have synthesised these with Kitchin’s seven sources
(mobile communication, websites, social media/crowdsourcing, sensors,
cameras/lasers, transaction process generated data and administrative
data), retaining commentary from Voukelatou et al. on the pros and cons
of their use to understand well-being. You may look at these and feel that
they seem like strange ways to understand people’s well-being, given the
difference between their origins and what they may be used for. You may
also note that the authors’ presentation of the pros and cons, based on
these sources, does not really prompt consideration for the people whose
data they are, so much as their ease of use for the Data Scientist.
Returning to contexts of use: mobile phone data, for example, are
primarily generated for billing, or because apps need location data to
work (such as maps or local restaurant recommendations). This is very
different from these data being used to understand trends about people
and society. Our previous examples of data re-use (or secondary analysis)
have largely involved data that were collected in national surveys, or
through more qualitative methods with smaller samples to understand a
specific aspect of people and society more deeply in some way. Notably,
even if the research question is different when data are re-used in Chap.
3’s examples, the purpose of the data’s collection is not as different, or as
removed, as this ‘exhaust’, ‘by-product’ nature of the data Kitchin refers to.
The process which has come to be known as ‘datafication’ (as coined by
Mayer-Schönberger and Cukier 2013) describes the increased demand for
and uses of data. As we have seen in previous centuries, appetite for
numbers (pandemics being one accelerator of data desire) has coincided
with technological developments in handling numbers. In turn, and as we
have seen over
the last four chapters, different disciplines have increased and expanded
their capacities for data and knowing the human experience in their own,
particular way, and ‘new sciences’ have been declared. ‘Big Data’, as data
with the qualities presented above, result from mounting capacity and
faster instruments that increase the possibilities for the origins and vol-
umes of data that can be stored in expanding databases, or in different
databases which can be readily linked for a variety of purposes. As we have
also seen before, it can be difficult to decide which came first: appetite for
data, or capacity to expand on data possibilities.

Table 5.3 Sources of Big Data and their pros and cons for well-being measurement
| Data source | Pros | Cons |
|---|---|---|
| Mobile communications data (including GPS) | Captures temporal, spatial and social dimensions; worldwide diffusion; repeatability; unbiased and classified, real-time monitoring | Not publicly available, sparsity, geographically imprecise; limited coverage in rural areas; indoor/altitude spatial inaccuracy |
| Social media | Measuring social dynamics, publicly available | Privacy issues, overrepresentation; social desirability bias; disturbance of normal activities to post |
| Health and fitness (including mental health and well-being apps) | Cost-effective; prediction of near-term risk of events; reduced respondent burden | Not publicly available, not necessarily representative of the population; requests for data input can disrupt daily activities; data can neglect moment-to-moment variations in mood |
| News | Variety of subject domains; variety of data; range of targets; 24/h updated; archived historical news | Gatekeeping bias; coverage bias; statement bias |
| Transaction process generated data | Modelling of dynamic household behaviour; temporal accuracy; long-term coverage; quality | Dependency on retailer’s permission; legal constraints |
| Websites and searches | Publicly available; speed, convenience, flexibility, ease of analysis; timeliness, observation of people’s behaviour through searches | Population size varies across domains; relevant queries difficult to identify; bias of content and terms; comparability of different search terms on different days |
| Crowdsourcing | Large number of data; speed, relative low-cost measurement of daily behaviour and activity | Risk of low-quality results, trade-off between quality and cost; use of self-reports; paid participation of users |
| Administration data | Accurate, temporal stability, valid for community-level understanding and cross-cultural comparisons | Limited understanding of human experience in administration data |
NOTES: Made from synthesising across Rob Kitchin’s seven sources (mobile communication; websites; social media/crowdsourcing; sensors; cameras/lasers; transaction process generated data; and administrative data) and Voukelatou et al. (2020), with the data examples in this chapter

In the age of Big Data, these newer data sources hold a wide variety of
easy-to-capture data points, including observations of how we feel, where
we are (or were), who we know, what we spend—and on what. These
provide information on what products we have clicked on, and those we
have not bought (Turow 2011). They can show how and where we spend
our spare time and our money, both off and online. They are, therefore,
incredibly valuable for research and commerce.
It is not these individual data points that are important, per se, but the
links between them, that make them valuable. Through linking, assump-
tions can be made about how our behaviour, such as online spending, or
improved mood, can be replicated in another place or time. These insights
are also linked with other more familiar data points from administrative
records, for example: where we were born, how much we earn, whether
we own our own house. Other data are produced by loyalty cards, smart-
phones and in-house devices, such as Alexa, expanding such linking
opportunities. Those who may try to avoid ‘being known’ by these other
data will try to bypass the systems that gather these data. However, this
resistance also becomes data in and of itself; avoidance still produces
digital traces that can be used to gather insights. Corporations may still
create an automated profile of sorts, and assumptions will be made about
the kind of products ‘the resistors’ buy. The persistence of data practices
and their seeming inescapability are the reason we are starting to think
about the experience of Big Data as something we ‘live with’ (Kennedy
et al. 2020) and as something we ‘feel’.
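
The linking described above can be made concrete with a small sketch. It assumes two invented sets of records (a loyalty-card extract and an administrative extract) that happen to share a `customer_id` field; all names and values are hypothetical, and real record linkage is usually far messier than this exact-match join.

```python
# Hypothetical records: every field name and value is invented for illustration.
loyalty_card = [
    {"customer_id": "c01", "weekly_spend": 42.50},
    {"customer_id": "c02", "weekly_spend": 17.20},
]
admin_records = [
    {"customer_id": "c01", "home_owner": True},
    {"customer_id": "c03", "home_owner": False},
]


def link_on(key, left, right):
    """Join two lists of records on a shared field, keeping only matches."""
    right_by_key = {row[key]: row for row in right}
    linked = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            # Merge the two records into one richer profile of the same person.
            linked.append({**row, **match})
    return linked


linked = link_on("customer_id", loyalty_card, admin_records)
print(linked)
```

Neither source says much on its own, but the one person who appears in both now has spending and home ownership attached to the same identifier. That single common field is what Table 5.1 calls relationality.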
This chapter covers some of the pervasiveness of Big Data, alongside
the possibilities that come with that. Crucially, we look at what that means
for well-being. We start by looking at the ways that data about mundane
aspects of our lives are increasing, alongside how normalised increasing
data collection, analysis and re-use have become. These ‘data practices’
present new possibilities and realities of data-driven systems and
decision-making that affect culture and society.
In this chapter, we touch on some of the uncomfortable aspects of
these new realities, before historicising Big Data as well-being data to con-
textualise contemporary concerns regarding data practices that can be
harmful. The second half of the chapter uses case studies to explore these
concerns about well-being and data. Firstly, we consider a high-profile
case that was billed as the promise of Big Data: Google Flu Trends (GFT),
looking back from the age of COVID-19. Three further, short examples
show the possibilities of social media data, place-based data, and health
and fitness data to understand well-being for social and cultural policy and
culture and society more generally.
5.2 Big Data: A New Way to Understand
Well-being?
“Big Data”, was cited 40,000 times in 2017 in Google Scholar, about as
often as “happiness”! (Bellet and Frijters 2019)
The datafication of social life has led to a profound transformation in how
society is ordered, decisions are made, and citizens are governed. (Hintz and
Brand n.d., 2)
Digital devices and data are becoming an ever more pervasive and part of
social, commercial, governmental and academic practices. (Ruppert
et al. 2013, 2)
The majority of Big Data are collected in a different way to the national
surveys and interviews we encountered in Chaps. 3 and 4, and
consequently have numerous different qualities. One is that surveys and
questionnaires are, by and large, overt methods, in that it is obvious that
questions are being asked to generate data. The new technologies use
data which are collected covertly, and so are often gathered on individuals
without their ‘considered consent’ and processed without transparency.
Figure 5.1 shows just a small selection of the types of personal data that are
useful and valuable for social analytics and that are covered in this chapter.
Fig. 5.1 Some examples of personal data used for social analytics in the era of
Big Data
Social analytics involves the monitoring, analysing, measuring and
interpreting of data about people’s movements, characteristics,
interactions, relationships, feelings, ideas and other content. Figure 5.1 shows
only a few of many more examples. Here, they are categorised into
domains that share the same names as the UK’s well-being measures, to
enable you to cross reference the different kinds of insights available under
each domain from these data (although biometrics is a new addition).
The data are from ‘observations’ of how we move around the online and
offline world. They can include behaviours collected by sensors (think of
how your mobile phone uses GPS data to tell you when the next bus is
due, or that you are about to encounter traffic on the motorway). They
include our feelings, shared on social media or in apps. While
demographic data have long been collected, as we know, these newer forms of
data can say much more about us, our well-being and quality of life. As we
shall discover, this is both for good and bad and any insights gained need
to be put into context.
As we have also discovered, data are not only numbers or text, but can
be sound and pictures. Analysing these kinds of qualitative data as Big
Data holds new possibilities. In some ways it is these new possibilities that
feel the most uncomfortably non-human. One concern is that your
phone is always listening to you or, rather, that Alexa or Siri is (to
humanise these technologies). Even the Street View option of Google Maps
allows us to look at other people’s homes. I remember keenly finding the
image of the flat I rented in London for years, only to see my washing-up
through the kitchen window. I couldn’t help but think, I wish I had
known they were coming.
More notable than my neglected washing-up being on public view for
judgement are other visual data used for training datasets, particularly for
facial recognition. There are moments when you know that facial
recognition technology is being used: to log in to your phone, or at passport
control at the airport, perhaps. However, it is also being developed
for schools, public transport systems, workplaces and healthcare facilities
(Ada Lovelace Institute 2019). Revelations about its use in shopping
centres prompted media and public outrage, regulatory investigation and
political criticism (Denham 2019; BBC 2019). These reactions are in part
about the further encroachment on the way we live (like the call centre
example from the 1990s that opens the book) and in part the lack of con-
sent and knowledge about these data being collected about us.
Some people who uploaded photos to Flickr, some 10–15 years ago,
more recently discovered they (as in the people’s faces and their photos)
appeared in a huge facial-recognition database called MegaFace (Hill and
Krolik 2019). They found the database held facial data on around 700,000
individuals, including their children, and was being downloaded by vari-
ous companies to train face-identification algorithms. These algorithms
were then being used to track protesters, surveil terrorists, spot problem
gamblers and spy on the public at large (Hill and Krolik 2019). Notably, a
colleague who read this chapter before publication—a digital sociologist,3
no less—confessed to me their shock at reading this anecdote, as they had
used Flickr and were not aware of this story. Therefore, not only are our
personal data collected and used without our knowledge, but the contro-
versies surrounding their re-use are not even shared with users. This poses
questions for accountability and transparency.
The questions of who is collecting these data, and who is using them,
and for what, present a more complex issue than before. Public support
for the police to use facial recognition technology is conditional upon
limitations and subject to appropriate safeguards, but there is no trust in
private company use (Ada Lovelace Institute 2019). As we have been
discovering, it is the contexts of data collection and use that we need to
understand: it is the who, what, where, why and what for that are
important.
Why We Need to Ask Critical Questions of Data in the Context
of Well-being
Many issues related to Big Data don’t have clear-cut answers, especially
where well-being is concerned. While data reveal details of the vulnerable,
often involving risk for these people and their communities, the State uses
data systems that people increasingly need to be a part of to access health-
care and welfare support (Dencik 2020). This is why the growing amount
of research which problematises the utility and ethics of Big Data, and how
they are used, is vital. In this area of critical data science (see Bates 2016),
some researchers use Big Data to reveal the limits and social issues con-
nected to everyday datasets that we all use, such as a search engine’s image
database (e.g. Otterbacher et al. 2017). These critical studies of data and
their effects on society reveal how data can not only create new problems
but also perpetuate persistent racism and misogyny, as we discovered in
Chap. 1 with Safiya Noble’s example of what happens when you search for
the phrase ‘black girls’ (Noble 2018). These projects reveal data’s negative
social effects, and how they are already embedded in society, exacerbating
issues.
Other research aims to investigate what people know and think is going
on, while also looking at the possibilities of Big Data (and their associated
technologies) for understanding aspects of well-being. One such example
(Living With Data n.d.) presents real-life cases of public sector data
practices to members of the public. It aims to understand how much people
appreciate the possible benefits and how much they doubt or distrust the
possible implications of data systems and sharing in their everyday lives.
One possibility, of course, is that many people may not really care as
much as we think they do, or should.
We touch on these issues in this chapter. Most notable is the increase in
concerns regarding the harms that Big Data and new technologies are
capable of, and which are happening unchecked (e.g. the UK’s Data Justice
Lab n.d.; Eubanks 2018; O’Neil 2016; Noble 2018; Benjamin 2019).
There are two main problems here. One is that we are compromising well-
being in the so-called aim of better understanding the human condition.
The second is that we are not only using these data and technologies to
understand people but also sorting and managing them in different ways
that suit those who are already more powerful.
It is vital to note that key to concerns about datafication is how these
practices disproportionately affect the well-being of those already most
vulnerable. Facial recognition, for example, negatively impacts people
already disadvantaged, owing to its own gendered, heteronormative,
classed and racialised biases (Ada Lovelace Institute 2019). These
technologies are also being trialled in policing in the UK, where more than
90% of matches have been reported as incorrect (Fussey and Murray 2019;
Davies et al. 2018). In a more general way, all public services are adopting
new data practices and possibilities.
Data-driven decision-making is growing as an everyday feature of public
services. Who receives welfare (Eubanks 2018, 37), housing (Eubanks
2018, 93) and other interventions, such as child protection (Eubanks
2018, 135) or education (O’Neil 2016, 5–9, 52–60), are decisions
increasingly made by algorithms, rather than people. Even when automated
decisions are questioned by people (Eubanks 2018, 141), it is unclear
whether ‘experienced workers’ (Eubanks 2018, 77) or the data system has
the greater influence in key decisions.
Beyond welfare, algorithms intervene in other social policy areas. They
monitor the ‘quality’ of education, using dubious proxies (O’Neil 2016),
with various bad outcomes, including teachers undeservedly losing their
jobs.4 In the UK in 2020, during the COVID-19 pandemic, an algorithm also
decided the grades awarded to school-leavers in the absence of exams,
owing to social distancing measures. One national media headline (Pidd
2020) called this ‘punishment by statistics’.
The UK’s A Level algorithm example was extremely high profile, causing
outrage that data-driven decision-making would have such an enormous
effect on the futures of these young people. It was seen as morally outra-
geous for a number of reasons. First, because our society dictates that these
young people’s well-being should be protected. Second, this algorithm used
data that no one had consented to: no one knew at the time that their prior
grades could be used as a final grade. Third, the data model also included
proxies for expected performance which had nothing to do with each
student’s own academic record. Instead, it used their school’s overall
performance in previous years: scores based on previous students’
grades, not their own. While the governing body, Ofqual, insisted its
standardisation arrangements ‘are the fairest possible to facilitate students
progressing on to further study or employment as planned’ (Pidd 2020),
there were further controversies over transparency around how it had
arrived at ‘fair’. Ofqual subsequently published a 319-page document
explaining its methodology (Pidd 2020), which was criticised for not being
accessible to the general public. Therefore, not only did the whole thing
seem far from fair, but Ofqual didn’t make explicit how the approach was
fair to those affected.
Here we see public services failing to look after well-being through the
use of data in ways which go against the moral code of fairness, account-
ability and transparency5—and without the young people’s consent.
Beyond their high-profile nature, what is different about these data uses?
While Chap. 2 discussed the greater role of data in public services from the
1980s onwards, this ostensibly had a different rationale. It aimed to evalu-
ate qualities of these services, such as efficiency or cost-effectiveness. While
these approaches led to flawed decisions and evaluations, assessments were
made at a societal level. Contemporary data-driven decision-making,
whether the allocation of resources to people or the labelling of individu-
als at risk, is a different approach and uses data on a different level. Or, to
use the language of Chap. 3, there is a different unit of analysis, and that
unit could be a vulnerable person.
In sum, why do we need to ask critical questions about how people and
their well-being are being understood or about how data and data systems
used to understand people can compromise well-being? Going back to
those definitions, people are often concerned with the speed and size, and
so on, of Big Data. Actually, as Kitchin indicates, it is the contexts of these
data that most importantly set them apart. Not only are the contexts of
origin of Big Data further removed from their contexts of use than before,
but the practices of analysing data feel less
human. By this I mean that less human attention is now required in data
analysis and in important processes that require data. What does that mean
for decisions made about people and well-being?
As we will discover in a few sections, the response to COVID-19
required older data and data systems—and more human judgement—than
you would have imagined if you were looking at media reports of the
promise of artificial intelligence (AI) in the first half of 2020. However,
the financial value of data increases the more expediently they can be
analysed, and here we must ask other questions. Who stands to gain and who
stands to lose? Who has chosen to participate? But then did people ever
get to choose to participate in systems of well-being data? Or were we
even thinking about data as ‘a thing’ about us, that affects our lives and
is valuable? The next two sections deconstruct the financial value of Big
Data and whether this reality is even new.
Value
Another major reason why we need to ask critical questions about Big
Data and well-being concerns the financial value of knowing more about
people and the financial value of the systems that sort people for public
services and welfare distribution (Eubanks 2018). Beyond public services,
the value of the new ways that Big Data can work is not just in knowing
more about people, but because of the potential this knowledge has to
orient people’s thinking through suggestion and in some high-profile
cases to manipulate what they do. They enable marketers to sell you
products you might be most tempted by, knowing when you might be most
susceptible too, based on your previous purchases or what else you’ve
looked at (Turow 2011). They also enable political campaigns to target their
messages in the same way and change voting behaviour (Avila 2019; Bates
et al. 2016; Murgia 2017). The recent Cambridge Analytica scandal saw
Facebook implicated not only in the unethical use of people’s data, and of
the knowledge it had on their behaviour, but also in misinformation that is
thought to have changed the results of the 2016 US presidential election
and the Brexit referendum in the UK the same year.
The first and second waves of well-being (Bache and Reardon 2013)
from Chap. 2, and to which we keep returning, evolved as historical
moments in which data capabilities married policy-makers’ aims: improv-
ing the way we think about measuring human progress. Similarly, well-
being metrics became more viable because well-being methodologies were
evolving in a way that politicians saw as favourable. Political will and aca-
demic developments work with evolving infrastructure and technological
development to enable datasets to be created with more detailed and
nuanced information about quality of life. These factors work together for
new methodologies to generate new kinds of data and analytical approaches
which then, by extension, affect research and policy-making, which in turn
impact upon our quality of life.
The increasing emphasis on Big Data as ‘the new oil’6 (a misnomer, of
course) is not because datasets are ‘better’ (which would need some quali-
fication) or because the technologies are new (though admittedly this is
partly why it has become such a fixation). Instead, ‘Big Data’ datasets offer
data with different qualities than more traditional data acquired by surveys.
This means big datasets offer capacity to answer different research ques-
tions—or answer the same research questions differently. Most importantly,
they have been called the new oil because: (1) ‘data powers today’s most
profitable corporations, just like fossil fuels energized those of the past’
(Matsakis 2019) and (2) this means these qualities can be financialised.
The amount of data on individuals that are now collected is almost
impossible to visualise in our minds. The growing number of devices and
sensors means we are generating more and more data that can be
collected: the International Data Corporation predicts that by 2025, the
total amount of digital data created worldwide will rise to 163 zettabytes
(Coughlin 2018). One zettabyte is 10^21 bytes
(1,000,000,000,000,000,000,000 bytes), or one trillion gigabytes. The
European Commission forecasted the European ‘data market’ to be worth as
much as €106.8 billion by 2020 (Ram and Murgia 2019). These kinds of
numbers reinforce the importance of looking at Big Data as social
phenomena with social effects. But how new are large datasets about
people and populations?
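
For readers who want to check the scale of the forecast above, the arithmetic can be worked through in a few lines. This is a minimal sketch assuming decimal (SI) units, where a gigabyte is 10^9 bytes and a zettabyte is 10^21 bytes; the variable names are my own.

```python
# Decimal (SI) units are assumed here: 1 GB = 10**9 bytes, 1 ZB = 10**21 bytes.
GIGABYTE = 10**9
ZETTABYTE = 10**21

# The forecast figure quoted in the text (Coughlin 2018).
forecast_zettabytes = 163

gigabytes_per_zettabyte = ZETTABYTE // GIGABYTE
forecast_bytes = forecast_zettabytes * ZETTABYTE
forecast_gigabytes = forecast_bytes // GIGABYTE

print(f"One zettabyte is {gigabytes_per_zettabyte:,} gigabytes")
print(f"163 zettabytes is {forecast_gigabytes:,} gigabytes")
```

On these assumptions a single zettabyte is one trillion gigabytes, so the forecast of 163 zettabytes amounts to 163 trillion gigabytes.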
5.3 Are Big Data Even Actually New?
While data are ‘sold’ to us as ‘the new oil’ (The Economist 2017), large
datasets, and their use to understand human behaviour, are not new; nei-
ther is the relationship between governments, commerce and value, when
it comes to data. Mary Poovey’s A History of the Modern Fact: Problems of
Knowledge in the Sciences of Wealth and Society (1998) describes the rise of
merchants and their influence over the State, including campaigns to pro-
mote the balance of trade as the index of national well-being from the
early seventeenth century onwards (Poovey 1998, 93–94). The new
‘enthusiasm for numbers’ in the early to mid-nineteenth century (Hacking
1991, 186; Porter 1986, 1996) coincided with a growing infrastructure
to collect and analyse data. This desire for numbers, and the data processes
that were required to provide them, led to the ‘great explosion of numbers
that made the term statistics’ (Porter 1986, 11). In fact, the term
‘statistics’ originated as a way for governments to understand ‘the quantum
of happiness’ (Sinclair 1798, vol. 20, p. xiii). In this ‘avalanche of numbers’,
‘nation-states classified, counted and tabulated their subjects anew’
(Hacking 1990, 2; 1991, 186). However, while ‘statistics’ may be hun-
dreds of years old, large datasets go back further.
Managing land, agricultural hierarchies and the desire to control popu-
lations have long required systems of recording. One of the oldest-known
writing systems is Sumerian script, which is approximately 6000 years old
(Bellet and Frijters 2019). This script is called cuneiform, and its uses are
said to include the tracking of trade and taxes: you need records on who
has paid, how much; who has not paid, and what they owe (Harford
2017). While the clay tablets these records were written on may not seem
like a database, or feel like the Big Data futures outlined in the previous
and subsequent sections, they were a dataset of sorts. Crucially, these data
were used to monitor and control resources, including the management
of people.
Most countries now undertake a census of sorts. The UK Census takes
place every ten years and has done since 1801.7 The first four were only
headcounts, with the 1841 Census being the first to intentionally record
names of all individuals in a household or institution. The UK’s ONS
website offers an interesting history of censuses in the UK, back to the
Domesday Book, ordered by the Norman (French) King William the
Conqueror in 1086 (ONS 2016). Again, censuses precede these European
data moments by some 4000 years in both Egypt and China, whose gov-
ernments (as they would have been formed and named in those days)
recorded who lived where and how wealthy they were. The Romans held
regular censuses to keep track of their expanding—and then contracting—
empire. Evidence of other institutionalised data practices exists in the
Bible: the book of Genesis talks of kinship and marriage records and
Exodus mentions a population census to support the tabernacle. The
Church collected information on births, christenings, marriages, wills and
deaths; this tracked the business of a church and its parish, but was also a
means of counting the faithful and tracking their wealth.
You will note that the recording of trade and births, marriages and
deaths is not so different from the administrative data that appear in all our
examples of well-being data, from Table 3.1 to 5.3. So, what is new about
Big Data? We’ve long had large datasets that hold multiple data points on
people and nations, but these are thought to be ‘state simplifications’ for
officials (Scott 1998). Rationalisation and standardisation mean these rep-
resentations ‘did not successfully represent the actual activity of the society
depicted, nor were they intended to; they represented only the slice of it
that interested the official observer’ (Scott 1998, 3). What the historian
James Scott tells us here is that the sorts of information that were collected
at scale lacked detail that could be used to improve quality of life. He
implies, of course, that those in charge did not actually care about quality
of life, only quantity of resource, whether this was people to work the
land, make armies, or pay taxes. More recently, as we have seen, govern-
ments were charged with responsibility for people’s well-being, and there-
fore, more complex data were required.8 One such development was the
social survey.
The social survey has been used to collect data which capture various
qualities of lives in richer ways, and for longer, than it is often given credit for.
For example, surveys in the UK in the mid-1940s (in World War II) dis-
covered almost one in ten households did not have the number of cups
deemed necessary for essential use, and ‘the shortage of scrubbing brushes
seems to have been extensively felt’ (Oman 2015, 88; ONS 2001, 9).
Whilst still administrative records of resource and scarcity, the survey
began to be used to articulate more qualitative aspects of quality of life as
proxies for well-being. This presents richer detail than many of the con-
temporary surveys that generate the well-being data we have seen as either
objective or subjective data so far.
These more qualitative data were not only collected by the government
social scientists we might imagine with clipboards. A project called
Mass Observation was established in 1937 by anthropologist Tom
Harrisson, poet Charles Madge and filmmaker Humphrey Jennings.9 Mass
Observation aimed to record everyday life in Britain. There were paid
investigators who anonymously recorded people’s conversations and their
behaviour: at work, on the street and at memorable occasions, including
public meetings or sporting and religious events.
This project was reminiscent of the current idea of ‘Big Data’, not only
in the scope of the data gathered, but also in how they were gathered. Mass
Observation had numerous phases and at one point also used a panel of
around 500 voluntary ‘observers’. The initial aims of Mass Observation
were to research everyday life, making use of ‘the untrained observer, the
man in the street’10 as much as those who were thought to be skilled and
qualified in gathering data of this sort (Madge and Harrisson 1937, 10).
The observers used various data collection methods to generate large
datasets on different topics: some maintained diaries, while others replied
to open-ended questionnaires. In 1938, there was ‘a competition’ for the
residents of Bolton, Lancashire (see Fig. 5.2), asking people what happiness
meant for them. This was one of many themes, and people would
reply to what were called directives with often very long texts describing
what they thought and how they felt. The data from these and from the
1938 project can still be accessed via a vast archive at the University of
Sussex.11
Mass Observation began with a positive vision of democratising the
processes behind how data were gathered to better understand people’s
lives. However, over time, much qualitative social research shifted towards
the narrower analysis of consumer choice, and Mass Observation became
a market-research firm in 1949 (Albert 2019). Mass Observation
relaunched in 1981, returning to its original egalitarian ideals, and the
archives are testament to the ways that Mass Observation aims to engage
the public in documenting their own lives.
These historical examples of large datasets are, therefore, not so dif-
ferent from the qualities found in previously crowdsourced, location-
based, time-based data on how people feel about things, as seen in
Table 5.3. The purchasing of scrubbing brushes was used as proxy data
for other qualities of life in the same way our purchasing data are anal-
ysed to better understand us. Similarly, a lack of cups was indicative of a
particular kind of poverty and lack of resources at a point in time, and
this was analysed across the population. However, the democratic promise
of Mass Observation and other projects of the time was superseded
by the potential of understanding what makes people happy for
commercial gain.
Fig. 5.2 What is happiness? Mass Observation competition flyer, 1938
The Darker Side of Historical Well-being Data and Commercial Gain
With the rise of market research came increased interest in people’s prefer-
ences, and in what made them happy or gave them pleasure (Davies 2015;
Savage 2010). This involved capturing subjective well-being data, as well
as cultivating communications to imply that owning or consuming certain
things would increase someone’s well-being in some way. The aim in this
context, of course, was to change people’s purchasing choices. With
this shift, people as citizens became consumers. Over the years, ‘consumer
sentiment’ indices have been assessed to see if these data can predict peo-
ple’s behaviours on a macro level, from economic cycles (Carroll et al.
1994) to presidential popularity (Suzuki 1992). This marriage of mood
and economics is not new to us, of course. In Chap. 4, we encountered
the development of subjective well-being data, a newer, shinier kind of
well-being data, through a marriage of economics and psychology known as
happiness economics, which was able to measure subjective well-being at population level.
Mood and sentiment analysis are not new, then. Neither are big datas-
ets. Even Fitbits and Apple watches are not new; not really, as attaching
technologies to people’s bodies has been used to study and improve pro-
ductivity and surveillance of workers and citizens for around a hundred
years (Davies 2015; Cryle and Stephens 2017). So, what is new? The
amount and variety of data on the well-being of individuals and popula-
tions are increasing as technologies develop to manage greater amounts of
different kinds of data, not only faster, but faster together.12 Therefore, it
is not necessarily how one thing (not that Big Data are one thing, really)
is new. Instead, it is a far more complex picture of how different aspects of,
and different people across fields of, politics, science, research and tech-
nology work together—and work with commerce. These all combine as
developments in what we know, and ways of knowing, about society.
The question is, what does that mean for well-being? How can we learn
from previous mistakes regarding the context of who is using what data—
and to what end? COVID-19 will offer us many data insights and many
insights into how data can help us understand and look after well-being
better. The next section looks at the role of data and learning in a pan-
demic, of old and new infrastructures and commercial and governmental
data practices in the management of a pandemic.
5.4 A Case Study on the Promise of Commercial Big Data
One of the most high-profile cases of the possibilities of Big Data involves
a tale that begins in 2009 when a new virus was discovered. This new ill-
ness spread quickly and combined elements of bird flu and swine flu. This
story opens Mayer-Schönberger and Cukier’s book, Big Data: A
Revolution That Will Transform How We Live, Work and Think, which
you may remember is mentioned earlier in the chapter as a much-cited
originator of the term ‘datafication’ (2013). The authors explain that the
only way authorities could curb the spread of this new virus was through
knowing where it was already.
In the US, the Centers for Disease Control and Prevention (CDC)
requested that doctors inform them of cases. However, the information
on the pandemic that the CDC had to work with was out of date. This was
due to the nature of the data collected, and their ‘data journey’ (Bates et al. 2016).
There were multiple data journeys to consider: data were collected at the
point someone went to the doctor, which could be days after initial symp-
toms, let alone contraction; sharing data with the CDC was a time-
consuming procedure; the CDC only processed the data once a week.
Thus, the picture was probably weeks out of date, making intervention or
behavioural analysis difficult. In other words, while the datasets were large,
even potentially fairly detailed, these Big Data were too slow.
Coincidentally, so Mayer-Schönberger and Cukier tell us, a few weeks
before the new disease made the headlines, Google engineers published a
paper in a high-profile journal, Nature, which explained how Google
could ‘predict’ the spread of the winter flu in the US. This was possible
just through analysing what people had typed into their search engine
(and, of course, knowing where those people typing were). It compared
the CDC data on the spread of seasonal flu from 2003 to 2008 with the
50 million most common search terms in America.
The Google engineers looked for correlations between what people
typed into the Google search engine and the spread of the disease.
Mayer-Schönberger and Cukier point out that
Google’s method doesn’t require traditional infrastructures to distrib-
ute mouth swabs or for people to go to doctors’ surgeries.
‘Instead, it is built on ‘big data’—the ability of society to harness informa-
tion in novel ways to produce useful insights or goods and services of
significant value. With it, by the time the next pandemic comes around, the
world will have a better tool at its disposal to predict and thus prevent the
spread. (Mayer-Schönberger and Cukier 2013, 2–3)
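The correlation step described above can be illustrated with a minimal sketch. It is not Google's method, which the chapter later notes was opaque; it simply ranks a few search terms by how closely their weekly counts track reported cases, using the Pearson correlation coefficient. All of the figures and term names below are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Invented weekly figures: officially reported flu cases over six weeks.
reported_flu_cases = [10, 20, 40, 80, 40, 20]

# Invented weekly search counts for three hypothetical search terms.
weekly_searches = {
    "flu symptoms": [100, 210, 390, 820, 410, 190],
    "cough medicine": [50, 60, 90, 120, 100, 70],
    "football scores": [300, 280, 310, 290, 305, 295],
}

# Rank the terms by how strongly they correlate with reported cases.
ranked_terms = sorted(
    weekly_searches,
    key=lambda term: pearson(weekly_searches[term], reported_flu_cases),
    reverse=True,
)

for term in ranked_terms:
    r = pearson(weekly_searches[term], reported_flu_cases)
    print(f"{term}: r = {r:.2f}")
```

A high coefficient only says that two series rise and fall together. It says nothing about why people searched, which is the limit the rest of this section explores.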
Sadly, a pandemic with wider societal and well-being effects arrived
after I started writing this book, and despite the promise of Big Data, it
did not prevent the spread. Data hold a very important place in the story
of COVID-19 and its management, but all data have limitations in how they
can inform human action to change reality, as do the different ways of
analysing data. Indeed, data are not just there but are managed and used by
people with their own interests. Data do not speak for themselves but
are interpreted. All data realities also involve selective processes in what
data are important and what data are not. These limits are not always
made as clear as they should be.
Mayer-Schönberger and Cukier’s promise of Big Data as revolutionary
and transformational in the US was clearly jumping the gun. Not only was
the pandemic not prevented by way of predictive analytics, but actually,
part of COVID-19 data management has very much involved doctors’
surgeries and mouth swabs, in the UK at least. To illustrate, I was randomly
selected from data held on people registered with a GP to participate in a
survey in August 2020.13 I was contacted by the Real-time Assessment of
Community Transmission (REACT) Study,14 which is in fact a series of
studies, using home testing to understand more about COVID-19, and its
transmission in communities in England. The logic behind the study was
that not all people with the virus were being tested at this point, either
because they were asymptomatic or for some other reason. This was one
of a few projects to collect data from a sample of the population, over
time, in order to understand how it was spreading.
This process relied on old infrastructures: I received a letter by Royal
Mail, I signed up online, and then I was sent a mouth swab—also by post.
That all worked fine for me, but there was a series of steps registering dif-
ferent barcodes and I found myself wondering how accessible this was for
everyone (when I say everyone, I often think of my once tech-savvy Dad,
who’d have been bewildered at this whole process). After completing
these steps, a courier was ordered to collect the test. I sat in patiently wait-
ing for my test to be collected, slightly anxious about what felt like a huge
responsibility, and acutely aware that I might need to be ready to run out
and meet a courier with my test.
I live in a high-rise with no working bell or intercom (and a bunch of
other things that don’t work). For three separate days, I watched for
details of the courier on the app, and out of my window, waiting for them
to appear on the road, or call to say I should come down. But there was
no sighting of the courier in real life and no phone call. When the app
showed they were coming, they disappeared without attempting to collect.
After three attempts, I was told that this particular courier company was
infamous for not bothering to try and collect from my flats, because it was
too inconvenient. So, in my case, while some aspects of the traditional data
infrastructure (the post) worked fine, they didn’t necessarily all work
together as they might. This meant that my test remained uncollected,
expired and had to be securely disposed of. As a result, my data became
‘missing data’.
What I was surprised by was how the information system assumed you
would live somewhere that was easy to access. As we know, many people
from our poorest communities live in high-rises where the lift doesn’t
work, or the people in the flats themselves are difficult for a courier to
access. The contexts in which data are collected (or not) can be both
extraordinary and mundane, and we often don’t hear these stories: when
collection works, the odd occasion when it doesn’t, and what that might
mean for the data. Yet these contexts have a huge impact on who is
readable in data and how we understand well-being and inequality.
So why did COVID-19 data collection end up using more traditional
infrastructures in the UK? On a larger scale, why did the world not use
Google data as Mayer-Schönberger and Cukier predicted? As it turns out,
Google Flu Trends (GFT) missed the peak of the 2013 flu season by
140%, and Google subsequently closed the project (REF). In 2014 a paper
called ‘The Parable of Google Flu: Traps in Big Data Analysis’ was pub-
lished in another high-profile academic journal, Science (Lazer et al. 2014).
The authors concluded that while there was potential in these sorts of
methodologies, and while Google’s efforts in projecting the flu may have
been well meaning (which could be called into question), the method and
data were opaque. This made it potentially ‘dangerous’ (Lazer and
Kennedy 2015) to rely on GFT for any decision-making, as the context of
the data and the analyses were not made explicit to public decision-mak-
ers. Of course, it is also perhaps unlikely that Google had designed the
tool for public decision-making contexts,15 considering what government
officials need to understand for this kind of decision-making.
There are other limits to the data, such as its sample. Google has a
reputation for ubiquity, yet it is not the only search engine available:
people choose other search engines for various reasons. Crucially, Google
also does not have global reach. Most services offered by Google China, for
example, were blocked by the Great Firewall in the People’s Republic of
China. This was not even the first time Google was banned in China. So, even if
GFT were still in action, would it have pre-empted the COVID-19 out-
break in Wuhan, China, before more official announcements?
If we are to think about how Big Data have transformed how we live,
as Mayer-Schönberger and Cukier want us to, then we must also consider
how ‘datafication’ has changed people’s practices. More and more of us
scour the internet, hoping to reassure ourselves that recently developed
symptoms are minor ailments. This is—as we discovered in Chap. 2—part
of the anxiety introduced with audit culture: we consult technologies as a
default because we can, rather than because we should. We search for confirmation
that nothing is wrong, rather than only searching when something is
wrong. In countries where access to healthcare is diminished, people are
actively encouraged to search the internet before interacting with health
services. Consequently, this limits the predictive value of search data, as their
contexts have changed.
In the case of COVID-19, people searched for symptoms they didn’t
necessarily have, especially in the second quarter of 2020, when most
nations were in lockdown and the severity and ramifications of the disease
were becoming clearer. The implications of this are that searches would
not necessarily have reflected the infected state of an individual that could
be aggregated to reveal community or population infections, or more
importantly, predict transmission so that it might be controlled in some
way. Instead, searches for COVID-19 symptoms may well be a predictor
of concern or anxiety. Ironically, then, Google searches are arguably a bet-
ter indicator of negative subjective well-being than of COVID-19.
The very idea of data being reliable has fed our need to feel sure, to
have objective confirmation that all was OK, is OK or will be OK, and this
has led to an increased reliance on data. In the case of Google searches,
this reliance has prompted people to search for verification of risk or
safety. So how might we have cut through the ‘noise’ that the definitions
at the beginning of this chapter point to, in order to know how the virus
was spreading? We are back at the chicken-and-egg dilemma: do people search about
COVID-19 because they have symptoms? Or do people search about
COVID-19 because they are worried about it and feel compelled to search
for confirmation—or search on behalf of friends or loved ones? I watched
someone use their internet searches to check our colleague’s proclaimed
symptoms against the common signs of swine flu. This was a very collegial
individual, but one whose search history told a story of their friend’s
(potential) disease state, rather than their own. In this latter case, then,
Google searches were more indicative of personality than health or even
subjective well-being, although perhaps well-being data all the same.
Bigger datasets make correlation more powerful than causation, explain
Mayer-Schönberger and Cukier, devoting a whole chapter to it in their
book (2013). Google queries went from 14 billion per year in 2000 to 1.2
trillion a decade later. There are even websites that show a live running
tally of how many searches have been made in a day.16 If Big Data were
all about scale, then GFT would have been more, not less, likely to work
on the premise of correlation as search numbers increased. The scale at
which we have correlations using ‘Big Data’ may be an indicator of causa-
tion, but not proof. Is this the end of the promise of Big Data, though? If
we return to a case of COVID-19 and Big Data, what might we find?
Linking Big Datasets: For Well-being?
On New Year’s Day, 2020, a Canadian health monitoring company alerted
its customers to the COVID-19 outbreak, some days before the US CDC
or the World Health Organization (WHO) alerted anyone (Niiler 2020).
Of course, the disease was not yet called COVID-19, and it was not known
that it was to be a global pandemic. At this point, a cluster of unusual
pneumonia cases had been detected. One of the companies said to have
beaten the WHO to this discovery is called BlueDot, which uses AI-driven
algorithm searches to look at datasets, much like GFT.
Unlike Google Flu Trends, BlueDot’s algorithms consolidate and analyse
data from numerous sources. BlueDot’s owner, Dr. Kamran Khan,
explains:
We can pick up news of possible outbreaks, little murmurs or forums or
blogs of indications of some kind of unusual events going on. (Khan, in
Niiler 2020)
Other data sources are more official, such as statements from health
organisations, livestock and news reports in 65 languages. BlueDot also
uses ‘anonymous mobile phone data’ (Whitaker 2020), flight sales and
other records. These various data points enable a prediction of a possible
new serious disease. Importantly, the logic is that this approach also offers
insight into how that disease becomes mobile through the people who carry it
and the planes that carry those people.
What we have done is use natural language processing and machine learning
to train this engine to recognize whether this is an outbreak of anthrax in
Mongolia versus a reunion of the heavy metal band Anthrax. (Niiler 2020)
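The disambiguation problem Khan describes can be made concrete with a deliberately naive sketch. BlueDot's system is not public, so the code below is not its method: it is a hand-written keyword comparison, where the real system is said to use trained natural language processing and machine learning models. The word lists and the function name are assumptions made purely for illustration.

```python
import re

# Hypothetical context words, chosen by hand for this illustration only.
DISEASE_CONTEXT = {"outbreak", "infection", "cases", "livestock",
                   "spores", "hospital", "quarantine"}
MUSIC_CONTEXT = {"band", "tour", "album", "concert",
                 "reunion", "guitarist", "metal"}


def classify_mention(text: str) -> str:
    """Guess whether a mention of 'anthrax' concerns a disease or a band.

    Counts how many context words from each list appear in the text and
    returns the label with the higher count, or 'unclear' on a tie.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    disease_score = len(words & DISEASE_CONTEXT)
    music_score = len(words & MUSIC_CONTEXT)
    if disease_score > music_score:
        return "disease"
    if music_score > disease_score:
        return "music"
    return "unclear"


print(classify_mention("Anthrax outbreak confirmed among livestock in Mongolia"))
print(classify_mention("Heavy metal band Anthrax announce reunion tour"))
print(classify_mention("Anthrax mentioned in a headline"))
```

Even this toy version shows why context matters: the word 'anthrax' alone tells the software nothing, and the third example remains unclear without a human, or much more data, to interpret it.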
Also, crucially, ‘epidemiologists check that the conclusions make sense
from a scientific standpoint’ (Niiler 2020). The company website states
that ‘BlueDot protects people around the world from infectious diseases
with human and artificial intelligence’ (BlueDot n.d.). Therefore, despite
claims of sophistication, the automated data-sifting still requires human
analysis to make sense of what has been found.
Khan’s company utilised technological developments at its disposal to
synthesise many different types of data from multiple datasets to construct
evidence. Only when the data were pieced together was the information
useful, and only after human experts had checked it, were these insights
deemed useful enough to share and use. BlueDot is a commercial com-
pany. The human and artificial intelligence are synthesised as an enterprise,
and Khan is often presented as both an entrepreneur and a professor
of medicine and public health at the University of Toronto. Khan has also
worked in hospitals, so understands how they work. Khan explains in one
interview,
Disease doesn’t wait for the reviewers, so we need a more agile system. My
motivation for creating a company—here to start supporting an entrepre-
neurial spirit—using business as the vehicle to do that. (Khan, on Charrington
20 February 2020)
There are two things to note here. Khan suggests that the old struc-
tures of peer review and scientific expertise are too slow in their use of data
and evidence to tackle a global pandemic. He also suggests that his busi-
ness successfully links together ‘human and artificial intelligence’ to pro-
vide what traditional science cannot: the analysis of data with veracity and
variability, speed, resolution, relationality and so on. The value of BlueDot
is in its claims to harnessing the qualities of Big Data.
To return to Mayer-Schönberger and Cukier, ‘Google’s method’ may
not have involved distributing mouth swabs, or been built on old infra-
structures, but instead, they explain:
[I]t is built on “big data”—the ability of society to harness information in
novel ways to produce useful insights or goods and services of significant
value. (Mayer-Schönberger and Cukier 2013, 2)
So, there we have those familiar terms of insights (a marketing term)
and valuation (that we discovered from economics in Chap. 2), alongside
clear communications and the presentation of novelty (Chap. 4), goods
and services. Mayer-Schönberger and Cukier hint at the complex politics
at play on the value of data—and the values of data more broadly than we
have already encountered.
Crucially, in a book about well-being and data, we have to note that
BlueDot’s business is entrepreneurial because it is profitable. In other
words, the insights have to be sold to clients and customers. BlueDot was
also not the only innovator (as acknowledged by the Lancet and MIT
Review [McCall 2020; Heaven 2020]). Here, we must return to the eco-
nomic value of data because of the possibilities of well-being insights and
the ideological project of the well-being agenda.
If the well-being agenda is about improving redistribution of resources
as an issue of social justice, we might want to think about what position we
are coming from: rather than asking, ‘what are the data limits of these
well-being projects?’, we might ask, ‘what are the well-being limits of data
projects like these?’ Yet, despite the clear sophistication of BlueDot’s
project, it also did not prevent COVID-19’s spread. This criticism has
been noted in the MIT Review:
The hype outstrips the reality. In fact, the narrative that has appeared in
many news reports and breathless press releases—that AI is a powerful new
weapon against diseases—is only partly true and risks becoming counterpro-
ductive. (Heaven 2020)
The point this MIT article makes is that over-reaching claims made
for AI could be damaging to its future progression, in the same way
that GFT overstretched its claims.
Data and the distribution of resources are very much part of the
COVID-19 story, and not just of private companies profiteering, either.
Such competition is also reiterated by national politicians misleading the
public about ‘world-beating’ systems of data (BBC 2020). In the same
way that the social indicators movement was halted because it was not
quite measuring what it thought it was measuring (Chap. 2), the ‘promise’
of Big Data has adjusted. The limits of Google’s approach are in a lack of
context: the nature of what people actually search for is different from what was
predicted. The limits on data are social, cultural, political and economic,
and by extension, these limit the possibilities for a good society. We will
explore social media and mobile communications data in the final few sec-
tions to better appreciate this relationship.
5.5 Social Media Data: A Game Changer?
I am sure that social media plays a role in unhappiness, but it has as many
benefits as it does negatives. (Sir Simon Wessely, president of the UK’s Royal
College of Psychiatrists in Campbell 2017)
Social media platforms have an interesting relationship to well-being.
They are often demonised as bad for well-being, especially for the younger
generation who are thought to dwell on images of idealised bodies and
lifestyles on Instagram (Campbell 2017). All ages feel a pang looking at
the picture-perfect presentations on Facebook, and even the NHS warns
people to take breaks from social media (NHS 2016). Credible, successful
women leave themselves vulnerable to criticism from strangers in the shar-
ing of thoughts, opinions and aspects of their identity on platforms like
Twitter (Lewis et al. 2016). Similarly, hate speech directed at people of colour
(Gayle 2018), or at people for their gender identity (Pearce et al. 2020), is a reality
of social media platforms. However, social media and online platforms also
offer places for human connections, and have had beneficial effects for the
social isolation brought about by measures to curb the spread of
COVID-19. The jury is still out on many of the pros and cons of social
media, including their propensity to spread disinformation, versus credible
analysis of data and guidelines. Social media therefore hold an ambivalent
place in the management of well-being.
These controversial aspects of social media are not their only connections
to well-being. The data we share can make them useful for well-being analy-
sis. The most mundane aspects of our feeds, the venting of minor irritations,
celebrations of small wins or just feelings shared with friends and family
mean our social media accounts are full of well-being data. Think about
those ONS4 questions again (Table 4.2) that aim to gauge ‘personal well-
being’. For example, they all ask you to think about how you felt yesterday
overall—in terms of happiness or anxiety, as well as whether you think what
you do is worthwhile, and whether you are satisfied with your life. When
you think about Facebook’s most prolific posters in your timelines, for
example, much of their content will indicate how they felt in similar ways at
specific moments. The recent addition of emojis to Facebook means it is
easier to proclaim whether you were happy, celebrating or anxious. The
reminders of what you were doing this time last year or ten years ago mean
we are telling everyone on Facebook how we feel now, about how we were
feeling in previous years. Crucially, this means it is even easier for Facebook
to know this too, as you have essentially coded your own data for them.
This compulsion to share how we feel means we are also sharing our data
with Facebook and other platforms. These platforms are able to analyse us
alongside millions of others at scale. Companies like Brandwatch monitor
social media and analyse several billion emoticons each year to inform
brands whether they are provoking hatred or happiness with their products.
It is also possible for a broad range of actors to mine social media data,
whether commercial companies, government agencies, academic research-
ers or amateurs with the inclination to do so. The platforms are set up with
open Application Programming Interfaces (APIs). APIs are what allow
other (data mining) software to interact with social media platforms. Once
access to social media data has been gained, they can be ‘scraped’ with
comparative speed, given the right skills and software. Scraping is a process which
essentially involves gathering and copying data that match specific search
terms. These data are then put into a database (which can be as crude as a spreadsheet)
for later retrieval or analysis. This can be done by a person, although the
term more typically refers to automated processes involving a bot or web
crawler. The fact that APIs are generally open as a standard indicates that
these data—your data—are made available by social media platforms to be
used by various different actors. Not many people think about the fact that
their public post on a social media platform is public in the sense that it is
no longer their private property and can be used by others in research.17
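To make the idea of scraping a little more concrete, here is a rough Python sketch of the process described above: posts are checked against search terms and the matches are copied into a spreadsheet-style file. It is only an illustration under stated assumptions. The posts are invented, the names `matches_terms` and `scrape_to_csv` are hypothetical, and no real platform API is called; an actual scraper would obtain its posts through a platform's API or a crawler.

```python
import csv
import io

# Invented example posts; a real scraper would fetch these via an API or crawler.
POSTS = [
    {"user": "a", "text": "Feeling happy after a walk in the park"},
    {"user": "b", "text": "Missed the bus again"},
    {"user": "c", "text": "So anxious about tomorrow"},
]


def matches_terms(text, terms):
    """Return True if any search term appears in the text (case-insensitive)."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in terms)


def scrape_to_csv(posts, terms, out_file):
    """Copy posts that match the search terms into a CSV 'database'."""
    writer = csv.DictWriter(out_file, fieldnames=["user", "text"])
    writer.writeheader()
    kept = [post for post in posts if matches_terms(post["text"], terms)]
    writer.writerows(kept)
    return len(kept)


if __name__ == "__main__":
    buffer = io.StringIO()  # stands in for a file on disk
    count = scrape_to_csv(POSTS, ["happy", "anxious"], buffer)
    print(count, "matching posts")
    print(buffer.getvalue())
```

Even in this toy version, what ends up in the 'database' depends entirely on the search terms chosen, which is one of the places where the analyst's assumptions enter the data.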
There are practical limits to what can be known through analysing peo-
ple’s social media posts, of course. First, people are not neutrally repre-
senting themselves on social media. As we know, people feel compelled to
publish reflections on an idealised version of their lives (Kruzan and Won
2019). Of course, our social media posts don’t always represent our lives
as happier than they actually are: people often exaggerate the impact of
minor negative events that are as mundane as missing the bus or being
rained on. Some people collectively engage in dissatisfaction with their lot
in life, leading to Twitter bubbles and what has become known as ‘the
culture wars’,18 the contemporary cultural conflict between social
groups. This term describes a gap between those who side with a traditional,
conservative approach, and those with a liberal, progressive
approach to society and social issues, such as immigration, abortion,
LGBTQIA+ rights, and so on. The contemporary culture wars, as a struggle
for dominance of values and beliefs, now take place on Twitter, and
we might question the extent to which such rage and passion are indicative
of someone’s personal well-being, or of some form of tribal rage on a larger
scale. Essentially, we are seeing how important social media can be in both
distorting and shaping our well-being for better or for worse. The key to
appreciating the relationship of social media, data and well-being is under-
standing limits and context—of collection and use.
Social Media Data Mining in Social and Cultural Sectors
Social media data mining is not always a large-scale affair requiring APIs
and special software. As found in a six-month research project with city
councils and a city-based museums group in the north of England
(Kennedy 2016), many small organisations use quite basic techniques to
do this work. Social and cultural policy sectors are reliant on understand-
ing well-being data, as improving well-being is at the core of what many
of them do. Yet, as Chap. 1 of this book acknowledges, the sectors do not
always have the skills or confidence to use data. We will look at these sec-
tors as a whole in greater depth in the next three chapters.
The project exploring how these smaller social and cultural organisations
were already using data mining wanted to understand how they
might use it more effectively. The researchers discovered that although
software packages were adopted to analyse institutional impact and
engagement on Twitter, this was largely unsystematic (Kennedy 2016, 71
& 72). Keen to improve their social media data mining capacity, these
organisations signed up for training in new tools that would improve their
capability. However, it became clear that less data mining was happening
than expected and the capacity of workshop participants to engage with
training in the new tools also fell away (Kennedy 2016, 74). Doing better
with data seems a good idea, but is not always as easily resourced or incor-
porated into working practices as initially hoped.
Local councils, social and cultural sector organisations all have limited
resources. Despite enthusiasm for being, or becoming, data-driven, capac-
ity to invest time and money in new tools at the organisational level is
often lacking (Kennedy 2016; Oman 2019a, b). In the case of the cultural
sector, there is a tendency to invest in grand schemes, new metrics and
reports at policy level that claim to investigate the value of new and/or Big
Data and the associated technologies required to generate or analyse them
(Gilmore et al. 2018; Oman 2013a). However, considering the
(already ill-defined) cultural sector19 as a whole obscures differences
in requirements and capacity for data technologies, which are multiplied
by huge variability in organisation size, type, purpose, mission and cultural
offering across and within sectors (Oman 2013a). These top-down
resources and contributions are not always actually used or found useful at
an organisational level or across the wider sector (Oman 2013a). Some
organisations recognise that their audiences are full of people whose opin-
ions are less easily captured by Big Data. Some people, for example, still
prefer booking telephone lines to web pages and are certainly not tweeting
or Instagramming their experience of a show. As such, some who attend a
show are less likely to be generating data on their opinions that might then
be mined. Advocates for using Big Data in small organisations acknowl-
edge that Big Data can be ‘debilitating’ in their complexity and challenges.
This is not always explored in a way that offers resolution (Oman 2013a),
and as we have seen (Kennedy 2016) when recommendations, even train-
ing, are offered, there is not necessarily the capacity to take them up.
Yet, it can be very easy and fast to interact with Big Data as social media
data, as long as you consider the limitations of the data and their origins,
as well as how you might analyse them yourself. Organisations and indi-
viduals do not need Big Data analytics know-how or software, although
there are excellent resources freely available to help them understand
how,20 as I found when I wanted to explore Twitter discussions about hap-
piness. In 2013, Mass Observation recreated the Bolton happiness study
on Twitter (see Fig. 5.3). This was still fairly experimental, for them as
much as for me, when I requested access to the tweets. There were 25 responses
that they captured at the time.
The sample of 25 meant that—of course—I did not require data mining
or sentiment analysis software—or any knowledge of APIs. In fact, I did
not even need to request these tweets from Mass Observation directly, as
they are still available on Twitter by searching the hashtag (or were in
August 2020 when I last checked). A cursory analysis in this case simply
meant reading, and noting similarities and themes, which I could have
done on a piece of paper.
Fig. 5.3 Mass Observation happiness tweets
So, what did this cursory analysis tell me? Whilst 20% mentioned pets,
all of which were cats (it is the internet after all), one person replied with
a single word: bacon. Mainly, however, people described informal, every-
day participation,21 including reading, going to gigs, watching films. There
were lots of glasses of wine and some chocolate in there too. The textual
content of these tweets is reproduced in Box 5.1, without Twitter handles.
You might note that a surprising variety of the theories of well-being we have
encountered so far in the book is present in just 25 tweets. Some map
onto clear areas of social policy; others are definitely in the private
domain. Some people used negative language to imply life isn’t currently
great for them: ‘Day off. Smoke in peace.’ and ‘Ability for women to
walk down the street & not be catcalled or threatened. Few happy
women here’. Some people were philosophical, others wistful. Some
focussed on activities, others on the ‘bliss’ of doing nothing. The variety
of tone and content makes for fascinating reading, but leaves these data
wide open to interpretation—whether that is via human or artificial
intelligence.
Box 5.1 Tweets Answering the Question: ‘What Is Happiness?’
• Beer, maps, chocolate, quizzes, the unending pursuit of knowledge
• Ability for women to walk down the street & not be catcalled or threatened.
Few happy women here
• Short term happiness is different for everyone. Long term happiness is about
fulfilling your potential.
• Bacon
• 5 minutes to myself and a good book, with peppermint tea and the cats curled
up around me. Absolute bliss!
• Volunteering, yoga, baking, being with loved ones, reading, warm days paddling
in the sea, colourful things, exploring, my cat :D
• Doing what I love (#history), a safe home by the sea, someone to love & share
things with
• Good company, fireworks, being smiled at, a job well done, ‘sweet pea’ by
Manfred Mann, making someone else happy, good health.
• I am happiest when discovering/learning new things, such as reading books and
finding new music.
• Happiness is cooking for those I love, with a glass of wine and giggles on the
side.
• Day off. Smoke in peace.
• “What is happiness?” something to do with dopamine levels
• Making things that muself [sic], and hopefully other people will enjoy
• Loving and being loved and valued for who I actually am.
• More precisely: Time, a book, a view, a friend.
• Choices and control in life not just in shopping.
• Connecting with other people, being able to make a difference to someone else,
a good book and a purring cat on my lap!
• My kids
• What is happiness?’—“A warm spot on the bed in the sunshine”
• Knowing that enough is plenty
• The scent of roses on a damp morning […] being where you are without
wishing to be somewhere else
• Happiness is seeing my children flourish, Swansea City FC progress & succeed
& cooking for husband. In that order! ;)
• Love, health and a sense of purpose. Oh, and cake.
• What makes me happy? Cuddling up on the sofa with my partner & animals, a
glass of wine, chocolate, a film & crochet- bliss
• Happiness is good relationships, a little more than enough money, satisfaction
and contentment
I used these tweets as a light-hearted example, with my ever so light-
touch analysis, in my first ever conference presentation in 2013. In Chap.
3, I explained that my research question at the beginning of my PhD was
loosely: ‘When people describe well-being, how often do they talk about
participating in different kinds of activities—and what might that tell us
about aspects of social and cultural policy?’ or ‘how can qualitative data
collected to understand well-being tell us how people feel about what they
do?’. I noted in this presentation that state-funded cultural practices (like
art galleries and museums) were less frequently mentioned by people as
making them happy than what is called everyday participation (Oman
2013b). This same finding emerged from my reanalysis of the ONS free
text data I used in my PhD (Oman 2017, 2020). By extension, these data
(with their caveats) were another dataset to suggest we should question
whether cultural funding was supporting activities that made people hap-
pier or increased their well-being.
This was not the only way of analysing these tweets to make an argu-
ment about the relationship between culture and well-being. Someone
else may have counted how many of these responses included something
creative and used their analysis to argue they have found the value of cul-
ture to people, thereby justifying more funding. These are debates about
data and their use in politics and policy that we return to in the next chap-
ter. What is important here is that even with (arguably, especially with)
such a small dataset we can see how human bias can interact with data and
lead to different arguments.
If it is difficult for humans to make categorical claims from a form of
sentiment analysis that is not much more systematic or technical than
reading 25 tweets, we must remember these limits when these analyses are
made through machine learning. This is especially vital as time-sensitive
analyses of large-scale samples of emotional expressions are being used in
research on COVID-19, particularly given they are seen to have the poten-
tial to inform mental health support and help tailor risk communication to
change behaviours (e.g. Pellert et al. 2020). As with all data uses mentioned
in this book, it is not that using social media data or automated
sentiment analyses is necessarily bad, but rather that their limits should
be recognised. As ever, it is an issue of methodology, transparency, context
and legibility.
Understanding Where People Are and How They Feel Using Twitter Data
Of course, it is not only what people say that can be mined, but also where
they are. One research project attempted to gauge community well-being
using Twitter data from between 27 September and 10 December 2010
(Quercia et al. 2012). Interestingly, as an aside, this coincided with the
UK’s Measuring National Well-being debate which launched in November
of that year. The researchers were interested in a few things. They wanted
to understand more than individuals: to measure the well-being of communities.
They state their intention as moving forward the recent developments in
subjective well-being measures that we discovered in the last chapter.
Rather than administering questionnaires on an individual basis, or
in a national-level survey, they wanted to explore the recent possibilities of
sentiment analysis to understand community well-being.
Social media data can significantly reduce the time-consuming pro-
cesses that make large-scale surveys and qualitative work resource-heavy.
Once these data have been ‘scraped’ and saved into a database, they can be
analysed in many ways. Quercia and their co-authors
were interested in the idea of using sentiment analysis to see if it could
interpret community well-being. They used a sentiment metric that
was originally derived from studying Facebook status updates (Kramer
2010). This metric standardised the difference between the percentage of
positive and negative words in a Facebook user’s posts in one day. Kramer
used the metric to make arguments at a national level, aiming to develop,
as he suggests in the title of his paper, ‘An Unobtrusive Behavioral Model
of “Gross National Happiness”’.
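A metric of this kind can be sketched in a few lines of Python. What follows is a simplified illustration of the idea as described above, not Kramer's published procedure: the word lists, the example posts and the function names `percent_difference` and `standardise` are all invented for the purpose, and the standardisation used here (a z-score across days) is an assumption about one reasonable way to do it.

```python
from statistics import mean, pstdev

# Tiny invented word lists; they stand in for the metric's dictionary.
POSITIVE = {"happy", "love", "great", "bliss"}
NEGATIVE = {"sad", "angry", "awful", "stress"}


def percent_difference(posts):
    """Percentage of positive words minus percentage of negative words."""
    words = [w.strip(".,!?").lower() for post in posts for w in post.split()]
    if not words:
        return 0.0
    positive = sum(w in POSITIVE for w in words)
    negative = sum(w in NEGATIVE for w in words)
    return 100.0 * (positive - negative) / len(words)


def standardise(values):
    """Convert raw scores to z-scores (mean 0, standard deviation 1)."""
    centre = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return [0.0 for _ in values]
    return [(v - centre) / spread for v in values]


# One list of posts per day (invented data).
DAYS = {
    "Mon": ["Awful commute, so much stress"],
    "Tue": ["Great lunch", "Love this weather"],
    "Wed": ["Nothing much to report"],
}

raw = {day: percent_difference(posts) for day, posts in DAYS.items()}
scores = dict(zip(raw, standardise(list(raw.values()))))

if __name__ == "__main__":
    for day, score in scores.items():
        print(day, round(score, 2))
```

Notice that the score is driven entirely by which words are in the lists. A greeting containing the word 'happy' counts as positive whatever the mood of the person typing it.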
His new standardised metric was found to correlate with self-reported
life satisfaction. Looking at the US specifically, peaks were found in life
satisfaction that correlated with national and cultural holidays. This is fine
in and of itself, but what does that tell us about well-being? Christmas is
good for well-being? Other research indicates otherwise (Holmes and
Rahe 1967; Mutz 2016), suggesting it can cause feelings of stress for vari-
ous reasons: financial, family, and so on. What about the days either side
when people are travelling huge distances (with everyone else) using
transport infrastructure which is not fit for purpose? Or the excesses of
consumption that holidays like Christmas involve, as well as their impact
on the planet? What about all those who do not celebrate Christmas, as
they are not of a Christian denomination? In his limitations, the author
acknowledges that there is a possibility that the likelihood to wish people
‘Happy Christmas’ could have affected these results. However, he decided
not to control for this, as wishing someone happy holidays is a positive
sentiment. We might wonder, then, whether this study was really inter-
ested in the possibilities for understanding the human experience using
the details of the Facebook posts, or whether it was interested in deriving
a metric that was comparable with more established methods.
Returning to the study on community well-being, the authors state, ‘it
is not clear whether the correspondence between sentiment of self-
reported text and well-being would hold at community level, that is,
whether sentiment expressed by community residents on social media
reflects community socio-economic well-being’ (Quercia et al. 2012,
965). Therefore, they do note some of the limitations of using this
approach to answer their research question. However, notably, they do
not acknowledge some of the limitations of the metric itself.
London was chosen for the study to understand communities,
socio-economics and well-being. Let’s break down what they did and
how. The study used four types of data gathering. It:
1. ‘Crawled’ Twitter accounts whose user-specified locations report
London neighbourhoods.
2. Geo-referenced the Twitter accounts by converting their locations
into longitude-latitude.
3. Measured socio-economic prosperity, using the UK’s IMD.22
4. Conducted sentiment analysis on tweets between particular dates
from their sample.
How did these processes work?
1. How the crawl worked: the researchers chose three popular
London-based profiles of news outlets: the free newspaper The Metro,
which was available in London on the Tube at the time (it has since
expanded), a right-wing tabloid The Sun and the centre-left newspaper
The Independent. These media were chosen because they are thought to
capture different demographics of class and politics. Using these three
accounts as ‘seeds’, they used ‘a crawler’ to trace linked accounts. Crawlers
are software that allows you to gather various kinds of available data based
on who interacts with a particular website or Twitter account. In this
instance, every user following these accounts was ‘crawled’.
2. Some Twitter users stated where they live in their profiles.
Accounts were crawled to find that 157k of 250k profiles had listed locations,
of which 1323 specified London neighbourhoods. They then filtered
out likely bots by also ‘crawling’ using another metric23 for each profile.
This brought the sample down to 573 profiles. Once these were estab-
lished, locations were converted into longitude-latitude pairs, translating
these data into geographical co-ordinates which are easier to work with.
3. The IMD is broken into 32,482 areas, of which 78 are within the
boundaries of London used by the authors (these are not necessarily
fixed). The IMD offered a score for each of London’s 78 census areas. The
authors use a census area to represent ‘a community’. We shall return to
this key point in a bit, so hold that thought. The data come from the
ONS’ Census and are an objective list of sorts: income, employment, education,
health, crime, housing, and environmental quality. It is worth
noting that in the IMD, the ONS talk about ‘Lower Layer Super Output
Areas’ (LSOAs), rather than communities.
4. Sentiment analysis was undertaken on the tweets using two algorithms:
(1) Kramer’s metric, described above, and (2) something called a
‘Maximum Entropy classifier’, which uses machine learning. The algorithm
in Kramer’s metric has a limited dictionary, so this second machine
learning package was used to improve on the first, by using a training
dataset of tweets with smiley and frown-y faces. The authors argue that the
results across the two algorithms correlate and are accurate. They then
measured the sentiment expressed by each profile’s tweets and computed,
for each region, an aggregate sentiment measure of all the profiles in
the region.
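The final aggregation step can be illustrated with a short Python sketch. It assumes that each profile already has a single sentiment score and a region, and it uses a simple mean as the aggregate; the study's exact aggregation may differ, and the records and the function name `aggregate_by_region` are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Invented (profile, region, sentiment score) records.
PROFILES = [
    ("user1", "Region A", 0.4),
    ("user2", "Region A", -0.2),
    ("user3", "Region B", 0.9),
    ("user4", "Region B", 0.5),
    ("user5", "Region C", -0.6),
]


def aggregate_by_region(profiles):
    """Average the per-profile sentiment scores within each region."""
    grouped = defaultdict(list)
    for _profile, region, score in profiles:
        grouped[region].append(score)
    return {region: mean(scores) for region, scores in grouped.items()}


if __name__ == "__main__":
    for region, score in sorted(aggregate_by_region(PROFILES).items()):
        print(region, round(score, 2))
```

In this toy example 'Region C' is represented by a single profile, which hints at how much weight a handful of accounts can carry once a sample of 573 profiles is spread across 78 areas.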
Findings: So what did they find? Through studying the relationship
between sentiment and socio-economic well-being they found that ‘the
higher the normalised sentiment score of a community’s tweets, the higher
the community’s socio-economic well-being’. In other words, the senti-
ment metric accounted for positive and negative sentiments, enabling each
area’s aggregated data to show an average score. This tended to correlate
with the scale that they used that indicates poverty and prosperity in that
locale (the IMD).
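To show what 'tended to correlate' means in practice, the following Python sketch computes a Pearson correlation coefficient between two short lists. The choice of Pearson's coefficient is my assumption for illustration, and the sentiment and well-being values are invented rather than taken from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a list has no variation")
    return cov / sqrt(var_x * var_y)


# Invented values: one aggregated sentiment score and one socio-economic
# well-being score per area (higher = better off in this toy example).
sentiment = [0.1, 0.7, -0.6, 0.3]
wellbeing = [40.0, 65.0, 20.0, 50.0]

if __name__ == "__main__":
    print(round(pearson(sentiment, wellbeing), 3))
```

A coefficient close to 1 says only that the two lists rise and fall together. It says nothing about why they do, or about which way any influence runs.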
Limitations—What did the authors identify as limitations?
Demographic bias—Twitter users are certain types of people; there-
fore, these findings will over-represent the happiness of Twitter users—
missing out on non-users.
Causality—our old friend. Though the causal direction is difficult to
determine from observational data, one could repeatedly crawl Twitter
over multiple time intervals, and use a cross-lag analysis to observe poten-
tially causal relationships.
Sentiment—They tracked sentiment but not ‘what actually makes
communities happy’ (Quercia et al. 2012, 968). The intention was to
compare topics across communities. Their example:
given two communities, one talking about yoga and organic food, and the
other talking about gangs and junk food, what can be said about their levels
of social deprivation? The hope is that topical analysis will answer this kind
of question and, in so doing, assist policy makers in making informed choices
regarding, for example, urban planning. (Quercia et al. 2012, 968)
As evidenced with the possibilities for making an argument using the
crude analysis of the Mass Observation tweets, and as suggested by the
citation directly above, there is bias in the ways that Big Data can be used
to inform social and cultural policy. However, this is not necessarily any
more the case in these examples than in those using more traditional data
sources explored earlier in the book. The ways our social worlds are
ordered do not reside in the algorithms, but in the preconceptions, lazi-
ness and judgements which become reproduced through researchers and
their research and through policy-makers and their decisions. While the
Quercia et al. examples were presented as a binary of opposites for narra-
tive effect, the ridiculousness of the proposition may not stop it coming
into effect as a deductive study in future. The fact that gangs are unlikely
to tweet about gangs is one thing. Furthermore, the idea that these gangs
remain within their ONS-allocated geographical boundaries called LSOAs
is also a nonsense.
This brings me to another point. LSOAs are not communities: not in
the way that we think of community well-being as built on social relations
and inter-related lives. People are not only active citizens where they live,
and in a city like London especially, may actually be more likely to be
active citizens where they work. Without the context of understanding
London, what it is to live in London, and the complex, overlaid commu-
nities and social groups that comprise a postcode, this idea of community
well-being is a misnomer. Instead, it matches one index that uses census
data, which, while valuable, can be out of date, and is well-known for its
various limitations as a metric of socio-economic deprivation or advantage.
Perhaps another way to look at a question of community well-being
might be to look at people interacting in public space. Plunz et al. (2019)
also used sentiment analysis with geo-located Twitter data. They were
interested in finding well-being indicators associated with urban park space.
Their goal was to assess if tweets generated in parks may express a more
positive sentiment than tweets generated in other places in New York City.
Their results suggest that tweets in Manhattan are different from other
NYC boroughs. In Manhattan, people’s tweets were more positive outside
of parks than inside, whereas the opposite was true outside of Manhattan.
They concluded that Twitter data could still be useful for aspects of social
policy, including urban design and planning. They also note that one of the
limitations of geo-located Twitter data is that GPS is less accurate than
sometimes accounted for. It also does not account for elevation, so you
could be on the metro underneath Central Park, or indeed, stuck in traffic
alongside it. It is also hard to establish whether people have gone for a walk
to let off steam, or are commuting to work, for example.
The relationships between where we are standing or where we live and
our well-being are not new, but a feature of much philosophy on the
nature of subjective experience, especially since the Enlightenment (which
we shall come to in the next chapter). Big Data offer new ways to test what
we know about place. However, these data and devices also make assump-
tions about place and experience (Wilmott 2016). The expectations and
suppositions of what happens where, for whom and how drive these analy-
ses with the same bias as other Big Data technologies, and we must be
aware of the limitations of these data, technologies and the ideas of well-being
they claim to measure. We also need to be vigilant about who holds
the data and why they are analysing them.
5.6 Fit for Purpose? Health and Well-being Tracking and Apps
Recent technological developments have seen a rise in people using wear-
able technologies and their mobile phones to track their movements
and behaviour. These include: periods of activity, menstruation, what they
have eaten, how they have slept, how far they have walked and their heart
rate, in order to gain an overall picture of their health and general well-
being. These practices are frequently called the Quantified Self movement
(Ruckenstein and Pantzar 2017), which refers both to the cultural phe-
nomenon of self-tracking using one’s own data, as well as the community
of people who use and share data in this way.
The technologies are increasingly popular and are being discussed as
cost-savers for the NHS, but there are barriers to their use (Jee 2016).
Around five years ago, 85% of the general population did not own wear-
able devices (Lee et al. 2016). Therefore, measures which use datasets
from these technologies will only account for a proportion of the popula-
tion, who are most likely to be younger and more affluent (Strain et al.
2019) and already demonstrating an investment in their current and future
well-being by owning such a device in the first place. We also do not yet
fully understand the impact of COVID-19 on wearable devices and app
use, as at the beginning of the crisis there were stories about governments
using these data to monitor compliance with lockdown measures (Digital
Initiatives 2020). YouGov polling data24 indicate that even in July 2020,
65% of the UK had still never owned a wearable device, with 22% currently
using one (with everyone else having tried one, or owned one but not currently
using one). However, the same YouGov data indicate that usage
had increased from 22% to 27% by January 2021, and that the proportion who had never
owned a device had decreased at a similar rate. The COVID-19 period has therefore
seen an increase in the use of wearable technology, as people take an interest in their
well-being data in new ways.
Self-tracking, or the practice of generating or capturing data about
everyday activities like eating and exercise for the purposes of self-improvement,
puts data and control in the hands of people, as well as the corporations
which produce self-tracking devices and the third parties with which these
data are shared (Kennedy et al. 2020). The research is ambivalent as to
whether the experience of self-tracking has positive benefits, such as per-
ception of control, agency or, in the case of professional or amateur sport-
ing, opportunities for new communities (Ajana 2017; Lupton 2019; Pink
and Fors 2017). It is also thought that these practices in and of them-
selves, and in their relationship to control, may decrease well-being more
generally (Kennedy et al. 2020).
Data collected via mobile phone apps present similar possibilities for
community and compromise. Smartphone access and usage only account
for certain sections of a national demographic, much like wearable devices.
Similarly, people who download an app to better understand their well-
being are already self-selecting as wanting to improve their well-being, and
therefore may not be considered a representative sample. A number of
apps in the early 2010s wanted to further develop the insights gained from
better understanding subjective well-being measurement.
In 2012, experts in geography and the lived environment based at the
London School of Economics created a mobile phone app to understand
happiness (MacKerron and Mourato 2013). Branded a
‘hedonimeter’ (after the nineteenth-century invention we discovered in
Chap. 2), the ‘Mappiness’ app asked people to allow it to collect
objective data about where they were (automatically, using GPS data),
what activity they were doing, and who they were with (as manual entries).
It also asked them to provide hedonic responses (subjective well-being
data) as to how awake, happy and relaxed they were. These data were col-
lected using sliders instead of the more traditional scales we have previ-
ously encountered. The data collected by the app were used in a number
of different ways to appreciate subjective well-being and we will touch on
a couple here.
In 2015, a report which drew on these data was published. ‘Cultural
Activities, Artforms and Wellbeing’ reported on research commissioned
by Arts Council England (ACE). The authors evaluated the hedonic read-
ings of various activities found in the data collected by the app (Fujiwara
and MacKerron 2015). Table 5.4 shows what the authors describe as ‘happiness
activities rankings’, with theatre, dance and concert appearing to
have the highest effect, and reading the lowest, unless you incorporate
other ‘everyday participation’ activities, such as TV watching. As you can
see, housework, chores and DIY are negatively associated with happiness.
Table 5.4 ‘Happiness activities rankings’ (a)
Activities                           Coefficient
Theatre, dance, concert              8.735***
Singing, performing                  7.731***
Exhibition, museum, library          7.457***
Hobbies, arts, crafts                5.737***
Talking, chatting, socialising       3.789***
Drinking alcohol                     3.646***
Listening to music                   3.518***
Childcare, playing with children     2.888***
Reading                              2.331***
Watching TV, film                    2.084***
Housework, chores, DIY               −0.651***
Source: Fujiwara and MacKerron (2015)
(a) The table shows coefficients, rather than rankings. Compared with the baselines, these coefficients report
how much happier participants reported being when participating in these activities on a scale, when relevant
variables have been controlled for. The coefficient shows the size of the impact on happiness from
doing the activity (where happiness is measured on a scale of 0-100). All variables were statistically
significant.
Other studies cited in this report indicate that theatre has less of an
effect on life satisfaction, whereas reading fares much better (Leadbetter
et al. 2013). As we encountered in Chap. 4, there are conceptual differ-
ences between life satisfaction and happiness, and common sense might
tell us that reading and attending a theatre performance present different
kinds of well-being experiences. Yet, seeing that reading looks quite bad
for well-being is surprising at first glance. Elsewhere in the report are
regression tables25 for other activities, including birdwatching, gardening,
and hunting and fishing, which score significantly better than watching a
film or, indeed, poor old reading, which doesn’t win on these happiness
scales. Interestingly, when you go back to the Twitter data answering the
question: ‘what is happiness?’ (Box 5.1) there were many responses that
answered reading, curling up on the sofa and watching a film, and so on.
While the limited sample of the Twitter data makes it impossible to gener-
alise, it certainly still poses questions as to what is going on with con-
founding results in various happiness data. One thing that struck me
returning to these cases in 2020, a world changed by COVID-19, is the
difference between activities in the home and outside the home.
Interestingly, the app’s inventors co-authored an academic article for
the journal Global Environmental Change. Using the same data, they
found that outdoor activities were better for well-being. They state:
[T]he predicted happiness of a person who is outdoors (+2.32), birdwatch-
ing (+4.32) with friends (+4.38), in heathland (+2.71), on a hot (+5.13)
and sunny (+0.46) Sunday early afternoon (+4.30) is approximately 26 scale
points (or 1.2 standard deviations) higher than that of someone who is
commuting (−2.03), on his or her own, in a city, in a vehicle, on a cold, grey,
early weekday morning. Equivalently, this is a difference of about the same
size as between being ill in bed (−19.65) vs doing physical exercise (+6.51),
keeping all other factors the same. (MacKerron and Mourato 2013, 997)
The numbers in the brackets refer to ‘the scale points’, showing the
increase (or, where negative, decrease) in predicted happiness associated
with where people are, what they are doing, who they are with, what the
weather is like, what day of the week it is and what time of day it is.
Interestingly, the greener the space you are in and the hotter the day
(although sunniness seems less important than you might expect), the better.
While this may appear to be common sense in one way,
when you think back to how policy relies on evidence to improve well-
being, what are the policy messages here from an investment point of view?
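The arithmetic behind the quoted comparison can be checked with a short sketch. It uses only the coefficients printed in the MacKerron and Mourato quotation above; the variable names are illustrative, and it assumes that the conditions quoted without a coefficient (being alone, in a city, in a vehicle, on a cold, grey, early weekday morning) are baseline categories that contribute zero. That assumption is not stated in the quotation itself.

```python
# Coefficients in scale points, as quoted from MacKerron and Mourato (2013, 997).
favourable = {
    "outdoors": 2.32,
    "birdwatching": 4.32,
    "with friends": 4.38,
    "heathland": 2.71,
    "hot": 5.13,
    "sunny": 0.46,
    "Sunday early afternoon": 4.30,
}

# Only commuting has a quoted coefficient in the comparison scenario.
# Assumption: every other condition in that scenario is a baseline
# category and so adds 0 scale points.
unfavourable = {"commuting": -2.03}


def predicted_shift(coefficients):
    """Sum the scale-point contributions for one scenario."""
    return sum(coefficients.values())


# Gap between the two scenarios described in the quotation.
gap = predicted_shift(favourable) - predicted_shift(unfavourable)

# The comparison the authors offer: physical exercise vs being ill in bed.
exercise_vs_ill = 6.51 - (-19.65)

print(f"Scenario gap: {gap:.2f} scale points")
print(f"Exercise vs ill in bed: {exercise_vs_ill:.2f} scale points")
```

Under that assumption the two scenarios differ by about 25.65 scale points, which matches the ‘approximately 26’ in the quotation, and the exercise versus illness gap of about 26.16 is indeed ‘about the same size’.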
I had this app for a while and my results always told me that I was happiest
in a pub beer garden with my best friends. Did I know that the data
I was ploughing in when the app beeped me to do so were potentially going
to be used to inform policy-making? Well, yes, of course, I guessed that,
because I was researching well-being data and policy, which was why I
downloaded the app in the first place. But did most people who were
interested in how they felt doing certain things imagine the contexts of
their data’s potential future use?
What policy decisions should be made about beer gardens off the back
of my interactions with some sliders on a mobile phone app after a few
ciders on a summer’s day? While these data were collected at a scale that
means my personal data and my interactions are no longer visible on an
individual level, it does pose questions for some of the correlations we
make with these data. Are people happier on a weekend because they are
not working or because they can go to the pub?
5.7 Conclusion
Despite the conflicting evidence from different approaches to ‘Big Data’,
people are keen to find new ways to harness them to answer the age-old
policy and philosophy questions around people’s well-being. The increase
in well-being research coincides with an increase in research with and on
Big Data. Both have possibilities and challenges, but could the challenges be
exacerbated by combining well-being research with these data practices? Do Big
Data have a capacity for good when making decisions about young people’s
exam grades or whether someone is eligible for social housing? We reflected
on some important examples of where this went awry in this chapter.
New methods and metrics using Big Data, and indeed the research
going into developing new tools to harness them, are not necessarily being
checked for rigour before the approach is used elsewhere, as was the case
with the Twitter community study, and its use of the sentiment metrics.
Generalising people’s happiness based on mobile phone data has its limita-
tions. We cannot necessarily be entirely sure of whether it is the aesthetic
grandeur of an old Victorian bandstand in the park, whether there is a
classical concert inside, if you had enough sleep, whether you are picnick-
ing with your favourite friends, with your kids, or having time away from
your kids; indeed, whether you are stuck on a delayed tube underneath the
park, or are walking in a hailstorm, that truly adds to (or detracts from)
your momentary happiness.
The ethics of studying Big Data more broadly should be considered, as
should the behaviours of those who fall outside the sample of users of
wearable tech or smartphones, especially as these people may be older or
poorer, for example, characteristics which we know intersect with well-being
in very significant ways. Despite this, claims are still made that findings from these
studies could be used to inform policy and investment. While they can
offer some insights, we must be mindful of their limits—and crucially of
their implications, especially in different contexts.
All in all, Big Data and new technologies, whilst not always revolution-
ary in kind, can offer insights into well-being that are useful for policy-
makers on a national scale, in international pandemics and for people who
simply want to see what people think. But they are not without their lim-
its, nor are they a magic bullet to the issues we have with existing data. If
anything, they are also shown to have the potential to exacerbate existing
problems as much as investigate solutions.
The capacity for Big Data to embrace complexity, and at greater speed,
means they present new opportunities to analyse health data—and cru-
cially how health intersects with social concerns. Reflecting back from
today on how crude the Google Flu Trends analysis in 2013 now seems,
it is clear that Big Data technologies and techniques are improving at pace.
The COVID-19 example, BlueDot, shows that the value of Big Data anal-
yses is in their capacity to now cope with more of Big Data’s qualities at
the same time, and in fact, to harness them: their messiness, variability, size
and the capacity to link previously unconnected data sources from farming
information to flight sales. The value was in the variety of data and sources
used. Yet harnessing the power of Big Data was not powerful enough to
prevent a worldwide crisis, despite the grand claims.
What we think of as ‘Big Data’ offer a peculiar perspective on ‘well-
being’. Consider the different things they capture, from sleep patterns to
elite cycle trails to facial recognition and how many steps your walk to the
post office takes. These devices exist to capture and produce data because
data can be useful and commercialised. We are not even clear on whether
more knowledge of the self is good for well-being or bad (yet?), let alone
whether it is good at scale: that governments (and who else) know more
about us. What is clear is that data are producing and changing culture
and society, as much as they are capturing them.
We need to ask questions around the commercial value of these data
practices alongside social justice issues. How would these data have had a
greater chance of improving well-being were the contexts in which they
were analysed different? Who should be included in these discussions, and
who is excluded? Ultimately, how will decisions and trade-offs be made
between the commercial and social justice dimensions?
Notes
1. In fact, what a lot of people refer to as Big Data are not ‘Big’ at all by the
initial standards of definition. They are just large datasets or newer types of
data in not even large datasets, and so arguably not Big at all.
2. Kitchin and McArdle’s (2016) original table says ‘Limited to wide’ here
(p. 2), but I think this makes more sense as ‘Limited in width’, or narrow.
3. A digital sociologist is interested in understanding the use of digital media
(often data) as part of everyday life, and how these various technologies
contribute to patterns of human behaviour, identity, relationships and
social change.
4. O’Neil describes how the bottom-scoring 2–5% of teachers were fired. Yet
the modelled target student scores and small classrooms made the scoring
of teachers little better than random, and there was almost no correlation
in a teacher’s scores from one year to the next. Qualitative data, meanwhile,
called one of the sacked teachers ‘one of the best teachers I’ve ever come into
contact with’ (O’Neil 2016, 4).
5. Critical Data Studies are moving for more fairness, accountability and
transparency in data practices. Please see the FAccT conference for more on
this: https://facctconference.org/.
6. This is largely credited to the 2017 article in The Economist, ‘The world’s
most valuable resource is no longer oil, but data’ (The Economist 2017).
7. With the exceptions of 1941 (during World War II) and Ireland in 1921.
8. Although, of course, given what we have seen elsewhere in the book, we
might question whether the changing possibilities for what data could
describe changed policy, rather than the other way around.
9. There were a number of iterations of Mass Observation, with different
people initiating them, but these were the original founding members.
10. There were no women observing anything in those days, of course.
11. See Mass Observation (n.d.) website for more on the data available and
how to access them.
12. Several new methodologies are emerging that propose new possibilities for
well-being measurement through combining new data sources with the
survey data we have explored in previous chapters (Bellet and Frijters
2019; Daas et al. 2013; Jahani et al. 2017). These are not only hoping to
understand well-being as personal or subjective experience, but to change
the way that social justice issues such as poverty are approached
(Blumenstock 2016). International organisations such as the United
Nations are supporting this kind of work, although primarily focussing on
patterns of ‘health and well-being’ (United Nations 2014, 2015).
13. More information is available on REACT’s data collection and management
here: https://www.ipsos.com/ipsos-mori/en-uk/covid-19-swab-test-faqs#nameaddress.
14. REACT was commissioned by the Department of Health and Social Care
(DHSC) and is being carried out by Imperial College London in partner-
ship with Ipsos MORI and Imperial College Healthcare NHS Trust.
https://www.imperial.ac.uk/medicine/research-and-impact/groups/
react-study/.
15. A review of literature on data and data practices, Kennedy et al. (2020),
found that tech and policy were considered different worlds when it comes
to data practices, and with different aims, although that is evolving.
16. See Internet Live Stats, ‘Google search statistics’ (Internet Live Stats
n.d.). Internet Live Stats offer plenty more up-to-date data on data, if you
are interested.
17. For the ethical concerns regarding social media research, see Townsend
and Wallace (2016).
18. See Davies (2018) for a discussion on the greater implications of ‘the culture
wars’ for politics and community.
19. If you are reading this chapter a while after reading the previous ones, then
the cultural sector is a broad description of cultural institutions like librar-
ies, heritage sites, museums, theatres and so on. Crucially, it is not only
about the buildings themselves, but all the ways people make and consume
culture and can include Netflix and outdoor festivals. In the UK, the cul-
tural sector includes organisations funded by public subsidy as well as com-
mercial organisations.
20. This post from Wasim Ahmed (2019) offers a clearly presented overview
of the kinds of analyses available using different software:
https://blogs.lse.ac.uk/impactofsocialsciences/2019/06/18/using-twitter-as-a-data-source-an-overview-of-social-media-research-tools-2019/.
21. ‘Everyday participation’ (Miles and Sullivan 2010) has come to mean the
everyday activities we participate in, which tend to fall outside of formal
subsidy, which tends to fund ‘the arts’.
22. IMD is the UK government’s Index of Multiple Deprivation.
23. This is called the PeerIndex realness score. This score is generated using
information such as whether the profile has been self-certified on the
PeerIndex site and/or has been linked to Facebook or LinkedIn. ‘PeerIndex
realness score is a metric that indicates the likelihood that the profile is of
a real person, rather than a spambot or twitter feed. A score above 50
means this account is of a real person, a score below 50 means it is less
likely to be a real person’ (http://www.peerindex.net/help/scores).
24. See YouGov (n.d.) ‘Brits use of wearable devices’.
25. A regression table like the one reproduced in Table 5.4 will mainly be con-
cerned with communicating the degree of association between variables.
Chapters 7 and 8 go into this in far greater detail. The values will always lie
between 0 and 1, and the way this table has been presented shows simplified
detail. Ordinarily there is additional information to show not only the degree
of association, but how sure we can be that this is a correct estimate. There will
always be a degree of error that has to be accounted for. Typically in a regression
table, you will find asterisks, as in Table 5.4. Asterisks in a regression table
indicate the level of statistical significance of a regression coefficient.
References
Ada Lovelace Institute. 2019. Beyond Face Value: Public Attitudes to Facial
Recognition Technology. Accessed 28 April 2021.
https://www.adalovelaceinstitute.org/report/beyond-face-value-public-attitudes-to-facial-recognition-technology/.
Ahmed, W. 2019. Using Twitter as a Data Source: An Overview of Social Media
Research Tools (2019). Impact of Social Sciences. Accessed 28 April 2021.
https://blogs.lse.ac.uk/impactofsocialsciences/2019/06/18/using-twitter-as-a-data-source-an-overview-of-social-media-research-tools-2019/.
Ajana, B. 2017. Self-Tracking: Empirical and Philosophical Investigations. Springer
International Publishing. https://doi.org/10.1007/978-3-319-65379-2.
Albert, A. 2019. Citizen Social Science: A Critical Investigation. PhD thesis.
University of Manchester.
https://www.escholar.manchester.ac.uk/api/datastream?publicationPid=uk-ac-man-scw:319481&datastreamId=FULL-TEXT.PDF.
Avila, R. 2019. Fixing Digital Democracy? The Future of Data-Driven
Political Campaigning. openDemocracy. Accessed 28 April 2021.
https://www.opendemocracy.net/en/fixing-digital-democracy-future-of-data-driven-political-campaigning/.
Bache, I., and Reardon, L. 2013. An Idea Whose Time has Come? Explaining the
Rise of Well-Being in British Politics. Political Studies, 61(4), 898–914.
https://doi.org/10.1111/1467-9248.12001.
Bates, J. 2016. Towards a Critical Data Science—The Complicated Relationship
Between Data and the Democratic Project. Impact of Social Sciences. https://
blogs.lse.ac.uk/impactofsocialsciences/2016/01/12/towards-a-critical-data-
science-data-and-the-democratic-project/.
Bates, J., Lin, Y.-W., and Goodale, P. 2016. Data Journeys: Capturing the Socio-
material Constitution of Data Objects and Flows. Big Data & Society 3 (2):
p. 2053951716654502. https://doi.org/10.1177/2053951716654502.
BBC. 2019. London Mayor Quizzes King’s Cross Developer on Facial Recognition.
BBC News, 14 August. Accessed 29 April 2021. https://www.bbc.com/news/
technology-49343822.
———. 2020. Coronavirus: UK to Have Test, Track and Trace System by June.
BBC News. Accessed 28 April 2021. https://www.bbc.co.uk/news/av/
uk-politics-52745202.
Bellet, C. and Frijters, P. 2019. Big Data and Well-being, p. 26. Accessed 28 April
2021. https://worldhappiness.report/ed/2019/big-data-and-well-being/.
Benjamin, R. 2019. Race After Technology: Abolitionist Tools for the New Jim Code.
Medford, MA: Polity.
Boyd, D., and Crawford, K. 2012. Critical Questions for Big Data. Information,
Communication & Society, 15(5), 662–679. https://doi.org/10.108
0/1369118X.2012.678878.
BlueDot. n.d. BlueDot | Who We Are, BlueDot. Accessed 2 May 2021. https://
bluedot.global/team/.
Blumenstock, J.E. 2016. Fighting Poverty with Data. Science 353 (6301):
753–754. https://doi.org/10.1126/science.aah5217.
Campbell, D. 2017. Facebook and Twitter ‘Harm Young People’s Mental
Health’. The Guardian. Accessed 28 April 2021.
http://www.theguardian.com/society/2017/may/19/popular-social-media-sites-harm-young-peoples-mental-health.
Carroll, C., J.C. Fuhrer, and D.W. Wilcox. 1994. Does Consumer Sentiment
Forecast Household Spending? If So, Why? The American Economic Review 84
(5): 1397–1408.
Charrington, S. 2020. How AI Predicted the Coronavirus Outbreak with Kamran
Khan—#350. Accessed 28 April 2021. https://www.youtube.com/
watch?v=V6BpKSGquRw.
Coughlin, T. 2018. 175 Zettabytes By 2025. Forbes. Accessed 29 March 2021.
https://www.forbes.com/sites/tomcoughlin/2018/11/27/175-zettabytes-
by-2025/.
Cryle, P.M., and E. Stephens. 2017. Normality: A Critical Genealogy. Chicago:
The University of Chicago Press.
Daas, P. J. et al. 2013. Big Data and Official Statistics. In Proceedings of the
NTTS. New Techniques and Technologies for Statistics, pp. 5–7.
Davies, B., Innes, M. and Dawson, A. 2018. An Evaluation of South Wales
Police’s Use of Automated Facial Recognition. Cardiff: Crime and Security
Research Institute, p. 46.
https://www.statewatch.org/media/documents/news/2018/nov/uk-south-wales-police-facial-recognition-cardiff-uni-eval-11-18.pdf.
Davies, W. 2015. The Happiness Industry: How The Government and Big Business
Sold Us Well-Being. London: Verso.
———. 2018. Nervous States: How Feeling Took Over the World. Jonathan Cape.
Dencik, L. 2020. The Datafied Welfare State: A Perspective from the UK, 24.
Cardiff: Cardiff University. https://datajusticeproject.net/wp-content/
uploads/sites/30/2020/09/The-Datafied-Welfare-State_draft.pdf.
Denham, E. 2019. Statement: Live facial recognition technology in King’s Cross.
ICO. Accessed: 19 August 2019.
https://ico.org.uk/about-the-ico/news-and-events/news-and-blogs/2019/08/statement-live-facial-recognition-technology-in-kings-cross/.
Digital Initiatives. 2020. Strava: Striving in the Time of Corona? Digital Innovation
and Transformation. Accessed 28 April 2021. https://digital.hbs.edu/
platform-digit/submission/strava-striving-in-the-time-of-corona/.
Dodge, M., and Kitchin, R. 2005. Codes of Life: Identification Codes and the
Machine-Readable World. Environment and Planning D: Society and Space,
23(6), 851–881. https://doi.org/10.1068/d378t.
Eubanks, V. 2018. Automating Inequality: How High-Tech Tools Profile, Police, and
Punish the Poor. St. Martin’s Publishing Group.
Fujiwara, D., and G. MacKerron. 2015. Cultural Activities, Artforms and
Wellbeing. London: Arts Council England.
Fussey, P., and D. Murray. 2019. London-Met-Police-Trial-of-Facial-Recognition-
Tech-Report.pdf. Essex: University of Essex, p. 128. Accessed 28 April 2021.
https://48ba3m4eh2bf2sksp43rq8kk-wpengine.netdna-ssl.com/wp-content/uploads/2019/07/London-Met-Police-Trial-of-Facial-Recognition-Tech-Report.pdf.
Gayle, D. 2018. Diane Abbott: Twitter Has ‘Put Racists into Overdrive’. The
Guardian. Accessed 28 April 2021.
https://www.theguardian.com/politics/2018/dec/18/diane-abbott-calls-for-twitter-to-clamp-down-on-hate-speech.
Gilmore, A., Kostas, A., and Albert, A. 2018. ‘Never Mind the Quality, Feel the
Width’: Big Data for Quality and Performance Evaluation in the Arts and
Cultural Sector and the Case of ‘Culture Metrics’. In G. Schiuma and
D. Carlucci (Eds.), Big Data in the Arts and Humanities: Theory and
Practice. Boca Raton: Taylor and Francis.
Hacking, I. 1990. The Taming of Chance. Cambridge: Cambridge University Press.
———. 1991. How Should We Do the History of Statistics? In The Foucault
Effect: Studies in Governmentality, ed. G. Burchell, C. Gordon, and P. Miller.
Chicago: The University of Chicago Press.
Harford, T. 2017. How the World’s First Accountants Counted on Cuneiform.
BBC News. Accessed 28 April 2021. https://www.bbc.co.uk/news/
business-39870485.
Heaven, W.D. 2020. AI Could Help with the Next Pandemic—But Not with This
One, MIT Technology Review. Accessed 2 May 2021.
https://www.technologyreview.com/2020/03/12/905352/ai-could-help-with-the-next-pandemicbut-not-with-this-one/.
Hill, K., and A. Krolik 2019. How Photos of Your Kids are Powering Surveillance
Technology. The New York Times. Accessed 28 April 2021.
https://www.nytimes.com/interactive/2019/10/11/technology/flickr-facial-recognition.html.
Hintz, A., and J. Brand. n.d. Data Policies: Approaches for Data-Driven Platforms
in the UK and EU. Cardiff: Data Justice Lab, p. 30. https://datajustice.files.
wordpress.com/2020/01/data-policies-research-report-revised.pdf.
Holmes, T.H., and R.H. Rahe. 1967. The Social Readjustment Rating Scale.
Journal of Psychosomatic Research 11 (2): 213–218. https://doi.
org/10.1016/0022-3999(67)90010-4.
Internet Live Stats. n.d. Google Search Statistics—Internet Live Stats. Accessed 28
April 2021. https://www.internetlivestats.com/google-search-statistics/.
Jahani, E., et al. 2017. Improving Official Statistics in Emerging Markets Using
Machine Learning and Mobile Phone Data. EPJ Data Science 6 (1): 1–21.
https://doi.org/10.1140/epjds/s13688-017-0099-3.
Jee, C. 2016. Wearable Tech: Could It Save the NHS?, Techworld. Accessed 15
September 2016. http://www.techworld.com/wearables/could-wearables-
save-nhs-3621960/.
Kennedy, H. 2016. Post, Mine, Repeat: Social Media Data Mining Becomes
Ordinary. New York; Secaucus: Palgrave Macmillan UK. https://doi.
org/10.1057/978-1-137-35398-6.
Kennedy, H., Oman, S., Taylor, M., Bates, J., and Steedman, R. 2020. Public
Understanding and Perceptions of Data Practices: A Review of Existing
Research. Sheffield: The University of Sheffield. https://livingwithdata.org/
project/wp-content/uploads/2020/05/living-with-data-2020-review-of-
existing-research.pdf.
Kitchin, R. 2014. The Data Revolution: Big Data, Open Data, Data Infrastructures
and Their Consequences. SAGE.
Kitchin, R., and G. McArdle. 2016. What Makes Big Data, Big Data? Exploring
the Ontological Characteristics of 26 Datasets. Big Data & Society 3 (1):
p. 2053951716631130. https://doi.org/10.1177/2053951716631130.
Kramer, A.D.I. 2010. An Unobtrusive Behavioral Model Of ‘Gross National
Happiness’. In Proceedings of the SIGCHI Conference on Human Factors in
Computing Systems. CHI 10, 287–290. Atlanta: Association for Computing
Machinery.
Kruzan, K.P., and A.S. Won. 2019. Embodied Well-Being Through Two Media
Technologies: Virtual Reality and Social Media. New Media & Society 21 (8):
1734–1749. https://doi.org/10.1177/1461444819829873.
Laney, D. 2001. 3D data management: Controlling data volume, velocity and
variety. Meta Group. Accessed: 16 January 2013. http://blogs.gartner.com/
doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-
Volume-Velocity-and-Variety.pdf.
Lazer, D., et al. 2014. The Parable of Google Flu: Traps in Big Data Analysis.
Science 343 (6176): 1203–1205. https://doi.org/10.1126/
science.1248506.
Lazer, D., and R. Kennedy. 2015. What We Can Learn from the Epic Failure of
Google Flu Trends. Wired. Accessed 28 April 2021. https://www.wired.
com/2015/10/can-learn-epic-failure-google-flu-trends/
Leadbetter, C., O’Connor, N., and Commonwealth Games, Culture & Sport
Analysis Scottish Government. 2013. Healthy Attendance? The Impact of
Cultural Engagement and Sports Participation on Health and Satisfaction with
Life in Scotland. Scotland: The Scottish Government. Accessed 17 May 2021.
https://www.gov.scot/publications/healthy-attendance-impact-cultural-engagement-sports-participation-health-satisfaction-life-scotland/.
Lee, L. et al. 2016. Information Disclosure Concerns in The Age of Wearable
Computing. In Proceedings 2016 Workshop on Usable Security. Workshop on
Usable Security, San Diego, CA: Internet Society. https://doi.org/10.14722/
usec.2016.23006.
Lewis, R., M. Rowe, and C. Wiper. 2016. Online Abuse of Feminists as An
Emerging form of Violence Against Women and Girls. The British Journal of
Criminology 57 (6): 1462–1481. https://doi.org/10.1093/bjc/azw073.
Living with data. n.d. Living with Data. https://livingwithdata.org/.
Lupton, D. 2019. Data Mattering and Self-Tracking: What Can Personal Data
Do? Continuum 34 (1): 1–13. https://doi.org/10.1080/1030431
2.2019.1691149.
MacKerron, G., and S. Mourato. 2013. Happiness is Greater in Natural
Environments. Global Environmental Change 23 (5): 992–1000. https://doi.
org/10.1016/j.gloenvcha.2013.03.010.
Madge, C., and T.H. Harrisson. 1937. Mass Observation. London: Frederick
Muller Ltd.
Marr, B. 2014. Big Data: The 5 vs everyone must know. Accessed: 4 September
2015. https://www.linkedin.com/pulse/20140306073407-64875646-big-
data-the-5-vs-everyone-must-know.
Mass Observation. n.d. Mass Observation. http://www.massobs.org.uk.
Matsakis, L. 2019. The WIRED Guide to Your Personal Data (and Who Is Using
It). Wired. Accessed: 28 April 2021. https://www.wired.com/story/
wired-guide-personal-data-collection/.
Mayer-Schönberger, V., and K. Cukier. 2013. Big Data: A Revolution that Will
Transform how We Live, Work, and Think. London: John Murray.
Marz, N. and Warren, J. 2012. Big Data: Principles and Best Practices of Scalable
Realtime Data Systems. MEAP edition. Westhampton, NJ: Manning.
McCall, B. 2020. COVID-19 and Artificial Intelligence: Protecting Health-Care
Workers and Curbing The Spread. The Lancet Digital Health 2 (4): e166–
e167. https://doi.org/10.1016/S2589-7500(20)30054-6.
McNulty, E. 2014. Understanding Big Data: The seven V’s. Accessed: 4
September 2015. http://dataconomy.com/seven-vs-big-data/.
Miles, A., and A. Sullivan. 2010. Understanding the Relationship Between Taste
and Value in Culture and Sport. London: DCMS.
Murgia, M. 2017. Watchdog Probes Cambridge Analytica’s Poll Role. Financial
Times. Accessed: 28 April 2021. https://www.ft.com/
content/7482ec7c-01c9-11e7-aa5b-6bb07f5c8e12.
Mutz, M. 2016. Christmas and Subjective Well-Being: a Research Note. Applied
Research in Quality of Life 11 (4): 1341–1356. https://doi.org/10.1007/
s11482-015-9441-8.
NHS. 2016. Want to Feel Happier? Take a Break from Facebook.
NHS. https://www.nhs.uk/news/mental-health/want-to-feel-happier-take-
a-break-from-facebook/.
Niiler, E. 2020. An AI Epidemiologist Sent the First Alerts of the Coronavirus.
Wired. Accessed: 28 April 2021. https://www.wired.com/story/ai-
epidemiologist-wuhan-public-health-warnings/.
Noble, S.U. 2018. Algorithms of Oppression: Data Discrimination in the Age of
Google. New York: New York University Press.
Oman, S. 2013a. Review of ‘Counting What Counts: What Big Data Can Do
for the Cultural Sector’. Cultural Value Initiative.
http://culturalvalueinitiative.org/2013/06/08/review-of-nestas-counting-what-counts-what-big-data-can-do-for-the-cultural-sector-by-susan-oman/.
———. 2013b. Tackling the Deficit: Well-Being and Cultural Participation.
Presentation at Culture, Health and Wellbeing International Conference.
University of Bristol.
———. 2015. Measuring National Well-Being: What Matters to You? What
Matters to Whom? In Cultures of Wellbeing: Method, Place, Policy, ed. S. White
and C. Blackmore. London: Palgrave Macmillan.
———. 2017. All Being Well: Cultures of Participation and the Cult of Measurement.
PhD Thesis. The University of Manchester.
———. 2019a. Improving Data Practices to Monitor Inequality and Introduce
Social Mobility Measures: A Working Paper. The University of Sheffield.
Available at: https://www.sheffield.ac.uk/polopoly_fs/1.867756!/file/
MetricsWorkingPaper.pdf. Accessed: 29 March 2021.
———. 2019b. Measuring Social Mobility in The Creative and Cultural Industries:
The importance of working in partnership to improve data practices and address
inequality. Sheffield: The University of Sheffield. Accessed: 29 March 2021.
https://www.sheffield.ac.uk/polopoly_fs/1.867754!/file/MetricsPolicyBriefing.pdf.
———. 2020. Leisure pursuits: Uncovering the ‘Selective Tradition’ in Culture
and Well-being Evidence for Policy. Leisure Studies, 39(1), 11–25. https://doi.
org/10.1080/02614367.2019.1607536.
———. n.d. How Data Work in Contexts. Living with Data. Accessed: 29 April
2021. https://livingwithdata.org/previous-research/how-data-work-in-
contexts/.
O’Neil, C. 2016. Weapons of Math Destruction: How Big Data Increases Inequality
and Threatens Democracy. London: Allen Lane.
ONS. 2001. 60 Years of Social Survey: 1941–2001. Norwich: HMSO.
———. 2016. Early Census-Taking in England and Wales. Office for National
Statistics. Accessed 28 April 2021. https://www.ons.gov.uk/
census/2011census/howourcensusworks/aboutcensuses/censushistory/
earlycensustakinginenglandandwales.
Otterbacher, J., Bates, J., and Clough, P. 2017. Competent Men and Warm
Women: Gender Stereotypes and Backlash in Image Search Results. In
Proceedings of the 2017 CHI Conference on Human Factors in Computing
Systems (pp. 6620–6631). Association for Computing Machinery. https://doi.
org/10.1145/3025453.3025727.
Pearce, R., S. Erikainen, and B. Vincent. 2020. TERF Wars: An Introduction. The
Sociological Review 68 (4): 677–698. https://doi.org/10.1177/
0038026120934713.
Pellert, M., et al. 2020. Dashboard of Sentiment in Austrian Social Media During
COVID-19. Frontiers in Big Data 3. https://doi.org/10.3389/
fdata.2020.00032.
Pidd, H. 2020. ‘Punishment by statistics’: The father who foresaw A-level algorithm
flaws. The Guardian. Accessed: 11 August 2021.
http://www.theguardian.com/education/2020/aug/14/punishment-by-statistics-the-father-who-foresaw-a-level-algorithm-flaws.
Pink, S., and V. Fors. 2017. Being in a Mediated World: Self-Tracking and the
Mind–Body–Environment. Cultural Geographies 24 (3): 375–388. https://
doi.org/10.1177/1474474016684127.
Plunz, R.A., et al. 2019. Twitter Sentiment in New York City Parks as Measure of
Well-Being. Landscape and Urban Planning 189: 235–246. https://doi.
org/10.1016/j.landurbplan.2019.04.024.
Poovey, M. 1998. A History of the Modern Fact: Problems of Knowledge in the
Sciences of Wealth and Society. Chicago: The University of Chicago Press.
Porter, T.M. 1986. The Rise of Statistical Thinking 1820–1900. Princeton:
Princeton University Press.
———. 1996. Trust in Numbers: The Pursuit of Objectivity in Science and Public
Life. Princeton: Princeton University Press.
Quercia, D. et al. 2012. Tracking ‘Gross Community Happiness’ from Tweets. In
Proceedings of the ACM 2012 Conference on Computer Supported Cooperative
Work. CSCW 2012, ed. D. Gergle, et al., 965–968. New York: ACM.
Ram, A., and M. Murgia. 2019. Data Brokers: Regulators Try to Rein in the
‘Privacy Deathstars’. Financial Times. Accessed 29 March 2021. https://www.
ft.com/content/f1590694-fe68-11e8-aebf-99e208d3e521.
Ruckenstein, M., and M. Pantzar. 2017. Beyond the Quantified Self: Thematic
Exploration of a Dataistic Paradigm. New Media & Society 19 (3): 401–418.
https://doi.org/10.1177/1461444815609081.
Ruppert, E., J. Law, and M. Savage. 2013. Reassembling Social Science Methods:
The Challenge of Digital Devices. Theory, Culture & Society 30 (4): 22–46.
https://doi.org/10.1177/0263276413484941.
Savage, M. 2010. Identities and Social Change in Britain Since 1940: The Politics
of Method. Oxford: Oxford University Press.
Scott, J.C. 1998. Seeing Like a State: How Certain Schemes to Improve the Human
Condition Have Failed. New Haven: Yale University Press (The Yale ISPS series).
Sinclair, J. 1798. Statistical Accounts of Scotland. https://stataccscot.edina.ac.uk/
static/statacc/dist/home.
Strain, T., K. Wijndaele, and S. Brage 2019. Physical Activity Surveillance Through
Smartphone Apps and Wearable Trackers: Examining the UK Potential for
Nationally Representative Sampling. JMIR mHealth and uHealth 7(1): p.
e11898. https://doi.org/10.2196/11898.
Suzuki, M. 1992. Political Business Cycles in the Public Mind. American Political
Science Review 86 (4): 989–996. https://doi.org/10.2307/1964350.
The Economist. 2017. The World’s Most Valuable Resource Is No Longer
Oil, But Data. The Economist, 6 May. Accessed 29 March 2021.
https://www.economist.com/leaders/2017/05/06/the-worlds-most-valuable-resource-is-no-longer-oil-but-data.
Townsend, L., and Wallace, C. 2016. Social Media Research: A Guide to Ethics.
Aberdeen: The University of Aberdeen, p. 16. https://www.gla.ac.uk/media/
Media_487729_smxx.pdf.
Turow, J. 2011. Introduction. In The Daily You: How the New Advertising Industry
Is Defining Your Identity and Your Worth, 1–12. Yale University Press.
UK Data Justice Lab. n.d. Data Justice Lab. https://datajusticelab.org.
United Nations. 2014. A World That Counts: Mobilising the Data Revolution for
Sustainable Development. Secretary-General of the United Nations.
https://www.tralac.org/images/Resources/UN_Summit/A%20world%20that%20counts%20Mobilizing%20the%20data%20revolution%20for%20sustainable%20development%202014.pdf.
———. 2015. Indicators and a Monitoring Framework for the Sustainable
Development Goals. Launching a Data Revolution for the SDGs.
Secretary-General of the United Nations, p. 233.
https://sdgs.un.org/sites/default/files/publications/2013150612-FINAL-SDSN-Indicator-Report1.pdf.
Voukelatou, V., et al. 2020. Measuring Objective and Subjective Well-Being:
Dimensions and Data Sources. International Journal of Data Science and
Analytics. https://doi.org/10.1007/s41060-020-00224-2.
Whitaker, B. 2020. The Computer Algorithm That Was Among the First to Detect the
Coronavirus Outbreak. Accessed 28 April 2021.
https://www.cbsnews.com/news/coronavirus-outbreak-computer-algorithm-artificial-intelligence/.
Wilmott, C. 2016. Small Moments in Spatial Big Data: Calculability, Authority
and Interoperability in Everyday Mobile Mapping. Big Data & Society 3 (2):
2053951716661364. https://doi.org/10.1177/2053951716661364.
YouGov. n.d. Brits Use of Wearable Devices (E.g. A Smartwatch or Wearable
Fitness Band). Accessed 28 April 2021.
https://yougov.co.uk/topics/technology/trackers/brits-use-of-wearable-devices-eg-a-smartwatch-or-wearable-fitness-band.
Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction
in any medium or format, as long as you give appropriate credit to the original
author(s) and the source, provide a link to the Creative Commons licence and
indicate if changes were made.
The images or other third party material in this chapter are included in the
chapter’s Creative Commons licence, unless indicated otherwise in a credit line to
the material. If material is not included in the chapter’s Creative Commons licence
and your intended use is not permitted by statutory regulation or exceeds the
permitted use, you will need to obtain permission directly from the
copyright holder.