Papers by Chico Q Camargo

Proceedings of the National Academy of Sciences, 2022
Significance Why does evolution favor symmetric structures when they only represent a minute subset of all possible forms? Just as monkeys randomly typing into a computer language will preferentially produce outputs that can be generated by shorter algorithms, so the coding theorem from algorithmic information theory predicts that random mutations, when decoded by the process of development, preferentially produce phenotypes with shorter algorithmic descriptions. Since symmetric structures need less information to encode, they are much more likely to appear as potential variation. Combined with an arrival-of-the-frequent mechanism, this algorithmic bias predicts a much higher prevalence of low-complexity (high-symmetry) phenotypes than follows from natural selection alone and also explains patterns observed in protein complexes, RNA secondary structures, and a gene regulatory network.

Accurate modelling of local population movement patterns is a core contemporary concern for urban policymakers, affecting both the short term deployment of public transport resources and the longer term planning of transport infrastructure. Yet, while macro-level population movement models (such as the gravity and radiation models) are well developed, micro-level alternatives are in much shorter supply, with most macro-models known to perform badly in smaller geographic confines. In this paper we take a first step to remedying this deficit, by leveraging two novel datasets to analyse where and why macro-level models of human mobility break down at small scales. In particular, we use an anonymised aggregate dataset from a major mobility app and combine this with freely available data from OpenStreetMap concerning land-use composition of different areas around the county of Oxfordshire in the United Kingdom. We show where different models fail, and make the case for a new modelling st...

Deep neural networks (DNNs) generalize remarkably well without explicit regularization even in the strongly over-parametrized regime where classical learning theory would instead predict that they would severely overfit. While many proposals for some kind of implicit regularization have been made to rationalise this success, there is no consensus for the fundamental reason why DNNs do not strongly overfit. In this paper, we provide a new explanation. By applying a very general probability-complexity bound recently derived from algorithmic information theory (AIT), we argue that the parameter-function map of many DNNs should be exponentially biased towards simple functions. We then provide clear evidence for this strong simplicity bias in a model DNN for Boolean functions, as well as in much larger fully connected and convolutional networks applied to CIFAR10 and MNIST. As the target functions in many real problems are expected to be highly structured, this intrinsic simplicity bias ...
Data: Oxfordshire traffic and models
These are the datasets and code used in two papers: 1 - Diagnosing the performance of human mobility models at small spatial scales using volunteered geographic information (https://arxiv.org/abs/1905.07964) 2 - Estimating Traffic Disruption Patterns with Volunteered Geographic Information (https://arxiv.org/abs/1907.05162). The files contain demographic and geographic data for electoral wards in the county of Oxfordshire, UK, as well as code to run the models described in the first paper. Tij matrices will also be public subject to approval by the data controller.

ArXiv, 2022
The COVID-19 pandemic has shed light on how the spread of infectious diseases worldwide is importantly shaped by both human mobility networks and socio-economic factors. Few studies, however, have examined the interaction of mobility networks with socio-spatial inequalities to understand the spread of infection. We introduce a novel methodology, called the Infection Delay Model, to calculate how the arrival time of an infection varies geographically, considering both effective distance-based metrics and differences in regions’ capacity to isolate – a feature associated with socioeconomic inequalities. To illustrate an application of the Infection Delay Model, this paper integrates household travel survey data with cell phone mobility data from the São Paulo metropolitan region to assess the effectiveness of lockdowns to slow the spread of COVID-19. Rather than operating under the assumption that the next pandemic will begin in the same region as the last, the model estimates infect...

ArXiv, 2018
Recent election surprises and regime changes have left the impression that politics has become more fast-moving and unstable. While modern politics does seem more volatile, there is little systematic evidence to support this claim. This paper seeks to address this gap in knowledge by reporting data over the last seventy years using public opinion polls and traditional media data from the UK and Germany. These countries are good cases to study because both have experienced considerable changes in electoral behaviour and have new political parties during the time period studied. We measure volatility in public opinion and in media coverage using approaches from information theory, tracking the change in word-use patterns across over 700,000 articles. Our preliminary analysis suggests an increase in the number of opinion issues over time and a growth in lack of predictability of the media series from the 1970s.

Among all tools used to understand collective human behavior, few tools have been as successful as agent-based models (ABMs). These models have been particularly effective at describing emergent social behavior, such as spatial segregation in neighborhoods or opinion polarization on social networks. ABMs are particularly common in the study of opinion and belief dynamics, being used by fields ranging from anthropology to statistical physics. These models, much like the social systems they describe, often do not have unique output variables, scales, or clear order parameters. This lack of clearly measurable emergent behavior makes such complex ABMs difficult to study, ultimately limiting their application to cases of empirical interest. In this paper, we introduce a series of approaches to analyze complex multidimensional ABMs, drawing from information theory and cluster analysis. We use these approaches to explore a multi-level agent-based model of ideological alignment introduced b...

Autopsy of a metaphor: The origins, use and blind spots of the ‘infodemic’
In 2020, the term ‘infodemic’ rose from relative obscurity to becoming a popular catch-all metaphor, representing the perils of fast, wide-spreading (false) information about the coronavirus pandemic. It featured in thousands of academic publications and received widespread attention from policymakers and the media. In this article, we trace the origins and use of the ‘infodemic’ metaphor and examine the blind spots inherent in this seemingly intuitive term. Drawing from literature in the cognitive sciences and communication studies, we show why information does not spread like a virus and point out how the ‘infodemic’ metaphor can be misleading, as it conflates multiple forms of social behaviour, oversimplifies a complex situation and helps constitute a phenomenon for which concrete evidence remains patchy. We point out the existing tension between the usefulness of the widespread use of the term ‘infodemic’ and its uncritical adoption, which we argue can do more harm than good, po...

The idea that neural networks may exhibit a bias towards simplicity has a long history (1; 2; 3; 4). Simplicity bias (5) provides a way to quantify this intuition. It predicts, for a broad class of input-output maps which can describe many systems in science and engineering, that simple outputs are exponentially more likely to occur upon uniform random sampling of inputs than complex outputs are. This simplicity bias behaviour has been observed for systems ranging from the RNA sequence to secondary structure map, to systems of coupled differential equations, to models of plant growth. Deep neural networks can be viewed as a mapping from the space of parameters (the weights) to the space of functions (how inputs get transformed to outputs by the network). We show that this parameter-function map obeys the necessary conditions for simplicity bias, and numerically show that it is hugely biased towards functions with low descriptional complexity. We also demonstrate a Zipf like power-la...

Boolean Threshold Networks as Models of Genotype-Phenotype Maps
Boolean threshold networks (BTNs) are a class of mathematical models used to describe complex dynamics on networks. They have been used to study gene regulation, but also to model the brain, and are similar to artificial neural networks used in machine learning applications. In this paper we study BTNs from the perspective of genotype-phenotype maps, by treating the network’s set of nodes and connections as its genotype, and dynamic behaviour of the model as its phenotype. We show that these systems exhibit (1) Redundancy, that is many genotypes map to the same phenotypes; (2) Bias, the number of genotypes per phenotypes varies over many orders of magnitude; (3) Simplicity bias, simpler phenotypes are exponentially more likely to occur than complex ones; (4) Large robustness, many phenotypes are surprisingly robust to random perturbations in the parameters, and (5) this robustness correlates positively with the evolvability, the ability of the system to find other phenotypes by poin...

Symmetry and simplicity spontaneously emerge from the algorithmic nature of evolution
Engineers routinely design systems to be modular and symmetric in order to increase robustness to perturbations and to facilitate alterations at a later date. Biological structures also frequently exhibit modularity and symmetry, but the origin of such trends is much less well understood. It can be tempting to assume – by analogy to engineering design – that symmetry and modularity arise from natural selection. But evolution, unlike engineers, cannot plan ahead, and so these traits must also afford some immediate selective advantage which is hard to reconcile with the breadth of systems where symmetry is observed. Here we introduce an alternative non-adaptive hypothesis based on an algorithmic picture of evolution. It suggests that symmetric structures preferentially arise not just due to natural selection, but also because they require less specific information to encode, and are therefore much more likely to appear as phenotypic variation through random mutations. Arguments from a...

Measuring the Volatility of the Political Agenda in Public Opinion and News Media
Public Opinion Quarterly
Recent election surprises, regime changes, and political shocks indicate that political agendas have become more fast-moving and volatile. The ability to measure the complex dynamics of agenda change and capture the nature and extent of volatility in political systems is therefore more crucial than ever before. This study proposes a definition and operationalization of volatility that combines insights from political science, communications, information theory, and computational techniques. The proposed measures of fractionalization and agenda change encompass the shifting salience of issues in the agenda as a whole and allow the study of agendas across different domains. We evaluate these metrics and compare them to other measures such as issue-level survival rates and the Pedersen Index, which uses public-opinion poll data to measure public agendas, as well as traditional media content to measure media agendas in the UK and Germany. We show how these measures complement existing a...

Scientific Reports
Accurate understanding and forecasting of traffic is a key contemporary problem for policymakers. Road networks are increasingly congested, yet traffic data is often expensive to obtain, making informed policy-making harder. This paper explores the extent to which traffic disruption can be estimated using features from the volunteered geographic information site OpenStreetMap (OSM). We use OSM features as predictors for linear regressions of counts of traffic disruptions and traffic volume at 6,500 points in the road network within 112 regions of Oxfordshire, UK. We show that more than half the variation in traffic volume and disruptions can be explained with OSM features alone, and use cross-validation and recursive feature elimination to evaluate the predictive power and importance of different land use categories. Finally, we show that using OSM’s granular point of interest data allows for better predictions than the broader categories typically used in studies of transportation ...

Nature Communications
Many systems in nature can be described using discrete input-output maps. Without knowing details about a map, there may seem to be no a priori reason to expect that a randomly chosen input would be more likely to generate one output over another. Here, by extending fundamental results from algorithmic information theory, we show instead that for many real-world maps, the a priori probability P(x) that randomly sampled inputs generate a particular output x decays exponentially with the approximate Kolmogorov complexity K(x) of that output. These input-output maps are biased towards simplicity. We derive an upper bound P(x) ≲ 2^{-aK(x)-b}, which is tight for most inputs. The constants a and b, as well as many properties of P(x), can be predicted with minimal knowledge of the map. We explore this strong bias towards simple outputs in systems ranging from the folding of RNA secondary structures to systems of coupled ordinary differential equations to a stochastic financial trading model.
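
As a rough illustration of the bound quoted in this abstract, P(x) ≲ 2^(-a K(x) - b), the sketch below compares a periodic and an irregular output of equal length. It is not the paper's method: the zlib compressed length is only an assumed stand-in for the approximate Kolmogorov complexity K(x), and the constants A and B are hypothetical placeholders, since the abstract states that a and b depend on the map.

```python
import random
import zlib


def complexity_proxy(output: str) -> int:
    """Compressed length in bits, an assumed rough stand-in for K(x)."""
    return 8 * len(zlib.compress(output.encode("ascii"), 9))


def log2_bound(k: float, a: float, b: float) -> float:
    """log2 of the upper bound 2^(-a*K(x) - b) on P(x)."""
    return -a * k - b


# Two outputs of equal length: one periodic, one irregular.
rng = random.Random(0)  # fixed seed so the example is reproducible
simple = "01" * 50
irregular = "".join(rng.choice("01") for _ in range(100))

# Hypothetical constants chosen only for illustration.
A, B = 0.1, 0.0

for name, x in [("simple", simple), ("irregular", irregular)]:
    k = complexity_proxy(x)
    print(f"{name}: K ~ {k} bits, log2 P(x) <= {log2_bound(k, A, B):.1f}")
```

Under these assumptions the periodic string compresses to fewer bits, so the bound allows it a larger probability than the irregular string, which is the qualitative pattern the abstract describes.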

Royal Society Open Science
Accurate modelling of local population movement patterns is a core, contemporary concern for urban policymakers, affecting both the short-term deployment of public transport resources and the longer-term planning of transport infrastructure. Yet, while macro-level population movement models (such as the gravity and radiation models) are well developed, micro-level alternatives are in much shorter supply, with most macro-models known to perform poorly at smaller geographical scales. In this paper, we take a first step to remedy this deficit, by leveraging two novel datasets to analyse where and why macro-level models of human mobility break down. We show how freely available data from OpenStreetMap concerning land use composition of different areas around the county of Oxfordshire in the UK can be used to diagnose mobility models and understand the types of trips they over- and underestimate when compared with empirical volumes derived from aggregated, anonymous smartphone location d...