
Estimating Voter Registration Deadline Effects with Web Search Data

2015, Political Analysis

https://doi.org/10.1093/PAN/MPV002
Advance Access publication March 12, 2015
Political Analysis (2015) 23:225–241
doi:10.1093/pan/mpv002

Downloaded from https://www.cambridge.org/core. IP address: 168.151.154.60, on 23 Sep 2017 at 03:59:10, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1093/pan/mpv002

Estimating Voter Registration Deadline Effects with Web Search Data

Alex Street
Political Science, Carroll College, Helena, MT 59625
e-mail: [email protected] (corresponding author)

Thomas A. Murray
Department of Biostatistics, MD Anderson Cancer Center
e-mail: [email protected]

John Blitzer
Google, Inc., Mountain View, CA
e-mail: [email protected]

Rajan S. Patel
Google, Inc., Mountain View, CA
e-mail: [email protected]

Edited by R. Michael Alvarez

Electoral rules have the potential to affect the size and composition of the voting public. Yet scholars disagree over whether requiring voters to register well in advance of Election Day reduces turnout. We present a new approach, using web searches for “voter registration” to measure interest in registering, both before and after registration deadlines for the 2012 U.S. presidential election. Many Americans sought information on “voter registration” even after the deadline in their state had passed. Combining web search data with evidence on the timing of registration for 80 million Americans, we model the relationship between search and registration. Extrapolating this relationship to the post-deadline period, we estimate that an additional 3–4 million Americans would have registered in time to vote, if deadlines had been extended to Election Day. We test our approach by predicting out of sample and with historical data. Web search data provide new opportunities to measure and study information-seeking behavior.

1 Introduction

One in seven Americans eligible to vote in the 2012 presidential election was not registered, and was thus unable to cast a ballot.1 Every U.S.
state, except North Dakota, requires voters to register. The earliest deadlines are currently 1 month in advance of the election. In some states, however, registration remains open, or re-opens to allow Election-Day Registration (EDR). The United States is unusual among democracies in placing the responsibility to register largely on the voter, and in leaving the administration of elections to state and local officials (Powell and Bingham 1986; Jackman 1987).

Authors’ note: The authors thank Mike Alvarez and two anonymous reviewers for valuable comments, and Joshua Dyck, Peter Enns, Matt Filner, Alex Kuo, Renee Liu, Philipp Rehm, Steve Scott, Daniel Smith, Nigel Snoad, Seth Stephens-Davidowitz, Hal Varian and seminar participants at Cornell University and Google for helpful suggestions. Replication data are available in Street et al. (2015). Supplementary materials for this article are available on the Political Analysis web site.

1 For details on this calculation, see Supplementary Section S.1.

© The Author 2015. Published by Oxford University Press on behalf of the Society for Political Methodology. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected]

The effects of requiring early registration are disputed. Some scholars argue that obliging citizens to register well before Election Day reduces turnout, by imposing an additional cost on would-be participants and by preventing the mobilization of new voters in the final days of the campaign, when political interest is most intense (Wolfinger and Rosenstone 1980; Teixeira 1992; Burden et al. 2014). Skeptics contend that most people who fail to register have little interest in politics or political participation (Highton 2004; Hanmer 2009).

Existing research on the effects of voter registration laws compares turnout across elections covered by different laws. However, laws that facilitate voter registration are easier to pass in time periods or districts where politicians and voters place greater value on electoral participation (Hanmer 2009). This implies that omitted variables may confound estimates of deadline effects that rely on comparing turnout under different election laws.

Here, we present an alternative approach that directly addresses the question of how many people missed the deadline but were nonetheless interested in registering. Specifically, we use the volume of web searches for “register to vote” and related terms as a measure of interest. We show that daily web search volume is closely correlated with the daily number registering, when registration is open. Using Bayesian models of the relationship between search and registration timing, we calculate counterfactual predictions for the number of additional registrations that would have been observed in 2012 if all U.S. states had extended the deadline to Election Day. The estimates rely on the assumption that web search activity is an equally strong indicator of interest in registering in the pre- and post-deadline periods. We assess the credibility of this assumption, and conclude by discussing other ways in which scholars could use web search data to study elections and information-seeking behavior.
2 The Effects of Electoral Rules

Research on electoral rules is motivated, in part, by the concern that incumbents may manipulate the terms of the contest to their own advantage. U.S. election administration is unusually decentralized, which, historically, has left wide scope for the abuse of power (Tokaji 2008; Keyssar 2009). Changes in electoral rules were central to the post-Reconstruction exclusion of African Americans and other minority groups from the franchise (Key 1949; Kousser 1974, 1999), and some scholars suggest that a similar dynamic is at work again today (Bentele and O’Brien 2013).

The effects of registration laws have received sustained scholarly interest. Prior research on the effect on turnout of allowing voters to register up to Election Day yields estimates ranging from −2 to +14 percentage points (see Supplementary Table S1 for summaries of 15 studies). Estimates of the effect of allowing late registration have fallen over time. The mean estimate in publications through the 1990s was that allowing EDR or keeping registration open through Election Day would produce a 6.4 percentage point increase in turnout, but the mean estimate since 2000 is 3 percentage points.

Keele and Minozzi (2013) provide a lucid discussion of the difficulty of estimating the effects of voter registration laws (see also Kousser and Mullin 2007). Early studies used cross-sectional comparisons of state turnout, with the effects of registration laws estimated with a dummy variable for EDR states or a measure of the number of days when registration was closed (Rosenstone and Wolfinger 1978; Nagler 1991). Such estimates identify the effect of requiring early registration only under the assumption that selection into treatment (allowing EDR) is as if random, conditional on observed covariates (Angrist and Pischke 2009, 55). But the selection-on-observables assumption is dubious in this case.
Other factors, such as norms on the importance of participation, may affect both interest in registering and election laws. Yet these confounders are not observed and cannot be included in the model.

To address this problem, scholars have focused on otherwise similar elections that used different registration laws, such as consecutive elections in a given state (Knack 2001). Difference-in-differences models estimate changes in turnout in states or districts that moved the deadline, while using changes over time in other regions to control for broad temporal trends (Ansolabehere and Konisky 2006; Knee and Green 2011). Keele and Minozzi (2013) employ a regression discontinuity design to compare districts just above and below population thresholds that were used to decide which districts in Minnesota and Wisconsin were obliged to introduce EDR in the 1970s. More careful research designs may help explain why recent research finds smaller effects of early registration requirements. However, these studies still rest on the assumption that elections held under different rules were otherwise equivalent. As Keele and Minozzi (2013) put it, one’s confidence in the results ultimately depends on one’s answer to questions such as “How much is Minnesota like Wisconsin?” They argue that more credible research designs yield ever-smaller estimates, and that the most plausible estimate is that EDR has little or no effect. Knee and Green (2011) reach similar conclusions, though they find that EDR has a modest effect on turnout in presidential elections.

Before giving up on the idea that registration laws affect turnout, however, we believe that scholars should draw upon a wider range of measures and methods.
We propose to use web search data to measure interest in registering to vote, both before and after the deadline. One advantage of web search data in this context is granularity. These data are available in large quantities across U.S. states and on a daily basis in the period leading up to recent elections. Skeptics have questioned whether the kind of people who miss the deadline are actually interested in registering, but we find substantial last-minute interest.

A growing literature uses web search data to provide up-to-date and localized measures of a range of phenomena, from epidemics of infectious diseases to unemployment claims. Web search data can “predict the present” by providing evidence on time-varying outcomes more quickly than other methods (Ginsberg et al. 2008; Choi and Varian 2012; Varian 2014; but see also Lazer et al. 2014). Scholars have also used web search data to predict consumer behavior into the near future (Goel et al. 2010).

In this article we extend the literature on web search and mass behavior to the electoral domain. We also take on a new methodological task, counterfactual prediction. We ask: how many more people would have registered for the 2012 election if registration had remained open through Election Day? Since this outcome was not actually observed, there is no definitive answer to this question. Estimates can be made only under certain assumptions. As the literature on web search and mass behavior is quite new, and we are among the first to consider methods for counterfactual predictions in this area (but see Brodersen et al. 2014), it is important for us to be as clear as possible about our approach. To this end, our data and code are published online with the paper (Street et al. 2015).

3 Data

We obtained data on the number of Americans seeking information on voter registration from Google web search logs. These logs are the source of the sample that is publicly available via the Google Trends web site.
Using the original source allows us to collect daily data even in small states; not all of these data are available via the Trends web site. Some users issued general queries, while others searched explicitly for voter registration rules in a given state. We therefore chose two generic queries: [voter registration] and [register to vote], and three that referred to state names: [voter registration <state>], [<state> voter registration], and [register to vote <state>]. State names were matched to the state in which the search originated.2 Combining several queries yields extra data, while facilitating comparisons with the Google Trends web site, which allows at most five queries (see Supplementary Section S.2).

The five queries were chosen for construct validity, to measure interest in registering. We confirmed that people who entered these queries were more likely to click on official sources of information on how to register (typically web sites run by the state Secretary of State) than on any other link that Google supplied in response to the query. Our measure of search volume is the daily number of times the five queries were issued in each state, which ranged into the millions. To avoid revealing proprietary information, the data were standardized by subtracting the grand mean and dividing by the standard deviation. We truncated the data by setting the lowest 5% of values to zero (with very little effect on our results).

2 Google uses proprietary methods for ascertaining the location from which searches originate. The details of such methods are beyond the scope of this article.

We focus on the 67 days leading up to the 2012 election, from 9/1 to 11/6. This period was chosen because prior research shows that many people register in the final weeks before presidential elections (Cain and McCue 1985; Gimpel, Dyck, and Shaw 2007), and also to facilitate replication with data from the Google Trends web site. In almost all states, voter registration closes for some period before the election. In 2012, the median length of time for which registration was closed was 3 weeks, and 11 states allowed EDR. Most states show two peaks in search activity: at the time of the registration deadline, and on the Monday before the election and Election Day itself. States that allowed EDR in 2012 showed a single peak in search activity, on and shortly before Election Day (see Supplementary Figs. S1–S3 for search data in all states).

We also collected voter files from 16 states, yielding the date of registration for 80 million Americans. Our sample was limited by the fact that some states prohibit research with voter files, while others charge high fees (see Supplementary Section S.3). Note that voter files contain the effective date of registration, even for applications that were mailed on the deadline but processed thereafter. Validation against other sources shows that, while they do contain some errors, voter files are accurate sources of evidence on political behavior (McDonald 2007; Ansolabehere and Hersh 2012).

Figure 1 shows search and registration timing in the 16 states for which both kinds of data were available. In each of the panels, the left axis and the black line show the daily number of registrations, in thousands. The right axis and the dashed gray line show standardized search volume. The horizontal axis shows the date, with D standing for the mail and in-person deadlines.3 States that allowed EDR saw high numbers registering on Election Day. In states that did not allow EDR, the highest number of registrations was observed on the day of the deadline.
Although some applications for voter registration were processed after the deadline (e.g., for people registering a vehicle), those who registered after the deadline were not eligible to vote in the coming election, and registration rates were thus much lower.

As Fig. 1 reveals, daily web search volume in the weeks leading up to the 2012 election was closely related to the daily number registering. Spearman’s rank correlation between search and registration during the registration period was 0.85 (n = 742, p < 0.01; see Supplementary Table S2 for details of each state). In the states in our sample that allowed EDR in 2012, at least for the presidential ticket (Alaska, Idaho, Maine, Rhode Island and Wyoming), much of the search and registration activity occurred on Election Day. In other states we also see a spike in search activity on and immediately before Election Day. This suggests that many Americans were interested in registering at the last minute, but were unable to do so.

4 Our Critical Assumption

The strong correlation between web search volume and voter registration totals, when registration was open, as well as the spikes in search activity around Election Day, suggest that web search data can be used to measure the post-deadline potential for extra registrations. To do this, we model the pre-deadline relationship between daily web search and voter registration totals, and use the resulting coefficients and the data on post-deadline searches to create counterfactual predictions of the number of people who would have registered if deadlines had been extended to Election Day. We create prediction intervals (PIs) around these estimates; these differ from confidence intervals in that they account not only for the uncertainty around the values of parameters in the model, but also for the range of outcomes that are consistent with these values.
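The distinction between the two kinds of interval can be illustrated with a minimal Python sketch. It assumes hypothetical posterior draws for an expected daily count (simulated here, not taken from the replication data) and uses a plain Poisson outcome for brevity, rather than the negative binomial likelihood introduced in Section 5.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior draws of the expected daily registration count.
# These are simulated stand-ins, not estimates from the paper.
mu_draws = rng.lognormal(mean=np.log(500.0), sigma=0.05, size=20_000)

# Interval for the expected count: reflects parameter uncertainty only.
mean_interval = np.percentile(mu_draws, [2.5, 97.5])

# Prediction interval: draw one outcome per posterior draw, so the interval
# also reflects the spread of outcomes around each expected count.
y_draws = rng.poisson(mu_draws)
prediction_interval = np.percentile(y_draws, [2.5, 97.5])

mean_width = mean_interval[1] - mean_interval[0]
pred_width = prediction_interval[1] - prediction_interval[0]
print("interval for the mean:", mean_interval)
print("prediction interval:  ", prediction_interval)
```

Under these assumed values the prediction interval is wider than the interval for the mean, because it adds outcome variation to parameter uncertainty.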
Following Angrist and Pischke (2009, 14), in observational studies one can think of differences between people affected by a policy, and those not affected, as the sum of the average treatment effect on the treated and selection bias. This bias arises when the units that select into treatment differ from those that do not. One way to remove selection bias is to conduct a randomized experiment, and use the control group to estimate counterfactual outcomes for the treated. In our case, one could compare turnout in states randomly assigned to allow or forbid EDR. But true experiments with election laws are not feasible, and the natural experiments that have been suggested in this area are questionable (Keele and Minozzi 2013). States select which election laws to apply, and comparisons across states (or even over time in a given state) risk mistaking the effects of the laws for the reasons behind choices of (or changes in) the laws. Our new measure of post-deadline interest in registering does not preclude other, unobserved differences across states with different deadlines.

3 Typically, the mail and in-person registration deadlines fall on the same day. A few states allow online registration, with the same deadline as registration by mail.

Fig. 1 Web searches for “voter registration” and observed registration numbers, September to November 2012. Black lines and left axes show daily registrations, in thousands. Dashed gray lines and right axes show standardized search volume. Horizontal axes show dates; D marks the mail and in-person registration deadlines (the same day in most states). Panels, one per state, each titled “<State> voter registration and search timing”: Alaska, Arkansas, California, Delaware, Florida, Idaho, Maine, Michigan, North Carolina, New Jersey, Nevada, New York, Ohio, Rhode Island, Washington, Wyoming.

Rather than using state-level variation in turnout to estimate the effects of registration laws, we take a different approach. Our identification strategy is to make assumptions that allow us to model the counterfactual outcomes.4

4 In this respect, our approach is similar to recent studies that create “synthetic controls” to estimate counterfactual outcomes for units affected by a given intervention. See Abadie, Diamond, and Hainmueller (2010); Brodersen et al. (2014).

Crucially, we assume that queries for “voter registration,” which were generated after the deadline, are equally strong evidence of potential for actual registrations as the same queries in the pre-deadline period.
Since we fit models on aggregate web search and registration totals on a given day, our key assumption pertains to the conditional distribution of the daily number of registrations (we allow for lower registration volume on weekends, and for deadline effects). More formally, we assume:

$$
\begin{aligned}
&P(\text{registration volume} \mid \text{search} = \mathit{srch},\ \text{weekend} = w,\ \text{deadline} = d,\ \text{before deadline}) \\
&\quad = P(\text{registration volume} \mid \text{search} = \mathit{srch},\ \text{weekend} = w,\ \text{deadline} = d,\ \text{after deadline}).
\end{aligned}
\tag{1}
$$

Assumption (1) cannot be directly tested. But we can use indirect evidence to assess whether the assumption is plausible. Post-deadline search activity might actually be stronger evidence of an intent to register. Many people participate in electoral politics because they are asked to do so, and voter mobilization is easier when an election is imminent (Rosenstone and Hansen 1993; Verba, Schlozman, and Brady 1995). Alternatively, the relationship between web search and true registration potential in the post-deadline period might be weaker. This would be especially problematic, since it would lead us to over-estimate the effect of requiring early registration.

The core assumption could be violated in two main ways. The first is if the kind of people who sought information on “voter registration” after the deadline were systematically different from those who sought the same information beforehand. For example, conscientious citizens may be more likely both to search before the deadline and to actually register (although recent research suggests that conscientiousness is a weak predictor of electoral participation; see Mondak et al. 2010; Gerber et al. 2011; Gallego and Oberski 2012). In order to ensure privacy, Google does not provide evidence based on multiple pieces of data generated by the same user. We thus lack individual-level data on user characteristics. But we can use aggregate data to test for certain patterns that would arise if our key assumption were misguided.
One possibility is that people who searched after the deadline were less interested in registering. Perhaps the late searches were generated by people seeking news on the election process, rather than by citizens interested in voting. Or perhaps the queries were generated by people who were already registered, and were trying to find their polling places. We guard against these possibilities by using data on the links most often chosen after the relevant queries had been entered. Specifically, we test whether people who searched after the deadline were any less likely to click through to the official (Secretary of State) web sites with information on how to register.5

The second main way in which assumption (1) could be violated involves contextual differences. The states in our sample were selected based on the availability of registration data. Note that the sample in which we observed both search and registration data includes one state where in-person registration did not close (Maine), two states with in-person registration deadlines only a few days before the election (North Carolina and Washington), and several states that allowed EDR. As such, we are able to use observed outcomes from the final days of the campaign, as well as the data from September and early October. The sample is diverse in terms of population size, the tendency to support Democrats or Republicans, and the competitiveness of the 2012 presidential race.

Nonetheless, the sample may not be representative of the entire country. To test our ability to predict beyond this sample, we successively hold out each state for which we have data, and re-estimate the model. We use the resulting coefficients, along with search data from the state that was held out, to “predict” voter registration numbers in the held-out state. Finally, we test whether the observed number of registered voters in the held-out state, during the period when registration was open, is within the PI.
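The hold-out procedure just described can be sketched as follows. This is only an illustration under stated assumptions: the data are simulated for four hypothetical states, the stand-in model is a plain Poisson log-linear regression on a single search covariate (not the Bayesian negative binomial models of Section 5), and the helper names `design`, `fit_poisson`, and `leave_one_state_out` are invented for this example.

```python
import numpy as np

def design(srch):
    """Design matrix with an intercept and the search measure."""
    return np.column_stack([np.ones_like(srch), srch])

def fit_poisson(X, y, n_iter=25):
    """Log-link Poisson regression fitted by Newton's method."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # start from the intercept-only fit
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        gradient = X.T @ (y - mu)
        hessian = X.T @ (X * mu[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta

def leave_one_state_out(data, rng, n_sim=4000):
    """For each state, refit without it and check its observed total."""
    results = {}
    for held_out in data:
        train = [s for s in data if s != held_out]
        X = np.vstack([design(data[s][0]) for s in train])
        y = np.concatenate([data[s][1] for s in train])
        beta = fit_poisson(X, y)
        mu = np.exp(design(data[held_out][0]) @ beta)
        # Simulated totals reflect outcome variation only; a fully Bayesian
        # model would also propagate the uncertainty in beta.
        totals = rng.poisson(mu, size=(n_sim, mu.size)).sum(axis=1)
        lo, hi = np.percentile(totals, [2.5, 97.5])
        observed = data[held_out][1].sum()
        results[held_out] = (lo, hi, observed, bool(lo <= observed <= hi))
    return results

# Simulated stand-in data: four hypothetical states, 40 open days each.
rng = np.random.default_rng(3)
data = {}
for state in ["A", "B", "C", "D"]:
    srch = rng.uniform(0.0, 2.0, size=40)
    y = rng.poisson(np.exp(3.0 + 0.8 * srch))
    data[state] = (srch, y)

results = leave_one_state_out(data, rng)
for state, (lo, hi, observed, inside) in results.items():
    print(state, lo, hi, observed, inside)
```

Because the interval here ignores parameter uncertainty, it is narrower than a full posterior predictive interval would be, so it is a stricter check than the one the text describes.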
This cross-validation exercise tests our ability to predict out of sample, and allows us to confirm that no single state is driving our results. Of course, we can only check our predictions against observed outcomes. We have no such baseline with which to compare our counterfactual, post-deadline predictions.

5 Comparing click-through rates before and after the deadline is a hard test. Some users may have seen, from the brief description that Google provides with each suggested link, that the deadline had already passed, without having to click on the link. For further discussion of state-level correlates of the search data, see Supplementary Section S.6.

Another concern is the activities of other groups, besides the people who generated the web search data. One can think of this as a “general equilibrium” issue. Political parties and civic groups, such as churches, the League of Women Voters, or the NAACP, play an important role in registering voters (Green and Gerber 2008; Herron and Smith 2012, 2013). If the deadline shifted to Election Day, we expect that third-party registration drives, which climax in the period immediately before the deadline, would shift to the later date. Our response to this issue is to include in our models not only web search volume but also indicator variables to capture deadline effects. The models include at most one indicator for each kind of deadline (mail, in-person, or online) for each state, on the day when the deadline fell in 2012. Our counterfactual post-deadline predictions are therefore based only on the raw relationship between search activity and registrations, with no additional deadline effects.
We see this as a conservative approach, since mobilization efforts on or immediately before Election Day, when media coverage and interest in the election peak, might be even more effective than similar efforts 3 or 4 weeks earlier. In addition, we compare search activity in “safe” and “battleground” states. If third parties focus their registration efforts on the more competitive states, and succeed in registering most people interested in voting in those states, then it would be inappropriate to treat post-deadline searches in less competitive states as strong signals of extra registration potential.

Besides this evidence on the credibility of our key assumption, we also conduct a general test of our ability to predict post-deadline behavior based solely on the pre-deadline relationship between search and registration timing. We use coefficients from a model of the relationship between web search and registration numbers in Iowa in 2004, when registration closed 10 days before the election, to predict registrations in the state in the final weeks of the election campaign in 2008 and 2012, when Iowa allowed EDR. We compare the predictions (and PIs) to the outcomes actually observed in 2008 and 2012. Finally, we conduct a sensitivity analysis to show how violations of our key assumption would affect our results.

5 Estimation

We model the number of people who registered as voters in each state on each day as a function of daily web search activity in that state, in the 67-day period leading up to the 2012 election. Voter registration was restricted in the period after the deadline, so we treat these observations as missing. Although we have search data from all states for the entire period, we rely on registration data from a sample of states. Thus, the outcome variable is also missing for the entire period in many states.
In order to handle these missing data, we estimate the relationship between daily search activity and daily registrations using fully Bayesian models. These models allow us to calculate posterior predictive distributions for every unobserved value, based on the observed relationships. The predictive distributions reflect the uncertainty around each parameter, given the variation in the observed data. The patterns in search and registration timing, evident in Fig. 1, support the suspicion of Brians and Grofman (2001) that the final days of the campaign are especially important. One-quarter of all search activity in the 10-week period leading up to the 2012 election was observed in the final 2 days. We therefore estimate the total number of post-deadline registrations through Election Day, rather than some subset of this period. Our counterfactual estimate of additional registrations applies only to states that did not allow EDR in 2012. Bayesian estimation proceeds in two steps (Carlin and Louis 2009). First, we posit models for the likelihood of the data. Second, we specify prior distributions for the unknown parameters in the models. Our models of the likelihood are designed to reflect various features of the data. The outcome of interest is measured as count data. To allow for over-dispersion, we use a Poisson-gamma mixture formulation of the negative binomial distribution (Zeger 1988). Formally, we model   k Ys;t js;t  Pðs;t Þ; s;t js;t ; k  G k; ; ð2Þ s;t where PðÞ denotes a Poisson distribution with mean l, Gða; bÞ denotes a gamma distribution with 232 Alex Street et al. mean a=b and variance a=b2 , and Ys;t denotes the registration count in state s ¼ 1; . . .; 50 on day Downloaded from https://www.cambridge.org/core. IP address: 168.151.154.60, on 23 Sep 2017 at 03:59:10, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1093/pan/mpv002 t ¼ 1; . . .; 67. 
Integrating over $\lambda_{s,t}$, we see that this hierarchical structure implies $E[Y_{s,t} \mid \mu_{s,t}, k] = \mu_{s,t}$ and $\mathrm{Var}[Y_{s,t} \mid \mu_{s,t}, k] = \mu_{s,t} + \mu_{s,t}^2 / k$. Thus, $\mu_{s,t}$ denotes the expected number of registrations in state $s$ on day $t$, and $k$ measures over-dispersion.

Figure 1 shows less web search activity on weekends, and more on registration deadlines. Nonetheless, it is possible that the variation in search activity does not fully reflect the impact of these temporal features of the data. To allow for this possibility, the regression components of our models contain indicators for whether day $t$ was a weekend ($w_t$), and whether in state $s$ day $t$ was the mail ($md_{s,t}$), in-person ($pd_{s,t}$), or online ($od_{s,t}$) registration deadline, plus an indicator for Election Day in states that allowed EDR ($ed_{s,t}$).

Another concern is autocorrelation. Even after accounting for search volume and the other covariates, outcomes on successive days in a given state may be correlated due to unobserved time-dependent variables. The Durbin–Watson test in a linear model yields evidence of moderate first-order autocorrelation (DW = 1.45, p < 0.01). In addition to models with the standard assumption of independent errors, we thus fit models with an autocorrelation structure.

Finally, in the absence of prior research on the functional form linking web search volume and electoral behavior, we allow for non-linear effects using a flexible spline term (Ruppert, Wand, and Carroll 2003).
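For readers unfamiliar with the Durbin–Watson statistic mentioned above, the sketch below computes it from a sequence of residuals. This is a generic textbook formula, not the authors' code; the function names and the simulated AR(1) residuals are assumptions made for illustration, and the text does not state how residuals were pooled across states.

```python
import random


def durbin_watson(residuals):
    """DW = sum_{t>=2} (e_t - e_{t-1})^2 / sum_t e_t^2.

    Values near 2 suggest no first-order autocorrelation; values below 2
    suggest positive autocorrelation (roughly DW = 2 * (1 - rho)).
    """
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def simulate_ar1(rho, n, seed=0):
    """Simulate n residuals from e_t = rho * e_{t-1} + Gaussian noise (illustrative only)."""
    rng = random.Random(seed)
    series, previous = [], 0.0
    for _ in range(n):
        previous = rho * previous + rng.gauss(0.0, 1.0)
        series.append(previous)
    return series


if __name__ == "__main__":
    # A hypothetical rho of 0.3 gives a DW statistic in the neighborhood of 1.4.
    print(round(durbin_watson(simulate_ar1(rho=0.3, n=20_000)), 2))
```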
We specify models in which the measure of web searches in state $s$ on day $t$ ($srch_{s,t}$) enters as a linear predictor (equation (3)), or is modeled using the flexible spline term (equation (4)):

$$\eta_{s,t} = \alpha_0 + \alpha_w w_t + \alpha_{md}\, md_{s,t} + \alpha_{pd}\, pd_{s,t} + \alpha_{od}\, od_{s,t} + \alpha_{ed}\, ed_{s,t} + \alpha_{srch}\, srch_{s,t} \tag{3}$$

$$\eta_{s,t} = \alpha_0 + \alpha_w w_t + \alpha_{md}\, md_{s,t} + \alpha_{pd}\, pd_{s,t} + \alpha_{od}\, od_{s,t} + \alpha_{ed}\, ed_{s,t} + f(srch_{s,t}; \beta), \tag{4}$$

where

$$f(srch_{s,t}; \beta) = \beta_1\, srch_{s,t} + \sum_{j=1}^{J} \beta_{j+1} \left| srch_{s,t} - \widetilde{srch}_j \right|^3 = z_{s,t}' \beta.$$

We model $f(srch; \beta)$ with modified low-rank thin plate splines, which have been shown to exhibit better Markov chain Monte Carlo properties than other spline formulations (Crainiceanu et al. 2005). In equation (4), $J$ is the number of knots and $\widetilde{srch}_j$, $j = 1, \ldots, J$, are the knot locations (Ruppert, Wand, and Carroll 2003). We opt to use $J = 15$ knots at equally spaced quantiles of the Google search volume observed in all 50 states over the study period (and obtain similar results with more or fewer knots).

For models with independent errors, we simply estimate $\log(\mu_{s,t}) = \eta_{s,t}$. To model autocorrelation, we assume that $\log(\mu_{s,1}) = \eta_{s,1}$, and that

$$\log(\mu_{s,t}) = \eta_{s,t} + \rho \left[ \log(\lambda_{s,t-1}) - \eta_{s,t-1} \right], \tag{5}$$

for $t = 2, \ldots, 67$; $s = 1, \ldots, 50$. The latter term in equation (5) denotes the latent residual from the previous day, and $\rho \in [-1, 1]$ measures the correlation between adjacent latent residuals (Hay and Pettitt 2001).

To complete the Bayesian model specification, we use vague priors for the parameters in the likelihood. For the parameters indicating weekends and the various deadlines ($\alpha$'s), we use vague $N(0, 10^5)$ priors, where $N(\mu, \sigma^2)$ denotes a Gaussian distribution with mean $\mu$ and variance $\sigma^2$. For the relationship between registration numbers and Google search volume ($\beta$'s), we specify the low-rank thin plate spline prior detailed in Crainiceanu et al. (2005).
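The spline term in equation (4) can be sketched in a few lines. The code below is a simplified illustration under stated assumptions: it builds the unmodified cubic radial basis shown in equation (4) and omits the transformation that defines the "modified" low-rank thin plate spline of Crainiceanu et al. (2005). The quantile convention (linear interpolation at positions j/(J+1)) and all function names are hypothetical choices, not taken from the replication code.

```python
def quantile_knots(values, n_knots):
    """Place knots at equally spaced sample quantiles j / (n_knots + 1), j = 1..n_knots.

    Uses linear interpolation between order statistics (an assumed convention).
    """
    ordered = sorted(values)
    knots = []
    for j in range(1, n_knots + 1):
        position = j / (n_knots + 1) * (len(ordered) - 1)
        lower = int(position)
        upper = min(lower + 1, len(ordered) - 1)
        weight = position - lower
        knots.append(ordered[lower] + weight * (ordered[upper] - ordered[lower]))
    return knots


def spline_design_row(srch, knots):
    """Row z = (srch, |srch - knot_1|^3, ..., |srch - knot_J|^3), so that f = z . beta."""
    return [srch] + [abs(srch - knot) ** 3 for knot in knots]


def spline_term(srch, knots, beta):
    """Evaluate f(srch; beta) = beta_1 * srch + sum_j beta_{j+1} * |srch - knot_j|^3."""
    row = spline_design_row(srch, knots)
    if len(beta) != len(row):
        raise ValueError("beta needs one linear coefficient plus one per knot")
    return sum(z * b for z, b in zip(row, beta))


if __name__ == "__main__":
    search_volume = [0.0, 1.0, 2.0, 3.0, 4.0]  # toy data, not the Google series
    knots = quantile_knots(search_volume, n_knots=3)
    print(knots)
    print(spline_term(2.0, knots, beta=[0.5, 1.0, 1.0, 1.0]))
```

Placing knots at quantiles rather than on an even grid puts more flexibility where the search data are dense, which matters here because search volume is heavily skewed toward low values on most days.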
We model the remaining parameters with vague priors as follows: $\sigma_b \sim U(0.01, 100)$, $k \sim G(0.01, 100)$ and $\rho \sim U(-0.99, 0.99)$, where $U(l, u)$ denotes a uniform distribution on $[l, u]$. To estimate the posteriors of all the parameters and the predictive posteriors for the unobserved $Y_{s,t}$'s, we use a Gibbs sampler built by JAGS (Plummer 2003). We assessed convergence empirically using potential scale reduction factors (i.e., the Gelman and Rubin [1992] diagnostic, known as “R-hat”), and visually with trace plots (Supplementary Fig. S6). We found all the parameters to have an R-hat value of less than 1.1, which is indicative of convergence, and the trace plots show good mixing. The code for these models and the convergence assessment is included with the replication data (Street et al. 2015).

6 Results

Table 1 presents results from four models, with the linear and spline functional forms, and with independent and autoregressive errors. The fit of the models is good, explaining around 75% of the deviance. We prefer the spline model with autoregressive errors, due to the evidence of autocorrelation (Supplementary Fig. S4 shows the spline estimate of the relationship between search and registration numbers). Notably, all four models yield similar estimates and fit (on the DIC and pD, see Spiegelhalter et al. 2002). Table 1 shows coefficients for the deadline indicators, which are multiplicative because the models were fit using the log link. As expected, the coefficients show that search activity on deadlines predicted considerably more registrations than on other days.
For example, a given level of search activity on Election Day, in an EDR state, was associated with around 11 times as many registrations as the same level of activity on a non-deadline weekday. To aid interpretation of the models, Table 1 presents predicted numbers of registrants at varying levels of search activity, from the first to the 99th percentile. For instance, we estimate that a non-deadline weekday at the 90th percentile of search activity saw around 10,000 new registrations in the relevant state.

We used the models to calculate the posterior predictive distributions of registrations in the period from the deadline to Election Day in each state. The total prediction from each model is reported toward the bottom of Table 1. Summing across states that did not allow EDR in 2012, our models suggest that around 3.5 million people would have registered in the post-deadline period, if this had been possible (these results are reported separately by state in Supplementary Table S3). This would have added 2 percentage points to the nationwide registration rate. High turnout among late registrants, and full turnout among those who register on Election Day, implies that 80% or more of these people would have voted, producing a 3 percentage point increase in turnout (see Supplementary Section S.4 for details on turnout among late registrants).

In order to test whether our results were driven by the specifications described above, we also fit an array of different models, including a Poisson-normal mixture, and linear models with random slopes or intercepts by state. The estimates from these models were consistently between 3 and 4.5 million total additional registrants across the country (see Supplementary Section S.5 for more on alternative specifications).
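Because the models use a log link, an indicator coefficient acts as a multiplier on the expected count: $\exp(\eta + \alpha) = \exp(\eta)\exp(\alpha)$. The short sketch below works through that arithmetic using the Spline AR(1) point estimates reported in Table 1 (a 90th-percentile weekday prediction of 9635 and a weekend multiplier of 0.24), plus the 3.5 million and 80% figures from the text. The function names are hypothetical, and the derived numbers are illustrative arithmetic on point estimates, not results reported in the paper.

```python
import math


def scale_prediction(baseline, multipliers):
    """Apply multiplicative (exponentiated log-link) coefficients to a baseline prediction."""
    result = baseline
    for multiplier in multipliers:
        result *= multiplier
    return result


def to_log_scale(multiplier):
    """Recover the additive log-scale coefficient from a reported multiplier."""
    return math.log(multiplier)


def minimum_additional_voters(additional_registrants, turnout_share):
    """Lower bound on extra voters given a minimum turnout share among late registrants."""
    return additional_registrants * turnout_share


if __name__ == "__main__":
    # Weekend day at the 90th percentile of search activity (point estimates only).
    print(scale_prediction(9635, [0.24]))
    # The Election Day multiplier of 11.41 corresponds to this log-scale coefficient.
    print(round(to_log_scale(11.41), 3))
    # 3.5 million late registrants with at least 80% turnout.
    print(minimum_additional_voters(3.5e6, 0.80))
```

Multiplying point estimates in this way ignores posterior uncertainty; the intervals in Table 1 come from the full posterior predictive distribution and cannot be reproduced by scaling the interval endpoints.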
7 Evaluating our Critical Assumption

We now report evidence on the plausibility of our assumption that web searches for “register to vote” (and similar terms) provide an equally valid measure of voter registration potential in the pre- and post-deadline periods. We begin with the possibility that this assumption is violated because different kinds of people entered such a query before versus after the deadline. Among the people who entered our five queries, the official web site with information on how to register was the most-clicked link in over 90% of the days in our sample period, and in many states was the most-clicked link on every single day. Even on Election Day, very few of the people who searched for “voter registration” appear to have had other intentions, such as checking their registration status or finding their polling place (see Supplementary Section S.2).

Nonetheless, we saw some differences after the deadline passed. In 19 states, the mean daily click-through rate to the relevant web sites was significantly lower in the post-deadline period. We found no significant difference between pre- and post-deadline click-through rates in 15 states, and found that click-through rates were actually higher, after registration closed, in a further 12 states (we used the conventional p < 0.05 threshold; Supplementary Fig. S5 shows trends in click-through rates, before and after the deadline).6 In the median state, the click-through rate after the deadline was one-third of a standard deviation lower than before the deadline. While the differences are not dramatic, they suggest that our critical assumption may not hold, and in the next section of the paper we assess the sensitivity of our findings to this possibility. At the aggregate level, we found no evidence that post-deadline searches were more common in states with less competitive elections (see Supplementary Section S.6).

6 We ran this test in 46 states. Data were missing for New Mexico and the District of Columbia. North Dakota does not require voters to register, and in Maine in-person registration does not close, precluding the pre-/post-deadline comparison.

Table 1 Coefficient estimates, predictions, and fit statistics from four models

| | Linear | Linear AR(1) | Spline | Spline AR(1) |
|---|---|---|---|---|
| Indicator variable: multiplicative coefficients (95% CI) | | | | |
| Weekend | 0.28 (0.24, 0.32) | 0.24 (0.20, 0.27) | 0.28 (0.24, 0.32) | 0.24 (0.21, 0.27) |
| Walk-in deadline | 1.92 (0.92, 3.73) | 1.82 (0.92, 3.42) | 2.04 (1.02, 3.83) | 1.99 (1.02, 3.66) |
| Mail-in deadline | 1.78 (0.76, 3.61) | 1.84 (0.84, 3.60) | 1.52 (0.70, 2.97) | 1.63 (0.76, 3.11) |
| Online deadline | 1.82 (0.41, 5.79) | 2.26 (0.52, 7.31) | 2.80 (0.65, 8.91) | 3.38 (0.78, 10.86) |
| Election Day | 13.43 (5.78, 30.11) | 13.33 (5.62, 29.71) | 11.09 (4.83, 24.70) | 11.41 (4.87, 25.17) |
| % of search volume: predicted number of registrants (95% CI) | | | | |
| 1% | 184 (161, 210) | 244 (204, 296) | 115 (96, 139) | 154 (122, 197) |
| 10% | 287 (256, 323) | 373 (318, 443) | 257 (229, 289) | 329 (280, 390) |
| 25% | 660 (603, 725) | 822 (724, 945) | 848 (739, 976) | 1004 (848, 1201) |
| 50% | 2001 (1849, 2168) | 2357 (2115, 2641) | 2709 (2335, 3233) | 2906 (2472, 3490) |
| 75% | 4794 (4390, 5250) | 5407 (4829, 6108) | 4951 (4330, 5656) | 5405 (4600, 6317) |
| 90% | 9917 (8921, 11,076) | 10,786 (9439, 12,458) | 8415 (7066, 9798) | 9635 (8031, 11,405) |
| 99% | 38,271 (32,852, 44,848) | 38,921 (32,090, 47,470) | 31,683 (25,192, 40,703) | 31,195 (23,982, 41,613) |
| Total prediction, millions | 3.89 (3.40, 4.51) | 3.67 (3.11, 4.35) | 3.66 (3.24, 4.17) | 3.49 (3.03, 4.05) |
| DIC (pD) | 7888 (769) | 7914 (780) | 7898 (777) | 7923 (790) |

The states in our sample were chosen based on the availability of registration data. To test our ability to predict beyond this sample, we successively held out each state for which we have data, and calculated new predictions. We then tested whether the observed number of registered voters in the held-out state, during the period when registration was open, was within the PI. Observed registrations were within the 90% PI in 14 of the 16 states in our sample. The exceptions were Delaware and New York, where the observed numbers were at the 0.2 and 0.7 percentile of predictions, respectively. This is higher than the expected number of extreme results, but may (partly) reflect peculiarities in the registration records.7 Figure 2 illustrates the ability of a model, fit on data that excluded Arkansas, to predict daily registration numbers in that state. The black line shows observed registrations, the dashed gray line shows search volume, and the dotted gray line with points shows predictions. To the nearest thousand, the predicted number of pre-deadline registrations was 59,000 (90% PI 35,000–96,000), and the observed number was 62,000. Overall, this cross-validation exercise suggests that our approach is moderately robust to the inclusion of some states, but not others, in the sample. Of course, this does not guarantee that our post-deadline, counterfactual predictions were equally accurate.

Finally, we conducted a general test of our ability to predict post-deadline behavior based solely on the pre-deadline relationship between search and registration timing. To do so, we used historical data from a state that recently changed its voter registration rules. Iowa allowed EDR in 2008 and 2012, but not in 2004.
We fit a model to web search and registration data in Iowa in 2004, and used the coefficients to predict from search to registration in Iowa in 2008 and 2012.8 The models were similar to those used for our counterfactual predictions, except that for the purpose of predicting the timing of registration we moved the deadline indicator to Election Day in 2008 and 2012 (we still used only one deadline indicator in the predictions for each year). Search data from Iowa in each year were normalized against all other states in the same time period, and standardized using the procedure described above. Transforming the data in this way is necessary to control for temporal trends in web search volume, and for trends in the composition of internet users, though it does not rule out the possibility that web search, or the habits of search engine users, changed in unusual ways in Iowa.

We compared the resulting predictions against the observed registration totals. The predictions were reasonably accurate. To the nearest thousand, we predicted 148,000 registrations from September to November 2008, and observed 103,000 (the observed value fell at the 3rd percentile of predictions, that is, slightly outside the 90% PI). The fact that EDR was new to Iowa in 2008 may help explain the high search volume and our over-prediction. In the same period in 2012, we predicted 100,000 registrations and observed 128,000 (the observed value was around the 84th percentile of predictions). Figures 3 and 4 display our results for the final weeks of the campaigns. In each figure, the black line (and left axis) shows observed registrations, while the dashed gray line (and right axis) shows search volume. The dotted gray line with points shows predictions, and the shaded area shows 90% PIs.
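The percentile and interval checks reported above can be sketched as follows. This is a minimal illustration, assuming that each prediction is summarized by a list of posterior predictive draws of the registration total; the function names and the toy draws are hypothetical, and the replication code may compute these quantities differently.

```python
def percentile_of(observed, draws):
    """Percent of predictive draws that fall at or below the observed value."""
    if not draws:
        raise ValueError("need at least one predictive draw")
    return 100.0 * sum(draw <= observed for draw in draws) / len(draws)


def within_interval(observed, draws, level=0.90):
    """True if the observed value lies inside the equal-tailed predictive interval."""
    tail = 100.0 * (1.0 - level) / 2.0
    return tail <= percentile_of(observed, draws) <= 100.0 - tail


if __name__ == "__main__":
    # Toy predictive draws 1..100, so the percentile of x is simply x.
    draws = list(range(1, 101))
    for observed in (3, 84):
        print(observed, percentile_of(observed, draws), within_interval(observed, draws))
```

In the toy example an observation at the 3rd percentile falls outside the 90% interval while one at the 84th percentile falls inside, mirroring the pattern described for Iowa in 2008 and 2012.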
Of all our cross-validation exercises, the Iowa data provide the best evidence on our approach of modeling post-deadline behavior based on pre-deadline data, since we are able to compare the predictions to actual outcomes. One limitation of the analysis, of course, is that we only have data from one state that recently changed its registration deadline (data from Montana, which also recently introduced EDR, were not available), and we cannot rule out the possibility that Iowa is atypical.

7 In Delaware, registration was closed in early September, and numbers jumped when it re-opened. Delaware was atypical in that the state saw no spike in registrations on the day of the deadline. In New York, about 70,000 voters were recorded as registering by mail in the week after the deadline, but also as having voted on November 6, 2012, suggesting that they did in fact register by the deadline. New York was the only state in our sample to show such large numbers of people as having registered in the week after the deadline. Officials in New York and other states assured us that the dates in voter registration files refer to the date of submission rather than the date of processing. Nonetheless, it is possible that errors occurred. Excluding New York and Delaware made little difference to our overall results.

8 Registration totals were provided by Iowa Secretary of State officials; see Supplementary Section S.7.

Fig. 2 Predicting voter registration from search activity in Arkansas, using a model fit on data from other states.
The black line and left axis show daily registrations, in thousands, the dashed gray line and right axis show search volume, and the dotted gray line with points shows predicted registrations. D shows the mail and in-person registration deadline, and E marks the date of the election.

Fig. 3 Predicting voter registration in Iowa in 2008, using the relationship between search and registration estimated in 2004. Shaded gray areas show 90% PIs.

Fig. 4 Predicting voter registration in Iowa in 2012, using the relationship between search and registration estimated in 2004. Shaded gray areas show 90% PIs.

8 Sensitivity Analysis

We now assess how violations of our critical assumption would affect our results. We report how different the relationship between web search and voter registration would have to be, in the post-deadline period, in order to produce substantively different outcomes. To do this, we allow for a post-deadline main effect, and calculate which values of this effect would be needed to yield predictions ranging from 1 million to 6 million additional registrants. We assume the same functional form as in our core results and again use the log link, so that the model can be summarized as $E[Y \mid \text{search volume} = srch, \text{after deadline}] = \exp\{f(srch)\}$. However, we now allow for the alternative that $E[Y \mid \text{search volume} = srch, \text{after deadline}] = \exp\{\alpha_{post} + f(srch)\}$, where $\alpha_{post}$ denotes the post-deadline main effect. We cannot estimate $\alpha_{post}$ because we do not observe unrestricted registration activity after the deadline, but we can calculate how a range of values of $\alpha_{post}$ affect our predictions. Table 2 shows the results; we take the exponent of $\alpha_{post}$ in order to report values on a linear scale.

Table 2 Sensitivity of the predicted number of additional registrants to hypothetical post-deadline effects

| Expected number of additional registrants (million) | $\exp(\alpha_{post})$ |
|---|---|
| 1 | 0.286 |
| 2 | 0.571 |
| 3 | 0.857 |
| 3.5 | 1 |
| 4 | 1.143 |
| 5 | 1.429 |
| 6 | 1.714 |

As Table 2 shows, in order for our estimate to fall from 3.5 million to 1 million, post-deadline search activity would have to be indicative of around 70% fewer registrations, compared with searches for the same terms in the pre-deadline period. If the relationship were only around half as strong, we would still predict an additional 2 million late registrants. In contrast, if searches on and immediately before Election Day indicated higher registration potential, for example, 40% higher, our prediction would rise to 5 million people, enough to add 4% to the electorate (equal to Obama’s margin of victory over Romney in the popular vote). These calculations do not account for all possibilities. The relationship between search and registration might change in more complex ways, modifying f(srch). But this simple exercise conveys some implications and limitations of our approach.

9 Discussion and Conclusions

Web search data provide insights into citizens’ interests and intentions. In this article we measured web queries about voter registration, as Election Day approached, in order to estimate pre- and post-deadline interest in registering. In 2012, millions of Americans searched online for information on registering to vote, after the deadline in their state had already passed. Our results suggest that extending registration deadlines to Election Day would have allowed 3–4 million more Americans to register to vote.

Given the limits of any single study, our results should be interpreted in light of other research. In this spirit, it is striking that our estimate of a 3% increase in turnout is similar to findings from over-time comparisons in districts that introduced EDR (e.g., Ansolabehere and Konisky 2006; Knee and Green 2011; Neiheisel and Burden 2012). Without randomization to lend credence to the claim that some units can be used to estimate counterfactual outcomes for others, all identification strategies are fragile. But our approach of modeling the counterfactual relies on different assumptions than prior research on the effects of voter registration rules. The fact that we obtain similar results with different methods should add to our confidence that extending registration to Election Day would allow significantly more people to vote, albeit not as many as some advocates of easier registration hope (e.g., Piven and Cloward 2000).

We have gone beyond prior research by showing that much of the late interest in registering is concentrated at the very end of the campaign period. Across the country, 26% of the post-deadline search activity occurred in the final 2 days of the 2012 campaign. This may help explain the limited impact of the National Voter Registration Act (NVRA) of 1993, which has puzzled scholars (Highton 2004; Berinsky 2005). The NVRA was expected to increase turnout by mandating motor-voter, public agency, and mail registration, and by regulating how states purge voter files. However, turnout in subsequent presidential elections actually fell.
Extending deadlines to Election Day is among the few steps that would allow last-minute interest to feed through to electoral participation. Predictions based on web search data have recently come in for some criticism. Lazer et al. (2014) raise two concerns. The first is “big data hubris [. . .] the often implicit assumption that big data are a substitute for, rather than a supplement to, traditional data collection and analysis.” Our article draws on large sources of data: Google search logs and voter registration files. However, compiling the data did not require heavy computation, and the resulting data set is small (50 states over 67 days). Our Bayesian models can be fit on ordinary laptops. Nor do we face the common problem with big data that the number of predictors greatly exceeds the number of outcome measures (p >> n), which can lead to over-fitting. We avoided this problem by selecting a small number of relevant queries for construct validity. Hence, our research may not even qualify as “big data.” More importantly, we are clear that our approach should complement rather than substitute for other methods. Lazer et al. (2014) also argue that predictions based on web search data are unstable, because engineers frequently update search engine algorithms. While valid in some contexts, this critique is not relevant here, since our models and predictions apply only to a short time period, in which no major changes to the search algorithm were made.9 More generally, while Lazer et al. are correct to observe that much existing research uses web search data to predict future outcomes, this is not the only way in which scholars can use these data. We have illustrated a new application, modeling counterfactual outcomes. We now discuss some promising paths for future research. An obvious extension of our work would be to use web search data to study other aspects of election administration. 
The effects of electoral rules have acquired new relevance in the wake of the Supreme Court decision in Shelby County v. Holder (2013), which invalidated a key section of the 1965 Voting Rights Act, allowing many more jurisdictions to amend electoral rules without federal oversight. As Kousser and Mullin (2007) observe, the United States is cross-cut by boundaries for elections at the local, state, and federal levels. Besides variation in registration deadlines, U.S. elections differ in myriad ways, such as the presence or absence of voter ID laws (Bentele and O’Brien 2013), the availability of mail ballots, sample ballots, and other information (Wolfinger, Highton, and Mullin 2005; Kousser and Mullin 2007), or the accessibility of polling places (Brady and McNulty 2011). The onus is on the voter to find out about the details, and searching online is now the obvious way to do so. Jurisdictional or district boundaries can be used to study the effects of different electoral rules, under certain conditions (Keele and Titiunik 2015). But this is only possible if data are available across the border. One great advantage of web search data is granularity.

9 For the 2014 general election, Google has started displaying boxes with links to information on local registration rules, in response to queries such as “register to vote.” These may change click-through behavior, meaning that a replication of our approach for the 2014 election would require slightly different methods.
We expect that search data will become available at the local level and, given the rise of mobile devices (over half of web search volume now comes from such devices), that it will be possible to study search behavior by specific location and time. Evidence on information-seeking behavior could be compared with registration or voting patterns, and potentially to demographic information (e.g., from voter files or the census). This could reveal the effects of election laws, or the effects of the varying implementation of those laws on certain sectors of the electorate (Atkeson et al. 2010). It may be possible to build on our methods in this paper by using additional data on search engine users, such as information on other searches generated by the same person, to find out which kind of people exhibit the most similar behavior on either side of administrative deadlines. This information could be used to refine models of counterfactual outcomes, using the best set of synthetic controls. Concerns over privacy will need careful attention, and models for collaboration between scholars and industry may need to be improved in order to make the data available (King 2011), but we foresee many opportunities for micro-level research on information and elections.

Future research could also seek to explain information-seeking behavior, taking web search data as the outcome variable. Indeed, while political scientists have produced rich literatures on (typically low) levels of political knowledge among the public (e.g., Lippmann 1922; Converse 1964), and on how citizens process political information (e.g., Sniderman, Brody, and Tetlock 1991; Lupia and McCubbins 1998), much less is known about the ways in which members of the public go about acquiring the limited political information that they do obtain.
As Lau and Redlawsk (2006, 3) argue in the context of voting, “Most of our existing models of the vote choice are relatively static, based in a very real sense on cross-sectional survey data, taking what little (typically) voters know about the candidates at the time of the survey as a given with almost no thought to how they went about obtaining that information in the first place.” The same point could be made for our understanding of political information more broadly.

Research has been limited, in part, by a lack of tools for measuring how people go about collecting information. Scholars have made progress by studying temporal dynamics in information-seeking, or how emotions motivate inquiry. In so doing, they have relied on surveys or laboratory experiments (Marcus, Neuman, and MacKuen 2000; Valentino, Hutchings, and Williams 2004; Lau and Redlawsk 2006). These research methods have advantages: surveys and the laboratory provide controlled environments and allow the collection of data on individual attributes. But these are not the natural contexts in which members of the public seek political information. Data on web search activity provide a new source of leverage, at a time when the ubiquity of the internet is reducing the cost of acquiring information. Web search data are available from commercial search engines in large volumes, for a range of geographical units and time scales. These new measures will allow scientists to address old questions in new ways, and also promise to open new areas of study.

References

Abadie, Alberto, Alexis Diamond, and Jens Hainmueller. 2010. Synthetic control methods for comparative case studies: Estimating the effect of California’s tobacco control program. Journal of the American Statistical Association 105(490):493–505.
Angrist, Joshua, and Jörn-Steffen Pischke. 2009. Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton, NJ: Princeton University Press.
Ansolabehere, Stephen, and David M. Konisky. 2006. The introduction of voter registration and its effect on turnout. Political Analysis 14(1):83–100.
Ansolabehere, Stephen, and Eitan Hersh. 2012. Validation: What big data reveal about survey misreporting and the real electorate. Political Analysis 20(4):437–59.
Atkeson, Lonna R., Lisa A. Bryant, Thad E. Hall, Kyle Saunders, and Michael Alvarez. 2010. A new barrier to participation: Heterogeneous application of voter identification policies. Electoral Studies 29(1):66–73.
Bentele, Keith G., and Erin E. O’Brien. 2013. Jim Crow 2.0? Why states consider and adopt restrictive voter access policies. Perspectives on Politics 11(4):1088–1116.
Berinsky, Adam J. 2005. The perverse consequences of electoral reform in the United States. American Politics Research 33(4):471–91.
Brady, Henry E., and John E. McNulty. 2011. Turning out to vote: The costs of finding and getting to the polling place. American Political Science Review 105(1):115–34.
Brians, Craig L., and Bernard Grofman. 2001. Election day registration’s effect on US voter turnout. Social Science Quarterly 82(1):170–83.
Brodersen, Kay H., Fabian Gallusser, Jim Koehler, Nicolas Remy, and Steven L. Scott. Forthcoming. Inferring causal impact using Bayesian structural time-series models. Annals of Applied Statistics.
Burden, Barry C., David T. Canon, Kenneth R. Mayer, and Donald P. Moynihan. 2014. Election laws, mobilization, and turnout: The unanticipated consequences of election reform. American Journal of Political Science 58(1):95–109.
Cain, Bruce E., and Ken McCue. 1985. The efficacy of registration drives. Journal of Politics 47(4):1221–1230.
Carlin, Bradley P., and Thomas A. Louis. 2009. Bayesian methods for data analysis. Boca Raton, FL: CRC Press.
Choi, Hyunyoung, and Hal Varian. 2012. Predicting the present with Google trends. Economic Record 88(1):2–9.
Converse, Philip E. 1964. The nature of belief systems in mass publics. In Ideology and discontent, ed. David E. Apter, 206–61. New York: Free Press of Glencoe.
Crainiceanu, Ciprian, David Ruppert, Gerda Claeskens, and Matthew P. Wand. 2005. Exact likelihood ratio tests for penalised splines. Biometrika 92(1):91–103.
Fitzgerald, Mary. 2005. Greater convenience but not greater turnout: The impact of alternative voting methods on electoral participation in the United States. American Politics Research 33(6):842–67.
Gallego, Aina, and Daniel Oberski. 2012. Personality and political participation: The mediation hypothesis. Political Behavior 34(3):425–51.
Gelman, Andrew, and Donald B. Rubin. 1992. Inference from iterative simulation using multiple sequences. Statistical Science 7(4):457–511.
Gerber, Alan S., Gregory A. Huber, David Doherty, Conor M. Dowling, Connor Raso, and Shang E. Ha. 2011. Personality traits and participation in political processes. Journal of Politics 73(03):692–706.
Gimpel, James G., Joshua J. Dyck, and Daron R. Shaw. 2007. Election-year stimuli and the timing of voter registration. Party Politics 13(3):351–74.
Ginsberg, Jeremy, Matthew H. Mohebbi, Rajan S. Patel, Lynnette Brammer, Mark S. Smolinski, and Larry Brilliant. 2008. Detecting influenza epidemics using search engine query data. Nature 457(7232):1012–1014.
Goel, Sharad, Jake M. Hofman, Sébastien Lahaie, David M. Pennock, and Duncan J. Watts. 2010. Predicting consumer behavior with web search. Proceedings of the National Academy of Sciences 107(41):17486–17490.
Green, Donald P., and Alan S. Gerber. 2008. Get out the vote: How to increase voter turnout. Washington, DC: Brookings Institution Press.
Hanmer, Michael J. 2009. Discount voting: Voter registration reforms and their effects. New York: Cambridge University Press.
Hay, John L., and Anthony N. Pettitt. 2001. Bayesian analysis of a time series of counts with covariates: an application to the control of an infectious disease. Biostatistics 2(4):433–44.
Herron, Michael C., and Daniel A. Smith. 2012. Souls to the polls: Early voting in Florida in the shadow of House Bill 1355. Election Law Journal 11(3):331–47.
Herron, Michael C., and Daniel A. Smith. 2013. The effects of House Bill 1355 on voter registration in Florida. State Politics & Policy Quarterly 13(2):279–305.
Highton, Benjamin. 2004. Voter registration and turnout in the United States. Perspectives on Politics 2(3):507–15.
Jackman, Robert W. 1987. Political institutions and voter turnout in the industrial democracies. American Political Science Review 81(2):405–23.
Keele, Luke, and Rocío Titiunik. 2015. Geographic boundaries as regression discontinuities. Political Analysis 23(1):127–155.
Keele, Luke, and William Minozzi. 2013. How much is Minnesota like Wisconsin? Assumptions and counterfactuals in causal inference with observational data. Political Analysis 21(2):193–216.
Key, Valdimer O. 1949. Southern politics in state and nation. Knoxville: University of Tennessee Press.
Keyssar, Alexander. 2009. The right to vote: The contested history of democracy in the United States (Rev. Ed.). New York: Basic Books.
King, Gary. 2011. Ensuring the data-rich future of the social sciences. Science 331(6018):719–21.
Knack, Stephen. 2001. Election-day registration: The second wave. American Politics Research 29(1):65–78.
Knee, Matthew R., and Donald P. Green. 2011. The effects of registration laws on voter turnout: An updated assessment. In Facing the challenge of democracy: Explorations in the analysis of public opinion and political participation, eds. Paul M. Sniderman and Benjamin Highton, 312–28. Princeton, NJ: Princeton University Press.
Kousser, J. Morgan. 1974. The shaping of southern politics: Suffrage restriction and the establishment of the one-party South, 1880–1910. New Haven, CT: Yale University Press.
Kousser, J. Morgan. 1999. Colorblind injustice: Minority voting rights and the undoing of the second reconstruction. Chapel Hill: University of North Carolina Press.
Kousser, Thad, and Megan Mullin. 2007. Does voting by mail increase participation? Using matching to analyze a natural experiment. Political Analysis 15(4):428–45.
Lau, Richard R., and David P. Redlawsk. 2006. How voters decide: Information processing in election campaigns. New York: Cambridge University Press.
Lazer, David, Ryan Kennedy, Gary King, and Alessandro Vespignani. 2014. The parable of Google flu: Traps in big data analysis. Science 343(6176):1203–1205.
Lippmann, Walter. 1922. Public opinion. New York: Harcourt Brace.
Lupia, Arthur, and Mathew D. McCubbins. 1998. The democratic dilemma: Can citizens learn what they need to know? New York: Cambridge University Press.
Marcus, George E., W. Russell Neuman, and Michael MacKuen. 2000. Affective intelligence and political judgment. Chicago: University of Chicago Press.
McDonald, Michael P. 2007. The true electorate: A cross-validation of voter registration files and election survey demographics. Public Opinion Quarterly 71(4):588–602.
Mondak, Jeffery J., Matthew V. Hibbing, Damarys Canache, Mitchell A. Seligson, and Mary R. Anderson. 2010. Personality and civic engagement: An integrative framework for the study of trait effects on political behavior. American Political Science Review 104(01):85–110.
Nagler, Jonathan. 1991. The effect of registration laws and education on US voter turnout. American Political Science Review 85(4):1393–1405.
Neiheisel, Jacob R., and Barry C. Burden. 2012. The impact of election day registration on voter turnout and election outcomes.
American Politics Research 40(4):636–664. Piven, Frances F., and Richard A. Cloward. 2000. Why Americans still don’t vote: And why politicians want it that way. Boston: Beacon Press. Plummer, Martyn. 2003. JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC 2003), Vienna, Austria, March, 20–22. Powell, G. Bingham, Jr. 1986. American voter turnout in comparative perspective. American Political Science Review 80(1):17–43. Rosenstone, Steven J, and Raymond E. Wolfinger. 1978. The effect of registration laws on voter turnout. American Political Science Review 72(1):22–45. Rosenstone, Steven, and John M. Hansen. 1993. Mobilization, participation and democracy in America. New York: MacMillan Publishing. Ruppert, D., M. Wand, and R. Carroll. 2003. Semiparametric regression. New York: Cambridge University Press. Sniderman, Paul M., Richard A. Brody, and Philip E. Tetlock. 1991. Reasoning and choice: Explorations in social psychology. New York: Cambridge University Press. Spiegelhalter, Richard, Nicola G. Best, Bradley P. Carlin, and Angelika Van Der Linde. 2002. Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B 64(4):583–639. Street, Alex, Thomas A. Murray, John Blitzer, and Rajan S. Patel. 2015. Replication data for: Estimating voter regis- tration deadline effects with web search data. http://dx.doi.org/10.7910/DVN/28575. Teixeira, Ruy A. 1992. The disappearing American voter. Washington, DC: Brookings Institution Press. Tokaji, Daniel P. 2008. Voter registration and election reform. William & Mary Bill of Rights 17(2):1–56. Valentino, Nicholas A., Vincent L. Hutchings, and Dmitri Williams. 2004. The impact of political advertising on know- ledge, internet information seeking, and candidate preference. Journal of Communication 54(2):337–54. Varian, Hal R. 2014. Big data: New tricks for econometrics. 
Journal of Economic Perspectives 28(2):3–28. Verba, Sidney, Kay Lehman Schlozman, and Henry E. Brady. 1995. Voice and equality: Civic voluntarism in American politics. Cambridge, MA: Harvard University Press. Wolfinger, Raymond E., Benjamin Highton, and Megan Mullin. 2005. How postregistration laws affect the turnout of citizens registered to vote. State Politics & Policy Quarterly 5(1):1–23. Wolfinger, Raymond E., and Steven J. Rosenstone. 1980. Who votes? New Haven, CT: Yale University Press. Zeger, Scott L. 1988. A regression model for time series of counts. Biometrika 75(4):621–29.
