Geoffrey Borman

Papers by Geoffrey Borman

An Effort to Close Achievement Gaps at Scale Through Self-Affirmation

Educational Evaluation and Policy Analysis, Mar 1, 2016

Several recent high-profile but small-scale experimental studies have shown that relatively simple, brief, but well-conceptualized social-psychological interventions, which focus on individual and socially constructed beliefs that affect school outcomes, can have important impacts on secondary and postsecondary students' short- and longer-term educational outcomes (Cohen,

New evidence on self-affirmation effects and theorized sources of heterogeneity from large-scale replications

Journal of Educational Psychology, Apr 1, 2017

Brief, targeted self-affirmation writing exercises have recently been offered as a way to reduce racial achievement gaps, but evidence about their effects in educational settings is mixed, leaving ambiguity about the likely benefits of these strategies if implemented broadly. A key limitation in interpreting these mixed results is that they come from studies conducted by different research teams with different procedures in different settings; it is therefore impossible to isolate whether different effects are the result of theorized heterogeneity, unidentified moderators, or idiosyncratic features of the different studies. We addressed this limitation by conducting a well-powered replication of self-affirmation in a setting where a previous large-scale field experiment demonstrated significant positive impacts, using the same procedures. We found no evidence of effects in this replication study and estimates were precise enough to reject benefits larger than an effect size of 0.10. These null effects were significantly different from persistent benefits in the prior study in the same setting, and extensive testing revealed that currently theorized moderators of self-affirmation effects could not explain the difference. These results highlight the potential fragility of self-affirmation in educational settings when implemented widely and the need for new theory, measures, and evidence about the necessary conditions for self-affirmation success.
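
The statement that estimates were "precise enough to reject benefits larger than an effect size of 0.10" can be read as saying that the upper bound of a confidence interval fell below 0.10. The sketch below illustrates that reading only; it assumes a normal approximation, and the function names and numbers are hypothetical, not taken from the study.

```python
from statistics import NormalDist


def upper_confidence_bound(estimate, std_error, level=0.95):
    """Upper end of a two-sided normal-approximation confidence interval."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate + z_crit * std_error


def rules_out_effects_above(threshold, estimate, std_error, level=0.95):
    """True when the interval's upper bound lies below the threshold."""
    return upper_confidence_bound(estimate, std_error, level) < threshold


# Hypothetical values: an estimate of 0.01 SD with a standard error of 0.04 SD.
precise_enough = rules_out_effects_above(0.10, estimate=0.01, std_error=0.04)
```

With a larger standard error the same point estimate would no longer exclude effects of 0.10, which is why the precision of a null result matters as much as the estimate itself.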

Site Selection in Experiments: A Follow-Up Evaluation of Site Recruitment in Two Scale-Up Studies

Society for Research on Educational Effectiveness, 2015

Randomized experiments are commonly used to evaluate whether particular interventions improve student achievement. In these evaluations, the goal is typically to estimate a single, average treatment impact, and ideally, the results of the evaluation can be used by schools, districts, and other governing bodies to make policy decisions (e.g., via the What Works Clearinghouse). While random assignment to the treatment conditions ensures that the treatment in fact causes these changes, typically the schools or districts that take part in the experiment are not randomly selected from a well-defined inference population. If an intervention is more or less effective in some schools or districts than others, however, this convenience sampling strategy results in a causal effect that does not readily generalize. Recently, attention has turned to developing new methodologies for improving generalizations from large-scale experiments (see Schochet, Puma, & Deke, 2014). There have been three streams of research in this area. The first has focused on assessing the degree of similarity between the convenience sample of schools or districts in a completed experiment and an inference population (e.g., Stuart, Cole, Bradshaw, & Leaf, 2011; Olsen, Orr, Bell, & Stuart, 2013; Tipton, in press). The second area focuses on reweighting this convenience sample to be more similar to one or more well-defined inference populations (e.g., O'Muircheartaigh & Hedges, 2014; Tipton, 2013). This work shows that there can often be a large penalty to delaying discussions of generalization until after the evaluation is complete: increased standard errors and, often, limits to bias reduction. In reaction to these limitations, the third area shifts focus from improvements through statistical adjustments to improvements through design and improved recruitment strategies (e.g., Tipton et al., 2014; Tipton, 2014; Roschelle et al., 2014). Tipton et al. (2014) provide a purposive sampling alternative to the convenience sampling most commonly found in the field. This design-based approach uses propensity score methodology to first compare an inference population to those eligible for recruitment in the experiment, and then creates strata for site selection.* The goal is to help recruiters create a recruitment strategy that is targeted and that, when perfectly implemented, results in a sample of sites that is like a miniature of the inference population of interest. When not perfectly implemented (which is seen as likely), the goal is to reduce or eliminate the under-coverage problems that limit the effectiveness of post-hoc statistical adjustments. The paper situates this more general method in relation to two scale-up studies conducted by SEDL and the University of Wisconsin, Madison: one of Open Court Reading and the other of Everyday Math. These studies began recruitment in the fall of 2011, with the first round of experimental results available in the spring of 2014. This paper is a follow-up study to the examples proposed and carried out in Tipton et al. (2014), with the goal of evaluating the success of these methods in practice, as well as addressing additional problems that arose in recruitment. The three aims of this work are: 1) Comparing sites actually included in the final study sample to those sites that were proposed in the original

* Note that this work assumes that not all units in the population are eligible to be in the experiment. When this is not the case, Tipton (2014) provides an alternative stratification method using cluster analysis.

The District-Wide Effectiveness of the Achieve3000 Program: A Quasi-Experimental Study

Journal of education for students placed at risk, May 30, 2023

Title I: The Evolution and Effectiveness of Compensatory Education

The ... Yearbook of the National Society for the Study of Education, Apr 4, 2005

Experiments for Educational Evaluation and Improvement

Peabody Journal of Education, Oct 1, 2002

To help develop and improve programs and practices in U.S. schools and classrooms, current national policies strongly encourage more widespread application of rigorous research methods for evaluating what works. Although randomized experiments have been accepted and applied as the gold standard for testing and developing innovations in other fields, most notably medicine, their application to questions in education has been

Self-Affirmation Effects Are Produced by School Context, Student Engagement With the Intervention, and Time: Lessons From a District-Wide Implementation

Psychological Science, Sep 5, 2018

Over the past two decades, a new class of social psychological interventions, such as mind-set (B...

A Multistate District-Level Cluster Randomized Trial of the Impact of Data-Driven Reform on Reading and Mathematics Achievement

Educational Evaluation and Policy Analysis, Sep 1, 2011

Analyzing mathematics and reading achievement outcomes from a district-level random assignment study fielded in over 500 schools within 59 school districts and seven states, the authors estimate the 1-year impacts of a data-driven reform initiative implemented by the Johns Hopkins Center for Data-Driven Reform in Education (CDDRE). CDDRE consultants work with districts to implement quarterly student benchmark assessments and provide district and school leaders with extensive training on interpreting and using the data to guide reform. Relative to a control condition, in which districts operated as usual without CDDRE services, the data-driven reform initiative caused statistically significant districtwide improvements in student mathematics achievement. The CDDRE intervention also had a positive effect on reading achievement, but the estimates fell short of conventional levels of statistical significance.

Promoting School Belongingness and Academic Performance: A Multisite Effectiveness Trial of a Scalable Student Mindset Intervention

The Use of Randomized Trials to Inform Education Policy

Routledge eBooks, Feb 17, 2015

A Replicable Identity-Based Intervention Reduces the Black-White Suspension Gap at Scale

Proceedings of the 2022 AERA Annual Meeting

Reducing Achievement Gaps at Scale Through a Brief Self-Affirmation Intervention

We propose to conduct a multisite randomized controlled trial (RCT) in a national sample of at least 6 districts to investigate the effectiveness of a virtually costless, scalable self-affirmation intervention designed to reduce stereotype threat and promote academic performance. The study design is a multisite student-level RCT implemented with two separate cohorts of seventh-grade students (2019‒20 and 2020‒21). Each year, we administer the first exercise at the beginning of school; three follow-up exercises occur just prior to a consequential evaluative activity. Power calculations assume 48 schools and 4,800 students, with a minimum detectable effect size of d = 0.10 for each cohort. Prior to randomization, during the first week of school, all seventh-grade students will complete a survey, with follow-up surveys shortly after the final administration of the intervention and at the end of eighth grade.

Study of Teacher Development and Student Achievement in Elementary Science.” The primary investigators are

A power analysis determines the ability to discern a statistically significant difference in outcome, if one exists, between the treatment and control groups of an experiment. Conventionally, researchers assume that the power should be at least 0.80 and design their experiment to ensure that they will meet this threshold. This analysis examines how a randomized controlled trial (RCT) in the Los Angeles Unified School District made certain power assumptions in its original grant application, and how these power assumptions needed to be modified based on the actual data. Source: Data come from a randomized controlled trial testing the effect of teacher development and student achievement in elementary science in
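
As a rough illustration of the power calculation described above, the sketch below approximates the power of a two-sided comparison of two equally sized groups using a normal approximation. It is a simplified textbook case: it ignores the clustering of students within classrooms and schools that a real school-based trial would need to account for, and the function name and inputs are hypothetical rather than drawn from the Los Angeles study.

```python
from statistics import NormalDist


def two_sample_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test for a standardized mean difference.

    Assumes two independent groups of equal size, outcomes scaled to unit
    variance, and no clustering (all simplifying assumptions).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Standard error of the standardized difference between two group means.
    std_error = (2 / n_per_group) ** 0.5
    shift = effect_size / std_error
    # Probability that the test statistic falls in either rejection region.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


# Hypothetical design: a true effect of 0.5 SD with 64 students per group.
power = two_sample_power(0.5, 64)
meets_convention = power >= 0.80
```

If the effect size or sample size assumed in a grant application turns out to be optimistic, rerunning a calculation like this with the observed values shows how far the realized power falls from the 0.80 convention.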

Author(s)

Title: Designing a sample selection plan to improve generalizations from two scale-up experiments

A Quasi-Experimental Study of the Impacts of the Kids Read Now Summer Reading Program

Journal of Education for Students Placed at Risk (JESPAR), 2020

Drawing on administrative data and reading achievement data provided by two Midwestern school districts for three schools, we analyze the literacy impacts of a replicable summer reading program, Kids Read Now. The program includes both school-based and home-based components that together encourage students to remain engaged in reading high-quality books over the summer months. We apply propensity score matching methods to match participating Kids Read Now students with similar comparison students. Our results suggest that Kids Read Now participants outperformed comparison group students, with a mean effect size of d = .12. Additional model estimates of the impacts for those students who read more of the books provided by Kids Read Now revealed that those who received all 9 books realized an effect size of d = .18 relative to the outcomes for matched comparison students. We discuss how these results might be considered in light of prior findings on summer learning.
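
To make the matching step concrete, here is a minimal sketch of one common variant: greedy one-to-one nearest-neighbor matching on a propensity score, without replacement and within a caliper, followed by a standardized mean difference on the matched pairs. The abstract does not specify the matching algorithm, so this is an assumption for illustration, not the authors' procedure. The propensity scores are taken as already estimated, and every name and data value is hypothetical.

```python
from statistics import mean, stdev


def greedy_match(treated, comparison, caliper=0.05):
    """Pair each treated unit with its nearest unused comparison unit.

    `treated` and `comparison` map unit ids to propensity scores.
    Treated units with no comparison unit within the caliper stay unmatched.
    """
    available = dict(comparison)
    pairs = []
    for t_id, t_score in sorted(treated.items(), key=lambda item: item[1]):
        if not available:
            break
        c_id = min(available, key=lambda cid: abs(available[cid] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # match without replacement
    return pairs


def effect_size(pairs, outcomes):
    """Standardized mean difference (d) across the matched pairs."""
    t = [outcomes[t_id] for t_id, _ in pairs]
    c = [outcomes[c_id] for _, c_id in pairs]
    pooled_sd = ((stdev(t) ** 2 + stdev(c) ** 2) / 2) ** 0.5
    return (mean(t) - mean(c)) / pooled_sd


# Hypothetical propensity scores and reading outcomes.
treated = {"t1": 0.30, "t2": 0.60, "t3": 0.95}
comparison = {"c1": 0.31, "c2": 0.58, "c3": 0.10, "c4": 0.70}
outcomes = {"t1": 12, "t2": 16, "c1": 10, "c2": 14}

pairs = greedy_match(treated, comparison)
d = effect_size(pairs, outcomes)
```

In this toy data the third treated unit has no comparison unit within the caliper and is dropped, which is the trade-off a caliper makes: closer matches at the cost of a smaller matched sample.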

A Replicable Identity-Based Intervention Reduces the Black-White Suspension Gap at Scale

American Educational Research Journal, 2021

Nationally, educators suspend Black students at greater rates than any other group. This disproportionality is fueled by stereotypes casting Black students as “troublemakers,” a label students too often internalize as part of their identities. Across two independent double-blind randomized field trials involving over 2,000 seventh graders in 11 middle schools, we tested the efficacy of a brief intervention to buffer students from stereotypes and mitigate the racial suspension gap. The self-affirmation intervention helps students access positive aspects of their identities less associated with troublemaking in school. Confirmed in both trials, treatment effects cut Black-White suspension and office disciplinary referral gaps during seventh and eighth grade by approximately two thirds, with even greater impacts for Black students with prior infractions.

Site Selection in Experiments: An Assessment of Site Recruitment and Generalizability in Two Scale-up Studies

Journal of Research on Educational Effectiveness, 2016

Recently, statisticians have begun developing methods to improve the generalizability of results from large-scale experiments in education. This work has included the development of methods for improved site selection when random sampling is infeasible, including the use of stratification and targeted recruitment strategies. This article provides the next step in this literature: a template for assessing generalizability after a study is completed. In this template, records from the recruitment process are first analyzed, comparing differences between those who agreed to be in the study and those who did not. Second, the final sample is compared to the original inference population and different possible subsets, with the goal of determining where the results best generalize (and where they do not). Throughout, these methods are situated in the post hoc analysis of results from two scale-up studies. The article ends with a discussion of the use of these methods more generally when reporting results from randomized trials.

The University of Wisconsin-Madison School of Education ELPA 940-003, Evidence-Based Education Policy Fall 2005 Educational Sciences Building, Rm. 1161, Thursday 4:40 - 7:10 Instructor: Secretary
