Terry Ackerman

The University of Iowa, Department of Psychological and Quantitative Foundations, Faculty Member

Papers by Terry Ackerman

An Analysis of CAAP Essay and Multiple-Choice Writing Tests

This study examines differences and similarities in the information provided by direct and indirect measures of writing from the Collegiate Assessment of Academic Proficiency (CAAP). The indirect measure was a 72-item multiple-choice test, while the direct measure involved responding to two essay prompts. The 40-minute multiple-choice test can be subdivided into six skill areas: (1) punctuation; (2) grammar and usage; (3) sentence structure; (4) style; (5) strategy; and (6) organization. Item responses and essay ratings were calibrated together using a graded response model from item response theory. Results suggest that while the essays are measuring a different component of writing ability than the multiple-choice test, their overlap is substantial. Relative information plots also suggest that the writing sample provides information equal to as many as 40 multiple-choice items.

Combining essay ratings with MC tests enhances information gained at the upper ability scale, equivalent to 40 additional MC items.

Building a Test Using Items That Require More than One Skill to Determine a Correct Answer

This paper demonstrates the relationship between the concept of unidimensionality and the direction of an item in a multidimensional space. The basic premise is that if items that measure in the same direction are combined to form a test, that test will meet the item response theory requirements of unidimensionality. This will be true even if the items measuring in the same direction measure more than one psychological construct. A form of the ACT *
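
The idea of an item's "direction" can be illustrated with a small sketch. It assumes a two-dimensional compensatory logistic model with discrimination parameters (a1, a2), and uses the common convention that an item's direction is the angle its discrimination vector makes with the first ability axis. The function name `item_direction` and the item values are hypothetical, and the paper's own definitions may differ.

```python
import math

def item_direction(a1, a2):
    """Return (vector length, angle in degrees from the theta_1 axis).

    Assumes at least one discrimination is nonzero. The vector length is
    often called the multidimensional discrimination of the item.
    """
    length = math.hypot(a1, a2)
    angle = math.degrees(math.acos(a1 / length))
    return length, angle

# Hypothetical items: A and B point the same way, C points elsewhere.
items = {"item_A": (1.2, 0.4), "item_B": (0.6, 0.2), "item_C": (0.3, 1.1)}
for name, (a1, a2) in items.items():
    length, angle = item_direction(a1, a2)
    print(f"{name}: length={length:.2f}, angle={angle:.1f} degrees")
```

In this sketch, item_A and item_B share an angle even though both load on two abilities, which is the situation the abstract describes as measuring "in the same direction".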

A General Model for Item Dependency

A model of test item dependency is presented and used to illustrate the effect that violations of local independence have on the behavior of item characteristic curves. The dependency model is flexible enough to simulate the interaction of a number of factors including item difficulty and item discrimination, varying degrees of item dependence, and item order or sequence effects. The model also provides for an ability-by-dependence interaction. Results suggest that the shift in an item's characteristic curve can be fairly dramatic, producing nonlogistic response probability curves.

Graphical interpretations illustrate item dependency influences on ICCs across various logistic models, revealing complex interactions.

ACT research report series: The effect of item response dependency on trait or ability dimensionality

PsycEXTRA Dataset

Restricting item difficulty range eliminated the emergence of a second factor in principal component analyses of item response data.

Concurrent Adaptive Measurement of Multiple Abilities

An adaptive test can usually match or exceed the measurement precision of conventional tests several times its length. This increased efficiency is not without costs, however, as the models underlying adaptive testing make strong assumptions about examinees and items. Most troublesome is the assumption that item pools are unidimensional. Truly unidimensional item pools are the exception rather than the rule, so procedures have been established for handling multidimensional pools. One option is to ensure that every adaptive test measures the same composite of the multiple abilities represented in the item pool. However, this approach forfeits the multidimensional structure of the item pool. The alternative is to retain this structure by splitting the item pool into more unidimensional subsets and administering each separately. This approach, however, increases testing time. A third approach is proposed: concurrent adaptive measurement. In this approach, collateral information (information that an item provides about a secondary ability) is used to update ability estimates obtained from adaptive tests administered in separate content areas. A study is reviewed that evaluated the effectiveness of the concurrent adaptive measurement procedure using unidimensional estimates of two two-dimensional item pools of 200 items each. The results indicate that both bias and the standard error of the estimated ability decrease when collateral information is used. As the correlation between the latent skills increases, the standard error drops slightly. Six tables and four figures are included. (SLD)

Equating Reading Test Scores That Combine Narrative and Expository Test Formats

This paper summarizes an investigation of the format used for equating the 1993 Illinois Goal Assessment Program (IGAP) sixth grade reading test. In 1992, each student took only one test, either a narrative test or an expository test. In 1993, there was only one test, which included both formats. Several possible approaches for linking the 1993 test to the 1992 tests, including use of the partial credit model and true-score equating, are proposed and investigated in this study. The sample size for the 1992 narrative test was 10,178. The expository test sample was 10,277, and the sample for the 1993 test was 4,830. Results show that the 1993 examinees have a higher mean-scaled score than the 1992 examinees if the test is linked to the narrative test, but a lower score if linked to the expository test. Three tables and 10 figures present analysis results. (Contains 8 references.) (Author/SLD)

Examining test score means shows 1993 group had mean ability estimate 1.49, 6.5 points higher than 1992 group at 246.16 scaled score.

A comparison of the effects of random versus fixed order of item presentation via the computer

PsycEXTRA Dataset

ACT research report series: The relationship between the Bock-Aitkin procedure and the EM algorithm for IRT model estimation

PsycEXTRA Dataset

It has previously been shown that the Bock-Aitkin procedure (R. Bock and M. Aitkin, 1981) is an instance of the EM algorithm when trying to find the marginal maximum likelihood estimate for a discrete latent ability variable (latent trait). In this paper, it is shown that the Bock-Aitkin procedure is a numerical implementation of the EM algorithm for a continuous latent ability variable using numerical quadrature. Further, the relationship between the EM algorithm, marginal maximum likelihood estimation, and the Bock-Aitkin procedure is described for both the discrete and continuous cases. Some issues concerning the use of the Bock-Aitkin procedure and the EM algorithm are addressed. (Contains 31 references.) (SLD)
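
To make the role of numerical quadrature concrete, here is a minimal sketch of a single E-step for a two-parameter logistic model. It is an illustration under stated assumptions, not the report's own implementation: the equally spaced grid, the standard normal prior, and the names `quadrature_grid` and `e_step` are all choices made for this example, and the M-step (re-estimating item parameters from the expected counts) is omitted.

```python
import math

def icc_2pl(theta, a, b):
    """Two-parameter logistic probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def quadrature_grid(n_nodes=21, lo=-4.0, hi=4.0):
    """Equally spaced nodes with normalized standard normal weights (an assumption)."""
    step = (hi - lo) / (n_nodes - 1)
    nodes = [lo + k * step for k in range(n_nodes)]
    density = [math.exp(-0.5 * x * x) for x in nodes]
    total = sum(density)
    return nodes, [d / total for d in density]

def e_step(responses, items, nodes, weights):
    """Return expected counts per node and the marginal log-likelihood.

    n_bar[k]    : expected number of examinees at node k
    r_bar[j][k] : expected number of correct responses to item j at node k
    """
    n_bar = [0.0] * len(nodes)
    r_bar = [[0.0] * len(nodes) for _ in items]
    log_marginal = 0.0
    for pattern in responses:
        # Likelihood of this response pattern at each node, times the prior weight.
        joint = []
        for x, w in zip(nodes, weights):
            like = 1.0
            for (a, b), u in zip(items, pattern):
                p = icc_2pl(x, a, b)
                like *= p if u == 1 else 1.0 - p
            joint.append(like * w)
        marginal = sum(joint)  # quadrature approximation of the integral
        log_marginal += math.log(marginal)
        for k, value in enumerate(joint):
            posterior = value / marginal
            n_bar[k] += posterior
            for j, u in enumerate(pattern):
                if u == 1:
                    r_bar[j][k] += posterior
    return n_bar, r_bar, log_marginal

# Hypothetical data: four examinees, three items given as (a, b) pairs.
responses = [(1, 0, 1), (0, 0, 0), (1, 1, 1), (1, 1, 0)]
items = [(1.0, -0.5), (1.2, 0.0), (0.8, 0.5)]
nodes, weights = quadrature_grid()
n_bar, r_bar, log_marginal = e_step(responses, items, nodes, weights)
print(round(sum(n_bar), 6), round(log_marginal, 4))
```

Replacing the integral over ability with a weighted sum over nodes is what lets the continuous case be handled with the same bookkeeping as a discrete latent variable.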

Creating a Test Information Profile for a Two-Dimensional Latent Space

Applied Psychological Measurement, 1994

In some cognitive testing situations it is believed, despite reporting only a single score, that the test items differentiate levels of multiple traits. In such situations, the reported score may represent quite disparate composites of these multiple traits. Thus, when attempting to interpret a single score from a set of multidimensional items, several concerns naturally arise. First, it is important to know what composite of traits is being measured at all levels of the reported score scale. Second, it is also necessary to discern that all examinees, no matter where they lie in the latent trait space, are being measured on the same composite of traits. Thus, the role of multidimensionality in the interpretation or meaning given to various score levels must be examined. This paper presents a method for computing multidimensional information and provides examples of how different aspects of test information can be displayed graphically to form a profile of a test in a two-dimensiona...

Form 24B's item vectors distinguished multiple trait levels, showcasing comprehensive skill assessments across the (θ₁, θ₂) composites.

A Comparison of the Information Provided by Essay, Multiple-Choice, and Free-Response Writing Tests

Applied Psychological Measurement, 1988

This study investigated the similarity of information that is provided by direct and indirect methods of writing assessment. The skills required by each of these techniques provide a framework for a cognitive model of writing skills from which these procedures can be compared. It is suggested that practitioners interested in reliably measuring all aspects of the proposed writing process continuum, as characterized by this cognitive model, use both indirect and direct methods. Index terms: Confirmatory factor analysis, Essay tests, Free-response tests, Multiple-choice tests, Writing assessment, Writing processes.

Evidence indicates that direct and indirect writing assessments measure different abilities, with limited correlation in study findings suggesting distinct assessments needed.

ACT research report series: Comparison of two logistic multidimensional item response theory models: Research report ONR90-8

PsycEXTRA Dataset

Comparison of two logistic multidimensional item response theory models

This work is approved for public release, indicating significance and utility for various stakeholders in psychometrics and educational assessment.

An Investigation of the Relationship between Reliability, Power, and the Type I Error Rate of the Mantel-Haenszel and Simultaneous Item Bias Detection Procedures

This study examines the relationship between levels of reliability and the power of two bias and differential item functioning (DIF) detection methods. Both methods, the Mantel-Haenszel (MH) (Holland & Thayer, 1988) and the Simultaneous Item Bias (SIB) (Shealy & Stout, 1991), use examinees' raw scores as a conditioning variable in the computation of differential performance between two groups of interest. As a result, the extent to which examinees' observed scores accurately reflect their true abilities plays an important role. If examinees are misrepresented by their observed scores (as for a test with low reliability), then bias detection methods may not identify item bias accurately. Results suggest that for a fixed-length test, the power of both statistics increases moderately as reliability is increased and substantially as sample size is increased. However, the combination of small sample sizes and high reliability resulted in a decrease of power. For most of the simulated conditions the MH procedure and SIB had very similar rates of correctly rejecting the biased item.

As reliability increased, discrimination parameters improved and became less variable, underscoring reliability's impact on bias detection.
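
The use of raw scores as a conditioning variable can be sketched for the Mantel-Haenszel case. The example below computes the standard MH common odds ratio for one studied item by stratifying examinees on raw score, and converts it to the delta-scale index usually reported as MH D-DIF. The group labels "ref" and "focal", the function names, and the data are assumptions made for this illustration; the sketch does not include the chi-square test or anything specific to the SIB procedure.

```python
import math
from collections import defaultdict

def mantel_haenszel_odds_ratio(scores, groups, responses):
    """Common odds ratio for one item, stratified on the conditioning score.

    scores    : raw score used for matching, one per examinee
    groups    : "ref" or "focal", one per examinee (labels are an assumption)
    responses : 1 for correct, 0 for incorrect on the studied item
    """
    # Per stratum: [ref correct, ref incorrect, focal correct, focal incorrect]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for score, group, u in zip(scores, groups, responses):
        offset = 0 if group == "ref" else 2
        tables[score][offset + (0 if u == 1 else 1)] += 1
    numerator = denominator = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0.0:
        raise ValueError("odds ratio undefined: no informative strata")
    return numerator / denominator

def mh_d_dif(alpha):
    """Delta-scale transformation; negative values indicate the item favors the reference group."""
    return -2.35 * math.log(alpha)

# Hypothetical data: two score strata with eight examinees each.
scores = [5] * 8 + [6] * 8
groups = (["ref"] * 4 + ["focal"] * 4) * 2
responses = [1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0]
alpha = mantel_haenszel_odds_ratio(scores, groups, responses)
print(alpha, mh_d_dif(alpha))
```

Because examinees are matched only on observed score, any unreliability in that score weakens the matching, which is the concern the abstract raises.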

A Didactic Example of the Influence of Conditioning on the Complete Latent Ability Space When Performing DIF Analyses