This study examines differences and similarities in the information provided by direct and indirect measures of writing from the Collegiate Assessment of Academic Proficiency (CAAP). The indirect measure was a 72-item multiple-choice test, while the direct measure involved responding to two essay prompts. The 40-minute multiple-choice test can be subdivided into six skill areas: (1) punctuation; (2) grammar and usage; (3) sentence structure; (4) style; (5) strategy; and (6) organization. Item responses and essay ratings were calibrated together using a graded response model from item response theory. Results suggest that while the essays measure a different component of writing ability than the multiple-choice test, their overlap is substantial. Relative information plots also suggest that the writing sample provides information equal to as many as 40 multiple-choice items.
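As an illustration of how a graded response model can place multiple-choice items and essay ratings on one scale, the sketch below computes category probabilities in the usual Samejima form. The abstract does not give the parameterization used in the study, so the logistic form without a scaling constant, the function name, and the parameter values are all assumptions made for illustration.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities under a logistic graded response model.

    theta      -- examinee ability
    a          -- item discrimination (assumed value, not from the study)
    thresholds -- ascending category boundaries b_1 < ... < b_{K-1}
    Returns a list of K probabilities, one per ordered score category.
    """
    # Cumulative probability of scoring in category k or higher,
    # padded with 1.0 (lowest category or above) and 0.0 (above the highest).
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Each category probability is the difference of adjacent cumulatives.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]

# A dichotomous multiple-choice item is the one-threshold special case.
mc_item = grm_category_probs(theta=0.0, a=1.2, thresholds=[0.0])
# A hypothetical essay rated on four ordered categories uses three thresholds.
essay = grm_category_probs(theta=0.0, a=1.0, thresholds=[-1.0, 0.0, 1.0])
```

Because a two-category item reduces to a two-parameter logistic item, a single model of this kind can in principle cover both item types in one calibration.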
This paper demonstrates the relationship between the concept of unidimensionality and the direction of an item in a multidimensional space. The basic premise is that if items that measure in the same direction are combined to form a test, that test will meet the item response theory requirements of unidimensionality. This will be true even if the items measuring in the same direction measure more than one psychological construct. A form of the ACT *
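One common way to quantify the direction of an item, sketched here as an assumption since the abstract does not state its formula, is through the direction cosines of the item's discrimination vector in a compensatory multidimensional model. The function name and the example discrimination values are hypothetical.

```python
import math

def item_direction_degrees(a):
    """Angles (in degrees) between an item's direction of best
    measurement and each coordinate axis of the latent space.

    a -- discrimination vector (a_1, ..., a_m) of the item
    Uses the direction cosines cos(alpha_k) = a_k / sqrt(sum_j a_j^2).
    """
    norm = math.sqrt(sum(x * x for x in a))
    if norm == 0:
        raise ValueError("discrimination vector must be nonzero")
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    return [math.degrees(math.acos(max(-1.0, min(1.0, x / norm)))) for x in a]

# Two hypothetical items with different overall discrimination
# but the same direction (both 45 degrees from each axis).
item_a = item_direction_degrees([0.8, 0.8])
item_b = item_direction_degrees([1.5, 1.5])
```

Items with matching angles measure the same composite of the underlying abilities, which is the sense of "same direction" in the premise above.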
A model of test item dependency is presented and used to illustrate the effect that violations of local independence have on the behavior of item characteristic curves. The dependency model is flexible enough to simulate the interaction of a number of factors, including item difficulty and item discrimination, varying degrees of item dependence, and item order or sequence effects. The model also provides for an ability-by-dependence interaction. Results suggest that the shift in an item's characteristic curve can be fairly dramatic, producing nonlogistic response probability curves.
An adaptive test can usually match or exceed the measurement precision of conventional tests several times its length. This increased efficiency is not without costs, however, as the models underlying adaptive testing make strong assumptions about examinees and items. Most troublesome is the assumption that item pools are unidimensional. Truly unidimensional item pools are the exception rather than the rule, so procedures have been established for handling multidimensional pools. One option is to ensure that every adaptive test measures the same composite of the multiple abilities represented in the item pool. However, this approach forfeits the multidimensional structure of the item pool. The alternative is to retain this structure by splitting the item pool into more nearly unidimensional subsets and administering each separately. This approach, however, increases testing time. A third approach is proposed: concurrent adaptive measurement. In this approach, collateral information (information that an item provides about a secondary ability) is used to update ability estimates obtained from adaptive tests administered in separate content areas. A study is reviewed that evaluated the effectiveness of the concurrent adaptive measurement procedure using unidimensional estimates of two two-dimensional item pools of 200 items each. The results indicate that both the bias and the standard error of the estimated ability decrease when collateral information is used. As the correlation between the latent skills increases, the standard error drops slightly. Six tables and four figures are included. (SLD)
This paper summarizes an investigation of the format used for equating the 1993 Illinois Goal Assessment Program (IGAP) sixth-grade reading test. In 1992, each student took only one test, either a narrative test or an expository test. In 1993, there was only one test, which included both formats. Several possible approaches for linking the 1993 test to the 1992 tests, including use of the partial credit model and true-score equating, are proposed and investigated in this study. The sample size for the 1992 narrative test was 10,178. The expository test sample was 10,277, and the sample for the 1993 test was 4,830. Results show that the 1993 examinees have a higher mean scaled score than the 1992 examinees if the test is linked to the narrative test, but a lower score if linked to the expository test. Three tables and 10 figures present analysis results. (Contains 8 references.) (Author/SLD)
It has previously been shown that the Bock-Aitkin procedure (R. Bock and M. Aitkin, 1981) is an instance of the EM algorithm when trying to find the marginal maximum likelihood estimate for a discrete latent ability variable (latent trait). In this paper, it is shown that the Bock-Aitkin procedure is a numerical implementation of the EM algorithm for a continuous latent ability variable using numerical quadrature. Further, the relationship between the EM algorithm, marginal maximum likelihood estimation, and the Bock-Aitkin procedure is described for both the discrete and continuous cases. Some issues concerning the use of the Bock-Aitkin procedure and the EM algorithm are addressed. (Contains 31 references.) (SLD)
In some cognitive testing situations it is believed that, although only a single score is reported, the test items differentiate levels of multiple traits. In such situations, the reported score may represent quite disparate composites of these multiple traits. Thus, when attempting to interpret a single score from a set of multidimensional items, several concerns naturally arise. First, it is important to know what composite of traits is being measured at all levels of the reported score scale. Second, it is also necessary to determine whether all examinees, no matter where they lie in the latent trait space, are being measured on the same composite of traits. Thus, the role of multidimensionality in the interpretation or meaning given to various score levels must be examined. This paper presents a method for computing multidimensional information and provides examples of how different aspects of test information can be displayed graphically to form a profile of a test in a two-dimensiona...
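To make the idea of multidimensional information concrete, here is a minimal sketch of item information in a chosen direction of a two-dimensional latent space. It assumes a compensatory two-dimensional logistic model and the familiar directional form I(theta) = P * Q * (a1*cos(alpha) + a2*sin(alpha))^2; the abstract is truncated and does not confirm that this is the exact method of the paper. All function names and parameter values are hypothetical.

```python
import math

def m2pl_probability(theta, a, d):
    """Probability of a correct response under a compensatory
    two-dimensional logistic model with intercept d."""
    z = a[0] * theta[0] + a[1] * theta[1] + d
    return 1.0 / (1.0 + math.exp(-z))

def directional_information(theta, a, d, angle_deg):
    """Item information at theta in the direction that makes an angle
    of angle_deg degrees with the first ability axis."""
    p = m2pl_probability(theta, a, d)
    alpha = math.radians(angle_deg)
    # Slope of the response surface along the chosen direction.
    slope = a[0] * math.cos(alpha) + a[1] * math.sin(alpha)
    return p * (1.0 - p) * slope ** 2

# Profile of one hypothetical item at the origin, every 10 degrees.
profile = [directional_information((0.0, 0.0), (1.0, 1.0), 0.0, ang)
           for ang in range(0, 91, 10)]
```

Evaluating such a function over a grid of ability points and angles gives the raw material for the kind of graphical test profile the abstract describes.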
This study investigated the similarity of information that is provided by direct and indirect methods of writing assessment. The skills required by each of these techniques provide a framework for a cognitive model of writing skills from which these procedures can be compared. It is suggested that practitioners interested in reliably measuring all aspects of the proposed writing process continuum, as characterized by this cognitive model, use both indirect and direct methods. Index terms: Confirmatory factor analysis, Essay tests, Free-response tests, Multiple-choice tests, Writing assessment, Writing processes.
This study examines the relationship between levels of reliability and the power of two bias and differential item functioning (DIF) detection methods. Both methods, the Mantel-Haenszel (MH) (Holland & Thayer, 1988) and the Simultaneous Item Bias (SIB) (Shealy & Stout, 1991), use examinees' raw scores as a conditioning variable in the computation of differential performance between two groups of interest. As a result, the extent to which examinees' observed scores accurately reflect their true abilities plays an important role. If examinees are misrepresented by their observed scores (as for a test with low reliability), then bias detection methods may not identify item bias accurately. Results suggest that for a fixed-length test, the power of both statistics increases moderately as reliability is increased and substantially as sample size is increased. However, the combination of small sample sizes and high reliability resulted in a decrease of power. For most of the simulated conditions, the MH procedure and SIB had very similar rates of correctly rejecting the biased item.
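The role of the raw score as a conditioning variable can be seen in a small sketch of the Mantel-Haenszel common odds ratio, computed from one 2x2 table per raw-score level. This is the standard textbook estimator and its conversion to the delta scale, not code from the study; the function names and the counts in the example are invented for illustration.

```python
import math

def mantel_haenszel_odds_ratio(tables):
    """Common odds ratio across score levels.

    tables -- iterable of (a, b, c, d) counts per raw-score level:
              a = reference group correct, b = reference group incorrect,
              c = focal group correct,     d = focal group incorrect
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n == 0:
            continue  # an empty score level contributes nothing
        num += a * d / n
        den += b * c / n
    if den == 0:
        raise ValueError("odds ratio undefined: denominator is zero")
    return num / den

def mh_d_dif(alpha_mh):
    """Mantel-Haenszel statistic on the delta scale."""
    return -2.35 * math.log(alpha_mh)

# Hypothetical counts at two raw-score levels.
no_dif = mantel_haenszel_odds_ratio([(10, 10, 10, 10), (30, 10, 30, 10)])
with_dif = mantel_haenszel_odds_ratio([(10, 10, 10, 10), (20, 10, 10, 20)])
```

Because examinees are matched on observed raw score, an unreliable score sorts them into the wrong tables, which is how low reliability can distort the statistic.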
Papers by Terry Ackerman