Extensions Thereof
2014
6 pages
Abstract
… an item shows DIF if individuals having the same ability, but from different groups, do not have the same probability of getting the item right (Hambleton, Swaminathan, & Rogers, 1991, p. 110). Let us begin in the beginning … with binary item responses.
• Uniform DIF exists when the probability of answering the item correctly is greater for one group than the other uniformly over all levels of ability: there is no interaction between ability level and group membership.
• Nonuniform DIF exists when the probability of answering the item correctly is not consistently greater for either group across all levels of ability: there is an interaction between ability level and group membership.
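The two definitions can be illustrated with a small numerical sketch. It assumes a logistic model for the probability of a correct response, with a group term and an ability-by-group interaction term; the function names and all coefficient values below are hypothetical and chosen only for illustration, not taken from the paper.

```python
import math

def p_correct(theta, group, b0=0.0, b1=1.0, b2=0.0, b3=0.0):
    """Probability of a correct response under an assumed logistic model.

    theta is ability; group is 0 (reference) or 1 (focal).
    b2 is the group effect, b3 the ability-by-group interaction.
    All coefficients are hypothetical illustration values.
    """
    logit = b0 + b1 * theta + b2 * group + b3 * theta * group
    return 1.0 / (1.0 + math.exp(-logit))

def group_gap(theta, b2, b3):
    """Reference-group probability minus focal-group probability at theta."""
    return p_correct(theta, 0, b2=b2, b3=b3) - p_correct(theta, 1, b2=b2, b3=b3)

# Ability grid from -3 to 3 in steps of 0.5.
thetas = [x / 2 for x in range(-6, 7)]

# Group effect only (no interaction): the gap keeps one sign everywhere.
uniform_gaps = [group_gap(t, b2=-0.5, b3=0.0) for t in thetas]

# Interaction only: the curves cross, so the gap changes sign.
nonuniform_gaps = [group_gap(t, b2=0.0, b3=-0.5) for t in thetas]

print("uniform:   ", [round(g, 3) for g in uniform_gaps])
print("nonuniform:", [round(g, 3) for g in nonuniform_gaps])
```

In the first case the reference group is favored at every ability level on the grid; in the second, the favored group depends on ability, which is the interaction the definition refers to.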
Related papers
Journal of Educational Measurement, 2009
In this study, we investigate the logistic regression (LR), Mantel-Haenszel (MH), and Breslow-Day (BD) procedures for the simultaneous detection of both uniform and nonuniform differential item functioning (DIF). A simulation study was used to assess and compare the Type I error rate and power of a combined decision rule (CDR), which assesses DIF using a combination of the decisions made with BD and MH to those of LR. The results revealed that while the Type I error rate of CDR was consistently below the nominal alpha level, the Type I error rate of LR was high for the conditions having unequal ability distributions. In addition, the power of CDR was consistently higher than that of LR across all forms of DIF. In any assessment of individuals, test developers want to be sure that the scores are free of bias so that they are valid. Test bias occurs when the performance on the test requires sources of knowledge different from those intended to be measured, causing the test scores to be less valid for a particular group (Camilli, 1993; Camilli & Shepard, 1994). The presence of item bias can be defined as the simultaneous presence of two factors on item performance (Camilli, 1992; Shealy & Stout, 1993). The first factor is that the performance on a given item (called the studied item) is dependent on abilities or resources other than those intended to be measured. Using the terminology of Shealy and Stout, the abilities intended to be measured are referred to as target abilities and those that are observed but unintended to be measured are referred to as nuisance determinants. The second factor required for the presence of item bias is that the distribution of the nuisance determinant is unequal for two groups. By convention, the two groups are referred to as the reference and focal groups. 
In other words, item bias exists when the difference between focal and reference group performance on the studied item cannot be completely accounted for by differences in the target abilities, and thus must, at least partially, be accounted for by a nuisance determinant. Early investigations of item bias focused on the identification of differences in mean item scores for the focal and reference groups (Angoff, 1972; Angoff, 1982; Cleary & Hilton, 1968). It is not clear, however, whether these mean differences are attributable to a systematic bias caused by the presence of a nuisance determinant, or to a between-group difference in target ability distribution. Whether a between-group difference in mean item score is attributable to differences in the distribution of target ability or nuisance determinant can be assessed by examining between-group differences in item performance after controlling for the level of target ability (Dorans &
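Controlling for the level of target ability is commonly done by stratifying examinees on a matching score and comparing the groups within each stratum. The sketch below computes the standard Mantel-Haenszel common odds ratio across such strata. The function name and the counts are hypothetical illustration values, not data from the study, and the sketch assumes that at least one stratum contains both an incorrect reference response and a correct focal response, so the denominator is nonzero.

```python
def mh_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio over matched ability strata.

    Each stratum is a tuple
    (ref_correct, ref_incorrect, focal_correct, focal_incorrect).
    A value near 1 suggests no uniform DIF; a value above 1 favors
    the reference group.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty strata
        num += a * d / n
        den += b * c / n
    return num / den

# Hypothetical counts: both groups have equal odds within each stratum.
no_dif = [(30, 10, 15, 5), (20, 20, 10, 10)]

# Hypothetical counts: the reference group has three times the odds.
with_dif = [(30, 10, 10, 10), (20, 20, 5, 15)]

ratio_no_dif = mh_odds_ratio(no_dif)
ratio_with_dif = mh_odds_ratio(with_dif)
print(ratio_no_dif, ratio_with_dif)
```

Because the comparison is made within strata, a difference in the groups' overall ability distributions does not by itself move the ratio away from 1.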
Computational Statistics & Data Analysis, 2013
IRT models generally relate (probabilistically) the answer given to an item by an individual to characteristics of items and individuals. Common characteristics of the item include difficulty and discrimination. For the individual, on the other hand, it can be any latent trait or ability, such as proficiency in mathematics or level of depression.
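As a minimal sketch of how item and person characteristics combine, the two-parameter logistic (2PL) model is one common choice; it is used here as an assumption and is not necessarily the model studied in the paper. The function name and parameter values are hypothetical.

```python
import math

def irf_2pl(theta, a, b):
    """2PL item response function.

    theta: latent trait of the individual
    a:     item discrimination
    b:     item difficulty
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# When the trait equals the difficulty, the success probability is 0.5.
p_at_difficulty = irf_2pl(theta=0.5, a=1.2, b=0.5)

# Above the difficulty, a more discriminating item gives a higher probability.
p_flat = irf_2pl(theta=1.5, a=0.5, b=0.5)
p_steep = irf_2pl(theta=1.5, a=2.0, b=0.5)

print(p_at_difficulty, p_flat, p_steep)
```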
In multiple-choice items the response probability on an item may be viewed as the result of two distinct latent processes: a cognitive process to solve the problem, and another random process that leads to the choice of a certain alternative (the process of giving the actual response). An incomplete latent class model is formulated that describes the first process by a Rasch model and the second process by a guessing model. Alternative models are specified that contain additional parameters describing differential item functioning (DIF) in the two processes. DIF with respect to either known or unknown subgroups can be tested by a likelihood ratio test that is asymptotically distributed as chi-square. As an example of the model, four five-choice items from the Second International Mathematics Study (1987) with a sample of 3,002 secondary students were considered. A 42-item list of references and 3 data tables are included.
PsycEXTRA Dataset
2009
The increased use of polytomous item formats has led assessment developers to pay greater attention to the detection of differential item functioning (DIF) in these items. DIF occurs when an item performs differently for two contrasting groups of respondents (e.g., males versus females) after controlling for differences in the abilities of the groups. Determining whether the difference in performance on an item between two demographic groups is due to between group differences in ability or some form of unfairness in the item is a more complex task for a polytomous item, because of its many score categories, than for a dichotomous item. Effective DIF detection methods must be able to locate DIF within each of these various score categories. The Mantel, Generalized Mantel Haenszel (GMH), and Logistic Regression (LR) are three of several DIF detection methods that are able to test for DIF in polytomous items. There have been relatively few studies on the effectiveness of polytomous procedures to detect DIF; and of those studies, only a very small percentage have examined the efficiency of the Mantel, GMH, and LR procedures when item discrimination magnitudes and category intersection parameters vary and when there are different patterns of DIF (e.g., balanced versus constant) within score categories. This Monte Carlo simulation study compared the Type I error and power of the Mantel, GMH, and OLR (LR method for ordinal data) procedures when variation occurred in 1) the item discrimination parameters, 2) category intersection parameters, 3) DIF patterns within score categories, and 4) the average latent traits between the reference and focal groups. Results of this investigation showed that high item discrimination levels were directly related to increased DIF detection rates. The location of the difficulty parameters was also found to have a direct effect on DIF detection rates. 
Additionally, depending on item difficulty, DIF magnitudes and patterns within score categories were found to affect DIF detection rates. Finally, DIF detection power increased as DIF magnitudes became larger. The GMH procedure outperformed the Mantel and OLR procedures and is recommended for use with polytomous data when item discrimination varies across items.
Applied Stochastic Models in Business and Industry, 2020
The main characteristic of a binary test is the item response function (IRF) expressing the probability P (d, a) of an object under test (OUT), possessing ability a, to successfully overcome the test item (TI) of difficulty d. Each specific test requires its own definitions of TI difficulty and OUT ability and has its own P (d, a) describing the probability of "success" mentioned above. This is demonstrated on the basis of several examples taken from different areas of statistical engineering. A common feature is that they all relate to "antagonistic" situations, in which the "success" of one side may formally be considered as a "loss" to the opposite side. For such situations ability and difficulty are two interchangeable sides of the same coin and the corresponding IRFs are complementary, that is, P (d, a) = 1 − P(a, d), with all consequences and restrictions imposed by this property. A study shows that the family of feasible IRFs is limited and has a number of interesting properties, which are discussed in the article. The analysis provided should facilitate avoiding errors in decisions about an IRF adequately describing the studied test.
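The complementarity property P(d, a) = 1 − P(a, d) can be checked numerically for a candidate IRF. The sketch below assumes a logistic function of the difference a − d as the candidate; this choice is an illustration and is not claimed to be one of the IRFs analyzed in the article.

```python
import math

def irf(d, a):
    """Assumed logistic IRF: probability that ability a overcomes difficulty d."""
    return 1.0 / (1.0 + math.exp(-(a - d)))

def is_complementary(f, pairs, tol=1e-12):
    """Check f(d, a) + f(a, d) == 1 (within tol) for every (d, a) pair."""
    return all(abs(f(d, a) + f(a, d) - 1.0) < tol for d, a in pairs)

# Hypothetical (difficulty, ability) pairs used only for the check.
pairs = [(-2.0, 1.0), (0.0, 0.0), (0.5, 3.0), (1.5, -1.5)]
complementary = is_complementary(irf, pairs)

# One consequence of the property: equal ability and difficulty give 0.5.
p_equal = irf(1.0, 1.0)

print(complementary, p_equal)
```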
Communications in Statistics - Theory and Methods, 2013
Differential item functioning is present when something about the characteristics of a test taker interferes with the relationship between ability and item response. Nonuniform differential item functioning exists when there is interaction between ability level and group membership.
Scientometrics, 2005
A plethora of research studies has focused on Differential Item Functioning. Despite the diversity of DIF detection techniques offered, little research has been done on the interpretation of DIF results. This study was undertaken to investigate whether there is any order to the interpretations offered for the real cause of items flagged as displaying DIF. The analysis of the opinion of experts showed that there is no such order. It is argued that such “ad hoc” interpretations have rendered DIF analysis of little use. It is further suggested that research should focus on devising a mechanism for basing DIF interpretations on principled grounds.
Communications in Statistics - Theory and Methods, 2019
Differences in type I error and power rates for majority and minority groups are investigated when differential item functioning (DIF) contamination in a test is unbalanced. Typically, type I error and power rates are aggregated across groups; however, aggregated results can be misleading if subgroups are affected differently by study conditions. With unbalanced DIF contamination, type I error and power rates are reduced for groups with more DIF items favoring them, and increased for groups with less DIF contamination. Even when aggregated impacts appear small, differing subgroup impacts can result in a larger proportional bias than in the original data.
Bruno Zumbo