Item Response Theory Using Bayesian Networks
Richard Neapolitan

I will follow the Bayesian network approach to IRT put forward by Almond and Mislevy: http://ecd.ralmond.net/tutorial/

A good tutorial that introduces basic IRT is provided at the following site: http://www.creative-wisdom.com/multimedia/ICHA.htm

Let Θ represent arithmetic ability. Θ is called a proficiency. We have the following items to test Θ:

Item          Task
1 (easiest)   2 + 2
2             16 - 12
3             64 x 27
4             673 x 515
5 (hardest)   105,110 / 67

Θ takes values from -2 to 2: 0 represents average ability, -2 the lowest ability, and 2 the highest. We assume performance on items is independent given the ability.

[Figure: Bayesian network with Theta (prior: pos2 10.0, pos1 20.0, Zero 40.0, neg1 20.0, neg2 10.0) as the parent of Item_1 through Item_5, whose marginal probabilities of Right are 77.2, 64.6, 49.3, 35.4, and 22.9. A numeric sketch of this style of inference appears below.]

IRT Logistic Evidence Model

P(X_i = Right | θ) = 1 / (1 + e^(-(θ - b_i)))

b_i measures the difficulty of the item.

[Figure: item characteristic curves, P versus theta, for b = 0 (average difficulty), b = -1.5 (easy item), and b = 1.5 (hard item).]

Discrimination Parameter: a

P(X_i = Right | θ) = 1 / (1 + e^(-a_i(θ - b_i)))

[Figure: item characteristic curves, P versus theta, for a = 5, b = 0; a = 0.5, b = 0; and a = 5, b = 1.5. Larger a gives a steeper curve around b.]

Two Proficiency Models

• Compensatory: More of Proficiency 1 compensates for less of Proficiency 2. Combination rule is sum.
• Conjunctive: Both proficiencies are needed to solve the problem. Combination rule is minimum.
• Disjunctive: Two proficiencies represent alternative solution paths to the problem. Combination rule is maximum.

Each of these rules is illustrated in a short sketch below.

[Figure: three Bayesian networks, each with two proficiency nodes (states H, M, L, each at 33.3) feeding a Compensatory, Conjunctive, or Disjunctive task node (Right 50.0, Wrong 50.0).]

[Figure: a network with binary nodes Skill1 and Skill2 (Yes 50.0, No 50.0) as parents of Task1, Task2, and Task3 (each Right 50.0, Wrong 50.0).]
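To make the two logistic formulas above concrete, here is a minimal Python sketch (mine, not from the slides) that evaluates P(X = Right | theta) on a small ability grid and shows how the difficulty b shifts the curve while the discrimination a changes its steepness.

```python
import math

def p_right(theta, b, a=1.0):
    """Logistic evidence model: P(X = Right | theta) = 1 / (1 + e^(-a(theta - b)))."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# The difficulty b shifts the curve: an easy item (b = -1.5) is answered
# correctly even at low ability, a hard item (b = 1.5) only at high ability.
for b in (-1.5, 0.0, 1.5):
    print(f"b={b:+.1f}:", [round(p_right(t, b), 2) for t in (-2, -1, 0, 1, 2)])

# The discrimination a changes the steepness around b: a = 5 is nearly a
# step function, a = 0.5 is almost flat.
for a in (0.5, 1.0, 5.0):
    print(f"a={a:.1f}:", [round(p_right(t, 0.0, a), 2) for t in (-2, -1, 0, 1, 2)])
```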
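The marginals in the network figure above depend on CPTs the slides do not list. The sketch below shows the inference pattern itself, using illustrative logistic CPTs with difficulties b = (-1.5, -0.75, 0, 0.75, 1.5) (my assumption, not the slides' values): because items are independent given Θ, the likelihood of a response pattern factors over items, and the posterior over Θ follows from Bayes' theorem.

```python
import math

def p_right(theta, b):
    # Hypothetical logistic CPT entry; the slides do not list the real CPTs.
    return 1.0 / (1.0 + math.exp(-(theta - b)))

states = ["pos2", "pos1", "Zero", "neg1", "neg2"]
thetas = [2, 1, 0, -1, -2]                 # numeric ability per state
prior  = [0.10, 0.20, 0.40, 0.20, 0.10]    # Theta's prior from the figure
bs     = [-1.5, -0.75, 0.0, 0.75, 1.5]     # assumed item difficulties

# Suppose we observe items 1-3 Right and items 4-5 Wrong.
evidence = [True, True, True, False, False]

# Items are independent given Theta, so the likelihood factors over items.
posterior = []
for theta, pr in zip(thetas, prior):
    like = 1.0
    for b, right in zip(bs, evidence):
        p = p_right(theta, b)
        like *= p if right else 1.0 - p
    posterior.append(pr * like)

z = sum(posterior)                         # normalize by Bayes' theorem
print({s: round(p / z, 3) for s, p in zip(states, posterior)})
```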
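The three combination rules can be stated in a few lines of code. In this sketch (my illustration; the numeric coding of H/M/L is an assumption), two proficiencies are combined by sum, minimum, or maximum, and the combined value is fed through the logistic evidence model.

```python
import math

LEVEL = {"H": 1, "M": 0, "L": -1}   # assumed numeric coding of H/M/L

def combine(p1, p2, rule):
    x1, x2 = LEVEL[p1], LEVEL[p2]
    if rule == "compensatory":      # more of one offsets less of the other
        return x1 + x2
    if rule == "conjunctive":       # both proficiencies are needed
        return min(x1, x2)
    if rule == "disjunctive":       # either proficiency suffices
        return max(x1, x2)
    raise ValueError(rule)

def p_right(effective_theta, b=0.0):
    return 1.0 / (1.0 + math.exp(-(effective_theta - b)))

# With Proficiency 1 = H and Proficiency 2 = L, the three rules disagree:
# the conjunctive model is dragged down by the weak skill, the disjunctive
# model is carried by the strong one, and the compensatory model balances out.
for rule in ("compensatory", "conjunctive", "disjunctive"):
    print(rule, round(p_right(combine("H", "L", rule)), 2))
```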
Mixed Number Subtraction

This example is drawn from the research of Tatsuoka (1983) and her colleagues; Almond and Mislevy (2012) did the analysis. Their work began with cognitive analyses of middle-school students' solutions of mixed-number subtraction problems. Klein et al. (1981) identified two methods that students used to solve problems in this domain:

• Method A: Convert mixed numbers to improper fractions, subtract, then reduce if necessary.
• Method B: Separate mixed numbers into whole number and fractional parts; subtract as two subproblems, borrowing one from the minuend whole number if necessary; then simplify and reduce if necessary.

Their analysis concerns the responses of 325 students Tatsuoka identified as using Method B to fifteen items in which it is not necessary to find a common denominator. The items are grouped in terms of which of the following procedures is required for a solution under Method B:

Skill 1: Basic fraction subtraction.
Skill 2: Simplify/reduce fraction or mixed number.
Skill 3: Separate whole number from fraction.
Skill 4: Borrow one from the whole number in a given mixed number.
Skill 5: Convert a whole number to a fraction.

All models are conjunctive.

Learning Parameters From Data

Learning From Complete Data

We use Dirichlet distributions to represent our belief about the parameters. In our hypothetical prior sample,

– a11 is the number of times Θ took its first value.
– b11 is the number of times Θ took its second value.
– a21 is the number of times I took its first value when Θ took its first value.
– b21 is the number of times I took its second value when Θ took its first value.

Suppose the prior counts are a11 = b11 = 2 and a21 = b21 = 1, and we observe the data below:

Θ  I
1  1
1  1
1  2
2  1
2  1
2  2
2  2
2  2

a11 ← a11 + 3 = 2 + 3 = 5
b11 ← b11 + 5 = 2 + 5 = 7
P(Θ1) = 5/12

a21 ← a21 + 2 = 1 + 2 = 3
b21 ← b21 + 1 = 1 + 1 = 2
P(I1 | Θ1) = 3/5

(A sketch reproducing these counts appears at the end of this section.)

But often we don't have data on the proficiency:

Θ  I
?  1
?  1
?  2
?  1
?  1
?  2
?  2
?  2

We then use algorithms that learn when there is missing data:

– Markov Chain Monte Carlo (MCMC).
– Expectation Maximization (EM).

Influence Diagrams

Standard IRT

In traditional applications of IRT there usually is one proficiency Θ and a set of items. A normal prior is placed on Θ. The parameters a and b in the logistic function are learned from data. The model is then used to do inference for the next case.
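The complete-data update above is just counting. This short sketch reproduces the slide's numbers from the table of eight complete cases and the stated prior counts.

```python
# Beta/Dirichlet counts from the hypothetical prior sample.
a11, b11 = 2, 2      # prior counts for Theta = 1 and Theta = 2
a21, b21 = 1, 1      # prior counts for I = 1 and I = 2 given Theta = 1

data = [(1, 1), (1, 1), (1, 2), (2, 1), (2, 1), (2, 2), (2, 2), (2, 2)]

a11 += sum(1 for t, i in data if t == 1)             # 2 + 3 = 5
b11 += sum(1 for t, i in data if t == 2)             # 2 + 5 = 7
a21 += sum(1 for t, i in data if t == 1 and i == 1)  # 1 + 2 = 3
b21 += sum(1 for t, i in data if t == 1 and i == 2)  # 1 + 1 = 2

print("P(Theta_1) =", a11, "/", a11 + b11)        # 5/12
print("P(I_1 | Theta_1) =", a21, "/", a21 + b21)  # 3/5
```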
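As an illustration of the EM idea (a generic sketch of my own, not the slides' algorithm details), the following code runs EM on the incomplete table above: the E-step computes P(Θ = 1 | I) for each case under the current parameters, and the M-step re-estimates the parameters from the resulting expected counts. With a single observed child, the model is not identified from these data; the sketch only shows the mechanics.

```python
# Observed I values from the incomplete table; Theta is hidden.
I = [1, 1, 2, 1, 1, 2, 2, 2]

p  = 0.6   # current estimate of P(Theta = 1); arbitrary start
q1 = 0.7   # current estimate of P(I = 1 | Theta = 1)
q2 = 0.3   # current estimate of P(I = 1 | Theta = 2)

for _ in range(50):
    # E-step: responsibility r = P(Theta = 1 | I = i) under current parameters.
    rs = []
    for i in I:
        l1 = p * (q1 if i == 1 else 1 - q1)
        l2 = (1 - p) * (q2 if i == 1 else 1 - q2)
        rs.append(l1 / (l1 + l2))
    # M-step: re-estimate the parameters from the expected counts.
    p  = sum(rs) / len(rs)
    q1 = sum(r for r, i in zip(rs, I) if i == 1) / sum(rs)
    q2 = sum(1 - r for r, i in zip(rs, I) if i == 1) / sum(1 - r for r in rs)

print(round(p, 3), round(q1, 3), round(q2, 3))
```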
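Finally, a rough illustration of learning a and b from data. This sketch is mine and simplifies by treating the abilities as observed; real calibration marginalizes the latent Θ out, for example with EM or MCMC. Gradient ascent on the log-likelihood recovers the parameters from simulated responses.

```python
import math, random

random.seed(0)
true_a, true_b = 1.5, 0.5

# Simulate 500 examinees: ability from a standard normal prior, then a
# Bernoulli response from the logistic model with the true a and b.
data = []
for _ in range(500):
    theta = random.gauss(0, 1)
    p = 1 / (1 + math.exp(-true_a * (theta - true_b)))
    data.append((theta, 1 if random.random() < p else 0))

# Gradient ascent on the log-likelihood, treating theta as observed.
a, b, lr = 1.0, 0.0, 0.05
for _ in range(2000):
    ga = gb = 0.0
    for theta, x in data:
        p = 1 / (1 + math.exp(-a * (theta - b)))
        ga += (x - p) * (theta - b)   # d logL / da
        gb += (x - p) * (-a)          # d logL / db
    a += lr * ga / len(data)
    b += lr * gb / len(data)

print(round(a, 2), round(b, 2))   # should land near 1.5 and 0.5
```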