Empirical tests

Chi-square: Suppose we have n observations that fall into k categories (for the dice problem we had 11 categories). Let y(I) be the number of observations that fall into category I, I = 1, …, k, and let p(I) be the probability that an observation falls into category I. Consider

V = sum over 1 <= I <= k of (y(I) - n p(I))^2 / (n p(I))

or, after some manipulation,

V = (1/n) * (sum over 1 <= I <= k of y(I)^2 / p(I)) - n.

In the tables we use k-1 degrees of freedom. In the attached table, the 5 percent entry in row 10, which is 18.31, says that we will have V > 18.31 only 5 percent of the time. If V is much too low, the observed values are so close to the expected values that the numbers cannot be regarded as random. On the other hand, if V is too high, the numbers are not random either.

The table entries are approximations and are only good if n is sufficiently large. Essentially we want n p(I) to be at least 5 for every category. If the dice are actually biased, that fact will be detected as n gets larger, i.e. with more throws. But large values of n tend to smooth out locally nonrandom behavior, i.e. blocks of numbers with a strong bias followed by blocks of numbers with the opposite bias.

The chi-square test can be summarized as follows. A fairly large number n of independent observations is made. We count the number of observations falling into each of the k categories and compute V as above. Then V is compared to the numbers in the table with k-1 degrees of freedom. If V is less than the 99% entry or greater than the 1% entry, we reject the numbers as not sufficiently random. If V lies between the 99% and 95% entries or between the 5% and 1% entries, the numbers are suspect. If V lies between the 95% and 90% entries or between the 10% and 5% entries, the numbers might be "almost suspect". The test for any one random number generator is done at least 3 times on different sets of data (for example with different seeds), and if at least 2 of the 3 results are suspect, the numbers are regarded as not sufficiently random.

Kolmogorov-Smirnov: What happens if the data do not fall neatly into categories but form a continuum, for example any real number between 0 and 1? We then look at the distribution function F(x) of a random quantity z, i.e.

F(x) = probability that (z <= x).

If we take n independent observations of z, getting values z(1), z(2), …, z(n), we can form the empirical distribution

F(n,x) = (number of the z's which are <= x) / n.

The Kolmogorov-Smirnov test can be used when F(x) has no jumps, and it is based on the difference between F(x) and F(n,x). Let

K(+,n) = sqrt(n) * max over x of (F(n,x) - F(x))
K(-,n) = sqrt(n) * max over x of (F(x) - F(n,x))

Then look up the K's in the table. The table is exact for any value of n.

Algorithm:
1. Get n observations z(1), …, z(n).
2. Sort them in ascending order so that z(1) <= z(2) <= … <= z(n).
3. Compute K(+,n) = sqrt(n) * max over j of (j/n - F(z(j))).
4. Compute K(-,n) = sqrt(n) * max over j of (F(z(j)) - (j-1)/n).

The fact that all n observations have to be remembered and sorted leads us to prefer comparatively small values of n, say 1000.

Since the K-S test applies to distributions having no jumps, while the chi-square test applies to distributions having nothing but jumps, the two tests are intended for different applications. But we can apply the chi-square test to a continuous F if we divide the domain into k parts and ignore all variations within each part. Since the chi-square test is intrinsically less accurate and requires comparatively large values of n, the K-S test has several advantages when a continuous distribution is to be tested.

Example: Assume we have the numbers
0.3 0.1 0.5 0.7 0.2 0.9 0.6
We arrange them as
0.1 0.2 0.3 0.5 0.6 0.7 0.9
If this were uniform we would expect something like
0.125 0.25 0.375 0.5 0.625 0.75 0.875
The largest difference between an observed and an expected value is 0.075 (0.3 against 0.375), so we would have K = .075.

A. Frequency test: two possibilities (chi-square or Kolmogorov-Smirnov).

For example, if d = 3 and the Y's are 021011200, the counts for the categories (0,1,2) are (4,3,2), so chi-square gives
V = ((4-3)*(4-3) + (3-3)*(3-3) + (2-3)*(2-3))/3 = 2/3.
Since k = 3, there are k-1 = 2 degrees of freedom. We look at line 2 in the table and see that V is between the 75% and 50% entries, which is acceptable.

Homework: Do a chi-square frequency test on the dice problem.

If the Y's were 020011100, the counts per category would be (5,3,1) and V would be (4+0+4)/3 = 2.67, which lies between the 50% and 25% entries, close to the 25% entry.
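The two frequency-test calculations above can be checked with a few lines of Python. This is only a minimal sketch: the function names chi_square_statistic and frequency_counts are made up for illustration and are not part of the lecture. It assumes each of the d digits is equally likely (probability 1/d).

```python
from collections import Counter


def chi_square_statistic(counts, probs):
    """Return V = sum of (y - n*p)^2 / (n*p) over all categories."""
    n = sum(counts)
    return sum((y - n * p) ** 2 / (n * p) for y, p in zip(counts, probs))


def frequency_counts(digits, d):
    """Count how often each of the values 0..d-1 occurs in digits."""
    tally = Counter(digits)
    return [tally[value] for value in range(d)]


d = 3  # number of categories (digits 0, 1, 2), assumed equally likely
for text in ("021011200", "020011100"):
    digits = [int(ch) for ch in text]
    counts = frequency_counts(digits, d)
    v = chi_square_statistic(counts, [1 / d] * d)
    print(text, counts, round(v, 2))
```

The value of V still has to be compared by hand against the table row for k-1 = 2 degrees of freedom. The sketch only computes the statistic.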
For Kolmogorov, using the sequence 021011200 (counts (4,3,2)), we would get the cumulative relative frequencies
F(x<=0) = 4/9
F(x<=1) = 7/9
against the theoretical values 3/9 and 6/9, so the Smirnov number (the maximum deviation) is 1/9.

Homework: Do a Smirnov test on the frequency data from the dice.

Serial test: Consider pairs of successive values such as (Y(2j), Y(2j+1)). For 020011100, counting every consecutive pair (8 pairs in all) gives the following matrix of counts, with rows indexed by the first value of the pair and columns by the second:
2 1 1
1 2 0
1 0 0
We apply the chi-square test with k = d*d categories and probability 1/(d*d) for each category. In our case, taking the expected count in each cell as approximately 1, we get V = 5, which is between the 95% and 75% entries on line 8 (k-1 = 8 degrees of freedom). (Actually we need much more data for this test, something like n > 5 d*d.)

Homework: Do a serial test on the dice problem with a chi-square statistic.

Gap test: Assume we have 3 numbers and the sequence is
1 2 3 2 1 3 1 3 2 1 2 3 2 1 3.
For the 1's the gaps are 3 1 2 3.
What are they for the 2's? 1 4 1 1
For the 3's? 2 1 3 2

How many gaps are there of each length?
length 0: 0
length 1: 5
length 2: 3
length 3: 3
length 4: 1

In theory the probability of a gap of length 0 is 1/3.
A gap of length 1 has probability 2/3*1/3 = 2/9.
A gap of length 2 has probability 2/3*2/3*1/3 = 4/27.
A gap of length 3 has probability p(no 1)p(no 1)p(no 1)p(1) = 2/3*2/3*2/3*1/3 = 8/81.

So for n different types of things, the probability of a gap of length k is ((n-1)/n)^k * (1/n), and the probability of a gap of length less than k is
sum (from i=0 to k-1) of ((n-1)/n)^i * (1/n) = 1 - ((n-1)/n)^k.
For n = 3 we get
less than 1: 1 - 2/3 = 1/3
less than 2: 1 - (2/3)(2/3) = 5/9
less than 3: 1 - (2/3)(2/3)(2/3) = 19/27

So for our data, with 12 gaps in all, we should have had 1/3*12 = 4 gaps of length 0, 5/9*12 or about 7 gaps of length 0 or 1, and 19/27*12 or between 8 and 9 gaps of length 0, 1 or 2. Doing a Smirnov test on the cumulative relative frequencies, the worst deviation is at gap length 0 (observed 0 of 12, expected 4 of 12), so the Smirnov number is about .33.

Algorithm:
(1) Specify the theoretical frequency; for dice set n = 6.
(2) Arrange the observed sample of gaps in a cumulative distribution.
(3) Find the maximum deviation from theory and look it up in the table.

Homework: Do a gap test and compute the Smirnov statistic on the dice.

Poker tests: Assume we take numbers that are 3 digits long. What is the probability that the digits are all different, that they are all the same, or that there is a pair? Imagine they are three consecutive dice rolls. Of the 6*6*6 = 216 possible outcomes, the theoretical number that are all the same is 6, the number that are all different is 6*5*4 = 120, and the number with exactly one pair is 6*6*6 - 126 = 90.
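The poker-test counts for three dice rolls can be checked by brute force, by listing all 6*6*6 = 216 possible outcomes and classifying each one. The following Python sketch does this. The names classify_three and poker_counts are hypothetical helpers introduced only for this illustration, and the classification rule is valid only for hands of exactly three rolls.

```python
from collections import Counter
from itertools import product


def classify_three(rolls):
    """Classify a hand of exactly three rolls by its number of distinct values."""
    distinct = len(set(rolls))
    if distinct == 1:
        return "all same"
    if distinct == 3:
        return "all different"
    return "one pair"  # two distinct values among three rolls


def poker_counts(faces=6):
    """Count each hand type over all faces**3 equally likely outcomes."""
    hands = product(range(1, faces + 1), repeat=3)
    return Counter(classify_three(hand) for hand in hands)


counts = poker_counts()
total = sum(counts.values())
for name in ("all same", "all different", "one pair"):
    print(name, counts[name], round(counts[name] / total, 4))
```

Dividing each count by 216 gives the theoretical probabilities to use as the p(I) in a chi-square test on observed hands.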