Empirical tests:
Chi-square:
Suppose we make n observations, each falling into one of k categories
(for the dice problem we had 11 categories). Let y(I), I=1,…,k, be the
number of observations in the Ith category, and let p(I) be the
probability that the Ith category will be chosen.
Consider
V = sum over 1<=I<=k of (y(I) - n p(I))^2 / (n p(I))
Or after some manipulation
V = (1/n) * (sum over 1<=I<=k of y(I)^2 / p(I)) - n.
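The manipulation can be spelled out as follows, writing y_I for y(I) and p_I for p(I), and using the facts that the counts y(I) sum to n and the probabilities p(I) sum to 1:

```latex
\[
\begin{aligned}
V &= \sum_{I=1}^{k} \frac{(y_I - n p_I)^2}{n p_I}
   = \sum_{I=1}^{k} \frac{y_I^2 - 2 n p_I y_I + n^2 p_I^2}{n p_I} \\
  &= \frac{1}{n}\sum_{I=1}^{k} \frac{y_I^2}{p_I}
     - 2\sum_{I=1}^{k} y_I + n\sum_{I=1}^{k} p_I \\
  &= \frac{1}{n}\sum_{I=1}^{k} \frac{y_I^2}{p_I} - 2n + n
   = \frac{1}{n}\sum_{I=1}^{k} \frac{y_I^2}{p_I} - n .
\end{aligned}
\]
```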
In the tables we use k-1 degrees of freedom.
In the attached table, the 5 percent entry in row 10, 18.31,
says that we will have V>18.31 only 5 percent of the time.
If V is much too low, the observed values are so close to the
expected values that the numbers cannot be considered random.
On the other hand, if V is too high, it is not random either.
The table entries are approximations and are only good if n is
sufficiently large. Essentially we want n p(I) to be at least 5 for every I.
If the dice are actually biased, that fact would be detected as n gets
larger, i.e. with more throws. But large values of n will tend to smooth
out locally nonrandom behavior, i.e. blocks of numbers with a
strong bias followed by blocks of numbers with the opposite bias.
The chi square test can be summarized as follows: A fairly large
number n of independent observations is made.
We count the number of observations falling into each of k
categories and compute V from above.
Then V is compared to the numbers in the table with k-1 degrees of
freedom. If V is less than the 99% entry or greater than the 1% entry,
we reject the numbers as not sufficiently random.
If V lies between the 99% and 95% entries or between the 5% and 1%
entries, the numbers are suspect; between the 95% and 90% entries or
between the 10% and 5% entries, the numbers might be "almost suspect".
The test for any one random number generator is done at least 3 times
on different sets of data (for example, with different seeds), and if at
least 2 of the 3 results are suspect the numbers are regarded as not
sufficiently random.
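The decision rule just described can be sketched in a few lines. This is only an illustration: the names classify, generator_fails and ROW_10 are hypothetical, the table is assumed to list the value that V exceeds a given percentage of the time, and of the numbers in ROW_10 only 18.31 is quoted in the lecture (the rest are assumed from a standard chi-square table for 10 degrees of freedom).

```python
def classify(v, entries):
    """Classify a chi-square value V against one row of the table.

    `entries` maps a percentage (99, 95, 90, 10, 5, 1) to the value that V
    exceeds that percentage of the time, so entries[99] is the smallest
    value in the row and entries[1] the largest.
    """
    if v < entries[99] or v > entries[1]:
        return "reject"
    if v < entries[95] or v > entries[5]:
        return "suspect"
    if v < entries[90] or v > entries[10]:
        return "almost suspect"
    return "ok"


def generator_fails(verdicts):
    """2-out-of-3 rule: fail if at least 2 runs are suspect (or rejected)."""
    bad = sum(1 for verdict in verdicts if verdict in ("reject", "suspect"))
    return bad >= 2


# Row for 10 degrees of freedom. Only 18.31 comes from the lecture; the
# other values are an assumption taken from a standard chi-square table.
ROW_10 = {99: 2.558, 95: 3.940, 90: 4.865, 10: 15.99, 5: 18.31, 1: 23.21}
```

For example, classify(19.0, ROW_10) falls between the 5% and 1% entries and is reported as suspect.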
Kolmogorov-Smirnov:
What happens if the data do not fall neatly into categories but form a
continuum, e.g. any real number between 0 and 1? We
then look at F(x), the distribution function of a random quantity z,
i.e.
F(x) = probability that (z <= x).
If we take n independent observations of z, getting values z(1),
z(2),…, z(n), we can form the empirical distribution F(n,x) =
(number of the z’s which are <= x)/n
The Kolmogorov-Smirnov test can be used when F(x) has no
jumps and is based on the difference between F(x) and F(n,x).
Let K(+,n) = sqrt(n) max(over x) (F(n,x) - F(x))
and K(-,n) = sqrt(n) max(over x) (F(x) - F(n,x)).
Then look up the K’s in the table. The table is exact for any value of
n.
Algorithm:
1. Get n observations z(1),…, z(n).
2. Sort them in ascending order z(1) <= z(2) <= … <= z(n).
3. Compute K(+,n) = sqrt(n) max(over j) (j/n - F(z(j)))
4. Compute K(-,n) = sqrt(n) max(over j) (F(z(j)) - (j-1)/n)
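The four steps can be sketched in Python as below. This is a minimal illustration, not a reference implementation: the names ks_statistics and uniform_cdf are hypothetical, the results are scaled by sqrt(n) to match the definitions of K(+,n) and K(-,n), and the two-value sample is made up purely to show the call.

```python
import math


def ks_statistics(observations, cdf):
    """Return (K_plus, K_minus) for the observations against the CDF `cdf`."""
    z = sorted(observations)              # step 2: ascending order
    n = len(z)
    # Step 3: largest amount by which the empirical step j/n exceeds F.
    k_plus = math.sqrt(n) * max(j / n - cdf(z[j - 1]) for j in range(1, n + 1))
    # Step 4: largest amount by which F exceeds the lower step (j-1)/n.
    k_minus = math.sqrt(n) * max(cdf(z[j - 1]) - (j - 1) / n for j in range(1, n + 1))
    return k_plus, k_minus


def uniform_cdf(x):
    """F(x) = x on [0, 1], the target for random fractions between 0 and 1."""
    return min(max(x, 0.0), 1.0)


# Hypothetical two-value sample, used only to demonstrate the call.
k_plus, k_minus = ks_statistics([0.75, 0.25], uniform_cdf)
```

Only the sorted values are needed, which is why every observation has to be kept in memory.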
The fact that all n observations are to be remembered and sorted
leads us to prefer comparatively small values of n, say 1000.
Since the K-S test applies to distributions having no jumps while
the chi-square applies to distributions having nothing but jumps, the
two tests are intended for different applications. But we can apply
the chi square to continuous F if we divide the domain into k parts
and ignore all variations within each part. Since the chi-square test
is intrinsically less accurate and since it requires comparatively
large values of n, the KS test has several advantages when a
continuous distribution is to be tested.
Assume we have the numbers
0.3 0.1 0.5 0.7 0.2 0.9 0.6
We arrange them as
0.1 0.2 0.3 0.5 0.6 0.7 0.9
If this was uniform we would expect something like
0.125 0.25 0.375 0.5 0.625 0.75 0.875
The largest difference is at 0.3 against 0.375, so we would have K=.075
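The hand comparison can be checked with a short sketch. It assumes the evenly spaced reference values are j/(n+1) for j = 1..n (that is, 1/8, 2/8, ..., 7/8 here) and measures the largest absolute difference, which gives the .075 found by hand. Note that this is the rough comparison used in the example, not the sqrt(n)-scaled K(+,n) and K(-,n) of the algorithm. The function name is hypothetical.

```python
def max_gap_from_even_spacing(values):
    """Largest |sorted value - j/(n+1)| over the sample."""
    z = sorted(values)
    n = len(z)
    expected = [j / (n + 1) for j in range(1, n + 1)]   # 1/8, 2/8, ..., 7/8 for n = 7
    return max(abs(a - b) for a, b in zip(z, expected))


deviation = max_gap_from_even_spacing([0.3, 0.1, 0.5, 0.7, 0.2, 0.9, 0.6])
print(round(deviation, 3))
```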
A. Frequency test: 2 possibilities
For example if d=3 with the Y’s being 021011200, the numbers for
the categories (0,1,2) are (4,3,2), so
chi square gives:
V = ((4-3)*(4-3) + (3-3)*(3-3) + (2-3)*(2-3))/3 = 2/3. Since k=3, there
are k-1 = 2 degrees of freedom; we look at line 2 in the table and see
that V is between the 75% and 50% entries: acceptable.
Homework: Do a chi square frequency test on the dice problem.
If the Y’s were 020011100 the numbers per category would be
(5,3,1)
and V would be (4+0+4)/3 = 2.67, or in the 25% grouping.
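Both hand calculations can be reproduced with a small sketch, assuming each of the d digits is equally likely so that the expected count per category is n/d. The function name chi_square_v is hypothetical.

```python
from collections import Counter


def chi_square_v(digits, d):
    """Chi-square statistic V for a digit string with d equally likely digits."""
    n = len(digits)
    counts = Counter(int(ch) for ch in digits)
    expected = n / d                       # n * p(I) with p(I) = 1/d
    return sum((counts.get(i, 0) - expected) ** 2 / expected for i in range(d))


v_first = chi_square_v("021011200", 3)     # counts (4, 3, 2)
v_second = chi_square_v("020011100", 3)    # counts (5, 3, 1)
```

With only 9 digits the expected count per category is 3, below the "at least 5" guideline, so these values only illustrate the arithmetic.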
For Kolmogorov-Smirnov on the first sequence (counts 4,3,2) we would get the numbers:
Cumulative relative frequency
F(x<=0) = 4/9 (theory 3/9)
F(x<=1) = 7/9 (theory 6/9)
Smirnov number = 1/9
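The Smirnov number here is the largest gap between the observed and the theoretical cumulative relative frequencies. A minimal sketch with exact fractions, assuming the counts (4,3,2) of the sequence 021011200 and probability 1/3 per digit (the function name is hypothetical):

```python
from fractions import Fraction
from itertools import accumulate


def smirnov_number(counts, probabilities):
    """Largest |observed - theoretical| cumulative relative frequency."""
    n = sum(counts)
    observed = accumulate(Fraction(c, n) for c in counts)
    theory = accumulate(probabilities)
    return max(abs(o - t) for o, t in zip(observed, theory))


third = Fraction(1, 3)
d_first = smirnov_number([4, 3, 2], [third, third, third])   # 4/9 vs 3/9, 7/9 vs 6/9
```

The same call with the counts (5,3,1) of 020011100 compares 5/9 with 3/9 and 8/9 with 6/9.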
Homework: Do a Smirnov test on the frequency data from the dice problem.
Serial test
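Consider pairs (Y(2j),Y(2j+1)). For 020011100, counting the 8
successive pairs (Y(j),Y(j+1)) gives the matrix (row = first digit,
column = second digit)
2 1 1
1 2 0
1 0 0
and we apply the chi square test with k = d*d categories with
probability 1/(d*d) for each category.
In our case, taking the expected count per category as about 1, we get
V = 5, which is between the 95% and 75% entries on line 8 (with the
exact expected count 8/9, V = 5.5, between the 75% and 50% entries).
(Actually we need much more data for this test, like n > 5 d*d.)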
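The matrix can be rebuilt with the sketch below. It assumes, as the matrix entries suggest, that all 8 overlapping successive pairs of 020011100 are counted (row = first digit, column = second digit). With the exact expected count of 8/9 per cell the statistic comes out as 5.5; treating the expected count as 1 gives the value 5 used in the hand calculation. The function names are hypothetical.

```python
def pair_counts(digits, d):
    """d-by-d matrix of counts of successive pairs (Y(j), Y(j+1))."""
    matrix = [[0] * d for _ in range(d)]
    values = [int(ch) for ch in digits]
    for first, second in zip(values, values[1:]):
        matrix[first][second] += 1
    return matrix


def chi_square_from_matrix(matrix):
    """Chi-square V with every one of the d*d cells equally likely."""
    cells = [count for row in matrix for count in row]
    expected = sum(cells) / len(cells)
    return sum((count - expected) ** 2 / expected for count in cells)


matrix = pair_counts("020011100", 3)
v_exact = chi_square_from_matrix(matrix)                               # expected count 8/9
v_rounded = sum((count - 1) ** 2 for row in matrix for count in row)   # expected count taken as 1
```

Overlapping pairs are not independent of one another, which is one reason to prefer the non-overlapping pairs (Y(2j),Y(2j+1)) when enough data are available.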
Homework: Do a serial test on the dice problem with a chi-square
statistic.
Gap test
Assume we have 3 numbers and the sequence is
1 2 3 2 1 3 1 3 2 1 2 3 2 1 3.
For the 1’s the gaps are 3 1 2 3
What are they for the 2’s? 1 4 1 1
For the 3’s? 2 1 3 2
How many gaps of each length?
Length  Count
0       0
1       5
2       3
3       3
4       1
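The gap lists and the tally can be reproduced with a short sketch. It takes a gap to be the number of other values between two successive occurrences of the same value, which matches the hand counts; the function name gaps is hypothetical.

```python
from collections import Counter


def gaps(sequence, value):
    """Number of other values between successive occurrences of `value`."""
    positions = [i for i, x in enumerate(sequence) if x == value]
    return [later - earlier - 1 for earlier, later in zip(positions, positions[1:])]


sequence = [1, 2, 3, 2, 1, 3, 1, 3, 2, 1, 2, 3, 2, 1, 3]
all_gaps = [g for value in (1, 2, 3) for g in gaps(sequence, value)]
tally = Counter(all_gaps)   # gap length -> how many times it occurs
```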
In theory the probability of a gap of length 0 is 1/3
A gap of length 1 is 2/3*1/3 = 2/9
A gap of length 2 is 2/3*2/3*1/3 = 4/27
In theory a gap of length 3 is
p(no 1)p(no 1)p(no 1)p(1) = 2/3*2/3*2/3*1/3 = 8/81
So the probability, for n different types of things, of a gap of length k is
((n-1)/n)^k (1/n)
So the probability of a gap of length less than k is
sum(from i=0 to k-1) ((n-1)/n)^i (1/n) = 1 - (1-1/n)^k
For n=3 we get, for less than 1: 1-2/3 = 1/3
less than 2: 1-(2/3)(2/3) = 5/9
So for our data of 12 gaps we should have had 1/3*12 = 4 gaps of length 0;
gaps of 0 or 1: we should have had 5/9*12, or about 7;
gaps of 0, 1, 2: we should have had 19/27*12, or between 8 and 9.
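These probabilities and expected counts can be checked with exact fractions. The sketch assumes gap lengths start at 0, so that a gap of length k means k misses followed by one hit; the function names are hypothetical.

```python
from fractions import Fraction


def prob_gap(k, n):
    """Probability of a gap of exactly length k with n equally likely values."""
    return Fraction(n - 1, n) ** k * Fraction(1, n)


def prob_gap_less_than(k, n):
    """Probability of a gap shorter than k, summed term by term."""
    return sum(prob_gap(i, n) for i in range(k))


# Expected number of gaps shorter than 1, 2 and 3 among 12 gaps with n = 3.
expected_counts = [12 * prob_gap_less_than(k, 3) for k in (1, 2, 3)]
```

The term-by-term sum agrees with the closed form 1 - (2/3)^k for n = 3, and the expected counts come out as 4, 20/3 and 76/9.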
Doing a Smirnov test we get the worst deviation at 0 (none observed
against 4 of 12 expected); therefore the Smirnov number is about .33
Algorithm
(1) Specify the theoretical frequency; for dice set n=6
(2) Arrange the observed sample of gaps in a cumulative
distribution
(3) Find the maximum deviation from theory
(4) Look at the table
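Steps (1) to (3) can be sketched as follows, applied to the 12 gaps tallied in the three-number example (n_values = 3 there, 6 for dice). The function name is hypothetical, and the theoretical cumulative probability of a gap of length at most L is taken as 1 - ((n-1)/n)^(L+1). For that example the largest deviation is at gap 0, where nothing was observed against a theoretical 1/3.

```python
from fractions import Fraction


def gap_smirnov_number(gap_tally, n_values):
    """Largest |observed - theoretical| cumulative frequency over gap lengths."""
    total = sum(gap_tally.values())
    miss = Fraction(n_values - 1, n_values)          # chance a draw is not the value
    seen = 0
    worst = Fraction(0)
    for length in range(max(gap_tally) + 1):
        seen += gap_tally.get(length, 0)
        observed = Fraction(seen, total)             # step (2): cumulative observed
        theory = 1 - miss ** (length + 1)            # step (1): P(gap <= length)
        worst = max(worst, abs(observed - theory))   # step (3): maximum deviation
    return worst


tally = {1: 5, 2: 3, 3: 3, 4: 1}   # gap length -> count, from the example sequence
d = gap_smirnov_number(tally, 3)
```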
Homework: Do a gap test with a Smirnov statistic on the dice problem.
Poker tests: Assume we take numbers that are 3 digits long. What is
the probability that they are all different, that they are all the same,
or that there is a pair? Imagine they are three consecutive dice rolls. Of
the 6*6*6 = 216 possible outcomes, the theoretical number that are all
the same is 6, the number that are all different is 6*5*4 = 120, and the
number with exactly one pair is 6*6*6-126 = 90.
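The three counts can be confirmed by brute force over all 6*6*6 = 216 ordered triples of dice rolls. This is a sketch for checking the arithmetic only; the dictionary keys are labels chosen here.

```python
from itertools import product

counts = {"all same": 0, "one pair": 0, "all different": 0}
for roll in product(range(1, 7), repeat=3):
    distinct = len(set(roll))      # number of different faces among the three rolls
    if distinct == 1:
        counts["all same"] += 1
    elif distinct == 2:
        counts["one pair"] += 1
    else:
        counts["all different"] += 1
```

Dividing each count by 216 gives the theoretical probabilities to use as p(I) in a chi-square test with 3 categories.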