How To Conduct Good Experiments?
Ernesto Costa
DEI/CISUC
[email protected]
http://www.dei.uc.pt/~ernesto
© 2003 Ernesto Costa
Evonet Summer School - Parma
Summary
What is the goal of this talk?
Background
Probabilities
Random Variables and Probability distributions
Inferential Statistics
Applying the Theory
Conclusions
What is the goal of this talk?
I don’t know! I have been asked to give a talk on
that subject…
I do know!
EC is very much an experimental discipline
Most of our work is to compare things
Algorithms
Parameter settings
What is a fair comparison?
What is the goal of this talk?
Looking for EC papers
One problem
One run
Several runs
10, 20, 30?
Use average values
Use average of the bests
Use the mean
Use the mean and the standard deviation
Use Confidence Levels / Intervals
What is the goal of this talk?
What is a good experiment?
Identify independent and dependent variables
Mutation rate → fitness
Different crossover operators → fitness
Evolution and Learning → # of survivors
Identify the conditions of the experiment
Initial conditions
Number of runs
Parameter settings
Identify the kind of Statistics you will need
Descriptive
Inferential
Non parametric
Background
Probabilities
Experiment: procedure whose variable result
cannot be predicted ahead of time.
Tossing a coin, rolling a die
Sample Space: set of possible outcomes of an
experiment.
{Heads, Tails}
{1,2,3,4,5,6}
Event: subset of the sample space
{Heads}
{1,3,5}
Background
Probabilities
Probability of an Event
Measures the likelihood that the event will occur
Tossing a (fair) coin: probability(outcome=heads) =1/2
Axioms
$P(E) \geq 0$
$P(S) = 1$
For mutually exclusive events $E_1, E_2, \ldots$: $P\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} P(E_i)$
Background
Probabilities
Example
What is the probability that, when rolling two dice, the sum of the two outcomes equals 7?
1/6
Working Methodology
Experiment → Sample Space → Event → Prob. Assign.
[Figure: Two Dice Experiment. Number of outcomes (y-axis) for each value of the Sum (x-axis).]
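The working methodology for the two-dice example can be sketched in a few lines of Python. This is only an illustration, assuming two fair six-sided dice with all 36 ordered outcomes equally likely; the variable names are chosen here and are not part of the talk.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 ordered outcomes of rolling two fair dice.
sample_space = list(product(range(1, 7), repeat=2))

# Event: the two outcomes sum to 7.
event = [outcome for outcome in sample_space if sum(outcome) == 7]

# Probability assignment: every outcome is assumed equally likely.
p_sum_7 = Fraction(len(event), len(sample_space))
print(p_sum_7)  # 1/6
```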
Probabilities
Example: A family has two children. Knowing that one is a boy what
is the probability that they have two boys?
1/3
Definition: Let E and F be two events, with p(F) > 0.
The conditional probability of E given F, p(E|F), is defined as:
$$p(E \mid F) = \frac{p(E \cap F)}{p(F)}$$
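The two-children example can be checked by enumeration. The sketch below assumes that "one is a boy" means "at least one child is a boy" and that the four birth orders are equally likely; both are assumptions made here for the illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: (first child, second child), four equally likely outcomes (assumption).
families = list(product("BG", repeat=2))

# F: at least one child is a boy. E and F: both children are boys.
f = [fam for fam in families if "B" in fam]
e_and_f = [fam for fam in f if fam == ("B", "B")]

p_f = Fraction(len(f), len(families))
p_e_and_f = Fraction(len(e_and_f), len(families))

# Conditional probability p(E|F) = p(E and F) / p(F).
p_e_given_f = p_e_and_f / p_f
print(p_e_given_f)  # 1/3
```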
Probabilities
Example: A building has two lifts. One is used by 45% of the residents and the other by 55%. The first has problems 5% of the time, while the second has problems 8% of the time. Knowing that a lift had a problem, what is the probability that it was lift number 1?
33.8%
Theorem of Bayes:
$$p(A_1 \mid B) = \frac{p(B \mid A_1)\,p(A_1)}{p(B)} = \frac{p(B \mid A_1)\,p(A_1)}{p(B \mid A_1)\,p(A_1) + p(B \mid A_2)\,p(A_2)}$$
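As a quick check of the lift example, the arithmetic of Bayes' theorem can be written out directly. Here A1 and A2 stand for "lift 1 was used" and "lift 2 was used", and B for "the lift had a problem"; the variable names are illustrative only.

```python
# Prior probabilities: share of residents using each lift.
p_a1, p_a2 = 0.45, 0.55

# Likelihoods: probability of a problem given the lift used.
p_b_given_a1, p_b_given_a2 = 0.05, 0.08

# Total probability of a problem, p(B).
p_b = p_b_given_a1 * p_a1 + p_b_given_a2 * p_a2

# Posterior probability that the faulty lift was lift 1.
p_a1_given_b = p_b_given_a1 * p_a1 / p_b
print(f"{p_a1_given_b:.1%}")  # 33.8%
```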
Random Variables and Probability Distributions
Random Variables
Definition: A random variable, X, is a function from the
sample space of an experiment to the set of real
numbers.
[Figure: the random variable X maps each outcome s of the sample space S to a real number X(s) in S_X = {0, 1, 2, 3}.]
A RV is a function … and is not random!!!
Random Variables and Probability Distributions
Working Methodology
Experiment → Sample Space → Event → Prob. Assign. → Random Variable → Prob. Distribution
Example
Experiment: toss a coin 3 times
Sample Space: 8 possibilities
Event: # Heads
Random Variable: X(HHT) = 2
Prob. Assign.: $f(x_i) = p(X = x_i)$
Prob. Distrib.: $X \rightarrow f(x_i)$
Random Variables and Probability Distributions
Example: Suppose you toss a coin three times. Let X(t) denote the number of heads that appear when t is the outcome. Then X(t):
X(HHH) = 3
X(HHT) = X(HTH) = X(THH) = 2
X(TTH) = X(THT) = X(HTT) = 1
X(TTT) = 0
[Figure: Probability Distribution. $f(x_i)$ (y-axis, 0 to 0.4) for X = 0, 1, 2, 3 (x-axis).]
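The probability distribution of the number of heads can be derived by enumerating the eight outcomes. This is a minimal sketch assuming a fair coin, so that every outcome has probability 1/8; the function name `X` mirrors the slide, the rest is illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: the 8 equally likely outcomes of three tosses (fair coin assumed).
outcomes = ["".join(t) for t in product("HT", repeat=3)]

def X(t):
    """Random variable: number of heads in outcome t."""
    return t.count("H")

# Probability distribution f(x) = p(X = x).
counts = Counter(X(t) for t in outcomes)
f = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

for x, p in f.items():
    print(x, p)
```

The resulting values are 1/8, 3/8, 3/8 and 1/8 for X = 0, 1, 2, 3.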
Random Variables and Probability Distributions
Types of Random Variables
Discrete
Probability Mass Function
$P(X = x) = p(x) \geq 0$
$\sum_x p(x) = 1$
Continuous
Probability Density Function (pdf)
$f(x) \geq 0, \ \forall x$
$\int_{-\infty}^{\infty} f(x)\,dx = 1$
$P(a \leq X \leq b) = \int_a^b f(x)\,dx$
[Figure: density curve f(x), with two points x1 and x2 marked on the x-axis.]
Random Variables and Probability Distributions
Measures of Random Variables
Location
Mean
$E(X) = \mu = \sum_x x\,p(x)$
$E(X) = \mu = \int_{-\infty}^{\infty} x f(x)\,dx$
Dispersion
Variance
$V(X) = \sigma^2 = \sum_x (x - \mu)^2 p(x)$
$V(X) = \sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$
Standard Deviation
$\sigma = \sqrt{V(X)}$
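To make the discrete formulas concrete, the sketch below applies them to the number of heads in three tosses of a fair coin. The probability mass function is written out by hand here as an assumption of the example.

```python
from fractions import Fraction
from math import sqrt

# pmf of the number of heads in three fair coin tosses (assumed for illustration).
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

# Mean: sum of x * p(x).
mean = sum(x * p for x, p in pmf.items())

# Variance: sum of (x - mean)^2 * p(x).
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

# Standard deviation: square root of the variance.
std_dev = sqrt(variance)

print(mean, variance)  # 3/2 3/4
```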
Random Variables and Probability Distributions
Independence of Random Variables
Two random variables, X and Y, over the same sample space S, are said to be independent iff, for all real numbers $r_1$ and $r_2$:
$$p(X(s) = r_1 \wedge Y(s) = r_2) = p(X(s) = r_1) \cdot p(Y(s) = r_2)$$
Theorem of the Product (X and Y independent)
$$E(X \cdot Y) = E(X) \cdot E(Y)$$
Theorem of the Sum (X and Y independent)
$$V(X + Y) = V(X) + V(Y)$$
Random Variables and Probability Distributions
Discrete Probability Distributions
Binomial Distribution
Domain {0, 1, 2, …, n}
Probability mass function: $p(X = x_i) = p_i = C_i^n p^i q^{n-i}$, with $q = 1 - p$
Mean: $\mu = E(X) = \sum_{i=0}^{n} i\, C_i^n p^i q^{n-i} = np$
Variance: $\sigma^2 = V(X) = E(X^2) - E(X)^2 = npq$
[Figure: Binomial Distribution, two panels (P=0.3 and P=0.5). Probability (y-axis) against Values x (x-axis).]
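The closed forms for the mean and the variance can be checked numerically against the probability mass function. This sketch uses n = 12 and p = 0.3 as illustrative values; the function name `binomial_pmf` is hypothetical.

```python
from math import comb, isclose

def binomial_pmf(i, n, p):
    """Probability of exactly i successes in n independent trials."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)

n, p = 12, 0.3  # illustrative values
pmf = [binomial_pmf(i, n, p) for i in range(n + 1)]

# Mean and variance computed directly from the pmf.
mean = sum(i * p_i for i, p_i in enumerate(pmf))
variance = sum(i * i * p_i for i, p_i in enumerate(pmf)) - mean**2

# They agree with np and npq up to floating-point error.
print(isclose(mean, n * p), isclose(variance, n * p * (1 - p)))
```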
Random Variables and Probability Distributions
Discrete Probability Distributions
Poisson Distribution
Approximates the Binomial Distribution, with $\lambda = np$
Domain {0, 1, 2, 3, ...}
Probability mass function: $p(X = i) = p_i = e^{-\lambda} \frac{\lambda^i}{i!}$
Mean: $\lambda$
Variance: $\lambda$
[Figure: Poisson Distribution, two panels ($\lambda$=6 and $\lambda$=8.4). Probability (y-axis) against Values (x-axis).]
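How well the Poisson distribution approximates the binomial can be seen by comparing the two mass functions. The sketch assumes a large n and a small p (n = 1000, p = 0.006, so that np = 6); these values and the function names are chosen only for illustration.

```python
from math import comb, exp, factorial

def binomial_pmf(i, n, p):
    """Binomial probability of exactly i successes in n trials."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)

def poisson_pmf(i, lam):
    """Poisson probability of exactly i events with mean lam."""
    return exp(-lam) * lam**i / factorial(i)

n, p = 1000, 0.006  # many trials, small success probability (assumed values)
lam = n * p         # matching Poisson parameter

# Largest pointwise gap between the two mass functions over 0..20.
max_diff = max(abs(binomial_pmf(i, n, p) - poisson_pmf(i, lam)) for i in range(21))
print(f"largest difference: {max_diff:.5f}")
```

With these values the two distributions differ by well under 0.01 at every point.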
Random Variables and Probability Distributions
Continuous Probability Distributions
Normal (Gaussian) Distribution
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$
[Figure: density of N(3, 2), x from -4 to 10.]
Standard Normal Distribution
[Figure: density of N(0, 1), x from -3 to 3.]
Random Variables and Probability Distributions
Continuous Probability Distributions
Converting a normal distribution to a standard normal
distribution
X a random variable with
Mean $\mu$
Standard Deviation $\sigma$
Using a translation
Defining a new random variable
$$Z = \frac{X - \mu}{\sigma}$$
Random Variables and Probability Distributions
Continuous Probability Distributions
Student’s t-Distribution
Approximates the standard normal distribution N(0,1) as $\nu$ grows
$$f(x) = \frac{\left(\frac{\nu}{\nu + x^2}\right)^{\frac{\nu + 1}{2}}}{\sqrt{\nu}\; B\left(\frac{\nu}{2}, \frac{1}{2}\right)}$$
Degrees of freedom (df), $\nu$
Mean 0, $\nu > 1$
Variance $\nu/(\nu - 2)$, $\nu > 2$
[Figure: t densities for $\nu$ = 1, 5 and 10 compared with N(0,1), x from -3 to 3.]
Background
Statistics
Goal: to apply probability theory to data analysis
How?
Model the data (population) by means of a probability distribution
Use a sample of the data instead of the whole population
Estimate the population parameters ($\mu$, $\sigma$, $p$) using the corresponding sample statistics ($\bar{x}$, $s$, $\hat{p}$)
Population parameter → sample statistic
$\mu$ → $\bar{x}$
$\sigma$ → $s$
$p$ → $\hat{p}$
Background
Statistics
Unbiased estimator
A statistic whose mean value is equal to the population parameter being estimated
Point Estimators
Interval Estimators
Background
Sampling distribution of the sample mean and the Central Limit Theorem
Consider a population with mean $\mu$ and standard deviation $\sigma$. Let $\bar{X}$ denote the mean of the observations in random samples of size n. Then:
$\mu_{\bar{x}} = E(\bar{X}) = \mu$
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
When the population distribution is normal, the sampling distribution of $\bar{X}$ is also normal for any sample size n
(Central Limit Theorem) When n is sufficiently large (n > 30), the sampling distribution is well approximated by a normal curve, even if the population distribution is not itself normal
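The two properties of the sampling distribution can be observed in a small simulation. This sketch assumes an exponential population with mean 1 and standard deviation 1 (clearly not normal), a sample size of 40 and 5000 repeated samples; all of these are illustrative choices, and the seed is fixed only to make the run repeatable.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 40                  # size of each sample
num_samples = 5000      # number of repeated samples

# Population: exponential with rate 1, so mu = 1 and sigma = 1 (assumed for illustration).
sample_means = [
    sum(rng.expovariate(1.0) for _ in range(n)) / n
    for _ in range(num_samples)
]

# The mean of the sample means should be close to mu,
# and their standard deviation close to sigma / sqrt(n).
observed_mean = mean(sample_means)
observed_sd = stdev(sample_means)
expected_sd = 1.0 / sqrt(n)
print(f"mean of sample means: {observed_mean:.3f}")
print(f"sd of sample means:   {observed_sd:.3f} (sigma/sqrt(n) = {expected_sd:.3f})")
```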
Background
Sampling distribution of the sample mean
Unbiased estimators
Mean
$\mu_{\bar{x}} = E(\bar{X}) = \mu$
Standard Deviation
$$\hat{\sigma}_x = s = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{n - 1}}$$
(n-1) are the degrees of freedom (df)
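The sample standard deviation with its (n-1) divisor can be computed by hand and compared with Python's `statistics.stdev`, which uses the same divisor. The data values below are made up for the illustration.

```python
from math import sqrt
from statistics import mean, stdev

data = [4, 7, 5, 9, 5]  # hypothetical observations
n = len(data)

# Sample mean.
x_bar = mean(data)

# Sample standard deviation with n - 1 degrees of freedom.
s = sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))

print(x_bar, s, stdev(data))  # 6 2.0 2.0
```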
Background
Sampling distribution of the sample mean and the Central Limit Theorem
Consequence
For a large sample, or for a population whose distribution is normal:
$$Z = \frac{\bar{X} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}$$
has (approximately) a standard normal (Z) distribution.
Background
Confidence Intervals – one sample
Estimate the mean $\mu$
The population standard deviation, $\sigma$, is known;
The sample mean from a random sample, $\bar{x}$, is known;
The sample size is large (n > 30).
The one-sample Z confidence interval is
$$\bar{x} \pm Z_{\text{critical value}} \cdot \frac{\sigma}{\sqrt{n}}$$
Example: for a 95% confidence interval Z = 1.96.
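The one-sample Z interval can be sketched with the standard library, using `statistics.NormalDist` (Python 3.8 or later) in place of a printed table. The function name `z_confidence_interval` and the numbers in the usage line are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value of N(0,1) for the given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Interval x_bar +/- z * sigma / sqrt(n), with sigma assumed known."""
    half_width = z_critical(confidence) * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Hypothetical sample: mean 50, known sigma 10, n = 100.
low, high = z_confidence_interval(50.0, 10.0, 100)
print(f"z = {z_critical(0.95):.2f}, interval = ({low:.2f}, {high:.2f})")
# z = 1.96, interval = (48.04, 51.96)
```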
Background
Confidence Intervals – one sample
Example: we want a confidence level of 90%
Look at the N(0,1) distribution
For a CL of 90%, we have to isolate an area of 5% in each tail (to the left and to the right) of the bell-shaped normal distribution.
The confidence interval will be given by
$$\bar{x} \pm Z_{0.1/2} \cdot \frac{\sigma}{\sqrt{n}}$$
Looking in a table for the value of Z we obtain Z = 1.65
Background
Confidence Intervals – one sample
What does it mean to have a confidence interval of 95%?
That there is a probability of 95% that the true (population) mean is in the interval? NO!!
It means that 95% of all possible samples result in an interval that includes the true mean!
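This interpretation can be illustrated by simulation: draw many samples, build a 95% Z interval from each, and count how often the interval contains the true mean. The population (normal, mean 100, standard deviation 15), the sample size and the number of trials are assumptions made for this sketch.

```python
import random
from math import sqrt
from statistics import NormalDist

rng = random.Random(1)           # fixed seed so the run is repeatable
mu, sigma = 100.0, 15.0          # hypothetical population parameters
n, trials = 30, 2000             # sample size and number of repeated samples
z = NormalDist().inv_cdf(0.975)  # critical value for a 95% interval

covered = 0
for _ in range(trials):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    x_bar = sum(sample) / n
    half_width = z * sigma / sqrt(n)
    # Does this sample's interval include the true mean?
    if x_bar - half_width <= mu <= x_bar + half_width:
        covered += 1

coverage = covered / trials
print(f"fraction of intervals containing the true mean: {coverage:.3f}")
```

The printed fraction should be close to 0.95: the randomness lies in the samples, and hence in the intervals, not in the true mean.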
Background
Confidence Intervals – one sample
Estimate the mean $\mu$
The population standard deviation is NOT known;
The sample mean from a random sample, $\bar{x}$, is known;
The sample size is large (n > 30) OR the population distribution is normal.
The one-sample t confidence interval is
$$\bar{x} \pm t_{\text{critical value}} \cdot \frac{s}{\sqrt{n}}$$
where the t critical value is based on (n-1) degrees of freedom (df).
Example: for a 95% confidence interval and 19 df, t = 2.09.
The Student t distribution can be used for small samples, assuming that the population distribution is approximately normal
Background
Hypothesis Testing – one sample
A hypothesis is a claim about the value of one
or more population characteristics.
A test procedure is a method for using sample
data to decide between two competing claims
about population characteristics ($\mu = 100$ or $\mu \neq 100$).
Method by contradiction: we assume a
particular hypothesis. Using the sample data we
try to find out if there is convincing evidence to
reject this hypothesis in favor of a competing
one
Background
Hypothesis Testing – one sample
The null hypothesis, H0, is a claim about a
population characteristic that is initially assumed
to be true.
Ha is the alternative hypothesis or competing
claim.
Testing H0 versus Ha can lead to the conclusion
that H0 must be rejected or that we fail to reject H0. In
the latter case we cannot say that H0 is
accepted!
Background
Hypothesis Testing – one sample
Errors
Type I error
Rejecting H0 when H0 is true
The probability of a type I error, $\alpha$, is called the level of
significance of the test.
Type II error
Failing to reject H0 when H0 is false
The probability of a type II error is denoted by $\beta$.
There is a tradeoff between $\alpha$ and $\beta$: making the probability of a
type I error very small increases the probability of a
type II error.
Background
Hypothesis Testing – one sample
Test statistic (Z, t): function of the sample data
on which the decision to reject or fail to reject
H0 is based;
p-value (observed significance level): the
probability, assuming that H0 is true, of obtaining
a test statistic at least as inconsistent with H0
as what actually resulted.
Decision about H0: compare the p-value with
the chosen $\alpha$.
Reject H0 if p-value $\leq \alpha$
Background
Hypothesis Testing – one sample
Hypothesis Testing – principles
Identify the population parameter (mean, …)
State H0 and Ha
Define the significance level $\alpha$
Check that the assumptions for the test are reasonable (big
sample, …)
Calculate the test statistic (Z, …)
Calculate the associated p-value
State the conclusion (reject if p-value $\leq \alpha$, …)
Background
Hypothesis Testing – one sample
Example
Population parameter: the mean, $\mu$
H0: $\mu = 100$, Ha: $\mu \neq 100$
Significance level $\alpha = 0.01$
n=40 is large
From the sample: $\bar{x} = 105.3$, $\sigma = 8.4$
$$z = \frac{105.3 - 100}{8.4 / \sqrt{40}} = 3.99$$
From the z-curve we know that the p-value $\approx 0$
Therefore the null hypothesis, H0, is rejected at a significance
level of 0.01.
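The steps of this example can be reproduced with the standard library. The sketch assumes a two-sided alternative (mean different from 100), which is an assumption of this illustration, and uses `statistics.NormalDist` for the z-curve.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 100.0    # value of the mean under H0
alpha = 0.01    # significance level
n = 40          # sample size
x_bar = 105.3   # sample mean
sigma = 8.4     # standard deviation

# Test statistic.
z = (x_bar - mu_0) / (sigma / sqrt(n))

# Two-sided p-value (assumes the alternative "mean differs from 100").
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

reject_h0 = p_value <= alpha
print(f"z = {z:.2f}")  # z = 3.99
print(f"p-value = {p_value:.6f}, reject H0: {reject_h0}")
```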
Background
Comparing Two Populations based on independent samples
Use the sampling distribution of the difference of the sample means: $\bar{x}_1 - \bar{x}_2$
Properties
The mean of the difference is equal to the difference of the means
$\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$
The variance of the difference is equal to the sum of the individual
variances. Thus, the standard deviation:
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
The sampling distribution of the difference of the sample means can
be considered approximately normal (each n is large, or each sample
comes from an (approximately) normal population)
Background
Confidence interval for $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$
Assumptions
The two samples are independently chosen random samples
Sample sizes are both large (n > 30) OR the population
distributions are (approximately) normal.
Formulas
$$\bar{x}_1 - \bar{x}_2 \pm t_{\text{critical value}} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
$$df = \frac{(V_1 + V_2)^2}{\frac{V_1^2}{n_1 - 1} + \frac{V_2^2}{n_2 - 1}} \quad \text{where } V_1 = \frac{s_1^2}{n_1}, \; V_2 = \frac{s_2^2}{n_2}$$
Background
Hypothesis Test
Same procedure, only the formulas are different!
Z Test
Large samples OR
Population distributions are (at least approximately) normal
$$z = \frac{\bar{x}_1 - \bar{x}_2 - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
Background
Hypothesis Test
t test
Large samples OR
Population distributions normal AND the random samples are
independent
$$t = \frac{\bar{x}_1 - \bar{x}_2 - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
$$df = \frac{(V_1 + V_2)^2}{\frac{V_1^2}{n_1 - 1} + \frac{V_2^2}{n_2 - 1}} \quad \text{where } V_1 = \frac{s_1^2}{n_1}, \; V_2 = \frac{s_2^2}{n_2}$$
Applying the Theory
The Busy Beaver Problem
Two algorithms
A standard GA
A standard GA + local learning (Baldwin Effect)
Goal: good quality machines
Who is better? Comparing the means!
H0: $\mu_1 = \mu_2$ (no improvement!!!), Ha: $\mu_1 \neq \mu_2$
Significance level, $\alpha$ = 0.01
Assuming that the population distributions are normal
Number of (independent) runs = 30 for each case
Use t test
Applying the Theory
The Busy Beaver Problem
From the samples (# good machines)
$\bar{x}_{sga} = 0.1$
$\bar{x}_{be} = 0.23$
$S_{sga}^2 = 0.093$
$S_{be}^2 = 0.185$
From the formulas
df=53
t=1.35
p-value $\approx$ 2*0.1 = 0.2
Conclusion
With $\alpha$ = 0.01 and p-value = 0.2, the null hypothesis H0 cannot be
rejected
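The t statistic and the degrees of freedom for the Busy Beaver comparison can be recomputed from the summary values. The sketch assumes that 0.1 and 0.23 are the sample means, that 0.093 and 0.185 are the sample variances, and that each sample has 30 runs; the function name `welch_t_and_df` is hypothetical.

```python
from math import sqrt

def welch_t_and_df(mean1, var1, n1, mean2, var2, n2):
    """Two-sample t statistic (H0: equal means) and approximate df."""
    v1, v2 = var1 / n1, var2 / n2
    t = (mean1 - mean2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# GA with local learning (be) against the standard GA (sga), 30 runs each.
t, df = welch_t_and_df(0.23, 0.185, 30, 0.1, 0.093, 30)
print(f"t = {t:.2f}, df = {df:.1f}")  # t = 1.35, df = 52.3
```

The t value matches the one reported, and the computed degrees of freedom (about 52.3) are close to the reported df=53.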
Applying the Theory
Function Optimization
Two different GAs applied to function optimization
A standard GA using two-point crossover
A modified GA using transformation
Goal: find the minimum
The Schwefel Function
Minimum = 0
[Figure: surface plot of the Schwefel function, with both inputs ranging from -500 to 500.]
Applying the Theory
Function Optimization
Who is better? Two point Crossover or
Transformation?
Comparing the means of the best fit!
H0: $\mu_1 = \mu_2$ (no improvement!!!), Ha: $\mu_1 \neq \mu_2$
Significance level, $\alpha$ = 0.05
Assuming the population distributions are normal
Number of (independent) runs = 30 for each case
Use t test
Applying the Theory
Function Optimization
From the samples (fitness of the best individuals)
$\bar{x}_{sga} = 5.4838$
$\bar{x}_{tr} = 0.0768$
$S_{sga}^2 = 149.788$
$S_{tr}^2 = 0.02958$
From the formulas
df=29
t=2.42
p-value $\approx$ 2*0.012 = 0.024
Conclusion
With $\alpha$ = 0.05 and p-value = 0.024, the null hypothesis H0 is
rejected.
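Instead of reading the p-value from a table, it can be approximated by integrating the Student t density numerically. The sketch below uses Simpson's rule with only the standard library; the function names and the number of integration steps are choices made here, not part of the talk.

```python
from math import exp, lgamma, log, pi

def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    log_norm = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_norm - (df + 1) / 2 * log(1 + x * x / df))

def two_sided_p_value(t, df, steps=2000):
    """1 minus the area between -|t| and |t| (Simpson's rule, steps must be even)."""
    a = abs(t)
    h = 2 * a / steps
    total = t_pdf(-a, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(-a + k * h, df)
    return 1 - total * h / 3

# Function optimization example: t = 2.42 with 29 degrees of freedom.
p_value = two_sided_p_value(2.42, 29)
print(f"p-value = {p_value:.3f}, reject H0 at 0.05: {p_value <= 0.05}")
```

The result is close to the table-based value of 0.024 and leads to the same decision.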
Conclusions
This is a very simple presentation
Assuming Normal distributions
There are many others
In many situations we cannot assume a normal distribution
Many things left unmentioned
More than two populations
Analysis of Variance (ANOVA)
Regression and Correlation
Non parametric methods
Want to know more?
Paul Cohen, Empirical Methods for Artificial Intelligence, MIT Press, Boston, 1995.
James Kennedy and Russell Eberhart, Swarm Intelligence (Appendix A), Morgan Kaufmann, 2001.
Roxy Peck, Chris Olsen and Jay Devore, Introduction to Statistics and Data Analysis, Duxbury, 2001.
Mark Wineberg and Steffen Christensen, Using Appropriate Statistics, GECCO'2003 Tutorial.