
# Lecture 2, 10/19 - Department of Computer Science

Transcript
```
Empirical Research Methods in Computer Science
Lecture 2, Part 1
October 19, 2005
Noah Smith
Some tips
- Perl scripts can be named encode instead of encode.pl
- encode foo ≢ encode < foo
- chmod u+x encode
- Instead of making us run "java Encode", write a shell script:

      #!/bin/sh
      cd `dirname $0`
      java Encode

- Check that it works on (say) ugrad10.
Assignment 1
- If you didn’t turn in a first version yesterday, don’t bother – just turn in the final version.
- Final version due Tuesday 10/25, 8pm
- We will post a few exercises soon.
- Questions?
Today
- Standard error
- Bootstrap for standard error
- Confidence intervals
- Hypothesis testing
Notation
- P is a population
- S = [s1, s2, ..., sn] is a sample from P
- Let X = [x1, x2, ..., xn] be some numerical measurement on the si, distributed over P according to unknown F
- We may use Y, Z for other measurements.
Mean
- What does “mean” mean?
- μx is the population mean of x (depends on F); μx is in general unknown.
- How do we estimate the mean? The sample mean:

      x̄ = (1/n) Σ_{i=1}^{n} xi
(Figures: gzip compression rate; usually < 1, but not always)
Accuracy
- How good an estimate is the sample mean?
- Standard error (se) of a statistic:
  - We picked one S from P.
  - How would x̄ vary if we picked a lot of samples from P?
  - There is some “true” se value!
Extreme cases
- n → ∞
- n = 1
Standard error (of the sample mean)
- Known:

      se(x̄) = σx / √n

  where σx is the true standard deviation of x under F.
- “Standard error” = the standard deviation of a statistic.
(Figure: gzip compression rate)
Central Limit Theorem
- The sampling distribution of the sample mean approaches a normal distribution as n increases:

      x̄ ~ N(μ, σx² / n)
How to estimate σx?
- “Plug-in principle”:

      σ̂ = √( (1/n) Σ_{i=1}^{n} (xi − x̄)² )

- Therefore:

      ŝe(x̄) = σ̂ / √n = √( Σ_{i=1}^{n} (xi − x̄)² / n² )
Plug-in principle
- We don’t have (and can’t get) P; we do have S (the sample).
- We don’t know F, the true distribution over X; we do know F̂, the sample distribution over X.
- Estimating a statistic: use F̂ for F.
Good and Bad News
- Good: we have a formula to estimate the standard error of the sample mean!
- Bad: we have a formula to estimate only the standard error of the sample mean! No such formula for:
  - variance
  - median
  - trimmed mean
  - ratio of means of x and y
  - correlation between x and y
Bootstrap world
- Real world: unknown distribution F → observed random sample X → statistic of interest θ̂ = s(X)
- Bootstrap world: empirical distribution F̂ → bootstrap random sample X* → bootstrap replication θ̂* = s(X*)
- The replications give statistics about the estimate (e.g., standard error).
Bootstrap sample
- X = [3.0, 2.8, 3.7, 3.4, 3.5]
- X* could be:
  - [2.8, 3.4, 3.7, 3.4, 3.5]
  - [3.5, 3.0, 3.4, 2.8, 3.7]
  - [3.5, 3.5, 3.4, 3.0, 2.8]
  - ...
- Draw n elements with replacement.
Reflection
- Imagine doing this with a pencil and paper.
- The bootstrap was born in 1979.
- Typically, sampling is costly and computation is cheap.
- In (empirical) CS, sampling isn’t even necessarily all that costly.
Bootstrap estimate of se
- Let s(·) be a function for computing an estimate θ̂.
- True value of the standard error: se_F(θ̂)
- Ideal bootstrap estimate: se_F̂(θ̂*)
- Bootstrap estimate with B bootstrap samples: se_B(θ̂*)

Bootstrap estimate of se
- lim_{B→∞} se_B = se_F̂

      se_B(θ̂*) = √( Σ_{i=1}^{B} ( θ̂*[i] − θ̂*(·) )² / (B − 1) )

  where θ̂*(·) = (1/B) Σ_{i=1}^{B} θ̂*[i].
Bootstrap, intuitively
- We don’t know F.
- We would like lots of samples from P, but we only have one (S).
- We approximate F by F̂ (plug-in principle!); it is easy to generate lots of “samples” from F̂.
(Figures: bootstrap distributions of mean compression for B = 25, 50, 200)
Correlation (another statistic)
- Population P, sample S
- Two values, xi and yi, for each element of the sample
- Correlation coefficient: ρ
- Sample correlation coefficient:

      r = Σ_{i=1}^{n} (xi − x̄)(yi − ȳ) / √( Σ_{i=1}^{n} (xi − x̄)² · Σ_{i=1}^{n} (yi − ȳ)² )
Example: gzip compression
(Figure: scatter plot of the data; r = 0.9616)
Accuracy of r
- No general closed form for se(r).
- If we assume x and y are bivariate Gaussian:

      se_normal(r) = (1 − r²) / √(n − 3)
(Figure: se_normal(r) plotted over r ∈ [−1, 1] and n ∈ [10, 100])
Normality
- Why assume the data are Gaussian?
- Alternative: bootstrap estimate of the standard error of r:

      se_B(r*) = √( Σ_{i=1}^{B} ( r*[i] − r*(·) )² / (B − 1) )
Example: gzip compression
- r = 0.9616
- se_normal(r) = 0.0024
- se_200(r) = 0.0298
- Plot the data.
- Runtime?
Efron and Tibshirani:
- B = 25 is informative
- B = 50 is often enough
- seldom need B > 200 (for se)
Summary so far
- A statistic is a “true fact” about the distribution F.
- We don’t know F.
- For some parameter θ, we want:
  - an estimate θ̂ (“theta hat”)
  - the accuracy of that estimate (e.g., standard error)
- For the mean, μ, we have a closed form.
- For other θ, the bootstrap will help!
```