Statistics: Statistical Inference
Krishna V. Palem
Kenneth and Audrey Kennedy Professor of Computing
Department of Computer Science, Rice University
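Sampling distribution of X̄
[Figure: samples 1, 2, 3, 4, ..., k are drawn from the population; sample i yields a sample mean $\bar{x}_i$, and the values $\bar{x}_1, \bar{x}_2, \dots, \bar{x}_k$ together form the sampling distribution of X̄.]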
Central Limit Theorem
(4) The mean of the sampling distribution of X̄ is equal to the population mean, i.e. $\mu_{\bar{X}} = \mu$
(5) The standard deviation of the sampling distribution of X̄ is the population standard deviation divided by the square root of the sample size, i.e. $\sigma_{\bar{X}} = \sigma / \sqrt{n}$
Sampling distribution of X̄ for a Normal population
[Figure: histograms of the sample means for sample sizes N = 1, 5, 10 and 50 (horizontal axis from about 1.0 to 1.9). Panel summaries:
N=1: mean = 1.41, SD = 0.145
N=5: mean = 1.40, SD = 0.065
N=10: mean = 1.40, SD = 0.047
N=50: mean = 1.40, SD = 0.020]
Sampling distribution of X̄ for a non-Normal population
[Figure: histograms of the sample means for sample sizes N = 1, 5, 50 and 100 (horizontal axis from about 1.0 to 2.0). Panel summaries:
N=1: mean = 1.40, SD = 0.147
N=5: mean = 1.40, SD = 0.066
N=50: mean = 1.41, SD = 0.021
N=100: mean = 1.41, SD = 0.015]
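As a quick check of item (5) of the Central Limit Theorem, here is a small sketch that compares the predicted standard error σ/√n with the SDs reported in the Normal and non-Normal histogram panels. It assumes the N=1 panels give the population SD (0.145 for the Normal population, 0.147 for the non-Normal one); the helper name `standard_error` is our own, not from the slides.

```python
import math

def standard_error(sigma, n):
    """Predicted SD of the sample mean: sigma / sqrt(n) (CLT item (5))."""
    return sigma / math.sqrt(n)

# Population SDs read off the N=1 panels (an assumption), with the other panel sizes.
populations = {
    "Normal": (0.145, (5, 10, 50)),
    "non-Normal": (0.147, (5, 50, 100)),
}

for name, (sigma, sizes) in populations.items():
    for n in sizes:
        print(f"{name:>10}, N={n:>3}: predicted SD = {standard_error(sigma, n):.3f}")
```

The predictions (about 0.065, 0.046 and 0.021 for the Normal population; 0.066, 0.021 and 0.015 for the non-Normal one) sit close to the simulated SDs, whether or not the population itself is Normal.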
Computer simulation of the sampling distribution of the sample mean
Pick any probability distribution and specify a mean and standard deviation.
Tell the computer to randomly generate 1000 observations from that probability distribution.
E.g., the computer is more likely to produce values where the probability is high.
Plot the "observed" values in a histogram.
Next, tell the computer to randomly generate 1000 averages-of-2 (randomly pick 2 observations and take their average) from that probability distribution.
Plot the "observed" averages in a histogram.
Repeat for averages-of-10 and averages-of-100.
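The procedure above can be sketched with only the Python standard library. This is an illustration under our own choices: Uniform[0,1] and Exponential(rate 1) as the two distributions, a fixed seed for repeatability, and a crude text histogram standing in for a plotting library. The function names are hypothetical.

```python
import random
import statistics

def simulate_averages(draw, n, reps=1000, rng=None):
    """Return `reps` averages, each the mean of `n` independent draws from `draw`."""
    rng = rng or random.Random()
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]

def text_histogram(values, bins=10, width=40):
    """Print a crude text histogram (a stand-in for a plotting library)."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins or 1.0  # guard against all values being equal
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / step), bins - 1)] += 1
    peak = max(counts)
    for i, c in enumerate(counts):
        print(f"{lo + i * step:8.3f} | {'#' * round(width * c / peak)}")

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    distributions = {
        "Uniform[0,1]": lambda r: r.random(),
        "Exponential(1)": lambda r: r.expovariate(1.0),
    }
    for name, draw in distributions.items():
        for n in (1, 2, 5, 10, 100):
            avgs = simulate_averages(draw, n, rng=rng)
            print(f"{name}, 1000 averages of {n}: "
                  f"mean={statistics.fmean(avgs):.3f}, SD={statistics.stdev(avgs):.3f}")
            text_histogram(avgs)
```

As n grows, the histograms become narrower and more bell-shaped for both distributions, and the SD of the averages shrinks roughly like σ/√n.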
[Figure: Uniform distribution on [0,1]: average of 1 sample (original distribution)]
[Figure: Uniform distribution: 1000 averages of 2 samples]
[Figure: Uniform distribution: 1000 averages of 5 samples]
[Figure: Uniform distribution: 1000 averages of 100 samples]
[Figure: Exponential distribution: average of 1 sample (original distribution)]
[Figure: Exponential distribution: 1000 averages of 2 samples]
[Figure: Exponential distribution: 1000 averages of 5 samples]
[Figure: Exponential distribution: 1000 averages of 100 samples]
Contents
Summary of Statistics Learnt so Far
Statistical Inference
Central Limit Theorem and its implications
Estimation theory
Interval Estimation
What is Confidence Interval?
Tutorial
Estimation Theory
In statistics, estimation refers to the process by which one makes
inferences about a population, based on information obtained
from a sample.
Statisticians use sample statistics to estimate population
parameters.
For example, sample means are used to estimate population means; sample
proportions, to estimate population proportions.
Two Types of Estimates
Point estimate. A point estimate of a population parameter is a single value of a statistic.
For example, the sample mean x̄ is a point estimate of the population mean μ.
When we estimate the mean μ by x̄, the probability that we are exactly correct is close to zero, i.e. $P(\bar{x} = \mu) \approx 0$,
assuming the population is heterogeneous and the sample size n ≪ population size N.
Hence, we cannot be very "confident" about the estimates we make using point estimates.
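To see why $P(\bar{x} = \mu) \approx 0$, here is a sketch that draws many samples from a Normal population and counts how often x̄ hits μ exactly versus how often it lands within one standard error. The population values (μ = 10, σ = 2, n = 25) are hypothetical, chosen only so that the standard error σ/√n equals 0.4.

```python
import random
import statistics

def fraction_within(mu=10.0, sigma=2.0, n=25, trials=5000, tol=0.0, seed=0):
    """Fraction of simulated samples whose mean x-bar lies within `tol` of mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if abs(xbar - mu) <= tol:
            hits += 1
    return hits / trials

# Exact hits essentially never happen; a tolerance of one standard error (0.4)
# is met roughly 68% of the time.
print("P(x-bar == mu)         ~", fraction_within(tol=0.0))
print("P(|x-bar - mu| <= 0.4) ~", fraction_within(tol=0.4))
```

Widening the target from a single point to a band is exactly the move that interval estimation makes.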
Two Types of Estimates (contd.)
How can we be more confident about our estimates?
We want the probability of being correct to be meaningfully greater than zero.
We can increase our confidence by using a less precise estimate instead of a point estimate:
estimate an interval instead of a single point.
Interval estimate. An interval estimate is defined by two numbers, between which a population parameter is said to lie.
For example, a < μ < b is an interval estimate of the population mean μ. It indicates that the population mean is greater than a but less than b.
History of Interval Estimation
Neyman (1937) identified interval estimation ("estimation by interval") as distinct from point estimation ("estimation by unique estimate").
He was the first to recognize and formulate interval estimation:
he recognized that earlier work quoting results in the form of an estimate plus-or-minus a standard deviation was, in effect, interval estimation.
His paper on this was titled "On the Two Different Aspects of the Representative Method: The Method of Stratified Sampling and the Method of Purposive Selection",
read at the Royal Statistical Society on 19 June 1934.
You can download the paper from:
http://stevereads.com/papers_to_read/on_the_two_different_aspects_of_the_representative_method.pdf
What is an Interval Estimate?
In statistics, interval estimation is the use of sample data to calculate an interval of possible (or probable) values of an unknown population parameter,
in contrast to point estimation, which gives a single number.
Interval estimate. An interval estimate is defined by two numbers, between which a population parameter is said to lie.
For example, a < μ < b is an interval estimate of the population mean μ;
it indicates that the population mean is greater than a but less than b.
We use x̄ to compute this interval.
Interval estimates provide
a "best estimate" of a parameter
an indication of the precision with which the parameter is known.
Types of Interval Estimation
The most prevalent forms of interval estimation are:
confidence intervals
a frequentist method
credible intervals
a Bayesian method
Other common approaches to interval estimation, which are
encompassed by statistical theory, are:
Tolerance intervals
Prediction intervals
used mainly in regression analysis
Of these, confidence intervals are the most common and widely used, and hence will be covered in more detail in this class.
What is a Confidence Interval?
In statistics, a confidence interval (CI) is an interval estimate
of a population parameter.
Instead of estimating the parameter by a single value, an interval likely to include the parameter is given.
Confidence intervals are used to indicate the reliability of an estimate.
How likely the interval is to contain the parameter is determined by the confidence level.
For a fixed sample, increasing the desired confidence level widens the confidence interval.
Confidence intervals, and interval estimates more generally, have applications across the whole range of quantitative studies.
Example of Confidence Interval
For example, a confidence interval can be used to describe how reliable some opinion survey results are.
In a survey of election voting intentions, the result might be that 40% of respondents intend to vote for a certain party.
A 95% confidence interval for the proportion in the whole population having the same intention on the survey date might be 36% to 44%.
From the same survey data one may calculate a 90% confidence interval, which is narrower: for instance, roughly 37% to 43%.
All other things being equal, a survey result with a narrower confidence interval at a higher confidence level is more desirable.
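Under the usual normal approximation, a confidence interval for a proportion has half-width z × SE, so moving from 95% to 90% just rescales the half-width by the ratio of critical values. A sketch, taking the illustrative survey figures (40%, with a 95% interval of 36% to 44%) as given; `z_value` is a hypothetical helper built on `statistics.NormalDist`.

```python
from statistics import NormalDist

def z_value(level):
    """Two-sided critical value z with P(-z <= Z <= z) = level for Z ~ N(0, 1)."""
    return NormalDist().inv_cdf(0.5 + level / 2)

p_hat = 0.40    # survey result: 40% intend to vote for the party
half_95 = 0.04  # 95% interval 36%..44%, so the half-width is 0.04

# Normal approximation: half-width = z * SE, so the implied SE is half_95 / z_95.
se = half_95 / z_value(0.95)
half_90 = z_value(0.90) * se

print(f"implied SE = {se:.4f}")
print(f"90% interval: {p_hat - half_90:.3f} to {p_hat + half_90:.3f}")
```

The 90% half-width comes out at about 0.034, i.e. an interval of roughly 36.6% to 43.4%: narrower than the 95% interval, but not dramatically so.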
Video on Confidence Interval
Example
In the whole of Houston, what percentage of adults do you think will want to watch a movie sometime in the next 10 days?
Assume a variance of 0.0625 for the whole population.
Choose a random sample of 10 adults and ask their opinion.
Will this be anywhere close to the actual percentage?
Let X be the random variable denoting the percentage of adults in the sample who would attend the movies,
and let X_i be the value from the i-th sampled adult.
How can we be sure to be closer to the actual mean?
Take a very large number of samples.
Example (contd.)
But taking a large number of samples is generally not feasible.
We want to arrive at an estimate based on fewer samples.
For example, in the previous example, if you take only 1 sample of 10 people and find that 5 of the 10 would like to go to a movie, then you might say:
"We are pretty sure that 50% of the adult population would want to go to a movie in the next 10 days."
Isn't this ambiguous? How sure is "pretty sure"?
We need to be more definitive.
Example (contd.)
We use a confidence interval to remove the ambiguity.
What if we want to be 100% sure?
The only statement we can make with 100% certainty is that between 0% and 100% of the adult population would want to watch a movie in the next 10 days.
What if we want to be 50% sure?
Such a statement is of little use, as it would be wrong half the time.
Then what kind of statements make sense?
Being 90%, 95%, 98% or 99% sure.
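Confidence Levels
Calculating Confidence Level
The general norm is to vary the width of the interval in multiples of σ (here σ denotes the standard deviation of x̄) and compute the corresponding confidence level.
The interval extends equally on either side of the sample mean.
The probability that the interval [x̄ - σ, x̄ + σ] contains μ can be calculated as
$P(\mu \in [\bar{x} - \sigma, \bar{x} + \sigma]) = P(\bar{x} - \sigma \le \mu \le \bar{x} + \sigma) = P(|\bar{x} - \mu| \le \sigma)$
Assuming a Normal distribution, we get
$P(\mu \in [\bar{x} - \sigma, \bar{x} + \sigma]) \approx 0.6827$
What if we increase the interval width from 2σ to 4σ?
$P(\mu \in [\bar{x} - 2\sigma, \bar{x} + 2\sigma]) \approx 0.9545$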
Source for calculations: http://www.analyzemath.com/statistics/normal_calculator.html
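The two probabilities can also be reproduced without an online calculator. A sketch using the standard library's `statistics.NormalDist` (Python 3.8+): for any Normal variable, the chance of falling within k standard deviations of its mean equals P(-k ≤ Z ≤ k) for a standard Normal Z. The helper name `prob_within` is ours.

```python
from statistics import NormalDist

def prob_within(k):
    """P(|X - mu| <= k*sd) for a Normal X, i.e. P(-k <= Z <= k) for Z ~ N(0, 1)."""
    z = NormalDist()
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2):
    print(f"+/-{k} sd: {prob_within(k):.4f}")
```

This prints about 0.6827 for k = 1 and 0.9545 for k = 2; printed tables often show 0.9544 because they add two rounded half-areas (2 × 0.4772).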
Confidence Level Table
Some of the most commonly used confidence levels in statistics are given in the table below:
Confidence level    Number of σs away from the mean
90%                 1.64
95%                 1.96
98%                 2.33
99%                 2.575
Less than 90% is generally not considered a strong enough confidence level to make a statement.
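Each entry in the table is the two-sided critical value z for which a central area equal to the confidence level of the standard Normal lies in [-z, z]. A sketch that regenerates the column with `statistics.NormalDist.inv_cdf`; `critical_z` is a hypothetical helper name.

```python
from statistics import NormalDist

def critical_z(level):
    """z such that a central area `level` of N(0, 1) lies in [-z, z]."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)

for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%}: {critical_z(level):.3f}")
```

To three decimals this gives 1.645, 1.960, 2.326 and 2.576; the table's 1.64 and 2.575 are truncated forms of the same values.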
Example (Contd.)
Let us continue with computing the confidence interval
for our movie example
Assume that we took a random sample of 10 adults.
Among them, 5 adults said that they would like to go for
the movie in the next 10 days
Hence, we get the sample mean x̄ = 0.5 (denoting 50%)
and the standard deviation of x̄ is $\sqrt{0.0625/10} \approx 0.0791$ (since $\mathrm{Var}(\bar{x}) = \sigma^2/n$).
Say we want to be 95% confident about our estimate.
Example (Contd.)
From the table we can see that we have to be 1.96σ away from the mean.
Hence, we need to be 1.96 × 0.0791 ≈ 0.155 away from the mean.
Summarizing, we can now say with 95% confidence that the mean of the actual population will be between $[0.5 - 0.155, 0.5 + 0.155] \approx [0.345, 0.655]$, i.e. between about 34.5% and 65.5% of the total population.
What if you want to be 98% confident?
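One way to answer the 98% question is to recompute the interval directly from the stated assumptions of the movie example (x̄ = 0.5, population variance 0.0625, n = 10). A minimal sketch of the known-variance (z) interval; `z_interval` is a hypothetical helper, not a library function.

```python
import math
from statistics import NormalDist

def z_interval(xbar, variance, n, level):
    """Known-variance (z) confidence interval for the population mean."""
    se = math.sqrt(variance / n)                  # SD of x-bar: sqrt(sigma^2 / n)
    z = NormalDist().inv_cdf(0.5 + level / 2)     # two-sided critical value
    return xbar - z * se, xbar + z * se

for level in (0.95, 0.98):
    lo, hi = z_interval(0.5, 0.0625, 10, level)
    print(f"{level:.0%}: [{lo:.3f}, {hi:.3f}]")
```

Under these assumptions the 95% interval is about [0.345, 0.655] and the 98% interval widens to about [0.316, 0.684]: more confidence costs a wider interval.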
Graphical Representation of Confidence Intervals
Example
[Figure: a plot of a normal distribution (bell curve); each colored band has a width of one standard deviation.]
Confidence Interval for μ when σ is Known
A 95% confidence interval for μ when σ is known is given by:
$\bar{x} \pm 1.96 \frac{\sigma}{\sqrt{n}}$
[Figure: overlay plot of the Normal density of X̄ (vertical axis 0 to 0.4, horizontal axis -3 to 3); 95% of the x̄'s lie within $\mu \pm 1.96 \frac{\sigma}{\sqrt{n}}$.]
Rationale for Confidence Interval
From the sampling distribution of X̄, we conclude that x̄ and μ are within 1.96 standard errors ($\sigma/\sqrt{n}$) of each other 95% of the time.
Otherwise stated, 95% of the intervals $\bar{x} \pm 1.96 \frac{\sigma}{\sqrt{n}}$ contain μ.
So the interval $\bar{x} \pm 1.96 \frac{\sigma}{\sqrt{n}}$ can be taken as an interval that would typically include μ.
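The statement "95% of the intervals contain μ" can be checked by simulation: draw many samples from a population with known μ and σ, build x̄ ± 1.96σ/√n for each, and count how many intervals cover μ. A sketch assuming a Normal population; the defaults (μ = 15, σ = 4, n = 80) are hypothetical values chosen to mirror the scale of the tablet example.

```python
import math
import random
from statistics import fmean

def ci_coverage(mu=15.0, sigma=4.0, n=80, trials=2000, seed=0):
    """Fraction of simulated 95% intervals x-bar +/- 1.96*sigma/sqrt(n) that contain mu."""
    rng = random.Random(seed)
    half_width = 1.96 * sigma / math.sqrt(n)
    covered = 0
    for _ in range(trials):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width <= mu <= xbar + half_width:
            covered += 1
    return covered / trials

print(f"coverage ~ {ci_coverage():.3f}")
```

The printed coverage should land near 0.95. Any single interval either contains μ or does not; the 95% describes the long-run behaviour of the procedure.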
Example
A random sample of 80 tablets had an average potency of 15 mg. Assume σ is known to be 4 mg.
x̄ = 15, σ = 4, n = 80
A 95% confidence interval for μ is
$15 \pm 1.96 \frac{4}{\sqrt{80}} = (14.12, 15.88)$
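The arithmetic of the tablet example, written out step by step as a sketch (values taken directly from the example, in mg):

```python
import math

xbar, sigma, n = 15.0, 4.0, 80           # sample mean, known sigma, sample size (mg)
standard_error = sigma / math.sqrt(n)    # sigma / sqrt(n)
half_width = 1.96 * standard_error       # 95% critical value times the standard error
lower, upper = xbar - half_width, xbar + half_width
print(f"SE = {standard_error:.4f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

The standard error is about 0.447 mg and the half-width about 0.88 mg, giving the interval (14.12, 15.88).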