Outline for Remainder of Semester
Chapter 5: Joint Distributions/Random Samples
5.2: Covariance and Correlation (3/30)
5.3: Statistics and their Distributions (4/3)
5.4: Distribution of Sample Mean and Central Limit Theorem (CLT) (4/5)
Chapter 7: Statistical Intervals
7.1: Basic Properties of Confidence Intervals (CIs) (4/6)
7.2: Large-Sample CIs (4/10)
7.3: CIs and Prediction Intervals (PIs) for Normals (4/12 and 4/13)
Chapter 12: Simple Linear Regression and Correlation
12.1: Simple Linear Regression (4/17)
Review for Exam 3 (4/19)
Exam 3 (4/20)
12.1: More Linear Regression (4/24)
12.2: Estimating Model Parameters (4/25)
Review for Final Exam (4/28)
Sections 5.2 and 5.3
3/30/17
Covariance
Figure 5.4 illustrates the different possibilities. The
covariance depends on both the set of possible pairs and
the probabilities. In Figure 5.4, the probabilities could be
changed without altering the set of possible pairs, and this
could drastically change the value of Cov(X, Y).
Figure 5.4: p(x, y) = 1/10 for each of ten pairs corresponding to the indicated points: (a) positive covariance; (b) negative covariance; (c) covariance near zero
Correlation
Definition
The correlation coefficient of X and Y, denoted by Corr(X, Y), ρ_X,Y, or just ρ, is defined by

ρ_X,Y = Cov(X, Y) / (σ_X · σ_Y)
Correlation
The following proposition shows that ρ remedies the defect of Cov(X, Y) and also suggests how to recognize the existence of a strong (linear) relationship.
Correlation
If we think of p(x, y) or f(x, y) as prescribing a mathematical model for how the two numerical variables X and Y are distributed in some population (height and weight, verbal SAT score and quantitative SAT score, etc.), then ρ is a population characteristic or parameter that measures how strongly X and Y are related in the population.

In Chapter 12, we will consider taking a sample of pairs (x1, y1), . . . , (xn, yn) from the population.

The sample correlation coefficient r will then be defined and used to make inferences about ρ.
Correlation
The correlation coefficient ρ is actually not a completely general measure of the strength of a relationship.

Proposition
1. For any two rv's X and Y, −1 ≤ ρ ≤ 1.
2. ρ = 1 or −1 if and only if Y = aX + b for some numbers a and b with a ≠ 0.
Correlation
This proposition says that ρ is a measure of the degree of linear relationship between X and Y, and only when the two variables are perfectly related in a linear manner will ρ be as positive or negative as it can be.

However, if |ρ| << 1, there may still be a strong relationship between the two variables, just one that is not linear.

And even if |ρ| is close to 1, it may be that the relationship is really nonlinear but can be well approximated by a straight line.
Example 5.18
Let X and Y be discrete rv’s with joint pmf

p(x, y) = 1/4   for (x, y) = (−4, 1), (4, −1), (2, 2), (−2, −2)

The points that receive positive probability mass are identified on the (x, y) coordinate system in Figure 5.5.

Figure 5.5: The population of pairs for Example 5.18
Example 5.18
cont’d
It is evident from the figure that the value of X is completely determined by the value of Y and vice versa, so the two variables are completely dependent. However, by symmetry μ_X = μ_Y = 0 and

E(XY) = (−4)(1/4) + (−4)(1/4) + (4)(1/4) + (4)(1/4) = 0

The covariance is then Cov(X, Y) = E(XY) − μ_X · μ_Y = 0 and thus ρ_X,Y = 0. Although there is perfect dependence, there is also complete absence of any linear relationship!
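A quick way to confirm this is to compute Cov(X, Y) and ρ directly from the joint pmf. Below is a minimal Python sketch, assuming the four-point pmf shown above (the variable names are illustrative):

# Joint pmf of Example 5.18: four equally likely points.
pmf = {(-4, 1): 0.25, (4, -1): 0.25, (2, 2): 0.25, (-2, -2): 0.25}

E_X  = sum(p * x     for (x, y), p in pmf.items())   # E(X)  = 0 by symmetry
E_Y  = sum(p * y     for (x, y), p in pmf.items())   # E(Y)  = 0 by symmetry
E_XY = sum(p * x * y for (x, y), p in pmf.items())   # E(XY) = 0

cov = E_XY - E_X * E_Y          # Cov(X, Y) = E(XY) - E(X)E(Y)

V_X = sum(p * (x - E_X) ** 2 for (x, y), p in pmf.items())
V_Y = sum(p * (y - E_Y) ** 2 for (x, y), p in pmf.items())
rho = cov / (V_X ** 0.5 * V_Y ** 0.5)

print(cov, rho)                 # prints 0.0 0.0: perfectly dependent, yet rho = 0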
Correlation
A value of ρ near 1 does not necessarily imply that
increasing the value of X causes Y to increase. It implies
only that large X values are associated with large Y values.
For example, in the population of children, vocabulary size
and number of cavities are quite positively correlated, but it
is certainly not true that cavities cause vocabulary to grow.
Instead, the values of both these variables tend to increase
as the value of age, a third variable, increases. For children
of a fixed age, there is probably a low correlation between
number of cavities and vocabulary size.
In summary, association (a high correlation) is not the
same as causation.
Statistics and Their Distributions
Consider selecting two different samples of size n from the
same population distribution.
The xi’s in the second sample will virtually always differ at
least a bit from those in the first sample. For example, a
first sample of n = 3 cars of a particular type might result in
fuel efficiencies x1 = 30.7, x2 = 29.4, x3 = 31.1, whereas a
second sample may give x1 = 28.8, x2 = 30.0, and
x3 = 32.5.
Before we obtain data, there is uncertainty about the value
of each xi.
Statistics and Their Distributions
Because of this uncertainty, before the data becomes
available we view each observation as a random variable
and denote the sample by X1, X2, . . . , Xn (uppercase
letters for random variables).
This variation in observed values in turn implies that the
value of any function of the sample observations—such as
the sample mean, sample standard deviation, or sample
fourth spread—also varies from sample to sample. That is,
prior to obtaining x1, . . . , xn, there is uncertainty as to the value of x̄, the value of s, and so on.
Statistics and Their Distributions
Definition
A statistic is any quantity whose value can be calculated from sample data. Prior to obtaining data, there is uncertainty as to what value of any particular statistic will result. Therefore, a statistic is a random variable and will be denoted by an uppercase letter; a lowercase letter is used to represent the calculated or observed value of the statistic.
Statistics and Their Distributions
Thus the sample mean, regarded as a statistic (before a sample has been selected or an experiment carried out), is denoted by X̄; the calculated value of this statistic is x̄. Similarly, S represents the sample standard deviation thought of as a statistic, and its computed value is s.

If samples of two different types of bricks are selected and the individual compressive strengths are denoted by X1, . . . , Xm and Y1, . . . , Yn, respectively, then the statistic X̄ − Ȳ, the difference between the two sample mean compressive strengths, is often of great interest.
Statistics and Their Distributions
Any statistic, being a random variable, has a probability
distribution. In particular, the sample mean has a
probability distribution.
Suppose, for example, that n = 2 components are randomly
selected and the number of breakdowns while under
warranty is determined for each one.
Possible values for the sample mean number of
breakdowns are 0 (if X1 = X2 = 0), .5 (if either X1 = 0 and
X2 = 1 or X1 = 1 and X2 = 0), 1, 1.5, . . ..
Statistics and Their Distributions
The probability distribution of a statistic is sometimes
referred to as its sampling distribution to emphasize that
it describes how the statistic varies in value across all
samples that might be selected.
Random Samples
Random Samples
The probability distribution of any particular statistic
depends not only on the population distribution (normal,
uniform, etc.) and the sample size n but also on the method
of sampling.
Consider selecting a sample of size n = 2 from a population
consisting of just the three values 1, 5, and 10, and
suppose that the statistic of interest is the sample variance.
If sampling is done “with replacement,” then S² = 0 will result if X1 = X2.
Random Samples
However, S² cannot equal 0 if sampling is “without replacement.” So P(S² = 0) = 0 for one sampling method, and this probability is positive for the other method.

Our next definition describes a sampling method often encountered (at least approximately) in practice.
Random Samples
Definition
The rv's X1, X2, . . . , Xn are said to form a (simple) random sample of size n if
1. The Xi's are independent rv's.
2. Every Xi has the same probability distribution.
Random Samples
Conditions 1 and 2 can be paraphrased by saying that the
Xi’s are independent and identically distributed (iid).
If sampling is either with replacement or from an infinite
(conceptual) population, Conditions 1 and 2 are satisfied
exactly.
These conditions will be approximately satisfied if sampling
is without replacement, yet the sample size n is much
smaller than the population size N.
Random Samples
In practice, if n/N ≤ .05 (at most 5% of the population is sampled), we can proceed as if the Xi's form a random sample.
The virtue of this sampling method is that the probability
distribution of any statistic can be more easily obtained
than for any other sampling method.
There are two general methods for obtaining information
about a statistic’s sampling distribution. One method
involves calculations based on probability rules, and the
other involves carrying out a simulation experiment.
Deriving a Sampling Distribution
Deriving a Sampling Distribution
Probability rules can be used to obtain the distribution of a
statistic provided that it is a “fairly simple” function of the
Xi’s and either there are relatively few different X values in
the population or else the population distribution has a
“nice” form.
Our next example illustrates such a situation.
Example 5.21
A certain brand of MP3 player comes in three
configurations: a model with 2 GB of memory, costing $80,
a 4 GB model priced at $100, and an 8 GB version with a
price tag of $120.
If 20% of all purchasers choose the 2 GB model, 30%
choose the 4 GB model, and 50% choose the 8 GB model,
then the probability distribution of the cost X of a single
randomly selected MP3 player purchase is given by

x:      80    100    120
p(x):   .20   .30    .50          (5.2)

with μ = 106, σ² = 244
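As a quick check on (5.2): μ = 80(.20) + 100(.30) + 120(.50) = 106, and σ² = E(X²) − μ² = 80²(.20) + 100²(.30) + 120²(.50) − 106² = 11,480 − 11,236 = 244.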
Example 5.21
cont’d
Suppose on a particular day only two MP3 players are sold.
Let X1 = the revenue from the first sale and X2 the revenue
from the second.
Suppose that X1 and X2 are independent, each with the
probability distribution shown in (5.2) [so that X1 and X2
constitute a random sample from the distribution (5.2)].
Example 5.21
cont’d
Table 5.2 lists the possible (x1, x2) pairs, the probability of each [computed using (5.2) and the assumption of independence], and the resulting x̄ and s² values. [Note that when n = 2, s² = (x1 − x̄)² + (x2 − x̄)².]

Table 5.2: Outcomes, Probabilities, and Values of x̄ and s² for Example 5.21

x1     x2     p(x1, x2)    x̄      s²
80     80     .04          80     0
80     100    .06          90     200
80     120    .10          100    800
100    80     .06          90     200
100    100    .09          100    0
100    120    .15          110    200
120    80     .10          100    800
120    100    .15          110    200
120    120    .25          120    0
Example 5.21
cont’d
Now to obtain the probability distribution of X̄, the sample average revenue per sale, we must consider each possible value and compute its probability. For example, x̄ = 100 occurs three times in the table, with probabilities .10, .09, and .10, so

p_X̄(100) = P(X̄ = 100) = .10 + .09 + .10 = .29

Similarly,

p_S²(800) = P(S² = 800) = P(X1 = 80, X2 = 120 or X1 = 120, X2 = 80)
          = .10 + .10 = .20
Example 5.21
cont’d

The complete sampling distributions of X̄ and S² appear in (5.3) and (5.4):

x̄:          80     90     100    110    120
p_X̄(x̄):     .04    .12    .29    .30    .25          (5.3)

s²:          0      200    800
p_S²(s²):    .38    .42    .20                        (5.4)
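These distributions can also be generated by brute-force enumeration rather than by hand. The following Python sketch assumes only the pmf (5.2) and the independence of the Xi's; the function and variable names are illustrative:

from itertools import product
from collections import defaultdict

pmf = {80: 0.20, 100: 0.30, 120: 0.50}     # the cost distribution (5.2)

def sampling_dist(stat, n):
    # Exact sampling distribution of a statistic for an iid sample of size n:
    # enumerate every possible sample and accumulate its probability.
    dist = defaultdict(float)
    for sample in product(pmf, repeat=n):
        p = 1.0
        for x in sample:
            p *= pmf[x]                    # independence: probabilities multiply
        dist[stat(sample)] += p
    return dict(dist)

def mean(s):
    return sum(s) / len(s)

def s2(s):                                 # sample variance with divisor n - 1
    m = mean(s)
    return sum((x - m) ** 2 for x in s) / (len(s) - 1)

print(sampling_dist(mean, 2))   # reproduces (5.3)
print(sampling_dist(s2, 2))     # reproduces (5.4)

Calling sampling_dist(mean, 4) the same way reproduces the n = 4 pmf of X̄ given at the end of this example.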
Example 5.21
cont’d
Figure 5.8 pictures a probability histogram for both the original distribution (5.2) and the X̄ distribution (5.3). The figure suggests first that the mean (expected value) of the X̄ distribution is equal to the mean 106 of the original distribution, since both histograms appear to be centered at the same place.

Figure 5.8: Probability histograms for the underlying distribution and the X̄ distribution in Example 5.21
Example 5.21
cont’d
From (5.3),

μ_X̄ = E(X̄) = (80)(.04) + · · · + (120)(.25) = 106 = μ

Second, it appears that the X̄ distribution has smaller spread (variability) than the original distribution, since probability mass has moved in toward the mean. Again from (5.3),

σ²_X̄ = E(X̄²) − μ²_X̄ = (80²)(.04) + · · · + (120²)(.25) − (106)² = 122 = σ²/2
Example 5.21
cont’d
The variance of X̄ is precisely half that of the original variance (because n = 2). Using (5.4), the mean value of S² is

μ_S² = E(S²) = Σ s² · p_S²(s²) = (0)(.38) + (200)(.42) + (800)(.20) = 244 = σ²

That is, the X̄ sampling distribution is centered at the population mean μ, and the S² sampling distribution is centered at the population variance σ².
Example 5.21
cont’d
If there had been four purchases on the day of interest, the sample average revenue would be based on a random sample of four Xi's, each having the distribution (5.2). More calculation eventually yields the pmf of X̄ for n = 4 as

x̄:        80      85      90      95      100     105     110     115     120
p_X̄(x̄):   .0016   .0096   .0376   .0936   .1761   .2340   .2350   .1500   .0625
Example 5.21
cont’d
From this, μ_X̄ = 106 = μ and σ²_X̄ = 61 = σ²/4. Figure 5.9 is a probability histogram of this pmf.

Figure 5.9: Probability histogram of X̄ based on n = 4 in Example 5.21
Example 5.21
cont’d
Example 5.21 should suggest first of all that the computation of p_X̄(x̄) and p_S²(s²) can be tedious. If the original distribution (5.2) had allowed for more than three possible values, then even for n = 2 the computations would have been more involved.

The example should also suggest, however, that there are some general relationships between E(X̄), V(X̄), E(S²), and the mean μ and variance σ² of the original distribution.
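For reference, the general relationships that emerge (and that are established when the distribution of the sample mean is treated in Section 5.4) are E(X̄) = μ, V(X̄) = σ²/n (equivalently σ_X̄ = σ/√n), and E(S²) = σ². Example 5.21 is consistent with these: σ²_X̄ was 244/2 = 122 for n = 2 and 244/4 = 61 for n = 4.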
Simulation Experiments
Simulation Experiments
The second method of obtaining information about a
statistic’s sampling distribution is to perform a simulation
experiment.
This method is usually used when a derivation via
probability rules is too difficult or complicated to be carried
out. Such an experiment is virtually always done with the
aid of a computer.
Simulation Experiments
The following characteristics of an experiment must be specified:

1. The statistic of interest (X̄, S, a particular trimmed mean, etc.)

2. The population distribution (normal with μ = 100 and σ = 15, uniform with lower limit A = 5 and upper limit B = 10, etc.)

3. The sample size n (e.g., n = 10 or n = 50)

4. The number of replications k (number of samples to be obtained)
Simulation Experiments
Then use appropriate software to obtain k different random samples, each of size n, from the designated population distribution.

For each sample, calculate the value of the statistic and construct a histogram of the k values. This histogram gives the approximate sampling distribution of the statistic.

The larger the value of k, the better the approximation will tend to be (the actual sampling distribution emerges as k → ∞). In practice, k = 500 or 1000 is usually sufficient if the statistic is “fairly simple.”
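As a concrete illustration, here is a minimal Python sketch of such a simulation experiment, assuming numpy is available (the slides do not prescribe any particular software); it uses the sample-mean statistic and the normal population mentioned above:

import numpy as np

rng = np.random.default_rng(seed=1)

# 1. Statistic of interest: the sample mean
# 2. Population distribution: normal with mu = 100, sigma = 15
# 3. Sample size n;  4. Number of replications k
n, k = 10, 1000

samples = rng.normal(loc=100, scale=15, size=(k, n))  # k samples, each of size n
xbars = samples.mean(axis=1)                          # the statistic, once per sample

# The histogram of the k values approximates the sampling distribution of X-bar.
counts, bin_edges = np.histogram(xbars, bins=20)
print(xbars.mean())   # close to mu = 100
print(xbars.std())    # close to sigma / sqrt(n) = 4.74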
Simulation Experiments
The final aspect of the histograms of simulated x̄ values to note is their spread relative to one another.

The larger the value of n, the more concentrated is the sampling distribution about the mean value. This is why the histograms for n = 20 and n = 30 are based on narrower class intervals than those for the two smaller sample sizes.

For the larger sample sizes, most of the x̄ values are quite close to 8.25 (the population mean in the simulation). This is the effect of averaging. When n is small, a single unusual x value can result in an x̄ value far from the center.
Simulation Experiments
With a larger sample size, any unusual x values, when averaged in with the other sample values, still tend to yield an x̄ value close to μ.

Combining these insights yields a result that should appeal to your intuition: X̄ based on a large n tends to be closer to μ than does X̄ based on a small n.