Continuous random variables

The mean and the std. dev. of the sample mean
• Select a SRS of size n from a population and measure a
variable X on each individual in the sample.
• The data consists of observations on n r.v’s X1,X2…,Xn.
• If the population is large we can consider X1,X2…,Xn to be
independent.
• The sample mean of a SRS of size n is
\[ \bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} \]
• If the population has mean μ and std dev. σ, what is the:
▪ mean of the total T = X1+X2+···+Xn ?
Answer: \( \mu_T = \mu_{X_1 + X_2 + \cdots + X_n} = n\mu \)
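week9
1
▪ Mean of the sample mean X̄ ?
\[ \mu_{\bar{X}} = \frac{1}{n}\,\mu_{X_1 + X_2 + \cdots + X_n} = \frac{1}{n}\,n\mu = \mu \]
▪ Variance of the total T ?
\[ \sigma_T^2 = \sigma^2_{X_1 + X_2 + \cdots + X_n} = n\sigma^2 \]
▪ Variance of the sample mean X̄ ?
\[ \sigma_{\bar{X}}^2 = \frac{1}{n^2}\,\sigma^2_{X_1 + X_2 + \cdots + X_n} = \frac{1}{n^2}\,n\sigma^2 = \frac{\sigma^2}{n} \]
week9
2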
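The formulas for the mean and variance of the sample mean can be checked by simulation. The sketch below is only an illustration: the Uniform(0, 1) population, the sample size n = 25, the number of replications and the function name `simulate_sample_means` are all assumptions chosen for the demonstration, not part of the slides.

```python
import random
import statistics

def simulate_sample_means(n, reps, seed=0):
    """Draw `reps` independent samples of size n from Uniform(0, 1)
    and return the list of their sample means."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [statistics.fmean([rng.random() for _ in range(n)])
            for _ in range(reps)]

n = 25
means = simulate_sample_means(n, reps=20_000)

mu = 0.5         # mean of the Uniform(0, 1) population
sigma2 = 1 / 12  # variance of the Uniform(0, 1) population

# Theory: the sample mean has mean mu and variance sigma^2 / n.
print("mean of sample means:    ", statistics.fmean(means), "theory:", mu)
print("variance of sample means:", statistics.pvariance(means), "theory:", sigma2 / n)
```

The simulated values should land close to μ and σ²/n, with the agreement improving as the number of replications grows.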
Sampling distribution of a sample mean
• If a population has the N(μ, σ) distribution, then the sample
mean X̄ of n independent observations has the N(μ, σ/√n) distribution.
• Example
A bottling company uses a filling machine to fill plastic bottles
with a popular cola. The bottles are supposed to contain 300
milliliters (ml). In fact, the contents vary according to a normal
distribution with mean 298 ml and standard deviation 3 ml.
(a) What is the probability that an individual bottle contains less
than 295 ml?
(b) What is the probability that the mean contents of the bottles in
a six-pack is less than 295ml?
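A minimal sketch of the two calculations in the bottling example, using Python's standard-library `NormalDist`. The variable names are illustrative; the only inputs are the mean 298 ml, the standard deviation 3 ml and the six-pack size stated in the example.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 298.0, 3.0  # ml, as stated in the example

# (a) A single bottle: X ~ N(298, 3)
p_single = NormalDist(mu, sigma).cdf(295)

# (b) Mean of a six-pack: X-bar ~ N(298, 3 / sqrt(6))
n = 6
p_sixpack = NormalDist(mu, sigma / sqrt(n)).cdf(295)

print(f"(a) P(X < 295)     = {p_single:.4f}")
print(f"(b) P(X-bar < 295) = {p_sixpack:.4f}")
```

The probability for the six-pack mean (roughly 0.007) is far smaller than for a single bottle (roughly 0.16), because averaging shrinks the standard deviation by a factor of √6.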
week9
3
The central limit theorem
• Draw a SRS of size n from a population with mean μ and std
dev. σ. When n is large, the sampling distribution of a sample mean
X̄ is approximately normal with mean μ and std dev. σ/√n.
• Note: The normal approximation for the sample proportion and
counts is an important example of the central limit theorem.
• Note: The total T = X1+X2+···+Xn is approximately normal
with mean nμ and stdev. √n·σ.
week9
4
Example (Question 24 Final Dec 98)
Suppose that the weights of airline passengers are known to
have a distribution with a mean of 75kg and a std. dev. of
10kg. A certain plane has a passenger weight capacity of
7700kg. What is the probability that a flight of 100 passengers
will exceed the capacity?
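One way to work this example is to apply the central limit theorem to the total weight T of the 100 passengers. The sketch assumes the passengers' weights are independent and that n = 100 is large enough for the normal approximation; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 75.0, 10.0, 100  # kg, kg, passengers
capacity = 7700.0               # kg

# By the CLT, T is approximately N(n * mu, sqrt(n) * sigma) = N(7500, 100).
total = NormalDist(n * mu, sqrt(n) * sigma)
p_exceed = 1 - total.cdf(capacity)

print(f"P(T > {capacity:.0f}) = {p_exceed:.4f}")
```

The capacity sits two standard deviations above the mean total, so the probability is about 0.023.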
week9
5
Example
In a certain University, the course STA100 has tutorials of size 40. The
course STA200 has tutorials of size 25, and the course STA300 has tutorials
of size 15. Each course has 5 tutorials per year. Students are enrolled by
computer one by one into tutorials. Assume that each student being
enrolled by computer may be considered a random selection from a very
big group of people wherein there is a 50-50 male to female sex ratio.
Which of the following statements is true?
A) Over the years STA100 will have more tutorials with 2/3 females (or more).
B) Over the years STA200 will have more tutorials with 2/3 females (or more).
C) Over the years STA300 will have more tutorials with 2/3 females (or more).
D) Over the years, each course will have about the same number of tutorials
with 2/3 females (or more).
E) No course will have tutorials with 2/3 females (or more).
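The question can be explored with exact binomial probabilities. The sketch below assumes each tutorial is a set of n independent students, each female with probability 0.5, and reads "2/3 females (or more)" as a count of at least ⌈2n/3⌉; the function name is hypothetical.

```python
from math import comb

def prob_two_thirds_female(n, p=0.5):
    """P(at least 2/3 of n students are female) under Binomial(n, p)."""
    k_min = -(-2 * n // 3)  # smallest integer k with k >= 2n/3
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_min, n + 1))

for course, size in [("STA100", 40), ("STA200", 25), ("STA300", 15)]:
    print(f"{course} (n = {size}): {prob_two_thirds_female(size):.4f}")
```

The probability is largest for the smallest tutorial size, which reflects the larger variability of a sample proportion when n is small.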
week9
6
Question
State whether the following statements are true or false.
(i) As the sample size increases, the mean of the sampling
distribution of the sample mean X̄ decreases.
(ii) As the sample size increases, the standard deviation of the
sampling distribution of the sample mean X̄ decreases.
(iii) The mean X̄ of a random sample of size 4 from a negatively
skewed distribution is approximately normally distributed.
(iv) The distribution of the proportion of successes X in a
sufficiently large sample is approximately normal with mean p
and standard deviation √(np(1 − p)), where p is the population
proportion and n is the sample size.
(v) If X̄ is the mean of a simple random sample of size 9 from a
N(500, 18) distribution, then X̄ has a normal distribution with
mean 500 and variance 36.
week9
7
Question
State whether the following statements are true or false.
o A large sample from a skewed population will have an
approximately normal shaped histogram.
o The mean of a population will be normally distributed if the
population is quite large.
o The average blood cholesterol level recorded in a SRS of 100
students from a large population will be approximately
normally distributed.
o The proportion of people with incomes over $200 000, in a
SRS of 10 people, selected from all Canadian income tax filers
will be approximately normal.
week9
8
Exercise
A parking lot is patrolled twice a day (morning and afternoon).
In the morning, the chance that any particular spot has an
illegally parked car is 0.02. If the spot contained a car that was
ticketed in the morning, the probability the spot is also ticketed
in the afternoon is 0.1. If the spot was not ticketed in the
morning, there is a 0.005 chance the spot is ticketed in the
afternoon.
a) Suppose tickets cost $10. What is the expected value of the
tickets for a single spot in the parking lot?
b) Suppose the lot contains 400 spots. What is the distribution of
the value of the tickets for a day?
c) What is the probability that more than $200 worth of tickets
are written in a day?
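A possible approach to this exercise, sketched under explicit assumptions: every illegally parked car found in the morning is ticketed, the 400 spots behave independently, and the daily total is treated as approximately normal by the central limit theorem (a rough approximation here, since the per-spot distribution is very skewed). Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p_am = 0.02               # P(ticket in the morning)
p_pm_given_am = 0.1       # P(ticket in the afternoon | ticket in the morning)
p_pm_given_no_am = 0.005  # P(ticket in the afternoon | no ticket in the morning)
fine = 10.0               # dollars per ticket

# Distribution of the number of tickets (1 or 2) for one spot in a day.
p2 = p_am * p_pm_given_am
p1 = p_am * (1 - p_pm_given_am) + (1 - p_am) * p_pm_given_no_am

# (a) Mean and variance of the ticket value for a single spot.
mean_spot = fine * (1 * p1 + 2 * p2)
var_spot = fine**2 * (1 * p1 + 4 * p2) - mean_spot**2

# (b) Total over 400 independent spots, approximated as normal.
spots = 400
total = NormalDist(spots * mean_spot, sqrt(spots * var_spot))

# (c) Probability that more than $200 of tickets are written.
p_over_200 = 1 - total.cdf(200)

print(f"(a) expected value per spot: ${mean_spot:.3f}")
print(f"(b) total ~ approx N({total.mean:.1f}, {total.stdev:.2f})")
print(f"(c) P(total > 200) = {p_over_200:.4f}")
```

Under these assumptions the expected value per spot is about $0.27, and the daily total has mean about $107.60 with standard deviation about $34.74.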
week9
9
Exercises
1. Z ~ N(0, 1). Find P(−1.96 < Z < 1.96).
2. Z ~ N(0, 1). Find the value of c such that P(−c < Z < c) = 0.95.
3. Z ~ N(0, 1). Find the value of c such that P(−c < Z < c) = 0.90.
4. X ~ N(500, 15). Find the values of c and d such that
P(c < X < d) = 0.95.
week9
10
5. X ~ N(μ, σ). Find the values of c and d (in terms of μ and σ)
such that P(c < X < d) = 0.95.
6. X ~ N(μ, σ). Find the values of c and d (in terms of μ and σ)
such that P(c < X < d) = 0.90.
7. X ~ N(500, 15). Let X̄ be the mean of a random sample of
size 9. Find the values of c and d such that
P(c < X̄ < d) = 0.95.
8. X ~ N(μ, σ). Let X̄ be the mean of a random sample of size n.
Find the values of c and d such that P(c < X̄ < d) = 0.95.
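These exercises all reduce to finding central quantiles of a normal distribution. The sketch below shows one way to compute them, assuming the intended intervals are symmetric about the mean and reading N(500, 15) as mean 500 and standard deviation 15; the helper names `central_z` and `central_interval` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal N(0, 1)

def central_z(C):
    """Value c such that P(-c < Z < c) = C."""
    return Z.inv_cdf((1 + C) / 2)

def central_interval(mu, sigma, C, n=1):
    """Symmetric (c, d) with P(c < mean of n observations < d) = C."""
    z = central_z(C)
    se = sigma / sqrt(n)  # std dev of the mean of n observations
    return mu - z * se, mu + z * se

print(Z.cdf(1.96) - Z.cdf(-1.96))            # exercise 1
print(central_z(0.95))                       # exercise 2
print(central_z(0.90))                       # exercise 3
print(central_interval(500, 15, 0.95))       # exercise 4
print(central_interval(500, 15, 0.95, n=9))  # exercise 7
```

The general answers to exercises 5, 6 and 8 follow the same pattern: μ ± z·σ for a single observation and μ ± z·σ/√n for the sample mean, with z about 1.96 for 0.95 and about 1.645 for 0.90.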
week9
11
Point Estimates and CI
• A basic tool in statistical inference is a point estimate of the
population parameter. However, an estimate without an
indication of its variability is of little value.
• Example:
Parameter   Estimate   Std. Error
μ           X̄
σ²          S²
p           p̂
• A level C confidence interval for a parameter is an interval
computed from sample data by a method that has probability C
of producing an interval containing the true value of the
parameter.
week9
12
Confidence interval for the population mean
• Choose a SRS of size n from a population having unknown
mean μ and known stdev. σ. A level C confidence interval for μ
is an interval of the form
\[ \bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}, \quad \text{that is,} \quad \left( \bar{x} - z^* \frac{\sigma}{\sqrt{n}},\ \bar{x} + z^* \frac{\sigma}{\sqrt{n}} \right) \]
• Here z* is the value on the standard normal curve with area C
between −z* and z*. The interval is exact when the population
distribution is normal and approximately correct for large n in
other cases.
• In general CIs have the form: Estimate ± margin of error
• In the above case,
\[ \text{Margin of error} = m = z^* \frac{\sigma}{\sqrt{n}} \]
week9
13
• Note, in the above formula for the CI for the population mean,
σ/√n is the stdev. of the sample mean X̄ (this is also known as
the std. error of the sample mean X̄), so the CI can also be written
as
x̄ ± z*·Std.Error(X̄)
• The width of any CI is L = 2m, i.e. twice the margin of error.
• Here are three ways to reduce the margin of error (and the
width of the CI):
▪ Use a lower level of confidence (smaller C).
▪ Increase the sample size n.
▪ Reduce σ (usually not possible).
week9
14
Sample size for desired margin of error
• The CI for the population mean will have a specified margin of
error m when the sample size is
\[ n = \left( \frac{z^* \sigma}{m} \right)^2 \]
• Example:
A limnologist wishes to estimate the mean phosphate content
per unit volume of lake water. It is known from previous
studies that the stdev. has a fairly stable value of 4mg. How
many water samples must the limnologist analyze to be 90%
certain that the error of estimation does not exceed 0.8 mg?
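A short sketch of the sample-size calculation for the limnologist example. It reads "90% certain" as a confidence level C = 0.90 and rounds n up to the next whole sample, since a fractional sample is not possible; the function name `required_n` is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def required_n(sigma, m, C):
    """Smallest n for which z* * sigma / sqrt(n) <= m at confidence level C."""
    z_star = NormalDist().inv_cdf((1 + C) / 2)
    return ceil((z_star * sigma / m) ** 2)

# sigma = 4 mg, margin of error m = 0.8 mg, confidence level 90%
n_water = required_n(sigma=4.0, m=0.8, C=0.90)
print("water samples needed:", n_water)
```

With z* ≈ 1.645 the formula gives about 67.6, so 68 water samples are needed.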
week9
15
Example
• You want to rent an unfurnished one-bedroom apartment for
next semester. The mean monthly rent for a random sample of
10 apartments advertised in the local newspaper is $580.
Assume that the stdev. is $90. Find a 95% CI for the mean
monthly rent for unfurnished one-bedroom apartments
available for rent in this community.
• How large a sample of one-bedroom apartments would be
needed to estimate the mean µ within ±$20 with 90%
confidence?
week9
16
Exercise
• Data on the Degree of Reading Power (DRP) scores for 44 students
are recorded. Suppose that the SD of the population of DRP scores
is known to be σ = 11. A 95% CI for the population mean score is given
in the MINITAB output below.
DRP Scores
40 26 39 47 19 26 52 25 35 47 35
48 14 35 35 22 42 34 33 33 18 15
29 41 25 44 34 51 43 40 41 27 46
38 49 14 27 31 28 54 19 46 52 45
Z Confidence Intervals
The assumed sigma = 11.0
Variable     N    Mean   StDev  SE Mean        95.0 % CI
DRP Scor    44   35.09   11.19     1.66  (31.84 , 38.34)
• MINITAB Command
Stat > Basic Statistics > 1 Sample Z and select ‘Confidence interval’
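For readers without MINITAB, the same interval can be reproduced with a few lines of Python. This is a sketch using only the standard library; the 44 scores are copied from the slide in the order they appear in the transcript, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev

drp = [40, 26, 39, 47, 19, 26, 52, 25, 35, 47, 35,
       48, 14, 35, 35, 22, 42, 34, 33, 33, 18, 15,
       29, 41, 25, 44, 34, 51, 43, 40, 41, 27, 46,
       38, 49, 14, 27, 31, 28, 54, 19, 46, 52, 45]

sigma = 11.0                           # assumed known population SD
n = len(drp)
xbar = fmean(drp)                      # sample mean
se = sigma / sqrt(n)                   # std. error of the sample mean
z_star = NormalDist().inv_cdf(0.975)   # z* for a 95% interval
ci = (xbar - z_star * se, xbar + z_star * se)

print(f"N = {n}, Mean = {xbar:.2f}, StDev = {stdev(drp):.2f}, SE Mean = {se:.2f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

The printed values agree with the MINITAB output to two decimal places.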
week9
17
Exercise
A random sample of 85 students in Chicago city high schools
takes a course designed to improve SAT scores. Based on
these students, a 90% CI for the mean improvement in SAT
scores for all Chicago high school students is computed as
(72.3, 91.4) points.
Which of the following statements are true?
a) 90% of the students in the sample improved their scores by
between 72.3 and 91.4 points.
b) 90% of the students in the population improved their scores
by between 72.3 and 91.4 points.
c) A 95% CI will contain the value 72.3.
d) The margin of error of the 90% CI above is 9.55.
e) A 90% CI based on a sample of 340 (85 × 4) students will have
margin of error 9.55/4.
week9
18
Statistical Tests
• A significance test is a formal procedure for comparing
observed data with a hypothesis whose truth we want to
assess. The hypothesis is a statement about the parameters in a
population or model.
• Null hypothesis
The statement being tested in a test of significance is called the
null hypothesis. The test of significance is designed to assess
the strength of the evidence against the null hypothesis.
Usually the null hypothesis is a statement of “no effect” or “no
difference”.
• We abbreviate “null hypothesis” as H0 .
week9
19
Example
Each of the following situations requires a significance test about a
population mean μ. State the appropriate null hypothesis H0 and alternative
hypothesis Ha in each case.
(a) The mean area of the several thousand apartments in a new development is
advertised to be 1250 square feet. A tenant group thinks that the apartments
are smaller than advertised. They hire an engineer to measure a sample of
apartments to test their suspicion.
(b) Larry's car averages 32 miles per gallon on the highway. He
now switches to a new motor oil that is advertised as increasing gas
mileage. After driving 3000 highway miles with the new oil, he wants to
determine if his gas mileage actually has increased.
(c) The diameter of a spindle in a small motor is supposed to be 5 millimeters.
If the spindle is either too small or too large, the motor will not perform
properly. The manufacturer measures the diameter in a sample of motors to
determine whether the mean diameter has moved away from the target.
week9
20
Test Statistic
• The test is based on a statistic that estimates the parameter that
appears in the hypotheses. Usually this is the same estimate we
would use in a confidence interval for the parameter. When H0
is true, we expect the estimate to take a value near the
parameter value specified in H0.
• Values of the estimate far from the parameter value specified
by H0 give evidence against H0. The alternative hypothesis
determines which directions count against H0.
• A test statistic measures compatibility between the null
hypothesis and the data.
• We use it for the probability calculation that we need for our
test of significance.
• It is a random variable with a distribution that we know.
week9
21
Example
• An air freight company wishes to test whether or not the mean
weight of parcels shipped on a particular route exceeds 10
pounds. A random sample of 49 shipping orders was examined
and found to have an average weight of 11 pounds. Assume that the
stdev. of the weights (σ) is 2.8 pounds.
• The null and alternative hypotheses in this problem are:
H0: μ = 10 ;
Ha: μ > 10 .
• The test statistic for this problem is the standardized version of X̄:
\[ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \]
• Decision: ?
week9
22
P-value and Significance level
• The probability computed under the assumption that H0 is true,
that the test statistic would take a value as extreme or more
extreme than that actually observed is called the P-value of the
test. The smaller the P-value the stronger the evidence against H0
provided by the data.
• The decisive value of P is called the significance level. It is
denoted by α.
• Statistical significance
If the P-value is as small or smaller than α, we reject H0 and say
that the data are statistically significant at level α.
• The P-value is the smallest level α at which the data are
significant.
week9
23
Z Test for a population mean (σ known)
• To test the hypothesis H0: µ = µ0 based on a SRS of size n
from a population with unknown mean µ and known stdev σ,
compute the test statistic
\[ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \]
• In terms of a standard Normal variable Z, the P-value for the
test of H0 against
Ha : µ > µ0 is P( Z ≥ z )
Ha : µ < µ0 is P( Z ≤ z )
Ha : µ ≠ µ0 is 2·P( Z ≥ |z| )
• These P-values are exact if the population distribution is
normal and are approximately correct for large n in other
cases.
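The three P-value rules can be collected into one small function. This is a sketch, not a library routine: the name `z_test` and the labels "greater", "less" and "two-sided" are assumptions made for illustration. It is applied to the air freight example from slide 22 (x̄ = 11, μ0 = 10, σ = 2.8, n = 49, Ha: μ > 10).

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alternative):
    """Return (z, P-value) for H0: mu = mu0 with known sigma."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    Z = NormalDist()
    if alternative == "greater":      # Ha: mu > mu0
        p = 1 - Z.cdf(z)
    elif alternative == "less":       # Ha: mu < mu0
        p = Z.cdf(z)
    elif alternative == "two-sided":  # Ha: mu != mu0
        p = 2 * (1 - Z.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, p

z, p = z_test(xbar=11.0, mu0=10.0, sigma=2.8, n=49, alternative="greater")
print(f"z = {z:.2f}, P-value = {p:.4f}")
```

For the air freight data this gives z = 2.5 and a P-value of about 0.006, which is strong evidence that the mean parcel weight exceeds 10 pounds.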
week9
24
Critical value approach
• We can base our test conclusions on a fixed significance level
α without computing the P-value.
• For this we need to find a critical value z* from the standard
normal distribution with a specified tail area (to the right or
left depending on Ha). This tail area is called the rejection
region.
• If the test statistic falls in the rejection region we reject H0 and
conclude that the data are statistically significant at level α.
• A P-value is more informative than a reject-or-not finding at a
fixed significance level because it can tell us about the strength
of evidence we found against H0.
week9
25
Example
• The Pfft Light Bulb Company claims that the mean life of its 2
watt bulbs is 1300 hours. Suspecting that the claim is too
high, Nalph Rader gathered a random sample of 64 bulbs and
tested each. He found the average life to be 1295 hours. Test
the company's claim using α = 0.01. Assume σ = 20 hours.
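A sketch of the critical value approach for this example. It assumes the hypotheses are H0: μ = 1300 against Ha: μ < 1300, which is one reading of "suspecting that the claim is too high"; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma = 1300.0, 20.0  # claimed mean life and known SD (hours)
n, xbar = 64, 1295.0       # sample size and observed mean
alpha = 0.01

z = (xbar - mu0) / (sigma / sqrt(n))

# Lower-tail test: the rejection region is z <= z_crit,
# where z_crit cuts off area alpha in the left tail.
z_crit = NormalDist().inv_cdf(alpha)
reject = z <= z_crit

print(f"z = {z:.2f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```

Here z = −2.0 does not fall below the critical value of about −2.326, so under this reading H0 is not rejected at the 1% level.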
week9
26
Exercise
• A standard intelligence examination has been given for several
years with an average score of 80 and a standard deviation of
7. If 25 students taught with special emphasis on reading skill,
obtain a mean grade of 83 on the examination, is there reason
to believe that the special emphasis changes the result on the
test? Use α = 0.05.
week9
27
Exercise
• Data on the Degree of Reading Power (DRP) scores for 44 students in a
suburban school district (same data as on slide 17). Suppose that the
SD of scores in this school district is known to be σ =11. The
researcher believes that the mean score μ of all the students in this
district is higher than the national mean which is 32. The MINITAB
output for the test is given below.
Z-Test
Test of mu = 32.00 vs mu > 32.00
The assumed sigma = 11.0
Variable     N    Mean   StDev  SE Mean     Z      P
DRP Scor    44   35.09   11.19     1.66  1.86  0.031
• MINITAB Command
Stat > Basic Statistics > 1 Sample Z and select ‘Test mean’
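The Z and P values in the MINITAB output can be checked from the summary statistics alone. The sketch below uses only the numbers printed in the output (N = 44, Mean = 35.09, assumed sigma = 11) and the one-sided alternative mu > 32.

```python
from math import sqrt
from statistics import NormalDist

n, xbar = 44, 35.09      # from the MINITAB output
mu0, sigma = 32.0, 11.0  # null value and assumed sigma

se_mean = sigma / sqrt(n)
z = (xbar - mu0) / se_mean
p_value = 1 - NormalDist().cdf(z)  # upper tail, since Ha is mu > 32

print(f"SE Mean = {se_mean:.2f}, Z = {z:.2f}, P = {p_value:.3f}")
```

Rounded as in the output, this reproduces SE Mean = 1.66, Z = 1.86 and P = 0.031.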
week9
28
Confidence Intervals and two-sided tests
• A level α two-sided significance test rejects a hypothesis
H0: μ = μ0 exactly when the value μ0 falls outside the 1 − α
confidence interval for μ.
• Example
For the exercise on slide 27 a 95% CI is
83 ± 1.96·(7/5) = (80.256, 85.744)
The value 80 is not in this interval and so we reject H0: μ = 80
at the 5% level of significance.
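The agreement between the confidence interval and the two-sided test can be checked numerically for this example (x̄ = 83, σ = 7, n = 25, μ0 = 80, α = 0.05). A minimal sketch with illustrative variable names:

```python
from math import sqrt
from statistics import NormalDist

xbar, mu0, sigma, n, alpha = 83.0, 80.0, 7.0, 25, 0.05
se = sigma / sqrt(n)

# 1 - alpha confidence interval for mu
z_star = NormalDist().inv_cdf(1 - alpha / 2)
ci = (xbar - z_star * se, xbar + z_star * se)

# Two-sided z test of H0: mu = mu0
z = (xbar - mu0) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

outside_ci = not (ci[0] <= mu0 <= ci[1])
rejects = p_value <= alpha

print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f}), mu0 outside CI: {outside_ci}")
print(f"z = {z:.3f}, P-value = {p_value:.4f}, reject at 5%: {rejects}")
```

Both routes give the same decision: 80 lies outside the interval, and the two-sided P-value (about 0.032) is below 0.05.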
week9
29