Lecture - 8
Final Review
Decisions and Conclusions
P-value method: Using the significance level α:
If P-value ≤ α, reject H0.
If P-value > α, fail to reject H0.
Traditional method:
If the test statistic falls within the critical region, reject H0.
If the test statistic does not fall within the critical region, fail to reject H0.
Confidence intervals:
If a confidence interval does not include a claimed value of a population parameter,
reject that claim.
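The three decision rules above can be sketched as small Python functions. The function names and the "left"/"right"/"two" tail labels are hypothetical, chosen only for this illustration; the sample values come from Q.3, Q.4 and Q.8 below.

```python
# A minimal sketch of the three decision rules; names are hypothetical.

def decide_by_p_value(p_value: float, alpha: float) -> str:
    """P-value method: reject H0 when P-value <= alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"


def decide_by_critical_region(stat: float, critical: float, tail: str) -> str:
    """Traditional method: reject H0 when the test statistic is in the critical region.

    `critical` is the positive critical value; `tail` is "left", "right" or "two".
    """
    if tail == "left":
        in_region = stat <= -critical
    elif tail == "right":
        in_region = stat >= critical
    elif tail == "two":
        in_region = abs(stat) >= critical
    else:
        raise ValueError("tail must be 'left', 'right' or 'two'")
    return "reject H0" if in_region else "fail to reject H0"


def decide_by_ci(lower: float, upper: float, claimed: float) -> str:
    """Confidence-interval method: reject a claimed value lying outside the interval."""
    return "fail to reject the claim" if lower <= claimed <= upper else "reject the claim"


print(decide_by_p_value(0.1423, 0.05))                 # P-value from Q.3
print(decide_by_critical_region(6.38, 1.88, "right"))  # z and critical z from Q.4
print(decide_by_ci(0.677, 0.723, 0.5))                 # interval from Q.8, claim p = 0.5
```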
Q.1) Find the critical z values. Assume that the normal distribution applies.
Two-tailed test; α = 0.04
Solution:
Since the test is two-tailed, α is split equally between the two tails:
α/2 = 0.04/2 = 0.02. Now look up the z-values corresponding to the
cumulative probabilities 0.02 and 1 - 0.02 = 0.98.
The table probabilities closest to 0.02 are 0.0202 and 0.0197, and the
corresponding z-values are -2.05 and -2.06, respectively. Since
0.02 lies between 0.0202 and 0.0197, we take the average of
those two z-values. Hence the left critical z-value is
$$z = \frac{-2.05 + (-2.06)}{2} = -2.055$$
Since the normal distribution is symmetric, the right critical z-value is 2.055.
Hence z = -2.055, 2.055.
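As a check on the table lookup, the critical values can be computed from the inverse of the standard normal CDF. This sketch uses Python's standard-library `statistics.NormalDist` in place of a printed z table; the exact value (about 2.054) differs from the averaged table value 2.055 only in the third decimal place.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

alpha = 0.04
left_tail_area = alpha / 2  # two-tailed test: alpha is split between the tails

# Exact critical values from the inverse CDF.
z_left = std_normal.inv_cdf(left_tail_area)
z_right = -z_left  # the normal curve is symmetric about 0

# The table approach: average the two neighbouring table entries.
z_table = (-2.05 + -2.06) / 2

print(f"exact: {z_left:.4f}, {z_right:.4f}")
print(f"table average: {z_table:.3f}")
```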
Q.2) Find the critical z-values. Assume that the normal distribution applies.
α = 0.005; H1 is p < 0.17.
Solution:
It is a left-tailed test. Look up the z-value that corresponds to the
cumulative probability 0.005.
z = -2.575 (0.005 lies midway between the table probabilities 0.0051 and 0.0049,
so we average z = -2.57 and z = -2.58, as in the previous example)
If we need to round to two decimal places, we write
z = -2.58
• If it were a right-tailed test (where H1 is p > 0.17), we would
have z = 2.58.
• If it were a two-tailed test (where H1 is p ≠ 0.17), we would look up
the z-value corresponding to the probability 0.005/2 = 0.0025.
Hence, z = -2.81, 2.81.
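The three cases in Q.2 (left-, right- and two-tailed) can be sketched as one function. The name `critical_z` and the tail labels are hypothetical; the computation uses the standard-library `statistics.NormalDist.inv_cdf` rather than a z table, so unrounded results may differ from table values in the third decimal place.

```python
from statistics import NormalDist


def critical_z(alpha: float, tail: str) -> tuple:
    """Return the critical z value(s) for significance level alpha.

    tail: "left", "right" or "two" (hypothetical labels for this sketch).
    """
    z = NormalDist()
    if tail == "left":
        return (z.inv_cdf(alpha),)          # area alpha to the left
    if tail == "right":
        return (z.inv_cdf(1 - alpha),)      # area alpha to the right
    if tail == "two":
        upper = z.inv_cdf(1 - alpha / 2)    # area alpha/2 in each tail
        return (-upper, upper)
    raise ValueError("tail must be 'left', 'right' or 'two'")


for tail in ("left", "right", "two"):
    print(tail, [round(v, 2) for v in critical_z(0.005, tail)])
```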
Q.3) Use the information below to find the P-value. Also use a 0.05
significance level and state the conclusion about the null hypothesis.
The test statistic in a left-tailed test is z = -1.07.
Solution:
Here the cumulative probability corresponding to z = -1.07 is 0.1423.
Hence the P-value = 0.1423.
(For a left-tailed test, the P-value is the area to the left of the test statistic.)
Since the P-value is greater than the significance level 0.05, we fail
to reject the null hypothesis.
• If the significance level were 0.15, we would reject the null hypothesis (0.1423 ≤ 0.15).
• If the test statistic in a right-tailed test were z = 2.50, the P-value would be
the area to the right: 1 - 0.9938 = 0.0062 < 0.05. Then we would reject the null
hypothesis.
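Both P-values in Q.3 are tail areas under the standard normal curve. A minimal sketch, assuming Python's standard-library `statistics.NormalDist`; the helper name `p_value` is hypothetical.

```python
from statistics import NormalDist


def p_value(z: float, tail: str) -> float:
    """P-value of a z test statistic for a one-tailed test ("left" or "right")."""
    cdf = NormalDist().cdf
    if tail == "left":
        return cdf(z)        # area to the left of z
    if tail == "right":
        return 1 - cdf(z)    # area to the right of z
    raise ValueError("tail must be 'left' or 'right'")


p_left = p_value(-1.07, "left")
p_right = p_value(2.50, "right")
print(round(p_left, 4), round(p_right, 4))
```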
Q.4) A manual states that in order to be a hit, a song must be no longer than 190
seconds. A simple random sample (SRS) of 40 current hit songs results in a mean length of 245 sec. Assume
the population standard deviation of song lengths is 54.5 sec. Use a 0.03 significance level to test
the claim that the sample is from a population of songs with a mean greater than
190 sec. What do these results suggest about the advice given in the manual?
Solution:
H0: μ = 190 sec
H1: μ > 190 sec
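This is a right-tailed test, so the critical value of z is the value with an area of 0.03 to its right: z = 1.88.
For the value of the test statistic,
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = 6.38$$
Since the test statistic value falls within the critical region, we reject H0.
This result suggests that the advice given in the manual is not sound: there is sufficient evidence
to support the claim that the mean length of hit songs is greater than 190 sec.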
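The one-sample z test in Q.4 can be sketched as follows. The sample mean is taken as 245 sec, the value consistent with the reported test statistic z = 6.38; treat it as an assumption. The critical value comes from the standard-library `statistics.NormalDist` instead of a table.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 190.0     # mean under H0 (sec)
sigma = 54.5    # known population standard deviation (sec)
n = 40
x_bar = 245.0   # ASSUMPTION: sample mean consistent with the reported z = 6.38
alpha = 0.03

z = (x_bar - mu0) / (sigma / sqrt(n))       # test statistic
z_crit = NormalDist().inv_cdf(1 - alpha)    # right-tailed critical value

print(f"z = {z:.2f}, critical z = {z_crit:.2f}")
print("reject H0" if z >= z_crit else "fail to reject H0")
```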
Q.5) People have died in boat accidents because
an obsolete estimate of the mean weight of men
was used. Using the weights of the simple random
sample of men from Data Set 1 in Appendix B, we
obtain these sample statistics: n = 40,
x̄ = 172.55 lb, and s = 26.33 lb. Do not assume
that the value of σ is known. Use these results to
test the claim that men have a mean weight
greater than 166.3 lb, which was the weight in the
National Transportation and Safety Board’s
recommendation M-04-04. Use a 0.05 significance
level and the traditional method.
Solution:
Requirements are satisfied: simple random sample, population standard deviation is not
known, sample size is 40 (n > 30).
Step 1: Express the claim as μ > 166.3 lb.
Step 2: The alternative to the claim is μ ≤ 166.3 lb.
Step 3: μ > 166.3 lb does not contain equality, so it is the alternative hypothesis:
H0: μ = 166.3 lb (null hypothesis)
H1: μ > 166.3 lb (alternative hypothesis and original claim)
Step 4: The significance level is α = 0.05.
Step 5: The claim is about the population mean, so the relevant statistic is the sample
mean, x̄ = 172.55 lb.
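Step 6: Calculate the test statistic t:
$$t = \frac{\bar{x} - \mu_{\bar{x}}}{s/\sqrt{n}} = \frac{172.55 - 166.3}{26.33/\sqrt{40}} = 1.501$$
With df = n - 1 = 39 and an area of 0.05 in one tail, the t table
yields the critical value t = 1.685.
Step 7:
Because t = 1.501 does not fall in the critical region bounded
by t = 1.685, we fail to reject the null hypothesis.
[Figure: t distribution centered at μ = 166.3 (t = 0), with the critical value t = 1.685 and the sample mean x̄ = 172.55 marked.]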
Because we fail to reject the null hypothesis, we conclude that
there is not sufficient evidence to support the conclusion that the
population mean is greater than 166.3 lb, the weight used in the National
Transportation and Safety Board’s recommendation.
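Steps 6 and 7 can be sketched in Python. The standard library has no t distribution, so this sketch hard-codes the table critical value 1.685 (df = 39, 0.05 in one tail); if SciPy is available, `scipy.stats.t.ppf(0.95, 39)` should return approximately the same value.

```python
from math import sqrt

# Sample statistics from Q.5 (sigma unknown, so s is used)
n = 40
x_bar = 172.55   # lb
s = 26.33        # lb, sample standard deviation
mu0 = 166.3      # lb, value of the mean under H0
t_crit = 1.685   # from the t table: df = 39, area 0.05 in the right tail

t = (x_bar - mu0) / (s / sqrt(n))   # test statistic
df = n - 1

print(f"t = {t:.3f}, df = {df}")
print("reject H0" if t >= t_crit else "fail to reject H0")
```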
Q.6) Find the critical value Zα/2 that corresponds to a
98% confidence level
Solution:
Here α = 1 - 0.98 = 0.02, so α/2 = 0.01. Looking up the z-value corresponding to the
cumulative probability 0.01, we get -2.33.
Hence Zα/2 = 2.33.
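The same lookup generalizes to any confidence level: α = 1 - (confidence level), and z_{α/2} is the z-value with area α/2 to its right. A minimal sketch using the standard-library `statistics.NormalDist`; the function name `z_alpha_over_2` is hypothetical. The unrounded value for 98% is about 2.326, which the table rounds to 2.33.

```python
from statistics import NormalDist


def z_alpha_over_2(confidence: float) -> float:
    """Critical value z_{alpha/2} for a confidence level such as 0.98."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%}: {z_alpha_over_2(level):.3f}")
```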
Q.7) Express the confidence interval 0.4 < p < 0.62 in the form
p̂ ± E.
Solution:
Here p̂ = (0.4 + 0.62)/2 = 0.51 and E = (0.62 - 0.4)/2 = 0.11.
Hence the required answer is 0.51 ± 0.11.
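The conversion in Q.7 uses the midpoint of the interval as p̂ and half its width as E. A short sketch; the helper name is hypothetical.

```python
def to_estimate_and_margin(lower: float, upper: float) -> tuple:
    """Convert a confidence interval lower < p < upper into (p_hat, E)."""
    p_hat = (lower + upper) / 2    # point estimate: midpoint of the interval
    margin = (upper - lower) / 2   # margin of error: half the interval width
    return p_hat, margin


p_hat, E = to_estimate_and_margin(0.4, 0.62)
print(f"{p_hat:.2f} ± {E:.2f}")
```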
Q.8) A Pew Research Center poll of 1501 randomly selected
U.S. adults showed that 70% of the respondents believe in global
warming. The sample results are n = 1501 and p̂ = 0.70.
a. Find the margin of error E that corresponds to a 95% confidence level.
b. Find the 95% confidence interval estimate of the population proportion p.
c. Based on the results, can we safely conclude that the majority of adults
believe in global warming?
d. Assuming that you are a newspaper reporter, write a brief statement that
accurately describes the results and includes all of the relevant information.
Solution:
Requirement check:
• simple random sample;
• fixed number of trials, 1501;
• trials are independent;
• two categories of outcomes (believes or does not);
• probability remains constant.
Note: the numbers of successes and failures are both at least 5 (about 1051 and 450).
a) Use the formula to find the margin of error:
$$E = z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} = 1.96\sqrt{\frac{(0.70)(0.30)}{1501}} = 0.023183$$
b) The 95% confidence interval:
$$\hat{p} - E < p < \hat{p} + E$$
$$0.70 - 0.023183 < p < 0.70 + 0.023183$$
$$0.677 < p < 0.723$$
c)
Based on the confidence interval obtained in part (b), it does appear that
the proportion of adults who believe in global warming is greater than 0.5
(or 50%), so we can safely conclude that the majority of adults believe in
global warming. Because the limits of 0.677 and 0.723 are likely to contain
the true population proportion, it appears that the population proportion is
a value greater than 0.5.
d)
Here is one statement that summarizes the results: 70% of United States
adults believe that the earth is getting warmer. That percentage is based on
a Pew Research Center poll of 1501 randomly selected adults in the United
States. In theory, in 95% of such polls, the percentage should differ by no
more than 2.3 percentage points in either direction from the percentage
that would be found by interviewing all adults in the United States.
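Parts (a) and (b) can be reproduced with a short sketch. It uses the standard-library `statistics.NormalDist` for z_{α/2} (about 1.96) instead of the table value, which changes E only beyond the sixth decimal place.

```python
from math import sqrt
from statistics import NormalDist

n = 1501
p_hat = 0.70
q_hat = 1 - p_hat
confidence = 0.95

# Requirement: numbers of successes and failures both at least 5
assert n * p_hat >= 5 and n * q_hat >= 5

z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{alpha/2}, about 1.96
E = z * sqrt(p_hat * q_hat / n)                      # margin of error

lower, upper = p_hat - E, p_hat + E
print(f"E = {E:.6f}")
print(f"{lower:.3f} < p < {upper:.3f}")
```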
Q.9) During a certain week the mean price of gasoline in
California was $1.164 per gallon. What is the probability that
the mean price for the sample of 38 gas stations in California is
between $1.169 and $1.179? Assume the standard deviation =
$0.049.
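Solution:
Since n > 30, the sampling distribution of x̄ will be approximately normal, with
mean μ_x̄ = μ = 1.164
standard deviation σ_x̄ = σ/√n = 0.049/√38 ≈ 0.0079
Calculate the standard z-scores for the sample values $1.169 and $1.179:
z = (1.169 - 1.164)/0.0079 ≈ 0.63 and z = (1.179 - 1.164)/0.0079 ≈ 1.90
P(0.63 < z < 1.90)
= 0.9713 - 0.7357
= 0.2356
[Figure: standard normal curve with the area between z = 0.63 and z = 1.90 shaded.]
The probability is 0.2356 that the mean for the sample is
between $1.169 and $1.179.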
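The same probability can be computed directly from the sampling distribution of x̄, assuming (as in the solution) that it is normal with mean 1.164 and standard error σ/√n. This sketch uses the standard-library `statistics.NormalDist`; because it does not round the z-scores to two decimals, its result (about 0.235) differs slightly from the table answer 0.2356.

```python
from math import sqrt
from statistics import NormalDist

mu = 1.164      # population mean price ($ per gallon)
sigma = 0.049   # population standard deviation ($)
n = 38

# Central limit theorem: x-bar is approximately normal with these parameters.
se = sigma / sqrt(n)                 # standard error of the mean
sampling_dist = NormalDist(mu, se)

prob = sampling_dist.cdf(1.179) - sampling_dist.cdf(1.169)
print(f"standard error = {se:.4f}")
print(f"P(1.169 < x-bar < 1.179) = {prob:.4f}")
```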
Multiplication Rule
• P(A and B) = P(A) × P(B|A)
• Note that if A and B are independent events, P(B|A)
is the same as P(B).
P(A and B) = P(event A occurs in a first trial and event B occurs in
a second trial)
P(A and B) = P(A) × P(B), if the events are independent.
Q.10) While rolling a die twice, what is the probability of getting a ‘3’ on the
1st roll and a ‘4’ on the 2nd roll?
Answer:
Since the two rolls are independent, P(3 and 4) = P(3) × P(4) = 1/6 × 1/6 = 1/36.
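The answer can be checked by listing all 36 equally likely outcomes of two rolls of a fair die and counting the favourable ones. The sketch also confirms that P(A and B) = P(A) × P(B) here, as expected for independent events; the helper name `prob` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (first roll, second roll) of a fair die.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event) -> Fraction:
    """Classical probability: favourable outcomes / total outcomes."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))


p_a = prob(lambda o: o[0] == 3)                       # '3' on the 1st roll
p_b = prob(lambda o: o[1] == 4)                       # '4' on the 2nd roll
p_a_and_b = prob(lambda o: o[0] == 3 and o[1] == 4)   # both

print(p_a, p_b, p_a_and_b)
print(p_a_and_b == p_a * p_b)   # True for these independent events
```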
Caution:
When applying the multiplication rule, always consider whether the
events are independent or dependent, and adjust the calculations
accordingly.
Q.11) A recent survey asked 100 people if they thought women
in the armed forces should be permitted to participate in
combat. The results of the survey are shown below.
Gender    Yes (Y)   No (N)   Total
Male (M)    32        18       50
Female (F)   8        42       50
Total       40        60      100
Find the probability that the respondent answered yes (Y),
given that the respondent was a female (F).
P(Y|F) = 8/50 = 0.16
Other probabilities from the same table:
P(N ∩ M) = 18/100 = 0.18
P(F ∪ Y) = P(F) + P(Y) - P(F ∩ Y) = (50 + 40 - 8)/100 = 0.82
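The three probabilities in Q.11 can be computed from a two-way table of counts. The four cell counts below are derived from the totals used in the solution (100 respondents, 50 females, 40 "yes" answers, 8 "yes" females, 18 "no" males); the helper name `count` is hypothetical.

```python
from fractions import Fraction

# Cell counts (gender, answer) derived from the totals used in the solution.
counts = {
    ("M", "Y"): 32, ("M", "N"): 18,
    ("F", "Y"): 8,  ("F", "N"): 42,
}
total = sum(counts.values())


def count(gender=None, answer=None) -> int:
    """Sum the cells matching the given gender and/or answer (None = any)."""
    return sum(c for (g, a), c in counts.items()
               if (gender is None or g == gender) and (answer is None or a == answer))


p_y_given_f = Fraction(count("F", "Y"), count(gender="F"))      # conditional
p_n_and_m = Fraction(count("M", "N"), total)                    # intersection
p_f_or_y = Fraction(count(gender="F") + count(answer="Y")
                    - count("F", "Y"), total)                   # addition rule

print(float(p_y_given_f), float(p_n_and_m), float(p_f_or_y))
```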
Q.12) Health research shows that there is roughly an 80% chance that a person
aged 20 years will be alive at age 65. Suppose that 3 people aged 20 years
are selected at random. Find the probability that the number alive at age 65
will be:
(a) exactly two (b) at most one (c) at least one
(d) Determine the probability distribution of the number alive at age 65.
Solution: Here the probability of success is p = 80% = 0.80, q = 0.20, and n = 3 (number of trials).
(a) P(X=2) = 3!/(2!·1!) · (0.8)^2 · (0.2)^1 = 0.384
(b) P(X≤1) = P(X=0) + P(X=1) = 0.008 + 0.096 = 0.104
(c) P(X≥1) = 1 - P(X=0) = 1 - 0.008 = 0.992
(d) Applying the same formula to each value of X, e.g. P(X=3) = 1 · (0.8)^3 · (0.2)^0 = 0.512,
gives the distribution:
Number alive at age 65 (X)    Probability P(X)
0                             0.008
1                             0.096
2                             0.384
3                             0.512
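The whole distribution in Q.12 follows from the binomial formula P(X = x) = C(n, x) p^x q^(n-x). A minimal sketch using Python's `math.comb`; the helper name `binom_pmf` is hypothetical.

```python
from math import comb

n, p = 3, 0.80
q = 1 - p


def binom_pmf(x: int) -> float:
    """P(X = x) = C(n, x) * p^x * q^(n - x)."""
    return comb(n, x) * p**x * q**(n - x)


dist = {x: binom_pmf(x) for x in range(n + 1)}
for x, px in dist.items():
    print(x, round(px, 3))

print("P(X=2)  =", round(dist[2], 3))
print("P(X<=1) =", round(dist[0] + dist[1], 3))
print("P(X>=1) =", round(1 - dist[0], 3))
```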
Boxplot
• A boxplot (or box-and-whisker diagram) is a graph of a data set that consists of a line
extending from the minimum value to the maximum value, and a box with lines drawn at
the first quartile, Q1; the median; and the third quartile, Q3.
[Figure: Boxplot of Movie Budget Amounts]
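A boxplot is drawn from the five-number summary (minimum, Q1, median, Q3, maximum). The sketch below computes it with Python's `statistics.quantiles` using `method="inclusive"`; quartile conventions differ between textbooks and software, so Q1 and Q3 may not match a hand calculation exactly. The data are hypothetical, not the movie budgets from the slide.

```python
from statistics import median, quantiles


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max), the values a boxplot is drawn from."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, median(data), q3, max(data)


# Hypothetical data for illustration only.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(five_number_summary(sample))
```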
Parameter
Statistic
Discrete Data
Discrete data result when the number of possible values is either a finite number or a
‘countable’ number (i.e., the number of possible values is 0, 1, 2, 3, . . .).
• Examples: the number of eggs that a hen lays, test score, shoe size, age, world ranking,
number of brothers, etc.
• The number of eggs that a hen lays is a discrete quantitative measure because it is
numeric but can only be a whole number.
Continuous Data
• Continuous (numerical) data result from infinitely many possible values that
correspond to some continuous scale that covers a range of values without gaps,
interruptions, or jumps.
• Examples: height, weight, length, amounts of milk from cows, etc.
• Height is a continuous quantitative measure because it can take any numerical value
in a particular range.
• The amount of milk that a cow produces is continuous; e.g., 2.343115 gallons per day.
Nominal Level
The nominal level of measurement is characterized by data that consist of names, labels,
or categories only, and the data cannot be arranged in an ordering scheme (such as low to high).
Examples:
• Survey responses: yes, no, undecided
• Political party: the political party affiliation of survey respondents (Democrat,
Republican, Independent, other)
Ordinal Level
The ordinal level of measurement involves data that can be arranged in some order,
but differences (obtained by subtraction) between data values either cannot be
determined or are meaningless.
Examples:
• Course grades: A, B, C, D, or F
• University rankings in the USA (1st, 2nd, 3rd, 4th, ...)
Interval Level
The interval level of measurement is like the ordinal level, with the additional property
that the difference between any two data values is meaningful. However, data at this level
do not have a natural zero starting point (where none of the quantity is present).
Examples:
• Body temperatures of 96.2°F and 98.6°F. (There is no natural starting point. The value
of 0°F might seem like a starting point, but it is arbitrary and does not represent the
total absence of heat.)
• Years: 1000, 2000, 1776, and 1492. (Time did not begin in the year 0, so the year 0 is
arbitrary instead of being a natural zero starting point representing “no time.”)
Ratio Level
The ratio level of measurement is the interval level with the additional property that
there is also a natural zero starting point (where zero indicates that none of the quantity
is present); for values at this level, both differences and ratios are meaningful.
Examples:
• Prices: prices of college textbooks ($0 represents no cost, and a $100 book costs twice
as much as a $50 book).
• Distances: distances (in miles) travelled by cars (0 miles represents no distance
travelled, and 60 miles is twice as far as 30 miles).
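The difference between the interval and ratio levels shows up when units change. In this sketch (temperatures and prices chosen only for illustration), a ratio of Fahrenheit temperatures changes when the same temperatures are expressed in Celsius, because 0°F is an arbitrary zero; a ratio of prices stays the same when dollars are converted to cents, because $0 is a natural zero.

```python
def f_to_c(f: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9


# Interval level: ratios depend on the arbitrary zero point.
t1_f, t2_f = 50.0, 100.0
ratio_f = t2_f / t1_f                   # 2.0 in Fahrenheit...
ratio_c = f_to_c(t2_f) / f_to_c(t1_f)   # ...but not 2 in Celsius
print(ratio_f, round(ratio_c, 2))

# Ratio level: ratios survive a change of units (dollars -> cents).
book_a, book_b = 100.0, 50.0
print(book_a / book_b, (book_a * 100) / (book_b * 100))
```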