Lecture - 8 Final Review

Decisions and Conclusions

P-value method: Using the significance level α:
If P-value ≤ α, reject H0.
If P-value > α, fail to reject H0.

Traditional method:
If the test statistic falls within the critical region, reject H0.
If the test statistic does not fall within the critical region, fail to reject H0.

Confidence intervals: If a confidence interval does not include a claimed value of a population parameter, reject that claim.

Q.1) Find the critical z values. Assume that the normal distribution applies. Two-tailed test; α = 0.04

Solution: Since this is a two-tailed test, the area in each tail is α/2 = 0.02. Look up the z values corresponding to the cumulative probabilities 0.02 and 1 - 0.02 = 0.98. The table probabilities closest to 0.02 are 0.0202 and 0.0197, and the corresponding z values are -2.05 and -2.06 respectively. Since 0.02 lies between 0.0202 and 0.0197, we take the average of those two z values. Hence the left critical z value is (-2.05 + (-2.06))/2 = -2.055. Since the normal distribution is symmetric, the right critical z value is 2.055. Hence z = -2.055, 2.055.

Q.2) Find the critical z values. Assume that the normal distribution applies. α = 0.005; H1 is p < 0.17.

Solution: This is a left-tailed test. Look up the z value that corresponds to the probability 0.005: z = -2.575 (obtained by averaging, as in the previous example). If we need to round to two decimal places, we write z = -2.58.
If it were a right-tailed test (where H1 is p > 0.17), we would have z = 2.58.
If it were a two-tailed test (where H1 is p ≠ 0.17), we would look up the z value corresponding to the probability 0.005/2 = 0.0025. Hence, z = -2.81, 2.81.

Q.3) Use the information below to find the P-value. Also use a 0.05 significance level and state the conclusion about the null hypothesis. The test statistic in the left-tailed test is z = -1.07.

Solution: The probability corresponding to z = -1.07 is 0.1423. Hence the P-value = 0.1423.
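The table lookups in Q.1 to Q.3 can be cross-checked numerically. The following is a minimal Python sketch, assuming the standard library's statistics.NormalDist is used in place of a printed z table; the helper names critical_z and left_tail_p_value are illustrative and do not come from the lecture. Exact values differ slightly from table interpolation (for example about 2.054 rather than 2.055 for α = 0.04).

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
STD_NORMAL = NormalDist()


def critical_z(alpha, tail):
    """Return the critical z value(s) for a test at significance level alpha.

    tail is "left", "right", or "two" (hypothetical labels for this sketch).
    """
    if tail == "left":
        return (STD_NORMAL.inv_cdf(alpha),)
    if tail == "right":
        return (STD_NORMAL.inv_cdf(1 - alpha),)
    if tail == "two":
        # Split alpha equally between the two tails.
        z = STD_NORMAL.inv_cdf(1 - alpha / 2)
        return (-z, z)
    raise ValueError("tail must be 'left', 'right', or 'two'")


def left_tail_p_value(z):
    """P-value of a left-tailed z test: the area to the left of z."""
    return STD_NORMAL.cdf(z)


if __name__ == "__main__":
    print(critical_z(0.04, "two"))     # Q.1: about (-2.054, 2.054)
    print(critical_z(0.005, "left"))   # Q.2: about (-2.576,)
    print(left_tail_p_value(-1.07))    # Q.3: about 0.1423
```

Comparing left_tail_p_value(-1.07) with 0.05 gives the same decision as the table-based solution: the P-value exceeds α, so H0 is not rejected.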
(The P-value is the area to the left of the test statistic.) Since the P-value is greater than the significance level, 0.05, we fail to reject the null hypothesis.
If the significance level were 0.15, we would reject the null hypothesis.
If the test statistic in a right-tailed test were z = 2.50, the P-value would be 1 - 0.9938 = 0.0062 < 0.05. Then we would reject the null hypothesis.

Q.4) A manual states that in order to be a hit, a song must be no longer than 190 seconds. An SRS of 40 current hit songs results in a mean length of 245 sec. Assume the population s.d. of song lengths is 54.5 sec. Use a 0.03 significance level to test the claim that the sample is from a population of songs with a mean greater than 190 sec. What do these results suggest about the advice given in the manual?

Solution:
H0: μ = 190 sec
H1: μ > 190 sec
The critical value is z = 1.88.
The test statistic is $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{245 - 190}{54.5/\sqrt{40}} = 6.38$.
Since the test statistic falls within the critical region, we reject H0. This result suggests that the advice given in the manual is not sound.

Q.5) People have died in boat accidents because an obsolete estimate of the mean weight of men was used. Using the weights of the simple random sample of men from Data Set 1 in Appendix B, we obtain these sample statistics: n = 40, x̄ = 172.55 lb, and s = 26.33 lb. Do not assume that the value of σ is known. Use these results to test the claim that men have a mean weight greater than 166.3 lb, which was the weight in the National Transportation and Safety Board's recommendation M-04-04. Use a 0.05 significance level and the traditional method.
Solution: Requirements are satisfied: simple random sample, population standard deviation is not known, sample size is 40 (n > 30).
Step 1: Express the claim as μ > 166.3 lb.
Step 2: The alternative to the claim is μ ≤ 166.3 lb.
Step 3: Since μ > 166.3 lb does not contain equality, it is the alternative hypothesis:
H0: μ = 166.3 lb (null hypothesis)
H1: μ > 166.3 lb (alternative hypothesis and original claim)
Step 4: The significance level is α = 0.05.
Step 5: The claim is about the population mean, so the relevant statistic is the sample mean, 172.55 lb.
Step 6: Calculate t:
$t = \frac{\bar{x} - \mu_{\bar{x}}}{s/\sqrt{n}} = \frac{172.55 - 166.3}{26.33/\sqrt{40}} = 1.501$
With df = n - 1 = 39 and an area of 0.05 in one tail, the critical value is t = 1.685.
Step 7: Because t = 1.501 does not fall in the critical region bounded by t = 1.685, we fail to reject the null hypothesis.

[Figure: t distribution centered at μ = 166.3 (t = 0), showing the sample mean x̄ = 172.55 and the critical value t = 1.685]

Because we fail to reject the null hypothesis, we conclude that there is not sufficient evidence to support a conclusion that the population mean is greater than 166.3 lb, as in the National Transportation and Safety Board's recommendation.

Q.6) Find the critical value Zα/2 that corresponds to a 98% confidence level.

Solution: Here α = 0.02, so α/2 = 0.01. Looking up the z value corresponding to the probability 0.01, we get -2.33. Hence Zα/2 = 2.33.

Q.7) Express the confidence interval 0.4 < p < 0.62 in the form p̂ ± E.

Solution: Here p̂ = (0.4 + 0.62)/2 = 0.51 and E = (0.62 - 0.4)/2 = 0.11. Hence the required answer is 0.51 ± 0.11.

Q.8) A Pew Research Center poll of 1501 randomly selected U.S. adults showed that 70% of the respondents believe in global warming. The sample results are n = 1501 and p̂ = 0.70.
a. Find the margin of error E that corresponds to a 95% confidence level.
b. Find the 95% confidence interval estimate of the population proportion p.
c. Based on the results, can we safely conclude that the majority of adults believe in global warming?
d.
Assuming that you are a newspaper reporter, write a brief statement that accurately describes the results and includes all of the relevant information.

Solution: Requirement check: simple random sample; fixed number of trials, 1501; trials are independent; two categories of outcomes (believes or does not); probability remains constant. Note: the numbers of successes and failures are both at least 5.

a) Use the formula to find the margin of error:
$E = z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} = 1.96\sqrt{\frac{(0.70)(0.30)}{1501}} = 0.023183$

b) The 95% confidence interval:
$\hat{p} - E < p < \hat{p} + E$
$0.70 - 0.023183 < p < 0.70 + 0.023183$
$0.677 < p < 0.723$

c) Based on the confidence interval obtained in part (b), it does appear that the proportion of adults who believe in global warming is greater than 0.5 (or 50%), so we can safely conclude that the majority of adults believe in global warming. Because the limits of 0.677 and 0.723 are likely to contain the true population proportion, it appears that the population proportion is a value greater than 0.5.

d) Here is one statement that summarizes the results: 70% of United States adults believe that the earth is getting warmer. That percentage is based on a Pew Research Center poll of 1501 randomly selected adults in the United States. In theory, in 95% of such polls, the percentage should differ by no more than 2.3 percentage points in either direction from the percentage that would be found by interviewing all adults in the United States.

Q.9) During a certain week the mean price of gasoline in California was $1.164 per gallon. What is the probability that the mean price for a sample of 38 gas stations in California is between $1.169 and $1.179? Assume the standard deviation is σ = $0.049.

Solution: Since n > 30, the sampling distribution of x̄ will be approximately normal, with
mean: $\mu_{\bar{x}} = \mu = 1.164$
standard deviation: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.049}{\sqrt{38}} \approx 0.0079$
Calculate the standard z-scores for the sample values $1.169 and $1.179:
$z = \frac{1.169 - 1.164}{0.0079} \approx 0.63$ and $z = \frac{1.179 - 1.164}{0.0079} \approx 1.90$
P(0.63 < z < 1.90) = 0.9713 - 0.7357 = 0.2356
The probability is 0.2356 that the mean for the sample is between $1.169 and $1.179.
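The arithmetic in Q.8 and Q.9 can be reproduced without a z table. The following is a minimal Python sketch, assuming the standard library's statistics.NormalDist; the function names proportion_margin_of_error and sample_mean_probability are illustrative and not part of the lecture. Because the sketch does not round the standard error or the z-scores, its Q.9 result (about 0.235) differs slightly from the table-based 0.2356.

```python
from math import sqrt
from statistics import NormalDist


def proportion_margin_of_error(p_hat, n, confidence):
    """Margin of error E = z_(alpha/2) * sqrt(p_hat * q_hat / n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95%
    return z * sqrt(p_hat * (1 - p_hat) / n)


def sample_mean_probability(low, high, mu, sigma, n):
    """P(low < sample mean < high), using a normal sampling distribution
    with mean mu and standard deviation sigma / sqrt(n)."""
    sampling_dist = NormalDist(mu, sigma / sqrt(n))
    return sampling_dist.cdf(high) - sampling_dist.cdf(low)


if __name__ == "__main__":
    # Q.8: n = 1501, p_hat = 0.70, 95% confidence level.
    e = proportion_margin_of_error(0.70, 1501, 0.95)
    print(f"E = {e:.6f}, interval = ({0.70 - e:.3f}, {0.70 + e:.3f})")

    # Q.9: mu = 1.164, sigma = 0.049, n = 38.
    p = sample_mean_probability(1.169, 1.179, 1.164, 0.049, 38)
    print(f"P = {p:.4f}")
```

Building the sampling distribution directly as NormalDist(mu, sigma / sqrt(n)) avoids the separate z-score step; it is the same calculation, only without intermediate rounding.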
Multiplication Rule

P(A and B) = P(A) • P(B|A)
Note that if A and B are independent events, P(B|A) is the same as P(B).
P(A and B) = P(event A occurs in a first trial and event B occurs in a second trial)
P(A and B) = P(A) • P(B), if the events are independent.

Q.10) When rolling a die twice, what is the probability of getting a '3' in the 1st trial and a '4' in the 2nd trial?

Answer: P(3 and 4) = P(3) × P(4) = 1/6 × 1/6 = 1/36.
Caution: When applying the multiplication rule, always consider whether the events are independent or dependent, and adjust the calculations accordingly.

Q.11) A recent survey asked 100 people if they thought women in the armed forces should be permitted to participate in combat. The results of the survey are shown below. Find the probability that the respondent answered yes (Y), given that the respondent was a female (F).

[Table: survey results by gender and response]

P(Y|F) = 8/50 = 0.16
P(N ∩ M) = 18/100 = 0.18
P(F ∪ Y) = (50 + 40 - 8)/100 = 0.82

Q.12) Health research shows that there is roughly an 80% chance that a person of age 20 years will be alive at age 65 years. Suppose that 3 people of age 20 years are selected at random. Find the probability that the number alive at age 65 years will be:
(a) exactly two
(b) at most one
(c) at least one
(d) Determine the probability distribution of the number alive at age 65.

Solution: Here the probability of success is p = 80% = 0.80, q = 0.20, and n = 3 (number of trials).
(a) P(X = 2) = [3!/(2! · 1!)] (0.8)^2 (0.2)^1 = 0.384
(b) P(X ≤ 1) = P(X = 0) + P(X = 1) = 0.008 + 0.096 = 0.104
(c) P(X ≥ 1) = 1 - P(X = 0) = 1 - 0.008 = 0.992
(d) P(X = 3) = 1 · (0.8)^3 (0.2)^0 = 0.512

Number alive at age 65 (X)    Probability P(X)
0                             0.008
1                             0.096
2                             0.384
3                             0.512

Boxplot

A boxplot (or box-and-whisker diagram) is a graph of a data set that consists of a line extending from the minimum value to the maximum value, and a box with lines drawn at the first quartile, Q1; the median; and the third quartile, Q3.
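The five values that a boxplot is drawn from (minimum, Q1, median, Q3, maximum) can be computed directly. The following is a minimal Python sketch, assuming the standard library's statistics.quantiles with its "inclusive" method; textbooks and software use slightly different quartile conventions, so results may differ from a hand calculation. The function name five_number_summary and the sample data are hypothetical and are not taken from the lecture.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a data set.

    Quartiles use the "inclusive" method of statistics.quantiles,
    which is one of several conventions in use (an assumption here).
    """
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))


if __name__ == "__main__":
    # Hypothetical data set, for illustration only.
    data = [2, 4, 7, 10, 15, 21, 30, 42, 60]
    print(five_number_summary(data))  # (2, 7.0, 15.0, 30.0, 60)
```

The box spans Q1 to Q3 with a line at the median, and the whiskers extend to the minimum and maximum.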
[Figure: Boxplot of Movie Budget Amounts]

[Figure: Parameter and Statistic]

Discrete Data

Discrete data result when the number of possible values is either a finite number or a 'countable' number (i.e. the number of possible values is 0, 1, 2, 3, . . .).
Example: The number of eggs that a hen lays, test score, shoe size, age, world ranking, number of brothers, etc. The number of eggs that a hen lays is a discrete quantitative measure because it is numeric but can only be a whole number.

Continuous Data

Continuous (numerical) data result from infinitely many possible values that correspond to some continuous scale that covers a range of values without gaps, interruptions, or jumps.
Example: Height, weight, length, amounts of milk from cows, etc. Height is a continuous quantitative measure because it can take any numerical value in a particular range. The amount of milk that a cow produces is another example, e.g. 2.343115 gallons per day.

Nominal Level

The nominal level of measurement is characterized by data that consist of names, labels, or categories only; the data cannot be arranged in an ordering scheme (such as low to high).
Examples: Survey responses: yes, no, undecided. Political party: the political party affiliation of survey respondents (Democrat, Republican, Independent, other).

Ordinal Level

The ordinal level of measurement involves data that can be arranged in some order, but differences (obtained by subtraction) between data values either cannot be determined or are meaningless.
Examples: Course grades: A, B, C, D, or F. University rankings in the USA (1st, 2nd, 3rd, 4th, ...).

Interval Level

The interval level of measurement is like the ordinal level, with the additional property that the difference between any two data values is meaningful. However, data at this level do not have a natural zero starting point (where none of the quantity is present).
Example: Body temperatures of 96.2°F and 98.6°F (There is no natural starting point.
The value of 0°F might seem like a starting point, but it is arbitrary and does not represent the total absence of heat.)
Example: Years: 1000, 2000, 1776, and 1492. (Time did not begin in the year 0, so the year 0 is arbitrary instead of being a natural zero starting point representing "no time.")

Ratio Level

The ratio level of measurement is the interval level with the additional property that there is also a natural zero starting point (where zero indicates that none of the quantity is present); for values at this level, both differences and ratios are meaningful.
Examples: Prices: prices of college textbooks ($0 represents no cost, and a $100 book costs twice as much as a $50 book). Distances: distances (in miles) travelled by cars (0 miles represents no distance travelled, and 60 miles is twice as far as 30 miles).