Module III – Hypothesis Testing

The sampling distribution of the sample mean is the distribution of all possible sample means ($\bar{x}$) of a given sample size from a population.

The larger the sample size, the smaller the sampling error tends to be in estimating a population mean, $\mu$, by a sample mean, $\bar{x}$.

For samples of size $n$, the mean of the variable $\bar{x}$ is denoted by $\mu_{\bar{x}}$, and $\mu_{\bar{x}} = \mu$ for each sample size.

The population standard deviation is denoted by $\sigma$. For samples of size $n$, the standard deviation of the variable $\bar{x}$ is denoted by $\sigma_{\bar{x}}$, and $\sigma_{\bar{x}} = \sigma / \sqrt{n}$ for each sample size.

Note:
- If the population variable is normally distributed, then $\bar{x}$ is normally distributed regardless of the sample size.
- If the sample size is large, then $\bar{x}$ is approximately normally distributed, regardless of the distribution of the population variable.

Inferential Statistics

68.26-95.44-99.74 Rule: If the population variable is normally distributed, the 68.26-95.44-99.74 Rule states that 95.44% of all possible observations lie within 2 standard deviations to either side of the mean.

If we apply this rule to the variable $\bar{x}$, 95.44% of all samples of size $n$ have a mean within $2\sigma_{\bar{x}} = 2\sigma/\sqrt{n}$ of $\mu$. Equivalently, 95.44% of all samples of size $n$ have the property that the interval $[\bar{x} - 2\sigma_{\bar{x}},\ \bar{x} + 2\sigma_{\bar{x}}]$ contains $\mu$.
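The two facts above ($\sigma_{\bar{x}} = \sigma/\sqrt{n}$ and the 95.44% figure) can be checked with a small simulation. The following is only a sketch: the population values (mean 50, standard deviation 10), the sample size 25, and the function names are illustrative assumptions that do not come from these notes.

```python
import math
import random
import statistics


def simulate_sample_means(mu, sigma, n, num_samples, seed=0):
    """Draw num_samples samples of size n from a normal population
    and return the list of their sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(num_samples)
    ]


def share_within_two_sd(means, mu, sigma, n):
    """Fraction of sample means within 2 * sigma / sqrt(n) of mu."""
    sd_of_mean = sigma / math.sqrt(n)
    hits = sum(1 for m in means if abs(m - mu) <= 2 * sd_of_mean)
    return hits / len(means)


# Illustrative population and sample size (assumptions, not from the notes).
MU, SIGMA, N, NUM_SAMPLES = 50.0, 10.0, 25, 20000

means = simulate_sample_means(MU, SIGMA, N, NUM_SAMPLES)
mean_of_means = statistics.fmean(means)   # should be close to MU
sd_of_means = statistics.pstdev(means)    # should be close to SIGMA / sqrt(N) = 2
coverage = share_within_two_sd(means, MU, SIGMA, N)  # should be close to 0.9544

print("mean of sample means:", round(mean_of_means, 3))
print("sd of sample means:  ", round(sd_of_means, 3))
print("share within 2 sd:   ", round(coverage, 4))
```

Because $|\bar{x} - \mu| \le 2\sigma_{\bar{x}}$ is the same condition as $\mu$ lying in $[\bar{x} - 2\sigma_{\bar{x}},\ \bar{x} + 2\sigma_{\bar{x}}]$, the last number is also the share of intervals that contain $\mu$.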
The interval $[\bar{x} - 2\sigma_{\bar{x}},\ \bar{x} + 2\sigma_{\bar{x}}]$ is called a confidence interval for $\mu$, and 95.44% is its confidence level. Any single interval may or may not contain $\mu$, but 95.44% of the intervals built this way do.

Hypothesis Tests

Terminology:
- A hypothesis is a statement that something is true.
- The null hypothesis is the hypothesis to be tested. Notation: $H_0: \mu = \mu_0$
- The alternative hypothesis is a hypothesis to be considered as an alternative to the null hypothesis. Notation:
  $H_a: \mu \neq \mu_0$ (two-tailed test)
  $H_a: \mu < \mu_0$ (left-tailed test)
  $H_a: \mu > \mu_0$ (right-tailed test)

Basic logic behind carrying out the hypothesis test for a normally distributed population variable:
- If a sample mean $\bar{x}$ is approximately equal to the hypothesized population mean $\mu_0$, we are inclined not to reject $H_0$.
- If a sample mean $\bar{x}$ differs too much from the hypothesized population mean $\mu_0$, we are inclined to reject $H_0$ and conclude that the alternative hypothesis is true.
- Using the "95.44%" part of the 68.26-95.44-99.74 Rule: if a sample mean $\bar{x}$ is more than two standard deviations of $\bar{x}$ (that is, more than $2\sigma_{\bar{x}}$) from the hypothesized population mean $\mu_0$, we reject the null hypothesis ($H_0: \mu = \mu_0$) and conclude the alternative hypothesis ($H_a: \mu \neq \mu_0$).

Properties of Chi-square ($\chi^2$) curves

- The total area under a $\chi^2$-curve equals 1.
- A $\chi^2$-curve starts at 0 on the horizontal axis and extends indefinitely to the right, approaching the horizontal axis asymptotically.
- A $\chi^2$-curve is right-skewed.
- As the number of degrees of freedom ($df = n - 1$, where $n$ is the sample size) becomes larger, $\chi^2$-curves look increasingly like normal curves.

[Figure: $\chi^2$-curves for df = 5, df = 10, and df = 19]

A variable is said to have a chi-square distribution if its distribution has the shape of a chi-square curve.

Chi-square goodness of fit test

This procedure can be used to perform a hypothesis test about the distribution of a qualitative variable or of a discrete quantitative variable that has only finitely many possible values.

Example: A violent crime is classified as murder, forcible rape, robbery, or aggravated assault.
Distribution of violent crimes in the United States in 1995

Type of violent crime    Relative frequency
Murder                   0.012
Forcible rape            0.054
Robbery                  0.323
Agg. assault             0.611
Total                    1.000

Sample results for 500 randomly selected violent-crime reports from last year

Type of violent crime    Frequency
Murder                     9
Forcible rape             26
Robbery                  144
Agg. assault             321
Total                    500

Population: last year's reported violent crimes
Variable: type of violent crime
Possible values of the variable: murder, forcible rape, robbery, and aggravated assault

Null hypothesis to be tested:
$H_0$: Last year's violent-crime distribution is the same as the 1995 distribution.

Alternative hypothesis:
$H_a$: Last year's violent-crime distribution is different from the 1995 distribution.

Expected frequencies if last year's violent-crime distribution is the same as the 1995 distribution:

Expected frequency $E = np$, where $n$ is the sample size and $p$ is the relative frequency from the distribution of violent crimes in 1995.

Type of violent crime    Relative frequency (p)    Expected frequency (E = np)
Murder                   0.012                     500(0.012) = 6
Forcible rape            0.054                     500(0.054) = 27
Robbery                  0.323                     500(0.323) = 161.5
Agg. assault             0.611                     500(0.611) = 305.5

Question: Do the frequencies observed last year match the expected frequencies?

To answer this question, we perform the following steps:

Determine whether the expected frequencies satisfy the assumptions below:
1. All expected frequencies are 1 or greater. (Yes)
2. At most 20% of the expected frequencies are less than 5. (None of the expected frequencies is less than 5.)

Decide the significance level, $\alpha$. We will perform the test at the 5% significance level, or $\alpha = 0.05$.
(TYPE I ERROR: Rejecting the null hypothesis when in fact it is true. The probability of making a Type I error is called the significance level, $\alpha$, of a hypothesis test.)

Compute the test statistic ($\chi^2$ = the sum of the chi-square subtotals) that measures how good the fit is.
Type of violent crime    Observed frequency    Expected frequency    Difference    Square of difference    Chi-square subtotal
x                        O                     E                     O - E         (O - E)^2               (O - E)^2 / E
Murder                     9                     6                     3             9                     1.5
Forcible rape             26                    27                    -1             1                     0.037
Robbery                  144                   161.5                 -17.5         306.25                  1.896
Agg. assault             321                   305.5                  15.5         240.25                  0.786
Total                    500                   500                     0                                   4.219

From the table, $\chi^2 = \sum (O - E)^2 / E = 4.219$.

Find the critical value $\chi^2_{\alpha}$ with $df = k - 1$, where $k$ is the number of possible values of the variable "type of violent crime". In our example $k = 4$, so $df = 4 - 1 = 3$ and $\chi^2_{0.05} = 7.815$ from the table provided.

[Figure: $\chi^2$-curve with df = 3; the "Do not reject $H_0$" region lies to the left of 7.815 and the "Reject $H_0$" region to the right]

The final step is to reject the null hypothesis if the value of the test statistic falls in the rejection region; otherwise, do not reject it. In our example, $\chi^2 = 4.219$, which falls in the do-not-reject region.

Interpretation: At the 5% significance level, the data do not provide sufficient evidence to conclude that last year's violent-crime distribution differs from the 1995 distribution.

Reference: Elementary Statistics by Neil Weiss, 5th/6th edition
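The goodness-of-fit computation for the violent-crime example can be reproduced with a short Python sketch. The observed counts, the 1995 relative frequencies, and the critical value 7.815 are taken from the example; the function and variable names are illustrative choices, and the critical value is hard-coded from the table rather than computed.

```python
# Data from the violent-crime example in these notes.
categories = ["Murder", "Forcible rape", "Robbery", "Agg. assault"]
observed = [9, 26, 144, 321]
rel_freq_1995 = [0.012, 0.054, 0.323, 0.611]

# Critical value for df = 3 at the 5% significance level (from the table provided).
CRITICAL_VALUE = 7.815


def expected_frequencies(n, probs):
    """Expected frequency E = n * p for each category."""
    return [n * p for p in probs]


def assumptions_hold(expected):
    """Check: all E >= 1, and at most 20% of the E values are below 5."""
    all_at_least_one = all(e >= 1 for e in expected)
    share_below_five = sum(1 for e in expected if e < 5) / len(expected)
    return all_at_least_one and share_below_five <= 0.20


def chi_square_statistic(obs, exp):
    """Sum of the chi-square subtotals (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


n = sum(observed)                                  # sample size
expected = expected_frequencies(n, rel_freq_1995)
ok = assumptions_hold(expected)
chi2 = chi_square_statistic(observed, expected)
df = len(categories) - 1                           # df = k - 1
reject_h0 = chi2 > CRITICAL_VALUE

print("assumptions satisfied:", ok)
print("chi-square statistic: ", round(chi2, 3))
print("degrees of freedom:   ", df)
print("reject H0:            ", reject_h0)
```

The statistic computed this way is about 4.22. The table's 4.219 comes from adding subtotals that were each rounded to three decimals first, so the two values agree up to rounding and the decision (do not reject $H_0$) is unchanged.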