Download Chapter III: Descriptive Statistics

Chapter VIII: Elements of Inferential Statistics

Hypothesis Testing
• Hypothesis testing is a form of statistical inference used to reach conclusions about scientific problems.
• Hypothesis testing is the mechanism by which scientific principles are tested, updated, and refined. This in turn is the basis for the advancement of scientific research.
• Testing sample validity is crucial to the application of inferential statistics, so we must be able to test our samples.

Classical/Traditional Methods
• Classical Hypothesis Testing
  o A formal multi-step process that leads to a conclusive statement regarding the hypothesis:
    • Step 1: State the null and alternate hypotheses
    • Step 2: Select the appropriate statistical test
    • Step 3: Select the level of significance
    • Step 4: Delineate the regions of rejection and non-rejection of the null hypothesis
    • Step 5: Calculate the test statistic
    • Step 6: Make a decision regarding the null and alternate hypotheses

State Null and Alternate Hypothesis
• State the null ($H_0$) and alternate ($H_A$) hypotheses.
• The null is the established value:
  o $H_0: \mu = \mu_H$ or $H_0: \mu - \mu_H = 0$
    • This states that the actual mean is equal to the hypothesized mean. Thus, if you subtract the hypothesized mean from the actual mean, the result should equal zero.
• The alternate hypothesis and the null are mutually exclusive.
  o $H_A$ comes in three variations, one nondirectional and two directional:
    • $H_A: \mu \neq \mu_H$, so the alternate is any value but the null value.
    • $H_A: \mu < \mu_H$ or $H_A: \mu > \mu_H$, so the alternate states that the mean is less than or greater than the null value, depending on the experiment.

Hypothesis Testing
• Choosing direction
  o This graph represents a nondirectional test. The test is only interested in whether the alternate hypothesis differs from the null hypothesis.
  o In a less-than (<) or greater-than (>) scenario, the curve would be shaded only on the side of the alternate hypothesis: the right tail for greater than the mean, the left tail for less than the mean.
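
The six steps and the choice of direction described above can be sketched in code. What follows is only a minimal illustration, assuming a one-sample z-test with a known population standard deviation. The function name `classical_z_test`, its parameters, and the sample numbers are hypothetical and do not come from the slides.

```python
from math import sqrt
from statistics import NormalDist


def classical_z_test(sample_mean, mu_h, sigma, n, alpha=0.05, direction="two-sided"):
    """Hypothetical helper: walk through the six classical steps for a z-test."""
    std_normal = NormalDist()  # standard normal curve: mean 0, sd 1

    # Steps 1-3: the hypotheses (H0: mu = mu_h), the test (z), and alpha
    # are supplied by the caller as arguments.

    # Step 4: delineate the non-rejection region from alpha and the direction.
    if direction == "two-sided":    # HA: mu != mu_h, alpha split across both tails
        lower = std_normal.inv_cdf(alpha / 2)
        upper = std_normal.inv_cdf(1 - alpha / 2)
    elif direction == "less":       # HA: mu < mu_h, all of alpha in the left tail
        lower, upper = std_normal.inv_cdf(alpha), float("inf")
    elif direction == "greater":    # HA: mu > mu_h, all of alpha in the right tail
        lower, upper = float("-inf"), std_normal.inv_cdf(1 - alpha)
    else:
        raise ValueError("direction must be 'two-sided', 'less', or 'greater'")

    # Step 5: test statistic = (sample mean - hypothesized mean) / standard error.
    z = (sample_mean - mu_h) / (sigma / sqrt(n))

    # Step 6: reject H0 only if z falls outside the non-rejection region.
    reject = z < lower or z > upper
    return z, (lower, upper), reject


# Made-up numbers: sample mean 52, hypothesized mean 50, sigma 6, n = 36.
z, region, reject = classical_z_test(sample_mean=52.0, mu_h=50.0, sigma=6.0, n=36)
print(round(z, 2), reject)  # prints: 2.0 True
```

With these illustrative inputs the same z-score is also rejected by a "greater" test but retained by a "less" test, which shows how the choice of direction moves the rejection region.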
Error in Hypothesis Testing
• Error in hypothesis testing
  o In hypothesis testing we use a sample to either retain or reject a null hypothesis. Because the decision rests on a sample, there is always some probability of error in rejecting or not rejecting the null hypothesis.
• Type I Error
  o A type I error occurs when the null hypothesis is rejected although it is actually true. The probability of this occurring is alpha (α).
  o This level is selected prior to the hypothesis test.
• Type II Error
  o A type II error occurs when the null hypothesis is not rejected although it is actually false. The probability of this error is beta (β).

Selecting a Statistical Test
• An appropriate statistical test must be chosen. In this case a one-sample difference of means test is used.
• However, many different tests exist. A list can be found at:
  o http://www.graphpad.com/www/Book/Choose.htm
• A one-sample difference of means test is:
  o $Z \text{ or } t = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$ or $\frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
  o Where Z or t is the test statistic (Z if n ≥ 30, t if n < 30)
  o $\bar{x}$ is the sample mean
  o μ is the population mean
  o $\sigma_{\bar{x}}$ is the standard error of the mean
  o σ is the population standard deviation
  o n is the sample size

Selecting the Level of Significance
• The level of significance, or alpha (α), is established for a hypothesis test.
• Alpha is the probability of a type I error that you are willing to accept, so you have control over the chance of a type I error.
• To reduce the chance of error, α is often set very low, as errors leading to false conclusions can seriously flaw research.
• α = .05 and α = .01 are commonly used levels of significance.

Selecting Level of Significance Continued
• Error is very important when setting the α of your hypothesis test.
  o The acceptable error of your test must be established based on the needs of the study.
  o A common example of a type I error is a jury trial in which an innocent person is found guilty.
    • In this case you would want to establish a very small margin of error if possible. Is 1 in 20 acceptable? Is 1 in 100?

Delineate Regions of Rejection and Non-rejection of the Null Hypothesis
• Once α has been established, we must determine where it lies under the curve.
• The choice between a directional and a nondirectional test is then revisited to establish where α falls.
• A two-tailed (nondirectional) format divides α between the two tails of the curve: α = .05 places .025 in each tail region, and α = .01 places .005 in each tail.
• A one-tailed (directional) format places the entire α value in one tail, depending on whether the alternate hypothesis is a < or > statement.

Z-Scores in Hypothesis Testing
• By converting α into a z-score we can establish the scores that result in rejection of the null hypothesis.
  o In a two-tailed test with α = .05, .025 lies in each tail, leaving .95 of the area under the curve between them.
  o Using a table or calculator, we can determine that this central .95 spans z-scores from -1.96 to 1.96.
• Common two-tailed α values and their critical z-scores are:
  o α = .05: z = -1.96, 1.96
  o α = .02: z = -2.326, 2.326
  o α = .01: z = -2.576, 2.576
  o http://people.richland.edu/james/lecture/m170/ch08-int.html

Calculate the Test Statistic
• Use:
  o $Z \text{ or } t = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$ or $\frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
  o Use the z-score (or t-score) of the test statistic to establish whether the statistic falls within the non-rejection range set by alpha.
  o If the score is within the range, the null hypothesis is retained.
  o If the score is not within the range, the null hypothesis is rejected.

P-Value Hypothesis Testing
• This is the most commonly used approach, as the classical approach has limitations.
  o In classical testing:
    • Selection of alpha can lack a theoretical basis
    • Binary rejection/retention of the hypothesis is limiting
  o P-value testing:
    • Establishes the exact probability of getting a test statistic at least as extreme as the one observed when the null hypothesis is true, which is the smallest level of significance at which the null hypothesis would be rejected
    • Establishes a rejection region

P-Value Testing
• Four Steps:
  o Calculate the z-score of the test statistic as in the classical approach
  o Find the probability associated with that z-score using the area under the normal curve from a table
  o Shade the rejection area, that is, the tail beyond the z-score, by subtracting the table probability from the appropriate total area (.5 if the table gives the area between the mean and z)
  o Double the area if using a nondirectional hypothesis
  o A step-by-step example can be found here: http://www.youtube.com/watch?v=uVvWcFrrvsI

P-Value Testing Continued
• P-values allow more direct experimental conclusions:
  o The exact significance level is calculated, rather than only a reject/retain decision at a preselected α.
  o It is easy to obtain multiple values over time or area and compare them for analysis.
  o Studies still require careful analysis and rigor to prevent invalid results or conclusions.

Applications Using Small Samples
• Values can be calculated much as in the Z-test, but using the t-test equations.
  o $t = \frac{\bar{x} - \mu}{s / \sqrt{n - 1}}$ or $t = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
  o Samples smaller than 30 follow a similar distribution, except that there is a higher probability of values falling in the tails, owing to the increased uncertainty of a small sample in finding the true value.
  o Hypothesis rejection is still based on the p-value: if p < .05, the null hypothesis can be rejected with a reasonably high level of confidence.

One Sample Difference of Proportions Test
• Z-tests can be done comparing proportions as well.
  o $p \pm Z\sigma_p = p \pm z \sqrt{\frac{p(1 - p)}{n - 1}} \sqrt{\frac{N - n}{N}}$
  o This can establish the proportion to be compared to the population.
  o Example of this methodology: http://www.youtube.com/watch?v=0jeDp03jymQ

Issues in Inferential Testing and Test Selection
• Defining Degrees of Freedom
  o A sample of size n has n degrees of freedom.
  o When a parameter is estimated, one degree of freedom is lost.
    • In the one-sample t-test we are estimating μ, and thus one degree of freedom is lost. This is represented as n - 1 in the equation.
  o In two-sample tests $\mu_1$ and $\mu_2$ are estimated, so two degrees of freedom are lost (n - 2).
  o Degrees of freedom is an important concept for many other distributions.

Sampling Issues and Inferential Testing
• Regardless of sampling method, all samples must be drawn independently and separately.
  o There are a couple of exceptions, which are not examined here.
• Artificial and Natural Sampling
  o Artificial samples are unbiased random samples drawn from the population; characteristics of the population are inferred from the sample data.
  o Natural samples are drawn from random events and are analyzed under the idea that natural events are random processes in themselves. Inferential procedures should be applied with care to natural samples.
  o The differences between these methods are important and debated in Geography.

Inferential Test Selection
• Choosing methods for geographic problems is quite complicated and requires careful analysis of the procedural methods.
• Table 8.7 in the text (pages 126-127) is provided to assist in this process.
• Parametric and Nonparametric Tests
  o Parametric tests are inferential tests that require knowledge about population parameters and make assumptions about the underlying population distribution.
  o Nonparametric tests do not require this underlying knowledge.

Data and Tests
• The type of test to run depends on the measurement scale of the data:
  o At an ordinal or nominal scale, a nonparametric test is required.
  o At the interval/ratio scale, different strategies can be used:
    • Run only a parametric test: when there is no doubt that the assumptions and requirements of the test are met.
    • Run only a nonparametric test: when there is reason to believe that the assumptions or requirements are violated. The data must be downgraded to a nominal or ordinal scale, after which a nonparametric test can be run.
    • Run both tests: when it is uncertain how seriously the assumptions are violated. Both tests can be run and their results compared to investigate the analytical questions.

The End
• Finally, here is an applet designed to demonstrate classical hypothesis testing for further exploration:
• http://wwwpersonal.umd.umich.edu/~pksmith/JavaStat/CriticalRegionsPValues.html
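
For readers who would rather experiment in code than in an applet, the four-step p-value procedure from the P-Value Testing slide can be sketched as follows. This is a minimal illustration, assuming a one-sample z-test with a known population standard deviation; the function name `z_test_p_value`, its parameters, and the numbers are hypothetical, and the standard library's `NormalDist` stands in for the printed normal table.

```python
from math import sqrt
from statistics import NormalDist


def z_test_p_value(sample_mean, mu_h, sigma, n, direction="two-sided"):
    """Hypothetical helper: p-value of a one-sample z-test via the four steps."""
    # Step 1: z-score of the test statistic, as in the classical approach.
    z = (sample_mean - mu_h) / (sigma / sqrt(n))

    # Step 2: area under the standard normal curve to the left of z
    # (this replaces the table lookup).
    left_area = NormalDist().cdf(z)

    # Step 3: the rejection area is the tail beyond z on the side of HA.
    if direction == "less":
        return z, left_area
    if direction == "greater":
        return z, 1 - left_area
    if direction == "two-sided":
        # Step 4: double the smaller tail for a nondirectional hypothesis.
        return z, 2 * min(left_area, 1 - left_area)
    raise ValueError("direction must be 'two-sided', 'less', or 'greater'")


# Made-up numbers: sample mean 52, hypothesized mean 50, sigma 6, n = 36.
z, p = z_test_p_value(sample_mean=52.0, mu_h=50.0, sigma=6.0, n=36)
print(round(z, 2), round(p, 4))  # prints: 2.0 0.0455
```

Because the result is an exact probability rather than a reject/retain flag, it can be compared against any level of significance after the fact: here p is below .05 but above .01.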
