Introduction to Statistics
Nyack High School Science Research

Researchers often must determine whether their data are statistically significant or merely the result of a "fluke" or of measurement uncertainties. A researcher will often test a hypothesis on a sample of the population.

sample: a group of people who participate in a study
population: all the people to whom the study is meant to generalize

A frequency distribution is a table in which all scores are listed, along with the frequency with which each occurs. Here's an example of scores from an AP Physics test:

Frequency Distribution
Score   frequency   rf (relative frequency)
56      1           0.077
71      3           0.231
79      1           0.077
80      2           0.154
82      2           0.154
93      2           0.154
95      1           0.077
96      1           0.077
N = 13              1.000

Often, data are presented as a frequency distribution of intervals:

AP Test Scores
Score Interval   frequency   rf
56-64            1           0.077
65-73            3           0.231
74-82            5           0.385
83-91            0           0.000
92-100           4           0.308
N = 13                       1.000

[Bar graph of frequency vs. test-score interval.] The bars in the graph are touching, indicating that the data are continuous.

Here's an example of a frequency distribution for discrete data:

Pet Preference
Type of pet   frequency
dog           6
cat           5
neither       3
N = 14

[Bar graph of frequency vs. type of pet; the bars do not touch.]

Population mean (µ) = the average of all the scores of the population:

    µ = (sum of scores) / (number of scores) = ΣX / N

Sample mean (X̄):

    X̄ = ΣX / n

Median = the middle score in a distribution organized from highest to lowest, or lowest to highest. Referring to the example above, the median score would be 80.

Mode = the score with the highest frequency. In the example above, the mode is 71.

Measures of Variation
Range = highest score - lowest score
Standard deviation for a population (σ) is the average distance of all the scores in the distribution from the mean, or central point, of the distribution:

    σ = sqrt( Σ(X - µ)² / N )

where X = an individual score value.

What statistical tool do I use?
Question                                             Statistical tool
Determining the relationship between 2 variables     Correlation / Regression analysis
Comparing 2 sample means                             t-test
Comparing a sample mean to a population mean         z-test
Comparing more than 2 samples                        ANOVA (Analysis of Variance)
Comparing observed categorical results to expected   Chi-squared

Standard Scores
z-scores are a measure of how many standard-deviation units an individual raw score falls from the mean.

For an individual score, in comparison to a sample:

    z = (X - X̄) / S

For an individual score, in comparison to a population:

    z = (X - µ) / σ

Standard Scores: AP Physics Exam
Score   (Score - mean)   z-score
96      15.46            1.32
95      14.46            1.24
56      -24.54           -2.10
71      -9.54            -0.82
93      12.46            1.07
71      -9.54            -0.82
93      12.46            1.07
71      -9.54            -0.82
80      -0.54            -0.05
81      0.46             0.04
80      -0.54            -0.05
81      0.46             0.04
79      -1.54            -0.13
Mean = 80.54   Std. Dev. = 11.68

Null and Alternative Hypotheses
Null hypothesis (H0): Whatever the research topic, the null hypothesis predicts that there is no difference between the groups being compared. (This is typically what the researcher does not expect to find.)
Ex: Say I want to find out whether students who attend a review session score higher than those who do not. The null hypothesis would be that the mean score of the group who attended the review session is the same as the mean score of the group who did not attend:

    µ(review session) = µ(general population)

The alternative hypothesis (Ha or H1) would be:

    µ(review session) > µ(general population)

A one-tailed hypothesis predicts the direction of the difference. Ex: I predict that students who attend the review session will score higher.
A two-tailed hypothesis expects a difference, but the researcher is unsure of its direction. Ex: I predict that attending a review session will affect scores, but I don't know whether they will be higher or lower.
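As a quick check on the Standard Scores table above, here is a short Python sketch (standard library only; the code itself is an illustration, not part of the original slides) that recomputes the z-scores. Note that the table's standard deviation of 11.68 matches the n - 1 (sample) formula, so `statistics.stdev` is used.

```python
import statistics

# Recomputing the AP Physics z-score table: z = (score - mean) / std dev.
# The table's Std. Dev. of 11.68 matches the n - 1 (sample) formula,
# so statistics.stdev is used here rather than statistics.pstdev.
scores = [96, 95, 56, 71, 93, 71, 93, 71, 80, 81, 80, 81, 79]

mean = statistics.mean(scores)   # ~80.54, as in the table
sd = statistics.stdev(scores)    # ~11.68
z_scores = [round((x - mean) / sd, 2) for x in scores]
# First few z-scores: 1.32, 1.24, -2.1, ..., matching the table
```

Running this reproduces the tabled values, e.g. z = 1.32 for a score of 96.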
Null and Alternative Hypotheses
We must determine whether our data are "statistically significant." In other words, we must determine whether the data actually support our hypothesis, or whether they only appear to because of uncontrollable conditions. There are two types of errors:

Type I error: rejecting the null hypothesis when it is actually true. The data appear to show a difference between the population and the sample, but the difference is due to a "fluke" (experimental errors, good guesses, etc.).
Type II error: accepting (failing to reject) the null hypothesis when it is actually false. A real difference exists, but a "fluke" makes the population mean and the sample mean look equal.

Determining Statistical Significance
We can use either a z-test or a t-test to determine statistical significance. The test we use depends on our data.
z-test: used when the population variance is known. It allows the user to compare a sample to a population. The z-test uses the mean and standard deviation of the sample to determine whether the sample mean is significantly different from the population mean.
t-test: used when the population variance is not known. Use the t-test when you have a small sample and you do not know σ.

Once you have performed a z-test or a t-test, you can plug that value into a program to get your p-value. (p-values are discussed later.)

But wait! There are other types of t-tests! What if you are comparing two different samples, instead of comparing one sample to one population? Then you must use a different algorithm. We will look at the two possibilities.

t-test for Independent Groups/Samples
Use this test when you are comparing two samples, representing two populations. You can compare the two groups in one of two ways:
1. One group is the control group and one is the experimental group, or
2. Both groups are experimental and there is no control.
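A minimal sketch of the independent-groups t-test in Python (standard library only). It uses the pooled-variance (equal-variances) form of the test, which is one common choice; the review-session scores below are hypothetical numbers invented for illustration, not data from the slides.

```python
import math
import statistics

def independent_t(sample1, sample2):
    """Student's t for two independent samples (pooled variance;
    assumes the two groups have roughly equal variances)."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = statistics.mean(sample1), statistics.mean(sample2)
    v1, v2 = statistics.variance(sample1), statistics.variance(sample2)
    # Pooled variance: both samples' spread, weighted by degrees of freedom.
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (m1 - m2) / se   # compare against t with n1 + n2 - 2 df

# Hypothetical review-session example (invented scores):
review = [85, 90, 78, 92, 88]
no_review = [80, 75, 82, 70, 78]
t = independent_t(review, no_review)   # ~2.98, with 8 degrees of freedom
```

The t-value and its degrees of freedom (n1 + n2 - 2) are what you would feed into a p-value calculator.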
t-test for Correlated Groups/Samples
Use this test when you are comparing the performance of participants in two groups, but the same people are used in each group, or different participants are matched between groups (i.e., you are working with pairs of scores for each participant). This test is based on the difference score (D), the difference between the pair of scores for each participant.

Ex: Eight participants are asked to listen to a single genre of music (hip-hop) and then rate the severity of their nightmares on a 1-5 Likert scale (1 = mild, 5 = severe). They are then asked to repeat the process, this time listening to classical music.

Nightmare Severity
Participant   Hip-Hop   Classical   D (difference score)
1             5         1           4
2             4         3           1
3             4         4           0
4             5         4           1
5             3         2           1
6             3         3           0
7             4         3           1
8             3         1           2
Total                               10
Mean (D̄)                           1.25

Determining Statistical Significance
Chi-square tests are nonparametric tests: they do not involve the mean or standard deviation of the population, for one thing.

Chi-Square (χ²) Goodness-of-Fit Test
Used for comparing categorical information (observed frequencies) against what we would expect based on previous knowledge (expected frequencies). For example, say a study of students at Nyack High School samples 54 students and finds that 8 of them (15% of the sample) are overweight or obese. Assume that nationwide, 30% of high school students have been found to be overweight or obese. Observed and expected frequencies are shown:

Frequencies    Overweight/obese   Not overweight/obese
Observed (O)   8                  46
Expected (E)   16                 38
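The goodness-of-fit statistic for the table above can be computed directly. This sketch (an illustration, not from the slides) uses the table's rounded expected frequencies, which come from applying the 30%/70% national rates to the 54 students sampled.

```python
# Chi-square goodness-of-fit: chi2 = sum of (O - E)^2 / E over the categories.
# Expected counts use the table's rounded values (30% and 70% of 54 students).
observed = [8, 46]     # overweight/obese, not overweight/obese
expected = [16, 38]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
# One degree of freedom (2 categories - 1); feed chi2 and df
# into a p-value calculator.
```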
Chi-Square (χ²) Test of Independence
Whereas a χ² goodness-of-fit test compares how well an observed frequency distribution of one nominal variable fits some expected pattern of frequencies, a χ² test of independence asks whether two nominal variables are independent of each other.

For example, say a study of Nyack High School students looked at whether students who have already taken a health class exercise more than those who have not. We have two variables (taking a health class and exercising). We find that of the 100 students who have taken a health class, 75 exercise regularly. In the group of students who have not yet taken health, 35 out of 80 exercise regularly. Data are shown below, where the numbers in parentheses are the expected frequencies, based on the total students polled (180):

                     Taken Health Class
                     Yes       No        Row Totals (RT)
Exercisers           75 (61)   35 (49)   110
Non-exercisers       25 (39)   45 (31)   70
Column Totals (CT)   100       80        180

P-values and Statistical Significance
You can now use an online p-value calculator to find the p-value. If you're doing a z-test, simply insert the z-value. If you're doing a t-test, you need the t-value and the degrees of freedom. For a χ² test, you need the χ² value and the degrees of freedom. An online p-value calculator for two-tailed tests is available at:
http://graphpad.com/quickcalcs/pvalue1.cfm
The p-value for a one-tailed test would be half the p-value for a two-tailed test.

So what is a p-value? A p-value is a probability, with a value ranging from zero to one. A value of zero would mean that, if a random sample of the population were taken, there would be no chance of it showing a larger difference from the total population than the one you observed. If the p-value is 0.03, there is a 3% chance of observing a difference at least as large as the one you observed, assuming the null hypothesis is true.
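A sketch of the χ² test of independence for the health-class table above (illustration only), showing how each expected frequency comes from (row total × column total) / grand total, which is where the parenthesized values in the table come from.

```python
# Chi-square test of independence for the health-class / exercise table.
# Each expected frequency = (row total * column total) / grand total.
observed = [[75, 35],   # exercisers:     taken health class, not taken
            [25, 45]]   # non-exercisers: taken health class, not taken

row_totals = [sum(row) for row in observed]          # [110, 70]
col_totals = [sum(col) for col in zip(*observed)]    # [100, 80]
grand_total = sum(row_totals)                        # 180

expected = [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]
# expected ~ [[61.1, 48.9], [38.9, 31.1]], matching the table's rounded values

chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))
# degrees of freedom = (rows - 1) * (columns - 1) = 1
```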
P-values and Statistical Significance
In general, the smaller the p-value, the more "statistically significant" your data are. It's up to you to set a threshold p-value. Once this is done, every result is either statistically significant or not. Many scientists describe results as "very significant" if the p-value falls below a threshold (usually 0.05), and "extremely significant" if it falls below a lower threshold (often 0.01). Sometimes values are flagged with one asterisk for "very significant" and two asterisks for "extremely significant."

Confidence Intervals
If we don't know the population mean (µ), we can calculate a confidence interval: a range of values that we feel "confident" will contain the population mean, µ. The confidence level describes the uncertainty involved in a sampling method. A 90% confidence level means that we are 90% confident that the population mean falls within this interval. Confidence intervals can be calculated from z-scores or t-scores.

Referring back to the nightmare study, where the severity of nightmares was rated on a scale of 1-5 (1 = mild, 5 = severe), the 95% confidence interval for the difference score was calculated to be 0.11-2.39. Thus, we can say we are 95% confident that nightmare severity after listening to classical music is between 0.11 and 2.39 points lower than after listening to hip-hop.

Correlation Coefficients
When you are looking at the relationship between two variables, a correlation coefficient (r) can be used to measure the strength of that relationship. (Keep in mind that a strong correlation does not, by itself, establish cause and effect.)
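Before looking at how r values are interpreted, here is a minimal sketch of how a Pearson correlation coefficient is computed from its definition (standard library only); the hours-studied/score data below are hypothetical numbers invented for illustration.

```python
import math
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient r (ranges from -1.00 to +1.00)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical example data: hours studied vs. test score.
hours = [1, 2, 3, 4, 5]
scores = [60, 65, 72, 78, 85]
r = pearson_r(hours, scores)   # ~0.999: a strong positive relationship
```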
The value of a correlation coefficient is between -1.00 and +1.00, as follows:

Correlation coefficient (r)   Strength of relationship
±0.70-1.00                    Strong
±0.30-0.69                    Moderate
±0.00-0.29                    None (0.00) to weak

[Scatter-plot examples of correlation strengths; source: sofia.usgs.gov]

Summary
When analyzing data:
• Decide what type of experiment you are conducting, and what you are comparing: sample(s) vs. population.
• If applicable, plot the frequency distribution.
• Look for trends.
• Choose a method to test for statistical significance:
  ✓ z-test
  ✓ t-test
  ✓ Chi-squared test
  ✓ ANOVA
  ✓ Regression

Claim 1: Money can't buy you love, but it can buy you a good ball team
• Specifically, the claim is that baseball teams with bigger salaries win more games than those with smaller salaries.
• Data are average (mean) salaries and winning percentages for the 2012 baseball season.

The data
TEAM                    AVG SALARY    winning percentage
Arizona Diamondbacks    $2,653,029    0.5
Atlanta Braves          $2,776,998    0.58
Baltimore Orioles       $2,807,896    0.574
Boston Red Sox          $5,093,724    0.426
Chicago Cubs            $3,392,193    0.377
Chicago White Sox       $3,876,780    0.525
Cincinnati Reds         $2,935,843    0.599
Cleveland Indians       $2,704,493    0.42
Colorado Rockies        $2,692,054    0.395
Detroit Tigers          $4,562,068    0.543
Houston Astros          $2,332,730    0.34
Kansas City Royals      $2,030,540    0.444
Los Angeles Angels      $5,327,074    0.549
Los Angeles Dodgers     $3,171,452    0.531
Miami Marlins           $4,373,259    0.426
Milwaukee Brewers       $3,755,920    0.512
Minnesota Twins         $3,484,629    0.407
New York Mets           $3,457,554    0.457
New York Yankees        $6,186,321    0.586
Oakland Athletics       $1,845,750    0.58
Philadelphia Phillies   $5,817,964    0.5
Pittsburgh Pirates      $2,187,310    0.488
San Diego Padres        $1,973,025    0.469
San Francisco Giants    $3,920,689    0.58
Seattle Mariners        $2,927,789    0.463
St. Louis Cardinals     $3,939,316    0.543
Tampa Bay Rays          $2,291,910    0.556
Texas Rangers           $4,635,037    0.574
Toronto Blue Jays       $2,696,042    0.451
Washington Nationals    $2,623,746    0.605

How is this claim best evaluated?
- graph and statistical analysis

[Scatter plot: proportion of games won (0.30-0.65) vs. mean salary (1-7 million $/yr).]

[Scatter plot with linear regression: r² = 0.03, p = 0.37. The Nationals (highest winning percentage) and the Red Sox (high salary, losing record) are labeled.]

Conclusion
• Money can't buy you a winning ball team, either.

Claim 2: Eels control crayfish populations
• Specifically, the claim is that crayfish population densities are lower in streams where eels are present.
• Background: dietary studies show that eels eat a lot of crayfish, and old Swedish stories suggest that eels eliminate crayfish.
• Data are crayfish densities (counts along transects, snorkelling) in local streams with and without eels.

The data
River        Site             Crayfish (no./m²)   eels (1 = present)
Croton       Green Chimneys   3.225               0
Croton       PEP              0.119               0
Delaware     Buckingham       0.25                1
Delaware     Callicoon        0                   1
Delaware     Hankins          0.109               1
Delaware     Mongaup          0                   1
Delaware     Pond Eddy        0.067               1
Neversink    Bridgeville      0.233               0
Neversink    TNC              0                   1
Shawangunk   Mount Hope       4.53                0
Shawangunk   Ulsterville      1.1                 0
Webatuck     Levin            0.812               0
Webatuck     Shope            1.719               0
Webatuck     Still Point      1.4                 0

How is this claim best evaluated?
- graph and statistical analysis

[Bar graph of mean crayfish density (number/m²) in streams with and without eels; error bars show 95% confidence limits.]

How is this claim best evaluated?
- graph and statistical analysis

[Bar graph with t-test: p = 0.02. Crayfish density (number/m²) in streams with and without eels; error bars show 95% confidence limits.]

Conclusion
• Looks like streams containing eels have fewer crayfish.

Claim 3: Human life expectancy varies among continents
• Data are mean life expectancy for women in different countries.

The data
Africa               Asia                Americas          Europe
algeria 75           bangladesh 70.2     argentina 79.9    austria 83.6
cameroon 53.6        china 75.6          brazil 77.4       belgium 82.8
cote d'ivoire 57.7   india 67.6          canada 85.3       bulgaria 77.1
egypt 75.5           indonesia 71.8      chile 82.4        czech rep 81
kenya 59.2           iran 75.3           colombia 77.7     denmark 87.4
morocco 74.9         japan 87.1          mexico 79.6       estonia 80
nigeria 53.4         malaysia 76.9       peru 76.9         finland 83.3
south africa 54.1    pakistan 66.9       usa 81.3          france 84.9
zimbabwe 52.7        philippines 72.6    venezuela 77.7    germany 83
                     singapore 83.7                        greece 82.6

How is this claim best evaluated?
- graph and statistical analysis

[Bar graph of life expectancy for women (years) by continent; error bars show 95% confidence limits. Note that the y-axis doesn't start at 0.]

How is this claim best evaluated?
- graph and statistical analysis

[Bar graph of life expectancy for women (years) by continent, with 1-way ANOVA: p = 0.0000001. Error bars show 95% confidence limits.]

Anova: Single Factor

SUMMARY
Groups     Count   Sum     Average    Variance
Africa     9       556.1   61.78889   104.6261
Asia       10      747.7   74.77      42.78233
Americas   9       718.2   79.8       7.7875
Europe     10      825.7   82.57      7.731222

ANOVA
Source of Variation   SS         df   MS         F          P-value    F crit
Between Groups        2351.6     3    783.8666   19.68451   1.42E-07   2.882604
Within Groups         1353.931   34   39.8215
Total                 3705.531   37

Conclusion
• Life expectancy of women appears to differ among continents.
• (The ANOVA doesn't tell us which continents are different; further tests would be necessary to test claims about specific continents.)

Measures of Variation
The formula for the standard deviation of a sample (S) is similar:

    S = sqrt( Σ(X - X̄)² / n )

And when using sample data to estimate the standard deviation of a population (this is called the unbiased estimator of the true population standard deviation, s), use the following formula:

    s = sqrt( Σ(X - X̄)² / (n - 1) )

Statistical Distribution Shapes
[Diagram; source: fao.org]
A: Normal Distribution
B: Positively Skewed Distribution
C: Negatively Skewed Distribution
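The population and sample formulas above differ only in the divisor (N vs. n - 1). This sketch (an illustration, not part of the slides) applies both to the AP test scores from the frequency distribution earlier and checks them against Python's statistics module.

```python
import math
import statistics

# sigma divides by N (treats the data as the whole population);
# s divides by (n - 1) (unbiased estimate of the population std dev
# from a sample, and always slightly larger than sigma).
scores = [56, 71, 71, 71, 79, 80, 80, 82, 82, 93, 93, 95, 96]  # AP test scores

n = len(scores)
mean = statistics.mean(scores)
ss = sum((x - mean) ** 2 for x in scores)   # sum of squared deviations

sigma = math.sqrt(ss / n)        # population formula
s = math.sqrt(ss / (n - 1))      # unbiased sample estimate

# These match the standard-library helpers:
assert abs(sigma - statistics.pstdev(scores)) < 1e-9
assert abs(s - statistics.stdev(scores)) < 1e-9
```

Note that for large n the two values are nearly identical; the distinction matters most for small samples.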