Stats PowerPoint (t-test)
Histograms and Distributions

Question: Do athletes have faster reflexes than non-athletes?

Experiment: You go out and first collect the reaction times of 25 non-athletes.

Non-athletes: reaction time in milliseconds (ms)

Individual   Reaction time (ms)
 1           230
 2           268
 3           243
 4           233
 5           210
 6           329
 7           314
 8           278
 9           324
10           311
11           210
12           225
13           295
14           282
15           274
16           270
17           307
18           247
19           298
20           276
21           257
22           233
23           256
24           298
25           300

Calculate the mean… 278.5

Athletes: reaction time in milliseconds (ms)

Individual   Reaction time (ms)
 1           215
 2           218
 3           223
 4           226
 5           230
 6           231
 7           231
 8           245
 9           251
10           255
11           261
12           265
13           268
14           270
15           275
16           275
17           284
18           287
19           290
20           294
21           294
22           298
23           301
24           307
25           315

Calculate the mean score… 264.4

Compare the means: Athletes 264.4, Non-athletes 278.5

Non-athletes: reaction time in milliseconds (ms), arranged from low to high reaction time. Make a histogram to display the data…

Histogram = a plot of frequency

[Figure: histogram of non-athlete reaction times. x-axis: reaction time (ms) in bins 200-210, 210-220, 221-230, 231-240, 241-250, 251-260, 261-270, 271-280, 281-290, 291-300, 301-310, 311-320, 321-329, 330-339; y-axis: frequency. Sample size: 25]

[Figure: histogram of athlete reaction times, same bins and axes. Sample size: 25]

Compare the histograms of non-athletes to athletes:

[Figure: the two histograms side by side. x-axis: reaction time (ms); y-axis: frequency]

MEAN: Non-athletes 278.5, Athletes 264.4
Compare the histograms of non-athletes to athletes:

[Figure: both groups plotted on one histogram. x-axis: reaction time (ms); y-axis: number of students (frequency)]

MEAN: Non-athletes 278.5, Athletes 264.4

Q: Is there really a difference between these two groups?

The student decided to collect more data (a larger sample size), which is really the only option at this point…

Bin           Non-athletes   Athletes
200-210        0              3
210-220        1              6
221-230        2              8
231-240        2             12
241-250        1             15
251-260        2             10
261-270        2              8
271-280        6              6
281-290       12              3
291-300       17              3
301-310       15              2
311-320        9              1
321-329        4              0
330-339        0              0
Sample size   73             77

[Figure: both groups plotted on one histogram for the larger sample. x-axis: reaction time (ms); y-axis: number of students (frequency)]

MEAN: Non-athletes 298, Athletes 251

Comparison of histograms with small vs. large sample size:

[Figure: the small-sample and large-sample histograms side by side. x-axis: reaction time (ms); y-axis: number of students (frequency)]

Small sample: MEAN: Non-athletes 279, Athletes 264. Sample size: 25 in each group (N=50)
Large sample: MEAN: Non-athletes 298, Athletes 251. Sample size: 73 non-athletes, 77 athletes (N=150)

Let’s go back to the small sample size data…

[Figure: both groups plotted on one histogram for the small sample]

MEAN: Non-athletes 278.5, Athletes 264.4

How can we determine if there is a significant difference between these two groups?

Normal or Gaussian Distribution

Standard deviation (sigma)

First one needs to determine the standard deviation, which is basically a measure of the width of the histogram. For example, the mean of the non-athletes is 278.5 ms. If the standard deviation
is determined to be 30 ms, then it is assumed that 68.2% of the data will fall within 278.5 +/- 30 ms (between 248.5 and 308.5 ms). Would you prefer your standard deviation to be larger or smaller in value?

How do we determine the standard deviation (sigma) of the data about the mean?

1. Find the distance between each value and the mean

210 - 278.5, 225 - 278.5, 233 - 278.5, 233 - 278.5, …

This will tell you how far away each value is from the mean and begin to help you understand the width of your distribution.

2. Square all the differences

Non-athletes

Individual   Reaction time (ms)   Difference from mean   Squared difference
 1           210                  -68.52                 4694.9904
 2           225                  -53.52                 2864.3904
 3           233                  -45.5                  2070.25
 4           233                  -45.5                  2070.25
 5           247                  -31.5                   992.25
 6           256                  -22.5                   506.25
 7           257                  -21.5                   462.25
 8           268                  -10.5                   110.25
 9           270                   -8.5                    72.25
10           274                   -4.5                    20.25
11           276                   -2.5                     6.25
12           278                   -0.5                     0.25
13           282                    3.5                    12.25
14           286                    7.5                    56.25
15           287                    8.5                    72.25
16           295                   16.5                   272.25
17           298                   19.5                   380.25
18           298                   19.5                   380.25
19           300                   21.5                   462.25
20           305                   26.5                   702.25
21           307                   28.5                   812.25
22           311                   32.5                  1056.25
23           314                   35.5                  1260.25
24           324                   45.5                  2070.25
25           329                   50.5                  2550.25

3.
Sum all the squares

Sum of the squared differences (non-athletes): 23957.13

4. Divide the sum by the number of scores minus 1

23957.13 / 24 = 998.2 (variance)

5.
Take the square root of the variance

Square root of 998.2 = 31.6 (standard deviation)

Standard deviation formula (what we just did): the square root of the sum of the squared deviations from the mean divided by the number of scores minus one.

SD = sqrt( sum of (value - mean)^2 / (n - 1) )

Non-athletes: mean 278.5, SD (σ) = 31.6
Athletes: mean 264.4, SD (σ) = 30.6

Are these groups statistically different from each other?

The t-test assesses whether the means of two groups are statistically different from each other.

t = (difference between the group means) / (variability of the groups)

variability of the groups = sqrt( variance1/n1 + variance2/n2 ) = standard error of the difference

Therefore the t-value is related to how different the means are and how broad your data are. A large t-value (in absolute value) is what you hope for…

Calculate the t-score: t = -1.61

Degrees of freedom is the sum of the people in both groups minus 2: df = 48

The null hypothesis vs the hypothesis

1. The hypothesis: Athletes will have a quicker reaction time than non-athletes.
2. The null hypothesis: The null hypothesis always states that there is no relationship between the two groups; here, that there is no difference in reaction time between athletes and non-athletes.

The p-value

1. The p-value is a number between 0 and 1.
2.
It is the probability (hence the "p" in p-value) of seeing a difference at least as large as the one in our data if the null hypothesis were true, that is, if there were really no difference between the groups.
3. Therefore, the smaller the p-value, the harder it is to explain the observed difference by chance alone. The p-value is not the probability that the null hypothesis is true, so 1 minus the p-value is not the probability that there is a difference.
4. In order for the data to support the hypothesis, must the p-value be high or low?

The p-value should be low (<0.05), which says that a difference this large would show up less than 5% of the time if there were no real difference between the two groups.

Statistical Significance

When the p-value is less than 0.05, we say that the result is statistically significant, and there may be a real difference between the two groups. Be warned that just because p is less than 0.05 between two groups doesn’t mean that there is actually a difference. For example, if we find p < 0.05 for the reaction time experiment, it doesn’t mean that there is a definite difference between athletes and non-athletes. It only means that there is a difference in our data, but our data might be flawed, or there is not enough data yet (sample size too small), or we measured the data improperly, or the sampling wasn’t random, or the experiment was garbage, etc… Doubt is the greatest tool of any scientist (person).

How is the p-value determined?

The p-value is found by using a standard t-table in combination with the t-value and the degrees of freedom previously determined:
http://bioinfo-out.curie.fr/ittaca/documentation/Images/ttable.gif
http://davidmlane.com/hyperstat/t-table.html

Now you try it:

1. On Edmodo you will find data collected by Tom and Ileana regarding one’s ability to estimate the length of a line or the number of spots on a screen.
2. The questions were accompanied by a survey that asked for the subject’s grade level, ethnicity, participation in sports, and honors vs. regents level.
3.
They wanted to know if any of these differences would correlate with the subject’s ability to estimate.

How should we analyze this data?

1. Begin by choosing the variable to group by (the independent variable), grade for example. Since the t-test can only compare two groups at a time and there are four grades, we need to perform all the possible pairwise comparisons (there was apparently only one 9th grader, so the sample size is too small to look at this grade):

10th vs 11th
10th vs 12th
11th vs 12th

We also want to know if the mean of each group is significantly different from the actual value:

Actual value vs 10th
Actual value vs 11th
Actual value vs 12th

This needs to be done twice, once for the line estimation and once for the dots estimation!

These are the tables you need to fill out:

Grade   Mean   SD   Variance
10th
11th
12th

Grades           Difference of means   Variability of groups   T-score   P-value
10th vs actual
11th vs actual
12th vs actual
10th vs 11th
10th vs 12th
11th vs 12th

Write a conclusion based on your analysis. Remember, just because p < 0.05, it doesn’t necessarily mean your hypothesis is supported!
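
The five standard-deviation steps and the t-score calculation from the reaction time example can be checked with a short script before you fill in the tables by hand. This is only a sketch: it assumes the sorted non-athlete list from the step-by-step slides and the athlete list from the data slide, it uses Python's standard library, and the function names `sample_sd` and `t_score` are made up for illustration. It also assumes the t-score is the difference of means divided by the standard error of the difference, sqrt(variance1/n1 + variance2/n2).

```python
import math
import statistics

# Non-athlete reaction times (ms): the sorted list used in the five-step walk-through.
non_athletes = [210, 225, 233, 233, 247, 256, 257, 268, 270, 274, 276, 278, 282,
                286, 287, 295, 298, 298, 300, 305, 307, 311, 314, 324, 329]
# Athlete reaction times (ms) from the athletes data slide.
athletes = [215, 218, 223, 226, 230, 231, 231, 245, 251, 255, 261, 265, 268,
            270, 275, 275, 284, 287, 290, 294, 294, 298, 301, 307, 315]


def sample_sd(values):
    """Hypothetical helper following steps 1-5 of the slides."""
    mean = sum(values) / len(values)
    differences = [v - mean for v in values]        # step 1: distance from the mean
    squares = [d ** 2 for d in differences]         # step 2: square the differences
    total = sum(squares)                            # step 3: sum the squares
    variance = total / (len(values) - 1)            # step 4: divide by n - 1
    return math.sqrt(variance)                      # step 5: square root


def t_score(group_a, group_b):
    """Hypothetical helper: difference of means over the standard error of the difference."""
    difference = statistics.mean(group_a) - statistics.mean(group_b)
    standard_error = math.sqrt(statistics.variance(group_a) / len(group_a)
                               + statistics.variance(group_b) / len(group_b))
    return difference / standard_error


sd_non = sample_sd(non_athletes)
t = t_score(athletes, non_athletes)
df = len(athletes) + len(non_athletes) - 2  # people in both groups minus 2

print(f"non-athlete mean = {statistics.mean(non_athletes):.1f} ms, SD = {sd_non:.1f} ms")
print(f"t = {t:.2f}, df = {df}")
```

Run as written, this should reproduce the slide values for the non-athletes (mean 278.5, SD 31.6) along with t = -1.61 and df = 48. To turn the t-score into a p-value you still look it up in a t-table: for df = 48 the two-tailed 0.05 cutoff is roughly 2.01, so a t of -1.61 would not count as significant for the small sample.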