Transcript

Statistics [1/2,3/2]
The Essential Mathematics

Standard Error
• What standard deviation is to an individual (relative to a population mean), standard error is to a sample mean (relative to a population mean)
• Standard error = standard deviation / sqrt(n)
• All parameters have a standard error associated with them...we use them to "normalize" statistical tests

Short Exercise
• What is the mean of {1,2,3,4,5}?
• Now, let's take all possible triplets:
  • {1,2,3}, {1,2,4}, {1,2,5}, {1,3,4}, {1,3,5}, {1,4,5}, {2,3,4}, {2,3,5}, {2,4,5}, {3,4,5}

Short Exercise
• The mean of {1,2,3,4,5} = 3
  • Std. Dev (sample) = 1.58114
• Now, let's take the means of all possible triplets: 2, 7/3, 8/3, 8/3, 3, 10/3, 3, 10/3, 11/3, 4
  • Mean = 3, Std. Dev (sample) = .60858
• Maximum offset from the mean: 1 (was originally 2)
• Message: having a group reduces the Std. Dev, hence we have standard error

Statistical Tests
• Null hypothesis: A hypothesis of no change
• Alternate hypothesis: A hypothesis of change
• All stats tests assume "no change from something"...the goal is to prove otherwise...

Common Tests
• Skewness and Kurtosis
• Z-Test / T-Test
• ANOVA / F-Test
• Correlation Test

Standard Error Skewing
Who's Skewed?

Standard Error Skewing
Who's Kurtic?

Central Limit Theorem
• Population distribution X
• Take many random samples of size n (large) and compute the mean of each sample
• The distribution of these sample means (regardless of the distribution of X) will approximately follow the Gaussian distribution, hence we call it Normal

Normal Distribution

Z-Test
• Assumes normality
  • Either you know it should be normal, or you have enough of a sample size to use the Central Limit Theorem
• z = (observed - mean) / (std. dev / sqrt(n))
  • This equation is generalized for sample means of sample size n (an individual is n = 1)

Example
• A group of 9 people takes an IQ test. The population is known to follow a normal distribution with an average score of 100 on the same test and a standard deviation of 15. The group of 9 averaged a score of 105.
Should we assume that this group differs from the population of test takers?

Calculation
• (sample mean - population mean) = 105 - 100 = 5
• (std. dev) / sqrt(9) = 15 / 3 = 5
• z = 5/5 = 1
• What does this 1 mean?

Generalization
• The z-score maps an arbitrary Gaussian distribution down to a Gaussian distribution with mean 0 and standard deviation 1
• It's a value that helps us find another value

p-value
• Every statistical test has a p-value
• The probability, assuming the null hypothesis, of observations falling below this one (or, for the tail, beyond it)
• In other words, how extreme the observation is relative to others of its kind
• z = 1 links to a cumulative probability of .8413 (or an upper-tail p-value of .1587)
  • Not something very extreme

α-level
• Every statistical test has an alpha level
• The level at which you reject the null hypothesis in favor of the alternate hypothesis
• This defines how you handle the p-value
• Otherwise known as the Type 1 Error rate (false rejection probability)

T-test
• A test for when the population standard deviation is unknown and has to be estimated from the sample
• Behaves just like a z-test, but has a different distribution to work from
• Degrees of freedom

ANOVA
• A way to test whether or not there is a difference based upon some factor in a study
• Partitions variance into sources and uses the ratio as the determining factor

One-Way ANOVA

Source     SS         df        MS         F          p
Between    SSB        k-1       SSB/dfB    MSB/MSW    F_{dfB,dfW}(f)
Within     SST-SSB    k(n-1)    SSW/dfW
Total      SST        kn-1

Two-Way ANOVA

Source    SS                  df                  MS           F           p
A         SSA                 a-1                 SSA/dfA      MSA/MSW     F_{A,W}(fA)
B         SSB                 b-1                 SSB/dfB      MSB/MSW     F_{B,W}(fB)
AB        SSAB                (a-1)(b-1)          SSAB/dfAB    MSAB/MSW    F_{AB,W}(fAB)
Within    SST-SSA-SSB-SSAB    dfT-dfA-dfB-dfAB    SSW/dfW
Total     SST                 nab-1

Example ANOVA
[Figure panels: Switch vs. LHP; Switch vs. RHP]

Example
• It has always been said that a hitter of the opposite hand from the pitcher will succeed at a higher rate
• Does this claim hold water?

Example
• Managers frequently set their lineups on the principle that they do not want left-handed hitters back-to-back because a left-handed specialist (almost always an LHP) can be used to get consecutive outs, yet righties are frequently stacked without concern.
• Are these managers paranoid, or is there some merit to this?

Sample Set
• 30 of the top 75 qualifying hitters for MLB batting titles in 2012 were selected
  • Top 10 right-handed hitters
  • Top 10 left-handed hitters
  • Top 10 switch hitters (both left and right)
• Average against LHP and average against RHP were recorded for each of these 30 hitters

Let's check it out!

Correlation Test
• I got an r-value from a regression that I performed
• What does it tell me?
• Long story short, it depends on the sample size

Correlation Test Statistic
• H0: correlation (r) = ρ
• HA: correlation is < or > that

Interesting Picture
[Figure region labels:]
• Positively Correlated
• Positively Correlated, but could be perfect positive model
• Not Correlated, but could be perfect positive model
• Not Correlated
• No Clue
• Not Correlated, but could be perfect negative model
• Negatively Correlated, but could be perfect negative model
• Negatively Correlated

What did we learn?
• When dealing with correlation studies, make sure you have at least 13 observations
  • You can disassociate no correlation from the possibility of a perfect model at this sample size (at 95% confidence)
  • With more confidence, you will need more observations to achieve this
• A little correlation goes a long way in large samples
• With small samples, more correlation is required to make a claim

Assignment
• Given the definition of outliers for a population:
  • 25% - 1.5(IQR)
  • 75% + 1.5(IQR)
  • Determine what the z-scores of the minimum outliers on either side would be
• I will send you an ANOVA table:
  • Tell me the factorial environment
    • A has a levels
    • B has b levels
    • How many subjects per block n
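
The Short Exercise and the IQ z-test example from earlier in these slides can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `z_score` and the variable names are illustrative assumptions and do not come from the slides.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist, mean, stdev


def z_score(observed_mean, population_mean, population_sd, n=1):
    """z = (observed - mean) / (std. dev / sqrt(n)); n = 1 is an individual."""
    standard_error = population_sd / sqrt(n)
    return (observed_mean - population_mean) / standard_error


# Short Exercise: the set itself, then the means of all possible triplets.
values = [1, 2, 3, 4, 5]
triplet_means = [mean(t) for t in combinations(values, 3)]

print("set:      mean =", mean(values), " sample sd =", round(stdev(values), 5))
print("triplets: mean =", round(mean(triplet_means), 5),
      " sample sd =", round(stdev(triplet_means), 5))

# Largest distance of any triplet mean from the overall mean.
max_offset = max(abs(m - mean(values)) for m in triplet_means)
print("maximum offset of a triplet mean:", round(max_offset, 5))

# IQ example: 9 people average 105; population mean 100, std. dev 15.
z = z_score(105, 100, 15, n=9)
cumulative = NormalDist().cdf(z)   # probability of falling below z
upper_tail = 1 - cumulative        # probability of falling above z
print("z =", z, " cumulative =", round(cumulative, 4),
      " upper tail =", round(upper_tail, 4))
```

Run as written, the sketch gives sample standard deviations of about 1.58114 for the set and 0.60858 for the triplet means, and z = 1 for the IQ group, with a cumulative probability of about 0.8413 and an upper tail of about 0.1587. `NormalDist` requires Python 3.8 or later.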