Measures of Central Tendency
“to be or not to be Normal”

TOPICS
• Normal Distributions
• Skewness & Kurtosis
• Normal Curves and Probability
• Z-scores
• Confidence Intervals
• Hypothesis Testing
• The t-distribution

Is this normal?
[Histogram of VAR00001: Mean = 178.3, Std. Dev = 160.68, N = 6]

VAR00001
Value     Frequency   Percent   Valid Percent   Cumulative Percent
70.00     1           16.7      16.7            16.7
100.00    2           33.3      33.3            50.0
150.00    2           33.3      33.3            83.3
500.00    1           16.7      16.7            100.0
Total     6           100.0     100.0

Statistics: VAR00001
N (Valid)                 6
N (Missing)               0
Mean                      178.3333
Skewness                  2.242
Std. Error of Skewness    .845
Kurtosis                  5.219
Std. Error of Kurtosis    1.741

Normal Distributions
• Are your curves normal?
• Why do we care about normal curves?
• What do normal curves tell us?
Answer: The curves tell us something about the distribution of the population. The curves allow us to make statistical inferences regarding the probability of some outcomes within some margin of error.

The normal distribution
• A distribution is easily depicted in a graph where the height of the line is determined by the frequency of cases for the values beneath it.
• Most cases cluster near the middle of a distribution if it is close to normal.

The Normal Curve
• Bell-shaped distribution or curve
• Perfectly symmetrical about the mean. Mean = median = mode
• Tails are asymptotic: they get closer and closer to the horizontal axis but never reach it.

Skewness and Sample Distributions
Not all curves are normal, even if they are still bell-shaped.

Skewness
• Formula for skewness: $\text{Skewness} = \dfrac{3(\text{mean} - \text{median})}{s_y}$

Kurtosis (It’s not a disease)
• Beyond skewness, kurtosis tells us how peaked or flat our distribution is, and how heavy its tails are, relative to a normal curve.
• The kurtosis value for a normal distribution will equal 3. Anything above this is leptokurtic (peaked, with heavy tails) and anything below is platykurtic (flat, with light tails). Some software reports excess kurtosis (kurtosis minus 3), so a normal distribution scores 0 on that scale.
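To make the skewness formula concrete, here is a minimal Python sketch applied to the six VAR00001 observations implied by the frequency table (70, 100, 100, 150, 150, 500). It uses only the standard library. The second estimator, an adjusted moment-based skewness, is not on the slides; it is included as an assumption about how the Statistics table arrived at 2.242, since the slide formula gives a different number for the same data.

```python
import statistics

# The six VAR00001 observations implied by the frequency table.
values = [70, 100, 100, 150, 150, 500]

n = len(values)
mean = statistics.mean(values)
median = statistics.median(values)
s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)

# Skewness as written on the slide: 3 * (mean - median) / s.
pearson_skew = 3 * (mean - median) / s

# Assumption (not from the slides): the Statistics table appears to use the
# adjusted moment-based estimator, n / ((n-1)(n-2)) * sum of cubed z-scores.
sum_z_cubed = sum(((x - mean) / s) ** 3 for x in values)
moment_skew = n / ((n - 1) * (n - 2)) * sum_z_cubed

print(f"mean = {mean:.4f}, median = {median}, s = {s:.2f}")
print(f"slide formula skewness  = {pearson_skew:.3f}")
print(f"moment-based skewness   = {moment_skew:.3f}")
```

Both estimators are positive, which agrees with the picture: the single value of 500 pulls the mean well above the median, so the distribution is skewed to the right and is not normal.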
Back to normal distributions
• The power of normal distributions, or those close to normal, is that we can predict probabilistically where cases will fall within a distribution.
• For example, what are the odds, given the population parameter of human height, that someone will grow to more than eight feet?
• Answer: likely less than a .025 probability.

Sample Distribution
• What does Andre the Giant do to the sample distribution?
• What is the probability of finding someone like Andre in the population?
• Are you ready for more inferential statistics?
• Answer: Oh boy, yes!!

Normal Curves and probability
• We have answered the question of what Andre and the Sumo wrestler would do to the distribution.
• But what about the probability of finding someone the same height as Andre in the population?
• What is the probability of finding someone the same height as Dr. Peña or Dr. Boehmer?

More on normal curves and probability
[Figure: normal curve with markers labeled "Dr. Boehmer would be here" and "Andre would be here"]

Z-Scores (no sleeping!!)
• We can standardize distance from the mean across different samples with z-scores.
• The basic unit of the z-score is the standard deviation.
$z = \dfrac{X_i - \bar{X}}{s}$
We can use the z-score to express each observation as a distance from the mean.
How far is a given observation from the mean when its z-score = 2? Answer: 2 standard deviations.
Approximately what percentage of cases is a given case higher than if its z-score = 2? Answer: about 97.7%.

Random Sampling Error
• Ever hear a poll report a margin of error? What is that?
Random Sampling Error = standard deviation / square root of the sample size, or $\sigma / \sqrt{N}$
As the variance of the population increases, so does the chance that a sample will not reflect the population parameters.

Standard Error
• We often use "random sampling error" to refer both to the chance of error when sampling and to the error of a specific sample statistic, the mean. For the latter we typically use the term Standard Error.
• The standard error of a sample statistic measures how far, on average, the mean of a sample is expected to fall from the mean of the population from which it is drawn.

Standard Error
Example: What if most humans were 200 pounds and only 1 million globally were 250 pounds? The random sampling error would be low, since collecting a sample consisting heavily of those heavier humans would be unlikely. There would not be much error in general from sampling because of the low variance.

Standard Error
• Example continued. Now, when we take a sample, each sample has a mean. If a population has low variance, so should the samples. We should see this reflected in a low standard error of the sample mean, the sample statistic.
• Of course, higher variance in the population also causes higher error in samples taken from it.

Some more notation
Distribution               Mean         Standard Dev.
Sample of observed data    $\bar{X}$    $s$
Population                 $\mu$        $\sigma$
Repeated Sampling          $\mu$        $\sigma / \sqrt{N}$ (Random Sampling Error)
The error in a sample’s mean is the Standard Error: $s / \sqrt{n}$

Central Limit Theorem
Remember that if we took an infinite number of samples from a population, the means of these samples would be approximately normally distributed. Hence, the larger the sample relative to the population, the more likely the sample mean will capture the population mean.

Confidence Intervals
• We can use what we know about the standard deviation around the mean to calculate the range of values within which a sample mean would fall if it is close to the mean of the population.
• This range is based on the probability that the sample mean falls close to the population mean with a probability of .95, or 5% error.

How Confident Are You?
• Are you 100% sure?
• Social scientists use 95% as a threshold to test whether or not the results are a product of chance.
• That is, we accept a 1 in 20 chance of being wrong.
• What do you MEAN?
We build a 95% confidence interval so that, 95% of the time, the population mean will fall within that range.

Confidence Interval (CI)
$CI = \bar{Y} \pm Z_{\alpha/2}\,\sigma_{\bar{Y}}$
$\bar{Y}$ = mean
$Z_{\alpha/2}$ = Z score related with a 95% CI
$\sigma_{\bar{Y}}$ = standard error
That is: sample mean ± 1.96 (or 2) × standard error

Building a CI
• Assume the following: $\bar{Y} = 100$, $\sigma_y = 15$, $N = 400$
$\sigma_{\bar{Y}} = \dfrac{\sigma_y}{\sqrt{N}} = \dfrac{15}{\sqrt{400}} = 0.750$
$CI = 100 \pm (1.96)(0.750)$
Upper = 101.47
Lower = 98.53
Why do we use 1.96?

Calculating a 95% CI
1. Let’s look at the class population distribution of height.
2. Is it a normal or a skewed distribution?
3. Let’s build a 95% CI around the mean height of the class.

Why do we care about CI?
• We use CIs for hypothesis testing.
• For instance, we want to know if there is an income difference between El Paso and Boston.
• We want to know whether or not taking a class at Kaplan makes a difference in our GRE scores.

Mean Difference testing
[Figure: income levels with means marked for the USA, El Paso, Las Cruces, and Boston]
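The arithmetic on the "Building a CI" slide, and the z-score percentage quoted earlier, can be checked with a short Python sketch. It uses only the standard library's `statistics.NormalDist`; the function names `standard_error` and `confidence_interval` are illustrative choices, not something from the slides. It also hints at where 1.96 comes from: it is roughly the z-value that leaves 2.5% of a standard normal curve in each tail.

```python
import math
from statistics import NormalDist


def standard_error(sd, n):
    """Standard error of the mean: standard deviation / sqrt(sample size)."""
    return sd / math.sqrt(n)


def confidence_interval(mean, sd, n, level=0.95):
    """Two-sided normal-based CI: mean +/- z * standard error."""
    # z-value that leaves (1 - level) / 2 in each tail of a standard normal.
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_crit * standard_error(sd, n)
    return mean - margin, mean + margin


# Share of a standard normal distribution lying below z = 2.
share_below_z2 = NormalDist().cdf(2.0)

# Critical z-value for a 95% interval (about 1.96).
z_crit_95 = NormalDist().inv_cdf(0.975)

# Values from the "Building a CI" slide: mean 100, sd 15, N 400.
se = standard_error(15, 400)
lower, upper = confidence_interval(100, 15, 400)

print(f"share below z = 2: {share_below_z2:.4f}")
print(f"95% critical z:    {z_crit_95:.3f}")
print(f"standard error:    {se:.3f}")
print(f"95% CI:            ({lower:.2f}, {upper:.2f})")
```

This sketch assumes the normal approximation used on the slides. For small samples, the t-distribution listed under TOPICS would give a somewhat wider interval.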