Final Project
• Some details on your project
  – Goal is to collect some numerical data pertinent to some question and analyze it using one of the statistical tests we’ve discussed in class
  – You will be graded on all aspects of the task, from the nature of the question to the execution of the statistical test
• Some examples:
  – Does the price of oil correlate with the price of gasoline?
    • Approach: record the daily price of oil and the price of gas at some gas station over several weeks and run a correlation
  – Is Calgary colder/windier/rainier than Edmonton?
    • Collect data from Environment Canada’s web site
  – Do Canadians score more than other NHL players?
    • Collect data from any sports section or website
• Guidelines:
  – Use readily available observational data
    • Don’t run an experiment unless you check with me first!!!
  – Keep questions simple and straightforward
    • Get your idea checked by Farshad before you proceed
  – Plan to do your project with Excel or some stats program
    • Turn in the data, the relevant statistics, and one or two sentences explaining your question and the answer; everything should fit on one page.
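For the oil-versus-gasoline example above, "run a correlation" could look something like the sketch below. It assumes Python rather than Excel, computes the ordinary Pearson correlation coefficient by hand, and uses made-up prices purely for illustration; the function name `pearson_r` and the variable names are hypothetical, not part of the course materials.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviations of each observation from its own mean
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dev_x, dev_y))
    ss_x = sum(a * a for a in dev_x)
    ss_y = sum(b * b for b in dev_y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one variable never changes")
    return sum_xy / sqrt(ss_x * ss_y)


# Illustrative, made-up numbers (NOT real prices): one pair per day
oil_price = [70.0, 72.5, 71.0, 74.0, 76.5]   # dollars per barrel
gas_price = [1.10, 1.14, 1.12, 1.17, 1.20]   # dollars per litre

r = pearson_r(oil_price, gas_price)
print(f"r = {r:.3f}")
```

A value of r near +1 or -1 suggests a strong linear relationship and a value near 0 suggests little or none. Whether r is statistically significant is a separate question that this sketch does not answer.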
Some Review
• A population is a really big bunch of numbers
• A sample is some of the numbers from a population
• All sets of numbers have a distribution
  – The population has a mean
  – A sample has a mean that is probably similar to, but not necessarily the same as, the population mean
  – The population has a standard deviation
  – A sample has a standard deviation that is probably similar to, but not necessarily the same as, the population standard deviation
• If we think in terms of standard deviation, we can know things like whether or not a single number is very different from the mean of a population
• But often we’re not interested in single numbers: we’ve collected a sample and computed a mean
• That mean comes from a population of sample means (you just happened to pick one of them)
• The mean of the distribution of sample means is the mean of the population
• The standard deviation of the sample means is the standard error
• If we think in terms of standard errors, we can know things like whether a particular mean is very different from the mean of a population

Keep these ideas straight
• If we think in terms of standard deviation, we can know things like whether or not a single number is very different from the mean of a distribution
  $z_i = \frac{x_i - \bar{x}}{S_x}$
• If we think in terms of standard errors, we can know things like whether a particular mean is very different from the mean of a population
  $Z_{\bar{x}} = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}$

Some Review
• We use the Z table to look up the probability that a particular Z score came from any normal population
• Since the population of sample means is approximately normal (Central Limit Theorem), we can use the same Z table to look up the probability that a sample mean came from a population with a particular mean

Now a Real Example
• Break into groups of 10
• Write down your heights in inches
• Compute the mean of your n=10 sample
• Compute the standard deviation
• Hand it all in to Fraser

Critical Z Value
• In our examples we’ve been testing the hypothesis that one sample has a mean that is higher (or lower) than a population mean
• Let’s turn this around a bit… let’s work backwards
• How much bigger would a sample mean have to be so that there’s only a 5% chance that it came from a particular population?
• This is the alpha = .05 threshold
[Figure: Gaussian (Normal) Distribution, probability vs. score from -4 to 4; 95% of the area lies below the threshold and 5% in the upper tail. What Z score?]
• This is sometimes called the critical Z value or Zcrit
  Zcrit (one-tailed) ≈ 1.64

Directional vs. Bidirectional Tests
• In our examples we’ve been testing the hypothesis that one sample has a mean that is higher (or lower) than a population mean
• We call this a directional or “one-tailed” test
• What does that one-tailed bit mean!?
• We were checking to see if our sample had a mean far enough into the positive tail of the distribution and ignoring the negative tail
• Often we haven’t made a directional hypothesis, but have simply predicted “a difference”
• In that situation, we are twice as likely to make a Type I error: the sample mean could, by chance, be in either tail!
• What would the critical Z value be so that there is a 5% chance that a mean is beyond it in either direction?
• This is the alpha = .05 threshold
[Figure: Gaussian (Normal) Distribution, probability vs. score from -4 to 4; 95% of the area lies between the two thresholds and 2.5% in each tail. What Z score?]
• Thus: Zcrit (two-tailed) ≈ 1.96
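
The two critical values can be checked without a printed Z table. The sketch below assumes Python's standard-library `statistics.NormalDist` and introduces two hypothetical helpers, `critical_z` and `z_for_sample_mean`. The second helper takes the standard error to be the population standard deviation divided by the square root of n; that formula is standard but is not spelled out in these slides, so treat it as an assumption here.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def critical_z(alpha=0.05, two_tailed=False):
    """Z value that cuts off `alpha` of a standard normal in the tail(s)."""
    # A two-tailed test splits alpha evenly between the two tails.
    tail_area = alpha / 2 if two_tailed else alpha
    return STANDARD_NORMAL.inv_cdf(1 - tail_area)


def z_for_sample_mean(sample_mean, population_mean, population_sd, n):
    """Z score of a sample mean, measured in standard errors.

    Assumes standard error = population_sd / sqrt(n), which is not
    stated in the slides themselves.
    """
    standard_error = population_sd / sqrt(n)
    return (sample_mean - population_mean) / standard_error


z_one = critical_z(0.05)                   # one-tailed (directional)
z_two = critical_z(0.05, two_tailed=True)  # two-tailed (bidirectional)
print(round(z_one, 3), round(z_two, 3))    # 1.645 1.96

# Hypothetical heights example: every number below is invented for illustration.
z = z_for_sample_mean(sample_mean=68.5, population_mean=67.0,
                      population_sd=4.0, n=10)
print("beyond one-tailed threshold:", z > z_one)
print("beyond two-tailed threshold:", abs(z) > z_two)
```

The one-tailed value comes out at about 1.645, which a Z table reports as 1.64 or 1.65 depending on rounding. The two-tailed test compares the absolute value of Z against its threshold because the sample mean may fall in either tail.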