Download 2030Lecture5

Final Project
• Some details on your project
– Goal is to collect some numerical data pertinent to
some question and analyze it using one of the
statistical tests we’ve discussed in class
– You will be graded on all aspects of the task from
the nature of the question to the execution of the
statistical test
Final Project
• Some examples:
– Does the price of oil correlate with the price of
gasoline?
• Approach: record daily price of oil and the price of gas at some
gas station over several weeks and run a correlation
– Is Calgary colder/windier/rainier than Edmonton?
• Collect data from Environment Canada’s web site
– Do Canadians score more than other NHL players?
• Collect data from any sports section or website
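For the oil and gasoline example, "run a correlation" means computing a correlation coefficient between the two price series. A minimal Python sketch follows, assuming Pearson's r is the intended statistic; the function name `pearson_r` and the prices are invented for illustration and are not real data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical daily prices, invented for illustration only
oil_price = [70.0, 71.5, 69.8, 72.3, 73.1, 72.0, 74.4]
gas_price = [1.10, 1.12, 1.09, 1.14, 1.15, 1.13, 1.17]

r = pearson_r(oil_price, gas_price)
print(f"r = {r:.3f}")
```

An r near +1 or -1 indicates a strong linear relationship; an r near 0 indicates little or none. Excel's CORREL function computes the same quantity.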
Final Project
• Guidelines:
– Use readily available observational data
• Don’t run an experiment unless you check with me first!!!
– Keep questions simple and straightforward
• Get your idea checked by Farshad before you proceed
– Plan to do your project with Excel or some stats
program
• Turn in the data, the relevant statistics, and one or two
sentences explaining your question and the answer. It should all fit on one page.
Some Review
• A population is a really big bunch of
numbers
• A sample is some of the numbers from
a population
Some Review
• All sets of numbers have a distribution
– The population has a mean
– A sample has a mean that is probably
similar to, but not necessarily the same as, the
population mean
Some Review
• All sets of numbers have a distribution
– The population has a standard deviation
– A sample has a standard deviation that is
probably similar to, but not necessarily the
same as, the population standard deviation
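The population versus sample idea can be illustrated with a small simulation. This is only a sketch: the population here is simulated (an assumed mean of 100 and standard deviation of 15), and the sample size of 30 is an arbitrary choice.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Assumed population: 100,000 simulated scores with mean 100 and SD 15
population = [random.gauss(100, 15) for _ in range(100_000)]

# A sample is some of the numbers from the population
sample = random.sample(population, 30)

pop_mean = statistics.mean(population)
pop_sd = statistics.pstdev(population)  # population SD (divides by N)
samp_mean = statistics.mean(sample)
samp_sd = statistics.stdev(sample)      # sample SD (divides by n - 1)

print(f"population: mean = {pop_mean:.2f}, sd = {pop_sd:.2f}")
print(f"sample:     mean = {samp_mean:.2f}, sd = {samp_sd:.2f}")
```

The sample values should land near the population values without matching them exactly.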
Some Review
• If we think in terms of standard
deviation, we can know things like
whether or not a single number is very
different from the mean of a population
Some Review
• But often we’re not interested in single
numbers - we’ve collected a sample and
computed a mean
• That mean comes from a population of
sample means (you just happened to pick
one of them)
• The mean of the distribution of sample means
is the mean of the population
• The standard deviation of the sample means
is the standard error
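These two facts about the distribution of sample means can be checked by simulation. The sketch below assumes a simulated population (mean 100, SD 15) and a sample size of n = 25, and it compares the standard deviation of many sample means with the usual standard error formula, the population SD divided by the square root of n.

```python
import random
import statistics
from math import sqrt

random.seed(1)  # fixed seed so the sketch is repeatable

# Assumed population for illustration
population = [random.gauss(100, 15) for _ in range(50_000)]
n = 25  # size of each sample

# Draw many samples and record the mean of each one
sample_means = [
    statistics.mean(random.sample(population, n)) for _ in range(2_000)
]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# Standard error predicted from the population SD
standard_error = statistics.pstdev(population) / sqrt(n)

print(f"population mean      = {statistics.mean(population):.2f}")
print(f"mean of sample means = {mean_of_means:.2f}")
print(f"SD of sample means   = {sd_of_means:.2f}")
print(f"standard error       = {standard_error:.2f}")
```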
Some Review
• If we think in terms of standard errors,
we can know things like whether a
particular mean is very different from
the mean of a population
Keep these ideas straight
• If we think in terms of
standard deviation, we
can know things like
whether or not a single
number is very different
from the mean of a
distribution
$z_i = \frac{x_i - \bar{x}}{S_x}$
• If we think in terms of
standard errors, we can
know things like
whether a particular
mean is very different
from the mean of a
population
$Z_{\bar{x}} = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}$
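The two kinds of Z score can be put side by side in code. This is a sketch only: the function names and the example numbers are hypothetical, and the standard error is computed as the population SD divided by the square root of n.

```python
from math import sqrt

def z_score(x, mean, sd):
    """How many standard deviations a single number lies from the mean."""
    return (x - mean) / sd

def z_for_mean(sample_mean, pop_mean, pop_sd, n):
    """How many standard errors a sample mean lies from the population mean."""
    standard_error = pop_sd / sqrt(n)
    return (sample_mean - pop_mean) / standard_error

# Hypothetical numbers: a single score of 130 when the mean is 100 and the SD is 15
print(z_score(130, 100, 15))         # 2.0

# Hypothetical numbers: a mean of 106 from n = 25, population mean 100 and SD 15
print(z_for_mean(106, 100, 15, 25))  # 2.0
```

A single score has to be 30 points from the mean to reach Z = 2, but a mean of 25 scores only has to be 6 points away, because the standard error (3) is much smaller than the standard deviation (15).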
Some Review
• We use the Z table to look up the
probability of getting a Z score at least as
extreme as a particular value in any normal population
• Since the population of sample means
is approximately normal (Central Limit Theorem), we
can use the same Z table to look up the
probability of getting a sample mean at least that extreme
from a population with a particular mean
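A Z table lookup can be reproduced with the standard normal cumulative distribution. The sketch below uses `NormalDist` from Python's standard library; the helper name `upper_tail_probability` and the Z value of 2.0 are chosen only for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def upper_tail_probability(z):
    """P(Z >= z) for a standard normal variable, as read from a Z table."""
    return 1 - standard_normal.cdf(z)

p = upper_tail_probability(2.0)
print(f"P(Z >= 2.0) = {p:.4f}")
```

The same function works for a single score or for a sample mean, as long as the Z value was computed with the matching denominator (standard deviation or standard error).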
Now a Real Example
• Break into groups of 10
• Write down your heights in inches
• Compute the mean of your n=10
sample
• Compute the standard deviation
• Hand it all in to Fraser
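The arithmetic for this exercise can be checked with a few lines of Python. The ten heights below are made up for illustration, and the sketch assumes the sample standard deviation (dividing by n - 1) is the one wanted.

```python
import statistics

# Hypothetical heights in inches for a group of n = 10
heights = [64, 66, 67, 68, 68, 69, 70, 71, 72, 75]

mean_height = statistics.mean(heights)
sd_height = statistics.stdev(heights)  # sample SD, divides by n - 1

print(f"n = {len(heights)}")
print(f"mean = {mean_height:.2f} inches")
print(f"standard deviation = {sd_height:.2f} inches")
```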
Critical Z Value
• In our examples we’ve been testing the
hypothesis that one sample has a mean that
is higher (or lower) than a population mean
• Let’s turn this around a bit…let’s work
backwards
Critical Z Value
• How much bigger would a sample mean have to be
so that there’s only a 5% chance of getting a mean that large from a
particular population?
[Figure: Gaussian (Normal) Distribution. x-axis: score (-4 to 4); y-axis: probability (0 to 0.6). The lower 95% of the area and the upper 5% tail are marked; the boundary between them is the alpha = .05 threshold. What Z score?]
Critical Z Value
• This is sometimes called the critical Z value, or $Z_{crit}$:
$Z_{crit}$ (one-tailed) $= 1.64$ (more precisely, 1.645)
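The one-tailed critical value can be recovered by working backwards from the 5% tail area with the inverse of the normal cumulative distribution. A minimal sketch, assuming alpha = .05; the observed Z of 2.0 is hypothetical.

```python
from statistics import NormalDist

alpha = 0.05

# Z value that leaves 5% of the area in the upper tail
z_crit_one_tailed = NormalDist().inv_cdf(1 - alpha)
print(f"Zcrit (one-tailed) = {z_crit_one_tailed:.3f}")

# Hypothetical observed Z for a sample mean
observed_z = 2.0
exceeds = observed_z > z_crit_one_tailed
print(f"observed Z = {observed_z} exceeds Zcrit: {exceeds}")
```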

Directional vs. Bidirectional
Tests
• In our examples we’ve been testing the
hypothesis that one sample has a mean that
is higher (or lower) than a population mean
• We call this a directional or “one-tailed” test
• What does that one-tailed bit mean?!
Directional vs. Bidirectional
Tests
• We were checking to see if our sample had a
mean far enough into the positive tail of the
distribution and ignoring the negative tail
Directional vs. Bidirectional
Tests
• Often we haven’t made a directional
hypothesis, but have simply predicted “a
difference”
• In that situation, if we keep using the one-tailed critical value we are
twice as likely to make a Type I error: the sample mean could,
by chance, be in either tail!
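The doubling can be seen by adding up the tail areas. The sketch below assumes the one-tailed cutoff for alpha = .05 is applied to a prediction of "a difference" in either direction.

```python
from statistics import NormalDist

z = NormalDist()

# One-tailed cutoff for alpha = .05
one_tailed_cutoff = z.inv_cdf(0.95)

# Chance of a mean landing beyond the cutoff in the upper tail only
p_upper_tail = 1 - z.cdf(one_tailed_cutoff)

# Chance of landing beyond the cutoff in either tail (the normal curve is symmetric)
p_either_tail = 2 * p_upper_tail

print(f"upper tail only: {p_upper_tail:.3f}")
print(f"either tail:     {p_either_tail:.3f}")
```

With the same cutoff, the chance of a Type I error goes from 5% to 10%, which is why a non-directional test needs a larger critical value.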
Directional vs. Bidirectional
Tests
• What would the critical Z value be so that
there is a 5% chance that a mean is beyond it
in either direction?
[Figure: Gaussian (Normal) Distribution. x-axis: score (-4 to 4); y-axis: probability (0 to 0.6). The central 95% of the area and a 2.5% tail on each side are marked; the two boundaries are the alpha = .05 threshold. What Z score?]
Directional vs. Bidirectional
Tests
• Thus:
Zcrit (two tailed)   1.96
