AP STAT SEMINAR - 1 - Hatboro

Name _______________________________
AP STATISTICS CHAPTER 1:
CUES
1. What is the difference between descriptive and
inferential statistics?
2. How is a sample related to a population?
3. What are two types of numerical variables?
4. What is the difference between a categorical and a
quantitative variable?
5. Define Distribution:
6. What are 3 attributes of a good chart?
The following terms help describe the shape of a
distribution. Draw a sketch to help explain each
term.
BIMODAL VS UNIMODAL –
UNIFORM DISTRIBUTION -
SYMMETRIC DISTRIBUTION –
SKEWED LEFT –
SKEWED RIGHT –
APPROXIMATELY NORMAL -
7. What acronym should you use to describe the
overall pattern of a distribution?
8. How are histograms and stemplots similar? How
are they different?
SUMMARY/QUESTIONS TO ASK IN CLASS
NOTES
CUES
NOTES
The following stem and leaf plot displays the number
of home runs hit, per season, by Barry Bonds from
his rookie year in 1986 to the year 2003.
1 | 6 9
2 | 4 5 5
3 | 3 3 4 4 7 7
4 | 0 2 5 6 6 9
5 |
6 |
7 | 3
MEASURING SPREAD: __________
can be used to describe the spread of a distribution.
The 5-number summary:
Boxplot:
The inter-quartile range (IQR):
Call an observation an __________ if it falls more than
__________ above Q3 or below Q1.
Does the Barry Bonds data contain any outliers?
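One way to check the outlier question is sketched below. It assumes the home run counts as read off the stemplot, the usual 1.5 x IQR rule for the blank above, and quartiles found as the medians of the lower and upper halves of the sorted data. The function names are illustrative, not from the worksheet.

```python
from statistics import median

# Home runs per season, read off the stemplot above (1986-2003).
home_runs = [16, 19, 24, 25, 25, 33, 33, 34, 34,
             37, 37, 40, 42, 45, 46, 46, 49, 73]

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]     # observations below the median position
    upper = xs[-half:]    # observations above the median position
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

def iqr_outliers(data, k=1.5):
    """Return the values more than k * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low_fence or x > high_fence]

summary = five_number_summary(home_runs)
flagged = iqr_outliers(home_runs)
print("five-number summary:", summary)
print("flagged by the 1.5 x IQR rule:", flagged)
```

Under this quartile convention Q1 = 25 and Q3 = 45, so the upper fence is 45 + 1.5 x 20 = 75 and the 73-home-run season sits just inside it. Calculators and software that use a different quartile convention may place the fences slightly differently.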
MEASURING CENTER: We use the following
measures to describe the center of a distribution:
MEAN –
MEDIAN –
What are the mean and median of the Barry Bonds
data?
What is the effect on the mean and the median if the
outlier (73) is changed to 51?
A statistic is said to be __________ if it is
unaffected by changes to individual data points.
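The comparison asked for above can be checked with a short sketch. It assumes the data as read off the stemplot and uses Python's standard `statistics` module. The variable names are illustrative.

```python
from statistics import mean, median

# Home runs per season, read off the stemplot (1986-2003).
home_runs = [16, 19, 24, 25, 25, 33, 33, 34, 34,
             37, 37, 40, 42, 45, 46, 46, 49, 73]

# Replace the 73-home-run season with 51, as the question proposes.
modified = [51 if x == 73 else x for x in home_runs]

for label, data in (("original", home_runs), ("73 -> 51", modified)):
    print(f"{label:>9}: mean = {mean(data):.2f}, median = {median(data)}")
```

The mean moves when the largest value changes, while the median stays where it was, because the median depends only on the middle of the ordered list.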
SUMMARY/QUESTIONS TO ASK IN CLASS
NOTES
CUES
The most common statistic used for describing spread
is standard deviation.
x        Deviations ($x - \bar{x}$)        Squared deviations ($(x - \bar{x})^2$)
16
19
24
25
25
33…
Sample variance: $s^2$ =
Sample standard deviation: $s$ =
Are the variance and standard deviation resistant?
SUMMARY/QUESTIONS TO ASK IN CLASS
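The deviations table can be filled in mechanically. The sketch below assumes the full Barry Bonds data as read off the stemplot (the table above lists only the first six values) and divides by n - 1, as in the sample variance. The cross-check uses `statistics.variance` and `statistics.stdev`, which also use the n - 1 divisor.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Home runs per season, read off the stemplot (1986-2003).
home_runs = [16, 19, 24, 25, 25, 33, 33, 34, 34,
             37, 37, 40, 42, 45, 46, 46, 49, 73]

x_bar = mean(home_runs)
deviations = [x - x_bar for x in home_runs]   # x - x_bar
squared = [d ** 2 for d in deviations]        # (x - x_bar)^2

n = len(home_runs)
s2 = sum(squared) / (n - 1)   # sample variance
s = sqrt(s2)                  # sample standard deviation

print(f"{'x':>4} {'x - mean':>10} {'(x - mean)^2':>14}")
for x, d, sq in zip(home_runs, deviations, squared):
    print(f"{x:>4} {d:>10.2f} {sq:>14.2f}")
print(f"s^2 = {s2:.2f}, s = {s:.2f}")
print(f"library check: {variance(home_runs):.2f}, {stdev(home_runs):.2f}")
```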
Why divide by n-1 and not n?
"Some questions are more easily asked than answered"
Edward Sapir(?)
Possibly the most frequently asked and least frequently
answered question is why does the definition of the
standard deviation involve division by n-1, when n might
seem the obvious choice. This is a question which
perplexes introductory statistics students and calculator
manufacturers alike. The explanations given in calculator
manuals tend to range from obscure to fanciful, and both
options are given on the keypad, usually labelled $\sigma_{n-1}$ and
$\sigma_n$ to add to the confusion. (The symbol $\sigma$ is reserved for
the standard deviation of a random variable or a
population/distribution, an entity which lecturers valiantly
try but usually fail to keep distinct from its sample
counterpart.) Australian students first meet the standard
deviation in secondary school, where the definition given
does indeed involve division by n. This definition is
preferred, it would seem, to avoid raising the question about
the n-1. Secondary school teachers have a hard enough
life as it is. And it must be remembered that the difference
between the two definitions is largely academic for all but
the smallest of sample sizes (say, less than 10
observations). So, don't get too agitated by the revamped
definition. The truth can be told but the telling usually
quells the desire to know. If the fire still burns in your
belly, read on.
The most widely accepted explanation involves the concept
of unbiasedness, and I see some have stopped reading
already. If you fire arrows at a target and consistently hit a
mark 5 cm to the left of the bullseye, there is something
wrong with your aim. It shows a bias. The definition of the
sample variance which involves division by n has this flaw.
It consistently underestimates the variance of the
population/distribution from which the sample was drawn.
The n-1 formula fixes the astigmatism. Compelling and
relatively simple as this argument is, it doesn't quite ring
true. Both definitions of the sample standard deviation
produce biased estimates of the standard deviation of the
population/distribution, although the n-1 alternative is less
biased. If you're after unbiasedness, why not use a
definition which gives you unbiasedness where it's needed, on the
original scale of measurement, rather than on the
squared scale? Such a contender exists, but it involves
gamma functions in the definition, and I see quite a few
more people have drifted away. (Gamma functions extend
the concept of factorials to non-integers.)
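The bias claims in this paragraph can be checked exactly on a toy case. The sketch below is not part of the original essay: it assumes a hypothetical population (the six faces of a fair die) and a hypothetical sample size of 3 drawn with replacement, and it enumerates every possible sample instead of simulating.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

population = [1, 2, 3, 4, 5, 6]   # hypothetical population: one fair die
n = 3                             # hypothetical sample size

# Population mean and variance (the variance divides by the population size).
mu = Fraction(sum(population), len(population))
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)

def sum_sq_dev(sample):
    """Sum of squared deviations from the sample mean, as an exact fraction."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample)

# All 6**3 = 216 equally likely samples drawn with replacement.
samples = list(product(population, repeat=n))

avg_var_n = sum(sum_sq_dev(smp) / n for smp in samples) / len(samples)
avg_var_n1 = sum(sum_sq_dev(smp) / (n - 1) for smp in samples) / len(samples)
avg_sd_n1 = sum(sqrt(sum_sq_dev(smp) / (n - 1)) for smp in samples) / len(samples)

print("population variance:        ", sigma2)
print("average variance, divisor n:  ", avg_var_n)
print("average variance, divisor n-1:", avg_var_n1)
print("population sd:", sqrt(sigma2), " average sample sd (n-1):", avg_sd_n1)
```

In this toy case the n - 1 version averages to the population variance exactly, the n version averages to (n - 1)/n of it, and the average sample standard deviation still falls below the population standard deviation, which is the essay's point about bias on the original scale.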
The real reason is a simple housekeeping issue. If you deal
with the n-1 straight away in the definition of the standard
deviation, it doesn't keep popping up in every subsequent
procedure involving the standard deviation, to the
increasing annoyance of all concerned. The subsequent
procedures in question involve the definition of the t and $\chi^2$
distributions where the issue of degrees of freedom arises.
Degrees of freedom means what it says - in how many
independent directions can you move at once. If you're a
point moving on a page, you are moving in two dimensions
and you have correspondingly two degrees of freedom. The
freedom to move up the page and the freedom to move
across it. Any motion on a page can be described in terms
of these two independent motions. Now consider a sample
of size n. It inhabits an n dimensional space. There are n
degrees of freedom in total. Each sample member is free to
take any value it likes, independently from all the others. If,
however, you fix the sample mean, then the sample values
are constrained to have a fixed sum. You can let n-1 of
them roam free, but the remaining sample
value is determined by the fixed sum. The space inhabited
by the deviations from the sample mean is thus n-1
dimensional rather than n dimensional, since the deviations
must sum to zero. The sum of the squared deviations,
although looking like a sum of n things, is actually a sum of
only n-1 independent things, and its natural divisor - its
degrees of freedom - is also n-1.
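The constraint described above can be seen numerically. The sketch below is an illustration added here, not part of the essay. It uses the six values listed in the worksheet's deviations table as a small sample and shows that once n - 1 deviations are known, the last one is forced.

```python
from statistics import mean

# Illustrative sample: the six values listed in the worksheet's table.
sample = [16, 19, 24, 25, 25, 33]

x_bar = mean(sample)
deviations = [x - x_bar for x in sample]

# Pretend only the first n - 1 deviations are known.
free = deviations[:-1]

# Because all n deviations must sum to zero, the last one is determined.
implied_last = -sum(free)

print("sum of all deviations:", round(sum(deviations), 10))
print("last deviation (computed directly):", round(deviations[-1], 4))
print("last deviation (implied by others):", round(implied_last, 4))
```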
But wait, there's more. Degrees of freedom will return to
haunt you if and when you do analysis of variance
(ANOVA to its friends). You will be ahead of the game if
you grasp the concept now. Degrees of freedom can neither
be created nor destroyed. You start off with n, the sample
size. You use up a few trying to estimate the structure of
the mean. For example, the mean could be a straight line, as
in simple linear regression. You need two degrees of
freedom to estimate the two characteristics possessed by all
straight lines - a slope and an intercept. These
characteristics are called parameters. So, two degrees of
freedom have gone into the mean. This is the signal.
Everything else in this model is noise. The remaining n-2
degrees of freedom go into estimating the one parameter
which describes the noise - the variance. If you're not part
of the solution (the signal or mean) you're part of the
problem (the noise). The simplest model is the one which
says the mean is a single constant, ably estimated by the
sample mean. Everything else is just inexplicable variation
about that constant. That's n-1 degrees of freedom's worth
of noise, all kindly donated to the sample variance.
FROM:
http://www.maths.murdoch.edu.au/units/statsnotes/samplestats/stdevmore.html
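The straight-line case in the essay can be sketched as follows. This is an added illustration with invented (x, y) data, not material from the source. It fits a least-squares line, shows the two constraints the residuals satisfy (one for each estimated parameter), and divides the residual sum of squares by n - 2.

```python
# Hypothetical (x, y) data, invented purely for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope and intercept: the two parameters of the "signal".
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Residuals: the "noise" left over after fitting the line.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Two constraints on the residuals, so only n - 2 of them are free.
constraint_1 = sum(residuals)
constraint_2 = sum(x * r for x, r in zip(xs, residuals))

# Noise variance estimate: residual sum of squares over n - 2.
sse = sum(r ** 2 for r in residuals)
noise_variance = sse / (n - 2)

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"constraints: {constraint_1:.2e}, {constraint_2:.2e}")
print(f"noise variance estimate (divisor n - 2 = {n - 2}): {noise_variance:.4f}")
```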
CUES
In problem 1.14 on page 23, the salaries of CEOs
from 59 small businesses were provided (in thousands
of dollars). This data is now held safely in your
calculators, in L1. Give the mean and standard
deviation of this data, and create a boxplot. Describe
the shape of the distribution.
Consider the following changes to the data. Use your
graphing calculator to again determine the mean and
standard deviation, and describe the shape of the
distribution. Create side-by-side boxplots to display
the distribution.
1. The salary of each CEO is increased by 80
thousand dollars. (Use L2)
2. The salary of each CEO rises 30%. (Use L3)
3. The salary of each CEO falls 40%, but then rises
$20,000. (Use L4)
How do changes to a data set, through addition,
subtraction, or multiplication, affect the mean and
standard deviation of a distribution?
LINEAR TRANSFORMATION: A linear
transformation changes the original variable x into
the new variable xnew given by an equation of the form:
xnew = a + bx.
Effects of a:
Effects of b:
Effects on the shape of the distribution:
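The three calculator exercises above can be mirrored in a short sketch. The CEO salary data from problem 1.14 is not reproduced in this worksheet, so the salaries below are hypothetical values in thousands of dollars. Each change is written as xnew = a + bx, and the `transform` helper is illustrative.

```python
from statistics import mean, stdev

# Hypothetical salaries in thousands of dollars (NOT the problem 1.14 data).
salaries = [45, 52, 60, 61, 75, 80, 98, 120, 150, 210]

def transform(data, a, b):
    """Apply the linear transformation xnew = a + b * x to every value."""
    return [a + b * x for x in data]

# (a, b) pairs for the three changes described in the worksheet.
cases = {
    "1. add 80":               (80, 1.0),
    "2. rise 30%":             (0, 1.3),
    "3. fall 40%, then add 20": (20, 0.6),
}

base_mean, base_sd = mean(salaries), stdev(salaries)
print(f"{'original':<26} mean = {base_mean:8.2f}  sd = {base_sd:7.2f}")

results = {}
for label, (a, b) in cases.items():
    new = transform(salaries, a, b)
    results[label] = (mean(new), stdev(new))
    print(f"{label:<26} mean = {results[label][0]:8.2f}  sd = {results[label][1]:7.2f}")
```

For these values the new mean equals a + b times the old mean, while the new standard deviation equals |b| times the old one. Adding a constant shifts the center without changing the spread.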
SUMMARY/QUESTIONS TO ASK IN CLASS
NOTES
CUES
Consider the following data and graph concerning
heights of freshmen boys:
Heights of Freshman Males
Row   HeightInches   PercentOfFreshmen
1     64             3
2     65             8
3     66             13
4     67             14
5     68             16
6     69             15
7     70             11
8     71             8
9     72             4
10    73             4
11    74             2
12    75             1
13    76             1
[Figure: line scatter plot, "Heights of Freshman Males"; x-axis: HeightInches (64 to 78), y-axis: PercentOfFreshmen (0 to 18)]
NOTES
What percent of freshmen have heights below 70
inches?
What is the median height of a freshman boy?
DEFINITION: The pth percentile of a distribution
is the value such that p percent of the observations
fall at or below it.
What is the 30th percentile of this distribution?
What is the 90th percentile?
At what percentile would a student of height 71
inches fall?
An OGIVE, or relative cumulative frequency graph, allows
us to view the relative standing of individual
observations. The y-axis in this type of graph tells
the relative cumulative frequency for the distribution.
Create an ogive for the height of freshmen males data
set.
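The cumulative percentages behind an ogive can be tabulated with a short sketch. It assumes the percentages read from the heights data above, and it takes the pth percentile to be the smallest listed height whose cumulative percentage reaches p, which is one reasonable reading of the definition for data recorded in whole inches. The function names are illustrative.

```python
from itertools import accumulate

# Heights (inches) and percent of freshmen at each height, from the data above.
heights = list(range(64, 77))                       # 64, 65, ..., 76
percents = [3, 8, 13, 14, 16, 15, 11, 8, 4, 4, 2, 1, 1]

# Relative cumulative frequency: percent of boys at or below each height.
cumulative = list(accumulate(percents))

def percentile_value(p):
    """Smallest height with at least p percent of observations at or below it."""
    for h, c in zip(heights, cumulative):
        if c >= p:
            return h
    raise ValueError("p must be between 0 and 100")

def percentile_of(height):
    """Percent of observations at or below the given height."""
    return cumulative[heights.index(height)]

# The (height, cumulative percent) pairs are the points plotted on the ogive.
for h, c in zip(heights, cumulative):
    print(f"{h} in: {c:3d}% " + "#" * (c // 4))

print("percent below 70 inches:", percentile_of(69))
print("median height:", percentile_value(50))
print("30th percentile:", percentile_value(30))
print("90th percentile:", percentile_value(90))
print("percentile of a 71-inch student:", percentile_of(71))
```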
SUMMARY/QUESTIONS TO ASK IN CLASS