AP Statistics
Chapters 0 & 1 Review
Variables fall into two main categories:
A categorical, or qualitative, variable
places an individual into one of several
groups or categories.
A quantitative variable takes numeric
values for which arithmetic operations
make sense.
The distribution of a variable tells
us what values the variable takes
on and how often it takes on those
values.
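As a minimal sketch of what "distribution" means in practice, the snippet below tabulates each value of a categorical variable with how often it occurs. The data set is made up for illustration and is not from the review.

```python
from collections import Counter

# Hypothetical data: favorite color reported by 8 students (invented values).
colors = ["red", "blue", "blue", "green", "red", "blue", "green", "blue"]

# The distribution pairs each value with how often it occurs.
counts = Counter(colors)
for value, count in counts.most_common():
    share = count / len(colors)
    print(f"{value}: {count} ({share:.0%})")
```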
Statistical inference involves
drawing conclusions about a large
group, called the population
by gathering information from a
smaller subgroup, called the
sample.
The main statistical designs for producing
data are surveys, experiments and
observational studies.
In an observational study, we observe
individuals and measure variables of
interest but do not attempt to influence
the responses.
In an experiment, we deliberately do
something to individuals in order to
observe their response.
What two types of graphs are typically
used for categorical variables?
Pie Charts and Bar Graphs
What types of graphs are typically
used for quantitative variables?
Dot plots, stem plots and histograms
Please know:
Cumulative frequency histogram
Relative frequency histogram
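The bar heights for these two histograms can be computed from ordinary class counts. The sketch below assumes made-up quiz scores and class intervals of width 10; both are illustrative choices, not values from the review.

```python
from itertools import accumulate

# Hypothetical quiz scores (invented for illustration).
scores = [52, 67, 71, 74, 78, 81, 83, 85, 88, 94]

# Assumed class intervals: [50, 60), [60, 70), ..., [90, 100).
edges = [50, 60, 70, 80, 90, 100]

# Frequency: how many scores fall in each class.
counts = [sum(lo <= s < hi for s in scores) for lo, hi in zip(edges, edges[1:])]

# Relative frequency: each count as a fraction of all observations.
relative = [c / len(scores) for c in counts]

# Cumulative frequency: running total of the counts.
cumulative = list(accumulate(counts))

for lo, hi, c, r, cum in zip(edges, edges[1:], counts, relative, cumulative):
    print(f"[{lo}, {hi}): count={c}, relative={r:.2f}, cumulative={cum}")
```

A relative frequency histogram plots the `relative` values; a cumulative frequency histogram plots the `cumulative` values, so its bars never decrease from left to right.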
When you describe the distribution pay
special attention to the …
shape: overall pattern, symmetric or skewed.
The length of the “tails” will tell us whether a graph (i.e.
distribution) is left-skewed (left tail is the longest) or
right-skewed (the right tail is the longest).
modes: the values that occur most often (i.e. peaks)
unimodal - one major peak,
bimodal - two major peaks
Center: the middle
The two most common measures of center
are the mean and the median.
Spread: how varied (i.e. spread out) the data are
The IQR and standard deviation are probably
the two most common measures of spread.
Outliers: any value(s) that fall outside the overall
pattern.
When you have to describe the shape of a
distribution, don’t get mad, CUSS:
Center
Unusual
Spread
Shape
Measuring Center: The Mean & Median
To calculate the mean, add the values of the
observations and divide by the number of
observations.
The mean of a sample is denoted x̄,
pronounced x-bar.
The mean of a population is denoted μ, the
Greek letter mu.
Measuring Center: The Median
The median (denoted by M) is the midpoint of a
distribution:
To calculate the median….
1. Order the observations from smallest to
largest.
2. If the number of observations is odd, the
median is simply the middle value in the list.
You can find the location by counting (n+1)/2
observations from the bottom (or top).
3. If the number of observations is even, you
should average the two middle numbers. The
location of the median is again (n+1)/2 from
the bottom or top of the list.
EXAMPLE:
Consider the following set of numbers…
13, 25, 28, 36, 47
M = 28
x̄ = 29.8
Now, consider adding a 6th number, say 104.
M = 32
x̄ = 42.1666… ≈ 42.17
We say that the median is an outlier resistant
measure of center, while the mean is not.
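The example can be checked with a short sketch using Python's standard `statistics` module. The variable names are illustrative only.

```python
from statistics import mean, median

data = [13, 25, 28, 36, 47]
with_outlier = data + [104]  # add the sixth value from the example

# Before adding 104: median 28, mean 29.8.
print(median(data), mean(data))

# After adding 104 the median moves by 4, the mean by more than 12.
print(median(with_outlier), round(mean(with_outlier), 2))
```

The single large value shifts the mean far more than the median, which is what "resistant" means here.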
Mean versus Median
The mean and median of a roughly symmetrical
distribution will be close together. If the distribution is
exactly symmetric, the mean and median are equal.
In a skewed distribution, the mean is pulled farther out into the
long tail than the median. In a skewed distribution, the
median is usually the more representative measure of center.
In descriptions of data, the “average” value of a
variable is usually referred to as the mean whereas the
“typical” value is usually referred to as the median.
Measuring Spread: The Quartiles
One way to measure spread, or variability, is to
calculate the range, which is the difference between
the largest and smallest observations.
Another way to describe the spread of a distribution is
by considering different percentiles. The pth percentile
of a distribution is the value that has p percent of the
observations at or below it. The median is the 50th
percentile. The 25th percentile is called the 1st quartile
while the 75th percentile is called the 3rd quartile.
The Five-Number Summary and Boxplots
The five-number summary of a set of
observations consists of the smallest value, the
1st quartile, the median, the 3rd quartile and the
largest value.
The five-number summary can be presented
visually by a boxplot.
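A minimal sketch of the five-number summary follows. It assumes the common textbook convention that Q1 and Q3 are the medians of the lower and upper halves of the ordered data (leaving out the middle value when the count is odd). Calculators and software packages may use slightly different quartile rules, so results can differ a little. The function name `five_number_summary` is hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for at least two observations.

    Assumes the median-of-halves convention for the quartiles.
    """
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two observations")
    half = len(ordered) // 2
    q1 = median(ordered[:half])    # median of the lower half
    q3 = median(ordered[-half:])   # median of the upper half
    return ordered[0], q1, median(ordered), q3, ordered[-1]

print(five_number_summary([13, 25, 28, 36, 47, 104]))
# prints (13, 25, 32.0, 47, 104)
```

These five values are exactly what a boxplot draws: the ends of the whiskers, the edges of the box, and the line inside the box.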
The 1.5 x IQR Rule for Outliers
The distance between the 1st and 3rd quartiles is called
the interquartile range, which is abbreviated IQR for
obvious reasons.
The quartiles and IQR are resistant to changes in either
tail of a distribution.
Since the median and the IQR are resistant to
outliers, they should be used when describing a skewed
distribution.
We will call a data value a
“suspected” outlier if it falls more
than 1.5 x IQR above Q3 or below
Q1.
In a modified boxplot, the whiskers extend only
to values not “flagged” as outliers and asterisks
are used to denote any outliers.
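The rule can be sketched as a small function. The name `suspected_outliers` is hypothetical, and the quartiles are passed in as arguments because different textbooks and software compute them slightly differently. The values Q1 = 25 and Q3 = 47 used below assume the median-of-halves convention applied to the sample data.

```python
def suspected_outliers(values, q1, q3):
    """Return the values lying more than 1.5 x IQR below Q1 or above Q3."""
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

data = [13, 25, 28, 36, 47, 104]

# IQR = 47 - 25 = 22, so the fences are 25 - 33 = -8 and 47 + 33 = 80.
print(suspected_outliers(data, q1=25, q3=47))
# prints [104]
```

In a modified boxplot for this data, the upper whisker would stop at 47 and 104 would be drawn as a separate point.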
Measuring Spread: The Standard Deviation
The standard deviation measures spread by
determining how far each value is from the
mean and then “averaging” these distances.
The standard deviation of a sample is denoted
by s.
The standard deviation of a population is
denoted σ, the Greek letter sigma.
The following formula is used to compute the
standard deviation of a sample.
$$ s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2} $$
The variance of a set of observations, s² or σ², is
simply the square of the standard deviation.
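As a sketch, the sample standard deviation can be computed step by step and compared against `statistics.stdev` from Python's standard library, which also divides by n - 1. The helper name `sample_sd` and the data values are illustrative.

```python
from math import sqrt
from statistics import stdev

def sample_sd(values):
    """Sample standard deviation: root of the summed squared deviations over n - 1."""
    n = len(values)
    x_bar = sum(values) / n                              # sample mean
    squared_devs = [(x - x_bar) ** 2 for x in values]    # squared distance from the mean
    variance = sum(squared_devs) / (n - 1)               # sample variance, s squared
    return sqrt(variance)

data = [13, 25, 28, 36, 47]
s = sample_sd(data)
print(round(s ** 2, 1), round(s, 2))  # variance, then standard deviation
```

Dividing by n - 1 rather than n is why the "averaging" in the description above is in quotation marks.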
Properties of the Standard Deviation
1. s measures spread about the mean and should be
used only when the mean is used as the measure of
center
2. s = 0 only when there is no spread/variability (i.e. all
the values are the same). Otherwise, s > 0. As the
observations become more spread out about their
mean, s gets greater.
3. s, like the mean x̄, is not resistant to outliers. A few
outliers can make s very large. Distributions with
outliers and strongly skewed distributions have very
large standard deviations. As such, the number s does
not give much helpful information about such
distributions.
Choosing Measures of Center and Spread
The five number summary, in particular the
median and the IQR, is usually better than the
mean and standard deviation for describing a
skewed distribution or a distribution with strong
outliers.
Use x̄ and s only for reasonably symmetric
distributions that are free of outliers.
Adding the same number, a, to each
observation adds a to the measure of
center but does not affect the measure of
spread.
Multiplying each observation by the same
number, b, multiplies the measures of center
by b and the measures of spread by |b|.
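Both rules can be checked numerically. The sketch below uses a = 10 and b = 3 as arbitrary illustrative constants, with the mean as the measure of center and the standard deviation as the measure of spread.

```python
from statistics import mean, stdev

data = [13, 25, 28, 36, 47]
a, b = 10, 3  # arbitrary constants chosen for illustration

shifted = [x + a for x in data]  # add a to every observation
scaled = [x * b for x in data]   # multiply every observation by b

# Adding a moves the center by a and leaves the spread alone.
print(mean(shifted) - mean(data), stdev(shifted) - stdev(data))

# Multiplying by b scales both the center and the spread.
print(mean(scaled) / mean(data), stdev(scaled) / stdev(data))
```

The same pattern holds if the median and IQR are used in place of the mean and standard deviation.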