Statistical Measures
Mrs. Watkins
AP Statistics
Chapters 5,6
MEASURES OF CENTER
Mean: arithmetic average of all data values
population mean: $\mu$
sample mean: $\bar{x}$
Formula: $\bar{x} = \frac{\sum x}{n}$
Mode: the most common value in a data set
Median: the middle value in a data set
Midrange: average of the extremes
Trimmed Mean: the mean of a data set after
a certain percentage of data values has been
trimmed off each end of the
distribution
Ex:
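The measures of center listed above can be tried out on a small data set. This is a minimal sketch: the data values are hypothetical, `midrange` and `trimmed_mean` are helper names made up for illustration, and the trimmed mean here assumes the stated percentage is cut from each end.

```python
from statistics import fmean, median, multimode

def midrange(values):
    """Average of the smallest and largest values."""
    return (min(values) + max(values)) / 2

def trimmed_mean(values, fraction):
    """Mean after dropping `fraction` of the values from EACH end (assumption)."""
    ordered = sorted(values)
    k = int(len(ordered) * fraction)      # number of values cut from each end
    kept = ordered[k:len(ordered) - k]
    return fmean(kept)

# Hypothetical data set with one very high value.
data = [2, 3, 3, 4, 5, 6, 7, 8, 9, 43]

print(fmean(data))               # 9.0
print(median(data))              # 5.5
print(multimode(data))           # [3]
print(midrange(data))            # 22.5
print(trimmed_mean(data, 0.10))  # 5.625
```

The single high value pulls the mean and midrange upward, while the median and the 10% trimmed mean stay near the bulk of the data.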
5 number summary
5 important numbers in data set:
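Min: smallest data value
Q1: first quartile (25th percentile)
Med: median (50th percentile)
Q3: third quartile (75th percentile)
Max: largest data value
Q1, Med, and Q3 may not be actual data values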
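A short sketch of the 5 number summary follows. It assumes the "median of each half" rule for quartiles, which is one common textbook convention; other software may use a slightly different rule and give slightly different quartiles. The function name and data are hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]           # values below the median position
    upper = ordered[half + n % 2:]   # values above the median position
    return (ordered[0], median(lower), median(ordered),
            median(upper), ordered[-1])

print(five_number_summary([2, 3, 3, 4, 5, 6, 7, 8, 9, 43]))  # (2, 3, 5.5, 8, 43)
print(five_number_summary([1, 2, 3, 4]))                     # (1, 1.5, 2.5, 3.5, 4)
```

The second call shows quartiles and a median that are not actual data values.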
BOXPLOT
graphical display of data using 5 number summary
(if outliers shown, called “modified box plot”)
OUTLIERS
Outliers: data values that lie far from the rest of the data
IQR Test for Outliers:
(IQR)(1.5) = multiplier M
Q1 - M = outlier lower bound
Q3 + M = outlier upper bound
If values fall outside these bounds, they are outliers
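The IQR test above can be sketched in a few lines. The function names, the data, and the quartile values passed in are hypothetical; the quartiles are taken as given rather than computed here.

```python
def iqr_bounds(q1, q3):
    """Lower and upper outlier bounds from the 1.5 x IQR rule."""
    m = 1.5 * (q3 - q1)          # multiplier M
    return q1 - m, q3 + m

def find_outliers(values, q1, q3):
    """Values that fall outside the outlier bounds."""
    low, high = iqr_bounds(q1, q3)
    return [x for x in values if x < low or x > high]

data = [2, 3, 3, 4, 5, 6, 7, 8, 9, 43]   # hypothetical data
q1, q3 = 3, 8                            # assumed quartiles for this data

print(iqr_bounds(q1, q3))           # (-4.5, 15.5)
print(find_outliers(data, q1, q3))  # [43]
```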
RESISTANCE
Resistant Measures: not strongly affected by extreme values
Median, IQR, Trimmed Mean
Non-resistant Measures: strongly affected by extreme values
Mean, Midrange
MEASURES OF SPREAD
Range: the spread between high and low
Resistant?
IQR (Interquartile Range): Q3 - Q1, the spread of the middle 50% of the data
Resistant?
STANDARD DEVIATION
a measure of the average amount of deviation from
the mean among the data values
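Population St. Deviation: $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
Sample St. Deviation: $s_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
We generally use $s_x$ because we usually do not have
the entire population.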
VARIANCE
the square of the standard deviation
what you get before taking square root
Population Variance: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
Sample Variance: $s_x^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
This measure is not used much in elementary
statistics, but you need to know what it is.
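To see the difference between the population and sample versions, here is a minimal sketch that computes both st. deviations by hand and squares them to get the variances. The data set and function names are hypothetical; the last line compares against Python's built-in `pstdev` and `stdev`.

```python
from math import sqrt
from statistics import pstdev, stdev

def population_sd(values):
    """Divide the squared deviations by N (entire population)."""
    mu = sum(values) / len(values)
    return sqrt(sum((x - mu) ** 2 for x in values) / len(values))

def sample_sd(values):
    """Divide the squared deviations by n - 1 (sample)."""
    x_bar = sum(values) / len(values)
    return sqrt(sum((x - x_bar) ** 2 for x in values) / (len(values) - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical data, mean 5

print(population_sd(data))               # 2.0
print(population_sd(data) ** 2)          # population variance: 4.0
print(round(sample_sd(data), 3))         # 2.138
print(round(sample_sd(data) ** 2, 3))    # sample variance: 4.571

# The standard library should agree with the hand calculation.
print(pstdev(data), stdev(data))
```

The sample version is slightly larger because it divides by n - 1 instead of N.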
Coefficient of Variation
measure of how large a st. dev. is relative to the mean:
CV = st. deviation / mean
Ex: St. deviation of IQ = 15, Mean = 100: CV = 15/100 = 15%
St. deviation of height = 3 in, Mean = 69 in: CV = 3/69 ≈ 4.3%
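The two examples can be compared by dividing each st. deviation by its mean. A minimal sketch, using the IQ and height figures from the example; the function name is hypothetical.

```python
def relative_spread(sd, mean):
    """St. deviation expressed as a fraction of the mean."""
    return sd / mean

iq = relative_spread(15, 100)     # IQ: st. deviation 15, mean 100
height = relative_spread(3, 69)   # height: st. deviation 3 in, mean 69 in

print(f"IQ: {iq:.1%}")            # IQ: 15.0%
print(f"height: {height:.1%}")    # height: 4.3%
```

Relative to its mean, IQ varies more than height does, even though the two are measured in different units.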
“Comment on the distribution”
You now have numbers to support your
statements, rather than just graphs.
SHAPE:
OUTLIERS:
CENTER:
SPREAD: how widely does the data vary?
Unusual Features: gaps, clusters
SHAPE
If the mean > median, then data distribution
is skewed ________. The mean is in the tail.
If the mean < median, then data distribution
is skewed ________. The mean is in the tail.
If the mean ≈ median, then data distribution
is approximately ____________.
SHAPE
Approximately symmetric if mean ≈ median
SKEWNESS
Skewed left if mean < median
Skewed right if mean > median
[Figure: a left-skewed distribution and a right-skewed distribution]
Mean is in the tail of the data
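The mean versus median comparison can be turned into a rough check. This is only a sketch: the function name, the data sets, and the `tolerance` cut-off for deciding that mean ≈ median are all hypothetical, and a graph should still be used to confirm the shape.

```python
from statistics import mean, median

def describe_shape(values, tolerance=0.05):
    """Guess the shape by comparing mean and median.

    `tolerance` is a made-up cut-off: a gap smaller than this fraction
    of the range is treated as "mean is about equal to median".
    """
    gap = mean(values) - median(values)
    spread = max(values) - min(values)
    if abs(gap) <= tolerance * spread:
        return "approximately symmetric"
    return "skewed right" if gap > 0 else "skewed left"

print(describe_shape([2, 3, 3, 4, 5, 6, 7, 8, 9, 43]))  # skewed right
print(describe_shape([1, 35, 36, 37, 38, 39, 40]))      # skewed left
print(describe_shape([1, 2, 3, 4, 5]))                  # approximately symmetric
```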
OTHER SHAPES
Uniform distribution: all values relatively
evenly distributed across interval
Bimodal distribution: two peaks
TRANSFORMATIONS TO DATA
What would happen to the statistical measures
if each data value had a constant added to or
subtracted from it?
Mean: shifts by the constant
Standard Deviation: unchanged
Median: shifts by the constant
IQR: unchanged
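What would happen to the statistical measures
if each data value were multiplied or
divided by a constant?
Mean: multiplied or divided by the constant
Standard Deviation: multiplied or divided by the constant (in absolute value)
Median: multiplied or divided by the constant
IQR: multiplied or divided by the constant (in absolute value)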
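Both questions about adding and multiplying by a constant can be explored numerically. This is a minimal sketch with hypothetical data and a hypothetical `summary` helper; it uses Python's default quartile rule, which may differ slightly from a calculator's.

```python
from statistics import mean, median, stdev, quantiles

def summary(values):
    """Mean, sample st. deviation, median, and IQR of a data set."""
    q1, _, q3 = quantiles(values, n=4)   # Python's default quartile rule
    return {"mean": mean(values), "sd": stdev(values),
            "median": median(values), "iqr": q3 - q1}

data = [2, 4, 4, 4, 5, 5, 7, 9]              # hypothetical data
original = summary(data)
shifted = summary([x + 10 for x in data])    # add 10 to every value
scaled = summary([x * 3 for x in data])      # multiply every value by 3

for name in ("mean", "sd", "median", "iqr"):
    print(name, original[name], shifted[name], scaled[name])
```

In the printed table, adding 10 moves the mean and median up by 10 and leaves the st. deviation and IQR alone, while multiplying by 3 triples all four measures.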
TRANSFORMATIONS TO DATA SET
What would happen to the statistical measures
if one very low or very high data value was
added to the set?
Mean: pulled toward the new value
Standard Deviation: increases
Median: changes little or not at all
IQR: changes little or not at all
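The effect of one extreme value can be checked the same way. This sketch uses hypothetical data, a hypothetical `summary` helper, and Python's default quartile rule.

```python
from statistics import mean, median, stdev, quantiles

def summary(values):
    """Mean, sample st. deviation, median, and IQR of a data set."""
    q1, _, q3 = quantiles(values, n=4)   # Python's default quartile rule
    return {"mean": mean(values), "sd": stdev(values),
            "median": median(values), "iqr": q3 - q1}

data = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical data
before = summary(data)
after = summary(data + [60])         # one very high value added

for name in ("mean", "sd", "median", "iqr"):
    print(name, round(before[name], 2), round(after[name], 2))
```

For this data set the mean more than doubles and the st. deviation grows several times over, while the median moves by only half a unit and the IQR changes far less than the st. deviation.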
MEASURES OF POSITION
Give a numerical approximation of where a
single data value stands compared to the
whole distribution
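Quartiles: divide the ordered data into four equal parts (Q1, Med, Q3)
Percentiles: the percent of data values at or below a given value
Z Scores: the number of st. deviations a value lies from the mean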
Z SCORES
standardized score
how a single value compares to entire data set
in terms of position in distribution
$z = \frac{x - \mu}{\sigma}$
How unusual are you?
Compute your z score for height?
Compute your z score for Math SAT?
Compute your z score for IQ?
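A z score is a one-line calculation. In this sketch the means and st. deviations for IQ and height are the ones quoted earlier in these notes; the individual values (an IQ of 130, a height of 66 in) are hypothetical.

```python
def z_score(x, mean, sd):
    """Number of st. deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

print(z_score(130, 100, 15))   # IQ of 130: 2.0
print(z_score(66, 69, 3))      # height of 66 in: -1.0
```

A positive z score means the value is above the mean and a negative one means it is below.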
NORMAL MODEL
shows how data is distributed symmetrically
along an interval according to empirical rule
Empirical Rule:
about 68% of data within 1 st. deviation of μ
about 95% of data within 2 st. deviations of μ
about 99.7% of data within 3 st. deviations of μ
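The empirical rule percentages can be checked against the normal model itself. A minimal sketch using `NormalDist` from Python's standard library; the variable names are illustrative.

```python
from statistics import NormalDist

standard = NormalDist()   # standard normal model: mean 0, st. deviation 1

shares = {}
for k in (1, 2, 3):
    # Area between -k and +k st. deviations from the mean.
    shares[k] = standard.cdf(k) - standard.cdf(-k)
    print(f"within {k} st. deviation(s) of the mean: {shares[k]:.1%}")
```

The three results come out to roughly 68%, 95%, and 99.7%.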
ANOTHER OUTLIER TEST
Using Empirical Rule:
Data values with |z| > 2 (more than 2 st. deviations
from the mean) are mild outliers
Data values with |z| > 3 (more than 3 st. deviations
from the mean) are extreme outliers
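This outlier test can be sketched as a small function. It assumes the test applies in both directions (values far below the mean count as well as values far above it); the function name and the IQ scores are hypothetical, with mean 100 and st. deviation 15 taken from the earlier example.

```python
def classify_by_z(x, mean, sd):
    """Label a value using the 2 and 3 st. deviation cut-offs."""
    z = (x - mean) / sd
    if abs(z) > 3:
        return "extreme outlier"
    if abs(z) > 2:
        return "mild outlier"
    return "not an outlier"

print(classify_by_z(150, 100, 15))   # extreme outlier (z is about 3.33)
print(classify_by_z(65, 100, 15))    # mild outlier (z is about -2.33)
print(classify_by_z(110, 100, 15))   # not an outlier (z is about 0.67)
```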
NORMAL CURVE
a theoretical ideal about how traits/characteristics
are distributed
Many human traits are approximately normally
distributed such as height, body temp, IQ, pulse
Avoid using “normal” when describing data—say
“approximately normal or symmetric” unless
clearly mound-shaped, bell-shaped
NORMAL CURVE
Normal curve—symmetric, mound-shaped
Area under curve = 1 (100% of the data)
A z score can be used to establish what % of
the curve is less or more than the z score,
and establish probability of a data value being
in that position.
FINDING PERCENTILE/PROBABILITY
USING NORMAL CURVE
1. Calculate z score for data value
2. Use calculator: normalcdf under DISTR key
(on the calculator, enter 1E99 for ∞ and -1E99 for -∞)
Looking for area > z score: normalcdf(z, ∞)
Looking for area < z score: normalcdf(-∞, z)
Looking for area between z scores:
normalcdf(z1, z2)
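For checking calculator answers, the same three areas can be found with `NormalDist` from Python's standard library. This is a sketch of an equivalent calculation, not the calculator's own method; the function names are hypothetical.

```python
from statistics import NormalDist

standard = NormalDist()   # standard normal model: mean 0, st. deviation 1

def area_above(z):
    """Area to the right of z, like normalcdf(z, infinity)."""
    return 1 - standard.cdf(z)

def area_below(z):
    """Area to the left of z, like normalcdf(-infinity, z)."""
    return standard.cdf(z)

def area_between(z1, z2):
    """Area between z1 and z2, like normalcdf(z1, z2)."""
    return standard.cdf(z2) - standard.cdf(z1)

print(round(area_below(1.0), 4))           # 0.8413
print(round(area_above(1.0), 4))           # 0.1587
print(round(area_between(-1.0, 1.0), 4))   # 0.6827
```

A z score of 1 is therefore at about the 84th percentile.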
FINDING CUT OFF SCORES
If you are given a percentile or probability, and
need to determine the “cut off score”
1. Sketch curve to determine where z score is located.
2. Determine if you want area above or below this
percentile
3. Use INVNORM on calculator
invNorm(area to the left of the cut off score) = z score
4. Use z score formula to solve for x.
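The cut off steps can be sketched with `NormalDist.inv_cdf`, which plays the role of the calculator's inverse normal function here. The function name is hypothetical, and the example assumes IQ with mean 100 and st. deviation 15, as in the earlier example.

```python
from statistics import NormalDist

def cutoff_score(area_below, mean, sd):
    """Data value with the given fraction of the distribution below it."""
    z = NormalDist().inv_cdf(area_below)   # z score for that percentile
    return mean + z * sd                   # solve z = (x - mean) / sd for x

# IQ needed to be at the 90th percentile (top 10%).
print(round(cutoff_score(0.90, 100, 15), 1))   # 119.2
```

If a problem gives the area above the cut off instead, subtract it from 1 first to get the area below.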
Does the data fit a normal model?
1. Check mean and median
2. Make a NORMAL PROBABILITY PLOT: a roughly straight
line suggests the data are approximately normal
3. Make a BOXPLOT on calculator.
AVOID using histograms on calculator to check.
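The idea behind a normal probability plot can be sketched without a calculator: pair each sorted data value with the z score expected at its position, then see how close the points are to a straight line. Everything here is an assumption for illustration, including the function names, the (i - 0.5)/n plotting positions, and the two data sets; a calculator may place the points slightly differently.

```python
from math import sqrt
from statistics import NormalDist

def normal_plot_points(values):
    """Pair each sorted value with the z score expected at its position."""
    ordered = sorted(values)
    n = len(ordered)
    standard = NormalDist()
    # (i - 0.5) / n is one common choice of plotting position (assumption).
    return [(standard.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(ordered, start=1)]

def straightness(points):
    """Correlation between expected z scores and data values."""
    n = len(points)
    mean_z = sum(z for z, _ in points) / n
    mean_x = sum(x for _, x in points) / n
    szx = sum((z - mean_z) * (x - mean_x) for z, x in points)
    szz = sum((z - mean_z) ** 2 for z, _ in points)
    sxx = sum((x - mean_x) ** 2 for _, x in points)
    return szx / sqrt(szz * sxx)

heights = [63, 65, 66, 67, 68, 69, 69, 70, 71, 72, 73, 75]   # hypothetical
skewed = [2, 3, 3, 4, 5, 6, 7, 8, 9, 43]                     # hypothetical

r_heights = straightness(normal_plot_points(heights))
r_skewed = straightness(normal_plot_points(skewed))
print(round(r_heights, 2), round(r_skewed, 2))
```

A correlation very close to 1 means the plot is nearly a straight line. The mound-shaped heights come much closer to 1 than the data set with the extreme value.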