Topic: Chapter 7 Scatterplots, Associations, and Correlation
Name: _____________________________________________________
Objectives: We will be able to use scatterplots to display the relationships
between two sets of quantitative data.
What do we use to describe a distribution?
When describing distributions, we need to discuss shape, center, and spread. How we
measure the center and spread of a distribution depends on its __________________. The
center of a distribution is a “typical” value. If the shape is unimodal and symmetric, a
“typical” value is in the _______________________. If the shape is skewed, however, a “typical”
value is not necessarily in the middle.
Skewed Distributions:
How can we describe skewed distributions?
For _______________________ distributions, use the ______________________ to determine the
______________________ of the distribution and the ____________________________________________ to
describe the ______________________ of the distribution.
The median:
 is the _______________ data value (when the data have been _______________) that
divides the histogram into two equal _______________
 has the same _______________ as the data
 is _______________ to outliers (extreme data values)
The range:
 is the difference between the _______________ value and the _______________ value
 is a _______________, NOT an _______________
 is _______________ to outliers
The interquartile range:
 contains the _______________ of the data
 is the difference between the _______________ and _______________ quartiles
 is a _______________, NOT an _______________
 is _______________ to outliers
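A short sketch can make the outlier behavior of the median, range, and interquartile range concrete. The data values are hypothetical (they do not come from this worksheet), and the quartiles are taken as the medians of the lower and upper halves of the sorted data, which is an assumed convention; other quartile conventions can give slightly different values.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumed convention: when n is odd, the middle value is left out
    of both halves. Needs at least two data values.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[-half:]
    return median(lower), median(upper)


def describe(values):
    """Summarize a data set with its median, range, and IQR."""
    q1, q3 = quartiles(values)
    return {
        "median": median(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
    }


typical = [2, 4, 5, 7, 9, 11, 12]        # hypothetical data
with_outlier = [2, 4, 5, 7, 9, 11, 120]  # same data, largest value made extreme

print(describe(typical))
print(describe(with_outlier))
```

Comparing the two printed summaries shows which of the three measures moves when a single extreme value is introduced.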
Symmetrical Distributions:
How can we describe symmetrical distributions?
For _______________ distributions, use the _________________ to determine the _______________ of
the distribution and the _______________ to describe the _______________ of the distribution.
The mean:
 is the arithmetic _______________ of the data values
 is the _______________ of a histogram
 has the same _______________ as the data
 is _______________ to outliers
 is given by the formula $\bar{x} = \frac{\sum x}{n}$
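As a rough illustration, the mean is the sum of the data values divided by how many values there are. The sketch below computes it by hand for a hypothetical data set (not taken from this worksheet) and for the same data with one extreme value, printing the median alongside for comparison. The helper name `mean_by_hand` is made up for this example.

```python
from statistics import mean, median


def mean_by_hand(values):
    """Add up the data values and divide by the number of values."""
    return sum(values) / len(values)


typical = [2, 4, 5, 7, 9, 11, 12]        # hypothetical data
with_outlier = [2, 4, 5, 7, 9, 11, 120]  # same data, one extreme value

for label, data in [("typical", typical), ("with outlier", with_outlier)]:
    # The library mean is printed only as a cross-check on the hand calculation.
    print(label,
          "| mean by hand:", round(mean_by_hand(data), 2),
          "| statistics.mean:", round(mean(data), 2),
          "| median:", median(data))
```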
The standard deviation:
 measures the “typical” distance each data value is from the _______________
 Because some values are above the mean and some are below the mean, finding
the sum is not useful (positives cancel out negatives); therefore we first
_______________ the deviations, then calculate an _______________ _______________. This is
called the _______________. This statistic does not have the same units as the data,
since we squared the deviations. Therefore, the final step is to take the
_______________ of the variance, which gives us the _______________.
 is given by the formula $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
 is _______________ to outliers, since its calculation involves the _______________
How can we calculate standard deviation by hand?
It takes into account how far EACH data point is from the mean of the data set.
A high standard deviation shows that many of the data values are scattered far from the
mean.
A low standard deviation shows that many of the data values are close to the mean.
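To see the difference between a low and a high standard deviation, the sketch below compares two hypothetical data sets (not from this worksheet) that share the same mean but differ in how far the values sit from it. It uses `statistics.stdev` from the Python standard library, which computes the sample standard deviation with the n - 1 divisor.

```python
from statistics import mean, stdev

close_to_mean = [9, 10, 10, 11]  # hypothetical: values bunched near 10
far_from_mean = [2, 6, 14, 18]   # hypothetical: same mean, values spread out

for data in (close_to_mean, far_from_mean):
    # stdev() divides the summed squared deviations by n - 1 before the square root
    print(data, "| mean:", mean(data), "| s:", round(stdev(data), 2))
```

Both data sets are centered at 10, so any difference in the printed values of s comes from spread alone.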
Example: To find the Standard Deviation using the data set below:
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
1) Find the mean of the data set
Original Values    Deviations    Squared Deviations
6
9
11
14
___________
Now add up the squared deviations.
Divide by n - 1: $\frac{\sum (x - \bar{x})^2}{n - 1}$
Finally take the square root: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
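The by-hand steps (find the mean, find each deviation, square the deviations, add them up, divide by n - 1, take the square root) can be sketched in a few lines. Taking 6, 9, 11, 14 as the data set is an assumption made for this illustration, since the worksheet table is incomplete. The variable names are also made up for the example.

```python
from math import sqrt
from statistics import stdev

values = [6, 9, 11, 14]  # assumed data set for illustration

n = len(values)
x_bar = sum(values) / n                    # 1) find the mean
deviations = [x - x_bar for x in values]   # 2) distance of each value from the mean
squared = [d ** 2 for d in deviations]     # 3) square each deviation
total = sum(squared)                       # 4) add up the squared deviations
variance = total / (n - 1)                 # 5) divide by n - 1
s = sqrt(variance)                         # 6) take the square root

print("mean:", x_bar)
print("deviations:", deviations, "sum:", sum(deviations))  # the raw deviations cancel out
print("squared deviations:", squared, "sum:", total)
print("variance:", round(variance, 2))
print("standard deviation:", round(s, 2))
print("library check:", round(stdev(values), 2))
```

Printing the sum of the raw deviations shows why they are squared first: the positive and negative deviations cancel.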
Why do we call a box plot a five-number summary?
1) Create a box plot for the following information:
Max: 47 years
Q3: 22
Median: 19
Q1: 17
Min: 13
2) 80, 82, 84, 86, 90, 81, 91, 82, 83, 77
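Before drawing the box plot for the second data set, it can help to compute its five numbers. The sketch below takes the quartiles as the medians of the lower and upper halves of the sorted data, which is an assumed convention; software that uses a different quartile rule may report slightly different Q1 and Q3. The function name `five_number_summary` is made up for this example.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using medians of the two halves.

    Assumed convention: when n is odd, the middle value is left out
    of both halves. Needs at least two data values.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[-half:])
    return min(ordered), q1, median(ordered), q3, max(ordered)


scores = [80, 82, 84, 86, 90, 81, 91, 82, 83, 77]  # data from exercise 2

low, q1, med, q3, high = five_number_summary(scores)
print("Min:", low)
print("Q1:", q1)
print("Median:", med)
print("Q3:", q3)
print("Max:", high)
print("IQR:", q3 - q1)
```

The box of the plot runs from Q1 to Q3 with a line at the median, and the whiskers reach toward the minimum and maximum.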
Thinking about Variation:
 ________________________ is an important fundamental concept in Statistics.
 It helps us to be precise about what we don’t know. If the data values are
scattered far from the center, the IQR and the standard deviation will be large. If
the data values are __________ to the center, then these measures of _________________
will be small.
Shape, center, and spread:
 You should always report the shape of a distribution and include a center and
spread.
o Skewed: report the __________________
o You might want to include the _________________________________, but you
should point out why the ________________________________ differ. The fact that
the
o Symmetric: report the mean and standard deviation and possibly the
median and IQR. The ___________ is usually a bit larger than
the _______________________.
o If there are any clear outliers present and you are reporting the mean and
standard deviation, report them with:
Summary: