1.2 Describing Distributions with Numbers
Shape, center, and spread provide a good description of the overall pattern of any distribution for
a quantitative variable.
Mean—the most common measure of center. It is the average. To find the mean (x̄, pronounced
x-bar) of a set of observations, add the values together and divide by the number of
observations (n).
Σ—the capital Greek letter sigma—means to add everything together, so x̄ = (Σx) / n.
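As a quick illustration of the rule "add the values together and divide by n", here is a minimal Python sketch. The function name `mean` and the data set `scores` are made up for this example; they do not come from the book.

```python
def mean(observations):
    """Add the values together and divide by the number of observations n."""
    n = len(observations)
    if n == 0:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(observations) / n


# Hypothetical data set, chosen only for illustration.
scores = [2, 4, 4, 5, 10]
x_bar = mean(scores)
print(x_bar)  # 5.0, because 25 / 5 = 5
```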
The mean is sensitive to the influence of a few extreme observations. A skewed distribution
pulls the mean toward its long tail. This means the mean is not a resistant measure of center. Its
value may change substantially when a single extreme observation is added or removed.
Median—the midpoint of the distribution. Half of the observations are smaller than the median
and half are larger than the median.
To find the median of a distribution
1. Arrange all observations in order of size, from smallest to largest
2. If the number of observations n is odd, there is one center observation. The median M
is the center observation in the ordered list.
3. If the number of observations n is even, the median M is the mean (average) of the two
center observations in the ordered list.
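The three steps above can be sketched in Python as follows. This is only one way to write them; the function name `median` and the sample lists are assumptions made for the example.

```python
def median(observations):
    """Find the median M by following the three steps in the notes."""
    # Step 1: arrange all observations from smallest to largest.
    ordered = sorted(observations)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    # Step 2: if n is odd, M is the single center observation.
    if n % 2 == 1:
        return ordered[mid]
    # Step 3: if n is even, M is the mean of the two center observations.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 5]))     # 5   (ordered list: 1, 5, 7)
print(median([7, 1, 5, 3]))  # 4.0 (ordered list: 1, 3, 5, 7; mean of 3 and 5)
```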
The median is hardly affected by outliers or skewness. It is a resistant measure of center.



The mean and median values of a symmetric distribution are close together.
If the distribution is exactly symmetric, the mean and median are the same value.
In a skewed distribution, the mean is farther out in the long tail than the median.
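To see these statements with numbers, the sketch below compares the mean and median of two small made-up data sets using Python's standard `statistics` module. The lists `symmetric` and `right_skewed` are invented for illustration only.

```python
from statistics import mean, median

# Hypothetical data: the second list replaces the largest value with an
# extreme observation, which gives the distribution a long right tail.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 3, 4, 50]

# Exactly symmetric: mean and median agree (both equal 3).
print(mean(symmetric), median(symmetric))

# Skewed right: the mean (12) is pulled toward the long tail,
# while the median stays at 3.
print(mean(right_skewed), median(right_skewed))
```

Changing a single observation moved the mean from 3 to 12 but left the median where it was, which is what "resistant" means in practice.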
Range—the difference between the largest and smallest observations. The range shows the full
spread of the data. Remember that either of these observations may be an outlier, so the range
is not resistant.
Quartiles—divide the ordered observations into four roughly equal parts.
Second Quartile -- divides the distribution in half. This is the median.
First Quartile (Q1)—the “median” of the smaller half of the data.
Third Quartile (Q3) -- the “median” of the larger half of the data.
To calculate Q1 and Q3:
1. Find the median value in the ordered list of observations.
2. Q1 is the median of the observations whose position in the ordered list is to the left of
the location of the overall median.
3. Q3 is the median of the observations whose position in the ordered list is to the right of
the location of the overall median.
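The procedure for Q1 and Q3 can be sketched in Python like this. The helper names `median_of_sorted` and `quartiles` are assumptions for the example. The sketch follows the rule in these notes, where the overall median is left out of both halves when n is odd; calculators and statistical software sometimes use slightly different quartile rules, so their answers may differ a little.

```python
def median_of_sorted(ordered):
    """Median of a list that is already in order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(observations):
    """Return (Q1, M, Q3) using the 'median of each half' rule."""
    ordered = sorted(observations)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = ordered[:half]            # observations left of the median's location
    upper = ordered[half + (n % 2):]  # observations right of the median's location
    return median_of_sorted(lower), median_of_sorted(ordered), median_of_sorted(upper)


print(quartiles([1, 3, 5, 7, 9, 11, 13]))    # (3, 7, 11)
print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))   # (2.5, 4.5, 6.5)
```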
Interquartile Range (IQR)—the distance between the first and third quartiles.
IQR = Q3 - Q1
About 50% of the observations lie between Q1 and Q3, the span that the IQR measures.
Outlier—an observation that falls more than 1.5 x IQR above the third quartile or below the
first quartile.
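Here is a small Python sketch of the 1.5 x IQR rule. The function name `find_outliers`, the variable names `lower_cutoff` and `upper_cutoff`, and the data set are all made up for the example; Q1 and Q3 are passed in after being found by hand with the quartile method described earlier.

```python
def find_outliers(observations, q1, q3):
    """Return the observations flagged by the 1.5 x IQR rule."""
    iqr = q3 - q1
    lower_cutoff = q1 - 1.5 * iqr  # anything below this is an outlier
    upper_cutoff = q3 + 1.5 * iqr  # anything above this is an outlier
    return [x for x in observations if x < lower_cutoff or x > upper_cutoff]


# Hypothetical data. By hand: M = 7, Q1 = 3, Q3 = 11, so IQR = 8.
# Cutoffs: 3 - 12 = -9 and 11 + 12 = 23.
data = [1, 3, 5, 7, 9, 11, 40]
print(find_outliers(data, q1=3, q3=11))  # [40]
```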
The Five-Number Summary—For a data set, it consists of the smallest observation, the first
quartile, the median, the third quartile, and the largest observation. It is written in order from
smallest to largest.
Minimum   Q1   M   Q3   Maximum
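A minimal Python sketch of the five-number summary is shown below. The function names are assumptions for the example, and the quartiles use the "median of each half" rule from these notes, so other software may give slightly different Q1 and Q3 values.

```python
def median_of_sorted(ordered):
    """Median of a list that is already in order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(observations):
    """Return (Minimum, Q1, M, Q3, Maximum), smallest to largest."""
    ordered = sorted(observations)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    q1 = median_of_sorted(ordered[:half])             # smaller half
    q3 = median_of_sorted(ordered[half + (n % 2):])   # larger half
    return ordered[0], q1, median_of_sorted(ordered), q3, ordered[-1]


# Hypothetical data set, chosen only for illustration.
print(five_number_summary([1, 3, 5, 7, 9, 11, 40]))  # (1, 3, 7, 11, 40)
```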
Boxplot -- a graph which shows the five-number summary of a distribution.
How to make a boxplot:
1. Make an axis, include a scale in even increments, and title the graph.
2. Make a box with its ends at the first and third quartiles. Locate the median,
which is the center of the distribution, and draw a line through the box at the median.
3. Draw lines from the box out to the smallest and largest observations.
Boxplots give an indication of the symmetry or skewness of a distribution. When the distribution
is symmetric, the first and third quartiles are equally distant from the median.
Boxplots show less detail than other graphs, so they are best for comparing more than one
distribution. Boxplots can be horizontal or vertical.
Modified Boxplot—a boxplot that plots outliers as individual points. Modified boxplots show more
detail. When our book says boxplot, it means a modified boxplot.
How to make a modified boxplot:
1. Make an axis, include a scale in even increments, and title the graph.
2. Make a box with its ends at the first and third quartiles. Locate the median,
which is the center of the distribution, and draw a line through the box at the median.
3. Plot individually any observations that fall more than 1.5 x IQR below the first
quartile or above the third quartile (outliers).
4. Draw lines from the box out to the smallest and largest observations that are not
outliers.
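Before drawing a modified boxplot by hand, it can help to work out the numbers it needs. The Python sketch below does that without drawing anything. The function name `modified_boxplot_parts`, the dictionary keys, and the data are assumptions made for the example; Q1, M, and Q3 are supplied by the caller.

```python
def modified_boxplot_parts(observations, q1, median, q3):
    """Work out the pieces of a modified boxplot (no drawing)."""
    iqr = q3 - q1
    lower_cutoff = q1 - 1.5 * iqr
    upper_cutoff = q3 + 1.5 * iqr
    # Outliers are plotted individually.
    outliers = sorted(x for x in observations
                      if x < lower_cutoff or x > upper_cutoff)
    # Lines extend to the smallest and largest observations
    # that are NOT outliers.
    inside = [x for x in observations if lower_cutoff <= x <= upper_cutoff]
    return {
        "box": (q1, q3),
        "median": median,
        "lines": (min(inside), max(inside)),
        "outliers": outliers,
    }


# Hypothetical data with Q1 = 3, M = 7, Q3 = 11 found by hand.
parts = modified_boxplot_parts([1, 3, 5, 7, 9, 11, 40], q1=3, median=7, q3=11)
print(parts["lines"])     # (1, 11)
print(parts["outliers"])  # [40]
```

Notice that the upper line stops at 11, not at 40, because 40 is plotted as a separate point.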
Pages 46-47 in your book explain how to use the graphing calculator to make a boxplot.
Standard Deviation—measures spread by looking at how far the observations are from the mean.
Variance (s²)—the average of the squared deviations of the observations from their
mean. It is the sum of the squared deviations divided by one less than the number of
observations (n - 1).
Standard deviation (s)—the square root of the variance (s²).
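The two definitions above can be sketched in Python as follows. The function names and the data set are made up for the example; the key detail taken from the notes is dividing by n - 1 rather than n.

```python
import math


def variance(observations):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(observations)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(observations) / n
    squared_deviations = [(x - x_bar) ** 2 for x in observations]
    return sum(squared_deviations) / (n - 1)


def standard_deviation(observations):
    """Square root of the variance."""
    return math.sqrt(variance(observations))


# Hypothetical data: mean is 5, deviations are -3, -1, -1, 0, 5,
# squared deviations are 9, 1, 1, 0, 25, which add up to 36.
data = [2, 4, 4, 5, 10]
print(variance(data))            # 9.0, because 36 / (5 - 1) = 9
print(standard_deviation(data))  # 3.0
```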