Chapter 5B: Exploring Data.
Another way to describe the center is by mode. The mode of a distribution is the most frequently occurring value.
A set of data that has a tie in terms of most frequently occurring value is said to be multi-modal.
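As a brief illustration, the mode can be found by counting how often each value occurs. The sketch below uses `statistics.multimode` from Python's standard library, which returns every value tied for most frequent. The shoe-size and pet lists are made-up data, not taken from the worksheet.

```python
from statistics import multimode

# Hypothetical data for illustration only.
shoe_sizes = [8, 9, 9, 10, 9, 11, 8]
pets = ["cat", "dog", "dog", "cat", "fish"]

# multimode returns a list of all values tied for most frequent,
# in the order they first appear in the data.
shoe_modes = multimode(shoe_sizes)  # a single mode
pet_modes = multimode(pets)         # a tie, so this data is multi-modal

print(shoe_modes)
print(pet_modes)
```

Note that the second list is qualitative: the mode still works there, even though no arithmetic can be done on the categories.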
Mean vs. Median vs. Mode.
These three measures of the center are an effective way to describe data. However, depending on the type and shape
of the data, one measure is usually better than the others.
When using qualitative data, the mode is the best way of describing the center of the values, because the mode is
simply the most popular category. The mean and median require numerical values to compute, so they are
meaningless for qualitative information.
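Also, as stated earlier, outliers greatly affect the mean, while the median is barely changed by them. A skewed
histogram has a long tail of extreme values that pulls the mean toward it. Thus, if your histogram is symmetric, the
mean is the most useful of the three. If your histogram is skewed, your most useful measure is the median.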
To recap:
If your variable is qualitative, mode is the most useful.
If your variable is quantitative and its histogram is skewed left or skewed right, median is most useful.
If your variable is quantitative and its histogram is symmetric, mean is most useful.
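To see why the recap favors the median for skewed data, here is a minimal sketch comparing the two measures before and after one extreme value is added. The numbers are hypothetical and chosen only to make the effect visible.

```python
from statistics import mean, median

# Hypothetical symmetric data: mean and median agree.
symmetric = [10, 12, 13, 14, 16]

# The same data with one extreme value added, which skews it right.
skewed = symmetric + [100]

print(mean(symmetric), median(symmetric))
print(mean(skewed), median(skewed))
```

In this sketch the single outlier roughly doubles the mean, while the median moves by only half a unit.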
Describing the Spread Using The Median.
The range of a set of data points is the nonnegative difference between the largest and the smallest observations.
Range = Max - Min.
The quartile system is a way to improve description of spread. Using the median to divide the data into two halves,
the median of the lower half is called the First Quartile (Q1). The median of the upper half is called the Third
Quartile (Q3).
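The range and the quartile procedure described above can be sketched in a few lines of Python. This is one possible reading of the method, with one assumption that the text does not state: when the number of values is odd, the median itself is left out of both halves (to our knowledge, this is also what the TI-83/84 does). The function name `quartiles` and the sample data are illustrative only.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves method.

    Assumption: with an odd number of values, the median is excluded
    from both the lower and the upper half.
    """
    if len(data) < 2:
        raise ValueError("need at least two values")
    values = sorted(data)
    half = len(values) // 2
    lower = values[:half]    # values below the median position
    upper = values[-half:]   # values above the median position
    return median(lower), median(values), median(upper)

# Hypothetical data for illustration only.
data = [16, 2, 10, 4, 14, 6, 12, 8]
data_range = max(data) - min(data)  # Range = Max - Min
q1, med, q3 = quartiles(data)
print(data_range, q1, med, q3)
```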
Your TI-83 or TI-84 calculator is of great use for lists that are extensive. Again, you may use any calculator during
group work and quizzes, but you will only be allowed to use a TI-83 calculator that I will provide you during the exam.
To input a list of data in your calculator, press STAT → 1 → ENTER. Then type in your data one value at a time. When
you're done, press 2nd → MODE to quit.
To have the calculator compute statistics from your data, press STAT → CALC → ENTER → ENTER. You will be
given a list of values to interpret: x̄ is your mean, n is the number of individuals in your data, and if you scroll down,
you will see the five number summary.
Example 1: Determine Q1 and Q3 of the following values: 15, 29, 33, 18, 22, 19, 27, 25, 22, 21, 22.
Example 2: Using the data from Example 1 (State Counties) of the previous worksheet, determine Q1 and Q3. The
data values are: 67, 75, 67, 159, 64, 82, 100, 46, 95.
Five Number Summary and Boxplots.
The Five Number Summary of a set of data is a list of the following five observations: the minimum, the first
quartile, the median, the third quartile, and the maximum. These five numbers break the data points into four
groups with roughly equal numbers of observations. A boxplot can be created to display the Five Number Summary.
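As a rough sketch, the Five Number Summary can be computed by sorting the data and taking medians of the two halves. The code below applies this to the State Counties values from Example 2. The function name `five_number_summary` is illustrative, and the code assumes the median is left out of both halves when the number of values is odd.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for a list of numbers.

    Assumption: with an odd number of values, the median is excluded
    from both halves before Q1 and Q3 are computed.
    """
    values = sorted(data)
    half = len(values) // 2
    q1 = median(values[:half])
    q3 = median(values[-half:])
    return values[0], q1, median(values), q3, values[-1]

# State Counties data from Example 2.
state_counties = [67, 75, 67, 159, 64, 82, 100, 46, 95]
summary = five_number_summary(state_counties)
print(summary)
```

A boxplot of this data would draw the box from Q1 to Q3, a line inside the box at the median, and whiskers out to the minimum and maximum.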
Describing the Spread Using The Mean.
The standard deviation of a histogram measures how far your data values deviate from the mean (which is sometimes
denoted as µ or x̄).
The standard deviation, along with the mean, describes the variability of your observed values, much as the quartiles
do along with the median.
The standard deviation (usually denoted as σ) can be computed from a formula (or your calculator). You will not
have to compute this value, but you will need to be able to use and interpret it, as illustrated below.
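For the curious, here is a minimal sketch of what that formula does, assuming the population version of σ (divide by n). A sample version divides by n - 1 instead and gives a slightly larger value. The scores are hypothetical, and `standard_deviation` is an illustrative name.

```python
from math import sqrt
from statistics import mean, pstdev

def standard_deviation(data):
    """Population standard deviation: the square root of the
    average squared deviation from the mean."""
    mu = mean(data)
    squared_deviations = [(x - mu) ** 2 for x in data]
    return sqrt(sum(squared_deviations) / len(data))

# Hypothetical data for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
sigma = standard_deviation(scores)

# pstdev is the standard library's population standard deviation;
# it should agree with the hand-written version.
print(mean(scores), sigma, pstdev(scores))
```

Here the mean is 5 and the standard deviation is 2, so a value such as 9 lies two standard deviations above the mean.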
Bell Shaped Curves and Normal Distributions.
Many histograms are symmetric, single-peaked, and have a distinct bell shape. Statisticians usually superimpose a
smooth curve over these histograms. The curve is an idealized description of the distribution. Such a curve is called
a normal curve, and a distribution whose shape is described by that curve is called a normal distribution.
Characteristics of a normal curve:
• The x-value corresponding to the highest point in the middle is the mean. (It is also the median and mode.)
• The area between the curve and the x-axis sums up to 1.00, or 100%.
• About 68% of the total area under the curve falls within one standard deviation of the mean.
• About 95% of the total area under the curve falls within two standard deviations of the mean.
• About 99.7% of the total area under the curve falls within three standard deviations of the mean.
Collectively, the last three items listed above are called the Empirical Formula, or the 68-95-99.7 rule.
An outlier is a data point in your histogram that lies more than three standard deviations from your mean. By this
definition, data in a normal distribution will have about 0.3% of its entries as outliers.
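The three percentages and the outlier share can be checked numerically. The sketch below uses `statistics.NormalDist` from Python's standard library with mean 0 and standard deviation 1. The helper name `area_within` is illustrative.

```python
from statistics import NormalDist

# A normal curve with mean 0 and standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Area under the curve within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

areas = {k: area_within(k) for k in (1, 2, 3)}

# Whatever is not within three standard deviations is an outlier.
outlier_share = 1 - areas[3]

for k, area in areas.items():
    print(k, round(100 * area, 1))
print(round(100 * outlier_share, 1))
```

The exact areas are slightly different from 68%, 95%, and 99.7%, which is why those figures are best read as approximations.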
Example 3: Analyzing the Normal Curve and the Empirical Formula.
Compute the areas of each of the shaded regions. Based on this result, in a normal distribution, what percent of our
data should we expect to be left and right outliers?
Example 4: Grading on a Curve. Some teachers grade on a bell curve. One way to do this is to assign a C to
students whose scores fall within one standard deviation of the mean. Students who scored between one and two
standard deviations above the mean would receive a B, and students who scored more than two standard deviations
above the mean would receive an A. The teacher would then symmetrically assign Ds and Fs.
Suppose that there were 200 students in the class and that the scores followed the Empirical Formula. Determine the
class grade breakdown.
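One way to organize the arithmetic for Example 4 is sketched below. It assumes the rounded 68% and 95% figures of the Empirical Formula and splits the leftover area evenly between the two tails, since the curve is symmetric. The function name `grade_breakdown` is illustrative.

```python
def grade_breakdown(class_size):
    """Expected number of students per grade under the 68-95-99.7 rule."""
    within_one, within_two = 68.0, 95.0  # percent of area, rounded
    percents = {
        "A": (100.0 - within_two) / 2,         # more than 2 SD above
        "B": (within_two - within_one) / 2,    # between 1 and 2 SD above
        "C": within_one,                       # within 1 SD of the mean
        "D": (within_two - within_one) / 2,    # between 1 and 2 SD below
        "F": (100.0 - within_two) / 2,         # more than 2 SD below
    }
    return {grade: class_size * pct / 100 for grade, pct in percents.items()}

breakdown = grade_breakdown(200)
for grade, count in breakdown.items():
    print(grade, count)
```

The five counts should add up to the class size, which is a quick check on the work.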
Example 5: Human Heights. The height (in inches) of men in the United States follows a normal, bell shaped
curve. The mean is 69.3 inches and the standard deviation is 2.8 inches. Answer the following:
1. What is the range of heights that would fall within one standard deviation of the distribution?
2. What is the range of heights that would fall within two standard deviations of the distribution?
3. What is the range of heights that would fall within three standard deviations of the distribution?
4. What is the largest whole number value that would be considered to be a left outlier?
5. What is the smallest whole number value that would be considered to be a right outlier?
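The questions in Example 5 all come down to adding and subtracting multiples of the standard deviation from the mean. A minimal sketch follows, assuming an outlier is a height strictly beyond three standard deviations. It uses `Decimal` so that values like 69.3 and 2.8 are handled exactly. The names `interval`, `MEAN`, and `SD` are illustrative.

```python
from decimal import Decimal
from math import ceil, floor

MEAN = Decimal("69.3")  # mean height in inches, from Example 5
SD = Decimal("2.8")     # standard deviation in inches, from Example 5

def interval(k):
    """Heights within k standard deviations of the mean."""
    return MEAN - k * SD, MEAN + k * SD

intervals = {k: interval(k) for k in (1, 2, 3)}

# Outliers lie strictly beyond three standard deviations.
low3, high3 = intervals[3]
largest_left_outlier = ceil(low3) - 1     # largest whole number below low3
smallest_right_outlier = floor(high3) + 1  # smallest whole number above high3

for k, (low, high) in intervals.items():
    print(k, low, high)
print(largest_left_outlier, smallest_right_outlier)
```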