Download σ μ - Palm Beach State College

Chapter 3 Statistics for Describing, Exploring and Comparing Data
Section 3.2 Measures of Center
A measure of center is a value at the center or middle of a data set.
-mean, median, mode and midrange
Notation:
$\sum$ denotes the sum of a set of values.
$x$ is the variable usually used to represent the individual data values.
$n$ represents the number of data values in a sample.
$N$ represents the number of data values in a population.
Note: When rounding the value of a measure of center, carry one more decimal place than is present in the original data set.
-Round only the final answer, not the calculations in the middle.
Arithmetic Mean
Definition: the measure of center obtained by adding the values and dividing the total by the number of values.
Formula:
$\bar{x} = \frac{\sum x}{n}$ (mean of sample values)
$\mu = \frac{\sum x}{N}$ (mean of all values in a population)
Comment:
-Advantages: Is relatively reliable; takes every data value into account.
-Disadvantage: Is sensitive to every data value; one extreme value can affect it dramatically, so it is not a resistant measure of center.
Example: Find the mean of 12, 14, 10, 8, 16, 8, 16
Find the mean of 6, 9, 15, 12, 5, 4, 7, 3
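The mean formula can be checked with a few lines of Python. This is only a minimal sketch; the function name `sample_mean` and the variable names are illustrative assumptions, not part of the handout.

```python
# The two data sets from the mean examples.
data_a = [12, 14, 10, 8, 16, 8, 16]
data_b = [6, 9, 15, 12, 5, 4, 7, 3]

def sample_mean(values):
    """Add the values and divide the total by the number of values."""
    return sum(values) / len(values)

mean_a = sample_mean(data_a)
mean_b = sample_mean(data_b)

# Round only the final answer, carrying one more decimal place than the data.
print(round(mean_a, 1))
print(round(mean_b, 1))
```

Rounding happens only in the final `print` calls, which mirrors the rounding note at the start of the section.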
Median
Definition: the middle value when the original data values are arranged in order of increasing (or decreasing) magnitude.
Formula: First sort the values (arrange them in order), then follow one of these rules:
-If the number of data values is odd, the median is the number located in the exact middle of the list.
-If the number of data values is even, the median is found by computing the mean of the two middle numbers.
Comment: The median is not affected by an extreme value; it is a resistant measure of center.
Example: Find the median of 12, 14, 10, 8, 16, 8, 16
Find the median of 6, 9, 15, 12, 5, 4, 7, 3
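The sort-then-pick-the-middle procedure for the median can be sketched in Python as follows. The helper name `median_value` is a hypothetical choice for illustration.

```python
def median_value(values):
    """Sort the values, then take the middle one (odd count)
    or the mean of the two middle ones (even count)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

median_a = median_value([12, 14, 10, 8, 16, 8, 16])   # odd number of values
median_b = median_value([6, 9, 15, 12, 5, 4, 7, 3])   # even number of values
print(median_a, median_b)
```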
Mode
Definition: the value that occurs with the greatest frequency.
-Bimodal: two data values occur with the same greatest frequency.
-Multimodal: more than two data values occur with the same greatest frequency.
-No mode: no data value is repeated.
Comment: Mode is the only measure of central tendency that can be used with nominal data.
Example: Find the mode of 12, 14, 10, 8, 16, 8, 16
Find the mode of 6, 9, 15, 12, 5, 4, 7, 3
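One possible way to find the mode, including the bimodal and no-mode cases, is to count how often each value occurs. The sketch below uses `collections.Counter` from the Python standard library; the function name `modes` and the choice to return an empty list for "no mode" are assumptions made for illustration.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the greatest frequency,
    or an empty list when no value is repeated."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # no data value is repeated, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)

modes_a = modes([12, 14, 10, 8, 16, 8, 16])
modes_b = modes([6, 9, 15, 12, 5, 4, 7, 3])
print(modes_a, modes_b)
```

A result with two entries corresponds to a bimodal data set, and more than two entries to a multimodal one.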
Midrange
Definition: the value midway between the maximum and minimum values in the original data set.
Formula: $\text{midrange} = \frac{\text{max value} + \text{min value}}{2}$
Comment: Sensitive to extremes because it uses only the maximum and minimum values, so rarely used.
Example: Find the midrange of 12, 14, 10, 8, 16, 8, 16
Find the midrange of 6, 9, 15, 12, 5, 4, 7, 3
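The midrange uses only two values from the data set, which a short Python sketch makes explicit. The function name `midrange` is an illustrative assumption.

```python
def midrange(values):
    """Average of the maximum and minimum values."""
    return (max(values) + min(values)) / 2

midrange_a = midrange([12, 14, 10, 8, 16, 8, 16])
midrange_b = midrange([6, 9, 15, 12, 5, 4, 7, 3])
print(midrange_a, midrange_b)
```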
Mean from a frequency distribution: Assume that all sample values in each class are equal to the class midpoint.
Use the class midpoints as the values of the variable x.
Example: Find the mean.
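The handout's formula for this case did not survive, so the usual textbook form is given here as an assumption rather than a quotation. It takes f as the class frequency and x as the class midpoint, the same symbols used for the standard deviation from a frequency distribution in Section 3.3.

```latex
\bar{x} = \frac{\sum (f \cdot x)}{\sum f}
```

Each midpoint is weighted by its class frequency, and the denominator is the total number of sample values.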
Section 3.3 Measures of Variation
Measures of variation describe how much data values vary.
-range, standard deviation and variance
Note: When rounding the value of a measure of variation, carry one more decimal place than is present in the original data set.
-Round only the final answer, not the calculations in the middle.
Range
Definition: The difference between the maximum and minimum values.
Formula: Range = (max value) - (min value)
Comment: Very sensitive to extreme values; therefore it is not as useful as other measures of variation.
Example: Find the range
a) 12, 14, 10, 8, 16
b) 6, 9, 15, 12, 5, 4
Standard deviation
Definition: A measure of variation of values about the mean.
Formula:
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$ (sample s.d.)
$s = \sqrt{\frac{n \sum (x^2) - (\sum x)^2}{n(n - 1)}}$ (sample s.d. shortcut)
$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$ (population s.d.)
Because we generally deal with sample data, we will usually use the formula for sample standard deviation.
Comment:
-Most commonly used measure of variation in statistics.
-The value of the standard deviation can increase dramatically with the inclusion of one or more outliers (data values far away from all others).
-Values close together have a small standard deviation, but values with much more variation have a larger standard deviation.
-The units of the standard deviation are the same as the units of the original data values.
Example: Find the standard deviation
a) 12, 14, 10, 8, 16
b) 6, 9, 15, 12, 5, 4
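The range and both sample standard deviation formulas (the definition and the shortcut) can be compared side by side in Python. This is a sketch under the assumption that the data are samples; the function names are hypothetical.

```python
from math import sqrt

data_a = [12, 14, 10, 8, 16]
data_b = [6, 9, 15, 12, 5, 4]

def value_range(values):
    """Range = (max value) - (min value)."""
    return max(values) - min(values)

def sample_sd(values):
    """Definition formula: sqrt( sum((x - mean)^2) / (n - 1) )."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))

def sample_sd_shortcut(values):
    """Shortcut formula: sqrt( (n * sum(x^2) - (sum x)^2) / (n * (n - 1)) )."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

for data in (data_a, data_b):
    # Both formulas should agree; round only the final answer.
    print(value_range(data), round(sample_sd(data), 1), round(sample_sd_shortcut(data), 1))
```

The shortcut avoids computing the mean first, which is convenient by hand, while the definition formula shows more directly that the standard deviation measures variation about the mean.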
Variance
Definition: A measure of variation equal to the square of the standard deviation.
Formula:
$s^2$ sample variance
$\sigma^2$ population variance
Comment: The sample variance is an unbiased estimator of the population variance, which means values of sample variance tend to target the value of population variance.
Example: Find the variance
a) 12, 14, 10, 8, 16
b) 6, 9, 15, 12, 5, 4
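Because the variance is the square of the standard deviation, it can be computed by leaving out the final square root. The sketch below assumes the two example data sets are samples, so it divides by n - 1; `sample_variance` is an illustrative name.

```python
def sample_variance(values):
    """s^2 = sum((x - mean)^2) / (n - 1), in squared units of the data."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

var_a = sample_variance([12, 14, 10, 8, 16])
var_b = sample_variance([6, 9, 15, 12, 5, 4])
print(round(var_a, 1), round(var_b, 1))
```

Taking the square root of either result gives back the corresponding sample standard deviation.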
Standard Deviation from a frequency distribution:
-x represents the class midpoint
-f represents the frequency
-n represents the total number of sample values (add up all the frequencies)
$s = \sqrt{\frac{n \left[\sum (f \cdot x^2)\right] - \left[\sum (f \cdot x)\right]^2}{n(n - 1)}}$
Example: Find the standard deviation.
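The frequency table for this example did not survive in the transcript, so the sketch below uses a small made-up table purely to show how the formula is applied. The midpoints and frequencies are hypothetical and are not the handout's data.

```python
from math import sqrt
from statistics import stdev

# HYPOTHETICAL frequency table (not from the handout): class midpoints and frequencies.
midpoints = [5, 15, 25]
frequencies = [2, 5, 3]

def sd_from_frequency_table(xs, fs):
    """s = sqrt( (n * sum(f * x^2) - (sum(f * x))^2) / (n * (n - 1)) )."""
    n = sum(fs)  # total number of sample values
    sum_fx = sum(f * x for x, f in zip(xs, fs))
    sum_fx2 = sum(f * x * x for x, f in zip(xs, fs))
    return sqrt((n * sum_fx2 - sum_fx ** 2) / (n * (n - 1)))

s_table = sd_from_frequency_table(midpoints, frequencies)

# Cross-check: treat every value in a class as equal to the class midpoint.
expanded = [x for x, f in zip(midpoints, frequencies) for _ in range(f)]
s_expanded = stdev(expanded)
print(round(s_table, 1), round(s_expanded, 1))
```

The cross-check works because the formula rests on the same assumption as the mean from a frequency distribution: every sample value in a class is treated as equal to the class midpoint.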
Section 3.4 Relative Standing and Boxplots
This section introduces measures of relative standing, which are numbers showing the location of data values relative
to the other values within a data set. They can be used to compare values from different data sets, or to compare
values within the same data set. The most important concept is the z score. We will also discuss percentiles and
quartiles, as well as a new statistical graph called the boxplot.
Z-score (or standardized value) is the number of standard deviations that a given value x is above or below the mean. (Round z-scores to 2 decimal places.)
Sample: $z = \frac{x - \bar{x}}{s}$
Population: $z = \frac{x - \mu}{\sigma}$
Example: For men, the heights yield a sample mean of 68.34 in. and a sample standard deviation of 3.02 in.; the weights yield a sample mean of 172.55 lb and a sample standard deviation of 26.33 lb. Which value is more extreme: the 76.2 in. man or the 237.1 lb man?
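Since heights and weights are in different units, the comparison has to be made on z-scores. A minimal Python sketch of that comparison follows; the function name `z_score` and the variable names are illustrative assumptions.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above or below the mean."""
    return (x - mean) / sd

z_height = z_score(76.2, 68.34, 3.02)     # height in inches
z_weight = z_score(237.1, 172.55, 26.33)  # weight in pounds

# Round z-scores to 2 decimal places for the final answer.
print(round(z_height, 2), round(z_weight, 2))

# The value with the larger absolute z-score is the more extreme one.
more_extreme = "height" if abs(z_height) > abs(z_weight) else "weight"
print(more_extreme)
```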
Interpreting z-scores:
Whenever a value is less than the mean, its corresponding z score is negative.
Ordinary values: -2 ≤ z score ≤ 2
Unusual values: z score < -2 or z score > 2
Example: The U.S. Army requires women’s heights to be between 58 inches and 80 inches. Women have heights with
a mean of 63.6 inches and standard deviation of 2.5 inches. Find the z-score corresponding to the minimum and
maximum height requirement. Determine whether the minimum and maximum heights are unusual.
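The ordinary/unusual rule can be applied to this example with a short check. The sketch below treats 63.6 and 2.5 as the mean and standard deviation given in the example; `is_unusual` is a hypothetical helper name.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above or below the mean."""
    return (x - mean) / sd

def is_unusual(z):
    """Unusual values have z < -2 or z > 2; everything else is ordinary."""
    return z < -2 or z > 2

z_min = z_score(58, 63.6, 2.5)  # minimum height requirement
z_max = z_score(80, 63.6, 2.5)  # maximum height requirement

for label, z in (("minimum", z_min), ("maximum", z_max)):
    print(label, round(z, 2), "unusual" if is_unusual(z) else "ordinary")
```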
Percentiles: Percentiles are measures of location. There are 99 percentiles, denoted P1, P2, P3, ..., P99, which divide a set of data into 100 groups with about 1% of the values in each group.
For example: the 40th percentile, denoted P40, has about 60% of the data values above it and 40% below it.
Finding the percentile of a data value:
$\text{Percentile of } x = \frac{\text{number of values less than } x}{\text{total number of values}} \cdot 100\%$
Example: 34, 36, 39, 43, 51, 53, 62, 63, 73, 79
What is the percentile of the 51?
What is the percentile of the 73?
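The percentile-of-a-value formula amounts to counting how many values fall below x. A minimal Python sketch, with `percentile_of` as an illustrative name:

```python
data = [34, 36, 39, 43, 51, 53, 62, 63, 73, 79]

def percentile_of(values, x):
    """(number of values less than x) / (total number of values) * 100."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)

p_51 = percentile_of(data, 51)
p_73 = percentile_of(data, 73)
print(p_51, p_73)
```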
Quartiles: Are measures of location, denoted Q1, Q2 and Q3, which divide a set of data into four groups with about 25% of the values in each group.
Q1 (First Quartile) separates the bottom 25% of sorted values from the top 75%.
Q2 (Second Quartile) same as the median; separates the bottom 50% of sorted values from the top 50%.
Q3 (Third Quartile) separates the bottom 75% of sorted values from the top 25%.
Example: 34, 36, 39, 43, 51, 53, 62, 63, 73, 79
What is Q2?
What is Q1?
What is Q3?
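Textbooks and software use several slightly different rules for locating quartiles, and the transcript does not say which one the course uses. The sketch below assumes one common textbook convention: compute the locator L = (k / 100) * n for the k-th percentile; if L is a whole number, average the L-th and (L + 1)-th sorted values; otherwise round L up and take that value. The function name `percentile_value` is hypothetical.

```python
data = [34, 36, 39, 43, 51, 53, 62, 63, 73, 79]

def percentile_value(values, k):
    """k-th percentile (k an integer from 1 to 99) via the locator L = (k / 100) * n.
    ASSUMED convention; other quartile rules can give different answers."""
    ordered = sorted(values)
    n = len(ordered)
    if (k * n) % 100 == 0:
        # L is a whole number: average the L-th and (L + 1)-th sorted values.
        position = k * n // 100
        return (ordered[position - 1] + ordered[position]) / 2
    # L is not a whole number: round it up and take the value at that position.
    position = k * n // 100 + 1
    return ordered[position - 1]

q1 = percentile_value(data, 25)
q2 = percentile_value(data, 50)
q3 = percentile_value(data, 75)
print(q1, q2, q3)
```

Under this convention Q2 coincides with the median of the data set, as the definition of the second quartile requires.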
Boxplots: A boxplot (or box-and-whisker-diagram) is a graph of a data set that consists of a line extending from the
minimum value to the maximum value, and a box with lines drawn at the first quartile, Q1; the median; and the third
quartile, Q3.
[Figure: boxplot labeled with the minimum, Q1, Q2 (median), Q3, and the maximum]
Example: A simple random sample of pages from Merriam Webster Dictionary was obtained. Listed below are the
numbers of defined words on those pages, and they are arranged in order. Construct a box plot and include the values
of the five-number summary (min, Q1, median, Q3 and max value). Also determine if there are any outliers.
34, 36, 39, 43, 51, 53, 62, 63, 73, 79
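The five-number summary for this example can be sketched in Python. Two assumptions are made here that the transcript does not state: quartiles are taken as the medians of the lower and upper halves of the sorted data, and outliers are flagged with the common 1.5 x IQR rule (values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR). Other conventions exist and may give different results on other data.

```python
from statistics import median

words = [34, 36, 39, 43, 51, 53, 62, 63, 73, 79]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), with Q1 and Q3 taken as the
    medians of the lower and upper halves (ASSUMED convention)."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]        # values below the median position
    upper = ordered[(n + 1) // 2:]   # values above the median position
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]

minimum, q1, med, q3, maximum = five_number_summary(words)

# ASSUMED outlier rule: more than 1.5 * IQR beyond the quartiles.
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [v for v in words if v < low_fence or v > high_fence]

print(minimum, q1, med, q3, maximum)
print(outliers)
```

The five values printed first are the ones marked on the boxplot: the whisker ends at the minimum and maximum, and the box lines at Q1, the median, and Q3.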