Math 123- Statistics
Chapter 2 Notes
Name_______________________________
2.1 Frequency Distributions and Their Graphs
Def- A frequency distribution is a table that shows classes or intervals of data with a count of the
number of entries of each class.
Def- The frequency of a class is the number of data entries in the class.
Ex-
Class Limits    Frequency
1 – 4           2
5 – 8           4
9 – 12          1
13 – 16         7
The lower class limits are __________________.
The upper class limits are __________________.
The class width is _____.
The range is _____.
To Construct a Frequency Distribution:
1. Decide on the number of classes to use.
2. Find the class width by determining the range of the data set and dividing this by the number of
classes, then round up to the next whole number.
3. Find the class limits.
4. Make tally marks for each data entry.
5. Count tally marks to find the frequency.
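The five steps above can be sketched in Python. This is only an illustration and not part of the original notes: the function name `frequency_distribution` is made up here, the choice of five classes is an assumption, and the data are the 20 incomes (in thousands of dollars) from the histogram example in this section. The sketch assumes whole-number data and starts the first class at the smallest entry.

```python
def frequency_distribution(data, num_classes):
    """Return (lower limit, upper limit, frequency) rows for whole-number data."""
    data_range = max(data) - min(data)
    # Round up to the next whole number. The "+ 1" also covers the case where
    # the division comes out even, so the maximum still fits in the last class.
    width = data_range // num_classes + 1
    rows = []
    lower = min(data)
    for _ in range(num_classes):
        upper = lower + width - 1
        # Counting the entries in a class plays the role of the tally marks.
        frequency = sum(1 for x in data if lower <= x <= upper)
        rows.append((lower, upper, frequency))
        lower += width
    return rows


incomes = [30, 28, 26, 39, 34, 33, 20, 39, 28, 33,
           26, 39, 32, 28, 31, 39, 33, 31, 33, 32]

table = frequency_distribution(incomes, 5)
for lower, upper, frequency in table:
    print(f"{lower} - {upper}: {frequency}")
```

Every entry should land in exactly one class, so the frequencies add up to the number of data entries.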
Ex- Construct a frequency distribution (frequency table) for the number of siblings of each member
of the class. Use four classes.
Class Limits    Frequency    Midpoint    Relative Frequency    Cumulative Frequency
Def- The midpoint of a class is the average of the lower and upper class limits.
Midpoint=
Def- The relative frequency is the portion (or percent) of data that falls into that class.
Relative frequency=
Def- The cumulative frequency of a class is the sum of the frequencies for that class and all
previous classes.
Ex- For the class sibling data, find the midpoints, relative frequency, and cumulative frequency.
(Table is on previous page.)
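The three definitions above can be checked on a small table. The sketch below is an illustration, not part of the original notes; it reuses the classes 1 – 4, 5 – 8, 9 – 12, 13 – 16 and their frequencies from the first example in this section, and the variable names are chosen here for readability.

```python
from itertools import accumulate

# Classes and frequencies from the first example in Section 2.1.
classes = [(1, 4), (5, 8), (9, 12), (13, 16)]
frequencies = [2, 4, 1, 7]

n = sum(frequencies)  # total number of data entries

# Midpoint: average of the lower and upper class limits.
midpoints = [(lower + upper) / 2 for lower, upper in classes]

# Relative frequency: class frequency divided by the total n.
relative = [f / n for f in frequencies]

# Cumulative frequency: running total of the frequencies.
cumulative = list(accumulate(frequencies))

for (lower, upper), f, m, r, c in zip(classes, frequencies, midpoints,
                                      relative, cumulative):
    print(f"{lower:>2} - {upper:<2}  f={f}  midpoint={m}  rel={r:.3f}  cum={c}")
```

Two quick checks: the relative frequencies should sum to 1, and the last cumulative frequency should equal n.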
Def- A frequency histogram is a bar graph that represents the frequency distribution of a data set
with the following properties:
1. The horizontal scale is quantitative and measures the data values.
2. The vertical scale measures the frequency.
3. Consecutive bars must touch.
Def- Class boundaries are the numbers that separate classes without forming gaps between them.
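To find an upper class boundary, add 0.5 to the upper class limit. To find a lower class boundary, subtract 0.5 from the lower class limit. (This applies to data recorded as whole numbers.)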
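A short sketch of the boundary rule, using the classes from the first example in this section. It is an illustration only: the notes state the rule for upper boundaries, and the matching rule for lower boundaries (subtract 0.5) is assumed here for whole-number data.

```python
# Class limits from the first example in Section 2.1.
classes = [(1, 4), (5, 8), (9, 12), (13, 16)]

# Lower boundary = lower limit - 0.5, upper boundary = upper limit + 0.5.
boundaries = [(lower - 0.5, upper + 0.5) for lower, upper in classes]

for (lower, upper), (low_b, up_b) in zip(classes, boundaries):
    print(f"limits {lower} - {upper}  ->  boundaries {low_b} - {up_b}")
```

Each upper boundary equals the next class's lower boundary, which is why the bars of a histogram drawn on class boundaries touch.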
Ex- Make a frequency histogram for the following data. The data is income (in thousands of dollars)
of 20 employees at a small business.
30, 28, 26, 39, 34, 33, 20, 39, 28, 33, 26, 39, 32, 28, 31, 39, 33, 31, 33, 32
Def- A frequency polygon is a line graph that emphasizes the continuous changes in frequency.
1. Make a sketch where the x-axis contains the midpoints and the y-axis contains the frequency.
2. Plot points using midpoints, then connect the dots.
3. Draw extra dots on both ends to extend to the x-axis.
Ex- Create a frequency polygon using the income data.
Def- A relative frequency histogram has the same shape and horizontal scale as the corresponding
frequency histogram, but the y-axis is the relative frequency.
Ex- Draw a relative frequency histogram for the income data.
Def- A cumulative frequency graph or ogive is a line graph that displays the cumulative frequency on
the y-axis and the class boundaries on the x-axis. The first point is always on the x-axis.
Ex- Construct an ogive for the income data.
2.2 More Graphs and Displays
Def- A stem-and-leaf plot is another way to display quantitative data. The stem contains the leftmost digits and the leaf contains the rightmost digit.
Def- A split-stemplot is a type of stem-and-leaf plot where two or more lines are used for each stem.
Ex- Make a stem-and-leaf plot for the following data. The data is the daily high temperature during
January 2000 in Chicago, IL.
33, 31, 25, 22, 38, 51, 32, 23, 23, 34, 44, 43, 47, 37, 27, 25, 28, 35, 21, 24, 20, 19, 23, 27, 24, 13,
18, 28, 17, 25, 31
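One way to build this plot is to sort the data and split each value into a tens digit (stem) and a ones digit (leaf). The sketch below is an illustration added to the notes; it assumes two-digit whole-number data, and the variable names are chosen here.

```python
from collections import defaultdict

# Daily high temperatures from the example above (31 values).
temps = [33, 31, 25, 22, 38, 51, 32, 23, 23, 34, 44, 43, 47, 37, 27, 25,
         28, 35, 21, 24, 20, 19, 23, 27, 24, 13, 18, 28, 17, 25, 31]

stems = defaultdict(list)
for t in sorted(temps):
    stem, leaf = divmod(t, 10)  # tens digit is the stem, ones digit is the leaf
    stems[stem].append(leaf)

for stem in sorted(stems):
    leaves = " ".join(str(leaf) for leaf in stems[stem])
    print(f"{stem} | {leaves}")
```

Because the data are sorted first, the leaves on each stem appear in increasing order.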
Ex- Make a split-stemplot using two lines for each stem.
Def- In a dotplot, each data entry is plotted using a point above the x-axis.
Ex- Using the previous example, construct a dotplot of the data.
Def- A pie chart is a graph that shows relationships of parts to a whole. To do this, find the relative
frequency for each category.
Ex- Make a pie chart for the following data.
Breed                 # Registered (thousands)
Labrador Retriever    173
Golden Retriever      46
German Shepherd       44
Dachshund             66
Beagle                52
Poodle                55
Yorkshire Terrier     58
Def- A Pareto chart is a vertical bar graph in which the height of each bar represents the frequency
or relative frequency. The bars are positioned in order of decreasing height.
Note: A Pareto chart is a type of bar graph. Bar graphs are graphs where the bars do not touch and
are used for qualitative data.
Ex- Construct a Pareto chart for the example above.
Def- If two data sets have the same number of entries and each entry in the first data set
corresponds to one entry in the second data set, then the sets are called paired data sets.
Ex- When a plant’s age is compared to a plant’s height, this is a paired data set because each plant
has an age and a height.
Def- A scatterplot is used to graph paired data sets. The ordered pairs are graphed as points on a
coordinate plane.
Ex- The following are the height and number of stories of several buildings in Miami. Create a
scatterplot using the data.
Height (feet)        764  625  520  510  484  480  450  430  410
Number of Stories     55   47   21   28   35   40   33   31   40
Def- A data set that has entries at regular intervals of time is called a time series.
Def- A time series chart is used to graph a time series.
Ex- Make a time series plot using the data below. The data is based on the number of classes that
Miss Sutter taught during the specified year.
Year                2000  2001  2002  2003  2004  2005  2006  2007
# Classes Taught
Ex- Find the class limits and class boundaries for the set of data. Use five classes. Check your
answers with others around you.
Data: 23, 23, 25, 26, 27, 28, 28, 29, 29, 29, 29, 33, 33, 33, 33, 33, 35, 36, 36, 36, 37, 37, 38, 38, 39,
39, 39, 42, 42, 42, 42, 43, 43, 43, 44, 44, 44, 44, 45, 46, 46, 46, 47, 49, 49, 49
Ex- Use your knowledge of graphs to fill in the following chart.
Type of Graph    Uses Class Boundaries    Uses Class Midpoints    Has Bars    Qualitative Data    Quantitative Data
Freq Histogram
Freq Polygon
Rel Freq Hist
Ogive(Cum Freq)
Stem-and-leaf Plot
Dotplot
Pie Chart
Pareto Chart
Scatterplot
Time-Series Plot
2.3 Measures of Central Tendency
Def- A measure of central tendency is a value that represents a typical or central entry of a data set.
The most common types are mean, median, and mode.
Def- The mean of a data set is the sum of the data entries divided by the number of entries.
Population Mean =
Sample Mean =
Def- The median of a data set is the middle entry when the data set is ordered in ascending order.
If the data set has an even number of entries, the median is the mean of the two middle data
entries.
Def- The mode of a data set is the data entry that occurs with the greatest frequency. If no entry is
repeated, then the data set has no mode. If two entries occur with the same greatest frequency,
each entry is a mode and the data set is called bimodal.
Ex- Find the mean, median, and mode of each set of data.
a) The price of gas per gallon in 1995. $1.75, $1.35, $1.39, $1.69, $1.27, $1.53
b) The length of a child’s foot measured in centimeters. 9, 9, 10, 10, 10, 11, 12, 12, 13, 14, 14, 15
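The three measures can be checked with a short Python sketch. This is an illustration added to the notes, not the expected hand method. Note that `statistics.multimode` returns every value when nothing repeats, so a small helper (named `modes` here, a made-up name) is used to match the definition above, under which a data set with no repeated entry has no mode.

```python
from collections import Counter
from statistics import mean, median


def modes(data):
    """Return the mode(s); an empty list means no entry repeats (no mode)."""
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:
        return []
    return [value for value, count in counts.items() if count == highest]


gas_prices = [1.75, 1.35, 1.39, 1.69, 1.27, 1.53]                # part a
foot_lengths = [9, 9, 10, 10, 10, 11, 12, 12, 13, 14, 14, 15]    # part b

for name, data in [("gas prices", gas_prices), ("foot lengths", foot_lengths)]:
    print(name, "mean:", round(mean(data), 3),
          "median:", round(median(data), 3), "mode(s):", modes(data))
```

Both data sets have an even number of entries, so each median is the mean of the two middle entries.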
Def- An outlier is a data entry that is far removed from the other entries in the data set.
Def- A weighted mean is the mean of a data set whose entries have varying weights. A weighted
mean is given by the formula $\bar{x} = \frac{\sum (x \cdot w)}{\sum w}$, where w is the weight of each entry x.
Ex- Six tests are given in a class. The first five tests are each worth 15% of the grade and the last
test is worth 25% of the grade. Find the weighted mean of the test scores where the scores are 75,
67, 86, 77, 79, 88.
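A minimal sketch of the weighted mean for this example, added as an illustration. It assumes the scores are listed in test order, so the first five scores get a weight of 0.15 and the last score (88) gets 0.25.

```python
scores = [75, 67, 86, 77, 79, 88]
weights = [0.15, 0.15, 0.15, 0.15, 0.15, 0.25]  # assumed to follow test order

# Weighted mean = sum of (x * w) divided by the sum of the weights.
weighted_mean = sum(x * w for x, w in zip(scores, weights)) / sum(weights)

print(round(weighted_mean, 2))
```

Because the weights add up to 1 (100% of the grade), dividing by their sum does not change the result here, but keeping the division makes the code work for weights that do not sum to 1.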
Def- The mean of a frequency distribution for a sample is approximated by the formula $\bar{x} = \frac{\sum (x \cdot f)}{n}$,
where x is the midpoint of a class, f is the frequency of a class, and $n = \sum f$.
Ex- Find the mean of the frequency distribution. The data is the height of male students enrolled in
a P.E. class.
Height (in inches)    Frequency    Midpoint of Class
63 – 65               2
66 – 68               4
69 – 71               8
72 – 74               5
75 – 77               2
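The grouped-mean formula can be applied to this table as follows. The sketch is an illustration added to the notes: each midpoint stands in for every height in its class, so the result is an approximation of the true mean.

```python
# Height classes (inches) and frequencies from the table above.
classes = [(63, 65), (66, 68), (69, 71), (72, 74), (75, 77)]
frequencies = [2, 4, 8, 5, 2]

midpoints = [(lower + upper) / 2 for lower, upper in classes]
n = sum(frequencies)  # n is the sum of the frequencies

# Approximate mean: sum of (midpoint * frequency) divided by n.
grouped_mean = sum(x * f for x, f in zip(midpoints, frequencies)) / n

print("midpoints:", midpoints)
print("n:", n)
print("approximate mean:", round(grouped_mean, 2))
```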
Shapes of Distributions
Symmetric, Uniform (rectangular), Skewed Left (negatively skewed), Skewed Right (positively
skewed)
Def- A frequency distribution is symmetric when a vertical line can be drawn through the middle of
the graph and both halves look the same.
Def- A frequency distribution is uniform or rectangular if all classes have equal frequencies.
Def- A frequency distribution is skewed left or negatively skewed if its tail extends to the left.
Def- A frequency distribution is skewed right or positively skewed if its tail extends to the right.
Ex- Match the distribution with one of the graphs stated in the definitions above.
a) The frequency distribution of all ages of people who have played baseball.
b) The frequency distribution of scores on a test where a majority of the class receives low scores.
Ex- Find the mean of the data set shown below.
2.4 Measures of Variation
Ways to Measure the Variation of a Data Set
Range, Deviation, Population Variance, Population Standard Deviation, Sample Variance, Sample
Standard Deviation
Def- The range of a data set is the difference between the maximum and minimum data entries.
Range=
Def- The deviation of an entry x in a population data set is the difference between the entry and the
mean of the data set.
Deviation of x=
Def- The population variance of a population data set of N entries is $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$.
Def- The population standard deviation of a population of N entries is the square root of the
population variance: $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$.
Visual Representation of a Standard Deviation
Data set for the population: 5, 6, 7, 8, 9, 10, 11
Squared distances from $\mu$: $3^2, 2^2, 1^2, 0^2, 1^2, 2^2, 3^2$
Average of the squared distances from $\mu$: $\frac{9 + 4 + 1 + 0 + 1 + 4 + 9}{7} = \frac{28}{7} = 4$ (variance)
Standard deviation: $\sqrt{4} = 2$
Ex- Find the population variance and the population standard deviation of the data set. The data
consists of the number of inches of snow on Mt. Shasta during a 10 day period.
15, 10, 11, 4, 17, 19, 28, 35, 9, 18
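The population formulas can be written out directly in Python. The sketch below is an illustration added to the notes (the function names are chosen here); it first reproduces the small data set 5, 6, ..., 11 from the visual representation and then applies the same functions to the snow data.

```python
from math import sqrt


def population_variance(data):
    """Average of the squared deviations from the population mean."""
    mu = sum(data) / len(data)
    return sum((x - mu) ** 2 for x in data) / len(data)


def population_std_dev(data):
    """Square root of the population variance."""
    return sqrt(population_variance(data))


small = [5, 6, 7, 8, 9, 10, 11]
snow = [15, 10, 11, 4, 17, 19, 28, 35, 9, 18]

print("small:", population_variance(small), population_std_dev(small))
print("snow: ", round(population_variance(snow), 2),
      round(population_std_dev(snow), 2))
```

The standard library offers `statistics.pvariance` and `statistics.pstdev` for the same calculations; the hand-written version is shown only to mirror the formula.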
Def- The sample variance of a set of n entries is $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$.
Def- The sample standard deviation of a set of n entries is the square root of the sample variance:
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$.
Ex- The following data is a sample of teachers’ salaries. Find the variance and standard deviation.
$46098, $36259, $35084, $38617, $42690, $26202, $47169, $37109
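For a sample, the only change from the population formulas is dividing by n - 1 instead of N. The sketch below is an illustration added to the notes (function names chosen here), and it cross-checks the hand-written formula against Python's `statistics.variance`, which also uses n - 1.

```python
import statistics
from math import sqrt


def sample_variance(data):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    x_bar = sum(data) / len(data)
    return sum((x - x_bar) ** 2 for x in data) / (len(data) - 1)


def sample_std_dev(data):
    """Square root of the sample variance."""
    return sqrt(sample_variance(data))


salaries = [46098, 36259, 35084, 38617, 42690, 26202, 47169, 37109]

print("sample mean:", sum(salaries) / len(salaries))
print("sample variance:", round(sample_variance(salaries), 2))
print("sample std dev:", round(sample_std_dev(salaries), 2))
print("library check:", round(statistics.variance(salaries), 2))
```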
Note: The standard deviation tells you roughly how far, on average, the data values are from the mean.
A large standard deviation indicates that the data is spread out while a small standard deviation
indicates that the data is clumped together.
Empirical Rule (68%-95%-99.7% Rule)
For data with a symmetric bell-shaped distribution, 68% of the data lies within 1 standard deviation
of the mean, 95% of the data lies within 2 standard deviations of the mean, and 99.7% of the data
lies within 3 standard deviations of the mean.
Ex- The average rate for cable TV from a sample of households was $29 per month with a sample
standard deviation of $2.50 per month. Assume that the distribution of cable TV prices is bell-shaped.
a) Between which two values does 99.7% of the data lie?
b) Estimate the portion of cable TV bills that are between $26.50 and $31.50.
c) Estimate the proportion of cable TV bills that are between $26.50 and $36.50.
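The Empirical Rule intervals for this example can be tabulated with a few lines of Python. This sketch is an illustration added to the notes. Part (c) relies on the symmetry of a bell-shaped distribution: half of the 68% lies between one standard deviation below the mean and the mean, and half of the 99.7% lies between the mean and three standard deviations above it.

```python
mean_rate = 29.0   # sample mean, dollars per month
std_dev = 2.5      # sample standard deviation, dollars per month

# Share of the data within k standard deviations (Empirical Rule).
within = {1: 0.68, 2: 0.95, 3: 0.997}


def interval(k):
    """Return the interval mean - k*s to mean + k*s."""
    return (mean_rate - k * std_dev, mean_rate + k * std_dev)


for k, share in within.items():
    low, high = interval(k)
    print(f"within {k} std. dev.: ${low:.2f} to ${high:.2f} ({share:.1%})")

# Part (c): $26.50 is 1 std. dev. below the mean, $36.50 is 3 above it.
share_c = within[1] / 2 + within[3] / 2
print(f"between $26.50 and $36.50: about {share_c:.2%}")
```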
Chebychev’s Theorem (Applies to all distributions)
The portion of any data set lying within k standard deviations of the mean is at least $1 - \frac{1}{k^2}$.
For k = 2, at least _____ of the data lies within 2 std. dev. of the mean.
For k = 3, at least _____ of the data lies within 3 std. dev. of the mean.
Ex- In a sample of 40 customers, the customers spend an average of $23 at the grocery store with
a sample std. dev. of $6. Using Chebychev’s Theorem, at least what percent of customers spend
between $11 and $35 at the grocery store? Note: You will never be tested on Chebychev’s
Theorem.
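The bound in Chebychev's Theorem is simple to evaluate. The sketch below is an illustration added to the notes (the function name is made up here); it evaluates the bound for k = 2 and k = 3 and then finds k for the grocery example by measuring how many standard deviations $35 is above the mean.

```python
def chebychev_bound(k):
    """Minimum portion of data within k standard deviations of the mean."""
    return 1 - 1 / k ** 2


for k in (2, 3):
    print(f"k = {k}: at least {chebychev_bound(k):.1%}")

# Grocery example: mean $23, sample std. dev. $6, interval $11 to $35.
mean_spend, std_dev = 23, 6
k = (35 - mean_spend) / std_dev   # same distance as (23 - 11) / 6
print(f"k = {k}: at least {chebychev_bound(k):.0%} spend between $11 and $35")
```

Unlike the Empirical Rule, this gives only a lower bound, which is the price of applying to every distribution.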
Def- The sample standard deviation for a frequency distribution (grouped data) is $s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{n - 1}}$,
where x is the midpoint of a class, $\bar{x} = \frac{\sum (x \cdot f)}{n}$, and $n = \sum f$.
Ex- Find the mean and standard deviation for the following data. A sample of 45 pigs were
weighed. The data is shown in the table below.
Weight of Pig (lbs)    Frequency    Midpts
8 – 12                 7
13 – 17                11
18 – 22                24
23 – 27                3
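The grouped-data formulas can be applied to the pig weights as follows. This sketch is an illustration added to the notes; because each class is represented by its midpoint, both results are approximations of the values that the raw weights would give.

```python
from math import sqrt

# Weight classes (lbs) and frequencies from the table above.
classes = [(8, 12), (13, 17), (18, 22), (23, 27)]
frequencies = [7, 11, 24, 3]

midpoints = [(lower + upper) / 2 for lower, upper in classes]
n = sum(frequencies)

# Grouped mean: sum of (midpoint * frequency) divided by n.
grouped_mean = sum(x * f for x, f in zip(midpoints, frequencies)) / n

# Grouped sample standard deviation: each squared deviation is weighted
# by its class frequency, and the total is divided by n - 1.
squared_total = sum((x - grouped_mean) ** 2 * f
                    for x, f in zip(midpoints, frequencies))
s = sqrt(squared_total / (n - 1))

print("n:", n)
print("approximate mean:", round(grouped_mean, 2))
print("approximate std. dev.:", round(s, 2))
```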
Def- The coefficient of variation, CV, of a data set describes the standard deviation as a percent of
the mean.
Population: $CV = \frac{\sigma}{\mu} (100\%)$
Sample: $CV = \frac{s}{\bar{x}} (100\%)$
Note: The coefficient of variation measures the variation of a data set relative to the mean of the
data set.
Ex- Find the coefficient of variation for the two data sets, then compare the results. The following is
a sample of home prices in California and Kentucky (in thousands of dollars).
CA: 760, 317, 490, 324, 568, 736, 488, 172, 224, 374
KY: 270, 399, 230, 499, 140, 120, 99, 250, 325, 300, 235
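A sketch of the sample CV calculation for the two states, added as an illustration. It uses `statistics.stdev`, which is the sample standard deviation (dividing by n - 1), to match the sample formula; the function name is chosen here.

```python
from statistics import mean, stdev


def coefficient_of_variation(sample):
    """Sample standard deviation as a percent of the sample mean."""
    return stdev(sample) / mean(sample) * 100


# Home prices in thousands of dollars, from the example above.
ca_prices = [760, 317, 490, 324, 568, 736, 488, 172, 224, 374]
ky_prices = [270, 399, 230, 499, 140, 120, 99, 250, 325, 300, 235]

cv_ca = coefficient_of_variation(ca_prices)
cv_ky = coefficient_of_variation(ky_prices)

print(f"CA: mean = {mean(ca_prices):.1f}, CV = {cv_ca:.1f}%")
print(f"KY: mean = {mean(ky_prices):.1f}, CV = {cv_ky:.1f}%")
```

Because the CV is a percent of the mean, it allows the spread of the two samples to be compared even though their typical prices differ.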
2.5 Measures of Position
Def- Fractiles are numbers that partition an ordered data set into equal parts.
Def- The quartiles Q1 , Q2 , and Q3 divide an ordered set into four equal parts.
_____ of the data fall at or below Q1 .
_____ of the data fall at or below Q2 .
_____ of the data fall at or below Q3 .
Ex- Find the quartiles of each data set.
a) 1, 1, 2, 3, 5, 6, 7, 7, 7, 9, 10
b) 2, 3, 6, 6, 7, 8, 9, 14
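Several conventions exist for computing quartiles, and the notes do not say which one to use. The sketch below assumes the common textbook method: Q2 is the median, Q1 is the median of the lower half, and Q3 is the median of the upper half, with the median itself left out of both halves when the number of entries is odd. Software packages may use other conventions and give slightly different values.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method (an assumption)."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower_half = ordered[:half]    # entries below the middle position
    upper_half = ordered[-half:]   # entries above the middle position
    return median(lower_half), median(ordered), median(upper_half)


data_a = [1, 1, 2, 3, 5, 6, 7, 7, 7, 9, 10]
data_b = [2, 3, 6, 6, 7, 8, 9, 14]

print("a)", quartiles(data_a))
print("b)", quartiles(data_b))
```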
Def- The interquartile range, IQR, is the difference between the first and third quartiles.
IQR=
Five Number Summary
Minimum (smallest data value), Q1 , Q2 (median), Q3 , and Maximum (largest data value)
Def- A box-and-whisker plot or boxplot uses the five number summary and includes all data values.
Ex- Draw a boxplot for each set of data.
a) 11, 12, 14, 17, 17, 18, 18, 18, 20, 31
b) 9, 9, 9, 11, 13, 14, 14, 15, 16, 16, 17
Def- For whole numbers P such that $1 \le P \le 99$, the Pth percentile of a distribution is a value such
that P% of the data fall at or below it and (100 – P)% of the data fall at or above it.
Ex- If George’s test score was at the 54th percentile, then 54% of the class scored at or below
George’s score and 46% of the class scored at or above George’s score.
Ex- If Mary got a 95% on her test, what percentile rank does her score correspond to?
Note: The following topic will be covered in Chapter 5.
Def- The standard score or z-score represents the number of standard deviations a given data value, x, falls from the
mean $\mu$.
z-score z=
Properties of z-scores
If z < 0, then $x < \mu$.
If z = 0, then $x = \mu$.
If z > 0, then $x > \mu$.
A data value that has a z-score between -2 and 2 (inclusive) is normal, not an outlier.
A data value that has a z-score that is not between -2 and 2 (inclusive) is abnormal, an outlier.
Ex- The weights of 19 high school football players have a mean of 192 pounds and a standard deviation of 24 pounds. Use z-scores to determine if the weights of the following players are unusual.
a) 251 pounds
b) 162 pounds
Note: The player weighing 251 pounds has a weight that is 2.46 std. dev. above the mean weight of the team, while the
162 pound player has a weight that is 1.25 standard deviations below the mean weight of the team.
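The two z-scores in the note can be reproduced with a short sketch, added here as an illustration. It assumes the usual formula, the data value minus the mean, divided by the standard deviation, and applies the rule from these notes that a z-score outside the range -2 to 2 marks an unusual value. The function names are chosen here.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev


def is_unusual(z):
    """Rule used in these notes: unusual if z is not between -2 and 2."""
    return abs(z) > 2


team_mean, team_std_dev = 192, 24  # pounds

for weight in (251, 162):
    z = z_score(weight, team_mean, team_std_dev)
    label = "unusual" if is_unusual(z) else "not unusual"
    print(f"{weight} pounds: z = {z:.2f} ({label})")
```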