```Probability and Statistics – Mrs. Leahy
Unit 2: Averages and Variation
Day 1: Mode, Median, Mean, Trimmed Mean, Distribution of Averages
“While the individual man is an insolvable puzzle, in the aggregate he becomes a mathematical certainty. You
can, for example, never foretell what any one man will do, but you can say with precision what an average number
will be up to.” -- Arthur Conan Doyle, The Sign of Four
Measures of Central Tendency: Mode, Median, and Mean
What does it mean to be average?
The average price of gold is $920 per ounce. My car averages 28 miles per gallon. The average shoe size for
women is a size 8. The average test score was an 85%.
An average is a way to describe an ______________ data set using only _______________________.
Sometimes we will be working with a population and sometimes with a sample.
A population is ______ possible cases in the situation we are studying.
A sample is a ________ set of cases from the population used to represent the entire population.
Example:
Population:
Every carton of orange juice manufactured by Tropicana this year.
Sample:
Ten cartons of Tropicana orange juice randomly selected from a grocery store shelf.
Most Commonly Used “Averages”: Mode, Median, and Mean
MODE
The mode of a set of data is the number that occurs ___________ frequently.
It is possible to have more than one mode. It is possible to have no mode.
Example 1: Find the mode of this data: {1, 7, 8, 4, 4, 4, 6, 3, 8, 7}
Example 2: Sixteen students are asked how many college math classes they
have completed. Their responses are shown at the right. What is the mode?
MEDIAN
The median of a set of data is the number that is exactly in the _______________ of a set of ordered
values. Sometimes the median is called the “central value.”
To find the median:
1. Order the data from smallest to largest
2. For an odd number of data values:
Median = Middle value
3. For an even number of data values:
Median = “average” of two middle values
(Sum two middle values, then ÷ 2)
Example 3a:
Find the median of this data: {10, 5, 1, 6, 10, 2, 5, 7, 3}
Example 3b:
Find the median of this data: {1, 7, 8, 4, 4, 4, 6, 3, 8, 7}
Example 4: Belleview College must make a report to the budget committee about the average credit hour load a
full-time student carries. A 12-credit-hour load is the minimum requirement for full-time status. For the same
tuition, students may take up to 20 credit hours. A random sample of 40 students yielded the following
information (in credit hours):
17 12 14 17 13 16 18 20 13 12
12 17 16 15 14 12 12 13 17 14
15 12 15 16 12 18 20 19 12 15
18 14 16 17 15 19 12 13 12 15
a) Organize the data from smallest to largest number of credit hours.
b) What is the mode?
c) What is the median?
d) If the budget committee is going to fund the school based on the average student credit hour load, which of
these two averages should be reported to the committee?
Mean and Trimmed Mean
Mean
The mean of a set of data is the arithmetic ________________ of all data values.
Symbols:
∑ = Summation, “the sum of the following”
n = the number of values in the sample
N = the number of values in the population
Mean of a sample (“x-bar”):
x̄ = ∑x / n
Mean of a population (“mu”, pronounced “mew”):
μ = ∑x / N
Example 5: Find the mean of the following data set: {3, 8, 5, 4, 8, 4, 10 }
“How many hours of television do most people watch each week?”
You conduct a survey where participants are asked “How many hours of television did you watch last week?”
A random sample of the results is displayed below:
1 3 5 5 5 7 7 10 10 42
What should you say?
Example 7: To graduate, Steve must have at least a C in history. He did fairly well on the first four tests;
however, he failed the last one. Here are his scores:
73 80 69 72 35
a) Find the mean score and determine if Steve will get a C or better (70%+).
b) What if Steve’s school allows him to drop the highest and lowest scores? That is, what if he is allowed to
trim off 20% of his scores… What is the mean of his remaining scores?
c) What is the median? Does trimming change the median?
d) Which mean is closer to the median?
Trimmed Mean:
The mean of the data values left after “trimming” a specified percentage of the smallest
and largest data values from a data set. Common trims are 5% or 10%.
Example 8: A sample of 20 colleges showed class sizes for intro courses to be:
14 20 20 20 20 23 25 30 30 30
35 35 35 40 40 42 50 50 80 80
a) What is the mean for the entire sample?
b) Compute a 5% trimmed mean for the sample.
c) Which mean is closer to the median?
Distributions of Averages
Mean = Median = Mode
(Or very nearly)
Mean < Median < Mode
Mean > Median > Mode
Weighted Averages (Weighted Mean)
weighting:
Homework 17%
Tests 68%
Final Exam 15%
You currently have a 100% on homework and think you got an
85% on the Unit 1 test. What is your current grade in the class?
This type of mean is called a “weighted average” or “weighted mean” because some values are considered more
important than others.
Example 9: Suppose your midterm test score is 80,
98. Suppose the weights are 30% for midterm, 30%
for projects, and 40% for final. If the minimum
average of an A is 90%, will you earn an A?
Example 10: Suppose you enter a comedy script
writing contest and are given a score of 1 to 10 (10
being the best) in categories of humor, originality,
and presentation. Humor is given a weight of 5,
originality is given a weight of 3, and presentation
is given a weight of 2. You receive an 8 in humor, a
5 in originality, and a 7 in presentation. What is
Weighted GROUPED Data (by classes)
Example 10: Find the weighted (group) average for the following data.
a)
b)
For a group of data:
Use x = midpoint (median) of class
Use w = frequency of class
Day 2: Range, Variance, Standard Deviation
Any set of measurements has two important properties:
The central/typical/average value (Day 1)
Spread tells us how far from the center the data ranges.
Example: You survey 2 groups of 50 students asking them to report their weight.
Group 1: Mean weight 145 lbs
Group 2: Mean weight 145 lbs
On day 4, we will be talking about interquartile range, a measurement of spread about the ________________.
Measures of Variation
Range
range = largest value – smallest value
Example 1: A large bakery regularly orders cartons of Maine blueberries. The average weight of the cartons is
supposed to be 22 ounces. Random samples of cartons from two suppliers were weighed. The weights in ounces
per carton were:
Supplier I: 17 22 22 22 27
Supplier II: 17 19 20 27 27
a) What is the range of each set of data?
b) What is the mean of each set of data?
c) The bakery uses 1 carton of blueberries per blueberry muffin recipe.
Which supplier should they choose?
Range unfortunately does not tell us how much the other values vary from one another or from the mean.
Variance and Standard Deviation of a Sample
(Ungrouped Data --- no classes….)
DEFINING FORMULAS
Sample Variance: 𝑠² = ∑(𝑥 − 𝑥̅)² / (𝑛 − 1)
Sample Standard Deviation: 𝑠 = √(𝑠²)
Symbols/Explanations:
𝑥 = a data value or outcome
𝑛 = the sample size
𝑥̅ = the mean of a sample
𝑥 − 𝑥̅ = the difference between what actually happened and what you expected to happen; “the deviation”
∑(𝑥 − 𝑥̅)² = the sum of squares
(from Example 1)
Supplier I: 17 22 22 22 27
Supplier II: 17 19 20 27 27
Example 2: Use the data from Example 1 to find the sample variance (𝑠²) and sample standard deviation (𝑠).
Supplier I:
𝑥 | 𝑥 − 𝑥̅ | (𝑥 − 𝑥̅)²
𝑥̅ =
𝑠² = ∑(𝑥 − 𝑥̅)² / (𝑛 − 1) =
𝑠 =
Supplier II:
𝑥 | 𝑥 − 𝑥̅ | (𝑥 − 𝑥̅)²
𝑥̅ =
𝑠² = ∑(𝑥 − 𝑥̅)² / (𝑛 − 1) =
𝑠 =
Step 1: Compute the mean 𝑥̅
Step 2: List out your data values (the x’s)
Step 3: Find how far off each data value is from
the mean: 𝑥 − 𝑥̅
Step 4: Square this difference: (𝑥 − 𝑥̅)²
Step 5: Find the sum of the squares: ∑(𝑥 − 𝑥̅)²
Step 6: Substitute this value into your formulas
Example 3: Big Blossom Greenhouse measured a sample of rose blooms for diameters in inches:
2 3 3 8 10 10
Compute the sample variance and the sample standard deviation.
Variance and Standard Deviation of a POPULATION
In most statistics applications, we work with a random sample of data rather than the entire population. If you
have the data for a population, you can determine the population mean, population variance, and population
standard deviation.
Sample Variance: 𝑠² = ∑(𝑥 − 𝑥̅)² / (𝑛 − 1)
Sample Standard Deviation: 𝑠 = √(𝑠²)
𝑁 = size of the population
𝜇 = the mean of a population
Population Mean = 𝜇 = ∑𝑥 / 𝑁
Population Variance = 𝜎² = ∑(𝑥 − 𝜇)² / 𝑁
Population Standard Deviation = 𝜎 = √(𝜎²)
Example 4: For the population of five values {1, 4, 4, 3, 5}, find the population variance and population
standard deviation.
𝑥 | 𝑥 − 𝜇 | (𝑥 − 𝜇)²
Day 3: Variance, Standard Deviation, Grouped/Class Data and don’t you wish
there was an easier formula to use….
The formulas we used yesterday were called “defining formulas.”
You can get the same answers for sample variance and sample standard deviation by using the “computational”
formulas.
COMPUTATIONAL FORMULAS
Sample Variance (Computational): 𝑠² = [∑𝑥² − (∑𝑥)² / 𝑛] / (𝑛 − 1)
Sample Standard Deviation (Computational): 𝑠 = √(𝑠²)
Benefits:
Don’t have to find the mean first.
Don’t have to find the difference between mean and data value. Fewer steps!
Example 1: For the sample {1, 3, 2, 6}, find the standard deviation, s.
Example 2: For the sample below, find the sample variance
and the sample standard deviation using the
computational formulas.
A study examining the health risks of smoking measured
cholesterol levels of people who had smoked for at
least 25 years.
Example 3: American League baseball teams
play their games with the designated hitter rule,
meaning that pitchers do not bat. The League
believes that replacing the pitcher, typically a
weak hitter, with another player in the batting
order produces more runs and generates more
interest among fans. Following are the average
number of runs scored in the American League
and National League stadiums for the first half of
the 2001 season.
Find the mean, sample variance, and sample
standard deviation for each League’s set of
data.
Standard Deviation for Grouped Data (Classes)
Sample mean for grouped data (a weighted ave.):
x̄ = ∑xf / n
x = midpoint of a class
f = frequency of a class
n = sum of the frequencies
Sample standard deviation for grouped data:
s = √[ ∑(x − x̄)² f / (n − 1) ] = √[ (∑x²f − (∑xf)² / n) / (n − 1) ]
To find the standard deviation for a set of grouped (class) data, use:
s = √[ (∑x²f − (∑xf)² / n) / (n − 1) ]
Example 4: Find the sample standard deviation for the following grouped data.
Example 5: Find the sample standard deviation for the following grouped data.
Coefficient of Variation
The Coefficient of Variation expresses standard
deviation as a percentage of the sample or
population mean. This allows us to compare
data from different populations that may use
different units of measurement.
CV = (standard deviation ÷ mean) × 100%
Example 6: Mrs. Leahy and Mrs. Whitham decide to compare the heights of the students in their classes. Mrs.
Leahy’s class had a mean height of 67 inches, with a standard deviation of 2.13 inches. Mrs. Whitham’s class had
an average height of 165 cm with a standard deviation of 5 cm. Use the coefficient of variation to compare the two
classes.
Chebyshev’s Theorem
Data within:
2 standard deviations of the mean: 𝑥̅ ± 2𝑠
3 standard deviations of the mean: 𝑥̅ ± 3𝑠
4 standard deviations of the mean: 𝑥̅ ± 4𝑠
Example 8: For a sample with mean 𝑥̅ = 5 and a standard deviation s = 1.5
a) Find an interval A to B such that at least 75% of the data will lie within this interval.
b) Find an interval A to B such that at least 88.9% of the data fall within this interval.
Outliers: Occur at _________ standard deviations from the mean.
Day 4: Percentiles, 5-Number Summaries:
I took my son to the doctor and was told that he was in the “85th percentile for height and the 56th percentile for
weight”. You took a standardized test and received notice that you scored in the “90th percentile.” On the website
for the college you want to attend, you see they are accepting applications from students in the “75th percentile of
WHAT DOES THIS MEAN?
Percentiles
There are 100 percentiles.
If P = the Pth Percentile then P% of the data is _____to P and (100 – P)% of the data is _____ to P.
Example 1: You took the English achievement test to obtain college credit in freshman English by examination.
a) If your score is at the 89th percentile, what percentage of scores are at or below yours?
b) What percentage of scores are higher than yours?
c) If the scores ranged from 1 to 100 and your raw score is 95, does this necessarily mean that your score is at the
95th percentile?
Quartiles/Interquartile Range
Quartiles divide the data into ____________.
Q1 = first quartile = _______ percentile
Q2 = second quartile = _______ percentile, and is also the ___________ of the data
Q3 = third quartile = ________ percentile
The Interquartile Range (IQR) is the difference between Q3 and Q1.
The IQR gives you the range of values in the middle ______ of the data set.
The Interquartile Range is a measure of SPREAD about the MEDIAN of a set of data.
Example:
Data Set 1: Median = 6, IQR = 4
Data Set 2: Median = 6, IQR = 10
Example 2: For the sample {10,14,11,19,15,21,21,16,20}
a) Find the Quartiles
b) Find the Interquartile Range
Example 3: For the sample {42,77,19,53,95,34,94,86}
a) Find the Quartiles
b) Find the Interquartile Range
Box-and-Whisker Plots
Smallest Value
Largest Value
Quartiles: Q1, Q2, Q3
give us a very useful _________________ summary of the data and their spread
A box-and-whisker plot is a graphical representation of these values:
Outliers occur when any value is more than 1.5 × IQR below Q1 or above Q3.
PROCEDURE: How to make a box-and-whisker plot
1. Draw a horizontal (or vertical) scale to include the lowest and highest values.
2. Above (or to the right) of the scale, draw a box from Q1 to Q3.
3. Include a solid line through the box at the median level.
4. Check for outliers and draw them in as individual points.
5. Draw horizontal (or vertical) lines (called whiskers), from Q1 to the lowest value and from Q3 to the
highest value.
Example 4: For the sample {42,77,19,53,95,34,94,86}
a) Find the five-number summary
b) Make a box and whisker plot.
Example 6: Consider the data {1, 2, 3, 3, 5, 6, 7, 7, 10, 20}
a) Find the five-number summary for the data.
b) Draw a box and whisker plot.
Example 7: Three classes (A, B, C) took the same test
Example 8:
a) What is the range of the sugar
content of these cereals?
b) Describe the shape of the
distribution of the:
histogram: