Measures of Central Tendency
(and normal distribution)
Average ($\bar{x}$)
Arithmetic mean, Expected value
Def. Given values $x_1, x_2, x_3$ we define the arithmetic mean (average), $\bar{x}$, by
$$\bar{x} = \frac{x_1 + x_2 + x_3}{3}$$
or in general
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{\sum x_i}{n}$$
Here, the symbol $\sum$ represents the idea of adding up all values (sum of all $x_i$).
For example:
1. During the last five years an employee at a company has received the following pay raises with respect to the previous year's salary:
2%, 3%, 2%, 0%, and 3%.
In terms of percents, what is the average raise that this employee has received during the last five years?
---- (In terms of money: is the answer representative of the increase?)
2. In order to keep costs down, a company charges $5 for shipping within the city, $10 within the state but outside of the city, and $25 for out-of-state shipping.
During the last week 100 orders were shipped:
35 were within the city, 40 were in state but outside the city, and the rest were out of state.
What was the average amount charged during the week ? ____________
3. A student's grade will be determined by exam grades (each exam counts twice and there are three exams), HW average (counts once), and final exam (counts three times).
Find the average if the student has the following grades. _________________
Exam: 72, 85, 82
HW: 90
Final Exam: 85
4. A student takes four exams during the semester and a final exam. No other grades will be used to calculate the end-of-semester grade.
His four exam scores were 80, 74, 90, 60. If the final exam counts twice as much as a regular exam, what score does he have to make on the final exam to have an average of at least 80?
__________
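One way to check an answer to example 4 is to solve the weighted-average condition for the unknown final score. The sketch below assumes each regular exam has weight 1 and the final has weight 2; the function name `required_final_score` is made up for this illustration.

```python
def required_final_score(exam_scores, final_weight, target_average):
    """Smallest final-exam score that brings the weighted average up to the target.

    Each regular exam has weight 1 (an assumption matching example 4);
    the final exam has weight `final_weight`.
    """
    total_weight = len(exam_scores) + final_weight
    # Total weighted points needed, minus the points already earned.
    points_needed = target_average * total_weight - sum(exam_scores)
    return points_needed / final_weight


needed = required_final_score([80, 74, 90, 60], final_weight=2, target_average=80)
print(needed)  # 88.0
```

The same function can be reused with a different weight or target to see how the required score changes.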
5. A student is currently enrolled in a class in which he has a 20% chance of getting an A, a 40% chance of getting a B, a 25% chance of getting a C, and a 10% chance of getting a D. What grade do you expect him to get? Explain your answer.
6. You want to find the average height of a person in this classroom.
Process:
7. You want to find the average height of a college student at Angelo State University.
Process:
Population vs. Sample
population: you have everybody (all data is available in usable form) that will be used to calculate values (example 6).
sample: only a small portion of the data is available and you will use it to find ("estimate") values for the entire population (example 7 and example 8).
ex 8. You are asked to determine the number of type C organisms in a pond.
Process:
ex 9. A test is given to 30 students and the class average is calculated. Is this a population or sample mean?
A recent graduate has two job offers to choose from. Each company advertises its average salary.
a) average of $40,000: small company with salaries
20K, 20K, 20K, 20K, 20K, 20K, 35K, 35K, 95K, 115K
b) average of $35,000: company with salaries
15K, 15K, 30K, 30K, 35K, 35K, 40K, 40K, 55K, 55K
Which one would be the best fit for you and why?
We talked about one of several measures of central tendency.
1. Arithmetic mean
Given values $x_1, x_2, x_3$ we define the arithmetic average, $\bar{x}$, by
$$\bar{x} = \frac{x_1 + x_2 + x_3}{3}$$
or in general
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{\sum x_i}{n}$$
In the event that the data set is large and can be grouped together in different classes, we can use the following formula.
Individually listed:
$$\bar{x} = \frac{x_1 + x_2 + x_3 + x_4 + \dots + x_n}{n}$$
Given frequencies, we can write this as
$$\bar{x} = \frac{x_1 f_1 + x_2 f_2 + x_3 f_3 + \dots + x_k f_k}{n} = \frac{\sum x_i f_i}{n}$$
$f_1$ represents the number of times that value $x_1$ occurs, $f_2$ represents the number of times that $x_2$ occurs, ..., and $n$ represents the total number of data values (sum of the frequencies).
ex. A large class of 100 students has met for five days. Here is a description of the number of times a student has been absent. What is the average number of days that a student has missed?
0 absences → 24 students
1 absence → 32 students
2 absences → 35 students
3 absences → 7 students
4 absences → 2 students
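The frequency form of the mean can be checked on the absence data with a few lines of Python. This is a minimal sketch; the function name `frequency_mean` and the dictionary layout (value mapped to its frequency) are choices made for this illustration.

```python
def frequency_mean(frequencies):
    """Mean of grouped data given as {value: frequency}."""
    n = sum(frequencies.values())  # total number of data values
    total = sum(value * count for value, count in frequencies.items())
    return total / n


# Number of absences -> number of students, from the example above.
absences = {0: 24, 1: 32, 2: 35, 3: 7, 4: 2}
mean_absences = frequency_mean(absences)
print(mean_absences)  # 1.31
```

Listing all 100 values individually and dividing their sum by 100 would give the same result; the frequencies only shorten the addition.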
MODE
value (or expression) that appears with the largest frequency (response that is given the most number of times)
Examples:
1. 30 people are asked for their favorite color:
20 said blue, 8 said red, 1 said black, and 1 said yellow.
What was the modal response? _______
2. The number of times that a driver has been pulled over during the last five years:
0, 1, 0, 3, 0, 2, 1, 1, 2, 1, 1
3. A class of 40 students is asked the number of times they ate out during a five-day period.
1 student → 0 times
20 students → 1 time
15 students → 2 times
3 students → 3 times
1 student → 4 times
NOTE: The mode must be one of the given values; the arithmetic mean need not be.
MEDIAN
the middle value of a given set of data once the values are arranged in order. If no middle value exists, then we choose the average of the two middle values.
1) Salary within a six-member department, in thousands of dollars:
20, 40, 30, 20, 30, 15
2) The number of miles that a person walks per month during a 7-month period:
100, 40, 200, 200, 120, 80, 200
3) The median value of a home in San Angelo is said to be $67,000.
If there were 36,000 homes in San Angelo, explain what the median value means.
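The mode and median of the small data sets above can be checked with Python's standard `statistics` module. This is only a sketch for verifying hand calculations; the variable names are made up for this illustration.

```python
import statistics

# Mode example 2: times a driver was pulled over.
stops = [0, 1, 0, 3, 0, 2, 1, 1, 2, 1, 1]
# Median example 1: salaries in thousands of dollars (even count).
salaries = [20, 40, 30, 20, 30, 15]
# Median example 2: miles walked per month (odd count).
miles = [100, 40, 200, 200, 120, 80, 200]

mode_stops = statistics.mode(stops)          # most frequent value
median_salary = statistics.median(salaries)  # average of the two middle values
median_miles = statistics.median(miles)      # the single middle value

print(mode_stops, median_salary, median_miles)
```

Note that `statistics.median` sorts the data internally, so the lists can be entered in any order.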
Range:
The difference between the largest and smallest values.
1) Ten people weighed themselves one week after starting a weight loss program.
The following values indicate the amount lost in lbs.
½, 2, 1, ½, 3, 1, 1, 2, 3¼, -½
What is the range of this data? → _____________
2) A person is dealt a 5-card hand. The player counts how many diamonds are in his hand.
If there are five players sitting in, then what are the smallest and largest possible ranges?
Give me an example of the smallest range: ________________________
Give me an example of the largest possible range: _________________________
Additional Examples:
A class of ten students is asked the number of times that they ate out during this past week.
Here are their responses:
0, 1, 0, 3, 1, 1, 2, 4, 3, 5
What is the arithmetic mean ?
What is the mode ?
What is the median ?
A quiz is given to five students. The grades were all identical: 85, 85, 85, 85, 85
What was the arithmetic mean ? __________
Think of the distance of each value from the calculated mean:
What about the average distance from the mean ? _______________
What is the range ?
Define another term.
Find an average of the distances from the arithmetic mean – average deviation.
Examples:
A company believes that on the average a bottle of pills contains 50 pills.
Five bottles are selected at random and the pills are counted; 50, 48, 50, 50, 52
Is the average 50 ? _______
Describe the deviation of each bottle ( # of pills in each bottle) and then find an average of this deviation.
Problem?
A second example:
Three questions are given in class to a group of five students. The data below represents the number of problems
missed.
1, 0, 0, 3, 1
Find the arithmetic mean : __________________________
Find the average deviation : ___________
Problem ?
Instead of finding the average of the deviations, why not find the average of the squared deviations?
Example 1:
Example 2:
Average Squared Deviation:
$$\text{Asd} = \frac{\sum (x_i - \bar{x})^2}{n}$$
and if the data is in terms of frequencies →
$$\text{Asd} = \frac{\sum (x_i - \bar{x})^2 f_i}{n}$$
This will eliminate the problem we had.
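A short computation makes the issue concrete: deviations from the mean carry signs, so positive and negative deviations cancel and their average comes out to zero, while squared deviations cannot cancel. The sketch below uses the pill-bottle data and the missed-problems data from the two examples; the function names are made up for this illustration.

```python
def mean(values):
    return sum(values) / len(values)


def average_deviation(values):
    """Average of the signed deviations from the mean."""
    m = mean(values)
    return sum(x - m for x in values) / len(values)


def average_squared_deviation(values):
    """Average of the squared deviations from the mean (divides by n)."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / len(values)


pills = [50, 48, 50, 50, 52]   # pills counted in five bottles
missed = [1, 0, 0, 3, 1]       # problems missed by five students

print(average_deviation(pills), average_squared_deviation(pills))
print(average_deviation(missed), average_squared_deviation(missed))
```

For both data sets the average signed deviation is 0, while the average squared deviations are 1.6 and 1.2, so only the squared version reflects how spread out the values are.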
This gives two more ways to look at the distribution of data values.
Variance:
$$\text{variance} = \frac{\sum (x_i - \bar{x})^2 f_i}{n}$$
Standard deviation:
$$\text{standard deviation} = \sqrt{\text{variance}}$$
We use the formulas above if the entire population is known.
In most cases we do not know the entire population – so we use sample variance and sample standard deviation.
Sample Variance:
$$\text{sample variance} = \frac{\sum (x_i - \bar{x})^2 f_i}{n - 1}$$
Sample standard deviation:
$$s = \sqrt{\text{sample variance}}$$
Another Example
A class of ten students meets five times per week. The following represents the number of times that each student
attended during the week. Find the arithmetic mean and the average of the deviations.
0, 1, 0, 3, 1, 1, 2, 4, 3, 5
A study is done to determine the number of accidents that a student has been involved in.
A sample of 50 students is taken, with the following results:
6 have been in zero accidents
29 have been in 1 accident
12 have been in 2 accidents
3 have been in 3 accidents
None have been in more than 3.
Find the sample standard deviation.
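A hand calculation for the accident data can be checked with the sketch below, which applies the sample formulas (dividing by n - 1) to data given as frequencies. The function name `sample_std_from_frequencies` and the dictionary layout are assumptions made for this illustration.

```python
import math


def sample_std_from_frequencies(frequencies):
    """Sample standard deviation of grouped data given as {value: frequency}."""
    n = sum(frequencies.values())
    mean = sum(x * f for x, f in frequencies.items()) / n
    squared_deviations = sum((x - mean) ** 2 * f for x, f in frequencies.items())
    # Divide by n - 1 because the 50 students are a sample, not the population.
    return math.sqrt(squared_deviations / (n - 1))


# Number of accidents -> number of students.
accidents = {0: 6, 1: 29, 2: 12, 3: 3}
s = sample_std_from_frequencies(accidents)
print(round(s, 3))
```

With these frequencies the mean is 1.24 accidents and the sample standard deviation comes out to roughly 0.74.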
Histograms are graphs of data in which rectangles are used to represent the frequency of each value ( later: the probability )
The rectangles are of width one unit, centered at each value.
ex. 1, 1, 1, 2, 2, 3, 3, 3, 3, 3
[Histogram: frequency f on the vertical axis (1 to 5), value x on the horizontal axis (1, 2, 3)]
Use the following histograms to find which data has the largest mean and which one appears to have the largest standard
deviation.
a) [histogram not reproduced]
b) [histogram not reproduced]
Can you find the standard deviation of each group of data values?
c) [histogram not reproduced]
Discrete Data vs. Continuous Data
Discrete Data:
You meet five traffic lights. How many were red?
Ten employees were hired five years ago. How many are still with the company?
A company serviced 20,000 accounts this year. How many were from female customers?
Continuous Data:
A company plans to test how much sugar has to be added to lemonade before a customer is satisfied.
Provide exact amounts that will please each of five customers. Assume there is a minimum of zero sugar added and a maximum of 5 teaspoons.
A sleep deprivation test will measure the exact amount of time that five individuals will be able to stay awake.
List the possible amounts. Assume that every individual managed to stay awake at least 10 hours and at most 48 hours.
A room has 10 lights. The exact lifespan of each is found (when the light ceases to work). Write down each possible lifespan.
Normal distribution refers to data that can be modeled as continuous data.
We get the following curve to represent it: a normal curve.
When data has a normal distribution-
______________________________________
A normal curve has a high point – and it occurs at the mean µ. The area under the curve will add up to 1 square unit.
We know a few curves whose areas are easily found – this is not one of them.
Normal Distributions
Normal Curves: mean (µ), standard deviation (σ), inflection points, area under a curve
_______________________________________
Standard normal curve
If the normal curve has µ = 0 and variance = standard deviation = 1, we call it a standard normal curve.
Do the following values represent standard normal curves? Why or why not?
a) µ = 0, σ = 2
b) µ = 0, σ = -3 (?)
c) µ = -1, σ = 1
Do they represent normal curves ? ____________
How many normal curves could you create ? _________
We use tables to find area under a curve. Notice that half of the area is to the right of the mean, half to the left (symmetric ).
We have a function that expresses the curve and there are ways of finding the area under a curve.
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$
ex. f(x) = 4. Find the area under the curve between x = -2 and x = 2.
ex. f(x) = 2x. Find the area under the curve between x = 0 and x = 4.
ex. f(x) = x^2. Find the area under the curve between x = -1 and x = 2.
It is not as easy to find the area under a normal curve.
Consider the following functions: (see page 610)
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$
This is the function that we would try to work with when finding the area under a normal curve. You can see the problem that we would have.
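One way to see both the easy cases and the hard case is to approximate each area numerically. The sketch below uses a midpoint Riemann sum, which is an assumption of this illustration and not necessarily the technique behind the printed table; the function names `midpoint_area` and `standard_normal` are made up here.

```python
import math


def midpoint_area(f, a, b, steps=10_000):
    """Approximate the area under f between a and b with a midpoint Riemann sum."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


def standard_normal(x):
    """Normal curve with mean 0 and standard deviation 1."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


area_constant = midpoint_area(lambda x: 4, -2, 2)       # rectangle
area_line = midpoint_area(lambda x: 2 * x, 0, 4)        # triangle
area_parabola = midpoint_area(lambda x: x ** 2, -1, 2)  # needs calculus by hand
area_normal = midpoint_area(standard_normal, -1, 1)     # no simple formula

print(area_constant, area_line, area_parabola, area_normal)
```

The first three results come out very close to 16, 16, and 3, which can be confirmed by geometry or calculus. The last is about 0.6827, a value that has no simple closed form, which is why a table is used instead.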
A table is constructed for a standard normal curve by using techniques that are for the moment out of reach.
If we are given a table with values that represent areas under a standard normal curve
we use the following formula and this table to find areas under a normal curve.
$$z = \frac{x - \mu}{\sigma}$$
Table
z      0     1     2     3     4   ....   9
----------------------------------------------
:
:
2.1
ex. Find the area to the left of -2.00 under a standard normal curve.
ex. Find the area to the left of 12 under a normal curve with µ = 20 and variance = 16.
ex. Find the area to the right of 190 under a normal curve with µ = 200 and variance = 81.
ex. Find the area between 20 and 30 under a normal curve with mean = 28 and variance = 25.
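Table lookups for these four exercises can be cross-checked in Python. The sketch below converts each x to a z-score (subtract the mean, divide by the standard deviation, which is the square root of the variance) and then uses `math.erf` from the standard library in place of the printed table. Substituting the error function for the table is a choice of this illustration, and the function names are made up.

```python
import math


def standard_normal_cdf(z):
    """Area to the left of z under the standard normal curve."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def area_left(x, mean, variance):
    """Area to the left of x under a normal curve with the given mean and variance."""
    z = (x - mean) / math.sqrt(variance)  # standard deviation = sqrt(variance)
    return standard_normal_cdf(z)


left_of_minus_2 = standard_normal_cdf(-2.00)
left_of_12 = area_left(12, mean=20, variance=16)
right_of_190 = 1 - area_left(190, mean=200, variance=81)
between_20_and_30 = area_left(30, mean=28, variance=25) - area_left(20, mean=28, variance=25)

for area in (left_of_minus_2, left_of_12, right_of_190, between_20_and_30):
    print(round(area, 4))
```

The first two areas agree because x = 12 with mean 20 and standard deviation 4 also gives z = -2. An area to the right is found as 1 minus the area to the left, and an area between two values as the difference of two left areas.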