Download 2 - heatherchafe

2.4 Numerical Techniques
2.4.1 Use the summation notation representation to sum numbers
The Greek letter $\Sigma$ (capital sigma) is used to denote the summation of a selection of numbers.
If we have a quantitative data set consisting of $x_1, x_2, x_3, \ldots, x_n$, this means that $x_1$ is the first
measurement in the data set, $x_2$ is the second, and $x_n$ is the nth and last measurement in the
group. If we have five measurements in a set and they are
$x_1 = 5$, $x_2 = 3$, $x_3 = 8$, $x_4 = 5$, and $x_5 = 4$, then in order to add up this set we use the symbol
sigma ($\Sigma$) as
$$\sum x = x_1 + x_2 + x_3 + x_4 + x_5 = 5 + 3 + 8 + 5 + 4 = 25$$
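The same summation can be checked with a few lines of Python. This is only a sketch; the variable names `x` and `total` are chosen here for illustration and are not part of the notes.

```python
# The five measurements x1..x5 from the example above.
x = [5, 3, 8, 5, 4]

# "Sigma x" means: add up every measurement in the set.
total = sum(x)

print(total)  # 25
```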
2.4.2 Measures of Central Tendency
2.4.2.1/2.4.2.2 Discuss the roles of mean, median, midrange and mode as ways of
measuring central tendency in data and calculate them (readings: pg 59: “The Population
Mean”, pg 64-65: “The Median”, and pg 65-66: “The Mode”)
Measures of Central Tendency:
The purpose of a measure of central tendency is to determine the "center" of your data
values or possibly the "most typical" data value. Some measures of central tendency are
the mean, median, midrange, & mode.
The mean:
The mean is the most popular measure of central tendency. It is merely the average of
the data.
The mean is equal to the sum of the data values divided by the number of data values.
Mathematically the mean is given as follows:
$$\text{mean} = \bar{x} = \frac{\sum x}{N}$$
where N (or n) is the number of values in the data set
Take for example: The number of accidents reported over a particular 5 month period
was: 6, 9, 7, 23, 5
So the mean of this sample is:
$$\bar{x} = \frac{\sum x}{N} = \frac{6 + 9 + 7 + 23 + 5}{5} = \frac{50}{5} = 10$$
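As a quick check of the calculation above, here is a minimal Python sketch. The function name `mean` and the variable names are illustrative choices, not something taken from the textbook.

```python
def mean(values):
    """Sum of the data values divided by the number of data values."""
    return sum(values) / len(values)

# Accidents reported over the 5 month period.
accidents = [6, 9, 7, 23, 5]
accident_mean = mean(accidents)

print(accident_mean)  # 10.0
```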
Special note:
When we talk of mean, we can have a population mean or sample mean:
the symbol for the population mean is the Greek letter $\mu$ (mu)
the symbol for the sample mean is $\bar{x}$
The median:
The median of a set of data is the value in the center of the data values when they are
arranged from smallest to largest. Consequently, it is in the center of the ordered array.
Using the accident data set, the median (Md) is found by first constructing an ordered
array:
5, 6, 7, 9, 23
so the median here is 7.
If there is an even number of data values, such as 3, 8, 12, 14, then Md is the average of the two
center values. Thus the median for these numbers is (8 + 12)/2 = 10.
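Both cases (an odd and an even number of values) can be sketched in Python as follows. The function name `median` is an illustrative choice; the logic simply follows the description above: sort the data, then take the center value or the average of the two center values.

```python
def median(values):
    """Center value of the ordered array (average of the two center values if even)."""
    ordered = sorted(values)          # build the ordered array first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: a single center value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two

print(median([6, 9, 7, 23, 5]))  # 7
print(median([3, 8, 12, 14]))    # 10.0
```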
Note: In our accident data set, one of the five values (23) is much larger than the
remaining values. It is what we call an outlier (an out-of-whack data value). Notice
that the median (Md = 7) was much less affected by this value than was the mean
($\bar{x}$ = 10). When dealing with data that are likely to contain outliers (for example,
personal incomes or prices of residential housing), the median usually is preferred to
the mean as a measure of central tendency, since the median provides a more "typical" or
"representative" value for these situations.
The Midrange:
Although less popular than the mean and median, the midrange (Mr) provides an easy-to-grasp
measure of central tendency. Notice that it is also severely affected by the presence
of an outlier in the data. The midrange is:
$$\text{Midrange} = \frac{(\text{smallest value}) + (\text{largest value})}{2}$$
For our accident data: $\text{Midrange} = \frac{5 + 23}{2} = \frac{28}{2} = 14.0$
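The midrange formula translates directly into Python. This is a small sketch with an illustrative function name, `midrange`.

```python
def midrange(values):
    """Average of the smallest and largest values in the data."""
    return (min(values) + max(values)) / 2

accidents = [6, 9, 7, 23, 5]
accident_midrange = midrange(accidents)

print(accident_midrange)  # 14.0
```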
The mode:
The mode of a data set is the value that occurs more than once and the most often. The
mode is not always a measure of central tendency; this value need not occur in the center
of your data. One situation in which the mode is the value of interest is the
manufacturing of clothing. The most common hat size is what you would like to know,
not the average hat size.
For our accident data there is no mode since all values occur only once but let’s consider
this data set: 4, 8, 7, 6, 9, 8, 10, 5, 8
Here 8 occurs three times which is most often so Mode = 8.
Note: There can be more than one mode in a set of data.
For example: 1, 1, 3, 5, 7, 7, 9
There are two modes for the data above (1 and 7).
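The three situations described above (no mode, one mode, more than one mode) can be sketched in Python. The function name `modes` is an illustrative choice, and returning an empty list for "no mode" is an assumption made here to match the definition in these notes (a mode must occur more than once).

```python
from collections import Counter

def modes(values):
    """Values that occur most often, provided they occur more than once."""
    counts = Counter(values)              # how many times each value occurs
    highest = max(counts.values())
    if highest < 2:                       # every value occurs only once: no mode
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([6, 9, 7, 23, 5]))              # []
print(modes([4, 8, 7, 6, 9, 8, 10, 5, 8]))  # [8]
print(modes([1, 1, 3, 5, 7, 7, 9]))         # [1, 7]
```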
Example: A sample of ten was taken to determine the typical completion time (in
months) for the construction of a particular model of Brockwood Homes:
4.1, 3.2, 2.8, 2.6, 3.7, 3.2, 9.4, 2.5, 3.5, 3.8
Find the
a. mean
b. median
c. midrange
d. mode
2.4.3 Measures of variation (pg 74: “Measures of Dispersion: The Range”, pg 77-78:
“Variance and Standard Deviation”)
Measures of Variation:
Variability:
Variability provides a quantitative measure of the degree to which scores in a
distribution are spread out or clustered together. The purpose of measuring
variability is to determine how spread out a distribution of scores is. Are the scores all
clustered together, or are they scattered over a wide range of values?
The range:
The range is the numerical difference between the largest value and the smallest value in
a data set. If the number of accidents reported over a 5 month period was 6, 9, 7, 23, and
5, the range for this data set is:
range = (largest value) - (smallest value) = 23 - 5 = 18
The range is a rather crude measure of variation, but it is an easy number to calculate and
contains valuable information for many situations. Stock reports generally give prices in
terms of ranges, citing the high and low prices of the day.
Note: The value of the range is strongly influenced by an outlier in the data set.
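A short Python sketch shows both the range calculation and how much a single extreme value changes it. The function name `data_range` is an illustrative choice, and dropping the value 23 is done here only to show the effect.

```python
def data_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

accidents = [6, 9, 7, 23, 5]
without_extreme = [6, 9, 7, 5]   # same data with the value 23 left out

print(data_range(accidents))        # 18
print(data_range(without_extreme))  # 4
```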
Standard deviation:
The standard deviation is the most commonly used and the most important measure
of variability. Standard deviation uses the mean of the distribution as a reference point
and measures variability by considering the distance between each score and the mean. It
determines whether the scores are generally near or far from the mean. That is, are
the scores clustered together or scattered? In simple terms, the standard deviation
approximates the average distance from the mean.
The standard deviation is the square root of the variance.
The standard deviation is a measure of the average distance between the values of
the data in the set and the mean. If the data points are all similar, then the standard
deviation will be low (closer to zero). If the data points are highly variable, then the
standard deviation is high (further from zero).
Calculating the variance and standard deviation given a set of numbers representing a
population. (Note: the formulas for a sample are different from those for a population.)
Step One: Calculate the mean of the population (represented as $\mu$).
Step Two: Calculate $(X - \mu)$ for each number (X represents a number from the
population).
Step Three: Square each difference from Step 2, i.e. $(X - \mu)^2$.
Step Four: Get the mean of the squares from Step 3, i.e. $\frac{\sum (X - \mu)^2}{N}$, where N
is the number of numbers in the data. This is your variance.
Step Five: If the question asked for the standard deviation of the population, then you
would just take the square root of your variance, i.e. $\sqrt{\frac{\sum (X - \mu)^2}{N}}$.
It is helpful to do the calculations in a chart such as the one used below to find variance
and standard deviation.
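One way to lay out such a chart is sketched below in Python, following Steps One to Five. It uses the accident data from earlier in these notes, treated as a whole population purely for illustration; the column layout and variable names are assumptions made here, not the chart from the original handout.

```python
import math

# Accident data from earlier, treated here as a population (an assumption).
accidents = [6, 9, 7, 23, 5]

n = len(accidents)
mu = sum(accidents) / n                      # Step One: population mean

print(f"{'X':>4} {'X - mu':>8} {'(X - mu)^2':>12}")
squares = []
for x in accidents:
    diff = x - mu                            # Step Two: difference from the mean
    sq = diff ** 2                           # Step Three: square the difference
    squares.append(sq)
    print(f"{x:>4} {diff:>8.1f} {sq:>12.1f}")

variance = sum(squares) / n                  # Step Four: mean of the squares
std_dev = math.sqrt(variance)                # Step Five: square root of the variance

print("variance =", variance)                      # variance = 44.0
print("standard deviation =", round(std_dev, 3))   # standard deviation = 6.633
```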
Example #1: A store sells the following numbers of TVs over a week.
3, 2, 5, 0, 7
a. Find the variance in the data.
b. Find the standard deviation.
2.4.3.2 Discuss the interpretation of standard deviation.
As mentioned above, the standard deviation is a good measure of dispersion, or how
spread out the data is. The bigger the standard deviation the more variation there is in the
data and the lower the standard deviation, the lower the variation in the data.
The standard deviation tells us, on average, how far a given number is from the mean.
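To see this interpretation in numbers, the sketch below compares two made-up data sets that share the same mean but differ in spread. The data and the names `tight` and `spread` are hypothetical. `statistics.pstdev` from Python's standard library computes the population standard deviation.

```python
import statistics

# Two hypothetical data sets with the same mean (10) but different spread.
tight = [9, 10, 11]
spread = [0, 10, 20]

sd_tight = statistics.pstdev(tight)    # population standard deviation
sd_spread = statistics.pstdev(spread)

print(round(sd_tight, 3))    # 0.816
print(round(sd_spread, 3))   # 8.165
print(sd_spread > sd_tight)  # True
```

The data set whose values sit farther from the mean has the larger standard deviation, even though both sets have the same mean.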
2.4.3.3 Discuss Chebyshev’s Theorem (pg 82: “Chebyshev’s Theorem”)
For any set of observations, the proportion of the values that lie within k standard
deviations of the mean is at least $1 - \frac{1}{k^2}$, where k is any constant greater than 1.
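The bound in the theorem is easy to evaluate for any k. The sketch below uses k = 3 as an illustration; the function name `chebyshev_lower_bound` is a made-up name for this example.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

# With k = 3, at least 1 - 1/9 of the values lie within 3 standard deviations.
print(round(chebyshev_lower_bound(3), 3))  # 0.889
```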
Examples:
1. What proportion of data will lie within 2 standard deviations of the mean?
2. What proportion of data will lie within 1.4 standard deviations of the mean?
Recommended extra problems: page 84, #49 and #50
2.4.4 Measures of Position
2.4.4.2/2.4.4.2 Define and calculate percentile and quartile. (pg 97-99: “Quartiles,
Deciles, and Percentiles”; deciles can be omitted, though!)
Percentiles divide a set of numbers into 100 equal parts. For example, if your GPA was in
the 66th percentile, then 66 percent of the students had a lower GPA.
A formula can be applied to determine the location in a list of numbers for a given
percentile.
$$L_p = (n + 1)\frac{P}{100}$$
$L_p$ represents the location of the desired percentile
n is the number of observations (numbers)
P is the desired percentile.
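The location formula can be sketched in Python as follows. The function name `percentile_location` and the choice of n = 9 are illustrative only. Note that the result is a position in the ordered list (counting from 1), not the percentile value itself.

```python
def percentile_location(n, p):
    """Location L_p of the p-th percentile in an ordered list of n observations."""
    return (n + 1) * p / 100

# With 9 observations, the 50th percentile sits at position 5 of the ordered list.
print(percentile_location(9, 50))  # 5.0

# The 25th percentile falls halfway between positions 2 and 3.
print(percentile_location(9, 25))  # 2.5
```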
Examples:
1. Use the following data to answer the questions.
1, 3, 4, 5, 6, 7, 7
a. Find the 25th percentile of the above data.
b. Find the 50th percentile of the above data.
c. Find the 75th percentile of the above data.
Note: Percentile problems do not always work out perfectly! This is how you do
them if your $L_p$ does not work out to a whole number.
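A common way to handle a location that is not a whole number is to move the fractional part of the way from the value just below the location to the value just above it (linear interpolation). The sketch below assumes that approach, which may differ in detail from the method worked in class. The function name `percentile`, the data set `scores`, and the handling of locations that fall outside the list are all assumptions made for illustration.

```python
import math

def percentile(values, p):
    """p-th percentile using L_p = (n + 1) * P / 100 with linear interpolation."""
    ordered = sorted(values)               # locations refer to the ordered data
    n = len(ordered)
    location = (n + 1) * p / 100
    lower = math.floor(location)           # whole-number part of the location
    fraction = location - lower            # fractional part of the location
    if lower < 1:                          # assumption: clamp to the smallest value
        return ordered[0]
    if lower >= n:                         # assumption: clamp to the largest value
        return ordered[-1]
    below = ordered[lower - 1]             # value at position `lower` (1-based)
    above = ordered[lower]                 # value at the next position
    return below + fraction * (above - below)

# Hypothetical data set with 9 observations.
scores = [10, 20, 30, 40, 50, 60, 70, 80, 90]

print(percentile(scores, 50))  # 50.0  (location 5, a whole number)
print(percentile(scores, 25))  # 25.0  (location 2.5, halfway between 20 and 30)
```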
#2. Use the data below to answer the following questions.
20, 4, 7, 22, 11, 14, 1, 8
a. Find the 36th percentile.
b. Find the 62nd percentile.
c. Find the 83rd percentile.
Finding quartiles
Quartiles divide a set of observations (numbers) into four equal parts (quarters).
The first quartile, usually labeled $Q_1$, is the value below which 25 percent of the
observations occur. This would be the 25th percentile as well.
The second quartile, $Q_2$, is the value below which 50 percent of the observations occur.
This is also the median. This would be the 50th percentile as well.
The third quartile, usually labeled $Q_3$, is the value below which 75 percent of the
observations occur. This would be the 75th percentile as well.
So if you are asked to calculate the first, second, or third quartile, just use the appropriate
percentile value in the formula $L_p = (n + 1)\frac{P}{100}$.
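For checking quartile answers, Python's standard library offers `statistics.quantiles`. With `n=4` and `method="exclusive"` it places the quartiles at positions based on (n + 1) in the ordered data and interpolates linearly between neighbouring values, which appears to line up with the location formula above. The data set `scores` below is hypothetical, and the result should be treated as a cross-check rather than the required method.

```python
import statistics

# Hypothetical data set with 9 observations.
scores = [10, 20, 30, 40, 50, 60, 70, 80, 90]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="exclusive")

print(q1, q2, q3)                       # 25.0 50.0 75.0
print(q2 == statistics.median(scores))  # True: the second quartile is the median
```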
Examples
3. Use the data set below to answer the following questions.
2, 5, 50, 11, 46, 16, 22, 36, 15, 8
a. Find the first quartile.
b. Find the third quartile.
c. Find the second quartile.
Math Worksheet #3
1. The data below represents the number of times an item is returned in a
store each day.
4, 4, 5, 6, 8, 10, 10, 15, 15, 17, 18, 20, 22
a. Find the mean.
b. Find the median.
c. Find the mode.
d. Find the midrange.
2. The standard deviation for class A is 2.4 and the standard deviation for
class B is 13.5. Which class has a greater variety of test scores?
3. If a set of data has a standard deviation of 10, what, on average, is the
distance of a given number from the mean?
4. What is the standard deviation if the variance is 5.8?
5. What is the variance if the standard deviation is 15.6?
6. Use the data below to answer the following questions.
2, 17, 18, 9, 9, 18
a. Calculate the mean.
b. Calculate the variance.
c. Calculate the standard deviation.
7. On average, the numbers in the data set in #6 fall within ___________
units of the mean.
8. A set of data has a standard deviation of 9.2. Does this set of data have
more or less variation than the data in #6? Why?
9. Why is the standard deviation a better measure of dispersion than the
range?
10. List four measures of central tendency and three measures of dispersion.
11. Use the data below to answer the following:
2, 8, 3, 10, 6, 17, 29, 40, 36
a. Find the 70th percentile.
b. Find the 27th percentile.
c. Find the third quartile.
d. Find the first quartile.
12. a. Lisa scored higher than 72% of her class. What would be her
percentile?
b. Janice scored lower than 11% of her class. What would be her percentile?
13. What percentage of data will lie within 1.56 standard deviations of the
mean? (Hint: Use Chebyshev’s Theorem)