Stat 1793
Chapter 3 – Statistics for Describing, Exploring and Comparing Data
In this chapter we look at descriptive statistics: numbers calculated from sample data
to summarize or describe the data. The set of methods used to draw conclusions about
populations is called inferential statistics; we will look at these towards the end of
the course.
What types of descriptive measures will we look at? Basically three:
- Measures of central tendency (mean, median, mode, midrange)
- Measures of variation (range, variance, standard deviation, etc.)
- Measures of relative standing (z-scores, percentiles)
Section 3-1 Measures of Centre
The word average is used in phrases common in everyday conversation, for example,
batting average, or the average life expectancy of a battery or a human being. The
word average is derived from the French word avarie, which refers to the money that
shippers contributed to help compensate for losses suffered by other shippers whose
cargo did not arrive safely; i.e., the losses were shared, with everyone contributing an
average amount.
Here, we plan on measuring the centre of the distribution of data in four ways:
1. Mean
- This is found by adding the items in a set and dividing by the number of items;
  it is also known as the arithmetic mean or average.
- It is the most common measure of central tendency. The population mean is
  denoted μ (mu), and the sample mean is denoted X̄ (x-bar). The mathematical
  formula for the mean can be written as

  $$\mu = \frac{\sum_{i=1}^{N} X_i}{N},$$

  where N is the number of elements in the population, or as

  $$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n},$$

  where n is the sample size.
Example: Find the mean of the following measurements: 2, 5, 7, 10, 11, 13
n = 6, $\sum X_i = 48$, $\bar{X} = 48/6 = 8$
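As a quick check of the arithmetic, the sample-mean formula can be sketched in Python. The function name `sample_mean` is our own illustration, not something from the course; Python's built-in `statistics.mean` gives the same result.

```python
def sample_mean(data):
    """Arithmetic mean: the sum of the observations divided by their count n."""
    if not data:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(data) / len(data)

measurements = [2, 5, 7, 10, 11, 13]
print(sum(measurements), len(measurements))  # 48 6
print(sample_mean(measurements))             # 8.0
```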
2. Median
- The middle value of a data set arranged in numerical order when there is an
  odd number of observations. With an even number of observations, the
  median equals the average of the 2 middle values. For example, to find the
  median of 5, 3, 2, 7, 4, we first order the data: 2, 3, 4, 5, 7; the median
  is x = 4. With a data set of an even number of measurements,
  10, 8, 13, 14, 9, 8, we again order the data: 8, 8, 9, 10, 13, 14, and here the
  median is the average of the two middle values, (9 + 10)/2 = 9.5.
- The word is derived from the Latin word medius, which means middle.
  Note that roughly 50% of the observations in the data set are smaller than the
  median and 50% are greater than the median.
- In certain situations the median provides the quickest and most
  economical way to locate the centre of a distribution. For example,
  suppose 10 000 lightbulbs are installed in a factory. The easiest way to
  find a central number describing the life expectancy of the bulbs is to
  note how much time elapses before exactly 50% of them must be replaced
  (the mean, by contrast, could not be computed until every bulb had failed).
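The two cases of the median rule (odd n versus even n) can be sketched as a small Python function. The name `median` here is our own illustrative helper; Python's `statistics.median` applies the same rule.

```python
def median(data):
    """Middle value of the ordered data; the average of the two middle values when n is even."""
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: a single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the two middle values

print(median([5, 3, 2, 7, 4]))        # 4
print(median([10, 8, 13, 14, 9, 8]))  # 9.5
```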
3. Mode
- The most frequently occurring value in the data set. With the data set
  10, 8, 13, 14, 9, 8, the mode equals 8. With the data set 5, 3, 2, 7, 4, there is no
  mode, because no value occurs more than once. When two values in a data set occur
  with the same greatest frequency, we call the data set bimodal.
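A sketch of the mode in Python, assuming the convention used in these notes that a data set in which every value occurs once has no mode. The helper name `modes` and the third data set are hypothetical illustrations. Note that Python's `statistics.multimode` follows a different convention: when no value repeats, it returns every value rather than an empty list.

```python
from collections import Counter

def modes(data):
    """Sorted list of the most frequent values; [] when no value repeats (no mode)."""
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs exactly once, so by the notes' convention there is no mode
    return sorted(value for value, count in counts.items() if count == top)

print(modes([10, 8, 13, 14, 9, 8]))  # [8]
print(modes([5, 3, 2, 7, 4]))        # []
print(modes([1, 1, 2, 2, 3]))        # [1, 2]  (hypothetical bimodal data)
```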
4. Midrange
- The midrange is located halfway between the maximum and minimum values and is
  found by taking the average of those two values.
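The midrange needs only the two extreme values. Here is a minimal Python sketch, using the data set from the mean example; the function name `midrange` is our own illustration. Because it depends only on the minimum and maximum, a single extreme observation shifts it directly.

```python
def midrange(data):
    """Halfway point between the smallest and largest observations."""
    if not data:
        raise ValueError("the midrange of an empty data set is undefined")
    return (min(data) + max(data)) / 2

print(midrange([2, 5, 7, 10, 11, 13]))  # 7.5
```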
Find the mean, median, mode and midrange for the cholesterol example:
A doctor testing cholesterol levels for 20 young patients found the following readings
(mg/dL):
210, 209, 212, 208, 217, 207, 210, 203, 208, 210,
210, 199, 215, 221, 213, 218, 202, 218, 200, 214
Total: 4204
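Ordered (read down each column):
199  208  210  215
200  208  210  217
202  209  212  218
203  210  213  218
207  210  214  221
$\sum X_i = 4204$, $\bar{X} = \frac{4204}{20} = 210.2$
Median = 210 (the average of the 10th and 11th ordered values, both 210)
Mode = 210 (it occurs four times)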
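The hand calculations for the cholesterol example can be checked with Python's standard `statistics` module (`mean`, `median`, `multimode`); the midrange is computed directly from `min` and `max`, since the module has no midrange function. The variable names are our own.

```python
import statistics

# Cholesterol readings for the 20 patients, as listed above
readings = [210, 209, 212, 208, 217, 207, 210, 203, 208, 210,
            210, 199, 215, 221, 213, 218, 202, 218, 200, 214]

x_bar = statistics.mean(readings)              # sum / n
med = statistics.median(readings)              # average of the 10th and 11th ordered values
mode_values = statistics.multimode(readings)   # most frequent value(s)
mid = (min(readings) + max(readings)) / 2      # midrange: (199 + 221) / 2

print(sum(readings), len(readings))  # 4204 20
print(x_bar, med, mode_values, mid)  # 210.2 210.0 [210] 210.0
```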
Both the median and the mean are good measures of central tendency, but the median is less
sensitive to extreme values. If the distribution is symmetric and unimodal, all four measures
are equal. In a distribution that is positively skewed (tail to the right), the median is
usually less than the mean (example: household incomes, where a few very large incomes,
such as that of an Irving household, pull the mean upward). In a negatively skewed
distribution, the median is usually greater than the mean.
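To illustrate how one extreme value affects the mean far more than the median, here is a small Python sketch. The income figures are hypothetical, invented only for this illustration, and are not data from the course.

```python
import statistics

# Hypothetical household incomes in thousands of dollars (illustrative only, not real data);
# the last value is one very large income that creates a long right tail.
incomes = [30, 35, 40, 45, 50, 1000]
typical = incomes[:-1]  # the same data with the extreme value dropped

print(statistics.mean(incomes), statistics.median(incomes))  # 200 42.5
print(statistics.mean(typical), statistics.median(typical))  # 40 40
```

Dropping the one extreme income moves the mean from 200 to 40, while the median moves only from 42.5 to 40. With the long right tail included, the median is less than the mean, as described above.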
The mean is of greater importance in statistical inference, because $\bar{X}$ has certain
properties that make for more powerful and robust results.
Finding the Mean from a Frequency Distribution

$$\bar{X} = \frac{\sum f_i x_i}{\sum f_i},$$

where $f_i$ is the frequency of the ith category and $x_i$ is the midpoint of the
ith category.
Page 90, #28.
Temperature (°F)   Frequency (f_i)   Midpoint (x_i)   f_i x_i
96.5-96.8                1               96.65           96.65
96.9-97.2                8               97.05          776.40
97.3-97.6               14               97.45         1364.3
97.7-98.0               22               97.85         2152.7
98.1-98.4               19               98.25         1866.75
98.5-98.8               32               98.65         3156.8
98.9-99.2                6               99.05          594.3
99.3-99.6                4               99.45          397.8
Total                  106                            10405.7

$\bar{X} = \frac{10405.7}{106} \approx 98.17$
This is close to the 98.2F found using the original data, and it appears to be significantly
less than the commonly assumed population mean 98.6.
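The frequency-distribution formula can be sketched in Python using the class limits and frequencies from the temperature table. The function name `grouped_mean` and the (lower, upper, frequency) tuple layout are our own choices. The result is only an approximation of the true sample mean, because every observation in a class is treated as if it equalled the class midpoint.

```python
# (lower limit, upper limit, frequency) for each temperature class, from the table above
classes = [
    (96.5, 96.8, 1), (96.9, 97.2, 8), (97.3, 97.6, 14), (97.7, 98.0, 22),
    (98.1, 98.4, 19), (98.5, 98.8, 32), (98.9, 99.2, 6), (99.3, 99.6, 4),
]

def grouped_mean(classes):
    """Approximate mean from a frequency table: sum(f_i * x_i) / sum(f_i), x_i = class midpoint."""
    total_f = sum(f for _, _, f in classes)
    total_fx = sum(f * (low + high) / 2 for low, high, f in classes)
    return total_fx / total_f, total_f, total_fx

x_bar, n, sum_fx = grouped_mean(classes)
print(n)                 # 106
print(round(sum_fx, 2))  # 10405.7
print(round(x_bar, 2))   # 98.17
```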
Also do 3-1 #2, 4, 12, 18