Chapter 3
Descriptive Statistics: Numerical Methods
Measures of Location
The Mean (A.M., G.M., and H.M.)
The Median
The Mode
Percentiles
Quartiles
Summary Measures
Describing Data Numerically
• Center and Location: Mean, Weighted Mean, Median, Mode, Percentiles, Quartiles
• Variation: Range, Variance, Standard Deviation, Coefficient of Variation
Mean
• The mean is the average of the data values.
• It is the most common measure of central tendency.
• Mean = sum of values divided by the number of values.
[Figure: two dot plots on a 0 to 10 axis. In the first, Mean = 3. In the second, Mean = (1 + 2 + 3 + 4 + 10)/5 = 20/5 = 4.]
Mean
The mean (or average) is the basic measure of location or “central tendency” of the data.
• The sample mean $\bar{x}$ is a sample statistic.
• The population mean $\mu$ is a population parameter.
Mean
Sample mean (n = sample size):
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
Population mean (N = population size):
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$$
Example: College Class Size
We have the following sample of data for 5 college classes:
46 54 42 46 32
We use the notation x1, x2, x3, x4, and x5 to represent the number of students in each of the 5 classes:
x1 = 46, x2 = 54, x3 = 42, x4 = 46, x5 = 32
Thus we have:
$$\bar{x} = \frac{\sum x_i}{n} = \frac{x_1 + x_2 + x_3 + x_4 + x_5}{5} = \frac{46 + 54 + 42 + 46 + 32}{5} = \frac{220}{5} = 44$$
The average class size is 44 students.
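The class size calculation can be checked with a few lines of Python. This is a minimal sketch; the function name `sample_mean` is only an illustrative choice, not something from the slides.

```python
# Class sizes for the 5 college classes in the example above.
class_sizes = [46, 54, 42, 46, 32]


def sample_mean(values):
    """Return the sum of the values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty sample is undefined")
    return sum(values) / len(values)


print(sample_mean(class_sizes))  # 44.0
```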
Median
The median is the value in the middle when the data are arranged in ascending order (from smallest value to largest value).
a. For an odd number of observations the median is the middle value.
b. For an even number of observations the median is the average of the two middle values.
The College Class Size example
First, arrange the data in ascending order:
32 42 46 46 54
Notice that n = 5, an odd number. Thus the median is given by the middle (3rd) value.
The median class size is 46.
Median Starting Salary for a Sample of 12 Business School Graduates
A college placement office has obtained the following data for 12 recent graduates:
Graduate   Starting Salary   Graduate   Starting Salary
1          2850              7          2890
2          2950              8          3130
3          3050              9          2940
4          2880              10         3325
5          2755              11         2920
6          2710              12         2880
First we arrange the data in ascending order:
2710 2755 2850 2880 2880 2890 2920 2940 2950 3050 3130 3325
Notice that n = 12, an even number. Thus we take the average of the middle two observations, the 6th and 7th values (2890 and 2920):
$$\text{Median} = \frac{2890 + 2920}{2} = 2905$$
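The odd and even rules for the median can be sketched in Python as follows. The function name `median` and its structure are illustrative choices; the data are the class sizes and starting salaries from the two examples above.

```python
def median(values):
    """Median by the rule above: middle value (odd n) or mean of the two middle values (even n)."""
    if not values:
        raise ValueError("the median of an empty sample is undefined")
    ordered = sorted(values)  # arrange the data in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of observations: the middle value.
        return ordered[mid]
    # Even number of observations: average of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


class_sizes = [46, 54, 42, 46, 32]
salaries = [2850, 2950, 3050, 2880, 2755, 2710,
            2890, 3130, 2940, 3325, 2920, 2880]

print(median(class_sizes))  # 46
print(median(salaries))     # 2905.0
```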
Mode
The mode is the value of the observation that occurs with the greatest frequency.
• A measure of central tendency
• The value that occurs most often
• There may be no mode
• There may be several modes
[Figure: two dot plots. On a 0 to 14 axis, Mode = 5. On a 0 to 6 axis, No Mode.]
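Because a data set may have no mode or several modes, a helper that returns a list covers all cases. The sketch below assumes the convention stated in these slides that there is no mode when no value appears more than once. The function name `modes` and the last two data sets are hypothetical illustrations.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest frequency, or [] if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        # No value appears more than once: no mode (convention used in the slides).
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([46, 54, 42, 46, 32]))  # [46]    one mode (class size data)
print(modes([1, 2, 3, 4, 5]))       # []      no mode
print(modes([1, 1, 2, 2, 3]))       # [1, 2]  several modes
```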
Characteristics of the Mean
1. The most widely used measure of location.
2. Major characteristics:
• All values are used.
• It is unique.
• It is calculated by summing the values and dividing by the number of values.
3. Weakness: Its value can be distorted when the data contain values that are extremely large or extremely small compared with the majority of the data.
Properties and Uses of the Median
1. There is a unique median for each data set.
2. It is not affected by extremely large or small values and is therefore a valuable measure of central tendency when such values occur.
Characteristics of the Mode
1. Mode: the value of the observation that appears most frequently.
2. Advantage: Not affected by extremely high or low values.
3. Disadvantages:
• For many sets of data, there is no mode because no value appears more than once.
• For some data sets there is more than one mode.
Weighted Mean
• When the mean is computed by giving each data value a weight that reflects its importance, it is referred to as a weighted mean.
• In the computation of a grade point average (GPA), the weights are the number of credit hours earned for each grade.
• When data values vary in importance, the analyst must choose the weight that best reflects the importance of each value.
Weighted Mean
$$\bar{x} = \frac{\sum w_i x_i}{\sum w_i}$$
where:
xi = value of observation i
wi = weight for observation i
Mean for Grouped Data
Sample data:
$$\bar{x} = \frac{\sum f_i M_i}{\sum f_i}$$
Population data:
$$\mu = \frac{\sum f_i M_i}{N}$$
where:
fi = frequency of class i
Mi = midpoint of class i
Weighted Mean
Used when values are grouped by frequency or relative importance.
Example: Sample of 26 Repair Projects
Days to Complete   Frequency
5                  4
6                  12
7                  8
8                  2
Weighted Mean Days to Complete:
$$\bar{X}_W = \frac{\sum w_i x_i}{\sum w_i} = \frac{(4 \times 5) + (12 \times 6) + (8 \times 7) + (2 \times 8)}{4 + 12 + 8 + 2} = \frac{164}{26} = 6.31 \text{ days}$$
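The repair project calculation can be reproduced with a short Python sketch in which the frequencies act as the weights. The function name `weighted_mean` and its argument order are assumptions made for illustration.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


days_to_complete = [5, 6, 7, 8]
frequency = [4, 12, 8, 2]  # number of projects taking that many days

print(round(weighted_mean(days_to_complete, frequency), 2))  # 6.31
```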
Example: Apartment Rents
Given below is the previous sample of monthly rents for one-bedroom apartments, presented here as grouped data in the form of a frequency distribution.
Rent ($)    Frequency
420-439     8
440-459     17
460-479     12
480-499     8
500-519     7
520-539     4
540-559     2
560-579     4
580-599     2
600-619     6
Example: Apartment Rents
Mean for Grouped Data
Rent ($)    fi    Mi       fi Mi
420-439     8     429.5    3436.0
440-459     17    449.5    7641.5
460-479     12    469.5    5634.0
480-499     8     489.5    3916.0
500-519     7     509.5    3566.5
520-539     4     529.5    2118.0
540-559     2     549.5    1099.0
560-579     4     569.5    2278.0
580-599     2     589.5    1179.0
600-619     6     609.5    3657.0
Total       70             34525.0
$$\bar{x} = \frac{34{,}525}{70} = 493.21$$
This approximation differs by $2.41 from the actual sample mean of $490.80.
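The grouped-data table can be recomputed in Python by taking each class midpoint as the average of its lower and upper limits. This is a sketch under that assumption; the names `rent_classes` and `grouped_mean` are illustrative.

```python
# (lower limit, upper limit, frequency) for each rent class in the table above.
rent_classes = [
    (420, 439, 8), (440, 459, 17), (460, 479, 12), (480, 499, 8),
    (500, 519, 7), (520, 539, 4), (540, 559, 2), (560, 579, 4),
    (580, 599, 2), (600, 619, 6),
]


def grouped_mean(classes):
    """Approximate the mean as sum(f_i * M_i) / sum(f_i), with M_i the class midpoint."""
    total_frequency = sum(f for _, _, f in classes)
    if total_frequency == 0:
        raise ValueError("the frequencies must not sum to zero")
    weighted_sum = sum(f * (low + high) / 2 for low, high, f in classes)
    return weighted_sum / total_frequency


print(sum(f for _, _, f in rent_classes))    # 70
print(round(grouped_mean(rent_classes), 2))  # 493.21
```

The result is only an approximation of the sample mean, because every observation in a class is treated as if it equaled the class midpoint.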
Review Example
Five houses on a hill by the beach
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
Sum: $3,000,000
Summary Statistics
• Mean: $3,000,000/5 = $600,000
• Median: middle value of ranked data = $300,000
• Mode: most frequent value = $100,000
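The three summary statistics for the house prices can be checked with Python's standard `statistics` module. The second half of the sketch is a hypothetical variation, not part of the slides: it drops the $2,000,000 house to show how much more the mean moves than the median when an extreme value is removed.

```python
import statistics

house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

print(f"Mean:   ${statistics.mean(house_prices):,.0f}")    # Mean:   $600,000
print(f"Median: ${statistics.median(house_prices):,.0f}")  # Median: $300,000
print(f"Mode:   ${statistics.mode(house_prices):,.0f}")    # Mode:   $100,000

# Hypothetical variation: remove the most expensive house.
without_extreme = house_prices[1:]
print(f"Mean:   ${statistics.mean(without_extreme):,.0f}")    # Mean:   $250,000
print(f"Median: ${statistics.median(without_extreme):,.0f}")  # Median: $200,000
```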
Percentiles
The pth percentile is a value such that at least p percent of the observations are less than or equal to this value and at least (100 - p) percent of the observations are greater than or equal to this value.
Example: "I scored in the 70th percentile on the Graduate Record Exam (GRE)" means my score was at least as high as that of about 70 percent of those who took the exam.
Calculating the pth Percentile
• Step 1: Arrange the data in ascending order (smallest value to largest value).
• Step 2: Compute an index i:
$$i = \left(\frac{p}{100}\right) n$$
where p is the percentile of interest and n is the number of observations.
• Step 3: (a) If i is not an integer, round up. The next integer greater than i denotes the position of the pth percentile.
(b) If i is an integer, the pth percentile is the average of the values in positions i and i + 1.
Example: Starting Salaries of Business Grads
Let’s compute the 85th percentile using the starting salary data.
Step 1: Arrange the data in ascending order.
2710 2755 2850 2880 2880 2890 2920 2940 2950 3050 3130 3325
Step 2:
$$i = \left(\frac{p}{100}\right) n = \left(\frac{85}{100}\right) 12 = 10.2$$
Step 3: Since 10.2 is not an integer, round up to 11. The 85th percentile is the value in the 11th position (3130).
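The three-step procedure can be written as a small Python function. This is a sketch of the rule stated in these slides only; the name `percentile` and the restriction to 0 < p < 100 are assumptions added so that positions i and i + 1 always exist.

```python
import math


def percentile(values, p):
    """pth percentile by the three-step index rule, for 0 < p < 100."""
    if not values:
        raise ValueError("percentiles of an empty sample are undefined")
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)  # Step 1: ascending order
    n = len(ordered)
    i = p * n / 100           # Step 2: index i = (p / 100) * n
    if i != int(i):
        # Step 3(a): i is not an integer, so round up to get the position.
        position = math.ceil(i)
        return ordered[position - 1]  # positions are 1-based, list indices 0-based
    # Step 3(b): i is an integer, so average the values in positions i and i + 1.
    position = int(i)
    return (ordered[position - 1] + ordered[position]) / 2


salaries = [2850, 2950, 3050, 2880, 2755, 2710,
            2890, 3130, 2940, 3325, 2920, 2880]

print(percentile(salaries, 85))  # 3130
```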
Quartiles
Quartiles are just specific percentiles
Let:
Q1 = first quartile, or 25th percentile
Q2 = second quartile, or 50th percentile (also the median)
Q3 = third quartile, or 75th percentile
Let’s compute the 1st and 3rd quartiles using the starting salary data. Note that we already computed the median for this sample, so we know the 2nd quartile.
2710 2755 2850 2880 2880 2890 2920 2940 2950 3050 3130 3325
Now find the 25th percentile:
$$i = \left(\frac{p}{100}\right) n = \left(\frac{25}{100}\right) 12 = 3$$
Note that 3 is an integer, so to find the 25th percentile we must average together the 3rd and 4th values:
Q1 = (2850 + 2880)/2 = 2865
Now find the 75th percentile:
$$i = \left(\frac{p}{100}\right) n = \left(\frac{75}{100}\right) 12 = 9$$
Note that 9 is an integer, so to find the 75th percentile we must average together the 9th and 10th values:
Q3 = (2950 + 3050)/2 = 3000
Quartiles for the Starting Salary Data
2710 2755 2850 2880 2880 2890 2920 2940 2950 3050 3130 3325
Q1 = 2865
Q2 = 2905 (Median)
Q3 = 3000
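Since quartiles are just the 25th, 50th, and 75th percentiles, all three can be computed in one pass with the same index rule. The sketch below restates that rule compactly; the names `index_rule_percentile` and `quartiles` are illustrative.

```python
import math


def index_rule_percentile(ordered, p):
    """pth percentile of already sorted data using the index i = (p / 100) * n."""
    i = p * len(ordered) / 100
    if i != int(i):
        return ordered[math.ceil(i) - 1]  # round up to the next position
    position = int(i)
    return (ordered[position - 1] + ordered[position]) / 2  # average positions i and i + 1


def quartiles(values):
    """Return (Q1, Q2, Q3) as the 25th, 50th, and 75th percentiles."""
    ordered = sorted(values)
    return tuple(index_rule_percentile(ordered, p) for p in (25, 50, 75))


salaries = [2850, 2950, 3050, 2880, 2755, 2710,
            2890, 3130, 2940, 3325, 2920, 2880]

print(quartiles(salaries))  # (2865.0, 2905.0, 3000.0)
```

Library routines often use other interpolation conventions, so their quartiles may differ slightly from the values obtained with this rule.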