STATISTICS AND PROBABILITY
IN CIVIL ENGINEERING
TS4512
Doddy Prayogo, Ph.D.
2. Using Numerical Measures to Describe Data
 2.1. Measures of Central Tendency
• Measures of central tendency provide numerical
information about a typical observation in the data.
• This section covers the mean, median, mode, range,
geometric mean, and trimmed mean.
 2.1.1. Mean (Arithmetic mean)
• The arithmetic mean is the sum of the data values
divided by the number of observations.
• If the data set is the entire population of data, then the
population mean, $\mu$, is a parameter. If the data set is
from a sample, then the sample mean, $\bar{x}$, is a statistic.
• Mean formula:
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + x_3 + \cdots + x_N}{N}, \quad \text{where } N = \text{population size}$$
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}, \quad \text{where } n = \text{sample size}$$
2.1.2. Median
• The median is the middle observation of a set of
observations that are arranged in increasing (or
decreasing) order.
• If n is odd, the sample median is the number in
position (n + 1)/2.
• If n is even, the sample median is the average of
the numbers in positions n/2 and (n/2 + 1).
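The mean and median definitions above can be sketched in a few lines of Python. The function names and the small data sets are hypothetical, chosen only to illustrate the odd and even cases; they do not come from the course material.

```python
def arithmetic_mean(data):
    """Sum of the values divided by the number of observations."""
    return sum(data) / len(data)


def sample_median(data):
    """Middle observation of the sorted data (average of the two middle ones if n is even)."""
    values = sorted(data)
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: 1-based position (n + 1)/2 is 0-based index n // 2.
        return values[mid]
    # Even n: average the observations in 1-based positions n/2 and n/2 + 1.
    return (values[mid - 1] + values[mid]) / 2


odd_sample = [12, 7, 3, 9, 5]   # hypothetical data, n = 5
even_sample = [12, 7, 3, 9]     # hypothetical data, n = 4

print(arithmetic_mean(odd_sample), sample_median(odd_sample))
print(arithmetic_mean(even_sample), sample_median(even_sample))
```

Sorting a copy of the data keeps the caller's list unchanged, which matters if the original order is needed later.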
2.1.3. Mode and the range
• The mode, if one exists, is the most frequently
occurring value. If several values occur with equal
frequency, each one is a mode.
• The range is the difference between the largest
and smallest values in a sample. It is a measure of
spread, but it is rarely used, because it depends
only on the two extreme values and provides no
information about the rest of the sample.
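A minimal sketch of the mode and the range follows. Two things are assumptions rather than part of the slides: the data are hypothetical, and "if one exists" is interpreted as "no mode when every value occurs exactly once", in which case an empty list is returned.

```python
from collections import Counter


def modes(data):
    """Return all most frequent values, or [] if no value repeats (assumed meaning of 'no mode')."""
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1 and len(counts) > 1:
        return []
    return sorted(value for value, count in counts.items() if count == highest)


def value_range(data):
    """Difference between the largest and smallest values."""
    return max(data) - min(data)


sample = [2, 4, 4, 5, 7, 7, 9]  # hypothetical data with two modes

print(modes(sample))        # both 4 and 7 occur twice
print(value_range(sample))  # 9 - 2
```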
 2.1.4. Geometric mean
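• The geometric mean is the $n$th root of the product of $n$ numbers:
$$\bar{x}_g = \sqrt[n]{x_1 \times x_2 \times x_3 \times \cdots \times x_n} = (x_1 \times x_2 \times x_3 \times \cdots \times x_n)^{1/n}$$
• The geometric mean rate of return is the geometric mean of the
growth factors $x_i$ (1 plus the return in period $i$), minus 1:
$$r_g = (x_1 \times x_2 \times x_3 \times \cdots \times x_n)^{1/n} - 1$$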
 2.1.5. The trimmed mean
• The trimmed mean is a measure of center that is designed to be
unaffected by outliers. It is computed by arranging the sample
values in order, ‘trimming’ an equal number of them from each
end, and computing the mean of those remaining.
Example
An investor invests $100 and receives the
following returns:
Year 1: 3%
Year 2: 5%
Year 3: 8%
Year 4: -1%
Year 5: 10%
Find the annual growth rate of his investment!
The $100 grew each year as follows:
Year 1: $100 x 1.03 = $103.00
Year 2: $103 x 1.05 = $108.15
Year 3: $108.15 x 1.08 = $116.80
Year 4: $116.80 x 0.99 = $115.63
Year 5: $115.63 x 1.10 = $127.20
The geometric mean return is:
$$r_g = (1.03 \times 1.05 \times 1.08 \times 0.99 \times 1.10)^{1/5} - 1 = 0.0493 = 4.93\%$$
The average return per year is 4.93%, slightly less than the
5% computed using the arithmetic mean. As a mathematical rule,
for positive values the geometric mean is always less than or
equal to the arithmetic mean.
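The calculation in this example can be checked with a short Python sketch. The function name `geometric_mean_return` is hypothetical, and returns are assumed to be entered as decimals (3% as 0.03); `math.prod` requires Python 3.8 or later.

```python
import math


def geometric_mean_return(returns):
    """Average per-period return: n-th root of the product of growth factors, minus 1."""
    growth_factors = [1.0 + r for r in returns]
    product = math.prod(growth_factors)
    return product ** (1.0 / len(growth_factors)) - 1.0


returns = [0.03, 0.05, 0.08, -0.01, 0.10]  # yearly returns from the example

r_g = geometric_mean_return(returns)
arithmetic = sum(returns) / len(returns)
final_value = 100.0 * (1.0 + r_g) ** len(returns)

print(f"geometric mean return:  {r_g:.2%}")
print(f"arithmetic mean return: {arithmetic:.2%}")
print(f"value after 5 years:    ${final_value:.2f}")
```

Compounding the initial $100 at the geometric mean rate for five years reproduces the year-5 balance, which is why this rate, and not the arithmetic mean, describes the growth of the investment.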
Exercise
An investor holds a stock that has been volatile with
returns that varied significantly from year to year.
His initial investment was $100 in stock A, and it
returned the following:
Year 1: 10%
Year 2: 150%
Year 3: -30%
Year 4: 10%
Find the annual growth rate of his investment
Solution
Year 1: $100 x 1.10 = $110.00
Year 2: $110 x 2.5 = $275.00
Year 3: $275 x 0.7 = $192.50
Year 4: $192.50 x 1.10 = $211.75
The resulting geometric mean return, or compound
annual growth rate (CAGR), is 20.6%, much lower
than the 35% calculated using the arithmetic mean.
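Written out as a worked equation, the 20.6% figure follows from applying the geometric mean formula to the four growth factors, and the 35% figure from averaging the four returns directly:

```latex
\begin{align*}
r_g &= (1.10 \times 2.50 \times 0.70 \times 1.10)^{1/4} - 1
     = (2.1175)^{1/4} - 1 \approx 1.206 - 1 = 0.206 = 20.6\% \\
\text{arithmetic mean} &= \frac{10\% + 150\% - 30\% + 10\%}{4} = \frac{140\%}{4} = 35\%
\end{align*}
```

The product 2.1175 matches the final balance of $211.75 on the initial $100.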
Exercise
The values of fracture stress (in MPa) were measured for
a sample of 24 mixtures of hot-mixed asphalt:
30; 75; 79; 80; 80; 105; 126; 138; 149; 179; 179;
191; 223; 232; 232; 236; 240; 242; 245; 247; 254;
274; 384; 470.
Compute the mean, median, and the 5%, 10%, and 20%
trimmed mean.
 2.1.6. Shape of a distribution
• The shape of a distribution reveals whether the data are
evenly spread about its middle or center.
• Symmetry. The shape of a distribution is said to be
symmetric if the observations are balanced, or
approximately evenly distributed, about its middle.
• Skewness. A distribution is skewed, or asymmetric, if
the observations are not symmetrically distributed on
either side of the middle. A positively skewed
distribution has a tail that extends to the right. A
negatively skewed distribution has a tail that extends
to the left.
• Negatively skewed distribution: mean < median
• Symmetric distribution: mean = median
• Positively skewed distribution: mean > median
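The mean and median comparison above can be turned into a rough check of skew direction. This is only a sketch of that rule of thumb, not a formal skewness measure; the function name, the tolerance parameter `tol`, and the three small data sets are hypothetical.

```python
from statistics import mean, median


def skew_direction(data, tol=1e-9):
    """Classify the shape by comparing mean and median (rule of thumb only)."""
    m, med = mean(data), median(data)
    if abs(m - med) <= tol:
        return "symmetric"
    return "positively skewed" if m > med else "negatively skewed"


print(skew_direction([1, 2, 3, 4, 5]))      # mean 3, median 3
print(skew_direction([1, 2, 3, 4, 20]))     # long right tail: mean 6 > median 3
print(skew_direction([1, 17, 18, 19, 20]))  # long left tail: mean 15 < median 18
```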
Solution (fracture stress exercise):
• The mean is found by averaging all 24 numbers,
which gives 195.42.
• The median is the average of the 12th and 13th
numbers, which is (191 + 223)/2 = 207.00.
• To compute the 5% trimmed mean, drop 5% of the data
from each end. This comes to (0.05)(24) = 1.2
observations. Round 1.2 to 1 and trim one observation
off each end. The 5% trimmed mean is the average of
the remaining 22 numbers:
(75 + 79 + ... + 274 + 384)/22 = 190.45
• To compute the 10% trimmed mean, round
(0.1)(24) = 2.4 to 2. Drop 2 observations from
each end, and then average the remaining 20:
(79 + 80 + ... + 254 + 274)/20 = 186.55
• To compute the 20% trimmed mean, round
(0.2)(24) = 4.8 to 5. Drop 5 observations from
each end, and then average the remaining 14:
(105 + 126 + ... + 242 + 245)/14 = 194.07
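The trimming procedure used in this solution can be sketched as follows. The function name `trimmed_mean` is hypothetical. One assumption to note: Python's built-in `round` rounds exact halves to the nearest even integer, which does not affect the three cases here (1.2, 2.4, 4.8) but could differ from hand rounding for a value such as 2.5.

```python
from statistics import mean, median


def trimmed_mean(data, proportion):
    """Drop round(proportion * n) observations from each end, then average the rest."""
    values = sorted(data)
    k = round(proportion * len(values))  # observations trimmed from each end
    if 2 * k >= len(values):
        raise ValueError("trimming would remove every observation")
    kept = values[k:len(values) - k]
    return sum(kept) / len(kept)


# Fracture stress (MPa) for the 24 asphalt mixtures in the exercise.
stress = [30, 75, 79, 80, 80, 105, 126, 138, 149, 179, 179, 191,
          223, 232, 232, 236, 240, 242, 245, 247, 254, 274, 384, 470]

print(f"mean:             {mean(stress):.2f}")
print(f"median:           {median(stress):.2f}")
for p in (0.05, 0.10, 0.20):
    print(f"{p:.0%} trimmed mean: {trimmed_mean(stress, p):.2f}")
```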
2.2.4. Variance and standard deviation
* The population variance, $\sigma^2$, is the sum of the
squared differences between each observation
and the population mean, divided by the
population size, $N$:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
* The sample variance, $s^2$, is the sum of the
squared differences between each observation
and the sample mean, divided by the sample size,
$n$, minus 1:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
* The population standard deviation, $\sigma$, is the
(positive) square root of the population
variance and is defined as follows:
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$
* The sample standard deviation, $s$, is as
follows:
$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
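• Example:
• Compute the variance and standard deviation of the ten
observations of executive exercise time (in minutes) listed here:
20; 35; 28; 22; 10; 40; 23; 32; 28; 30.
$$\sum_{i=1}^{10} x_i = 268, \qquad \bar{x} = \frac{268}{10} = 26.8$$
$$\sum_{i=1}^{10} (x_i - \bar{x}) = 0, \qquad \sum_{i=1}^{10} (x_i - \bar{x})^2 = 647.6$$
$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{647.6}{9} \approx 71.96, \qquad s = \sqrt{71.96} \approx 8.5$$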
• Executive exercise time:
Times (minutes)    Deviation about the mean    Squared deviation about the mean
x_i                (x_i - x̄)                   (x_i - x̄)^2
20                  -6.8                        46.24
35                   8.2                        67.24
28                   1.2                         1.44
22                  -4.8                        23.04
10                 -16.8                       282.24
40                  13.2                       174.24
23                  -3.8                        14.44
32                   5.2                        27.04
28                   1.2                         1.44
30                   3.2                        10.24
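The variance and standard deviation formulas can be checked against the exercise-time data with a short sketch. The function names are hypothetical; the sample version divides by n - 1 and the population version by N, as in the definitions above.

```python
import math


def sample_variance(data):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)


def population_variance(data):
    """Sum of squared deviations from the mean, divided by N (data treated as a full population)."""
    n = len(data)
    mu = sum(data) / n
    return sum((x - mu) ** 2 for x in data) / n


times = [20, 35, 28, 22, 10, 40, 23, 32, 28, 30]  # executive exercise time, minutes

s2 = sample_variance(times)
s = math.sqrt(s2)
print(f"sample variance s^2:         {s2:.2f}")
print(f"sample standard deviation s: {s:.2f}")
print(f"population variance (if these ten values were the whole population): "
      f"{population_variance(times):.2f}")
```

The sample results agree with the hand calculation: 647.6/9 is about 71.96, and its square root is about 8.5.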
* Coefficient of Variation
The coefficient of variation expresses the
standard deviation as a percentage of the
mean.
The population coefficient of variation is
$$CV = \frac{\sigma}{\mu} \times 100\% \quad \text{if } \mu > 0$$
The sample coefficient of variation is
$$CV = \frac{s}{\bar{x}} \times 100\% \quad \text{if } \bar{x} > 0$$
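As an illustration, the sample coefficient of variation can be computed for the ten executive exercise times used earlier. This is a sketch: the function name is hypothetical, and reusing that data set here is a choice made for illustration, not something the slides do.

```python
from statistics import mean, stdev


def sample_cv(data):
    """Sample standard deviation as a percentage of the sample mean (mean must be positive)."""
    x_bar = mean(data)
    if x_bar <= 0:
        raise ValueError("the coefficient of variation requires a positive mean")
    return stdev(data) / x_bar * 100.0


times = [20, 35, 28, 22, 10, 40, 23, 32, 28, 30]  # minutes

cv = sample_cv(times)
print(f"CV = {cv:.2f}%")
```

Because the CV is unit-free, it allows the spread of data sets with different means or units to be compared.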
* Weighted Mean and Measures of Grouped
Data
The weighted mean of a set of data is
$$\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{n} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{n}$$
where $w_i$ = weight of the $i$th observation and $n = \sum w_i$.
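A minimal sketch of the weighted mean follows. The data are hypothetical (three class midpoints with their frequencies as weights, in the spirit of grouped data), and the function name is an illustrative choice.

```python
def weighted_mean(values, weights):
    """Sum of weight * value, divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    return sum(w * x for w, x in zip(weights, values)) / total_weight


midpoints = [5.0, 15.0, 25.0]  # hypothetical class midpoints
frequencies = [2, 5, 3]        # hypothetical class frequencies used as weights

result = weighted_mean(midpoints, frequencies)
print(result)  # (2*5 + 5*15 + 3*25) / 10
```

With all weights equal to 1, the weighted mean reduces to the ordinary arithmetic mean.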
2.2.5. Measures of Relationships Between
Variables
* Covariance
Covariance (Cov) is a measure of the linear relationship
between two variables. A positive value indicates a direct or
increasing linear relationship, and a negative value indicates a
decreasing linear relationship.
* A population covariance is
$$Cov(x, y) = \sigma_{xy} = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$$
where $x_i$ and $y_i$ are the observed values, $\mu_x$ and $\mu_y$ are the
population means, and $N$ is the population size.
* A sample covariance is
$$Cov(x, y) = s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$
where $x_i$ and $y_i$ are the observed values, $\bar{x}$ and $\bar{y}$
are the sample means, and $n$ is the sample size.
* Correlation coefficient
The correlation coefficient is computed by dividing
the covariance by the product of the standard
deviations of the two variables.
* A population correlation coefficient, $\rho$, is
$$\rho = \frac{Cov(x, y)}{\sigma_x \sigma_y}$$
where $\sigma_x$ and $\sigma_y$ are the population standard
deviations of the two variables, and Cov(x, y) is
the population covariance.
* A sample correlation coefficient, $r$, is
$$r = \frac{Cov(x, y)}{s_x s_y}$$
where $s_x$ and $s_y$ are the sample standard deviations
of the two variables, and Cov(x, y) is the sample
covariance.
A useful rule to remember is that a relationship exists if
$$|r| \ge \frac{2}{\sqrt{n}}$$
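Example: Aptitude test score and sales

       x      y    (x_i - x̄)   (x_i - x̄)^2   (y_i - ȳ)   (y_i - ȳ)^2   (x_i - x̄)(y_i - ȳ)
      12     20      -9.3         86.49        -21.2        449.44           197.16
      30     60       8.7         75.69         18.8        353.44           163.56
      15     27      -6.3         39.69        -14.2        201.64            89.46
      24     50       2.7          7.29          8.8         77.44            23.76
      14     21      -7.3         53.29        -20.2        408.04           147.46
      18     30      -3.3         10.89        -11.2        125.44            36.96
      28     61       6.7         44.89         19.8        392.04           132.66
      26     54       4.7         22.09         12.8        163.84            60.16
      19     32      -2.3          5.29         -9.2         84.64            21.16
      27     57       5.7         32.49         15.8        249.64            90.06
Sum  213    412                  378.1                     2505.6            962.4

With $n = 10$, $\bar{x} = 213/10 = 21.3$ and $\bar{y} = 412/10 = 41.2$.
$$Cov(x, y) = s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{962.4}{10 - 1} = 106.93$$
$$s_x^2 = \frac{378.1}{9} = 42.01, \qquad s_y^2 = \frac{2505.6}{9} = 278.4$$
$$r = \frac{Cov(x, y)}{s_x s_y} = \frac{106.93}{\sqrt{42.01}\,\sqrt{278.4}} = 0.989$$
$$|r| \ge \frac{2}{\sqrt{n}}: \qquad |0.989| > \frac{2}{\sqrt{10}} = 0.63$$
so, by this rule, a relationship exists between the two variables.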
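The covariance, correlation, and rule-of-thumb check for this example can be reproduced with a short Python sketch. The function names are hypothetical; the data are the ten (x, y) pairs from the table.

```python
import math


def sample_covariance(x, y):
    """Sum of products of deviations from the sample means, divided by n - 1."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)


def sample_correlation(x, y):
    """Sample covariance divided by the product of the sample standard deviations."""
    s_x = math.sqrt(sample_covariance(x, x))  # the covariance of x with itself is s_x^2
    s_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (s_x * s_y)


x = [12, 30, 15, 24, 14, 18, 28, 26, 19, 27]
y = [20, 60, 27, 50, 21, 30, 61, 54, 32, 57]

cov = sample_covariance(x, y)
r = sample_correlation(x, y)
threshold = 2 / math.sqrt(len(x))

print(f"Cov(x, y) = {cov:.2f}")
print(f"r         = {r:.3f}")
print(f"2/sqrt(n) = {threshold:.2f}")
print("relationship exists by the rule of thumb:", abs(r) >= threshold)
```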