Chapter 4: Continuous probability functions
4.1 Introduction
A continuous variable is a variable that can take any value within a given interval. In other words, between any two values of a continuous variable there is always a third possible value. A continuous variable is always measured at a quantitative level (interval or ratio). Examples of continuous variables are time, length and weight. With a continuous variable, one does not look at the probability of one particular value (which equals 0), but at the probability of a range of values (smaller than a certain value or larger than a certain value). The most important continuous probability distribution, the normal distribution, is explained in this chapter. We will also briefly discuss the exponential probability distribution.
4.2 Normal probability function
A normal distribution is characterized by its mean µ, also called the expected value E[X], and its standard deviation σ.
Graph 4.1: Normal distributions A, B and C (probability density against X)
Graphs A and B have a mean (µ) of 4 and a standard deviation (σ) of 1 and 2 respectively, whereas graph C has a mean (µ) of 6 and a standard deviation (σ) of 3.
Graphs A, B and C are different, but they share certain features because they are all graphs of normal distributions.
Characteristics of normal distributions are:
1. The probability density is bell-shaped and symmetrical: values more than σ above µ occur just as often as values more than σ below µ.
2. The arithmetic mean, the median and the mode are equal to each other.
3. There is a strong central tendency: values close to the mean occur most often.
4. Relatively large and relatively small values seldom occur: 95.4% of all observations lie within two standard deviations of µ, and as much as 99.7% lie within three standard deviations of µ.
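As a check on characteristic 4, the sketch below computes the share of observations within k standard deviations of the mean. It assumes Python 3.8 or later and uses the standard-library class `statistics.NormalDist` in place of a printed table; `share_within` is a hypothetical helper name. Because the shares are computed for the standard normal distribution, they are the same for every normal distribution.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal distribution: mu = 0, sigma = 1


def share_within(k):
    """Probability that an observation lies within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")  # 68.3%, 95.4%, 99.7%
```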
The probability density function f(X) of a normal distribution can be calculated with the following formula:
$$f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X-\mu}{\sigma}\right)^{2}}$$
Where:
σ: standard deviation of the population
µ: arithmetic mean of the population
π: mathematical constant approximately equal to 3.14159
e: mathematical constant approximately equal to 2.71828
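The formula can be written out directly. The sketch below (Python, standard library only; `normal_pdf` is a hypothetical helper) evaluates f(X) at the peaks of graphs A and B from Graph 4.1 and cross-checks the result against `statistics.NormalDist.pdf`.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Probability density f(X) of a normal distribution, written out term by term."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -0.5 * ((x - mu) / sigma) ** 2
    return coefficient * math.exp(exponent)


# Peak heights (at X = mu) of graph A (mu = 4, sigma = 1) and graph B (mu = 4, sigma = 2)
print(round(normal_pdf(4, 4, 1), 4))  # 0.3989
print(round(normal_pdf(4, 4, 2), 4))  # 0.1995

# Cross-check against the standard library at an arbitrary point
print(math.isclose(normal_pdf(5.5, 4, 2), NormalDist(4, 2).pdf(5.5)))  # True
```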
To calculate the probability that a value X of a normal distribution lies between the values a and b, P(a < X < b), you have to determine the area under the probability density function between a and b.
Graph 4.2: Normal distribution with mean µ and standard deviation σ (the area under f(X) between a and b)
This area can be calculated by integrating the probability density function over the interval [a, b], or by standardizing and using the standard normal probability table.
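To illustrate that both routes give the same area, the sketch below integrates the density of graph B (µ = 4, σ = 2) numerically over [3, 6] and compares the result with the difference of two cumulative probabilities. Simpson's rule is our own choice of numerical method, not one used in this chapter, and `area_simpson` is a hypothetical helper; `statistics.NormalDist` is from the Python standard library.

```python
from statistics import NormalDist


def area_simpson(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] with Simpson's rule (n must be even)."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


dist = NormalDist(mu=4, sigma=2)  # graph B
a, b = 3, 6
by_integration = area_simpson(dist.pdf, a, b)
by_cdf = dist.cdf(b) - dist.cdf(a)  # the same area via cumulative probabilities
print(round(by_integration, 4), round(by_cdf, 4))  # 0.5328 0.5328
```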
A normal distribution can be transformed into a standard normal distribution: a normal distribution with mean µ = 0 and standard deviation σ = 1. This yields a so-called z-value, whose probability can be looked up in a standard normal table. The term z-value is explained further below.
Converting a normal distribution to a standard normal distribution, also called standardizing, takes two steps, which we illustrate with the normal distribution of graph B, with µ = 4 and σ = 2.
1. The mean is subtracted from all X-values (X - µ). This shifts the graph so that the mean lies at 0.
Graph 7.3: Normal distribution with mean 4 and standard deviation 2
Graph 7.4: Normal distribution with mean 0 and standard deviation 2
(Both graphs show the probability density against X.)
2. The new X-values are then divided by the standard deviation σ, giving (X - µ) / σ. This yields the standard normal distribution, with mean 0 and standard deviation 1. For the standard normal distribution we no longer speak of X-values; we call them z-values.
The z-value indicates how many standard deviations the value X lies from the mean µ.
When the mean (µ) of a normal distribution equals 4 and the standard deviation (σ) equals 2, the following applies:
With X = 6 the z-value equals 1 (6 lies 1 standard deviation above the mean of 4);
With X = 0 the z-value equals -2 (0 lies 2 standard deviations below the mean of 4);
With X = 3 the z-value equals -½ (3 lies ½ standard deviation below the mean of 4);
With X = 8 the z-value equals 2 (8 lies 2 standard deviations above the mean of 4).
So the probability that X < 6, for a normal distribution with a mean of 4 and a standard deviation of 2, is:
P(X < 6 | µ = 4 ; σ = 2) = P(z < 1) = 0.8413
(looked up in a table of left crossover probabilities of the standard normal distribution)
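The standardizing step and the table lookup can be sketched as follows, assuming Python 3.8+ with `statistics.NormalDist` standing in for the standard normal table; `z_value` is a hypothetical helper.

```python
from statistics import NormalDist


def z_value(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


mu, sigma = 4, 2
for x in (6, 0, 3, 8):
    print(x, z_value(x, mu, sigma))  # z = 1.0, -2.0, -0.5 and 2.0 respectively

# P(X < 6 | mu = 4; sigma = 2) = P(z < 1)
print(round(NormalDist().cdf(z_value(6, mu, sigma)), 4))  # 0.8413
```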
4.2.1 Probabilities of normal distributions with Excel
Excel has five functions for the (standard) normal probability distribution.
I. NORMDIST(x;mean;standard_dev;cumulative)
This function calculates the probability of a value smaller than X for a normal distribution with parameters µ and σ. When you fill in FALSE for the last argument, you get the probability density instead.
Working method:
1. Select the cell where you want the normal probability to be calculated;
2. Press Insert in the menu bar and press Function in the drop-down menu;
3. Under Or select a category, choose Statistical; under Select a function, choose NORMDIST, and then press OK;
A dialog box for the function arguments appears.
4. For X, fill in the value for which you want the (left crossover) probability to be calculated;
For Mean, fill in the population mean;
For Standard_dev, fill in the population standard deviation;
For Cumulative, fill in TRUE (you want the probability, not the probability density).
Example
For a normal distribution with a mean of 8 and a standard deviation of 3, what is the probability that X is smaller than 6?
Graph 7.6: Normal distribution with mean 8 and standard deviation 3 (probability density against X)
P(X < 6 | µ = 8 ; σ = 3) = NORMDIST(6;8;3;TRUE) = 0.2525
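Outside Excel, the same calculation can be sketched in Python. The helper `normdist` below is hypothetical: it only mimics the behaviour of NORMDIST described above (left crossover probability when `cumulative` is true, density otherwise) using the standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist


def normdist(x, mean, standard_dev, cumulative):
    """Hypothetical mirror of NORMDIST: left crossover probability if cumulative, else the density."""
    dist = NormalDist(mu=mean, sigma=standard_dev)
    return dist.cdf(x) if cumulative else dist.pdf(x)


print(round(normdist(6, 8, 3, True), 4))  # 0.2525
```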
The probability in this example can also be determined in Excel in two steps: first standardize the probability distribution (determine the z-value, see function II) and then determine the left crossover probability of this z-value (see function III).
II. STANDARDIZE(x;mean;standard_dev)
This function calculates the z-value of X for a normal distribution with parameters µ and σ.
Working method:
1. Select the cell where you want the z-value to be calculated;
2. Press Insert in the menu bar and press Function in the drop-down menu;
3. Under Or select a category, choose Statistical; under Select a function, choose STANDARDIZE, and then press OK;
A dialog box for the function arguments appears.
4. For X, fill in the value for which you want the z-value to be calculated;
For Mean, fill in the population mean;
For Standard_dev, fill in the population standard deviation.
Example
What is the z-value belonging to an X-value of 6 for a normal distribution with a mean of 8 and a standard deviation of 3?
STANDARDIZE(6;8;3) = -0.6667
III. NORMSDIST(z)
This function calculates the probability of a value smaller than z (the left crossover probability) for the standard normal distribution.
Working method:
1. Select the cell where you want the left crossover probability of the z-value to be calculated;
2. Press Insert in the menu bar and press Function in the drop-down menu;
3. Under Or select a category, choose Statistical; under Select a function, choose NORMSDIST, and then press OK;
A dialog box for the function arguments appears.
4. For Z, fill in the value of z for which you want the left crossover probability to be calculated.
Example
What is the probability that z is smaller than -0.6667?
NORMSDIST (-0.6667) = 0.2525
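The two-step route (function II followed by function III) can be sketched like this; `standardize` and `normsdist` are hypothetical Python stand-ins for the Excel functions, built on the standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist


def standardize(x, mean, standard_dev):
    """Hypothetical stand-in for STANDARDIZE: z = (x - mean) / standard_dev."""
    return (x - mean) / standard_dev


def normsdist(z):
    """Hypothetical stand-in for NORMSDIST: left crossover probability of the standard normal."""
    return NormalDist().cdf(z)


z = standardize(6, 8, 3)
print(round(z, 4))  # -0.6667
print(round(normsdist(z), 4))  # 0.2525, the same as NORMDIST(6;8;3;TRUE)
```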
IV. NORMINV(probability;mean;standard_dev)
This function calculates the X-value belonging to a given (left crossover) probability for a normal distribution with parameters µ and σ. It is therefore the inverse of NORMDIST(X;µ;σ;TRUE), which calculates the probability belonging to an X-value.
Working method:
1. Select the cell where you want the X-value to be calculated;
2. Press Insert in the menu bar and press Function in the drop-down menu;
3. Under Or select a category, choose Statistical; under Select a function, choose NORMINV, and then press OK;
A dialog box for the function arguments appears.
4. For Probability, fill in a number between 0 and 1: the left crossover probability for which you want the corresponding X-value;
For Mean, fill in the population mean;
For Standard_dev, fill in the population standard deviation.
Example
For a normal distribution with a mean of 25 and a standard deviation of 4, which X-value has a left crossover probability of 0.75?
Graph 7.7: Normal distribution with mean 25 and standard deviation 4 (probability density against X; the area of 75% to the left of the unknown X-value is marked)
P(X < ? | µ = 25 ; σ = 4) = 0.75
NORMINV (0.75;25;4) = 27.70
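In Python, the inverse of the cumulative distribution is available as `statistics.NormalDist.inv_cdf`; the hypothetical helper `norminv` below wraps it to mirror NORMINV and then checks the answer by going back with the cumulative distribution.

```python
from statistics import NormalDist


def norminv(probability, mean, standard_dev):
    """Hypothetical mirror of NORMINV: the X-value with the given left crossover probability."""
    return NormalDist(mu=mean, sigma=standard_dev).inv_cdf(probability)


x = norminv(0.75, 25, 4)
print(round(x, 2))  # 27.7
print(round(NormalDist(25, 4).cdf(x), 4))  # 0.75 (back to the probability)
```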
V. NORMSINV(probability)
This function calculates the z-value belonging to a given probability. It is therefore the inverse of NORMSDIST(z), which calculates the probability belonging to a z-value.
Working method:
1. Select the cell where you want the z-value to be calculated;
2. Press Insert in the menu bar and then press Function in the drop-down menu;
3. Under Or select a category, choose Statistical; under Select a function, choose NORMSINV, and then press OK;
A dialog box for the function arguments appears.
Example
What is the z-value belonging to a probability of 0.75?
NORMSINV(0.75) = 0.6745
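NORMSINV and NORMINV are linked by standardizing in reverse: X = µ + z·σ. A minimal Python sketch of this relation, again assuming the standard-library `statistics.NormalDist`:

```python
from statistics import NormalDist

z = NormalDist().inv_cdf(0.75)  # like NORMSINV(0.75)
print(round(z, 4))  # 0.6745

mu, sigma = 25, 4
x = mu + z * sigma  # undo the standardization: X = mu + z * sigma
print(round(x, 2))  # 27.7, matching NORMINV(0.75;25;4)
```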
Don't forget to use the following probability rules for normal distributions:
a. P(X ≤ a) = P(X < a), because P(X = a) = 0;
b. P(X > a) = 1 – P(X < a) (complement rule);
c. P(a < X < b) = P(X < b) – P(X < a).
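The three rules can be sketched in Python as follows, using graph B (µ = 4, σ = 2) and the illustrative bounds a = 2 and b = 7; the standard-library `statistics.NormalDist` plays the role of NORMDIST.

```python
from statistics import NormalDist

dist = NormalDist(mu=4, sigma=2)  # graph B
a, b = 2, 7  # illustrative bounds

p_left = dist.cdf(a)  # rule a: P(X <= a) = P(X < a), since P(X = a) = 0
p_right = 1 - dist.cdf(a)  # rule b (complement rule): P(X > a)
p_between = dist.cdf(b) - dist.cdf(a)  # rule c: P(a < X < b)
print(round(p_left, 4), round(p_right, 4), round(p_between, 4))  # 0.1587 0.8413 0.7745
```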
Below you will find an Excel worksheet that you can use to calculate left and right crossover probabilities, interval probabilities and the X-value for a given probability, for normal distributions. In cell A7, among other cells, you will encounter the sign "&". It joins text with the value of a cell, so when the X-value in cell B5 changes, the label in this cell changes too.
Calculating normal probabilities
A3: Arithmetic mean | B3: xxx
A4: Standard deviation | B4: xxx
A5: First X-value | B5: xxx
A6: Left crossover probability
A7: ="P(X<="&B5&")" | B7: =NORMDIST(B5,B3,B4,TRUE)
A8: Right crossover probability
A9: ="P(X>="&B5&")" | B9: =1-B7
A10: Interval
A11: Second X-value | B11: xxx
A12: ="P(X<="&B11&")" | B12: =NORMDIST(B11,B3,B4,TRUE)
A13: ="P("&B5&"<X<"&B11&")" | B13: =ABS(B12-B7)
A14: Find X-value
A15: Cumulative percentage | B15: .xx
A16: X-value | B16: =NORMINV(B15,B3,B4)
ABS is a mathematical function and takes the absolute value of a number.
When you insert an arithmetic mean of 75, a standard deviation of 6, a first X-value of 69, a second X-value of 81 and a cumulative percentage of 10%, Excel gives the following result.
Calculating normal probabilities
Arithmetic mean | 75
Standard deviation | 6
First X-value | 69
Left crossover probability
P(X<=69) | 0.158655254
Right crossover probability
P(X>=69) | 0.841344746
Interval
Second X-value | 81
P(X<=81) | 0.841344746
P(69<X<81) | 0.682689492
Find X-value
Cumulative percentage | 0.1
X-value | 67.31069061
When you change the value in cell B5 to 72, the following will appear in cell A7: P(X<=72)
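The worksheet logic can also be reproduced outside Excel. The function below is a hypothetical Python counterpart, not part of the workbook: each line mirrors one of the worksheet formulas, with the standard-library `statistics.NormalDist` in place of NORMDIST and NORMINV. With the inputs from the example it should reproduce the values in the Excel output (0.158655254, 0.682689492 and 67.31069061) up to rounding.

```python
from statistics import NormalDist


def normal_worksheet(mean, standard_dev, x1, x2, cumulative_pct):
    """Hypothetical Python counterpart of the 'Calculating normal probabilities' worksheet."""
    dist = NormalDist(mu=mean, sigma=standard_dev)
    left = dist.cdf(x1)  # B7: NORMDIST(B5,B3,B4,TRUE)
    right = 1 - left  # =1-B7
    left_x2 = dist.cdf(x2)  # B12: NORMDIST(B11,B3,B4,TRUE)
    interval = abs(left_x2 - left)  # =ABS(B12-B7)
    x_value = dist.inv_cdf(cumulative_pct)  # NORMINV(B15,B3,B4)
    return {
        f"P(X<={x1})": left,
        f"P(X>={x1})": right,
        f"P(X<={x2})": left_x2,
        f"P({x1}<X<{x2})": interval,
        "X-value": x_value,
    }


for label, value in normal_worksheet(75, 6, 69, 81, 0.10).items():
    print(label, round(value, 6))
```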
4.3 Exponential probability distributions
The exponential probability distribution is used in queueing (waiting-time) theory to model the time between two arrivals.
To calculate the probability, for an exponential distribution, that the next arrival takes place within a certain time X, you can use the following formula:
$$P(\text{arrival time} < X) = 1 - e^{-\lambda X}$$
Where:
e: mathematical constant approximately equal to 2.71828
λ: the average number of arrivals per time unit in the population
X: a continuous variable with 0 < X < +∞
The exponential probability distribution is determined by only one parameter, λ, which equals the average number of arrivals per time unit. The average time between two arrivals then equals 1/λ. For instance, when the average number of arrivals is 5 per hour, the average time between two arrivals is 1/5 of an hour, or 12 minutes.
Graph 4.9: Exponential distribution using different means (cumulative probability P(< X) against X, for means of 5, 10 and 20)
Example
When an average of 30 customers per hour come to a counter at an NS station, what is the probability that, when one customer has just arrived at the counter, the next customer will arrive within three minutes?
Make sure you use the same time units!
λ (lambda) = 30 per hour = 0.5 per minute
X = 3 minutes = 0.05 hour
Unit minutes:
$$P(X < 3) = 1 - e^{-0.5 \cdot 3} = 1 - e^{-1.5} = 1 - 0.2231 = 0.7769$$
Unit hours:
$$P(X < 0.05) = 1 - e^{-30 \cdot 0.05} = 1 - e^{-1.5} = 1 - 0.2231 = 0.7769$$
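The formula is easy to evaluate directly. The sketch below (Python, standard library only; `expon_cdf` is a hypothetical helper) computes the example in both time units, to show that the answer does not depend on the unit as long as λ and X use the same one.

```python
import math


def expon_cdf(x, lam):
    """P(arrival time < x) = 1 - e^(-lam * x); x and lam must use the same time unit."""
    return 1 - math.exp(-lam * x)


p_minutes = expon_cdf(3, 0.5)  # X = 3 minutes, lambda = 0.5 per minute
p_hours = expon_cdf(0.05, 30)  # X = 0.05 hour, lambda = 30 per hour
print(round(p_minutes, 4), round(p_hours, 4))  # 0.7769 0.7769
```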
4.3.1 Probabilities of exponential distributions with Excel
Excel uses the following function to calculate exponential probabilities:
EXPONDIST(x;lambda;cumulative)
Working method:
1. Select the cell where you want the exponential probability to be calculated;
2. Press Insert in the menu bar and press Function in the drop-down menu;
3. Under Or select a category, choose Statistical; under Select a function, choose EXPONDIST, and then press OK;
A dialog box for the function arguments appears.
4. For X, fill in the value for which you want the (left crossover) probability to be calculated;
For Lambda, fill in λ, the average number of arrivals per time unit;
For Cumulative, fill in TRUE.
Example
When an average of 20 people per hour come to a counter at an NS station, what is the probability that the next customer will arrive within 6 minutes?
Make sure you use the same time units!
λ (lambda) = 20 per hour
x = 6 minutes = 0.1 hour
P(X < 0.1 | λ = 20) = EXPONDIST(0.1;20;TRUE) = 0.8647
Again, use the probability rules mentioned earlier when you want to calculate probabilities other than the left crossover probability.
Example
We continue with the previous example of the counter at an NS station. What is the probability that it takes longer than six minutes for the next customer to arrive?
P(X > 0.1) = 1 - P(X < 0.1) = 1 – 0.8647 = 0.1353
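A hypothetical Python counterpart of EXPONDIST, written from the formula in section 4.3 rather than taken from any library, reproduces both answers (and the values in the worksheet that follows):

```python
import math


def expondist(x, lambd, cumulative):
    """Hypothetical mirror of EXPONDIST: 1 - e^(-lambda*x) if cumulative, else lambda*e^(-lambda*x)."""
    if cumulative:
        return 1 - math.exp(-lambd * x)
    return lambd * math.exp(-lambd * x)


left = expondist(0.1, 20, True)  # within 6 minutes = 0.1 hour, lambda = 20 per hour
right = 1 - left  # complement rule: longer than 6 minutes
print(round(left, 4), round(right, 4))  # 0.8647 0.1353
```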
Below you will find an Excel worksheet that can be used to calculate left and right crossover probabilities for exponential distributions.
Calculating exponential probabilities
A3: Arithmetic mean | B3: xx
A4: X-value | B4: xx
A6: Left crossover probability
A7: ="P(X<="&B4&")" | B7: =EXPONDIST(B4,B3,TRUE)
A8: Right crossover probability
A9: ="P(X>="&B4&")" | B9: =1-B7
When you fill in 20 as the mean in cell B3 and 0.1 as the X-value in cell B4, you get the following result:
Calculating exponential probabilities
Arithmetic mean | 20
X-value | 0.1
Left crossover probability
P(X<=0.1) | 0.864664717
Right crossover probability
P(X>=0.1) | 0.135335283
4.4 Assignments
A private delivery service that delivers “Metro” to a number of NS stations in the morning takes an average of two hours to do this, with a standard deviation of ten minutes. Assuming that the delivery time is normally distributed, determine:
a. The probability that, on any given day, the delivery service finishes its deliveries within 1 ½ hours;
b. The probability that, on any given day, the delivery service takes longer than 2 ¼ hours to deliver “Metro”;
c. The probability that, on any given day, the delivery of “Metro” takes at least 1 ¾ hours and at most 2 ¼ hours;
d. The duration within which the delivery is completed with a probability of 75%.
An average of 23 travelers per hour come to a ticket machine at a certain NS station during the morning rush hour. Assuming that the time between arrivals of train travelers at this ticket machine is exponentially distributed, determine the probability that:
a. The next traveler will arrive within five minutes;
b. The next traveler will arrive after more than ten minutes;
c. It takes at least three minutes and at most seven minutes for the next traveler to arrive.
The time that train travelers need to get a ticket from the ticket machine is normally distributed with a mean of two minutes and a standard deviation of 15 seconds. Determine:
a. The probability that a random traveler finishes at the ticket machine within 1 ½ minutes;
b. The probability that a random traveler needs more than two minutes and twenty seconds to get a ticket from the ticket machine;
c. The probability that a random traveler needs at least two minutes and at most 2 ½ minutes to get a ticket from the ticket machine;
d. The times between which 95% of the train travelers get a ticket from the ticket machine.
The central NS information service about travel times is called on average 18 times during the morning rush hour, which lasts one hour. Assuming that the time between incoming phone calls is exponentially distributed, determine:
a. The probability that the next phone call comes in within three minutes;
b. The probability that it takes more than five minutes for the next phone call to come in;
c. The probability that the next phone call comes in between two and four minutes.
The company VeelInvest BV has two investment projects, P1 and P2. Both have an expected profitability of 15%, i.e. 0.15. The standard deviation is 0.02 for P1 and 0.06 for P2.
1. What is the probability for project P1 that the actual profitability is lower than 10%?
2. What is the probability for project P2 that the actual profitability is lower than 10%?
3. Which project should be preferred when the risk needs to be controlled as much as possible?
4. VeelInvest is willing to take a risk of 80%. What is the profitability of each project at that risk? Which project is preferable in this case?
Chapter 5: Simple linear regression and correlation analysis
5.1 Introduction
Regression analysis is about developing models that explain the relation between a variable to be explained and one or more explanatory variables, with the purpose of predicting the variable to be explained with the help of the explanatory variable(s). The variable to be explained in a regression model is also called the dependent variable, and the explanatory variable(s) the independent variable(s).
For example, the “Price” that advertisers are willing to pay for an ad in “Spits” will depend on the “Number of prints” of “Spits”. The “Price” in this case is the dependent (to be explained) variable, and the “Number of prints” of “Spits” is the independent (explanatory) variable.
In practice, the variable to be explained will often depend on more than one explanatory variable. If more than one explanatory variable is included in the investigation, we speak of multiple regression analysis.
When the investigation is confined to one explanatory variable, we speak of simple regression analysis. One then works under the ceteris paribus condition: all remaining explanatory variables that are not included in the investigation are assumed to be constant.
Correlation analysis investigates the strength of the relation between two variables. For correlation analysis as well as regression analysis, the variables should be quantitative. When one or more variables are qualitative, another statistical technique has to be used to investigate the relation between the variables.
5.2 Regression models
In Fiction2000 there is a relation between the variable “Age” and the variable “Income”. “Income” depends on “Age”, so the variable “Income” is shown on the vertical (Y) axis and the variable “Age” on the horizontal (X) axis.
The relation between variables can be described by very simple as well as very complicated mathematical functions. The simplest form is a straight line. The mathematical function of a straight line is:
y=a*x+b
Where:
x: independent (explanatory) variable
y: dependent variable (to be explained)
a: slope (the increase in Y when X increases by one unit)
b: intercept with the Y-axis (the value of Y when X equals 0)
In graph 5.1 we see that the linear regression line between the variables 'Age' and 'Income' can be written as follows:
y = 0.4935x – 2.2733
When someone is one year older, the income is 0.4935 * € 100 = € 49.35 higher.
Graph 5.1: Dot diagram of age and income (Income (* € 100) against Age, with regression line y = 0.4935x – 2.2733)
Whether you should use a linear function or a more complex mathematical function for your model depends on the pattern of the (x,y)-values in your dot diagram. Examples of different forms of relation between x and y can be seen in the following graphs.
In graph A the values of y increase more or less linearly with the values of x. An example is graph 5.1, which shows the relation between 'Age' and 'Income'.
In graph B the values of y decrease more or less linearly as the values of x increase. A declining regression line means that the relation is negative. An example of a negative linear relation is the relation between the 'Price of a product' and the 'Sales of that product'.
In graph C there is no relation: high and low y-values occur at all x-values.
In graph D the values of y increase as the x-values increase. At first these increases are more than proportional, at the end less than proportional. An example of a positive nonlinear relation is the relation between 'Expenses for commercials for a product' and 'Sales of that product'. At first, sales increase considerably, until the point where the commercials have reached the entire target group; after that, sales stay more or less constant.
In graph E the values of y decrease as the x-values increase. At first these decreases are more than proportional, at the end less than proportional. As an example, think of the relation between the variables 'Depreciation' and 'Years': cars lose considerably more value in their first years than later on.
In graph F the values of y initially decrease as the x-values increase, and then increase as the x-values increase further. Think, for example, of the relation between the variables 'Time' and the 'Number of mistakes someone makes at a certain job'. With practice you quickly make fewer mistakes, but if you practice a certain task for too long, the number of mistakes increases again because of fatigue or boredom.
Graph A: Positive linear correlation
Graph B: Negative linear correlation
Graph C: No relation
Graph D: Positive nonlinear correlation
Graph E: Negative nonlinear correlation
Graph F: U-shape correlation
(Each panel plots Y against X.)
5.3 The linear regression model
When you look at graph 5.1 or graph A, not all points lie on one straight line. However, you can still draw a line in such a way that many of these points lie fairly close to it. The line that best describes the relation between x and y is called the regression line.
The regression line is determined by the least squares method. According to this method, the regression line is the line for which the sum of the squared vertical distances from the points in the dot diagram to that line is minimal. Just as with variance, we look at squared distances, because there are points above as well as below the line.
The vertical distance from a point in the dot diagram to the regression line is called a residual. These residuals are squared and added up. The regression line is therefore the line for which the residual sum of squares is minimal.
Graph 5.2: Dot diagram with regression line y = ax + b (points y1 to y5 with their residuals r1 to r5)
The vertical distance between a point in the dot diagram (yi) and the regression line (axi + b) is denoted in graph 5.2 by ri. The following holds:
r1 = y1 – ax1 – b ; r2 = y2 – ax2 – b ; r3 = y3 – ax3 – b ; and so on.
All these residuals are then squared and added up:
$$S = \sum_{i=1}^{n} (y_i - a x_i - b)^2$$
S is a function of two variables, a and b. To find the minimum of S, you take the partial derivatives of S with respect to a and b and set them equal to 0. This is rather mathematical and is therefore left out. The minimum is attained for the following values of a and b:
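$$a = \frac{\frac{1}{n}\sum_{i=1}^{n} x_i y_i - \bar{x}\,\bar{y}}{\sigma_x^2}$$
$$b = \bar{y} - a\,\bar{x}$$
where x̄ and ȳ are the means of x and y, and σx² is the variance of x (computed with division by n).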
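The two formulas translate directly into code. The sketch below is a plain-Python illustration under stated assumptions: the data are made up (they are not the Fiction2000 file), σx² is computed with division by n to match the 1/n in the numerator, and `least_squares` and `residual_sum_of_squares` are hypothetical helper names. The last lines check that nudging the slope away from the computed value makes S larger.

```python
def least_squares(xs, ys):
    """Return slope a and intercept b of the least-squares line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n  # variance of x, divided by n
    a = (mean_xy - mean_x * mean_y) / var_x
    b = mean_y - a * mean_x
    return a, b


def residual_sum_of_squares(xs, ys, a, b):
    """S = sum of the squared residuals (y_i - a*x_i - b)^2."""
    return sum((y - a * x - b) ** 2 for x, y in zip(xs, ys))


# Hypothetical example data (ages and incomes in * EUR 100), not the Fiction2000 file
ages = [20, 25, 30, 40, 50, 60]
incomes = [8, 11, 12, 18, 23, 27]

a, b = least_squares(ages, incomes)
print(f"y = {a:.4f}x + {b:.4f}")

s_best = residual_sum_of_squares(ages, incomes, a, b)
s_nudged = residual_sum_of_squares(ages, incomes, a + 0.01, b)
print(s_best < s_nudged)  # True: moving away from the least-squares line increases S
```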
5.4 Explained and unexplained variance
Before you can use the model to make predictions for the dependent variable using the independent variable, you have to investigate whether the model is fit for this purpose. A linear model fits when the observed points lie not too far from the linear regression line. For this you need a measure of the distance of these points to the regression line. Such a measure is called the coefficient of determination (r²) and is defined as follows:
$$r^2 = \frac{\text{explained variance}}{\text{total variance}}$$
For the total variance, we look at the sum of the squared distances between the observed y-values and the mean of y (ȳ).
For the explained variance, we look at the sum of the squared distances between the values of y on the regression line and the mean of y (ȳ).
In addition, there is the sum of the squared distances between the values of y on the regression line and the observed y-values; here you look at the unexplained variance. In other words: the fact that not all observed values of y equal the mean of y can partly be explained by the relation between x and y, using the regression line. But not all points lie on the regression line, because of other (unexplained) causes.
Graph 5.3: Levels of variance with regression (for a point (xi, yi): the total variance between yi and ȳ, split into explained variance and unexplained variance)
The SST, or total sum of squares, equals the sum of the squared distances between yi and ȳ:
$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$$
The SSR, or regression sum of squares, equals the sum of the squared distances between ŷi (the point on the regression line) and the mean of y (ȳ):
$$SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$$
The SSE, or error sum of squares, equals the sum of the squared distances between the observed y-values (yi) and the y-values on the regression line (ŷi):
$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
For the sums of squares, the following holds:
SST = SSR + SSE
When you divide the sums of squares by n - 1, you speak of variance, and the following holds:
Total variance = explained variance + unexplained variance
In other words: the variance of y can be split into an explained part, i.e. variance caused by the relation with x, and an own (unexplained) part that is independent of the relation with x and probably caused by other factors that were not included in the investigation.
How much value you should attach to the coefficient of determination depends on the number of points investigated. When you include only two points in your investigation, r² will be 100%, because a straight line can always be drawn through two points.
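The decomposition SST = SSR + SSE and the two-point remark can be checked numerically. The sketch below uses made-up data (not Fiction2000); `fit` and `sums_of_squares` are hypothetical helpers, and the slope is computed in the equivalent covariance-over-variance form of the formula in section 5.3. Note that SST = SSR + SSE holds for the least-squares line; for an arbitrary line it generally does not.

```python
import math


def fit(xs, ys):
    """Least-squares slope and intercept (covariance / variance form)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def sums_of_squares(xs, ys, a, b):
    """Return (SST, SSR, SSE) for the line y_hat = a*x + b."""
    mean_y = sum(ys) / len(ys)
    y_hat = [a * x + b for x in xs]
    sst = sum((y - mean_y) ** 2 for y in ys)
    ssr = sum((yh - mean_y) ** 2 for yh in y_hat)
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))
    return sst, ssr, sse


# Hypothetical data, not the Fiction2000 file
xs = [20, 25, 30, 40, 50, 60]
ys = [8, 11, 12, 18, 23, 27]
a, b = fit(xs, ys)
sst, ssr, sse = sums_of_squares(xs, ys, a, b)
print(math.isclose(sst, ssr + sse))  # True
print(f"r^2 = {ssr / sst:.4f}")

# With only two points the line passes through both, so r^2 = 1
a2, b2 = fit([1, 4], [3, 10])
sst2, ssr2, sse2 = sums_of_squares([1, 4], [3, 10], a2, b2)
print(round(ssr2 / sst2, 6))  # 1.0
```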
5.5 Linear regression with Excel
With Excel you can determine the equation of the linear regression line using three different methods:
1. With functions;
2. With a chart (dot diagram);
3. With Data Analysis, Regression.
5.5.1 Linear regression with the functions SLOPE and INTERCEPT
Simple linear regression analysis is a technique to determine the linear (straight-line) relation between two quantitative variables, with the aim of predicting the value of a dependent variable (y) from the value of an independent variable (x).
This linear relation can then be described by the function:
y = ax + b
We will use the variables 'Age' and 'Income' from the file Fiction2000 as our example. You could expect people to earn a higher income as they get older. The independent variable (x) is therefore 'Age' and the dependent variable (y) is 'Income'.
To determine the regression line, the slope (a) and the intercept with the y-axis (b) have to be determined.
Working method:
1. Open the file “Fiction2000”;
2. Open an empty worksheet and type Slope (a) in cell A1 and Intersection (b) in cell A2.
3. Type the following in cell A3: ="y="&ROUND(B1,2)&"*x+"&ROUND(B2,2)&"" (the mathematical function ROUND is used here to avoid too many decimals in the regression line);
4. The value of the slope is calculated in cell B1 with the statistical function SLOPE(y,x). In this example y is 'Income' (dependent on 'Age') and x is 'Age' (the independent variable).
Select cell B1. Press Insert in the menu bar and then press Function in the drop-down menu. Under Or select a category, choose Statistical; under Select a function, choose SLOPE and press OK.
For Y, select the range of the dependent variable 'Income', i.e. Data!F2:F301;
For X, select the range of the independent variable 'Age', i.e. Data!C2:C301.
In cell B1, the value of a appears: 1.000109128;
5. In cell B2, the intercept with the y-axis is calculated with the statistical function INTERCEPT(y,x).
Select cell B2. Press Insert in the menu bar and press Function in the drop-down menu. Under Or select a category, choose Statistical; under Select a function, choose INTERCEPT, and press OK. You will get the same dialog box as with SLOPE (step 4).
For Y, select the range of the dependent variable 'Income', i.e. Data!F2:F301;
For X, select the range of the independent variable 'Age', i.e. Data!C2:C301.
In cell B2, the value of b appears: -5.126899282;
6. In cell A3 you get the function that describes the linear relation between income and age: y=1*x+-5.13, i.e. y = 1x – 5.13.
An increase in age of 1 year means an increase in income of 1 * € 100.
Slope (a) | 1.000109
Intersection (b) | -5.1269
y=1*x+-5.13
Linear regression is used to predict, by means of a function, the dependent variable y (here 'Income') from the independent variable x (here 'Age').
Excel has the following function for making predictions: FORECAST(x;known_y's;known_x's).
For instance, if you want to predict the income of a train traveler in the morning rush hour who is 34 years old, you can do this in Excel as follows:
1. In cell A5 type Age, and in cell A6 type Predicted income (* € 100);
2. Select cell B6. Press Insert in the menu bar and then press Function in the drop-down menu. Under Or select a category, choose Statistical; under Select a function, choose FORECAST and press OK. A dialog box for the function arguments appears.
3. For X, fill in B5; for Known_y's, Data!F2:F301; and for Known_x's, Data!C2:C301; then press OK;
4. Select B5 and fill in 34. You will get the following result:
Slope (a) | 1.000109
Intersection (b) | -5.1269
y=1*x+-5.13
Age | 34
Predicted income (* € 100) | 28.87681
The age in cell B5 can be changed; the predicted income in cell B6 then changes accordingly.
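FORECAST returns the value on the regression line for a given x. As a minimal sketch, assuming the slope and intercept shown in cells B1 and B2 are accurate enough, the prediction is simply a·x + b. The helper `forecast` below is hypothetical; Excel's FORECAST computes a and b from the known data itself.

```python
def forecast(x, slope, intercept):
    """Value on the regression line y = slope * x + intercept."""
    return slope * x + intercept


# Slope and intercept as calculated in cells B1 and B2
a = 1.000109128
b = -5.126899282
print(round(forecast(34, a, b), 5))  # 28.87681 (* EUR 100)
```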
5.5.2 Linear regression analysis using a chart: dot diagram
A dot diagram can also give you the equation of the regression line that describes the relation between 'Age' and 'Income'.
Working method:
1. Open the file “Fiction2000”;
2. Determine which of the two variables is the dependent variable; the dependent variable is 'Income';
3. Select the values of the variable 'Income', i.e. Data!$F$2:$F$301;
4. Click Insert and then Chart;
5. In Step 1 of 4: from the chart types, select Scatter, then select the first sub-type and click Next;
6. In Step 2 of 4: click the Series tab and, for X Values, select the values of the (independent) variable 'Age', i.e. =Data!$C$2:$C$301, then click Next;
7. In Step 3 of 4: click the Titles tab and fill in the titles. Click the Legend tab, uncheck Show legend and click Next;
8. In Step 4 of 4: select where you would like to place the chart. You get the following chart:
Graph 5.4: Dot diagram of income and age (x-axis: Age; y-axis: Income (* € 100))
Source: “Fiction 2000”
To add the regression line, click Chart in the menu bar and then Add Trendline in the drop-down menu.
You can also do this by right-clicking one of the dots in the chart. The following screen appears:
Select Linear and click the Options tab; the following screen appears:
Check Display equation and Display R-squared value, and press OK. You will get the
following result:
Graph 5.4: Dot diagram of income and age, with linear trendline y = 0.4935x - 2.2733 (R² = 0.6635) (x-axis: Age; y-axis: Income (* € 100))
Source: “Fiction2000”
R² is the determination coefficient¹². It measures the fraction of the variance of 'Income' that is explained by 'Age'. In other words, 66.35% of the variation in income can be explained by the variation in age; the remaining 33.65% depends on other factors not investigated here, such as education or the sector in which one works.
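The determination coefficient can be computed as R² = 1 - SSE/SST, where SST is the total variation of y around its mean and SSE is the variation left unexplained by the regression line. The Python sketch below shows this calculation on hypothetical data; it is an illustration under that assumption, not a copy of Excel's internals.

```python
def r_squared(ys, xs):
    """Determination coefficient of the least-squares line of ys on xs."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    a = sxy / sxx                  # slope
    b = y_mean - a * x_mean        # intersection with the y-axis
    sst = sum((y - y_mean) ** 2 for y in ys)                    # total variation
    sse = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))   # unexplained variation
    return 1 - sse / sst


# Hypothetical ages and incomes (* EUR 100)
ages = [20, 25, 30, 35, 40]
incomes = [8, 11, 13, 16, 19]
share = r_squared(incomes, ages)
print(f"{share:.2%} of the variation in income is explained by age")
```

For these made-up points the share is about 99.6%, far higher than the 66.35% of the real survey data, because the points were chosen to lie almost on a line.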
5.5.3 Linear regression analysis by using Data Analysis: Regression
A third way to obtain the regression line is via Data Analysis in the Tools menu. Open the Data sheet of the file Fiction2000 and choose Tools, Data Analysis, Regression.
12 The determination coefficient can also be calculated with the statistical function RSQ(known_y's; known_x's).
Complete the screen that appears as follows:
The SUMMARY OUTPUT on the next page can be divided into 4 parts:
I. Regression statistics: from this you can read, among other things, the correlation coefficient (Multiple R) and the determination coefficient (R Square).
II. Analysis of variance (ANOVA).
III. Regression line: here you can see the intersection with the y-axis (b) and the slope (a) of the regression line.
IV. Residuals: here you can see the predicted income for each of the 300 observations based on the regression line, and how far each observation lies from the regression line (the residuals).
Excel also produces a chart of the residuals and a dot diagram with the regression line. See below:
Age residual plot (x-axis: Age; y-axis: residuals of Income (* € 100))
Age line fit plot (x-axis: Age; y-axis: Income (* € 100) and predicted Income (* € 100))
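Parts II and IV of the output rest on a simple decomposition: the total variation of y splits into a part explained by the line and a residual part, and each residual is the observed value minus the predicted value. The hedged Python sketch below recomputes those quantities and the F statistic of the ANOVA table for hypothetical data; it does not reproduce Excel's full output.

```python
def regression_anova(ys, xs):
    """Predicted values, residuals and ANOVA quantities for the line y = a*x + b."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    a = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
    b = y_mean - a * x_mean
    predicted = [a * x + b for x in xs]
    residuals = [y - p for y, p in zip(ys, predicted)]          # part IV
    ss_regression = sum((p - y_mean) ** 2 for p in predicted)    # explained, 1 df
    ss_residual = sum(e ** 2 for e in residuals)                 # unexplained, n - 2 df
    ss_total = sum((y - y_mean) ** 2 for y in ys)                # equals the sum of both
    f_stat = (ss_regression / 1) / (ss_residual / (n - 2))
    return {"predicted": predicted, "residuals": residuals,
            "ss_regression": ss_regression, "ss_residual": ss_residual,
            "ss_total": ss_total, "F": f_stat}


# Hypothetical ages and incomes (* EUR 100)
ages = [20, 25, 30, 35, 40]
incomes = [8, 11, 13, 16, 19]
out = regression_anova(incomes, ages)
for x, p, e in zip(ages, out["predicted"], out["residuals"]):
    print(f"age {x}: predicted {p:.2f}, residual {e:+.2f}")
print(f"SS regression {out['ss_regression']:.2f} + SS residual {out['ss_residual']:.2f}"
      f" = SS total {out['ss_total']:.2f}; F = {out['F']:.1f}")
```

For a least-squares line with an intercept, the residuals always sum to zero, which is a quick sanity check on any residual column.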
5.6 Correlation analysis
The strength of the relation between two variables in a population is generally measured by the correlation coefficient r. The table below shows how to interpret r:
Value of r | Strength
-0.2 < r < 0.2 | Negligible correlation
0.2 < r < 0.4 or -0.4 < r < -0.2 | Weak correlation
0.4 < r < 0.7 or -0.7 < r < -0.4 | Average correlation
0.7 < r < 0.9 or -0.9 < r < -0.7 | Strong correlation
0.9 < r < 1 or -1 < r < -0.9 | Very strong correlation
r = 1 or r = -1 | Perfect correlation
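The table can be read as a simple lookup on the absolute value of r. The Python sketch below encodes it. One assumption is needed: the table uses strict inequalities and leaves the boundary values 0.2, 0.4, 0.7 and 0.9 unassigned, so here a boundary value is given the stronger label.

```python
def correlation_strength(r):
    """Verbal strength of a correlation coefficient r, following the table above.

    Assumption: boundary values (0.2, 0.4, 0.7, 0.9), which the table leaves
    unassigned, are given the stronger category.
    """
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)  # the table is symmetric in the sign of r
    if size == 1:
        return "perfect"
    for limit, label in [(0.9, "very strong"), (0.7, "strong"),
                         (0.4, "average"), (0.2, "weak")]:
        if size >= limit:
            return label
    return "negligible"


for r in (0.8289, 0.4952, -0.15):
    direction = "positive" if r > 0 else "negative"
    print(r, correlation_strength(r), direction)
```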
A positive correlation means that an increase in the independent (explanatory) variable goes together with an increase in the dependent (explained) variable; a negative correlation means that an increase in the independent variable goes together with a decrease in the dependent variable. This corresponds to a positive or negative slope of the regression line, respectively. With perfect correlation, all points of the dot diagram lie on the regression line. How much weight you can give to r also depends on its reliability, that is, on the number of points in the dot diagram.
5.6.1 Correlation analysis with Excel
To calculate the correlation coefficient (r), Excel provides the statistical function CORREL.
Unlike in linear regression analysis, it does not matter for the correlation coefficient which variable you choose as the independent or dependent variable. To calculate the correlation coefficient in the example of ages and incomes, choose Data!C2:C301 for matrix1 and Data!F2:F301 for matrix2.
This produces an r-value of 0.8289 (strong positive correlation). Swapping the two matrices has no influence on the value of r.
When you carry out a simple linear regression analysis with a dot diagram, the correlation coefficient is not displayed, but it can be calculated as the square root of the determination coefficient, taking the sign of the slope of the regression line.
With regression analysis through Data Analysis, the correlation coefficient is displayed; see section 5.5.3.
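Both points in this subsection, that swapping the variables leaves r unchanged and that r can be recovered from a trendline's R², are easy to check in code. The Python sketch below is a hedged illustration on hypothetical data; `correl` is our own name and computes the Pearson coefficient, which is what Excel's CORREL returns.

```python
import math


def correl(xs, ys):
    """Pearson correlation coefficient, the quantity Excel's CORREL returns."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_from_trendline(slope, r_squared):
    """r recovered from a trendline: square root of R², with the sign of the slope."""
    return math.copysign(math.sqrt(r_squared), slope)


# Hypothetical ages and incomes (* EUR 100)
ages = [20, 25, 30, 35, 40]
incomes = [8, 11, 13, 16, 19]
print(correl(ages, incomes), correl(incomes, ages))  # same value: order does not matter
print(r_from_trendline(-0.5, 0.64))                   # a negative slope gives a negative r
```

The sign step matters: the square root alone is never negative, so a falling trendline would otherwise be reported as a positive correlation.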
5.7 Linear regression with time series
Linear regression is also often applied in time series analysis, with time (for instance the year) as the independent variable.
Take as an example the turnover of national transport over the years 1983-1996. The Dutch Railways expect a linear relation between the year (independent variable) and the turnover achieved (dependent variable). To find this linear relation, you can construct a dot diagram.
Working method:
1. Open the file “N.S. Traveler Transport”;
2. Click Insert and then Chart (or the chart icon);
3. In Step 1 of 4: choose Scatter from the chart types, then select the first sub-type and click Next;
4. In Step 2 of 4: for Data range, select =Data!$D$2:$D$15 (the dependent variable 'Turnover'). Click the Series tab, select =Data!$A$2:$A$15 as X Values (the independent variable 'Year') and click Next;
5. In Step 3 of 4: fill in the titles, uncheck Show legend and click Next;
6. In Step 4 of 4: select As new sheet (Chart1) and click Finish;
7. Right-click one of the dots, click Add Trendline, select Linear as the type and click the Options tab. Check Display equation on chart and Display R-squared value on chart, and click OK.
Graph 5.5: Dot diagram of the turnover of the N.S. over the years 1983-1996, with linear trendline y = 87.27x - 172265 (R² = 0.9378) (x-axis: Year; y-axis: Turnover (* 1 million))
Source: Dutch Railways N.V.
With the statistical function TREND(known_y's; known_x's; new_x's; const) you can make predictions for several coming years at once, based on the (linear) regression line.
If, for instance, you want to predict the turnover of national transport for the years 1997 up to and including 2002, first enter these years in cells A16:A21 (A16: 1997, A17: 1998, and so on). Then select the cells in which the predicted turnover should appear, for example D16:D21.
Click Insert on the menu bar and then Function in the drop-down menu. Choose Statistical and then TREND.
For known_y's enter D2:D15, for known_x's A2:A15 and for new_x's A16:A21. You can leave const empty or enter TRUE. See the following screen:
Do not click OK; instead press Ctrl+Shift+Enter simultaneously, so that the formula is entered as an array formula in all selected cells. You get the following result:
Year | Number of trips (* 1 million) | Kilometers traveled (* 1 million) | Turnover national transport (* 1 million)
1983 | 200 | 8886 | 935
1984 | 203 | 8997 | 970
1985 | 206 | 9007 | 1022
1986 | 210 | 8919 | 1033
1987 | 222 | 9396 | 1072
1988 | 230 | 9664 | 1126
1989 | 240 | 10162 | 1184
1990 | 256 | 11060 | 1285
1991 | 330 | 15195 | 1449
1992 | 333 | 14980 | 1553
1993 | 320 | 14788 | 1631
1994 | 312 | 14439 | 1755
1995 | 305 | 13977 | 1986
1996 | 306 | 14091 | 2033
1997 | | | 2014.098901
1998 | | | 2101.369231
1999 | | | 2188.63956
2000 | | | 2275.90989
2001 | | | 2363.18022
2002 | | | 2450.450549
Of course, it is better to round the values in cells D16:D21 to whole numbers.
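The TREND predictions in cells D16:D21 can be reproduced from the table's own data. The Python sketch below fits the least-squares line of turnover on year and evaluates it for 1997-2002. `trend` is a hypothetical helper that imitates TREND with const empty or TRUE (intercept estimated); it is not Excel's implementation.

```python
def trend(known_ys, known_xs, new_xs):
    """Least-squares predictions for new_xs, in the spirit of Excel's
    TREND(known_y's; known_x's; new_x's) with const empty or TRUE."""
    n = len(known_xs)
    x_mean = sum(known_xs) / n
    y_mean = sum(known_ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(known_xs, known_ys))
    sxx = sum((x - x_mean) ** 2 for x in known_xs)
    a = sxy / sxx              # slope, about 87.27
    b = y_mean - a * x_mean    # intercept, about -172265
    return [a * x + b for x in new_xs]


# Turnover of national transport (* 1 million), 1983-1996, from the table above
years = list(range(1983, 1997))
turnover = [935, 970, 1022, 1033, 1072, 1126, 1184,
            1285, 1449, 1553, 1631, 1755, 1986, 2033]

future = list(range(1997, 2003))
for year, value in zip(future, trend(turnover, years, future)):
    print(year, round(value))
```

Rounded, the six predictions are 2014, 2101, 2189, 2276, 2363 and 2450, matching cells D16:D21 after rounding.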
5.8 Non-linear regression models
All the relations described so far assumed a straight-line (linear) relation between two variables. Often, however, the relation is non-linear, as in the chart of the product life cycle (introduction, growth, maturity, saturation, decline), where turnover depends on time. This chart is certainly not a straight line; it has a clearly polynomial character.
With a dot diagram, Excel makes it relatively simple to find other, non-linear regression models (polynomial, exponential, logarithmic, power): when adding the trendline, simply choose a different Type.
As an example, below you can see a polynomial relation between the variables 'Age' and 'Income', taken from the data in the file “Fiction2000”.
Graph 5.6: Dot diagram of age and income from 300 respondents, with polynomial trendline y = -0.0133x² + 1.4551x - 17.444 (R² = 0.741) (x-axis: Age; y-axis: Income (* € 100))
Source: Fiction2000
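When you compare the determination coefficient of this polynomial model (0.741) with that of the linear model (0.6635), you will conclude that the polynomial model gives a better description of the relation between the variables 'Age' and 'Income'. Keep in mind that adding a term to a model never lowers R², so the improvement should be clear before you prefer the more complex model.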
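A second-degree polynomial trendline is still a least-squares fit, only with an extra x² term. The Python sketch below fits y = c2·x² + c1·x + c0 by solving the normal equations with Gaussian elimination. That is a textbook approach chosen for transparency, not necessarily how Excel computes its trendline, and the data in the example are hypothetical.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = c2*x**2 + c1*x + c0; returns (c2, c1, c0)."""
    # Power sums s[k] = sum(x**k) and moments t[k] = sum(y * x**k)
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented matrix of the normal equations for the unknowns (c0, c1, c2)
    m = [[s[0], s[1], s[2], t[0]],
         [s[1], s[2], s[3], t[1]],
         [s[2], s[3], s[4], t[2]]]
    # Gaussian elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    coef = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        rest = sum(m[r][c] * coef[c] for c in range(r + 1, 3))
        coef[r] = (m[r][3] - rest) / m[r][r]
    c0, c1, c2 = coef
    return c2, c1, c0


def r_squared(xs, ys, predict):
    """Determination coefficient 1 - SSE/SST for any fitted model."""
    y_mean = sum(ys) / len(ys)
    sst = sum((y - y_mean) ** 2 for y in ys)
    sse = sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))
    return 1 - sse / sst


# Hypothetical ages and incomes (* EUR 100) that rise and then level off
ages = [20, 25, 30, 35, 40, 45, 50, 55, 60]
incomes = [7, 12, 15, 18, 20, 21, 21, 20, 19]
c2, c1, c0 = fit_quadratic(ages, incomes)
print(f"y = {c2:.4f}x^2 + {c1:.4f}x + {c0:.3f}")
print(f"R^2 = {r_squared(ages, incomes, lambda x: c2 * x * x + c1 * x + c0):.3f}")
```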
5.9 Predictions
The value of a prediction depends on:
1. Whether the prediction concerns a value within the range of the x-values, i.e. between the lowest and the highest observed x (interpolation), or outside that range (extrapolation). Interpolation can give a good prediction; extrapolation, especially far outside the range, generally does not. So predicting the income of a 45-year-old train traveler in the morning rush hour (interpolation) from the regression line is more reliable than predicting the income of a 70-year-old train traveler in the morning rush hour (extrapolation).
2. The determination coefficient (r²): the closer it is to 1, the more accurate the prediction.
3. The number of points in the dot diagram. In the example, the regression line is based on 300 observations (points). If you determine the regression line from only the first 10 respondents (points), you get a higher determination coefficient in this example, but the 95% confidence intervals for the intersection with the y-axis and for the slope become considerably wider:
Number of observations | Determination coefficient | Intersection: lowest 95% | Intersection: highest 95% | Slope: lowest 95% | Slope: highest 95%
300 | 0.663 | -3.674 | -0.873 | 0.453 | 0.534
10 | 0.831 | -14.673 | 1.701 | 0.419 | 0.905
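The intervals in this table correspond to the Lower 95% and Upper 95% columns of Excel's Regression output. Each is the estimate plus or minus a t value times its standard error. With fewer points the t value grows and the standard error usually grows as well, which is why the intervals for 10 respondents are wider. The Python sketch below computes the intervals for hypothetical data. The t value is passed in as an assumption because Python's standard library has no t-distribution.

```python
import math


def coefficient_intervals(ys, xs, t_crit):
    """95% intervals (lowest, highest) for the slope a and the intersection b.

    t_crit is the two-sided 95% critical value of Student's t with n - 2
    degrees of freedom. It is passed in because Python's standard library has
    no t-distribution; roughly 2.306 for n = 10 and 1.968 for n = 300.
    """
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    a = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
    b = y_mean - a * x_mean
    sse = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    s2 = sse / (n - 2)                                   # residual variance
    se_a = math.sqrt(s2 / sxx)                           # standard error of a
    se_b = math.sqrt(s2 * (1 / n + x_mean ** 2 / sxx))   # standard error of b
    return ((a - t_crit * se_a, a + t_crit * se_a),
            (b - t_crit * se_b, b + t_crit * se_b))


# Hypothetical data with n = 5, so 3 degrees of freedom (t is about 3.182)
ages = [20, 25, 30, 35, 40]
incomes = [8, 11, 13, 16, 19]
(a_lo, a_hi), (b_lo, b_hi) = coefficient_intervals(incomes, ages, t_crit=3.182)
print(f"slope between {a_lo:.3f} and {a_hi:.3f}")
print(f"intersection between {b_lo:.3f} and {b_hi:.3f}")
```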
5.10 Assignments
1. One wants to investigate a possible correlation between the variable 'Income' and the variable 'Traveling time' using the data in the file Fiction2000. The expectation is that people with a higher income are prepared to travel further for their job.
a. Investigate the correlation between 'Income' and 'Traveling time'.
b. Determine the linear regression line that describes the relation between the (independent) variable 'Income' and the (dependent) variable 'Traveling time'.
c. What traveling time would you expect for someone with an income of 20 (* € 100), based on the regression line?
d. How reliable do you consider the prediction made in c?
2. One wants to investigate a possible correlation between the variable 'Kilometers traveled' and the variable 'Turnover national transport' using the data over the years 1983 up to and including 1996 in the file “N.S. Traveler Transport”. The expectation is that when the number of traveler kilometers rises, the turnover rises as well.
a. Determine the correlation between 'Kilometers traveled' and 'Turnover national transport'.
b. Determine the linear regression line that describes the relation between the variable 'Kilometers traveled' and the variable 'Turnover national transport'.
c. Based on the relation between the variables 'Year' and 'Kilometers traveled', travelers are expected to travel 17,633 (* 1 million) kilometers with the N.S. in the year 2000. What turnover can be expected for 2000, based on the linear relation between the variables 'Kilometers traveled' and 'Turnover national transport'?
3. One wants to predict the number of trips made with the N.S. in the years 1997 up to and including 2000, based on the possible linear relation between the variable 'Year' and the variable 'Number of trips' in the data over the years 1983 up to and including 1996 from the file “N.S. Traveler Transport”.
a. Determine the correlation between 'Year' and 'Number of trips'.
b. Determine the linear regression line that describes the relation between the variable 'Year' and the variable 'Number of trips'.
c. Make a prediction based on the linear relation between 'Year' and 'Number of trips' for the years 1997 up to and including 2000.
d. What comments can you make concerning the prediction?
The cost of capital $r_i$ of an investment project $i$ is determined by:
$$ r_i = r_f + \beta_i \, (r_m - r_f) $$
where:
$r_f$ = risk-free rate: the return on risk-free investments such as government bonds;
$\beta_i$ = the systematic risk of the investment;
$r_m$ = profitability of the market portfolio.
The systematic risk $\beta_i$ shows to what extent the required profitability of an investment reacts to a change in the profitability of the market portfolio. Apply this to the following situation:
4. The profitability of the share Superinfest compared with the market data is presented in the table below:
Profitability of the market portfolio and the share Superinfest
Month | Market portfolio (in %) | Superinfest (in %)
1 | 11 | 13
2 | 9 | 12
3 | 11 | 11
4 | 14 | 13
5 | 16 | 18
6 | 12 | 15
7 | 8 | 10
8 | 10 | 13
9 | 7 | 10
10 | 11 | 14
11 | 6 | 10
12 | 4 | 8
13 | 5 | 7
14 | 8 | 8
15 | 10 | 9
Furthermore, it is given that the risk-free rate equals 7%.
a. Determine the systematic risk β_i as the slope of the regression line when the profitability of Superinfest is plotted against that of the market portfolio. Note: which is the dependent variable (the y-axis) and which is the independent variable (the x-axis)?
b. A project is started when the profitability of the market portfolio equals 12%. What is the cost of capital of the project?
c. Same scenario, but with a profitability of the market portfolio of 9%. What is the cost of capital of the project?
d. Same scenario, but with a profitability of the market portfolio of 6%. What is the cost of capital of the project?
e. What is the difference between the costs of capital at market-portfolio profitabilities of 12%, 9% and 6%? What will management decide?
Annex: Answers
Answers Chapter 2: Characteristics
1. Income
Averages
Arithmetic mean | 13.9
Median | 14
Mode | 18
Measures of dispersion
Quartile distance | 10
Standard deviation | 7.5
Measures of shape
Skewness | 0.11
Kurtosis | -0.42
2. Traveling time
First sort the data in Fiction2000 by gender.
Averages | Male | Female
Arithmetic mean | 39.8 | 35.7
Median | 35 | 35
Mode | 23 | 22
Measures of dispersion
Quartile distance
Average absolute
deviation
Standard deviation
32
20.2
28
14.4
26.2
18.1
Measures of shape
Skewness
Kurtosis
1.18
1.56
0.56
-0.13
3. Number of travel days per train per week
Averages | Male | Female
Arithmetic mean | 4.4 | 4.1
Median | 5 | 4
Mode | 5 | 4
Measures of dispersion | Male | Female
Minimum | 1 | 2
Maximum | 5 | 5
Range | 4 | 3
First quartile | 4 | 4
Third quartile | 5 | 5
Quartile distance | 1 | 1
Average absolute deviation | 0.7 | 0.7
Standard deviation | 0.9 | 0.9
Variance | 0.7 | 0.7
Coefficient of variation | 19% | 21%
Measures of shape | Male | Female
Skewness | -1.69 | -0.56
Kurtosis | 2.94 | -0.54
4. Traveling time
Traveling time by train (in min.) | Number (f) | Class median | f*median | Cum. relative | f*(median - average)^2
0 -< 20 | 68 | 10 | 680 | 22.67% | 62430.12
20 -< 30 | 56 | 25 | 1400 | 41.33% | 13109.04
30 -< 40 | 57 | 35 | 1995 | 60.33% | 1601.13
40 -< 50 | 25 | 45 | 1125 | 68.67% | 552.25
50 -< 60 | 45 | 55 | 2475 | 83.67% | 9724.05
60 -< 100 | 38 | 80 | 3040 | 96.33% | 59891.42
100 -< 150 | 11 | 125 | 1375 | 100.00% | 78914.99
Total | 300 | | 12090 | | 226223
Arithmetic mean | 40.3
Median (class 30 -< 40) | 34.6
First quartile (class 20 -< 30) | 21.3
Third quartile (class 50 -< 60) | 54.2
Quartile distance | 33.0
Variance | 756.6
Standard deviation | 27.5
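The grouped-data calculations of answer 4 can be checked with the Python sketch below. It uses the class medians (midpoints) as representative values, divides the sum of f*(median - average)^2 by n - 1, and interpolates linearly within the class that contains position p·n. That interpolation rule is an assumption on our part, chosen because it reproduces the median of 34.6 and the first quartile of 21.3 in the table.

```python
import math

# Classes (lower, upper) in minutes and frequencies from the table above
classes = [(0, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 100), (100, 150)]
freqs = [68, 56, 57, 25, 45, 38, 11]

n = sum(freqs)
mids = [(lo + hi) / 2 for lo, hi in classes]          # class medians (midpoints)
mean = sum(f * m for f, m in zip(freqs, mids)) / n
# Sample variance: sum of f*(median - average)^2 divided by n - 1
variance = sum(f * (m - mean) ** 2 for f, m in zip(freqs, mids)) / (n - 1)
sd = math.sqrt(variance)


def grouped_quantile(p):
    """Value below which a fraction p of the observations lies,
    interpolating linearly inside the class that contains position p*n."""
    target = p * n
    cum = 0
    for (lo, hi), f in zip(classes, freqs):
        if cum + f >= target:
            return lo + (target - cum) / f * (hi - lo)
        cum += f
    return classes[-1][1]


q1, median, q3 = (grouped_quantile(p) for p in (0.25, 0.5, 0.75))
print(f"mean {mean:.1f}, variance {variance:.1f}, sd {sd:.1f}")
print(f"Q1 {q1:.2f}, median {median:.2f}, Q3 {q3:.2f}, quartile distance {q3 - q1:.1f}")
```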
5. Risks of projects:
Measure | P1 | P2 | P3
Weighted average | € 74,000 | € 53,000 | € 53,000
Variance (€²) | 3,684,000,000 | 1,281,000,000 | 201,000,000
Standard deviation | € 60,696 | € 35,791 | € 14,177
The first project could bring in the most, but its revenues are not guaranteed: it could just as well bring in € 130,000 as € 14,000.
The other two projects bring in the same expected amount of money; the only difference is that the risk of the third project is smaller.
If the risk should be as low as possible, the third project is preferred.
6. Risks of projects:
Measure | P1 | P2 | P3
Weighted average | € 62,000 | € 60,000 | € 60,000
Variance (€²) | 2,676,000,000 | 3,775,000,000 | 5,488,000,000
Standard deviation | € 51,730 | € 61,441 | € 74,081
P2 and P3 bring in the same expected amount of money, but the uncertainty of P3 is greater because of its larger standard deviation. As far as risk management is concerned, a combination of P1 and P2 is preferred.
Answers Chapter 3: Probabilities
Count of Daily paper
Daily paper | Total
Algemeen Dagblad | 8.3%
NRC | 4.0%
Telegraaf | 16.7%
Trouw | 6.3%
Volkskrant | 20.0%
Other | 17.7%
None | 27.0%
Total | 100.0%
Count of Daily paper
Daily paper | Male | Female | Total
Algemeen Dagblad | 6.0% | 2.3% | 8.3%
NRC | 3.0% | 1.0% | 4.0%
Telegraaf | 14.7% | 2.0% | 16.7%
Trouw | 5.3% | 1.0% | 6.3%
Volkskrant | 13.3% | 6.7% | 20.0%
Other | 12.3% | 5.3% | 17.7%
None | 19.7% | 7.3% | 27.0%
Total | 74.3% | 25.7% | 100.0%
Answer 1a
Answer 1b
Count of age
Age2
< 20
>= 20
Total
Answer 1c
20.0%
Total
15.0%
85.0%
100.0%
85.0%
14.7%
Count of income (* € 100)
Income (* € 100) 2
< 15
>= 15
Total
Answer 1d
33.0%
Count of daily paper
Daily paper | Total
Algemeen Dagblad | 10.7%
NRC | 2.0%
Telegraaf | 25.3%
Trouw | 3.3%
Volkskrant | 7.3%
Other | 12.0%
None | 39.3%
Total | 100.0%
Count of opinion Spits
Answer 2a
Answer 2b
Answer 2c
Count of age
Opinion Spits
Very bad
Bad
Reasonable
Good
Very good
Total
10.7%
36.0%
Age2
< 30
>= 30
4.0%
6.0%
21.3%
23.3%
8.7%
63.3%
Gender
Male Female Total
33.0% 17.7%
50.7%
41.3%
8.0%
49.3%
74.3% 25.7%
100.0%
Opinion Spits
Very bad
Bad
Reasonable
Good
Very good
Total
1
8.0%
11.3%
34.0%
34.7%
12.0%
100.0%
19.3%
Total
4.0%
5.3%
12.7%
11.3%
3.3%
36.7%
Total
8.0%
11.3%
34.0%
34.7%
12.0%
100.0%
Answer 2d
11.3%
3. a. P(Metro | Spits) = 73 / 150 = 48.7%
b. P(Spits | Metro) = 73 / 163 = 44.8%
c. P(Metro and Spits) = 73 / 300 = 24.3%
d. P(Metro or Spits) = (300 - 60) / 300 = 80.0%, or 163 / 300 + 150 / 300 - 73 / 300 = 240 / 300 = 80.0%
e. No, because P(Metro) * P(Spits) = 163 / 300 * 150 / 300 = 27.2%, which is not equal to the answer to c, P(Metro and Spits) = 24.3%.
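The calculations in answer 3 use only the counts that appear in it: 300 respondents, 163 in the Metro category, 150 in the Spits category and 73 in both. The Python sketch below redoes them with exact fractions, applying the definition of conditional probability, the addition rule, and the independence test P(A) * P(B) = P(A and B).

```python
from fractions import Fraction

# Counts taken from answer 3 above
n, metro, spits, both = 300, 163, 150, 73

p_metro = Fraction(metro, n)
p_spits = Fraction(spits, n)
p_both = Fraction(both, n)

p_metro_given_spits = Fraction(both, spits)   # a: P(Metro | Spits)
p_spits_given_metro = Fraction(both, metro)   # b: P(Spits | Metro)
p_either = p_metro + p_spits - p_both         # d: addition rule
independent = p_metro * p_spits == p_both     # e: independence test

for label, p in [("P(Metro | Spits)", p_metro_given_spits),
                 ("P(Spits | Metro)", p_spits_given_metro),
                 ("P(Metro and Spits)", p_both),
                 ("P(Metro or Spits)", p_either),
                 ("P(Metro) * P(Spits)", p_metro * p_spits)]:
    print(f"{label} = {float(p):.1%}")
print("independent:", independent)
```

Exact fractions avoid rounding issues in the equality test of e.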
7. a. P(Volkskrant | Female) = 20 / 77 = 26.0%
b. P(Volkskrant | Male) = 40 / 223 = 17.9%
c. P(Volkskrant) = 60 / 300 = 20%
d. No, the percentage of females with a subscription to the Volkskrant is higher than the percentage of males.
P(Volkskrant) * P(Female) = 60/300 * 77/300 = 5.1%
This differs from P(Volkskrant and Female) = 20/300 = 6.7%, so the events are not independent.
Answers Chapter 4: Continuous probability distribution
1. a. P(X < 90) = 0.0013
b. P(X > 135) = 0.0668
c. P(105 < X < 135) = 0.8664
d. 126.7 minutes
2. a. P(X < 0.0833) = 0.8529
b. P(X > 0.1667) = 0.0216
c. P(0.050 < X < 0.1167) = 0.2483
3. a. P(X < 90) = 0.0228
b. P(X > 140) = 0.0912
c. P(120 < X < 150) = 0.4772
d. 144.7 seconds
4. a. P(X < 0.0500) = 0.5934
b. P(X > 0.0833) = 0.2231
c. P(0.0333 < X < 0.0667) = 0.2476
5. a. P1: P(X < 0.10) = 0.0062 = 0.62%
b. P2: P(X < 0.10) = 0.2023 = 20.23%
c. P1: profitability = 16.68%
P2: profitability = 20.05%
d. P1 is preferable.
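Answers 1 to 4 can be reproduced with Python's standard library (statistics.NormalDist and math.exp). The exercise texts are not part of this section, so the parameters below are assumptions inferred from the answers themselves:
- normal distributions with mean 120 and standard deviation 10 (answer 1, minutes) and 15 (answer 3, seconds);
- exponential distributions with rates of 23 and 18 per hour (answers 2 and 4, times in hours, so 0.0833 = 5 minutes).

Answers 1d and 3d are read as the 75th and 95th percentiles, which is also an assumption. Answer 5 is left out because its parameters cannot be recovered from the answers alone.

```python
import math
from statistics import NormalDist

# ASSUMED parameters, inferred from the answers (the exercise texts are not shown)
ANSWER1 = NormalDist(mu=120, sigma=10)   # answer 1, minutes
ANSWER3 = NormalDist(mu=120, sigma=15)   # answer 3, seconds
RATE2 = 23   # answer 2: exponential rate per hour (assumed)
RATE4 = 18   # answer 4: exponential rate per hour (assumed)


def expon_between(lo, hi, rate):
    """P(lo < X < hi) for an exponential distribution: e^(-rate*lo) - e^(-rate*hi)."""
    return math.exp(-rate * lo) - math.exp(-rate * hi)


answers = {
    "1a": ANSWER1.cdf(90),
    "1b": 1 - ANSWER1.cdf(135),
    "1c": ANSWER1.cdf(135) - ANSWER1.cdf(105),
    "1d": ANSWER1.inv_cdf(0.75),            # assumed: 75th percentile
    "2a": expon_between(0, 5 / 60, RATE2),
    "2b": expon_between(10 / 60, math.inf, RATE2),
    "2c": expon_between(3 / 60, 7 / 60, RATE2),
    "3a": ANSWER3.cdf(90),
    "3b": 1 - ANSWER3.cdf(140),
    "3c": ANSWER3.cdf(150) - ANSWER3.cdf(120),
    "3d": ANSWER3.inv_cdf(0.95),            # assumed: 95th percentile
    "4a": expon_between(0, 3 / 60, RATE4),
    "4b": expon_between(5 / 60, math.inf, RATE4),
    "4c": expon_between(2 / 60, 4 / 60, RATE4),
}
for key, value in answers.items():
    print(key, round(value, 4))
```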
Answers Chapter 5: Simple linear regression and correlation analysis
1. a. Correlation (r) = 0.4952
b. y = 1.6073x + 16.51
c. Traveling time is 49 minutes
d. We speak of average correlation (0.4 < r < 0.7), while the number of observation points is fairly large (n = 300). So the prediction is of average reliability.
Graph 5.1: Dot diagram of income and traveling time, with linear trendline y = 1.6073x + 16.51 (R² = 0.2452) (x-axis: Income (* € 100); y-axis: Traveling time (minutes))
Source: Fiction2000
2. a. Correlation (r) = 0.87
b. y = 0.1224x - 70.398
c. 2088 (* 1 million)
Graph 5.4: Dot diagram of kilometers traveled and turnover, with linear trendline y = 0.1224x - 70.398 (R² = 0.7569) (x-axis: Kilometers traveled (* 1 million); y-axis: Turnover (* 1 million))
Source: Nederlandse Spoorwegen
3. a. Correlation (r) = 0.89
b. y = 11.198x - 22016
c.
Year | Number of trips (* 1 million)
1997 | 346
1998 | 358
1999 | 369
2000 | 380
d. The introduction of the OV-year card for students in 1991 caused a break in the trend compared with the preceding years. It is better to base the regression analysis on the data from 1991 onward; however, only a few data points are then available.
Graph 5.5: Dot diagram of year and number of trips, with linear trendline y = 11.198x - 22016 (R² = 0.8008) (x-axis: Year; y-axis: Number of trips (* 1 million))
Source: Nederlandse Spoorwegen N.V.
4. a. y = 0.7894x + 3.927, so the systematic risk β_i equals 0.7894.
b. When the profitability of the market portfolio equals 12%, the cost of capital is r_i = 7% + 0.7894 * (12% - 7%) = 10.947%.
c. When the profitability of the market portfolio equals 9%, the cost of capital is r_i = 7% + 0.7894 * (9% - 7%) = 8.5788%.
d. When the profitability of the market portfolio equals 6%, the cost of capital is r_i = 7% + 0.7894 * (6% - 7%) = 6.2106%.
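e. The cost of capital is 10.947%, 8.5788% and 6.2106%, respectively. Because β_i < 1, it reacts less strongly than the market profitability: each percentage point of market profitability changes it by only 0.7894 percentage points, so at 12% and 9% it is lower than the market profitability. The investments aren't that super after all. With a market profitability of 6%, the cost of capital (6.2106%) is below the risk-free rate of 7%, so it is better to invest in bonds.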
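The steps of answer 4 can be checked with a short Python sketch. It recomputes β_i as the least-squares slope of the Superinfest profitabilities on the market-portfolio profitabilities from the assignment's table. It then applies r_i = r_f + β_i (r_m - r_f) with all rates expressed in percent, so the risk-free rate is entered as 7 rather than 0.07. The function and variable names are our own.

```python
# Monthly profitability (%) from the table in assignment 4
market = [11, 9, 11, 14, 16, 12, 8, 10, 7, 11, 6, 4, 5, 8, 10]
superinfest = [13, 12, 11, 13, 18, 15, 10, 13, 10, 14, 10, 8, 7, 8, 9]

n = len(market)
m_mean = sum(market) / n
s_mean = sum(superinfest) / n
# beta_i = slope of the regression of Superinfest (y-axis) on the market (x-axis)
beta = (sum((m - m_mean) * (s - s_mean) for m, s in zip(market, superinfest))
        / sum((m - m_mean) ** 2 for m in market))
intercept = s_mean - beta * m_mean

RISK_FREE = 7.0  # percent


def cost_of_capital(market_profitability, risk_free=RISK_FREE):
    """CAPM: r_i = r_f + beta_i * (r_m - r_f), with every rate in percent."""
    return risk_free + beta * (market_profitability - risk_free)


print(f"y = {beta:.4f}x + {intercept:.3f}")
for r_m in (12, 9, 6):
    print(f"r_m = {r_m}%: cost of capital {cost_of_capital(r_m):.3f}%")
```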