Descriptive statistics - City, University of London

MSc Regulation and Competition
Quantitative Techniques
QT week 4
In this lecture
Some other distributions
Descriptive statistics
Some background on Visual Basic for Applications
Some other distributions
The normal distribution and one or two others derived from it form the
backbone of statistics and econometrics.
However, there are some other distributions you may come across, so they are
introduced briefly here. They are the Bernoulli distribution, the binomial
distribution and the Poisson distribution, which are discrete, and the
exponential distribution, which is continuous.
1. Bernoulli distribution
Applies to a strict dichotomy on a single trial. E.g. heads or tails, success or
failure. This is called a Bernoulli trial. One outcome (e.g. “success”) is
designated 1, the other 0.
The distribution can be written two ways
a) as a listing: P(1)= p and P(0)= 1-p;
b) as a formula: $P(x) = p^x (1-p)^{1-x}$ for $x = 0, 1$
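A quick check that the listing and the formula agree, substituting each value of x into the formula written with exponents x and 1 - x. The mean and variance lines are standard results added here for reference; they are the N = 1 case of the binomial results quoted in the next section.

```latex
\begin{align*}
P(1) &= p^{1}(1-p)^{1-1} = p, \\
P(0) &= p^{0}(1-p)^{1-0} = 1-p, \\
E(x) &= 1 \cdot p + 0 \cdot (1-p) = p, \\
\operatorname{Var}(x) &= (1-p)^2 \, p + (0-p)^2 (1-p) = p(1-p).
\end{align*}
```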
As far as I know this has nothing to do with the Bernoulli effect, which some
people say helps keep aircraft airborne.
2. Binomial distribution
Suppose you repeat the Bernoulli trial N times. The number of 1s is a
discrete random variable distributed as a binomial distribution. The sample
space is the set of integers 0, 1, 2, …, N.
The expected value of the binomial distribution is Np. Its variance is Np(1-p).
Working out the probabilities of a particular number of outcomes is a tedious
exercise in the mathematics of combinations. Fortunately we do not have to
do this because:
a) people publish tables of the binomial distribution
b) for a large number of trials the probabilities of the number falling in a range
can be approximated using the normal distribution with the same mean and
variance.
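As a rough illustration of both points, the sketch below computes binomial probabilities directly and compares the probability of a range with the normal approximation. It is written in Python rather than Excel. The values N = 100, p = 0.3 and the range 25 to 35 are made up for the example, and the half-unit continuity correction is a common refinement that these notes do not cover.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k ones in n Bernoulli trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def normal_cdf(x, mean, sd):
    """Cumulative normal probability P(X <= x)."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

# Hypothetical example values (not from the lecture).
N, p = 100, 0.3
mean = N * p                  # expected value Np
variance = N * p * (1 - p)    # variance Np(1-p)

# Exact probability that the number of ones lies between 25 and 35 inclusive.
exact = sum(binomial_pmf(k, N, p) for k in range(25, 36))

# Normal approximation with the same mean and variance,
# widened by 0.5 at each end (continuity correction, an added assumption).
sd = math.sqrt(variance)
approx = normal_cdf(35.5, mean, sd) - normal_cdf(24.5, mean, sd)

print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```

With these made-up values the two answers should agree to about two decimal places; the agreement tends to get worse for small N or for p close to 0 or 1.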
3. Poisson distribution
Suppose the supermarket queue has stalled because someone’s credit card
is not accepted.
How long will the queue be after one minute?
This is modelled with the Poisson distribution. It assumes that arrivals are
independent and occur at a constant average rate.
The probability of X arrivals, where $\lambda$ is the expected number of
arrivals in the period, is
$P(X) = \lambda^X e^{-\lambda} / X!$
Table 4.1 The Poisson distribution, $P(X) = \lambda^X e^{-\lambda} / X!$

              $\lambda$ =
X!    X       0.5      1        2        3        4
1     0       0.6065   0.3679   0.1353   0.0498   0.0183
1     1       0.3033   0.3679   0.2707   0.1494   0.0733
2     2       0.0758   0.1839   0.2707   0.2240   0.1465
6     3       0.0126   0.0613   0.1804   0.2240   0.1954
24    4       0.0016   0.0153   0.0902   0.1680   0.1954
120   5       0.0002   0.0031   0.0361   0.1008   0.1563
I created this in Excel over the range A2:G9 (I had the Table title in A1).
The Excel formula for this in Cell C4 looks something like
=(C$2^$B4)*EXP(-C$2)/$A4
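The entries of Table 4.1 can also be reproduced outside Excel. The following is a minimal sketch in Python, assuming the same five values of the Poisson mean and X = 0 to 5. The function name poisson_pmf is chosen for this example only.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with mean lam."""
    return lam**x * math.exp(-lam) / math.factorial(x)

lambdas = [0.5, 1, 2, 3, 4]   # the column headings of Table 4.1
xs = range(6)                 # X = 0, 1, ..., 5

# One list of six probabilities per value of lambda, rounded as in the table.
table = {lam: [round(poisson_pmf(x, lam), 4) for x in xs] for lam in lambdas}

# Print rows in the same layout as the table: X!, X, then the probabilities.
for x in xs:
    row = "  ".join(f"{table[lam][x]:.4f}" for lam in lambdas)
    print(f"{math.factorial(x):>3}  {x}  {row}")
```

Note the minus sign inside the exponential: without it the "probabilities" grow with the mean and no longer sum to one.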
4. Exponential distribution
How long before a piece of machinery breaks down or somebody finds a job?
(Or before the first person joins the supermarket queue?)
This continuous distribution has the density function
$f(x) = \lambda e^{-\lambda x}$ for $\lambda > 0$ and $x > 0$.
The degree of inefficiency of a company is also sometimes modelled as an
exponential distribution.
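For the waiting-time questions above, the exponential density implies that the probability of waiting longer than t is exp(-rate × t), and that the mean wait is 1/rate. The sketch below, in Python, checks the first of these numerically. The rate of 0.2 breakdowns per month and the six-month horizon are hypothetical values chosen for the example.

```python
import math

def exponential_pdf(x, lam):
    """Density f(x) = lam * exp(-lam * x) for x > 0."""
    return lam * math.exp(-lam * x) if x > 0 else 0.0

def prob_wait_longer_than(t, lam):
    """P(X > t) = exp(-lam * t), the area under the density beyond t."""
    return math.exp(-lam * t)

lam = 0.2            # hypothetical rate: 0.2 breakdowns per month
mean_wait = 1 / lam  # mean of the exponential distribution

# Cross-check the closed form by integrating the density from 0 to t
# with the midpoint rule.
t, steps = 6.0, 10_000
width = t / steps
area = sum(exponential_pdf((i + 0.5) * width, lam) for i in range(steps)) * width

print(f"mean wait = {mean_wait:.1f} months")
print(f"P(no breakdown in first {t:.0f} months) = {prob_wait_longer_than(t, lam):.4f}")
print(f"numerical check: 1 - area = {1 - area:.4f}")
```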
Descriptive statistics
We’ve just had a look at probability theory because we said data were
generated by casino-like processes.
For any particular data there may be a series of processes involved, including
distributions we have not covered.
Rather than list all our data, a great deal of its information content can often be
conveyed by summary or descriptive statistics. These describe the
position and shape of the frequency distribution, which is itself a useful way of
summarising the data.
Measures of central tendency
… were introduced last week.
Measures of dispersion
The variance. This is the average value of the squared deviation from the
mean. For a probability distribution this is usually written as $\sigma^2$.
The standard deviation. This is basically the square root of the variance. It
has the same dimensions of measurement as the original data (e.g. £, metres,
etc.)
NOTE: You need to be aware that there are two concepts of the standard
deviation in Excel. If we take the random numbers 0.879, 0.001, 0.652,
0.389, 0.464 and 0.269 and apply the formula STDEV(…) we get 0.304
rather than 0.278.
To get the value according to the definition above (the sum of squared
deviations divided by the number of observations, N) we have to use STDEVP(…).
Technically, STDEVP treats the data you have as a population in its own right,
whereas STDEV treats the data as a sample from a larger population and
divides by N-1 instead of N, giving an estimate of the standard deviation
within that larger population.
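The same distinction exists in most statistical software. As an illustration outside Excel, Python's standard statistics module has pstdev (divide by N) and stdev (divide by N-1); applied to the six numbers above they give the two values quoted.

```python
import statistics

data = [0.879, 0.001, 0.652, 0.389, 0.464, 0.269]

population_sd = statistics.pstdev(data)  # divides by N, like Excel's STDEVP
sample_sd = statistics.stdev(data)       # divides by N - 1, like Excel's STDEV

print(f"population: {population_sd:.3f}")  # 0.278
print(f"sample:     {sample_sd:.3f}")      # 0.304
```

The two are related by a factor of the square root of N/(N-1), so the difference matters for six observations but becomes negligible for large data sets.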
The coefficient of variation. This is the ratio of the standard deviation to the
mean. It is a pure number, so is good for comparing apples and pears.
The range. The difference between the highest and lowest values.
The inter-quartile range. The upper quartile and lower quartile are similar in
spirit to the median, except that they identify the cut-off points for the top
and bottom 25%.
The inter-quartile range is the difference between these cut-off points.
The inter-decile range. Like the inter-quartile range, but with the top and
bottom 10%.
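A short sketch of these measures in Python, reusing the six random numbers from the note on standard deviations. Two choices here are assumptions of the example rather than anything prescribed in these notes: the population standard deviation is used in the coefficient of variation, and the cut-off points are interpolated with the "inclusive" method of statistics.quantiles. Packages differ in how they interpolate quartiles, so small data sets can give slightly different answers elsewhere.

```python
import statistics

data = [0.879, 0.001, 0.652, 0.389, 0.464, 0.269]

mean = statistics.mean(data)
coefficient_of_variation = statistics.pstdev(data) / mean  # a pure number

data_range = max(data) - min(data)

# Cut-off points: n=4 gives the three quartiles, n=10 the nine deciles.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
deciles = statistics.quantiles(data, n=10, method="inclusive")

inter_quartile_range = q3 - q1
inter_decile_range = deciles[-1] - deciles[0]

print(f"coefficient of variation = {coefficient_of_variation:.3f}")
print(f"range = {data_range:.3f}")
print(f"inter-quartile range = {inter_quartile_range:.3f}")
print(f"inter-decile range = {inter_decile_range:.3f}")
```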
Skewness
Skewness is the lack of symmetry in the distribution. It is calculated as a
function of the cube of the deviation from the mean:
$\text{Skewness} = \sum_i p_i (x_i - \mu)^3 / \sigma^3$
(By the way, $E[(x_i - \mu)^3]$ or $\sum_i p_i (x_i - \mu)^3$ is sometimes
referred to as the third central moment, or third moment about the mean;
$E(x^n)$ is the nth moment about the origin.)
Amongst theoretical distributions, the uniform distribution and the normal
distribution are both unskewed.
What about the Poisson, the exponential and the binomial?
The distribution below is positively skewed.
[Figure 4.2 A positively skewed distribution. Chart with values 1 to 11 on the
horizontal axis and a vertical scale from 0 to 14, annotated "Mean = 6.06".]
A simple check for positive skewness is whether the mean is greater than
the mode. In the above case the mean is 6.06, which is bigger than the
modal value of 4.
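Both the formula and the mean-versus-mode check can be tried on a small frequency table. The sketch below is in Python, and the frequencies are hypothetical: they are NOT the data behind Figure 4.2, just a made-up distribution with a long right-hand tail.

```python
# Hypothetical frequency table (value -> count); not the data behind Figure 4.2.
frequencies = {1: 2, 2: 6, 3: 10, 4: 12, 5: 9, 6: 7, 7: 5, 8: 4, 9: 3, 10: 2, 11: 1}

total = sum(frequencies.values())
probs = {x: count / total for x, count in frequencies.items()}  # the p_i

mean = sum(p * x for x, p in probs.items())
variance = sum(p * (x - mean) ** 2 for x, p in probs.items())
third_central_moment = sum(p * (x - mean) ** 3 for x, p in probs.items())

skewness = third_central_moment / variance ** 1.5   # divide by sigma cubed
mode = max(frequencies, key=frequencies.get)        # most frequent value

print(f"mean = {mean:.2f}, mode = {mode}, skewness = {skewness:.2f}")
```

For this made-up table the mean lies above the mode and the skewness comes out positive, so the two indicators agree. That need not hold for every distribution.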
Kurtosis
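This is written as a function of the fourth central moment of the distribution,
$\sum_i p_i (x_i - \mu)^4$:
$\text{Kurtosis} = \sum_i p_i (x_i - \mu)^4 / \sigma^4$
It measures the fatness of the tails of the distribution relative to the normal
distribution. For the normal distribution this ratio equals 3, so kurtosis is
often quoted as the excess over 3, on which scale the normal distribution has
zero kurtosis.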
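A minimal sketch of the calculation in Python, using the fourth central moment divided by the squared variance. That ratio is 3 for a normal distribution, so the example also reports the excess over 3, which is the scale on which the normal has zero kurtosis. The two small distributions are invented for illustration.

```python
def kurtosis(values, probs):
    """Fourth central moment divided by the squared variance."""
    mean = sum(p * x for x, p in zip(values, probs))
    variance = sum(p * (x - mean) ** 2 for x, p in zip(values, probs))
    fourth = sum(p * (x - mean) ** 4 for x, p in zip(values, probs))
    return fourth / variance ** 2

# Thin tails: equal probability on 1, 2, 3, 4, 5.
thin = kurtosis([1, 2, 3, 4, 5], [0.2] * 5)

# Fat tails: most of the mass near zero, with rare extreme values.
fat = kurtosis([-10, -1, 0, 1, 10], [0.01, 0.24, 0.5, 0.24, 0.01])

print(f"thin-tailed: kurtosis = {thin:.2f}, excess = {thin - 3:.2f}")
print(f"fat-tailed:  kurtosis = {fat:.2f}, excess = {fat - 3:.2f}")
```

The thin-tailed example has negative excess kurtosis and the fat-tailed one a large positive value.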
Some background on VBA
Basic is a programming language that was widely used on early personal
computers, especially by amateurs. It was an interpreted language,
and users tended to produce some horrible code (“spaghetti”) that was almost
impossible to debug.
However, it got better:
1. It became a structured language:
a. you could call subroutines to do repeated tasks
b. the semantic rules (e.g. you can’t jump into or out of loops) were
tightened to prevent some of the worst spaghetti code abuses
2. It became a compiled language, which meant it could run quicker.
3. It became object-oriented (see below).
4. It became integrated into the visual aspects of Windows (mouse, icons,
etc.).
5. In Visual Basic for Applications, it became integrated into Microsoft
Office etc. and automatic code generation became possible, as you
have seen in recording macros.
6. The library routines, which enable all sorts of things to be done including
complex math calculations (optimisation, etc.), were improved.
VBA, as embedded in Excel, is still probably slower than a dedicated
programming language like C++, but improvements in computing speed
make this less of a handicap.
Objects, Values, methods, etc.
Some features of object-oriented programming. Object-oriented programming
uses some rather abstract concepts. This can make it a bit confusing at first,
but this is the price we pay for being able to do a much wider range of things,
including manipulating graphical objects directly as well as purely
numerical data.
Take the following macro written in VBA:
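Sub CopyBlockDown()
'
' CopyBlockDown Macro
' copies block down
'
' Keyboard Shortcut: Ctrl+k
'
    ' Select the block of filled cells around the current selection
    Selection.CurrentRegion.Select
    ' Copy the block to the clipboard
    Selection.Copy
    ' Jump down the first column of the block to its last filled cell
    Selection.End(xlDown).Select
    ' Step down one row, to the cell just below the block
    ActiveCell.Offset(1, 0).Range("A1").Select
    ' Paste the copied block there
    ActiveSheet.Paste
End Sub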
Selection is an object.
Objects are arranged in a hierarchy, so Selection.CurrentRegion
is an object reached through Selection.
Another way of putting this is that Objects come in Collections.
Objects have Properties which describe the object, e.g.
Range("B23").Name = "StudentID"
Range("B23").Value = 20040796
Objects have methods, e.g. Select and Copy can be applied to Range objects,
and Paste to a Worksheet object such as ActiveSheet. These methods typically
change a property, such as a Value, or whether it is Selected, etc.
Let’s look at the properties of some objects in a spreadsheet…
Reading
Salvatore and Reagle chapter 2.
Mary Jackson and Mike Staunton (2001) Advanced modelling in finance using
Excel and VBA, chapters 2 and 3.
Exercise
Salvatore and Reagle pages 34-35, supplementary questions 2.28, 2.29, 2.33,
2.35a, 2.37, 2.40.