Descriptive statistics - City, University of London

MSc Regulation and Competition
Quantitative Techniques (City University)

QT week 4

In this lecture: some other distributions; descriptive statistics; some background on Visual Basic for Applications.

Some other distributions

The normal distribution, and one or two others derived from it, form the backbone of statistics and econometrics. However, there are some other distributions you may come across, so they are introduced briefly here: the Bernoulli, binomial and Poisson distributions, which are discrete, and the exponential distribution, which is continuous.

1. Bernoulli distribution

This applies to a strict dichotomy on a single trial, e.g. heads or tails, success or failure. This is called a Bernoulli trial. One outcome (e.g. "success") is designated 1, the other 0. The distribution can be written in two ways:
a) as a listing: $P(1) = p$ and $P(0) = 1 - p$;
b) as a formula: $P(x) = p^x (1-p)^{1-x}$ for $x = 0, 1$.

As far as I know, this has nothing to do with the Bernoulli effect, which some people say helps keep aircraft airborne.

2. Binomial distribution

Suppose you repeat the Bernoulli trial N times, independently and with the same p. The number of 1s is a discrete random variable with a binomial distribution. The sample space is the set of integers 0, 1, 2, …, N. The expected value of the binomial distribution is $Np$ and its variance is $Np(1-p)$.

Working out the probability of a particular number of successes is a tedious exercise in the mathematics of combinations, since $P(X = k) = \binom{N}{k} p^k (1-p)^{N-k}$. Fortunately we do not have to do this by hand, because:
a) people publish tables of the binomial distribution;
b) for a large number of trials, the probability of the number falling in a range can be approximated using the normal distribution with the same mean and variance.

3. Poisson distribution

Suppose the supermarket queue has stalled because someone's credit card is not accepted. How long will the queue be after one minute? This is modelled with the Poisson distribution.
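Before setting out the Poisson formula, it is worth noting that neither the binomial probabilities of section 2 nor the Poisson probabilities in Table 4.1 need to be worked out by hand or looked up. As an illustration outside Excel, here is a minimal Python sketch (assuming Python 3.9 or later; the function names are hypothetical, chosen for this example). It computes binomial probabilities from the combinations formula, checks that their mean and variance match $Np$ and $Np(1-p)$, compares an exact binomial range probability with the normal approximation, and reproduces one column of Table 4.1 from $P(X) = \lambda^X e^{-\lambda}/X!$. Widening the range by half a unit at each end in the normal approximation (a continuity correction) is a common refinement that these notes do not mention.

```python
from math import comb, erf, exp, factorial, sqrt


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) when X counts the 1s in n independent Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def bernoulli_pmf(x: int, p: float) -> float:
    """Bernoulli distribution p^x (1 - p)^(1 - x): the binomial with n = 1."""
    return binomial_pmf(x, 1, p)


def binomial_mean_var(n: int, p: float) -> tuple[float, float]:
    """Mean and variance computed from the probabilities themselves,
    for comparison with the formulas Np and Np(1 - p)."""
    probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(probs))
    var = sum((k - mean) ** 2 * pk for k, pk in enumerate(probs))
    return mean, var


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(Z <= x) for a normal variable with mean mu and s.d. sigma."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))


def binomial_range_exact(lo: int, hi: int, n: int, p: float) -> float:
    """Exact P(lo <= X <= hi), adding up the binomial probabilities."""
    return sum(binomial_pmf(k, n, p) for k in range(lo, hi + 1))


def binomial_range_normal(lo: int, hi: int, n: int, p: float) -> float:
    """Normal approximation with the same mean and variance.
    The +/- 0.5 continuity correction is an assumed refinement."""
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    return normal_cdf(hi + 0.5, mu, sigma) - normal_cdf(lo - 0.5, mu, sigma)


def poisson_pmf(x: int, lam: float) -> float:
    """Poisson probability of x arrivals: lam^x e^(-lam) / x!."""
    return lam**x * exp(-lam) / factorial(x)


if __name__ == "__main__":
    print(binomial_mean_var(10, 0.3))  # close to (3.0, 2.1)
    print(binomial_range_exact(40, 60, 100, 0.5))
    print(binomial_range_normal(40, 60, 100, 0.5))
    # The lambda = 3 column of Table 4.1, rounded to 4 decimal places
    print([round(poisson_pmf(x, 3), 4) for x in range(6)])
```

With N = 100 and p = 0.5, the exact and approximate probabilities of between 40 and 60 successes are very close, which is why the normal approximation is acceptable for a large number of trials.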
The Poisson distribution assumes that arrivals are independent and occur at a constant average rate, λ, the average number of arrivals in the period (e.g. one minute). The probability of X arrivals in the period is
$$P(X) = \frac{\lambda^X e^{-\lambda}}{X!}$$

Table 4.1 The Poisson distribution (entries are P(X) for each value of λ)

X!     X     λ = 0.5   λ = 1    λ = 2    λ = 3    λ = 4
1      0     0.6065    0.3679   0.1353   0.0498   0.0183
1      1     0.3033    0.3679   0.2707   0.1494   0.0733
2      2     0.0758    0.1839   0.2707   0.2240   0.1465
6      3     0.0126    0.0613   0.1804   0.2240   0.1954
24     4     0.0016    0.0153   0.0902   0.1680   0.1954
120    5     0.0002    0.0031   0.0361   0.1008   0.1563

I created this in Excel over the range A2:G9 (I had the table title in A1). Row 2 holds the values of λ, column A the values of X! and column B the values of X, so the Excel formula in cell C4 is =(C$2^$B4)*EXP(-C$2)/$A4, which can then be copied across and down the rest of the table.

4. Exponential distribution

How long before a piece of machinery breaks down, or somebody finds a job (or the first person joins the supermarket queue)? This is modelled with the exponential distribution, a continuous distribution with density function
$$f(x) = \lambda e^{-\lambda x}, \quad \lambda > 0,\ x > 0$$
The degree of inefficiency of a company is also sometimes modelled as exponentially distributed.

Descriptive statistics

We have just looked at probability theory because we said data were generated by casino-like processes. For any particular data set there may be a series of processes involved, including distributions we have not covered. Rather than listing all our data, we can often convey a great deal of its information content with summary, or descriptive, statistics. These describe the position and shape of the frequency distribution, which is itself a useful way of summarising the data.

Measures of central tendency

These were introduced last week.

Measures of dispersion

The variance. This is the average value of the squared deviation from the mean. For a probability distribution it is usually written as $\sigma^2$:
$$\sigma^2 = \sum_i p_i (x_i - \mu)^2$$

The standard deviation. This is the square root of the variance. It is measured in the same units as the original data (e.g. £, metres, etc.).

NOTE: You need to be aware that there are two versions of the standard deviation in Excel.
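Excel's two versions differ only in the divisor applied to the sum of squared deviations: n for STDEVP and n − 1 for STDEV. A minimal Python sketch (an illustration outside Excel; the function names are hypothetical) applies both to the six random numbers used in the example that follows, and cross-checks them against `pstdev` and `stdev` from Python's standard `statistics` module, which follow the same two conventions.

```python
import statistics

# The six random numbers quoted in the notes
data = [0.879, 0.001, 0.652, 0.389, 0.464, 0.269]


def population_sd(xs: list[float]) -> float:
    """Square root of the average squared deviation from the mean
    (the definition in the notes; Excel's STDEVP)."""
    mean = sum(xs) / len(xs)
    return (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5


def sample_sd(xs: list[float]) -> float:
    """Same sum of squared deviations, divided by n - 1 (Excel's STDEV)."""
    mean = sum(xs) / len(xs)
    return (sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)) ** 0.5


print(round(population_sd(data), 3), round(statistics.pstdev(data), 3))  # 0.278 0.278
print(round(sample_sd(data), 3), round(statistics.stdev(data), 3))  # 0.304 0.304
```

Writing the formulas out by hand makes the single point of difference visible: only the final division changes.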
If we take the random numbers 0.879, 0.001, 0.652, 0.389, 0.464 and 0.269 and apply the function STDEV(…), we get 0.304 rather than 0.278. To get the value given by the definition above, we have to use STDEVP(…). Technically, STDEVP treats the data you have as a population in its own right, whereas STDEV treats the data as a sample from an infinite population and gives an estimate of the standard deviation of that larger population. The difference lies in the divisor: STDEVP divides the sum of squared deviations by n, the number of observations, whereas STDEV divides it by n − 1.

The coefficient of variation. This is the ratio of the standard deviation to the mean. It is a pure number, independent of the units of measurement, so it is good for comparing apples and pears.

The range. The difference between the highest and lowest values.

The interquartile range. The upper and lower quartiles are similar in spirit to the median, except that they identify the cut-off points for the top and bottom 25% of the data. The interquartile range is the difference between these cut-off points.

The interdecile range. Like the interquartile range, but using the cut-off points for the top and bottom 10%.

Skewness

Skewness is the lack of symmetry in a distribution. It is calculated as a function of the cube of the deviation from the mean:
$$\text{Skewness} = \frac{\sum_i p_i (x_i - \mu)^3}{\sigma^3}$$
(By the way, $E[(x_i - \mu)^3]$, or $\sum_i p_i (x_i - \mu)^3$, is sometimes referred to as the third central moment, or third moment about the mean; $E(x^n)$ is the nth moment about the origin.)

Amongst theoretical distributions, the uniform distribution and the normal distribution are both unskewed. What about the Poisson, the exponential and the binomial? The distribution below is positively skewed.

[Figure 4.2: A positively skewed distribution (histogram of frequencies for the values 1 to 11; mean = 6.06)]

A simple test of positive skewness is whether the mean is greater than the mode. In the case above the mean is 6.06, which is bigger than the modal value of 4.

Kurtosis

This is written as a function of the fourth moment of the distribution about the mean, $\sum_i p_i (x_i - \mu)^4$.
Kurtosis = pi(xi - )4/ 2 It measures the fatness of the tail of the distribution, relative to the normal distribution, which has zero kurtosis. Some background on VBA Basic was a programming language originally developed for personal computers, and widely used by amateurs. It was an interpreted language, and users tended to produce some horrible code (“spaghetti”) that was almost impossible to debug. However, it got better: 1. It became a structured language: a. you could call subroutines to do repeated tasks b. the semantic rules (e.g. you can’t jump into or out of loops) were tightened to prevent some of the worst spaghetti code abuses City University 4 MSc Regulation and Competition Quantitative Techniques 2. It became a compiled language, which meant it could run quicker 3. It became object – oriented (see below) 4. It became integrated into the visual aspects of windows (Mouse, Icons, etc.) 5. In Visual Basic for applications, it became integrated into Microsoft Office etc. and automatic code generation became possible – as you have seen in recording macros. 6. The library routines which enable all sorts of things to done including complex math calculations (optimisation, etc.) were improved. VBA, as embedded in Excel, is still probably slower than a dedicated programming language like C++, but improvements in computing speed makes this less of a handicap. Objects, Values, methods, etc. Some features of object-oriented programming. Object oriented programming uses some rather abstract concepts. This can make it a bit confusing at first, but this is the price we pay for being able to do a much wider range of things, including manipulation of graphical objects directly as well as simply numerical data. 
Take the following macro, written in VBA:

```vba
Sub CopyBlockDown()
'
' CopyBlockDown Macro
' copies block down
'
' Keyboard Shortcut: Ctrl+k
'
    ' Select the block of filled cells around the current selection and copy it
    Selection.CurrentRegion.Select
    Selection.Copy
    ' Jump to the bottom of the block's first column (as Ctrl+Down would)
    Selection.End(xlDown).Select
    ' Move one row further down and paste the copied block there
    ActiveCell.Offset(1, 0).Range("A1").Select
    ActiveSheet.Paste
End Sub
```

Selection is an object. Objects are arranged in a hierarchy, and one object can be reached through another: Selection.CurrentRegion is the block of filled cells that surrounds the selection. Related objects are also grouped into collections; for example, the Worksheets collection contains every worksheet in a workbook.

Objects have properties, which describe the object, e.g.
Range("B23").Name = "Student_ID"
Range("B23").Value = 20040796

Objects have methods: for example, Select and Copy can be applied to Range objects, and Paste is applied to a worksheet (ActiveSheet.Paste in the macro above). Methods typically change a property, such as the Value of a cell or whether it is selected.

Let's look at the properties of some objects in a spreadsheet…

Reading

Salvatore and Reagle, chapter 2.
Mary Jackson and Mike Staunton (2001), Advanced modelling in finance using Excel and VBA, chapters 2 and 3.

Exercise

Salvatore and Reagle, pages 34-35, supplementary questions 2.28, 2.29, 2.33, 2.35a, 2.37 and 2.40.
