MAS1403 Quantitative Methods for Business Management
Semester 1
Dr. Daniel Henderson
School of Mathematics & Statistics

MAS1403: Quantitative Methods for Business Management 2016/17

Lecturer: Dr. Daniel Henderson, Room 2.21 Herschel Building.
Email: [email protected]
www.mas.ncl.ac.uk/∼ndah6/teaching/MAS1403/

Lectures: Mondays at 12pm, in the Curtis Auditorium, Herschel Building.
Tutorials: One per week. There are 6 groups – check the module webpage to see which tutorial to attend.
Practicals: Occasionally. Check the full schedule overleaf for dates. These will take place instead of the tutorials.
Drop-in: Mon 1-2pm, Wed 1-2pm. Optional “office hours” where I will be available in my office for any help with the work.

Lecture notes and handouts

You will be provided with a booklet containing lecture notes and tutorial exercises. You should bring your booklet to every class! There will often be gaps in the lecture notes for you to complete during the lecture, so make sure you’ve got them with you!

All lecture notes, slides and solutions to tutorial exercises will be available to download from the course website (see above). There should be a link to this website from within Blackboard. Some additional handouts may only be available in lectures and tutorials.

You will notice that my lecture slides are colour-coded: Green for announcements, Blue for “listen and learn” and Red for “write”!

Assessment

Assessment for this course is via examination (60% at end of Semester 2), assignments (10% each semester) and computer-based assessments (10% each semester). Ordinarily, if you fail this module you cannot proceed to Stage 2 of your degree!

Exam: May/June 2017. A two-hour, open-book, computer-based exam based on the whole course. Answer all questions.

Assignments: Dec 2016, May 2017. About three big questions in each, some of which will use your own personal datasets and some of which will require you to use the computer package Minitab.
CBAs: Throughout the year. Three CBAs in each Semester. Available in “practice mode” for one week and then “exam mode” the next week. Some multiple choice questions, but mainly data response/calculations. Every student will get a different set of questions from a bank of hundreds! Must be done in your own time.

Late Work Policy: It is not possible to extend submission deadlines for coursework in this module and no late work can be accepted. For details of the policy (including procedures in the event of illness etc.) please look at the School web site: http://www.ncl.ac.uk/maths/students/resources/late-missed/

Other Stuff

Email: Check your University email every day – announcements about the course will be made regularly!

Calculator: There is no way around it, you must have a scientific calculator for this course, and it must be on the University’s approved list! I recommend the Casio fx-85GT PLUS (about £10). You can get advice on how to use the Statistics mode of your calculator in tutorials, and some video presentations on use of the calculator will be available from the module webpage. You should bring your calculator to every class. You will be stuck without one!
MAS1403 - Provisional Schedule for Semester 1

Week 1 (week commencing 3/10/16)
Topic 1: Data collection, display and summaries

Mon  3rd October    Lecture    12 - 1   Herschel Building, Curtis Auditorium
Thu  6th October    Tutorial   10 - 11  Herschel Building, Lecture Theatre 3
Thu  6th October    Tutorial   11 - 12  King George VI Building, Lecture Theatre 6
Thu  6th October    Tutorial   12 - 1   King George VI Building, Lecture Theatre 1
Fri  7th October    Tutorial   9 - 10   Percy Building, G.13
Fri  7th October    Tutorial   10 - 11  Percy Building, G.13
Fri  7th October    Tutorial   11 - 12  Herschel Building, Lecture Theatre 3

Week 2 (week commencing 10/10/16)

Mon  10th October   Lecture    12 - 1   Herschel Building, Curtis Auditorium
Thu  13th October   Practical  10 - 11  Armstrong Building, 2.96 (PC)
Thu  13th October   Practical  11 - 12  King George VI Building, Lawn cluster
Thu  13th October   Practical  12 - 1   King George VI Building, Lawn cluster
Fri  14th October   Practical  9 - 10   Herschel Building, Blue Zone - Herschel cluster
Fri  14th October   Practical  10 - 11  Armstrong Building, 2.96 (PC)
Fri  14th October   Practical  11 - 12  King George VI Building, Lawn cluster

Week 3 (week commencing 17/10/16)
CBA1 opens in “practice mode”

Mon  17th October   Lecture    12 - 1   Herschel Building, Curtis Auditorium
Thu  20th October   Tutorial   10 - 11  Herschel Building, Lecture Theatre 3
Thu  20th October   Tutorial   11 - 12  King George VI Building, Lecture Theatre 6
Thu  20th October   Tutorial   12 - 1   King George VI Building, Lecture Theatre 1
Fri  21st October   Tutorial   9 - 10   Percy Building, G.13
Fri  21st October   Tutorial   10 - 11  Percy Building, G.13
Fri  21st October   Tutorial   11 - 12  Herschel Building, Lecture Theatre 3

Week 4 (week commencing 24/10/16)
Topic 2: Probability and decision making
CBA1 opens in “assessed mode” – deadline: midnight Friday 28th October

Mon  24th October   Lecture    12 - 1   Herschel Building, Curtis Auditorium
Thu  27th October   Tutorial   10 - 11  Herschel Building, Lecture Theatre 3
Thu  27th October   Tutorial   11 - 12  King George VI Building, Lecture Theatre 6
Thu  27th October   Tutorial   12 - 1   King George VI Building, Lecture Theatre 1
Fri  28th October   Tutorial   9 - 10   Percy Building, G.13
Fri  28th October   Tutorial   10 - 11  Percy Building, G.13
Fri  28th October   Tutorial   11 - 12  Herschel Building, Lecture Theatre 3

Week 5 (week commencing 31/10/16)

Mon  31st October   Lecture    12 - 1   Herschel Building, Curtis Auditorium
Thu  3rd November   Tutorial   10 - 11  Herschel Building, Lecture Theatre 3
Thu  3rd November   Tutorial   11 - 12  King George VI Building, Lecture Theatre 6
Thu  3rd November   Tutorial   12 - 1   King George VI Building, Lecture Theatre 1
Fri  4th November   Tutorial   9 - 10   Percy Building, G.13
Fri  4th November   Tutorial   10 - 11  Percy Building, G.13
Fri  4th November   Tutorial   11 - 12  Herschel Building, Lecture Theatre 3

Week 6 (week commencing 7/11/16)
CBA2 opens in “practice mode”

Mon  7th November   Lecture    12 - 1   Herschel Building, Curtis Auditorium
Thu  10th November  Tutorial   10 - 11  Herschel Building, Lecture Theatre 3
Thu  10th November  Tutorial   11 - 12  King George VI Building, Lecture Theatre 6
Thu  10th November  Tutorial   12 - 1   King George VI Building, Lecture Theatre 1
Fri  11th November  Tutorial   9 - 10   Percy Building, G.13
Fri  11th November  Tutorial   10 - 11  Percy Building, G.13
Fri  11th November  Tutorial   11 - 12  Herschel Building, Lecture Theatre 3

Week 7 (week commencing 14/11/16)
Topic 3: Probability models
CBA2 opens in “assessed mode” – deadline: midnight Friday 18th November
Assignment 1 available

Mon  14th November  Lecture    12 - 1   Herschel Building, Curtis Auditorium
Thu  17th November  Practical  10 - 11  Armstrong Building, 2.96 (PC)
Thu  17th November  Practical  11 - 12  King George VI Building, Lawn cluster
Thu  17th November  Practical  12 - 1   King George VI Building, Lawn cluster
Fri  18th November  Practical  9 - 10   Herschel Building, Blue Zone - Herschel cluster
Fri  18th November  Practical  10 - 11  Armstrong Building, 2.96 (PC)
Fri  18th November  Practical  11 - 12  King George VI Building, Lawn cluster

Week 8 (week commencing 21/11/16)

Mon  21st November  Lecture    12 - 1   Herschel Building, Curtis Auditorium
Thu  24th November  Tutorial   10 - 11  Herschel Building, Lecture Theatre 3
Thu  24th November  Tutorial   11 - 12  King George VI Building, Lecture Theatre 6
Thu  24th November  Tutorial   12 - 1   King George VI Building, Lecture Theatre 1
Fri  25th November  Tutorial   9 - 10   Percy Building, G.13
Fri  25th November  Tutorial   10 - 11  Percy Building, G.13
Fri  25th November  Tutorial   11 - 12  Herschel Building, Lecture Theatre 3

Week 9 (week commencing 28/11/16)

Mon  28th November  Lecture    12 - 1   Herschel Building, Curtis Auditorium
Thu  1st December   Tutorial   10 - 11  Herschel Building, Lecture Theatre 3
Thu  1st December   Tutorial   11 - 12  King George VI Building, Lecture Theatre 6
Thu  1st December   Tutorial   12 - 1   King George VI Building, Lecture Theatre 1
Fri  2nd December   Tutorial   9 - 10   Percy Building, G.13
Fri  2nd December   Tutorial   10 - 11  Percy Building, G.13
Fri  2nd December   Tutorial   11 - 12  Herschel Building, Lecture Theatre 3

Week 10 (week commencing 5/12/16)
CBA3 opens in “practice mode” and “assessed mode”

Mon  5th December   Lecture    12 - 1   Herschel Building, Curtis Auditorium
Thu  8th December   Tutorial   10 - 11  Herschel Building, Lecture Theatre 3
Thu  8th December   Tutorial   11 - 12  King George VI Building, Lecture Theatre 6
Thu  8th December   Tutorial   12 - 1   King George VI Building, Lecture Theatre 1
Fri  9th December   Tutorial   9 - 10   Percy Building, G.13
Fri  9th December   Tutorial   10 - 11  Percy Building, G.13
Fri  9th December   Tutorial   11 - 12  Herschel Building, Lecture Theatre 3

Week 11 (week commencing 12/12/16)
Assignment 1 deadline: 4pm, Thursday 15th December
CBA3 deadline: midnight, Friday 16th December

Mon  12th December  Lecture    12 - 1   Herschel Building, Curtis Auditorium
Thu  15th December  Tutorial   10 - 11  Herschel Building, Lecture Theatre 3
Thu  15th December  Tutorial   11 - 12  King George VI Building, Lecture Theatre 6
Thu  15th December  Tutorial   12 - 1   King George VI Building, Lecture Theatre 1
Fri  16th December  Tutorial   9 - 10   Percy Building, G.13
Fri  16th December  Tutorial   10 - 11  Percy Building, G.13
Fri  16th December  Tutorial   11 - 12  Herschel Building, Lecture Theatre 3

Christmas vacation!

Week 12 (week commencing 9/1/17) – Revision week

Mon  9th January    Lecture    12 - 1   Herschel Building, Curtis Auditorium
Thu  12th January   Tutorial   10 - 11  Herschel Building, Lecture Theatre 3
Thu  12th January   Tutorial   11 - 12  King George VI Building, Lecture Theatre 6
Thu  12th January   Tutorial   12 - 1   King George VI Building, Lecture Theatre 1
Fri  13th January   Tutorial   9 - 10   Percy Building, G.13
Fri  13th January   Tutorial   10 - 11  Percy Building, G.13
Fri  13th January   Tutorial   11 - 12  Herschel Building, Lecture Theatre 3

1 Collecting and presenting data

1.1 Definitions

The quantities measured in a study are called random variables and a particular outcome is called an observation. A collection of observations is the data. The collection of all possible outcomes is the population.

We can rarely observe the whole population. Instead, we observe some subset of this called the sample. The difficulty is in obtaining a representative sample.

Data/random variables are of different types:

• Qualitative (i.e. non-numerical)
  – Categorical
    ∗ Outcomes take values from a set of categories, e.g. mode of transport to Uni: {car, metro, bus, walk, other}.
• Quantitative (i.e. numerical)
  – Discrete
    ∗ Things that are countable, e.g. number of people taking this module.
    ∗ Ordinal, e.g. response to questionnaire; 1 (strongly disagree) to 5 (strongly agree).
  – Continuous
    ∗ Things that we measure rather than count, e.g. height, weight, time.

Example 1
Identify the type of data described in each of the following examples:
(a) The time between emails arriving in your inbox is recorded.
(b) An opinion poll was taken asking people what is their favourite chocolate bar.
(c) The number of students attending a MAS1403 tutorial is recorded.
1.2 Sampling techniques

We typically aim for the sample to be representative of the population. The larger the sample size, the more precise the information we have about the population. There are three main types of sampling: random, quasi-random and non-random.

• Simple random sampling (random)
  – Each element in the population is equally likely to be drawn into the sample.
  – All elements are “put in a hat” and the sample is drawn from the “hat” at random.
  – Advantages – easy to implement; each element has an equal chance of being selected.
  – Disadvantages – often don’t have a complete list of the population; not all elements might be equally accessible; it is possible, purely by chance, to pick an unrepresentative sample.
• Stratified sampling (random)
  – We take a simple random sample from each “stratum”, or group, within the population. The sample sizes are usually proportional to the population sizes.
  – Advantages – sampling within each stratum ensures that that stratum is properly represented in the sample; simple random sampling within each stratum has the advantages listed under simple random sampling above.
  – Disadvantages – need information on the size and composition of each group; as with simple random sampling, we need a list of all elements within each stratum.
• Systematic sampling (quasi-random)
  – The first element from the population is selected at random, and then every kth item is chosen after this. This type of sampling is often used in a production line setting.
  – Advantages – its simplicity, which makes it easy to implement.
  – Disadvantages – not completely random; if there is a pattern in the production process it is easy to obtain a biased sample; only really suited to structured populations.
• Judgemental sampling (non-random)
  – The person interested in obtaining the data decides who should be surveyed; for example, the head of a service department might suggest particular clients to survey based on his judgement, and they might be people who he thinks will give him the responses he wants!
  – Advantages – very focussed and aimed at the target population.
  – Disadvantages – relies on the judgement of the person conducting the questionnaire/survey, and so cannot be guaranteed to be representative; is prone to bias.
• Accessibility sampling (non-random)
  – Here, the most easily accessible elements are sampled.
  – Advantages – easy to implement.
  – Disadvantages – prone to bias.
• Quota sampling (non-random)
  – Similar to stratified sampling, but uses judgemental sampling within each stratum instead of random sampling. We sample within each stratum until our quotas have been reached.
  – Advantages – results can be very accurate as this technique is very targeted.
  – Disadvantages – the identification of appropriate quotas can be problematic; this sampling technique relies heavily on the judgement of the interviewer.

Example 2
(a) A toy company, Toys 4 U, is to be inspected for the quality and safety of the toys it produces. The inspection team takes a sample of toys from the production line by choosing the first toy at random, and then selecting every 100th toy thereafter. What form of sampling is the team using?
(b) Another inspection team is to investigate the quality of the smartphone covers made by a local factory. In a typical working day the factory produces 100 covers for the new iPhone and 200 covers for the latest Samsung phone. Suggest a suitable form of sampling to check the quality of the smartphone covers produced.
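As an aside, the random and quasi-random schemes described in this section can be sketched in Python. This is only an illustrative sketch: the population of 300 numbered items, the two groups "north" and "south", the sample sizes and the function names are all hypothetical and do not come from the notes.

```python
import random


def simple_random_sample(population, n, rng):
    """Draw n distinct elements, each equally likely to be chosen."""
    return rng.sample(population, n)


def systematic_sample(population, k, rng):
    """Pick a random start among the first k elements, then take every kth one."""
    start = rng.randrange(k)
    return population[start::k]


def stratified_sample(strata, n, rng):
    """Simple random sample from each stratum, sized in proportion to the stratum.

    Rounding means the stratum sizes need not always add up to exactly n.
    """
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        size = round(n * len(members) / total)  # proportional allocation
        sample[name] = rng.sample(members, size)
    return sample


rng = random.Random(1403)          # fixed seed so the sketch is repeatable
population = list(range(1, 301))   # hypothetical population: items numbered 1 to 300

srs = simple_random_sample(population, 30, rng)
systematic = systematic_sample(population, 10, rng)

# Hypothetical strata of sizes 120 and 180
strata = {"north": population[:120], "south": population[120:]}
stratified = stratified_sample(strata, 25, rng)

print(len(srs), len(systematic), {name: len(s) for name, s in stratified.items()})
```

With strata of 120 and 180 elements and a total sample of 25, proportional allocation gives stratum samples of 10 and 15.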
Solution

1.3 Frequency tables

Once we have collected our data, often the first stage of any analysis is to present them in a simple and easily understood way. Tables are perhaps the simplest means of presenting data. The way we construct the table depends on the type of data.

Example: discrete data
The following table shows the raw data for car sales at a new car showroom over a two week period in July.

Date        Cars Sold      Date        Cars Sold
1st July    9              8th July    10
2nd July    8              9th July    5
3rd July    6              10th July   8
4th July    7              11th July   4
5th July    7              12th July   6
6th July    10             13th July   8
7th July    11             14th July   9

Presenting these data in a relative frequency table by number of days on which different numbers of cars were sold, we get the following table:

Cars Sold    Tally    Frequency    Relative Frequency %

Totals

Example: continuous data
The following data set represents the service time in seconds for callers to a credit card call centre.

196.3  199.7  206.7  203.8  203.1
200.8  201.3  205.6  181.6  201.7
180.2  193.3  188.2  199.9  204.7

We can present these data in a relative frequency table as follows:

Class Interval      Tally     Frequency    Relative Frequency %
180 ≤ time < 185    ||        2            13.33
185 ≤ time < 190    |         1            6.67
190 ≤ time < 195    |         1            6.67
195 ≤ time < 200    |||       3            20.00
200 ≤ time < 205    |||| |    6            40.00
205 ≤ time < 210    ||        2            13.33
Totals                        15           100

1.4 Exercises

1. Identify the type of data described in each of the following examples:
(a) An opinion poll was taken asking people which party they would vote for in a general election.
(b) In a steel production process the temperature of the molten steel is measured and recorded every 60 seconds.
(c) A market researcher stops you in Northumberland Street and asks you to rate between 1 (disagree strongly) and 5 (agree strongly) your response to opinions presented to you.
(d) The hourly number of units produced by a beer bottling plant is recorded.

2. A credit card company wants to investigate the spending habits of its customers. From its lists, the first customer is selected at random; thereafter, every 30th customer is selected.
(a) Is this an example of simple random sampling, stratified sampling, systematic sampling, or judgemental sampling?
(b) Is this form of sampling random, quasi-random or non-random?

3. The number of telephone calls made by 20 students in a day is shown below.

3  5  1  0  0  2  1  0  3  1
4  3  2  0  1  1  1  2  0  4

Put these data into a relative frequency table.

4. The following data are the recorded length (in seconds) of 25 mobile phone calls made by one student.

281.4  312.7  270.7  304.1  305.4
293.4  327.7  293.9  320.7  317.9
306.5  311.5  310.9  283.6  289.5
286.6  314.8  346.4  337.5  286.9
298.4  303.3  304.6  259.6  300.5

Complete the following percentage relative frequency table for these data.

Class Interval      Tally     Frequency    Relative Frequency %
250 ≤ time < 270
270 ≤ time < 290
290 ≤ time < 310
310 ≤ time < 330
330 ≤ time < 350
Totals                        25           100

2 Graphical methods for presenting data

Once we have collected our data, often the best way to summarise these data is through an appropriate graph. Graphs are more eye-catching than tables, and give us an “at-a-glance” picture of the main features of our data: its distribution, location, spread, outliers etc.

2.1 Stem-and-leaf plots

Example 1
The observations below are the recorded time it takes to get through to an operator at a telephone call centre (in seconds).

54  45  30  56  50  67  55  51  47  53
29  39  65  54  44  38  49  45  39  42
44  61  51  54  72  65  58  50  50  62

Represent the data in a stem-and-leaf plot.

Stem | Leaf

n =          stem unit =          leaf unit =

Some notes on stem-and-leaf plots.
– Always show the stem units and the leaf units.
– The stem unit will usually be either 10 or 1; the corresponding unit for the leaves is then usually 1 or 0.1, respectively.
– Order the leaves from smallest to largest.
– If you have observations recorded to 2 d.p., always round down, e.g. 2.97 would become 2.9 rather than 3.0.

2.2 Bar charts

Bar charts are a commonly used and clear way of presenting categorical data or any ungrouped discrete data.

Example 2
The following frequency table represents the modes of transport used daily by 30 students to get to university.

Mode     Frequency
Car      10
Walk     7
Bike     4
Bus      4
Metro    4
Train    1
Total    30

This gives the following bar chart:

[Bar chart: Frequency (vertical axis) for each mode of transport: Car, Walk, Bike, Bus, Metro, Train]

This bar chart clearly shows that the most popular mode of transport is the car and the least popular is the train (in our small sample).

2.3 Histograms

Histograms can be thought of as “bar charts for continuous data”. First construct a grouped frequency table, then draw a bar for each class interval.

Important point: unlike bar charts, there are no gaps between the bars in a histogram.

Example 3
The following frequency table summarises the service times (in seconds) at a telephone call centre.

Service time        Frequency    Relative Frequency (%)
175 ≤ time < 180    1            2
180 ≤ time < 185    3            6
185 ≤ time < 190    3            6
190 ≤ time < 195    6            12
195 ≤ time < 200    10           20
200 ≤ time < 205    12           24
205 ≤ time < 210    8            16
210 ≤ time < 215    3            6
215 ≤ time < 220    3            6
220 ≤ time < 225    1            2
Totals              50           100

The histogram for these data is:

[Figure: two histograms of the service times against Time (s), from 175 to 225: one with Frequency on the vertical axis and one with Relative frequency (%) on the vertical axis]

We can also plot relative frequency (%) on the vertical axis: this gives a percentage relative frequency histogram. These are useful for comparing datasets of different sizes.
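The first step in drawing a histogram, building the grouped frequency table, can be sketched in Python. The sketch below uses the 15 call-centre service times from Section 1.3 with class intervals of width 5 starting at 180; the function name and its parameters are illustrative choices, not part of the notes.

```python
def grouped_frequency_table(data, lower, width, n_classes):
    """Count observations in classes lower + i*width <= x < lower + (i+1)*width.

    Returns one row per class: (class start, class end, frequency, relative frequency %).
    Observations outside the classes are ignored.
    """
    counts = [0] * n_classes
    for x in data:
        index = int((x - lower) // width)
        if 0 <= index < n_classes:
            counts[index] += 1
    total = sum(counts)
    rows = []
    for i, count in enumerate(counts):
        start = lower + i * width
        rows.append((start, start + width, count, round(100 * count / total, 2)))
    return rows


# Service times (seconds) from the continuous data example in Section 1.3
service_times = [196.3, 199.7, 206.7, 203.8, 203.1, 200.8, 201.3, 205.6,
                 181.6, 201.7, 180.2, 193.3, 188.2, 199.9, 204.7]

table = grouped_frequency_table(service_times, lower=180, width=5, n_classes=6)
for start, end, count, percent in table:
    print(f"{start} <= time < {end}: frequency {count}, relative frequency {percent}%")
```

The frequencies (2, 1, 1, 3, 6, 2) and percentages agree with the relative frequency table given in Section 1.3; the bar heights of a histogram are read straight from either column.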
2.4 Relative frequency polygons

The relative frequency polygon is exactly the same as the relative frequency histogram, but instead of having bars we join the mid-points of the top of each bar with a straight line. These are useful for illustrating the relative differences between two or more groups.

Example 4
Consider the following data on gross weekly income (in £) collected from two sites in Newcastle.

Weekly Income (£)        West Road (%)    Jesmond Road (%)
0 ≤ income < 100         9.3              0.0
100 ≤ income < 200       26.2             0.0
200 ≤ income < 300       21.3             4.5
300 ≤ income < 400       17.3             16.0
400 ≤ income < 500       11.3             29.7
500 ≤ income < 600       6.0              22.9
600 ≤ income < 700       4.0              17.7
700 ≤ income < 800       3.3              4.6
800 ≤ income < 900       1.3              2.3
900 ≤ income < 1000      0.0              2.3

The following plot shows percentage relative frequency polygons for the two groups.

Example comments: The distribution of incomes on West Road is skewed towards lower values, whilst that on Jesmond Road is more symmetric. The graph clearly shows that income in the Jesmond Road area is higher than that in the West Road area. The spread of incomes is roughly the same in the two areas. There are no obvious outliers.

2.5 Cumulative frequency polygons

These are very useful for comparing datasets.
– Construct a percentage relative frequency table for your data.
– Add a “cumulative” column by adding up the percentages as you go along.
– Plot the upper end-point of each class interval against the cumulative value.

Example 5
The following plot contains the cumulative frequency polygons for the income data at both the West Road and Jesmond Road sites.

It clearly shows the line for Jesmond Road is shifted to the right of that for West Road. This tells us that the surveyed incomes are higher on Jesmond Road. We can compare the percentages of people earning different income levels between the two sites quickly and easily.
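The “cumulative” column described in the steps above can be sketched in Python using the income percentages from Example 4. The variable and function names are illustrative; the rounding to one decimal place is only there to tidy up floating-point arithmetic.

```python
from itertools import accumulate

# Upper end-point (in £) of each class interval in the weekly income table
upper_end_points = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]

# Percentage relative frequencies from Example 4
west_road = [9.3, 26.2, 21.3, 17.3, 11.3, 6.0, 4.0, 3.3, 1.3, 0.0]
jesmond_road = [0.0, 0.0, 4.5, 16.0, 29.7, 22.9, 17.7, 4.6, 2.3, 2.3]


def cumulative_percentages(percentages):
    """Running total of the percentages, rounded to 1 decimal place."""
    return [round(total, 1) for total in accumulate(percentages)]


west_cumulative = cumulative_percentages(west_road)
jesmond_cumulative = cumulative_percentages(jesmond_road)

# Each printed row is one point on each cumulative frequency polygon
for upper, west, jesmond in zip(upper_end_points, west_cumulative, jesmond_cumulative):
    print(f"income < {upper}: West Road {west}%, Jesmond Road {jesmond}%")
```

Reading across a row gives a direct comparison: for example, 74.1% of the surveyed West Road incomes are below £400 per week, compared with 20.5% on Jesmond Road.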
2.6 Scatter plots

Scatter plots are used to plot two variables which you believe might be related, for example, advertising expenditure and sales.

Example 6
The following data represent monthly output and total costs at a factory.

Total costs (£)    Monthly output (units)
10,300             2,400
12,000             3,900
12,000             3,100
13,500             4,500
12,200             4,100
14,200             5,400
10,800             1,100
18,200             7,800
16,200             7,200
19,500             9,500
17,100             6,400
19,200             8,300

For scatter plots, we comment on whether there is a linear association between the two variables. If so, is this positive (“uphill”) or negative (“downhill”)? Is the association strong, or maybe moderate or weak?

The plot above shows a clear positive, roughly linear, relationship between the two variables: the more units made, the more it costs in total.

2.7 Time Series Plots

Data collected over time can be plotted by using a scatter plot, but with time as the (horizontal) x-axis, and where the points are connected by lines: a time series plot.

Example 7
Consider the following data on the number of computers sold (in thousands) by quarter (January-March, April-June, July-September, October-December) at a large warehouse outlet, starting in quarter 1 2000.

      2000     2001     2002     2003     2004
Q1    86.7     105.9    113.7    126.3    136.4
Q2    94.9     102.4    108.0    119.4    124.6
Q3    94.2     103.1    113.5    128.9    127.9
Q4    106.5    115.2    132.9    142.3

The time series plot is:

For time series plots, look out for trend and seasonal cycles in the data. Also look out for any outliers.

The above plot clearly shows us two things: firstly, that there is an upwards trend to the data (sales increase over time), and secondly that there is some regular variation around this trend (sales are usually higher in quarters 1 and 4 than quarters 2 and 3).

2.8 Exercises

1.
The following table shows the weight (in kilograms) of 50 sacks of potatoes leaving a farm shop (the data have been ordered from smallest to largest, reading down the columns).

8.1   8.9   9.5   9.7   10.0  10.2  10.4  10.6  10.8  11.3
8.2   9.2   9.5   9.7   10.0  10.2  10.4  10.6  10.9  11.3
8.5   9.3   9.6   9.9   10.0  10.2  10.4  10.6  11.0  11.5
8.7   9.3   9.6   9.9   10.0  10.3  10.5  10.6  11.2  11.6
8.8   9.4   9.6   10.0  10.1  10.3  10.6  10.7  11.3  12.8

Display these data in a stem-and-leaf plot. State clearly both the stem and the leaf units. Comment on the distribution of the data.

2. Which is more suitable for representing the data from Question 1 (above), a bar chart or a histogram? Justify your answer.

3. A small clothes shop has records of daily sales both before and after a local radio advertising campaign. Relative frequency polygons of the sales data are shown below.

[Figure: Relative frequency polygons of sales (before and after). Vertical axis: Rel. freq. (%); horizontal axis: Daily sales (£); two polygons: Before, After]

Comment, with justification, on the success, or otherwise, of the advertising campaign.

3 Numerical summaries for data

Numerical summaries are numbers which summarise the main features of your data. You should use both a measure of location and a measure of spread to summarise your dataset.

3.1 Measures of location

A measure of location is a value which is “typical” of the observations in our sample.

1. The mean
The sample mean is the “average” of our data: the total divided by the sample size. It’s given by the formula

\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \]

which, put more simply, means “add them up and divide by how many you’ve got”.

Example 1
Suppose we ask 7 Stage 2 Business Management students how many units of alcohol they drank last week and get: 16, 52, 0, 6, 10, 0, 21.
The sample mean alcohol consumption of these n = 7 students is

If your data are given in the form of a frequency table, then you “multiply each observation by its frequency, add these numbers together and then divide by how many you’ve got”. If you have a grouped frequency table, then you don’t know the value of each observation and so just use the midpoint of the class interval.

2. The median
This is just the observation “in the middle”, when the data are put into order from smallest to largest:

\[ \text{median} = \left( \frac{n+1}{2} \right)^{\text{th}} \text{ smallest observation}. \]

Example 2
Ordering the student alcohol data from the previous example gives 0, 0, 6, 10, 16, 21, 52. Clearly the middle value is 10, so the median is 10 units per week.

Example 3
Suppose we also asked four Stage 2 Marketing and Management students how many units of alcohol they drank last week, and got: 21, 0, 12, 14. Calculate the median.

Solution

The median is often used if the dataset has an asymmetric profile, since it is not distorted by extreme observations (“outliers”).

3. The mode
The mode is simply the most frequently occurring observation. For example, consider the following data: 2, 2, 2, 3, 3, 4, 5. The mode is 2 as it occurs most often. The modal class is easily obtained from a grouped frequency table or a histogram; it’s the class with the highest frequency.

3.2 Measures of spread

A measure of spread quantifies how “spread out” (or how “variable”) our data are.

1. The range
Range = largest value − smallest value. For example, the range of the data: 2, 2, 2, 3, 3, 4, 5 is 5 − 2 = 3.
• Advantage: very simple to calculate.
• Disadvantages: sensitive to extreme observations; only suitable for comparing (roughly) equally sized samples.

2. The inter-quartile range (IQR)
The IQR measures the range of the middle half of the data, and so is less affected by extreme observations.
It is given by Q3 − Q1, where

\[ Q_1 = \frac{(n+1)}{4}\text{th smallest observation} \quad \text{(“lower quartile”)}, \]
\[ Q_3 = \frac{3(n+1)}{4}\text{th smallest observation} \quad \text{(“upper quartile”)}. \]

Example 4
Calculate the inter-quartile range for the following data.

8.7, 9.0, 9.0, 9.2, 9.3, 9.3, 9.5, 9.6, 9.6, 9.6, 9.7, 9.7, 9.9, 10.3, 10.4, 10.5, 10.7, 10.8

Solution
n = 18, so the position of Q1 is (18 + 1)/4 = 4.75, therefore
Q1 = 9.2 + 0.75 × (9.3 − 9.2) = 9.2 + 0.075 = 9.275.
Similarly, the position of Q3 is 3 × (18 + 1)/4 = 14.25, therefore
Q3 = 10.3 + 0.25 × (10.4 − 10.3) = 10.3 + 0.025 = 10.325.
And so IQR = Q3 − Q1 = 10.325 − 9.275 = 1.05.

3. The variance and standard deviation
The sample variance is the standard measure of spread used in statistics. It can be thought of as “the average squared deviation from the mean”, and is given by

\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2. \]

The following formula is easier for calculations:

\[ s^2 = \frac{1}{n-1} \left\{ \sum_{i=1}^{n} x_i^2 - (n \times \bar{x}^2) \right\}. \]

In practice most people simply use the Statistics mode on their calculator (mode SD or Stat).

The sample standard deviation is just the square root of the variance, and is often preferred as it is in the “original units of the data”.

Example 5
Consider again the data on the number of units of alcohol consumed by a sample of 7 students last week: 16, 52, 0, 6, 10, 0, 21. Calculate the sample variance and the sample standard deviation.

Solution
We have already calculated the sample mean as \bar{x} = 15. Now

\[ \sum x^2 = 16^2 + 52^2 + 0^2 + 6^2 + 10^2 + 0^2 + 21^2 = 3537, \]
\[ n(\bar{x})^2 = 7 \times 15^2 = 1575, \]

and so the sample variance is

\[ s^2 = \frac{1}{7-1} (3537 - 1575) = \frac{1962}{6} = 327, \]

and the sample standard deviation is

\[ s = \sqrt{s^2} = \sqrt{327} = 18.08 \text{ units per week}. \]

3.3 Box plots

Box plots (or “box and whisker” plots) are another graphical method for displaying data.
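The variance calculation and the quartile rule from Section 3.2 can be checked numerically, and the same rule gives the five values (minimum, Q1, median, Q3, maximum) on which a box plot is based. The sketch below is illustrative only: the function names are hypothetical, and the handling of positions that fall outside the data (returning the smallest or largest value) is an assumption, since the notes do not cover that case.

```python
import math


def sample_variance(data):
    """Shortcut formula: (sum of squares - n * mean^2) / (n - 1)."""
    n = len(data)
    mean = sum(data) / n
    return (sum(x * x for x in data) - n * mean ** 2) / (n - 1)


def nth_smallest(ordered, position):
    """Value at a (possibly fractional) 1-based position in ordered data.

    Fractional positions are interpolated linearly, as in Example 4.
    Positions outside the data fall back to the end values (an assumption).
    """
    lower = math.floor(position)
    fraction = position - lower
    if lower < 1:
        return ordered[0]
    if lower >= len(ordered):
        return ordered[-1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


def five_number_summary(data):
    """Minimum, Q1, median, Q3 and maximum using the (n + 1) position rules."""
    ordered = sorted(data)
    n = len(ordered)
    return (ordered[0],
            nth_smallest(ordered, (n + 1) / 4),
            nth_smallest(ordered, (n + 1) / 2),
            nth_smallest(ordered, 3 * (n + 1) / 4),
            ordered[-1])


# Example 5: units of alcohol for 7 students
alcohol = [16, 52, 0, 6, 10, 0, 21]
variance = sample_variance(alcohol)
standard_deviation = math.sqrt(variance)
print(variance, round(standard_deviation, 2))

# Example 4: data used for the inter-quartile range
example_4 = [8.7, 9.0, 9.0, 9.2, 9.3, 9.3, 9.5, 9.6, 9.6, 9.6,
             9.7, 9.7, 9.9, 10.3, 10.4, 10.5, 10.7, 10.8]
minimum, q1, median, q3, maximum = five_number_summary(example_4)
print(minimum, round(q1, 3), median, round(q3, 3), maximum, round(q3 - q1, 2))
```

For the Example 5 data this reproduces the variance of 327 and standard deviation of 18.08, and for the Example 4 data it reproduces Q1 = 9.275, Q3 = 10.325 and IQR = 1.05.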
Example 6
Suppose that, from our data, we obtain the following summary statistics:

Minimum                10
Lower Quartile (Q1)    40
Median (Q2)            43
Upper Quartile (Q3)    45
Maximum                50

A box plot is constructed as follows.

Box plots are particularly useful for highlighting differences between groups.

Example 7
It clearly shows that although there is overlap between the three sets of data, the first and second datasets contain roughly similar responses and that these are quite different from those in the third set. Note that the asterisks (*) at the ends of the whiskers are the way Minitab highlights outlying values.

3.4 Exercises

1. Recall the following data from Exercise 1 in Chapter 2 on the weight (in kg) of 50 sacks of potatoes leaving a farm shop.

8.1   8.9   9.5   9.7   10.0  10.2  10.4  10.6  10.8  11.3
8.2   9.2   9.5   9.7   10.0  10.2  10.4  10.6  10.9  11.3
8.5   9.3   9.6   9.9   10.0  10.2  10.4  10.6  11.0  11.5
8.7   9.3   9.6   9.9   10.0  10.3  10.5  10.6  11.2  11.6
8.8   9.4   9.6   10.0  10.1  10.3  10.6  10.7  11.3  12.8

(a) Calculate the mean of the data.
(b) Calculate the median of the data.
(c) Calculate the range of the data.
(d) Calculate the inter-quartile range.
(e) Calculate the sample standard deviation.
(f) Draw a box plot for these data and comment on it.
(g) Put the data in a grouped frequency table.
(h) Find the modal class.

2. Chloe collected the following data on the weight, in grams, of “large” chocolate chip cookies produced by Millie’s Cookie Company.

27.1  22.4  26.5  23.4  25.6  26.3  51.3  24.9  26.0  25.4

To summarise, Chloe was going to calculate the mean and standard deviation for this sample. However, her friend Mark warned her that the mean and standard deviation might be inappropriate measures of location and spread for these data.
(a) Do you agree with Mark? If so, why?
(b) Calculate measures of location and spread that you feel are more suitable.

3. An internet marketing firm was interested in the amount of time customers spend on their website.
They recorded the lengths of visits to the website for a sample of 100 customers and whether the customer was male or female. The standard deviations of the lengths of visits were 12.2 seconds for males and 18.5 seconds for females. Which group has the more variable visit lengths, based on this sample, males or females?

4 Introduction to Probability

4.1 Definitions

An experiment is an activity where we do not know for certain what will happen, but we will observe what happens.

An outcome is one of the possible things that can happen.

The sample space is the set of all possible outcomes.

An event is a set of outcomes.

All probabilities are measured on a scale ranging from zero to one, and can be expressed as fractions, decimal numbers or percentages.

Notation: P(A) represents the probability of the event A, e.g. P(Rain tomorrow). P(Ā) is the probability that A does not occur (“not A”).

The collection of all possible outcomes, that is the sample space, has a probability of 1.

Two events are said to be mutually exclusive if both cannot occur simultaneously.

Two events are said to be independent if the occurrence of one does not affect the probability of the other occurring.

Example 1
Do you think the following pairs of events are independent?
• A: Molly plays table tennis, and B: Molly is good at maths
• C: Henry gets over 60 in MAS1403, and D: Henry gets under 40 in MAS1403

4.2 Measuring probability

1. Classical interpretation
Used when all possible outcomes are “equally likely”. In general, calculations follow from the formula

$$P(\text{Event}) = \frac{\text{Total number of outcomes in which event occurs}}{\text{Total number of possible outcomes}}.$$

2. Frequentist interpretation
When the outcomes of an experiment are not equally likely, we can perform the same experiment a large number of times and observe the outcome. The probability of an event can be estimated using the following formula:

$$P(\text{Event}) = \frac{\text{Number of times an event occurs}}{\text{Total number of times experiment performed}}.$$

3. Subjective interpretation
Probabilities are formulated subjectively using an individual’s (sometimes expert) opinion. (Useful when the experiment can’t be repeated.) For example, when we board an aeroplane, we judge the probability of it crashing to be sufficiently small that we are happy to undertake the journey.

4.3 Examples

1. Chicken King is a fast-food chain with 700 outlets in the UK. The geographic location of its restaurants is tabulated below:

                        Population
Region   Under 10,000   10,000–100,000   Over 100,000   Total
NE             35              70             175         280
SE             42             105              28         175
SW             21              84              35         140
NW             70              35               0         105
Total         168             294             238         700

A health and safety organisation selects a restaurant at random for a hygiene inspection. Assuming that each restaurant is equally likely to be selected, calculate the following probabilities.
(a) P(NE restaurant chosen),
(b) P(Restaurant chosen from a city with a population over 100,000),
(c) P(SW and city with a population under 10,000).

Solution

2. The spinner shown below is spun once. Assuming each sector on the board is the same size, calculate the following probabilities.
(a) P(lands on a red shape) =
(b) P(lands on a triangle) =
(c) P(lands on a 4-sided shape) =

3. On the probability scale, how likely do you think it is that Newcastle United will be promoted this season? Which approach to probability would you use to estimate this?

4.4 The addition rule

The addition rule describes the probability of any of two or more events occurring. The addition rule for two events A and B is

P(A or B) = P(A) + P(B) − P(A and B).

This describes the probability of either event A or event B happening.

Example 2
Prospective interns at internet startup BlueFox face two aptitude tests.
If 35 percent of applicants pass the first test, 25 percent pass the second test, and 15 percent pass both tests, what percentage of applicants pass at least one test?

Solution
We are told P(pass 1st test) = 0.35, P(pass 2nd test) = 0.25 and P(pass 1st and 2nd test) = 0.15. Therefore, using the addition rule,

P(pass at least one test) = P(pass 1st or 2nd test)
                          = P(pass 1st test) + P(pass 2nd test) − P(pass 1st and 2nd test)
                          = 0.35 + 0.25 − 0.15
                          = 0.45.

So 45% of the applicants pass at least one of the tests.

Note: if events A and B are mutually exclusive then P(A and B) = 0 since A and B can’t occur together. Therefore, P(A or B) = P(A) + P(B).

4.5 Exercises

1. Do you think the following pairs of events are independent or dependent? Explain.
(a) E: An individual has a high IQ
    F: An individual is accepted for a University place
(b) E1: An individual has a large outstanding credit card debt
    E2: An individual is allowed to extend his bank overdraft

2. The following data refer to a class of 18 students. Suppose that we will choose one student at random from this class.

Student Number   Sex   Height (m)   Weight (kg)   Shoe Size
 1               M     1.91         70            11.0
 2               F     1.73         89             6.5
 3               M     1.73         73             7.0
 4               M     1.63         54             8.0
 5               F     1.73         58             6.5
 6               M     1.70         60             8.0
 7               M     1.82         76            10.0
 8               M     1.67         54             7.5
 9               F     1.55         47             4.0
10               M     1.78         76             8.5
11               M     1.88         64             9.0
12               M     1.88         83             9.0
13               M     1.70         55             8.0
14               M     1.76         57             8.0
15               M     1.78         60             8.0
16               F     1.52         45             3.5
17               M     1.80         67             7.5
18               M     1.92         83            12.0

Find the probabilities for the following events.
(a) The student is female.
(b) The student’s weight is greater than 70kg.
(c) The student’s weight is greater than 70kg and the student’s shoe-size is greater than 8.
(d) The student’s weight is greater than 70kg or the student’s shoe-size is greater than 8.

3. The regional manager of supermarket Freshco is interested in predicting sales patterns of breakfast cereal.
If 85% of Freshco customers buy branded cereals (e.g. Kellogg’s etc), 60% of customers buy Freshco’s own-brand cereals, and 50% of customers buy both branded and Freshco’s own-brand cereal, what percentage of Freshco customers do not buy breakfast cereal?

5 Conditional probability

5.1 The multiplication rule

The multiplication rule describes the probability of two (or more) events occurring. The probability of two events A and B both occurring is

P(A and B) = P(A) × P(B|A),

where P(B|A) is the conditional probability of B given that A has already happened.

Example 1
A small company has 10 employees: 4 male and 6 female. You, as the manager, select two employees at random to attend a training session. What is the probability that you select two female employees?

Solution

Re-arranging the above expression for the multiplication rule gives a formula for calculating a conditional probability:

$$P(B|A) = \frac{P(A \text{ and } B)}{P(A)}.$$

Example 2
Recall that prospective interns at internet startup BlueFox face two aptitude tests. If 35 percent of applicants pass the first test, 25 percent pass the second test, and 15 percent pass both tests, what percentage of applicants pass the second test given that they passed the first test?

Solution

Independent events: two events A and B are independent if P(B|A) = P(B), in which case P(A and B) = P(A) × P(B).

Example 3
Are the outcomes of the two aptitude tests at internet startup BlueFox independent? Justify your answer.

Solution

Example 4
Employees at a Marketing firm are classified by age and sex as follows:

          under 30   30 to 50   over 50   Total
Male      0.275      0.125      0.025
Female    0.325      0.175      0.075
Total

So, for example, 27.5% of employees are Male and under 30 years of age. From this table, calculate

(a) P(Male)        (d) P(30 to 50 | Male)
(b) P(30 to 50)    (e) Are the events “Male” and “30 to 50” independent?
(c) P(Male | 30 to 50)    (f) P(Male)

Solution

5.2 Tree diagrams

Tree diagrams (or probability trees) are simple, clear ways of presenting probabilistic information.

Example 5
Suppose we have a biased coin, with P(Head) = 0.75. Then the following tree diagram displays all outcomes, along with their associated probabilities, for two consecutive flips of the coin:

First flip   Second flip   Probability of branch
H (0.75)     H (0.75)      0.75 × 0.75 = 0.5625
H (0.75)     T (0.25)      0.75 × 0.25 = 0.1875
T (0.25)     H (0.75)      0.25 × 0.75 = 0.1875
T (0.25)     T (0.25)      0.25 × 0.25 = 0.0625

Important: multiply probabilities along branches (multiplication rule); the probabilities at the ends of the branches should add up to 1.

Example 6
A small company has 10 employees: 4 male and 6 female. You, as the manager, select two employees at random to attend a training session. What is the probability that you select one male and one female employee?

Solution

Example 7
Joe has a Business Management exam on Thursday morning. On Wednesday night he is free to choose one (and only one) of the following activities:
(a) go to the cinema,
(b) go to the pub,
(c) stay home and watch TV,
(d) stay home and study.

The probabilities that he elects these alternatives are 0.14, 0.45, 0.25 and 0.16, respectively. His conditional probabilities of passing the exam given (a), (b), (c) and (d) are 0.4, 0.05, 0.5 and 0.8 respectively. Find
(i) the probability that Joe goes to the pub and passes his exam;
(ii) the probability that Joe passes his exam;
(iii) the probability that Joe went to the pub, given that he passed his exam.

Solution
Use the space provided below to construct a tree diagram for this example.

(i) P(Joe goes to Pub and passes exam) =
(ii) P(Joe passes exam) =
(iii) P(Joe went to Pub | Joe passed exam) =

5.3 Exercises

1. An on-line retailer conducts a survey of 200 customers and obtains the following results.
                   Age
          Under 30   30 to 45   Over 45
Male      60         20         40
Female    40         30         10

A customer is selected at random.
(a) What is the probability that the customer is male and aged 30 to 45?
(b) Given that this customer is aged 30 to 45, what is the probability that they are male?
(c) Given that this customer is female, what is the probability that they are 45 or under?
(d) Now suppose that two customers are selected at random. What is the probability that both are Male?

2. If Vinny goes to the cinema, there is a 60% chance he will then also go to the bar afterwards. However, if he doesn’t go to the cinema, this reduces to just 30%. On Friday night, Vinny decides to go to the cinema only if his friend Julia also goes. Vinny has no idea about Julia’s intentions this Friday and so is just as likely to go to the cinema as he is to not go. Let C be the event that Vinny goes to the cinema, and B the event that Vinny goes to the bar, this Friday. Using a probability tree diagram, or otherwise, find
(a) P(C)
(b) P(C̄)
(c) P(B̄|C)
(d) P(B̄|C̄)
(e) P(C and B)
(f) P(B)

6 Decision-making using probability

6.1 Expected Monetary Value

The Expected Monetary Value (EMV) of a single event is simply the probability of that event multiplied by its monetary value.

Example 1
Suppose you win £5 if you pull an ace from a pack of cards. The EMV would be

$$EMV(\text{Ace}) = P(\text{Ace}) \times \text{MonetaryValue}(\text{Ace}) = \frac{4}{52} \times 5 = 0.38.$$

Your expected return would be 38 pence; if you repeated this bet a large number of times, you would come out, on average, 38 pence better off per bet. Therefore you would want to pay no more than 38p for such a bet.

In general, for more complicated problems involving several options,

$$EMV = \sum \{P(\text{Event}) \times \text{Monetary value of Event}\}$$

where the sum is over all possible events. We choose the option with the largest EMV.

Example 2
Synaptec is a small technology company with a new product that they wish to launch on to the market.
It could
• go for a direct approach, launching onto the domestic market through traditional channels,
• launch only on the internet, or
• license the product to a larger company through the payment of a licence fee irrespective of the success of the product.

Initial market research suggests that demand for the product can be classed into three categories: high, medium or low, and these categories will occur with probabilities 0.2, 0.35 and 0.45. Likely profits (in £K) to be earned under each option are

            High   Medium   Low
Direct      100    55       −25
Internet    46     25       15
Licence     20     20       20

How should the company launch the product?

The EMV of each option can be calculated as follows:

EMV(Direct) = (0.2 × 100) + (0.35 × 55) + (0.45 × (−25)) = £28K
EMV(Internet) = (0.2 × 46) + (0.35 × 25) + (0.45 × 15) = £24.7K
EMV(Licence) = (0.2 × 20) + (0.35 × 20) + (0.45 × 20) = £20K.

On the basis of expected monetary value, the best choice is the Direct approach as this maximises EMV.

6.2 Decision trees

When we include a decision in a tree diagram (see Chapter 5) we use a rectangular node, called a decision node, to represent the decision. The diagram is then called a decision tree.

Example 3
The decision tree for the last example (Example 2) would look like this:

Option (EMV)        Demand   Probability   Profit (£K)
Direct (+28)        H        0.2           100
                    M        0.35          55
                    L        0.45          −25
Internet (+24.7)    H        0.2           46
                    M        0.35          25
                    L        0.45          15
Licence (+20)       H        0.2           20
                    M        0.35          20
                    L        0.45          20

Key points:
• There are no probabilities at a decision node but we evaluate the expected monetary values of the options.
• In a decision tree the first node (on the left) is always a decision node.
• There may also be other decision nodes.
• If there is another decision node then we evaluate the options there and choose the best one (based on EMV), and the expected monetary value of this option becomes the expected monetary value of the branch leading to the decision node.
• We work “backwards” through the tree (from right to left), evaluating EMVs and making decisions at each decision node.

Example 4
Charlotte Watson, the manager of a small sales company, has the opportunity to buy a fixed quantity of a new type of Android tablet which she can then offer for sale to clients. The decision to buy the product and offer it for sale would involve a fixed cost of £200,000. The number of tablets that will be sold is uncertain, but Charlotte judges that:
• Sales will be “poor” with probability 0.2; this will result in an income of £100,000.
• Sales will be “moderate” with probability 0.5; this will result in an income of £220,000.
• Sales will be “good” with probability 0.3; this will result in an income of £350,000.

For an additional fixed cost of £30,000, market research can be conducted to aid the decision-making process. The outcome of the market research can be either positive or negative, with probabilities 0.58 and 0.42, respectively. Knowing the outcome of the market research changes the probabilities for the main sales project as follows:

                    Main sales probabilities
Market research     Poor    Moderate   Good
Positive            0.15    0.45       0.4
Negative            0.6     0.35       0.05

Charlotte has various options:
• Buy the tablets, without market research.
• Pay for the market research.
• Do nothing.

If she pays for the market research then, depending on the outcome, she can:
• Buy the tablets.
• Do nothing.

(a) Draw a decision tree for this problem.
(b) Use expected monetary value to determine the optimal course of action for Charlotte.

6.3 Exercises

1. Picoplex Technologies have developed a new manufacturing process which they believe will revolutionise the smartphone industry.
They are, however, uncertain how they should go about exploiting this advance. Initial indications of the likely success of marketing the process are 55%, 30%, 15% for “high success”, “medium success” and “probable failure”, respectively. The company has three options: they can go ahead and develop the technology themselves, license it or sell the rights to it. The financial outcomes (in £ millions) for each option are given in the table below.

           “high success”   “medium success”   “failure”
Develop    80               40                 −100
Licence    40               30                 0
Sell       25               25                 25

(a) Draw a decision tree to represent the company’s problem.
(b) Calculate the Expected Monetary Value for all possible decisions the company may take and hence determine the optimal decision for the company.

2. The manager of a small business has the opportunity to buy a fixed quantity of a new product and offer it for sale for a limited time. The decision to buy the product and offer it for sale would involve a fixed cost of £150,000. The amount that would be sold is uncertain but the manager judges that:
• There is a probability of 0.3 that sales will be “poor” with an income of £80,000.
• There is a probability of 0.5 that sales will be “medium” with an income of £160,000.
• There is a probability of 0.2 that sales will be “good” with an income of £240,000.

For an additional fixed cost of £20,000, the product can be sold for a trial period before a final decision is made. No income is made from this trial. The result of the trial will be “poor” with probability 0.33, “medium” with probability 0.40 or “good” with probability 0.27. Knowing the outcome of the trial changes the probabilities for the main sales project:

                  Main sales probabilities
Trial outcome     Poor   Medium   Good
Poor              0.7    0.2      0.1
Medium            0.2    0.6      0.2
Good              0.1    0.2      0.7

The manager also has the option to do nothing.

(a) Draw a decision tree for this problem.
(b) Use expected monetary value to determine the optimal course of action for this business.
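The EMV comparison from Example 2 in Section 6.1 (the Synaptec product launch) can be reproduced with a short Python sketch. The function and variable names are invented for this illustration; the probabilities and profits are those given in Example 2.

```python
def emv(probabilities, payoffs):
    """Expected monetary value: sum of probability x monetary value over all events."""
    return sum(p * v for p, v in zip(probabilities, payoffs))


# Demand categories High, Medium, Low and their probabilities (Example 2)
demand_probabilities = [0.2, 0.35, 0.45]

# Likely profits in £K under each launch option, in the order High, Medium, Low
profits = {
    "Direct": [100, 55, -25],
    "Internet": [46, 25, 15],
    "Licence": [20, 20, 20],
}

# EMV of every option, then pick the option with the largest EMV
emvs = {option: emv(demand_probabilities, payoff) for option, payoff in profits.items()}
best_option = max(emvs, key=emvs.get)

for option, value in emvs.items():
    print(f"EMV({option}) = £{value:.1f}K")
print("Best option by EMV:", best_option)
```

For a decision tree with more than one decision node, the same `emv` function could be applied at each chance node while working from right to left, as described in the key points of Section 6.2.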
7 Discrete probability models

7.1 Probability distributions

The probability distribution of a discrete random variable X is the list of all possible values X can take and the probabilities associated with them.

Example 1
If the random variable X is the outcome of a roll of a fair six-sided die then the probability distribution for X is:

r           1     2     3     4     5     6     Sum
P(X = r)    1/6   1/6   1/6   1/6   1/6   1/6   1

Key point: For a discrete random variable the probabilities of each possible value sum up to 1.

7.2 The binomial distribution

Suppose the following statements hold:
• There are a fixed number of trials or experiments (n).
• There are only two possible outcomes for each trial (‘success’ or ‘failure’).
• There is a constant probability of ‘success’, p.
• The outcome of each trial is independent of any other trial.

Then the number of successes, X, follows a binomial distribution.

Example 2
Which of the following scenarios could be adequately modelled by a binomial distribution?
• The number of sixes on 3 rolls of a fair six-sided die.
• The number of students who pass MAS1403 this year.

7.2.1 Calculating probabilities

If X follows a binomial distribution we write X ∼ Bin(n, p), and

$$P(X = r) = {}^{n}C_{r} \times p^{r} \times (1 - p)^{n-r}, \quad r = 0, 1, \ldots, n.$$

Here, ${}^{n}C_{r}$ is the number of ways of getting r successes out of n trials, and is given by

$${}^{n}C_{r} = \frac{n!}{r!(n-r)!},$$

where r! = 1 × 2 × 3 × · · · × (r − 1) × r is known as “r factorial”.

Important: most scientific calculators have an nCr button!

Example 3
What is the probability of getting 2 sixes from three rolls of a fair six-sided die?

Solution

Example 4
If X ∼ Bin(10, 0.2) calculate:
(a) P(X = 2)
(b) P(X ≤ 2)
(c) P(X < 3)
(d) P(X > 1)

Solution

7.2.2 Mean and variance

If X ∼ Bin(n, p), then its mean (or “expected value”) and variance are

E[X] = n × p and Var(X) = n × p × (1 − p).
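As a check on calculator work, the binomial formulas above translate directly into Python. The sketch below is illustrative only: the function names are made up here, and the values n = 5, p = 0.5 are hypothetical, chosen just to exercise the formulas. It relies on `math.comb`, which plays the role of the nCr button.

```python
from math import comb, sqrt


def binomial_pmf(r, n, p):
    """P(X = r) for X ~ Bin(n, p): nCr * p^r * (1 - p)^(n - r)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)


def binomial_mean(n, p):
    """E[X] = n * p."""
    return n * p


def binomial_variance(n, p):
    """Var(X) = n * p * (1 - p)."""
    return n * p * (1 - p)


# Hypothetical illustration: X ~ Bin(5, 0.5)
n, p = 5, 0.5
distribution = [binomial_pmf(r, n, p) for r in range(n + 1)]

print("P(X = 2)  =", distribution[2])
print("P(X <= 2) =", sum(distribution[:3]))       # add up P(X = 0), P(X = 1), P(X = 2)
print("P(X > 1)  =", 1 - sum(distribution[:2]))   # complement of P(X <= 1)
print("Total     =", sum(distribution))           # probabilities should sum to 1
print("Mean, variance, SD:", binomial_mean(n, p),
      binomial_variance(n, p), sqrt(binomial_variance(n, p)))
```

Cumulative probabilities such as P(X ≤ 2) are built by adding individual terms, and "greater than" probabilities by subtracting from 1, which mirrors how they are usually worked by hand.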
Example 5
If X ∼ Bin(10, 0.2) calculate:
(a) E[X]
(b) Var(X)
(c) SD(X)

Solution

Example 6
A salesperson has a 50% chance of making a sale on a customer visit and she arranges 6 visits in a day.
(a) Assuming sales at each visit are independent, suggest an appropriate distribution for the number of sales she makes in a day.
(b) Calculate her expected number of sales.

Solution

7.3 Exercises

1. Consider the following probability distribution for the discrete random variable X. One of the values is missing.

r           −2    −1    0    1     2
P(X = r)    0.1   0.2   ?    0.3   0.2

What is the missing value, P(X = 0)?

2. Let X be the number of sixes rolled on four rolls of a fair six-sided die.
(a) Calculate the probability distribution of X, i.e. the values P(X = r) for r = 0, 1, 2, 3, 4.
(b) Calculate P(X ≤ 2).
(c) Calculate P(X > 2).
(d) Calculate the mean and variance of X.
(e) What is the most likely number of sixes from four rolls of the die?

8 More discrete probability models

8.1 The Poisson distribution

Suppose the following hold:
• Events occur independently, at a constant rate (λ);
• There is no natural upper limit to the number of events.

Then the number of events, X, occurring in a given interval, has a Poisson distribution with parameter λ.

Example 1
Which of the following random variables could be modelled by a Poisson distribution? Suggest an alternative if the Poisson distribution is not appropriate, and state the values of any parameters.
(a) Calls are received at a call centre at a constant rate of 3 per minute on average. Let X be the number of calls received in a 1 minute period.
(b) An operator at a tele-sales marketing firm has 20 calls to make in an hour. History suggests that calls will be answered 55% of the time. Let Y be the number of answered calls in an hour.
(c) Newcastle United score goals at a constant rate of 2.4 in 90 minutes, on average.
Let Z be the number of goals scored in 45 minutes.

Solution

8.1.1 Probabilities, means and variances

If X follows a Poisson distribution we write X ∼ Po(λ), and

$$P(X = r) = \frac{\lambda^{r} e^{-\lambda}}{r!}, \quad r = 0, 1, \ldots$$

If X ∼ Po(λ), then its mean and variance are

E[X] = λ and Var(X) = λ.

[Approximation to binomial: If X ∼ Bin(n, p) with n large, p small and both np and n(1 − p) > 5 then X is approximately Po(np).]

Example 2
If X ∼ Po(5) calculate:
(a) P(X = 4)
(b) P(X ≤ 1)
(c) P(X > 0)
(d) E[X]
(e) SD(X)
(f) SD(X)

Solution

Example 3
A new Mercedes-Benz car franchise forecasts that it will sell around three of its most expensive models each day.
(a) What probability distribution might be reasonable to use to model the number of cars sold each day?
(b) What is the expected number and standard deviation of the number of cars sold each day?
(c) What is the probability that 3 cars are sold on a particular day?
(d) What is the probability that no cars are sold on a particular day?
(e) What is the probability that at least one car is sold on a particular day?
(f) Sales will be monitored over the next seven days and the sales team at the franchise will receive a warning if they make no sales on at least 1 of the 7 days. What is the probability that they receive a warning?

Solution

8.2 Exercises (on Chapters 7 & 8)

1. Which of the following random variables could be modelled with a binomial distribution and which could be modelled with a Poisson distribution? In each case state the value(s) of the parameter(s) of the distribution.
(a) A salesperson has a 30% chance of making a sale on a customer visit. She arranges 10 visits in a day. Let X be the number of sales she makes in a day.
(b) Calls to the British Passport Office in Durham occur at a rate of 7 per hour on average.
Let Y be the number of calls at the passport office in a 1 hour period.
(c) History suggests that 10% of eggs from a family-run farm are bad. Let Z be the number of bad eggs in a box of a dozen (i.e. 12) eggs.

2. An operator at a call centre has 8 calls to make in an hour. History suggests that they will be answered 40% of the time. Let X be the number of answered calls in an hour.
(a) What probability distribution does X have?
(b) What are the mean and standard deviation of X?
(c) Calculate the probability of getting a response exactly 7 times.
(d) Calculate the probability of getting fewer than 2 responses.

3. Calls are received at a telephone exchange at an average rate of 4 per minute. Let Y be the number of calls received in one minute.
(a) What probability distribution does Y have?
(b) What are the mean and standard deviation of Y?
(c) Calculate the probability that there are 6 calls in one minute.
(d) Calculate the probability that there are no more than 2 calls in a minute.
(e) Calculate the probability that there are more than 2 calls in a minute.

9 Continuous probability models

9.1 The Normal distribution

The Normal distribution is possibly the best-known and most-used continuous probability distribution: you will use it a lot in Semester 2 of MAS1403. Its probability density function (pdf) has a symmetrical “bell shaped” profile:

[Figure: bell-shaped Normal pdf f(x) plotted against x, with the x-axis marked at µ − 4σ, µ − 2σ, µ, µ + 2σ and µ + 4σ.]

We can think of the pdf as a smoothed percentage relative frequency histogram: the area under the curve is 1. The Normal distribution has two parameters: the mean, µ, and the standard deviation, σ.
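The two properties just mentioned, symmetry about the mean and a total area of 1 under the curve, can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the values µ = 30 and σ = 10 and the strip-based area estimate are illustrative choices made for this example, not part of the module's method.

```python
from statistics import NormalDist

# Illustrative parameters: mean 30, standard deviation 10
mu, sigma = 30, 10
dist = NormalDist(mu=mu, sigma=sigma)

# Symmetry: the pdf has the same height the same distance either side of the mean
height_left = dist.pdf(mu - 2 * sigma)
height_right = dist.pdf(mu + 2 * sigma)
print("pdf at mu - 2*sigma:", height_left)
print("pdf at mu + 2*sigma:", height_right)

# Area under the curve: add up thin strips (midpoint rule) from mu - 6*sigma to mu + 6*sigma
n_strips = 12000
lower, upper = mu - 6 * sigma, mu + 6 * sigma
width = (upper - lower) / n_strips
area = sum(dist.pdf(lower + (i + 0.5) * width) for i in range(n_strips)) * width
print("Approximate area under the curve:", round(area, 6))
```

Changing `mu` shifts the curve left or right, while changing `sigma` makes it wider and flatter or narrower and taller; the area stays at 1 in every case.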
[Figure: three panels of Normal density curves (Density against x): “Normal pdf with mean 30 and sd 10”; “Normal pdfs with means 10, 30, 50 and sd 10”; “Normal pdfs with mean 30 and sds 5, 10, 15”.]

If a random variable X has a Normal distribution with mean µ and variance σ², then we write

$$X \sim N(\mu, \sigma^2).$$

9.1.1 The standard Normal distribution

The standard Normal distribution, usually denoted by Z ∼ N(0, 1), has a mean of zero and a variance of 1, and we have tables of probabilities for this particular Normal distribution; see the Probability Tables for the Standard Normal Distribution at the end of these notes.

Example 1
Find the following probabilities when Z ∼ N(0, 1).
(a) P(Z ≤ −1.46)
(b) P(Z ≤ 0.01)
(c) P(Z > 1.5)
(d) P(−1.2 < Z ≤ 1.5)
(e) P(Z < 1.5)
(f) P(Z = z)

Solution

9.1.2 Probabilities from any Normal distribution

Any Normally distributed random variable X ∼ N(µ, σ²) can be transformed into the standard Normal distribution using the formula:

$$Z = \frac{X - \mu}{\sigma},$$

therefore

$$P(X \le x) = P\left(Z \le \frac{x - \mu}{\sigma}\right),$$

which can be looked up in tables.

Example 2
If X ∼ N(10, 2²) calculate P(X ≤ 8).

Solution

Example 3
Suppose X is the IQ of a randomly selected 18–19 year old and that X follows a normal distribution with mean µ = 100 and standard deviation σ = 15. Thus, we have: X ∼ N(100, 15²). Find the following probabilities.
(a) The probability that an 18–19 year old has an IQ less than 110.
(b) The probability that an 18–19 year old has an IQ greater than 110.
(c) The probability that an 18–19 year old has an IQ greater than 125.
(d) The probability that an 18–19 year old has an IQ between 95 and 115.

Solutions

9.2 Exercises

1. A company promises delivery within 20 working days of receipt of order.
However, in reality, they deliver according to a normal distribution with a mean of 16 days and a standard deviation of 2.5 days.
(a) What proportion of customers receive their order late?
(b) What proportion of customers receive their orders between 10 and 15 days of placing their order?
(c) A new order processing system promises to reduce the standard deviation of delivery times to 1.5 days. If this system is used, what proportion of customers will receive their deliveries within 20 days?

2. A drinks machine is regulated by its manufacturer so that it dispenses an average of 200ml per cup. However, the machine is not particularly accurate and actually dispenses an amount that has a normal distribution with standard deviation 15ml.
(a) What percentage of cups contain below the minimum permissible volume of 170ml?
(b) What percentage of cups contain over 225ml?
(c) What percentage of cups contain between 175ml and 225ml?
(d) How many cups would you expect to overflow if 240ml cups are used for the next 10000 drinks?

10 More continuous probability distributions

10.1 The normal distribution: using tables in reverse

Suppose we are told that P(Z ≤ z) = 0.95. What is the value of z? From the standard Normal tables at the end of these notes, we can see that

P(Z ≤ 1.64) = 0.9495 and P(Z ≤ 1.65) = 0.9505.

Since 0.95 lies halfway between these two probabilities, we take z halfway between 1.64 and 1.65. Therefore, z = 1.645.

Now suppose that X ∼ N(100, 15²), as in the IQ example from Chapter 9. Below what IQ are 95% of the population? We know that P(Z ≤ 1.645) = 0.95 and z = (x − µ)/σ so

$$1.645 = \frac{x - \mu}{\sigma} = \frac{x - 100}{15},$$

therefore

x = 1.645 × 15 + 100 ≃ 124.7.

In other words, 95% of IQs are less than about 125.

10.2 The uniform distribution

The uniform distribution is the simplest continuous distribution. As the name suggests, it describes a variable for which all possible outcomes are equally likely. If the random variable X follows a uniform distribution, we write X ∼ U(a, b).
Probabilities can be calculated using the formula

$$P(X \le x) = \begin{cases} 0 & \text{for } x < a \\ \dfrac{x - a}{b - a} & \text{for } a \le x \le b \\ 1 & \text{for } x > b, \end{cases}$$

and the mean and variance are given by

$$E[X] = \frac{a + b}{2}, \qquad \mathrm{Var}(X) = \frac{(b - a)^2}{12}.$$

10.3 The exponential distribution

The exponential distribution is another common distribution that is used to describe continuous random variables. It is often used to model lifetimes of products and times between “random” events such as arrivals of customers in a queueing system or arrivals of orders. The distribution has one parameter, λ. If our random variable X follows an exponential distribution, then we say X ∼ Exp(λ). Probabilities can be calculated using

$$P(X \le x) = \begin{cases} 1 - e^{-\lambda x} & \text{for } x \ge 0 \\ 0 & \text{for } x < 0, \end{cases}$$

and the mean and variance are given by

$$E[X] = \frac{1}{\lambda}, \qquad \mathrm{Var}(X) = \frac{1}{\lambda^2}.$$

10.3.1 Poisson process

The exponential distribution and the Poisson distribution are related through the notion of events occurring randomly in time (at a constant average rate, λ). This is known as a Poisson process. Consider a series of randomly occurring events such as calls at a call centre. The times of calls might look like

[Figure: a timeline from 0 to 5 minutes with a cross (×) marking the time of each call.]

There are two ways of viewing these data. One is as the number of calls in each minute (here 2, 0, 2, 1 and 1) and the other is as the times between successive calls. For the Poisson process,
• the number of calls in each one minute interval has a Poisson distribution with parameter λ, and
• the time between successive calls has an exponential distribution with parameter λ.

10.4 Exercises

1. An express coach is due to arrive in Newcastle from London at 11pm. However, in practice, it is equally likely to arrive anywhere between 15 minutes early and 45 minutes late, depending on traffic conditions. Let the random variable X denote the amount of time (in minutes) that the coach is delayed.
(a) Calculate the mean of the delay time.
(b) What is the probability that the coach is less than 5 minutes late?
(c) What is the probability that the coach is more than 20 minutes late?
(d) What is the probability that the coach arrives between 10.55 and 11.20pm?
(e) What is the probability that the coach arrives before 11pm?

2. The time (in minutes) between requests to a network server can be modelled by an exponential distribution with rate parameter λ = 2.5.
(a) What is the expected time between requests?
(b) What is the probability that the time between requests is less than 1 minute and 30 seconds?
(c) What is the probability that the time between requests is greater than 1 minute?
(d) What is the probability that the time between requests is between 1 minute and 1 minute and 30 seconds?
(e) What is the probability that the time between requests is between 30 seconds and 50 seconds?

Probability Tables for the Standard Normal Distribution

The table contains values of P(Z ≤ z), where Z ∼ N(0, 1).
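For readers who want to check a table entry, the values of P(Z ≤ z) can be reproduced in Python. The sketch below is an illustration rather than the method used to produce the printed table: it assumes the standard identity P(Z ≤ z) = ½[1 + erf(z/√2)] and uses `math.erf` from the standard library, with a made-up function name.

```python
from math import erf, sqrt


def standard_normal_cdf(z):
    """P(Z <= z) for Z ~ N(0, 1), via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


# Spot checks against entries quoted in the notes, rounded to 4 decimal places
for z in (-1.46, 0.0, 1.64, 1.65):
    print(f"P(Z <= {z:5.2f}) = {standard_normal_cdf(z):.4f}")

# Probabilities for any Normal distribution follow by standardising first,
# e.g. a hypothetical X ~ N(50, 4^2) evaluated at x = 54 gives z = (54 - 50) / 4 = 1
print(f"P(X <= 54) = {standard_normal_cdf((54 - 50) / 4):.4f}")
```

"Greater than" probabilities come from 1 − P(Z ≤ z), and interval probabilities from the difference of two table values, exactly as when using the printed table.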
   z   -0.09   -0.08   -0.07   -0.06   -0.05   -0.04   -0.03   -0.02   -0.01    0.00
-2.9  0.0014  0.0014  0.0015  0.0015  0.0016  0.0016  0.0017  0.0018  0.0018  0.0019
-2.8  0.0019  0.0020  0.0021  0.0021  0.0022  0.0023  0.0023  0.0024  0.0025  0.0026
-2.7  0.0026  0.0027  0.0028  0.0029  0.0030  0.0031  0.0032  0.0033  0.0034  0.0035
-2.6  0.0036  0.0037  0.0038  0.0039  0.0040  0.0041  0.0043  0.0044  0.0045  0.0047
-2.5  0.0048  0.0049  0.0051  0.0052  0.0054  0.0055  0.0057  0.0059  0.0060  0.0062
-2.4  0.0064  0.0066  0.0068  0.0069  0.0071  0.0073  0.0075  0.0078  0.0080  0.0082
-2.3  0.0084  0.0087  0.0089  0.0091  0.0094  0.0096  0.0099  0.0102  0.0104  0.0107
-2.2  0.0110  0.0113  0.0116  0.0119  0.0122  0.0125  0.0129  0.0132  0.0136  0.0139
-2.1  0.0143  0.0146  0.0150  0.0154  0.0158  0.0162  0.0166  0.0170  0.0174  0.0179
-2.0  0.0183  0.0188  0.0192  0.0197  0.0202  0.0207  0.0212  0.0217  0.0222  0.0228
-1.9  0.0233  0.0239  0.0244  0.0250  0.0256  0.0262  0.0268  0.0274  0.0281  0.0287
-1.8  0.0294  0.0301  0.0307  0.0314  0.0322  0.0329  0.0336  0.0344  0.0351  0.0359
-1.7  0.0367  0.0375  0.0384  0.0392  0.0401  0.0409  0.0418  0.0427  0.0436  0.0446
-1.6  0.0455  0.0465  0.0475  0.0485  0.0495  0.0505  0.0516  0.0526  0.0537  0.0548
-1.5  0.0559  0.0571  0.0582  0.0594  0.0606  0.0618  0.0630  0.0643  0.0655  0.0668
-1.4  0.0681  0.0694  0.0708  0.0721  0.0735  0.0749  0.0764  0.0778  0.0793  0.0808
-1.3  0.0823  0.0838  0.0853  0.0869  0.0885  0.0901  0.0918  0.0934  0.0951  0.0968
-1.2  0.0985  0.1003  0.1020  0.1038  0.1056  0.1075  0.1093  0.1112  0.1131  0.1151
-1.1  0.1170  0.1190  0.1210  0.1230  0.1251  0.1271  0.1292  0.1314  0.1335  0.1357
-1.0  0.1379  0.1401  0.1423  0.1446  0.1469  0.1492  0.1515  0.1539  0.1562  0.1587
-0.9  0.1611  0.1635  0.1660  0.1685  0.1711  0.1736  0.1762  0.1788  0.1814  0.1841
-0.8  0.1867  0.1894  0.1922  0.1949  0.1977  0.2005  0.2033  0.2061  0.2090  0.2119
-0.7  0.2148  0.2177  0.2206  0.2236  0.2266  0.2296  0.2327  0.2358  0.2389  0.2420
-0.6  0.2451  0.2483  0.2514  0.2546  0.2578  0.2611  0.2643  0.2676  0.2709  0.2743
-0.5  0.2776  0.2810  0.2843  0.2877  0.2912  0.2946  0.2981  0.3015  0.3050  0.3085
-0.4  0.3121  0.3156  0.3192  0.3228  0.3264  0.3300  0.3336  0.3372  0.3409  0.3446
-0.3  0.3483  0.3520  0.3557  0.3594  0.3632  0.3669  0.3707  0.3745  0.3783  0.3821
-0.2  0.3859  0.3897  0.3936  0.3974  0.4013  0.4052  0.4090  0.4129  0.4168  0.4207
-0.1  0.4247  0.4286  0.4325  0.4364  0.4404  0.4443  0.4483  0.4522  0.4562  0.4602
 0.0  0.4641  0.4681  0.4721  0.4761  0.4801  0.4840  0.4880  0.4920  0.4960  0.5000

   z    0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
 0.0  0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
 0.1  0.5398  0.5438  0.5478  0.5517  0.5557  0.5596  0.5636  0.5675  0.5714  0.5753
 0.2  0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141
 0.3  0.6179  0.6217  0.6255  0.6293  0.6331  0.6368  0.6406  0.6443  0.6480  0.6517
 0.4  0.6554  0.6591  0.6628  0.6664  0.6700  0.6736  0.6772  0.6808  0.6844  0.6879
 0.5  0.6915  0.6950  0.6985  0.7019  0.7054  0.7088  0.7123  0.7157  0.7190  0.7224
 0.6  0.7257  0.7291  0.7324  0.7357  0.7389  0.7422  0.7454  0.7486  0.7517  0.7549
 0.7  0.7580  0.7611  0.7642  0.7673  0.7704  0.7734  0.7764  0.7794  0.7823  0.7852
 0.8  0.7881  0.7910  0.7939  0.7967  0.7995  0.8023  0.8051  0.8078  0.8106  0.8133
 0.9  0.8159  0.8186  0.8212  0.8238  0.8264  0.8289  0.8315  0.8340  0.8365  0.8389
 1.0  0.8413  0.8438  0.8461  0.8485  0.8508  0.8531  0.8554  0.8577  0.8599  0.8621
 1.1  0.8643  0.8665  0.8686  0.8708  0.8729  0.8749  0.8770  0.8790  0.8810  0.8830
 1.2  0.8849  0.8869  0.8888  0.8907  0.8925  0.8944  0.8962  0.8980  0.8997  0.9015
 1.3  0.9032  0.9049  0.9066  0.9082  0.9099  0.9115  0.9131  0.9147  0.9162  0.9177
 1.4  0.9192  0.9207  0.9222  0.9236  0.9251  0.9265  0.9279  0.9292  0.9306  0.9319
 1.5  0.9332  0.9345  0.9357  0.9370  0.9382  0.9394  0.9406  0.9418  0.9429  0.9441
 1.6  0.9452  0.9463  0.9474  0.9484  0.9495  0.9505  0.9515  0.9525  0.9535  0.9545
 1.7  0.9554  0.9564  0.9573  0.9582  0.9591  0.9599  0.9608  0.9616  0.9625  0.9633
 1.8  0.9641  0.9649  0.9656  0.9664  0.9671  0.9678  0.9686  0.9693  0.9699  0.9706
 1.9  0.9713  0.9719  0.9726  0.9732  0.9738  0.9744  0.9750  0.9756  0.9761  0.9767
 2.0  0.9772  0.9778  0.9783  0.9788  0.9793  0.9798  0.9803  0.9808  0.9812  0.9817
 2.1  0.9821  0.9826  0.9830  0.9834  0.9838  0.9842  0.9846  0.9850  0.9854  0.9857
 2.2  0.9861  0.9864  0.9868  0.9871  0.9875  0.9878  0.9881  0.9884  0.9887  0.9890
 2.3  0.9893  0.9896  0.9898  0.9901  0.9904  0.9906  0.9909  0.9911  0.9913  0.9916
 2.4  0.9918  0.9920  0.9922  0.9925  0.9927  0.9929  0.9931  0.9932  0.9934  0.9936
 2.5  0.9938  0.9940  0.9941  0.9943  0.9945  0.9946  0.9948  0.9949  0.9951  0.9952
 2.6  0.9953  0.9955  0.9956  0.9957  0.9959  0.9960  0.9961  0.9962  0.9963  0.9964
 2.7  0.9965  0.9966  0.9967  0.9968  0.9969  0.9970  0.9971  0.9972  0.9973  0.9974
 2.8  0.9974  0.9975  0.9976  0.9977  0.9977  0.9978  0.9979  0.9979  0.9980  0.9981
 2.9  0.9981  0.9982  0.9982  0.9983  0.9984  0.9984  0.9985  0.9985  0.9986  0.9986
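
The distribution functions used in this chapter can also be evaluated numerically, which is a convenient way to check hand calculations and table look-ups. The following is a minimal Python sketch and is not part of the module materials (which use Minitab): the function names and the parameter values in the examples are illustrative assumptions. It codes the uniform and exponential formulas for P(X ≤ x) directly, and computes P(Z ≤ z) for Z ∼ N(0, 1) from the error function.

```python
import math


def uniform_cdf(x, a, b):
    """P(X <= x) for X uniform on [a, b]."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)


def exponential_cdf(x, lam):
    """P(X <= x) for X ~ Exp(lam), where lam is the rate parameter."""
    if x < 0:
        return 0.0
    return 1.0 - math.exp(-lam * x)


def standard_normal_cdf(z):
    """P(Z <= z) for Z ~ N(0, 1), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Illustrative parameter values (assumptions, not taken from the notes).
print(round(uniform_cdf(2.5, 0, 10), 4))     # 0.25
print(round(exponential_cdf(1.0, 0.5), 4))   # 0.3935
print(round(standard_normal_cdf(1.96), 4))   # 0.975
print(round(standard_normal_cdf(-1.0), 4))   # 0.1587
```

The last two values agree with the table entries for z = 1.96 (row 1.9, column 0.06) and z = -1.00 (row -1.0, column 0.00). Probabilities for an interval follow by subtraction, for example P(c < X ≤ d) = P(X ≤ d) − P(X ≤ c).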
