Mini – Statistics Preparation Course
For OPRE 202 and OPRE 504 Students

This course is designed for students who are planning to take OPRE 202 or OPRE 504 here at the University of Baltimore. The main objective is to brush up on the basic statistics skills required for OPRE 202, or to give OPRE 504 students a good start. If you have taken a basic statistics class here at UB or elsewhere, you will be surprised at how easily you can regain your skills before the actual class begins.

Let's start with some definitions:

Simple Event: an event that can be described by a single characteristic
Sample Space: the collection of all possible events

There are three approaches to assessing the probability of an uncertain event:

1. A priori classical probability

   $\text{probability of occurrence} = \frac{X}{T} = \frac{\text{number of ways the event can occur}}{\text{total number of elementary outcomes}}$

2. Empirical classical probability

   $\text{probability of occurrence} = \frac{\text{number of favorable outcomes observed}}{\text{total number of outcomes observed}}$

3. Subjective probability

   An individual judgment or opinion about the probability of occurrence.

Example

Of the cars on a used car lot, 70% have air conditioning (AC), 40% have a CD player (CD), and 20% have both. What is the probability that a car has a CD player, given that it has AC?

Hint: we want to find P(CD | AC). By the definition of conditional probability, $P(CD \mid AC) = \frac{P(CD \text{ and } AC)}{P(AC)} = \frac{0.20}{0.70} \approx 0.286$.

For the same car lot, what is the probability that a car has AC, given that it has a CD player? Here we want $P(AC \mid CD) = \frac{P(CD \text{ and } AC)}{P(CD)} = \frac{0.20}{0.40} = 0.50$.

Scales of Measurement

In the statistical data analysis designed for the school of business, we should become familiar with two scales of measurement: qualitative and quantitative.

Qualitative Scale of Measurement: We use this scale when data are categorized. For example, to identify a person's gender we can use M for male and F for female.
In this scale of measurement, operations such as the arithmetic average are usually meaningless.

Quantitative Scale of Measurement: Continuous and discrete variables are used in this scale of measurement to quantify data. Examples of continuous variables are time, money and measurements. Examples of discrete variables are the number of students in this class and the number of goals in a soccer game.

Measures of Location

Measures of location identify a characteristic of a population. They are the mean, the median and the mode.

The mean is the most frequently used measure of location in science. Another name for it is the arithmetic average. Generally we use the symbol $\mu$ for the mean of a population and $\bar{x}$ for the mean of a sample.

When the mean does not represent the data well, particularly when extreme values are involved, we use the median. The median is the number that falls in the center of the data after the data have been arranged in ascending or descending order.

The least used measure of location is the mode. The mode is the most frequent value in a set of data.

Statistics For Business

The field of statistical data analysis is based on descriptive and inferential statistics.

Descriptive Statistics: Descriptive statistics is a part of data analysis. It consists of recording and organizing data. Often it ranges from the least important to the most important factors in a data set.

Inferential Statistics: In inferential statistics we use measurements that are randomly selected from a population to draw conclusions about the entire population.

Descriptive Statistics

In order to use the normal curve, we should know how to calculate the variance and standard deviation of a particular distribution. The variance of a population is the average of the squared deviations from the mean.
The standard deviation is the square root of this quantity. If the observations represent a sample, then the sum of the squared deviations must be divided by the sample size minus one to avoid a biased estimate. The following are the formulas for the variance of a population, $\sigma^2$, and of a sample, $s^2$:

$$\sigma^2 = \frac{\sum (x - \mu)^2}{N} \qquad \text{and} \qquad s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

The standard deviation of both population and sample can be found by taking the square root of these quantities. It helps to work through a sample and find the mean, the median, the mode, the variance and the standard deviation.

Example: Consider the following sample to be the ages of 6 employees of a small firm: 23, 26, 30, 41, 29, 43

1) Find the mean:

   $\bar{x} = \frac{\sum x}{n} = \frac{23 + 26 + 30 + 41 + 29 + 43}{6} = 32$

2) Find the median. Arranged in ascending order the data are 23, 26, 29, 30, 41, 43, so the median is

   $\frac{29 + 30}{2} = 29.5$

3) The mode does not exist.

4) Calculate $\sum_{i=1}^{6} (x_i - \bar{x})$:

   (23-32) + (26-32) + (30-32) + (41-32) + (29-32) + (43-32) = 0

5) Calculate $\sum_{i=1}^{6} (x_i - \bar{x})^2$:

   $(23-32)^2 + (26-32)^2 + (30-32)^2 + (41-32)^2 + (29-32)^2 + (43-32)^2 = 332$

   This is known as the sum of squares and is used in many statistical procedures.

6) To find the variance of the sample, divide the quantity found in the last step by the degrees of freedom, 6 - 1:

   $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{332}{6 - 1} = 66.4$

7) To find the standard deviation, take the square root of the variance. The standard deviation is known as the unit measure of distance.

   $s = \sqrt{66.4} \approx 8.15$

We also use the formula $z = \frac{x - \bar{x}}{s}$ to find the norm or z-score of a particular observation. For example, the z-score of 30 is $z = \frac{30 - 32}{8.15} \approx -0.25$.

Note: The least used measure of variation is the range of the data. The range is the difference between the largest and smallest numbers in a set of data.

Normal distribution scale conversions

As mentioned above, you can use the formula $z = \frac{x - \bar{x}}{s}$, or $z = \frac{x - \mu}{\sigma}$, to convert a score from a normal distribution to a standardized z-score. For example, if the mean of SAT scores is 500 with a standard deviation of 100, then a score of 500 has a z-score of zero on the standardized z-scale.
A score of 600 has a z-score of 1 and a score of 400 has a z-score of -1:

$$z = \frac{x - 500}{100}: \qquad z = \frac{500 - 500}{100} = 0, \qquad z = \frac{600 - 500}{100} = 1, \qquad z = \frac{400 - 500}{100} = -1$$

In the same way, if we take the formula $z = \frac{x - \mu}{\sigma}$ and solve for x in terms of the other variables, we get $x = \mu + z\sigma$. Hence the x score for z = -1 is 400, the x score for z = +1 is 600, and so forth:

z = -3     z = -2     z = -1     z = 0       z = +1     z = +2     z = +3
x = 200    x = 300    x = 400    µ = 500     x = 600    x = 700    x = 800

Students must first be familiar with the normal distribution before drawing any conclusions.

A distribution is normal or roughly normal if the three measures of location (the mean, the median and the mode) are roughly equal. Before solving any application problem, students must be able to find different probability values from the standard normal curve. Using the tables in our book, you can find the area to the left of a particular number on the standard curve.

If a distribution is normal, then the mean will be in the center of the data. Fifty percent of the data will be located to the left of the mean and the other 50% will be located to the right. The normal distribution table covers probabilities within three standard deviations of the mean. The mean and standard deviation of the standard normal distribution are 0 and 1, respectively. The mean, z = 0, is always in the center of the distribution. Z-scores to the left of the mean are negative and z-scores to the right are positive.

To find probabilities in a normal distribution, you can always draw a bell-shaped curve and shade the desired area before answering the question. Furthermore, understanding mathematical symbols such as ≤, ≥, < and > is very important before finding your answers. To make a long story short, simply locate the given z-score on the z-score line and then shade the area to the left. Sometimes the answer is the opposite of what you think it is.
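The conversions and table lookups described above can be sketched in a few lines of Python. This is only an illustration, not part of the course materials: the function names `z_score`, `raw_score` and `area_to_left` are made up for this sketch, and `NormalDist` from Python's standard library (Python 3.8 or later) stands in for the printed normal table.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Convert a raw score x to a standardized z-score."""
    return (x - mean) / sd


def raw_score(z, mean, sd):
    """Convert a z-score back to a raw score: x = mean + z * sd."""
    return mean + z * sd


def area_to_left(z):
    """Area under the standard normal curve (mean 0, sd 1) to the left of z."""
    return NormalDist(mu=0.0, sigma=1.0).cdf(z)


# SAT figures from the text: mean 500, standard deviation 100.
SAT_MEAN, SAT_SD = 500, 100

# Raw scores to z-scores.
for score in (400, 500, 600):
    print(f"x = {score} -> z = {z_score(score, SAT_MEAN, SAT_SD):+.1f}")

# z-scores back to raw scores, reproducing the scale from -3 to +3.
for z in range(-3, 4):
    print(f"z = {z:+d} -> x = {raw_score(z, SAT_MEAN, SAT_SD)}")

# Half of the curve lies to the left of the mean.
print("area to the left of z = 0:", area_to_left(0))
```

Like the printed table, `area_to_left` only ever returns the area to the left of z, so any other region has to be built up from left-hand areas.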
For example, suppose you are asked to calculate P(z > -1), which means finding the probability of getting an observation one standard deviation below the mean or higher. First find the area to the left of -1, then subtract it from 1 (which represents the entire curve) to find the answer. This will be a lot easier if you draw a picture.

To practice, find the answers to the following problems from the normal curve:

1) Find P(z < -1.25)
2) Find P(z > -1.87)
3) Find P(-1.1 < z < 2.3)
4) Find P(2.1 < z < 3.05)

The normal distribution is used very commonly. I will demonstrate this with some popular examples.

Assume GMAT scores are normally distributed with a mean of 529 and a standard deviation of 113 points. Answer the following questions based on this fact:

a) What is the probability that a randomly selected student scores 700 points or better?

b) A college administrator would like to accept students whose scores are at least 400 points. What is the probability that a randomly selected student scores at least 400 points?

Note: a student's percentile shows his or her ranking among the students who took the GMAT exam. What is the percentile rank of a student whose score is 630 points if the mean and standard deviation of the GMAT exam are 529 and 65, respectively?

Solution:

Some more applications of the normal distribution:

If x is a continuous variable from a normal population with mean μ = 500 and standard deviation σ = 100, then:

1) What is the z-score of x = 635?
   Z = __________

2) Using the normal distribution table, what is the probability of getting x ≥ 583?
   P(x ≥ 583) = __________

3) Referring to the normal distribution, what is the probability that x ≤ 605?
   P(x ≤ 605) = __________

4) What is the probability that 600 ≤ x ≤ 720?
   P(600 ≤ x ≤ 720) = __________

5) Assume μ = 500 is the average mathematics score of students taking the SAT exam. What is the probability that a randomly selected student's score is either less than 550 or greater than 680?
   P(x ≤ 550 or x ≥ 680) = __________

6) If x is two standard deviations below the mean, then x = __________

7) The middle 80% of all observations in this population fall between what two x values?
   __________ ≤ x ≤ __________

8) A sample of size n = 5 is taken from a normal distribution. The sample values are 38, 46, 34, 38 and 24.
   A) What is the mean of the sample? $\bar{x}$ = __________
   B) What is the standard deviation of the sample? S = __________

9) If the sample in part 8 is taken from a population with a known standard deviation, what would be the interval coefficient for a 96% confidence interval?
   Zα/2 = __________

10) Find a 96% confidence interval for the true mean of the population that the sample in part 8 is taken from, if:
   A) It is known that the standard deviation of the population is 3.
   B) The standard deviation of the sample, S = 4, is used to estimate this confidence interval.
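Hand calculations like the ones in this course can be checked with a short Python script. The sketch below is offered only as a way to verify your own work. It uses Python's standard `statistics` module, and the helper names `sum_of_squares` and `z_confidence_interval` are made up for this illustration. It assumes the usual interval for a known population standard deviation, $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$, and is applied first to the employee ages from the worked example and then to part 10A.

```python
from math import sqrt
from statistics import NormalDist, mean, median, stdev, variance


def sum_of_squares(data):
    """Sum of squared deviations from the sample mean."""
    x_bar = mean(data)
    return sum((x - x_bar) ** 2 for x in data)


def z_confidence_interval(sample_mean, sigma, n, level):
    """Two-sided interval for the mean when the population sigma is known.

    Assumes the interval x_bar +/- z_{alpha/2} * sigma / sqrt(n).
    """
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)  # z_{alpha/2}
    margin = z_crit * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Ages of the 6 employees from the worked example.
ages = [23, 26, 30, 41, 29, 43]
print("mean:", mean(ages))
print("median:", median(ages))
print("sum of squares:", sum_of_squares(ages))
print("sample variance:", variance(ages))           # divides by n - 1
print("sample std dev:", round(stdev(ages), 2))

# Part 10A: sample from part 8, population standard deviation 3, 96% level.
sample = [38, 46, 34, 38, 24]
low, high = z_confidence_interval(mean(sample), sigma=3, n=len(sample), level=0.96)
print(f"96% confidence interval: ({low:.2f}, {high:.2f})")
```

Note that `variance` and `stdev` divide by n - 1, matching the sample formulas used in this course. Part 10B, where the sample standard deviation replaces the population value, would normally call for the t distribution, which the standard library does not provide, so it is left out of this sketch.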