Mini – Statistics Preparation Course For OPRE 202 and OPRE 504 Students
This course is designed for students who are planning to take OPRE 202 or OPRE 504 here at the University of Baltimore.
The main objective is to brush up on the basic statistics skills required for OPRE 202 and to give OPRE 504 students a good start.
If you have taken a basic statistics class here at UB or elsewhere, you will be surprised at how easily you can regain your skills before the actual class begins.
Let’s start with some definitions:
• Simple Event – an event that can be described by a single characteristic
• Sample Space – the collection of all possible events
There are three approaches to assessing the probability of an uncertain event:
1. a priori classical probability
$$\text{probability of occurrence} = \frac{X}{T} = \frac{\text{number of ways the event can occur}}{\text{total number of elementary outcomes}}$$
2. empirical classical probability
$$\text{probability of occurrence} = \frac{\text{number of favorable outcomes observed}}{\text{total number of outcomes observed}}$$
3. subjective probability
an individual judgment or opinion about the probability of occurrence
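To make the first two approaches concrete, here is a minimal Python sketch using a fair six-sided die and the event "the roll is even", an illustration of our own rather than one from the course. The observed rolls are hypothetical data, included only to show how an empirical probability is computed.

```python
from fractions import Fraction

# 1. A priori classical probability: count outcomes before any experiment.
sample_space = [1, 2, 3, 4, 5, 6]                          # faces of a fair die
event = [face for face in sample_space if face % 2 == 0]   # "the roll is even"
a_priori = Fraction(len(event), len(sample_space))         # X / T

# 2. Empirical classical probability: relative frequency in observed data.
# These rolls are hypothetical illustration data, not from the course.
observed_rolls = [2, 5, 6, 1, 4, 4, 3, 6, 2, 5]
favorable = sum(1 for roll in observed_rolls if roll % 2 == 0)
empirical = Fraction(favorable, len(observed_rolls))

print("a priori :", a_priori)    # 1/2
print("empirical:", empirical)   # 3/5
```

The two values differ because the empirical figure depends on which rolls happened to be observed; with more observations it would be expected to settle near 1/2.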
Example
• Of the cars on a used car lot, 70% have air conditioning (AC), 40% have a CD player (CD), and 20% have both.
• What is the probability that a car has a CD player, given that it has AC?
Hint: we want to find P(CD | AC).
• What is the probability that a car has AC, given that it has a CD player?
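The hint relies on the definition of conditional probability, P(A | B) = P(A and B) / P(B). A minimal Python sketch applying it to both questions; the helper name `conditional` is hypothetical, chosen for illustration:

```python
# Conditional probability: P(A | B) = P(A and B) / P(B)
p_ac = 0.70     # P(AC)
p_cd = 0.40     # P(CD)
p_both = 0.20   # P(AC and CD)

def conditional(p_joint: float, p_given: float) -> float:
    """Return P(A | B) given P(A and B) and P(B)."""
    return p_joint / p_given

p_cd_given_ac = conditional(p_both, p_ac)   # 0.20 / 0.70
p_ac_given_cd = conditional(p_both, p_cd)   # 0.20 / 0.40

print(f"P(CD | AC) = {p_cd_given_ac:.4f}")
print(f"P(AC | CD) = {p_ac_given_cd:.4f}")
```

This prints P(CD | AC) = 0.2857 and P(AC | CD) = 0.5000, which shows that reversing the condition generally changes the answer.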
Scales of Measurement
In statistical data analysis for business, we should familiarize ourselves with two scales of measurement: qualitative and quantitative.
Qualitative Scale of Measurement:
We use this scale when data are categorized. For example, to identify a person's gender we can use M for male and F for female. On this scale of measurement, operations such as the arithmetic average are usually meaningless.
Quantitative Scale of Measurement:
Continuous and discrete variables are used on this scale of measurement to quantify data. Examples of continuous variables are time, money and physical measurements. Examples of discrete variables are the number of students in this class and the number of goals in a soccer game.
Measures of Location
Measures of location are values that describe where the center of a data set lies. The three most common are the mean, the median and the mode.
The mean is the most frequently used measure of location in science. Another name for it is the arithmetic average.
Generally we use the symbol µ for the mean of a population and $\bar{x}$ for the mean of a sample.
When the mean does not represent the data well, particularly when extreme values are present, we use the median.
By definition, the median is the number that falls in the center of the data after they have been arranged in ascending or descending order. When there is an even number of observations, the median is the average of the two middle values.
The measure of location that is used the least is the mode. The mode is the most frequent value in a set of data.
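The claim that the median resists extreme values can be checked with Python's standard `statistics` module. The salary figures below are hypothetical, invented only to show the effect of a single outlier:

```python
import statistics

# Hypothetical salaries (in $1,000s), used only to illustrate the point.
salaries = [30, 32, 35, 38, 40]
with_outlier = [30, 32, 35, 38, 400]   # the largest value replaced by an extreme one

for label, data in (("typical", salaries), ("with outlier", with_outlier)):
    print(label, "mean =", statistics.mean(data), "median =", statistics.median(data))

# The mode is the most frequent value; multimode lists every value tied for first.
print(statistics.multimode([2, 3, 3, 5, 7]))   # [3]
```

Replacing 40 with 400 moves the mean from 35 to 107, while the median stays at 35.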
Statistics For Business
The field of statistical data analysis is based on descriptive and inferential
statistics.
Descriptive Statistics:
Descriptive statistics is the part of data analysis that consists of recording and organizing data. It often covers the factors in a data set ranging from the least important to the most important.
Inferential Statistics:
In inferential statistics, we use measurements from a sample randomly selected from a population to draw conclusions about the entire population.
Descriptive Statistics
In order to use the normal curve, we should know how to calculate the variance and standard deviation of a particular distribution. The variance of a population is the average of the squared deviations from the mean. The standard deviation is the square root of this quantity. If the observations represent a sample, then the sum of the squared deviations must be divided by the sample size minus one, which keeps the sample variance from systematically underestimating the population variance. The following are the formulas for the population variance σ² and the sample variance s²:
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N} \qquad \text{and} \qquad s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
The standard deviation of a population or a sample is found by taking the square root of these quantities.
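A minimal Python sketch of the two formulas, assuming a small hypothetical data set; the function names are our own, and the last two lines compare the results with the standard library's `statistics.pvariance` (divides by N) and `statistics.variance` (divides by n - 1):

```python
import math
import statistics

def population_variance(xs):
    """sigma^2 = sum((x - mu)^2) / N"""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def sample_variance(xs):
    """s^2 = sum((x - x_bar)^2) / (n - 1)"""
    x_bar = sum(xs) / len(xs)
    return sum((x - x_bar) ** 2 for x in xs) / (len(xs) - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data set with mean 5
print(population_variance(data), math.sqrt(population_variance(data)))  # 4.0 2.0
print(sample_variance(data))      # 32 / 7, slightly larger than 4.0

# The standard library agrees: pvariance divides by N, variance by n - 1.
assert math.isclose(population_variance(data), statistics.pvariance(data))
assert math.isclose(sample_variance(data), statistics.variance(data))
```

Both functions use the same sum of squared deviations (32 here); only the divisor changes.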
It is helpful to work through a sample and find quantities such as the mean, the median, the mode, the variance and the standard deviation of the sample.
Example: Consider the following sample of the ages of 6 employees of a small firm:
23, 26, 30, 41, 29, 43
1) Find the mean
$$\bar{x} = \frac{\sum x}{n} = \frac{23 + 26 + 30 + 41 + 29 + 43}{6} = \frac{192}{6} = 32$$
2) Find the median
Arranged in ascending order: 23, 26, 29, 30, 41, 43
$$\text{median} = \frac{29 + 30}{2} = 29.5$$
3) The mode does not exist, because no value occurs more than once.
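4) Calculate $\sum_{i=1}^{6} (x_i - \bar{x})$
$$(23-32) + (26-32) + (30-32) + (41-32) + (29-32) + (43-32) = 0$$
The deviations from the mean always sum to zero, which is why they are squared in the next step.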
5) Calculate $\sum_{i=1}^{6} (x_i - \bar{x})^2$
$$(23-32)^2 + (26-32)^2 + (30-32)^2 + (41-32)^2 + (29-32)^2 + (43-32)^2 = 332$$
This is known as the sum of squares and is used in many statistical procedures.
6) Now, to find the variance of the sample, divide the quantity found in the last step by the degrees of freedom, 6 - 1 = 5:
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{332}{6 - 1} = 66.4$$
7) To find the standard deviation, take the square root of the variance. The standard deviation is known as the unit measure of distance.
$$s = \sqrt{66.4} \approx 8.15$$
We also use the formula $z = \frac{x - \bar{x}}{s}$ to find the normalized score, or z-score, of a particular observation. For example, the z-score of 30 is
$$z = \frac{30 - 32}{8.15} \approx -0.25$$
Note: The measure of variation that is used the least is the range of the data. The range is the difference between the largest and smallest numbers in a set of data. For the sample above, the range is 43 - 23 = 20.
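The arithmetic in steps 1 to 7, the z-score of 30 and the range can be double-checked with a short Python sketch that follows the same steps (the variable names are our own):

```python
import math
import statistics

ages = [23, 26, 30, 41, 29, 43]   # the six employees from the example
n = len(ages)

mean = sum(ages) / n                          # 1) 32.0
median = statistics.median(ages)              # 2) 29.5
counts = {a: ages.count(a) for a in ages}
has_mode = max(counts.values()) > 1           # 3) False: no value repeats
deviations = [x - mean for x in ages]
sum_dev = sum(deviations)                     # 4) 0.0
ss = sum(d ** 2 for d in deviations)          # 5) sum of squares, 332.0
s2 = ss / (n - 1)                             # 6) sample variance, 66.4
s = math.sqrt(s2)                             # 7) standard deviation, about 8.15
z_30 = (30 - mean) / s                        # z-score of 30, about -0.25
data_range = max(ages) - min(ages)            # range, 20

print(mean, median, has_mode, sum_dev, ss,
      round(s2, 1), round(s, 2), round(z_30, 2), data_range)
```

Using the unrounded standard deviation gives z ≈ -0.245, which rounds to the same -0.25 as the hand calculation.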
Normal distribution scale conversions
As I mentioned above, you can use the formula $z = \frac{x - \bar{x}}{s}$, or $z = \frac{x - \mu}{\sigma}$, to convert a score from a normal distribution to a standardized z-score. For example, if the mean of SAT scores is 500 with a standard deviation of 100, then a score of 500 has a z-score of zero on the standardized z-scale. A score of 600 has a z-score of 1 and a score of 400 has a z-score of -1:
$$z = \frac{x - \mu}{\sigma}: \quad z = \frac{500 - 500}{100} = 0, \quad z = \frac{600 - 500}{100} = 1, \quad z = \frac{400 - 500}{100} = -1$$
In the same way, if we take the formula $z = \frac{x - \mu}{\sigma}$ and solve for x in terms of the other variables, we get $x = \mu + z\sigma$. Hence, the x score of z = -1 is 400, the x score of z = +1 is 600, and so forth.
[Figure: normal curve with the z-scale (Z = -3, -2, -1, 0, +1, +2, +3) aligned with the SAT scale (x = 200, 300, 400, µ = 500, 600, 700, 800)]
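Both conversions are one-line functions. A minimal sketch, with hypothetical function names `z_score` and `x_from_z`, applied to the SAT example (µ = 500, σ = 100):

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Standardize: z = (x - mu) / sigma."""
    return (x - mu) / sigma

def x_from_z(z: float, mu: float, sigma: float) -> float:
    """Convert back: x = mu + z * sigma."""
    return mu + z * sigma

MU, SIGMA = 500, 100   # SAT example from the text

for x in (400, 500, 600):
    print(x, "->", z_score(x, MU, SIGMA))
for z in (-3, -2, -1, 0, 1, 2, 3):
    print(z, "->", x_from_z(z, MU, SIGMA))   # 200, 300, ..., 800
```

The second loop reproduces the two aligned scales of the figure: each step of one unit in z corresponds to a step of 100 points in x.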
Students must first become familiar with the normal distribution before drawing any conclusions.
In a normal distribution, the three measures of location, the mean, the median and the mode, are equal, so for a roughly normal distribution they should be roughly equal.
Before solving any application problem, students must be able to find different probability values from the standard normal curve. Using the tables in our book, you can find the area to the left of a particular z value on the standard normal curve. If a distribution is normal, then the mean will be in the center of the data: fifty percent of the data will be located to the left of the mean and the other 50% will be located to the right.
The normal distribution table covers probabilities within about three standard deviations of the mean. The mean and standard deviation of the standard normal distribution are 0 and 1, respectively.
The mean, Z = 0, is always in the center of the distribution. Z scores on the left side of the mean are negative and Z scores on the right side are positive. To find probabilities in a normal distribution, it always helps to draw a bell-shaped curve and shade the desired area before answering the question.
Furthermore, understanding the mathematical symbols ≤, ≥, <, and > is very important before finding your answers.
To make a long story short, simply locate the given z score on the z-score line, then shade the area to the left. Sometimes the answer is the opposite of what you think it is. For example, if you are asked to calculate P(z > -1), which means finding the probability of getting an observation one standard deviation below the mean or higher, first find the area to the left of -1. Then subtract it from 1, which represents the entire area under the curve, to find the answer: P(z > -1) = 1 - P(z < -1). This will be a lot easier if you draw a picture.
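If no table is at hand, Python's `statistics.NormalDist` (available since Python 3.8) gives the same left-tail areas through its `cdf` method. A sketch of the P(z > -1) example:

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal: mean 0, standard deviation 1

left_of_minus_1 = Z.cdf(-1)      # area to the left of z = -1
p_greater = 1 - left_of_minus_1  # P(z > -1) = 1 - area to the left

print(round(left_of_minus_1, 4), round(p_greater, 4))   # 0.1587 0.8413
```

The code mirrors the picture: `cdf` is the shaded area to the left, and subtracting it from 1 gives the area to the right.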
To practice, find the answers to the following problems using the standard normal table:
1) Find P(z < -1.25)
2) Find P(z > -1.87)
3) Find P(-1.1 < z < 2.3)
4) Find P(2.1 < z < 3.05)
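Once you have worked the practice problems with the table, a sketch like the following can be used to check them. It applies the same left-area rule with `NormalDist().cdf`; because the table is rounded to four decimals, small differences in the last digit are possible.

```python
from statistics import NormalDist

Phi = NormalDist().cdf   # area to the left of z on the standard normal curve

answers = {
    "P(z < -1.25)": Phi(-1.25),                # left area directly
    "P(z > -1.87)": 1 - Phi(-1.87),            # 1 minus the left area
    "P(-1.1 < z < 2.3)": Phi(2.3) - Phi(-1.1), # difference of two left areas
    "P(2.1 < z < 3.05)": Phi(3.05) - Phi(2.1),
}
for question, p in answers.items():
    print(f"{question} = {p:.4f}")
```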
The normal distribution is used very widely. I will demonstrate this with some popular examples.
Assume GMAT scores are normally distributed with a mean of 529 and a standard deviation of 113 points. Answer the following questions based on this fact:
a) What is the probability that a randomly selected student scores 700 points or better?
b) A college administrator would like to accept students whose scores are at least 400 points. What is the probability that a randomly selected student scores at least 400 points?
Note: a student’s percentile shows his or her ranking among the students who took the GMAT exam. What is the percentile rank of a student whose score is 630 points if the mean and standard deviation of the GMAT exam are 529 and 65, respectively?
Solution:
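A possible way to check the GMAT answers in Python, assuming (as a printed z table requires) that each z-score is rounded to two decimals before the area is looked up; the helper `table_z` is hypothetical:

```python
from statistics import NormalDist

def table_z(x, mu, sigma):
    """z-score rounded to two decimals, the precision of a printed z table."""
    return round((x - mu) / sigma, 2)

Phi = NormalDist().cdf

# a) and b): GMAT scores ~ Normal(mean 529, sd 113)
z_a = table_z(700, 529, 113)   # 1.51
p_a = 1 - Phi(z_a)             # P(x >= 700)
z_b = table_z(400, 529, 113)   # -1.14
p_b = 1 - Phi(z_b)             # P(x >= 400)

# Percentile question: mean 529 and sd 65, as stated in the note
z_c = table_z(630, 529, 65)    # 1.55
percentile = 100 * Phi(z_c)    # percent of test takers scoring below 630

print(f"a) {p_a:.4f}  b) {p_b:.4f}  percentile = {percentile:.1f}")
```

Under these assumptions, a) is about 0.0655, b) about 0.8729, and a score of 630 sits near the 94th percentile.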
Some more applications of the normal distribution:
If x is a continuous variable from a normal population with mean μ = 500 and standard deviation σ = 100, then:
1) What is the Z-score of x = 635?
Z = __________
2) Using the normal distribution table, what is the probability of getting x ≥ 583?
P(x ≥ 583) = __________
3) Using the normal distribution, what is the probability that x ≤ 605?
P (x ≤ 605) = __________
4) What is the probability that 600 ≤ x ≤ 720?
P(600 ≤ x ≤ 720) = _________
5) Assume μ = 500 is the average mathematics score of students taking the SAT exam. What is the probability that a randomly selected student's score is either less than 550 or greater than 680?
P(x ≤ 550 or x ≥ 680) = __________
6) If x is two standard deviations below the mean, then x = _________?
7) The middle 80% of all observations in this population fall within which two x values?
_______ ≤ x ≤ _______
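Questions 1 to 7 can be checked with `statistics.NormalDist(mu=500, sigma=100)`, which handles the standardization internally; its `inv_cdf` method runs the table in reverse for question 7. A sketch:

```python
from statistics import NormalDist

X = NormalDist(mu=500, sigma=100)

q1 = (635 - X.mean) / X.stdev             # 1) z-score of 635
q2 = 1 - X.cdf(583)                       # 2) P(x >= 583)
q3 = X.cdf(605)                           # 3) P(x <= 605)
q4 = X.cdf(720) - X.cdf(600)              # 4) P(600 <= x <= 720)
q5 = X.cdf(550) + (1 - X.cdf(680))        # 5) P(x <= 550 or x >= 680)
q6 = X.mean - 2 * X.stdev                 # 6) two standard deviations below the mean
q7 = (X.inv_cdf(0.10), X.inv_cdf(0.90))   # 7) middle 80% leaves 10% in each tail

for label, value in [("1", q1), ("2", q2), ("3", q3), ("4", q4), ("5", q5), ("6", q6)]:
    print(label, round(value, 4))
print("7", [round(v, 1) for v in q7])
```

For question 7 a table gives z ≈ ±1.28, that is, roughly 372 ≤ x ≤ 628, which agrees with the code's unrounded 371.8 and 628.2.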
8) A sample of size n = 5 is taken from a normal distribution. The sample values are: 38, 46, 34, 38, and 24.
A) What is the mean of the sample, $\bar{x}$ = ___________
B) What is the standard deviation of the sample, S = __________
9) If the sample in problem 8 is taken from a population with a known standard deviation, then what is the interval coefficient (critical value) for a 96% confidence interval?
Zα/2 = __________
10) Find a 96% confidence interval for the true mean of the population that the sample in problem 8 is taken from, if:
A) It is known that the standard deviation of the population is 3
B) The standard deviation of the sample, S = 4, is used to estimate this confidence interval.
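Problems 8 to 10 chain together, so one sketch covers them. It computes the sample statistics with the `statistics` module, gets z_{α/2} from `NormalDist().inv_cdf`, and builds the interval x̄ ± (critical value) × (standard deviation) / √n. Part B rests on an assumption of ours: with σ unknown and only n = 5 observations, the usual choice is a t critical value with 4 degrees of freedom (2.999 from a t table); if your course treats S like σ here, use `z_crit` instead.

```python
import math
import statistics
from statistics import NormalDist

sample = [38, 46, 34, 38, 24]          # problem 8
n = len(sample)
x_bar = statistics.mean(sample)        # 8A) sample mean
s = statistics.stdev(sample)           # 8B) sample standard deviation (n - 1 divisor)

confidence = 0.96
alpha = 1 - confidence
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # 9) z_(alpha/2), about 2.05

def interval(center, critical, sd, size):
    """Return center +/- critical * sd / sqrt(size) as a (low, high) tuple."""
    margin = critical * sd / math.sqrt(size)
    return center - margin, center + margin

ci_a = interval(x_bar, z_crit, 3, n)   # 10A) population sigma = 3 is known

# 10B) ASSUMPTION: sigma is estimated by S from a small sample, so we use the
# t-table value for 96% confidence with n - 1 = 4 degrees of freedom.
# The standard library has no t quantile function, so the value is hard-coded.
T_CRIT_4DF = 2.999
ci_b = interval(x_bar, T_CRIT_4DF, 4, n)   # S = 4, as given in the problem

print(x_bar, s, round(z_crit, 3))
print([round(v, 2) for v in ci_a], [round(v, 2) for v in ci_b])
```

Part A comes out to roughly 33.24 ≤ µ ≤ 38.76. Part B is wider, both because the t value exceeds 2.05 and because S = 4 is larger than σ = 3.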