BA 275, Fall 1998 Quantitative Business Methods

BA 555 Practical Business Analysis
Agenda
• Housekeeping
• Review of Statistics
  ◦ Exploring Data
  ◦ Sampling Distribution of a Statistic
  ◦ Confidence Interval Estimation
  ◦ Hypothesis Testing
1
Definition
“Statistics” is the science of data.
It involves collecting, classifying,
summarizing, organizing, analyzing,
and interpreting numerical information.
We will learn how to make
based on data
2
Fundamental Elements of Statistics
• A population is a set of units (usually people, objects, transactions, or events) that we are interested in studying. It is the totality of items or things under consideration.
• A sample is a subset of the units of a population. It is the portion of the population that is selected for analysis.
• A parameter is a numerical descriptive measure of a population. It is a summary measure that is computed to describe a characteristic of an entire population.
• A statistic is a numerical descriptive measure of a sample. It is a summary measure calculated from the observations in the sample.
3
Example
A manufacturer of computer chips claims that less than 10% of his products are defective. When 1,000 chips were drawn from a large production run, 7.5% were found to be defective.
• What is the population of interest?
• What is the sample?
• What is the parameter?
• What is the statistic?
• Does the value 10% refer to the parameter or to the statistic?
• Is the value 7.5% a parameter or a statistic?
4
Statistical Analysis (p.3)
[Diagram: the cycle of statistical analysis]
• POPULATION (parameters: μ, σ², σ, p, etc.)
• Selecting a random sample: X1, X2, …, Xn
• Describing uncertainty: random variables, probability, distributions
  ◦ Discrete: binomial distribution
  ◦ Continuous: normal distribution, sampling distribution of the sample mean
• Sample of size n: x1, x2, …, xn (statistics: x̄, s², s, p̂, etc.)
• Organizing data: qualitative, quantitative
• Drawing conclusions from data: estimation, hypothesis testing, regression analysis, contingency tables
5
Types of Data (p.2)
• Numerical (Quantitative) Data
  ◦ Regular numerical observations. Arithmetic calculations are meaningful.
  ◦ Age
  ◦ Household income
  ◦ Starting salary
• Categorical (Qualitative) Data
  ◦ Values are the (arbitrary) names of possible categories.
  ◦ Gender: Female = 1 vs. Male = 0.
  ◦ College major
6
Employee Database (class website, EmployeeDB.sf3)
[Figure: view of the database with its variables labeled Quantitative or Qualitative]
7
Describing Qualitative Data (p.4)
Qualitative Data (Categorical Data), e.g. gender, college major, etc.
• Graphical Methods: pie chart, bar chart, line graph
• Numerical Methods: frequency tables
Display one variable:
• Histogram
• Stem-and-Leaf Display
• Dot plot
Display two variables:
• Scatter plot
Display one variable over time:
• Time series plot
Measures of Location:
• Mean:
  ◦ sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
  ◦ population mean $\mu = \frac{1}{N}\sum_{i=1}^{N} X_i$
• Median:
  ◦ Arrange the observations in ascending order.
  ◦ If n is odd, median = the middle number.
  ◦ If n is even, median = the simple average of the middle two observations.
• Mode:
  ◦ The measurement that occurs most frequently in the data set.
Measures of Relative Standing:
• Percentiles
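The measures of location defined on this slide can be sketched in a few lines of Python. The observations below are made up purely for illustration (they are not from the course data), and the median is written out by hand to mirror the odd/even rule rather than calling a library routine.

```python
import statistics

# Hypothetical observations, chosen only to illustrate the definitions.
observations = [3, 5, 5, 7, 10, 12]

n = len(observations)
sample_mean = sum(observations) / n            # (1/n) * sum of the X_i

ordered = sorted(observations)                 # arrange in ascending order
middle = n // 2
if n % 2 == 1:
    median = ordered[middle]                   # odd n: the middle number
else:
    # even n: simple average of the middle two observations
    median = (ordered[middle - 1] + ordered[middle]) / 2

mode = statistics.mode(observations)           # most frequent measurement

print(sample_mean, median, mode)
```

With an even number of observations, as here, the median need not be one of the observed values.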
8
Summarizing Qualitative Data
[Figures: Barchart for Gender (frequency axis from 0 to 50); Piechart for Gender (F 39.44%, M 60.56%)]
Frequency Table for Gender
--------------------------------------------------------------
                          Relative   Cumulative  Cum. Rel.
Class  Value  Frequency   Frequency  Frequency   Frequency
--------------------------------------------------------------
1      F      28          0.3944     28          0.3944
2      M      43          0.6056     71          1.0000
--------------------------------------------------------------
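As a rough check on the frequency table for Gender, the sketch below rebuilds the four frequency columns from the raw counts (28 F, 43 M). The list `gender` is reconstructed from those counts, not read from EmployeeDB.sf3, and the column layout is only an approximation of the Statgraphics output.

```python
from collections import Counter

# Gender column rebuilt from the counts in the frequency table (28 F, 43 M).
gender = ["F"] * 28 + ["M"] * 43

counts = Counter(gender)
total = sum(counts.values())

rows = []
cumulative = 0
for value in sorted(counts):                   # "F" before "M"
    freq = counts[value]
    cumulative += freq
    rows.append((
        value,
        freq,                                  # frequency
        round(freq / total, 4),                # relative frequency
        cumulative,                            # cumulative frequency
        round(cumulative / total, 4),          # cumulative relative frequency
    ))

for row in rows:
    print(row)
```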
9
Describing Quantitative Data (p.4)
Graphical Methods
10
Quantitative Data: Histogram
[Figures: Histogram for SALARY (x-axis: Salary (in $000), 23 to 63; y-axis: frequency, 0 to 24) and Histogram for AGE (x-axis: Age, 30 to 65; y-axis: frequency, 0 to 20)]
Histogram applet
11
Describing Quantitative Data (p.4)
Descriptive/Summary Statistics
12
Guessing Correlations
[Figure: scatter plots with correlations -0.99, -0.29, 0.54, and 0.95]
13
Correlation: Be Careful
[Diagram linking "Scatter plot" and "Correlation value"; one direction is labeled "Correlation", the other "?"]
14
Example
Given the data below, complete the following summary statistics table. (Data are in ascending
order): 10.0, 10.5, 12.2, 13.9, 13.9, 14.1, 14.7, 14.7, 15.1, 15.3, 15.9, 17.7, 18.5
Statistic             Variable X
Count                 13
Average               14.3462
Median                14.7
Variance              5.94936
Standard deviation    2.43913
Minimum               10.0
Maximum               18.5
Range                 8.5
Lower quartile        13.9
Upper quartile        15.3
Interquartile range   1.4
Sum                   186.5
[Figure: Box-and-Whisker Plot of Variable X, axis from 10 to 20]
Lower invisible line: 11.8
Upper invisible line: 17.4
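The summary statistics for this example can be reproduced with Python's standard library. This is a sketch, not the Statgraphics procedure: the quartile rule (`method="inclusive"`) is an assumption that happens to match the table for these 13 values, and the two "invisible lines" are taken to be the quartiles minus/plus 1.5 times the interquartile range.

```python
import statistics

x = [10.0, 10.5, 12.2, 13.9, 13.9, 14.1, 14.7, 14.7,
     15.1, 15.3, 15.9, 17.7, 18.5]

count = len(x)
total = sum(x)
average = statistics.mean(x)
median = statistics.median(x)
variance = statistics.variance(x)              # sample variance, divisor n - 1
std_dev = statistics.stdev(x)
data_range = max(x) - min(x)

# Quartiles; the "inclusive" rule is an assumption about the software's method.
q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
iqr = q3 - q1

# Assumed definition of the box plot's invisible lines (fences).
lower_line = q1 - 1.5 * iqr
upper_line = q3 + 1.5 * iqr

print(count, round(average, 4), median, round(variance, 5), round(std_dev, 5))
print(round(q1, 1), round(q3, 1), round(iqr, 1),
      round(lower_line, 1), round(upper_line, 1))
```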
15
Statgraphics Plus (SG+) Demo (p.1)
Questions to ask when describing and summarizing data:
• Where is the approximate center of the distribution?
• Are the observations close to one another, or are they widely dispersed?
• Is the distribution unimodal, bimodal, or multimodal? If there is more than one mode, where are the peaks, and where are the valleys?
• Is the distribution symmetric? If not, is it skewed? If symmetric, is it bell-shaped?
16
The Empirical Rule (p.5)
1. Approximately 68% of the observations will fall within 1 standard deviation of the mean.
2. Approximately 95% of the observations will fall within 2 standard deviations of the mean.
3. Approximately 99.7% of the observations will fall within 3 standard deviations of the mean.
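For an exactly normal distribution, the three percentages in the rule can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; real data that are only roughly mound-shaped will match these figures only approximately.

```python
from statistics import NormalDist

std_normal = NormalDist()      # mean 0, standard deviation 1

within = {}
for k in (1, 2, 3):
    # P(mean - k*sd < X < mean + k*sd) for a normal distribution
    within[k] = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within {k} standard deviation(s): {within[k]:.4f}")
```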
[Figure: mound-shaped distribution with the horizontal axis marked at x̄ − 3s, x̄ − 2s, x̄ − s, x̄, x̄ + s, x̄ + 2s, x̄ + 3s (equivalently μ − 3σ, …, μ + 3σ, or z = −3, …, 3). Areas from left to right: 0.15%, 2.35%, 13.5%, 34%, 34%, 13.5%, 2.35%, 0.15%; the central bands cover 68%, 95%, and 99.7%.]
17
Example
• The average salary for employees with similar background/skills/etc. is about $120,000.
• Your salary is $122,000.
• Is it a big deal? Why or why not? What additional information is required to answer this question?
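One natural candidate for the missing information is the spread of salaries. The sketch below computes how many standard deviations $122,000 lies above $120,000 under two made-up standard deviations; neither value comes from the slide, and the helper name `z_score` is only illustrative.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies above the mean."""
    return (value - mean) / std_dev

average_salary = 120_000
your_salary = 122_000

# The standard deviation is NOT given in the example; both values are hypothetical.
for std_dev in (500, 10_000):
    print(f"std = {std_dev:>6}: z = {z_score(your_salary, average_salary, std_dev):.1f}")
```

Under the empirical rule, a salary 4 standard deviations above the mean would be highly unusual, while one 0.2 standard deviations above it would be quite ordinary.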
18
What to do next?
• Generalize the results from the empirical rule.
• Justify the use of the mound-shaped distribution.
[Figure: mound-shaped distribution with the horizontal axis marked at x̄ − 3s, …, x̄ + 3s (equivalently μ − 3σ, …, μ + 3σ, or z = −3, …, 3). Areas from left to right: 0.15%, 2.35%, 13.5%, 34%, 34%, 13.5%, 2.35%, 0.15%; the central bands cover 68%, 95%, and 99.7%.]
19
Example: Warranty Level
Mean = 30,000 miles STD = 5,000 miles
Q1: If the level of warranty is set at 15,000 miles, about what % of tires will be returned
under warranty?
Q2: If we can accept that up to 2.5% of tires can be returned under warranty, what should be
the new warranty level?
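One way to reason through Q1 and Q2 with the empirical rule, assuming tire life is mound-shaped (the slide gives only the mean and the standard deviation):

```latex
\begin{aligned}
\text{Q1:}\quad & z = \frac{15{,}000 - 30{,}000}{5{,}000} = -3
  \;\Rightarrow\; \text{about } 0.15\% \text{ of tires fall below } 15{,}000 \text{ miles} \\
\text{Q2:}\quad & 2.35\% + 0.15\% = 2.5\% \text{ lies below } \mu - 2\sigma
  = 30{,}000 - 2(5{,}000) = 20{,}000 \text{ miles}
\end{aligned}
```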
[Figure: bell-shaped density curve; horizontal axis from 0 to 60, vertical axis from 0 to 0.04]
20
Example: Warranty Level
Mean = 30,000 miles STD = 5,000 miles
Q1: If the level of warranty is set at 12,000 miles, about what % of tires will be returned
under the warranty?
Q2: If we can accept that up to 3.0% of tires can be returned under warranty, what should be
the warranty level?
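Here 12,000 miles is 3.6 standard deviations below the mean and 3.0% is not one of the empirical rule's tail areas, so a normal table or a normal CDF is needed. The sketch below assumes tire life is exactly normal, which the slide does not state, and uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

# Assumption: tire life is normally distributed (mean 30,000, std 5,000 miles).
tire_life = NormalDist(mu=30_000, sigma=5_000)

# Q1: share of tires that wear out before a 12,000-mile warranty level.
share_returned = tire_life.cdf(12_000)

# Q2: mileage below which only 3.0% of tires wear out.
warranty_level = tire_life.inv_cdf(0.03)

print(f"Q1: {share_returned:.4%} of tires returned")
print(f"Q2: warranty level of about {warranty_level:,.0f} miles")
```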
[Figure: bell-shaped density curve; horizontal axis from 0 to 60, vertical axis from 0 to 0.04]
21
Normal Probabilities
z     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
0.0  .0000  .0040  .0080  .0120  .0160  .0199  .0239  .0279  .0319  .0359
0.1  .0398  .0438  .0478  .0517  .0557  .0596  .0636  .0675  .0714  .0753
0.2  .0793  .0832  .0871  .0910  .0948  .0987  .1026  .1064  .1103  .1141
0.3  .1179  .1217  .1255  .1293  .1331  .1368  .1406  .1443  .1480  .1517
0.4  .1554  .1591  .1628  .1664  .1700  .1736  .1772  .1808  .1844  .1879
0.5  .1915  .1950  .1985  .2019  .2054  .2088  .2123  .2157  .2190  .2224
0.6  .2257  .2291  .2324  .2357  .2389  .2422  .2454  .2486  .2517  .2549
0.7  .2580  .2611  .2642  .2673  .2704  .2734  .2764  .2794  .2823  .2852
0.8  .2881  .2910  .2939  .2967  .2995  .3023  .3051  .3078  .3106  .3133
0.9  .3159  .3186  .3212  .3238  .3264  .3289  .3315  .3340  .3365  .3389
1.0  .3413  .3438  .3461  .3485  .3508  .3531  .3554  .3577  .3599  .3621
1.1  .3643  .3665  .3686  .3708  .3729  .3749  .3770  .3790  .3810  .3830
1.2  .3849  .3869  .3888  .3907  .3925  .3944  .3962  .3980  .3997  .4015
1.3  .4032  .4049  .4066  .4082  .4099  .4115  .4131  .4147  .4162  .4177
1.4  .4192  .4207  .4222  .4236  .4251  .4265  .4279  .4292  .4306  .4319
1.5  .4332  .4345  .4357  .4370  .4382  .4394  .4406  .4418  .4429  .4441
1.6  .4452  .4463  .4474  .4484  .4495  .4505  .4515  .4525  .4535  .4545
1.7  .4554  .4564  .4573  .4582  .4591  .4599  .4608  .4616  .4625  .4633
1.8  .4641  .4649  .4656  .4664  .4671  .4678  .4686  .4693  .4699  .4706
1.9  .4713  .4719  .4726  .4732  .4738  .4744  .4750  .4756  .4761  .4767
2.0  .4772  .4778  .4783  .4788  .4793  .4798  .4803  .4808  .4812  .4817
2.1  .4821  .4826  .4830  .4834  .4838  .4842  .4846  .4850  .4854  .4857
2.2  .4861  .4864  .4868  .4871  .4875  .4878  .4881  .4884  .4887  .4890
2.3  .4893  .4896  .4898  .4901  .4904  .4906  .4909  .4911  .4913  .4916
2.4  .4918  .4920  .4922  .4925  .4927  .4929  .4931  .4932  .4934  .4936
2.5  .4938  .4940  .4941  .4943  .4945  .4946  .4948  .4949  .4951  .4952
2.6  .4953  .4955  .4956  .4957  .4959  .4960  .4961  .4962  .4963  .4964
2.7  .4965  .4966  .4967  .4968  .4969  .4970  .4971  .4972  .4973  .4974
2.8  .4974  .4975  .4976  .4977  .4977  .4978  .4979  .4979  .4980  .4981
2.9  .4981  .4982  .4982  .4983  .4984  .4984  .4985  .4985  .4986  .4986
3.0  .4987  .4987  .4987  .4988  .4988  .4989  .4989  .4989  .4990  .4990
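The entries in this table are areas under the standard normal curve between 0 and z (for example, the entry for z = 1.00 is .3413, half of the empirical rule's 68%). A minimal sketch for reproducing an entry with the Python standard library, where `table_entry` is an illustrative helper name:

```python
from statistics import NormalDist

std_normal = NormalDist()      # mean 0, standard deviation 1

def table_entry(z):
    """Area under the standard normal curve between 0 and z, to 4 decimals."""
    return round(std_normal.cdf(z) - 0.5, 4)

# Row 1.9, column .06 of the table corresponds to z = 1.96.
print(table_entry(1.0), table_entry(1.96), table_entry(2.0))
```

To get a lower-tail probability P(Z < z) for positive z, add 0.5 to the table entry; for negative z, use symmetry and subtract the entry for |z| from 0.5.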
22
Sampling Distribution (p.6)
• The sampling distribution of a statistic is the probability distribution for all possible values of the statistic that results when random samples of size n are repeatedly drawn from the population.
• When the sample size is large, what is the sampling distribution of the sample mean / sample proportion / the difference of two sample means / the difference of two sample proportions? → NORMAL !!!
23
Central Limit Theorem (CLT) (p.6)
• If $X \sim N(\mu, \sigma^2)$, then $\bar{X} \sim N(\mu_{\bar{X}} = \mu,\ \sigma^2_{\bar{X}} = \sigma^2/n)$.
Sample: X1, X2, …, Xn → $\bar{X}$
$P(a < \bar{X} < b) = ?$
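A probability of this form can be evaluated once the distribution of the sample mean is known. The sketch below assumes a normal population and uses made-up numbers (mean 100, standard deviation 15, n = 25); the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean_between(a, b, mu, sigma, n):
    """P(a < X-bar < b) when X-bar ~ N(mu, sigma^2 / n)."""
    # The standard deviation of X-bar is sigma / sqrt(n).
    x_bar = NormalDist(mu, sigma / sqrt(n))
    return x_bar.cdf(b) - x_bar.cdf(a)

# Hypothetical inputs: the standard error is 15 / sqrt(25) = 3,
# so the interval (97, 103) is one standard error either side of the mean.
prob = prob_sample_mean_between(97, 103, mu=100, sigma=15, n=25)
print(round(prob, 4))
```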
24
Central Limit Theorem (CLT) (p.6)
• If X follows any distribution with mean $\mu$ and variance $\sigma^2$, then approximately $\bar{X} \sim N(\mu_{\bar{X}} = \mu,\ \sigma^2_{\bar{X}} = \sigma^2/n)$ for large n.
Sample: X1, X2, …, Xn → $\bar{X}$
$P(a < \bar{X} < b) = ?$
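A small simulation can make the theorem concrete. The sketch below draws repeated samples from a skewed (exponential) population and checks that the sample means center on μ with variance close to σ²/n. The population, the sample size of 50, the 5,000 replications, and the seed are all arbitrary choices for illustration.

```python
import random
from statistics import mean, variance

rng = random.Random(0)         # fixed seed so the run is repeatable
n = 50                         # size of each sample
replications = 5_000           # number of samples drawn

# Exponential population with rate 1: mean 1, variance 1, strongly right-skewed.
sample_means = []
for _ in range(replications):
    sample = [rng.expovariate(1.0) for _ in range(n)]
    sample_means.append(sum(sample) / n)

center = mean(sample_means)        # should be near mu = 1
spread = variance(sample_means)    # should be near sigma^2 / n = 1/50 = 0.02

print(f"mean of sample means:     {center:.3f}")
print(f"variance of sample means: {spread:.4f}")
```

A histogram of `sample_means` would look roughly bell-shaped even though the population itself is far from normal.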
25
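Standard Deviations
• Population standard deviation $\sigma_X$ or simply $\sigma$.
• Sample standard deviation $s_X$ or simply $s$.
• Standard deviation of sample means (a.k.a. standard error) $\sigma_{\bar{X}}$
• Standard deviation of sample proportions (a.k.a. standard error) $\sigma_{\hat{p}}$
• Relationships:
  o $\sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}} \approx \frac{s_X}{\sqrt{n}}$ : $\hat{\sigma}_{\bar{X}}$ or $s_{\bar{X}}$
  o $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} \approx \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ : $\hat{\sigma}_{\hat{p}}$ or $s_{\hat{p}}$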
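The two estimated standard errors can be written as small helper functions. This is a sketch with illustrative function names and made-up inputs; it uses the sample quantities s and p̂ in place of the unknown σ and p.

```python
from math import sqrt

def standard_error_of_mean(s, n):
    """Estimated standard deviation of the sample mean: s / sqrt(n)."""
    return s / sqrt(n)

def standard_error_of_proportion(p_hat, n):
    """Estimated standard deviation of the sample proportion."""
    return sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical inputs, chosen so the arithmetic is easy to follow.
se_mean = standard_error_of_mean(s=10, n=25)                 # 10 / 5
se_prop = standard_error_of_proportion(p_hat=0.5, n=100)     # sqrt(0.25 / 100)
print(se_mean, round(se_prop, 4))
```

Both standard errors shrink in proportion to 1/√n, so quadrupling the sample size halves them.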
26
Statistical Inference: Estimation
Population
Research Question:
What is the parameter value?
Sample of size n
Tools (i.e., formulas):
Point Estimator
Interval Estimator
27
Confidence Interval Estimation (p.7)
28
Example
• A random sample of 12 months of a company's operating expenses produced a sample mean of $5474 and a standard deviation of $764. Construct a 95% confidence interval for the company's mean monthly expenses.
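One way to work this example is a t-based interval, since n is small and only the sample standard deviation is available. The sketch assumes monthly expenses are roughly normal and takes the critical value 2.201 (95% confidence, 11 degrees of freedom) from a standard t table rather than computing it.

```python
from math import sqrt

n = 12
x_bar = 5474.0      # sample mean, in dollars
s = 764.0           # sample standard deviation, in dollars

# t critical value for 95% confidence with n - 1 = 11 degrees of freedom,
# copied from a t table (an input, not computed here).
t_crit = 2.201

margin = t_crit * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"margin of error: {margin:.2f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```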
29
Statistical Inference: Hypothesis Testing
Population
Research Question:
Is the claim supported?
Sample of size n
Tools (i.e., formulas):
z or t statistic
30
Hypothesis Testing (p.9)
31
Example
• A bank has set up a customer service goal that the mean waiting time for its customers will be less than 2 minutes. The bank randomly samples 30 customers and finds that the sample mean is 100 seconds. Assuming that the sample is from a normal distribution and the standard deviation is 28 seconds, can the bank safely conclude that the population mean waiting time is less than 2 minutes?
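A possible setup for this example is a lower-tailed test of H0: μ = 120 seconds against Ha: μ < 120 seconds. The sketch below treats the 28 seconds as a known population standard deviation and therefore uses a z statistic; if it is instead a sample standard deviation, a t statistic with 29 degrees of freedom would be the analogous choice.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 120.0        # the goal of 2 minutes, expressed in seconds
x_bar = 100.0       # sample mean waiting time
sigma = 28.0        # assumed here to be the population standard deviation
n = 30

z_stat = (x_bar - mu_0) / (sigma / sqrt(n))
p_value = NormalDist().cdf(z_stat)      # lower tail, because Ha is mu < 120

print(f"z = {z_stat:.2f}, p-value = {p_value:.6f}")
```

A z statistic near −3.9 lies far in the lower tail, so under these assumptions the data support the claim that the mean waiting time is below 2 minutes.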
32
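Margin of Error (B)
(point estimator) ± (multiplier) × (std of point estimator)
B: margin of error, the (multiplier) × (std of point estimator) term
$\bar{X} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
$\bar{X} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$
$\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
• What does B tell us about the point estimator?
• How do we reduce the value of B?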
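As an illustration of the proportion formula, the sketch below computes B for the computer-chip example from earlier in the deck (7.5% defective among 1,000 chips). The 95% confidence level is an assumption; the deck does not fix one for that example.

```python
from math import sqrt
from statistics import NormalDist

p_hat = 0.075       # sample proportion of defective chips
n = 1000
confidence = 0.95   # assumed confidence level

# Multiplier z_{alpha/2}; about 1.96 for 95% confidence.
z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

margin = z_crit * sqrt(p_hat * (1 - p_hat) / n)     # B
interval = (p_hat - margin, p_hat + margin)

print(f"B = {margin:.4f}")
print(f"interval: ({interval[0]:.4f}, {interval[1]:.4f})")
```

B shrinks when n grows or when the confidence level (and hence the multiplier) is lowered.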
33
Relations among B, n, and α
B (margin of error)
n (sample size)
How to reduce B?
Confidence Level (e.g., 90%, 95%)
34
Estimation in Practice
• Determine a confidence level (say, 95%).
• How good do you want the estimate to be? (define margin of error)
• Use formulas (p.8) to find out a sample size that satisfies pre-determined confidence level and margin of error.
Parameter μ
  Sample Size Needed: $n = \left(\frac{z_{\alpha/2}\,\sigma}{B}\right)^2$ or $n = \left(\frac{z_{\alpha/2}\,s}{B}\right)^2$
  1. Replace σ or s with the one from a previous study, or
  2. Estimate it by range/4 or range/6.
Parameter p
  Sample Size Needed: $n = \left(\frac{z_{\alpha/2}\sqrt{p(1-p)}}{B}\right)^2$
  1. use the p from a similar study or previous experiment.
  2. be conservative. Use p = 0.5.
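The sample-size formula for a mean can be sketched as a small function. The helper name and the planning inputs below (a range of 3,000 and a target margin of 200) are hypothetical. Rounding up to the next whole observation is a common convention, not something the slide specifies.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, confidence=0.95):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= margin."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z_crit * sigma / margin) ** 2)

# Hypothetical planning inputs: no previous study, so sigma is guessed as range / 4.
sigma_guess = 3_000 / 4
n_from_range = sample_size_for_mean(sigma_guess, margin=200)

# Alternatively, plug in s from a previous study (here the s = 764 of the
# monthly-expenses example, reused purely for illustration).
n_from_previous = sample_size_for_mean(764, margin=200)

print(n_from_range, n_from_previous)
```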


35
Accuracy Gained by Increasing the Sample Size (p.8)
A 95% Confidence Interval for p: $n = \left(\frac{1.96\sqrt{\frac{1}{2}\cdot\frac{1}{2}}}{B}\right)^2$
Margin of Error (B)   Sample Size (n)
7%                    196
6%                    266
5%                    384
4%                    600
3%                    1067
2%                    2401
1%                    9604
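The sample sizes in this table can be recomputed from the formula with p = 0.5. The sketch below prints the unrounded value next to the value with its fractional part dropped, which is what the table's entries appear to be. Rounding up instead, as is often done to guarantee the margin, would give 267, 385, 601 and 1068 for the 6%, 5%, 4% and 3% rows.

```python
from math import floor

Z_95 = 1.96   # multiplier for 95% confidence, as used in the formula

def conservative_n(margin):
    """Unrounded n = (1.96 * sqrt(0.5 * 0.5) / B)^2, the worst case p = 0.5."""
    return (Z_95 * 0.5 / margin) ** 2

margins_pct = (7, 6, 5, 4, 3, 2, 1)
unrounded = [conservative_n(pct / 100) for pct in margins_pct]

# Drop the fractional part; the tiny offset guards against floating-point noise.
truncated = [floor(n + 1e-9) for n in unrounded]

for pct, n, n_int in zip(margins_pct, unrounded, truncated):
    print(f"B = {pct}%: n = {n:.2f} -> {n_int}")
```

Halving the margin of error roughly quadruples the required sample size, which is why going from 2% to 1% is so costly.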
36