Intro Basic Statistics Part I

Introduction to Business Analytics
Dr. Barry A. Wray
Measures of Central Tendency
• Average/Arithmetic mean
• Median
• Mode
Measures of Central Tendency
Arithmetic mean: the "average" value for a variable.
The population mean is denoted by the Greek small letter mu, $\mu$:
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$$
• $x_1$ = value of variable x for the first observation
• $x_2$ = value of variable x for the second observation
• $x_N$ = value of variable x for the Nth observation
where N = the population size
Measures of Central Tendency
The sample mean is denoted by $\bar{x}$:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
where n = sample size
Measures of Central Tendency
Sample median (middle value)
$$X_{\frac{n+1}{2}} = \text{location of the median}$$
where n = sample size
*First you MUST order the data from smallest to largest!
Measures of Central Tendency
Audit   Days
1       12
2       13
3       14
4       14
5       15
6       15
7       16
8       17
9       18
10      18
11      18
12      19
13      20
14      21
15      22
16      22
17      23
18      27
19      28
20      33
To locate the mean we need to find:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{12 + 13 + \cdots + 33}{20} = \frac{385}{20} = 19.25 \text{ days}$$
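The arithmetic on this slide can be checked with a few lines of Python. This is only a minimal sketch and not part of the original slides; the names `days`, `total`, and `x_bar` are assumptions chosen for illustration, and the values are the 20 audit times from the table.

```python
from statistics import mean

# Audit durations in days, copied from the slide's table.
days = [12, 13, 14, 14, 15, 15, 16, 17, 18, 18,
        18, 19, 20, 21, 22, 22, 23, 27, 28, 33]

total = sum(days)      # numerator: x_1 + x_2 + ... + x_n
n = len(days)          # sample size
x_bar = total / n      # sample mean

print(total, n, x_bar)         # 385 20 19.25
print(mean(days) == x_bar)     # True: the standard library agrees
```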
Measures of Central Tendency
To locate the median we need to find:
$$X_{\frac{n+1}{2}} = X_{\frac{20+1}{2}} = X_{\frac{21}{2}} = X_{10.5}$$
When you get a location like 10.5, add the 10th and 11th observations and divide by 2 to get the median:
$$\frac{18 + 18}{2} = 18 \text{ days}$$
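The location rule can be sketched in Python as follows. The function name `median_by_location` is hypothetical and only illustrates the (n + 1)/2 rule from the slide; the standard library's `statistics.median` is used as a cross-check.

```python
from statistics import median

def median_by_location(values):
    """Median via the (n + 1) / 2 location rule on ordered data."""
    ordered = sorted(values)          # the data MUST be ordered first
    n = len(ordered)
    location = (n + 1) / 2            # 1-based position of the median
    if location.is_integer():
        return ordered[int(location) - 1]
    # Fractional location such as 10.5: average the two neighbours.
    lower = ordered[int(location) - 1]    # e.g. the 10th observation
    upper = ordered[int(location)]        # e.g. the 11th observation
    return (lower + upper) / 2

days = [12, 13, 14, 14, 15, 15, 16, 17, 18, 18,
        18, 19, 20, 21, 22, 22, 23, 27, 28, 33]

print(median_by_location(days))   # 18.0
print(median(days))               # 18.0
```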
Measures of Central Tendency
To find the mode we need to find the value that occurs most frequently:
First you MUST order the data!
Mode = 18 days
• Note: if there were one more 14, we would have two modes: 14 and 18
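Counting frequencies can also be done in Python; a minimal sketch using the standard library is shown here. Ordering the data helps when counting by hand, but `Counter` and `multimode` do not require it. The variable names are illustrative assumptions.

```python
from collections import Counter
from statistics import multimode

days = [12, 13, 14, 14, 15, 15, 16, 17, 18, 18,
        18, 19, 20, 21, 22, 22, 23, 27, 28, 33]

# Frequency of each value; the mode is the value with the highest count.
counts = Counter(days)
most_common_value, frequency = counts.most_common(1)[0]
print(most_common_value, frequency)   # 18 3

# multimode returns every value tied for the highest frequency.
print(multimode(days))                # [18]
print(multimode(days + [14]))         # [14, 18]  (one more 14 gives two modes)
```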
Measures of Dispersion
The standard deviation is a measure of the typical difference (distance) between each value for a variable and the mean of all values. Note the difference in the formulas for the population standard deviation and the sample standard deviation.
Measures of Dispersion
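Population Std. Dev.
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$
where $\mu$ is the population mean and N is the population size
Sample Std. Dev.
$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$$
where $\bar{X}$ is the sample mean and n is the sample size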
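The only difference between the two formulas is the divisor (N versus n − 1). A minimal Python sketch makes that visible; the function names and the small data set are assumptions made up for illustration and do not come from the slides.

```python
from math import sqrt
from statistics import pstdev, stdev

def population_std(values):
    """Population standard deviation: divide by N."""
    mu = sum(values) / len(values)
    return sqrt(sum((x - mu) ** 2 for x in values) / len(values))

def sample_std(values):
    """Sample standard deviation: divide by n - 1."""
    x_bar = sum(values) / len(values)
    return sqrt(sum((x - x_bar) ** 2 for x in values) / (len(values) - 1))

# Hypothetical data, chosen so the population result is a round number.
values = [2, 4, 4, 4, 5, 5, 7, 9]

print(population_std(values))          # 2.0
print(round(sample_std(values), 4))    # 2.1381
```

The standard library functions `statistics.pstdev` and `statistics.stdev` implement the same two formulas and can be used to cross-check the hand-written versions.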
Measures of Dispersion
Because this is sample data we use:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}} = \sqrt{\frac{(12-19.25)^2 + (13-19.25)^2 + \cdots + (33-19.25)^2}{20-1}} = \sqrt{\frac{561.75}{19}} = \sqrt{29.5658} = 5.4374 \text{ days}$$
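The intermediate values in this calculation can be reproduced step by step. The following is a minimal sketch with illustrative variable names, using the 20 audit times from the slides.

```python
from math import sqrt

days = [12, 13, 14, 14, 15, 15, 16, 17, 18, 18,
        18, 19, 20, 21, 22, 22, 23, 27, 28, 33]

n = len(days)
x_bar = sum(days) / n                                  # 19.25

# Sum of squared deviations from the sample mean.
sum_sq_dev = sum((x - x_bar) ** 2 for x in days)       # 561.75

sample_variance = sum_sq_dev / (n - 1)                 # divide by n - 1 = 19
s = sqrt(sample_variance)

print(sum_sq_dev, round(sample_variance, 4), round(s, 4))
# 561.75 29.5658 5.4374
```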
Parameters and statistics
Because this is sample data, we use the sample statistics as estimates:
So what is your best guess for $\mu$?
$\bar{x}$ = 19.25 days
So, what is your best guess for $\sigma$?
s = 5.4374 days
Practice
• For the following 3 sample data sets compute the mean, median, and
mode. Also compute the standard deviation for each.
Practice
• Data Set #1:
Observation   Data
1             40
2             43
3             48
4             50
5             52
6             53
7             55
8             59
Practice
• Data Set #2:
Observation   Data
1             40
2             43
3             48
4             50
5             52
6             53
7             55
8             159
Practice
• Data Set #3:
Observation   Data
1             18
2             22
3             25
4             48
5             54
6             65
7             75
8             93
Measures of Association Between Two Variables
• Scatter Charts: Useful graph for analyzing the relationship between two variables.
• Covariance: Descriptive measure of the linear association between two variables.
  • Sample covariance for a sample of size n with the observations $(x_1, y_1)$, $(x_2, y_2)$, and so on:
    $$s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1}$$
  • Population covariance:
    $$\sigma_{xy} = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$$
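The two covariance formulas translate directly into code. The sketch below uses hypothetical function names and a small made-up data set; it is not the bottled water data, which is not reproduced in this transcript.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def population_covariance(xs, ys):
    """Population covariance: sum of cross-deviations divided by N."""
    big_n = len(xs)
    mu_x = sum(xs) / big_n
    mu_y = sum(ys) / big_n
    return sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / big_n

# Hypothetical paired observations (x_i, y_i), for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

print(sample_covariance(xs, ys))       # 1.5
print(population_covariance(xs, ys))   # 1.2
```

A positive result indicates that x and y tend to move in the same direction; the size of the number depends on the units of x and y.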
Measures of Association Between Two Variables
• Correlation coefficient: Measures the strength of the linear relationship between two variables.
  • Not affected by the units of measurement for x and y.
  • Sample correlation coefficient denoted by $r_{xy}$:
    $$r_{xy} = \frac{s_{xy}}{s_x s_y}$$
  • $s_{xy}$ = sample covariance = $\frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n-1}$
  • $s_x$ = sample standard deviation of x = $\sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}$
  • $s_y$ = sample standard deviation of y = $\sqrt{\frac{\sum (y_i - \bar{y})^2}{n-1}}$
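A short Python sketch shows how the three pieces combine, and that rescaling x (a change of units) leaves the correlation unchanged. The function names and the data are assumptions made up for illustration; they are not taken from the slides.

```python
from math import sqrt

def sample_std(values):
    """Sample standard deviation (divisor n - 1)."""
    mean = sum(values) / len(values)
    return sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))

def sample_covariance(xs, ys):
    """Sample covariance (divisor n - 1)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)

def sample_correlation(xs, ys):
    """r_xy = s_xy / (s_x * s_y)."""
    return sample_covariance(xs, ys) / (sample_std(xs) * sample_std(ys))

# Hypothetical paired observations, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = sample_correlation(xs, ys)

# Changing the units of x (for example metres to centimetres) scales the
# covariance and s_x by the same factor, so r does not change.
xs_rescaled = [100 * x for x in xs]
r_rescaled = sample_correlation(xs_rescaled, ys)

print(round(r, 4), round(r_rescaled, 4))   # 0.7746 0.7746
```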
Interpretation of Correlation Coefficient
• –1 ≤ r ≤ +1
r value    Relationship between the x and y variables
< 0        Negative linear
Near 0     No linear relationship
> 0        Positive linear
Data for Bottled Water Sales at Queensland Amusement Park for a Sample of 14 Summer Days
Chart Showing the Positive Linear Relation Between Sales and High Temperatures
Sample Covariance Calculations for Daily High Temperature and Bottled Water Sales at Queensland Amusement Park
Scatter Diagrams and Associated Covariance Values for Different Variable Relationships
(a) $s_{xy}$ positive: x and y are positively linearly related
(b) $s_{xy}$ approximately 0: x and y are not linearly related
(c) $s_{xy}$ negative: x and y are negatively linearly related
Computation of Correlation Coefficient
• Illustration: to determine the sample correlation coefficient for bottled water sales at Queensland Amusement Park:
$$r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{12.8}{(4.36)(3.15)} = 0.93$$
• There is a very strong linear relationship between high temperature and sales.
$R^2$ is the coefficient of determination.
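The arithmetic can be checked directly from the three summary values quoted on the slide. The sketch also squares the result, on the assumption that the coefficient of determination refers to a simple linear regression with one predictor, where it equals the squared correlation coefficient. Variable names are illustrative.

```python
# Summary values quoted on the slide.
s_xy = 12.8    # sample covariance
s_x = 4.36     # sample standard deviation of x
s_y = 3.15     # sample standard deviation of y

r_xy = s_xy / (s_x * s_y)

# Assumption: one-predictor linear regression, so R^2 = r^2.
r_squared = r_xy ** 2

print(round(r_xy, 2), round(r_squared, 2))   # 0.93 0.87
```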
Example of Nonlinear Relationship Producing a Correlation Coefficient Near Zero
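The slide's figure is not reproduced in this transcript, so the following is a hypothetical illustration of the same idea rather than the original example. Here y is completely determined by x (y = x²), yet the relationship is not linear and the sample correlation comes out as zero.

```python
from math import sqrt

def sample_correlation(xs, ys):
    """r_xy = s_xy / (s_x * s_y), all with divisor n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return s_xy / (s_x * s_y)

# Hypothetical data: x is symmetric around 0 and y = x squared.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = sample_correlation(xs, ys)
print(abs(r) < 1e-9)   # True: no linear relationship is detected
```

A correlation near zero therefore rules out a linear pattern only; a scatter chart is still needed to spot curved relationships like this one.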
Calculating Covariance and Correlation Coefficient for Bottled Water Sales Using Excel