```
Statistics for Business Lecture Notes (01.12.2016)
Dr. Cansu Unver Erbas |[email protected]
Measure of variability (measure of dispersion)
Range
The simplest measure of dispersion is the range. It is the difference between the maximum
and the minimum values in a data set. In the form of an equation:
Range = Maximum value - Minimum value
Example 1: What is the range for the given data?
1000 1050 3000 2500 1780 2210 2540 1980 3650 4970 5000 8500 7010
Solution 1: Range = Maximum value - Minimum value
= 8500 - 1000 = 7500
Interquartile range
A measure of variability that overcomes the dependency on extreme values is the
interquartile range (IQR). This measure of variability is simply the difference between the
third quartile, Q3, and the first quartile, Q1. In other words, the interquartile range is the range
for the middle 50 per cent of the data. See Figure 1 for an illustration.
Figure 1 Interquartile range (IQR)
Variance
The variance is a measure of variability that uses all the data. The variance is based on the
difference between each data value and the mean. This difference is called a deviation
about the mean. For a sample, a deviation about the mean is written $(x_i - \bar{x})$. For a
population, it is written $(x_i - \mu)$.
In the computation of the variance, the deviations about the mean are squared.
Population variance = $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
In most statistical applications, the data being analysed are for a sample. When we compute
a sample variance, we are often interested in using it to estimate the population variance $\sigma^2$.
Although the detailed explanation is beyond the scope of this lecture, it can be shown that if
the sum of the squared deviations about the sample mean is divided by $n - 1$, and not $n$, the
resulting sample variance provides an unbiased estimate of the population variance. For this
reason, the sample variance, denoted by $s^2$, is defined as follows.
Sample variance = $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
Example 1 - Consider the data on class size for the sample of five university classes given
below:
Class    Number of students
1        46
2        48
3        47
4        50
5        44
Compute the mean class size $\bar{x}$, the deviations about the mean $(x_i - \bar{x})$, the squared
deviations about the mean $(x_i - \bar{x})^2$, and the sample variance.
Solution 1:
Number of       Mean class       Deviation about the       Squared deviation about
students x_i    size \bar{x}     mean (x_i - \bar{x})      the mean (x_i - \bar{x})^2
46              47               -1                        1
48              47                1                        1
47              47                0                        0
50              47                3                        9
44              47               -3                        9
Total                             0                        20
Sample variance = $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{20}{5 - 1} = \frac{20}{4} = 5$
Practice 1: Consider the starting salaries in the table below for the 10 business school graduates.
Monthly salary    Sample mean monthly    Deviation about the       Squared deviation about
x_i               salary \bar{x}         mean (x_i - \bar{x})      the mean (x_i - \bar{x})^2
1500              1765                   -265                      70,225
1650              1765                   -115                      13,225
1800              1765                     35                       1,225
1950              1765                    185                      34,225
2000              1765                    235                      55,225
1750              1765                    -15                         225
1900              1765                    135                      18,225
1560              1765                   -205                      42,025
1940              1765                    175                      30,625
1600              1765                   -165                      27,225
Total                                       0                     292,450
Thus, the sample variance is:
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{292450}{10 - 1} \approx 32494.44$
Standard Deviation
The standard deviation is defined to be the positive square root of the variance. Following the
notation we adopted for a sample variance and a population variance, we use $s$ to denote the
sample standard deviation and $\sigma$ to denote the population standard deviation. The standard
deviation is derived from the variance in the following way:
Sample standard deviation = $s = \sqrt{s^2}$
and
Population standard deviation = $\sigma = \sqrt{\sigma^2}$
What is gained by converting the variance to its corresponding standard deviation? Recall
that the units associated with the variance are squared. For example, the sample variance for the
monthly salary data is $s^2 = 292450 / 9 \approx 32494.44$ (£²). Because the standard deviation is the
square root of the variance, the units are converted to pounds in the standard deviation. Hence the
standard deviation of the starting salary data is about £180.26. In other words, the standard deviation is
measured in the same units as the original data. For this reason the standard deviation is
more easily compared to the mean and other statistics that are measured in the same units
as the original data.
Practice 2: Find the standard deviation for class size and starting monthly salary from the
previous examples.
Coefficient of Variation
In some situations we may be interested in a descriptive statistic that indicates how large the
standard deviation is relative to the mean. This measure is called the coefficient of variation
and is usually expressed as a percentage.
Coefficient of variation = $\left(\frac{\text{Standard deviation}}{\text{Mean}} \times 100\right)\%$
For instance, for a sample with a standard deviation of 4 and a sample mean of 50, the
coefficient of variation is $(4 / 50 \times 100)\% = 8\%$. This means that the sample standard deviation is
8 per cent of the value of the sample mean.
As another example, suppose the standard deviation and the sample mean are 2 and 46 respectively for a
sample. The coefficient of variation is $(2 / 46 \times 100)\% = 4.35\%$, which tells us that the
sample standard deviation is only 4.35 per cent of the value of the sample mean.
In general, the coefficient of variation is a useful statistic for comparing the variability of
variables that have different standard deviations and different means.
Summary
Statistics is the art and science of collecting, analysing, presenting and interpreting data.
Data consists of the facts and figures that are collected and analysed.
A set of measurements obtained for a particular element is an observation.
For purposes of statistical analysis, data can be classified as qualitative or quantitative.
Qualitative data use labels or names to identify an attribute of each element. Quantitative
data are numeric values that indicate how much or how many. The following figure summarises
the tabular and graphical methods used for each type of data.
Figure: Tabular and graphical methods for summarising data
Data
  Qualitative data
    Tabular methods
      - Frequency distribution
      - Relative frequency distribution
      - Percentage frequency distribution
      - Cross-tabulation
    Graphical methods
      - Bar chart
      - Pie chart
  Quantitative data
    Tabular methods
      - Frequency distribution
      - Relative frequency distribution
      - Percentage frequency distribution
      - Cumulative frequency distribution
      - Cumulative relative frequency distribution
      - Cumulative percentage frequency distribution
      - Cross-tabulation
    Graphical methods
      - Dot plot
      - Histogram
      - Ogive
      - Scatter diagram
Practice 2: Please work through questions 9 to 16 from the Exercises on pages 81-82.
```
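
The measures defined in these notes (range, sample variance, sample standard deviation and coefficient of variation) can be checked numerically. The following is a minimal Python sketch, not part of the original lecture: the function name `summarize` and the variable names `class_sizes` and `salaries` are illustrative assumptions, and the data are the class-size and starting-salary samples from the examples. It divides by n - 1, following the sample variance definition given in the notes.

```python
import math
import statistics


def summarize(data):
    """Return basic measures of variability for a sample (hypothetical helper)."""
    n = len(data)
    mean = sum(data) / n
    # Squared deviations about the sample mean: (x_i - mean)^2
    squared_devs = [(x - mean) ** 2 for x in data]
    # Sample variance divides by n - 1, not n
    variance = sum(squared_devs) / (n - 1)
    std_dev = math.sqrt(variance)
    return {
        "range": max(data) - min(data),
        "mean": mean,
        "variance": variance,
        "std_dev": std_dev,
        # Coefficient of variation, expressed as a percentage of the mean
        "cv_percent": std_dev / mean * 100,
    }


# Data taken from the class-size example and the starting-salary practice
class_sizes = [46, 48, 47, 50, 44]
salaries = [1500, 1650, 1800, 1950, 2000, 1750, 1900, 1560, 1940, 1600]

class_summary = summarize(class_sizes)
salary_summary = summarize(salaries)

# Cross-check against the standard library, which also divides by n - 1
assert math.isclose(salary_summary["variance"], statistics.variance(salaries))

print(class_summary)
print(salary_summary)
```

Computing the deviations by hand inside `summarize` mirrors the columns of the worked tables; in practice `statistics.variance` and `statistics.stdev` from the Python standard library give the same sample results directly.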