Download Statistics - Ipemgzb.ac.in

Statistics
• The systematic and scientific treatment of
quantitative measurement is precisely known
as statistics.
• Statistics may be called the science of counting.
• Statistics is concerned with the collection,
classification (or organization), presentation,
analysis and interpretation of data which are
measurable in numerical terms.
Stages of Statistical Investigation
Collection of Data
Organization of data
Presentation of data
Analysis
Interpretation of Results
Statistics
• It is divided into two major parts: Descriptive
and Inferential Statistics.
• Descriptive statistics is a set of methods to describe the data that we have collected, i.e. summarization of data.
• Inferential statistics is a set of methods used to make a generalization, estimate, prediction or decision when we want to draw conclusions about a distribution.
Statistics functions & Uses
• It simplifies complex data
• It provides techniques for comparison
• It studies relationships
• It helps in formulating policies
• It helps in forecasting
• It is helpful for the common man
• Statistical methods combined with the speed of computers can work wonders: SPSS, STATA, MATLAB, MINITAB, etc.
Scope of Statistics
• In Business Decision Making
• In Medical Sciences
• In Actuarial Science
• In Economic Planning
• In Agricultural Sciences
• In Banking & Insurance
• In Politics & Social Science
Distrust & Misuse of Statistics
• Statistics is like clay, of which one can make a god or a devil.
• Statistics are liars of the first order.
• Statistics can prove or disprove anything.
Measure of Central Tendency
It is a single value that represents the entire mass of data. Generally, such values lie in the central part of the distribution.
It facilitates comparison and decision-making.
There are mainly three types of measures:
1. Arithmetic mean
2. Median
3. Mode
Arithmetic Mean
This single representative value can be determined by:
A.M. = Sum of observations / No. of observations
Properties:
1. The sum of the deviations from the AM is always zero.
2. If every value of the variable is increased or decreased by a constant, then the AM also increases or decreases by the same constant.
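Arithmetic Mean (contd.)
3. If every value of the variable is multiplied or divided by a constant, then the new AM also changes in the same ratio.
4. The sum of squares of deviations from the AM is minimum.
5. The combined AM of two or more related groups is defined as
$$\bar{x}_{12} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$$
where n_1, n_2 are the group sizes and \bar{x}_1, \bar{x}_2 the group means.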
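The properties of the arithmetic mean listed above can be checked numerically. The sketch below uses made-up values (they do not come from the slides) and assumes the usual size-weighted formula for the combined mean of two groups.

```python
from statistics import fmean

# Illustrative values only; they are not taken from the slides.
x = [4.0, 8.0, 15.0, 16.0, 23.0]
mean_x = fmean(x)

# Property 1: deviations from the mean sum to zero.
sum_dev = sum(v - mean_x for v in x)

# Property 2: adding a constant c to every value shifts the mean by c.
c = 10.0
mean_shifted = fmean([v + c for v in x])

# Property 3: multiplying every value by k multiplies the mean by k.
k = 3.0
mean_scaled = fmean([v * k for v in x])

# Property 4: the sum of squared deviations is smallest about the mean.
a = 12.0  # any other reference value
ss_about_mean = sum((v - mean_x) ** 2 for v in x)
ss_about_other = sum((v - a) ** 2 for v in x)

# Property 5 (assumed standard formula): combined mean of two groups,
# weighted by the group sizes.
y = [10.0, 20.0]
combined = (len(x) * mean_x + len(y) * fmean(y)) / (len(x) + len(y))

print(sum_dev, mean_shifted, mean_scaled, ss_about_mean, ss_about_other, combined)
```

The combined value agrees with the mean of the pooled data, which is what the size-weighted formula is meant to reproduce.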
Median
Mode
• Mode is the value which occurs most often in the series.
• It is the value around which the items tend to be heavily concentrated.
• It is an important average when we talk about the “most common size of shoe or shirt”.
Relationship among Mean, Median & Mode
• For a symmetric distribution:
Mode = Median = Mean
• The empirical relationship between mean, median and mode for a moderately asymmetric distribution is:
Mode = 3 Median – 2 Mean
Advantages and disadvantages
Mean
• Advantage: More sensitive than the median, because it makes use of all the values of the data.
• Disadvantage: It can be misrepresentative if there is an extreme value.
Median
• Advantage: It is not affected by extreme scores, so can give a representative value.
• Disadvantage: It is less sensitive than the mean, as it does not take into account all of the values.
Mode
• Advantage: It is useful when the data are in categories, such as the number of babies who are securely attached.
• Disadvantage: It is not a useful way of describing data when there are several modes.
Same center, different variation
• The range ignores the way in which data are distributed
[Figure: two data sets plotted on a scale from 7 to 12; both have Range = 12 - 7 = 5]
• When the arithmetic mean is a fractional value (not an integer), we compute the variance using the formula:
$$\sigma^2 = \frac{1}{n}\sum X^2 - \left(\frac{\sum X}{n}\right)^2$$
Calculate S.D.:
x: 10, 11, 17, 25, 7, 13, 21, 10, 12, 14
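As a check on the exercise above, here is a minimal sketch that computes the variance both from the definition and from the computational formula (mean of squares minus square of the mean), dividing by n as the slides do. The variable names are illustrative.

```python
from math import sqrt

x = [10, 11, 17, 25, 7, 13, 21, 10, 12, 14]
n = len(x)

mean = sum(x) / n

# Definition: average squared deviation from the mean (divisor n).
var_definition = sum((v - mean) ** 2 for v in x) / n

# Computational form: mean of the squares minus the square of the mean.
var_shortcut = sum(v * v for v in x) / n - (sum(x) / n) ** 2

sd = sqrt(var_shortcut)
print(mean, var_definition, var_shortcut, sd)
```

Both forms give a variance of 27.4 (up to floating-point rounding), so the S.D. is about 5.23.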
Formulae for Frequency distribution
• By Definition:
$$\sigma^2 = \frac{1}{\sum f}\sum f\,(x - \bar{x})^2$$
• For Computation:
$$\sigma^2 = \frac{\sum f x^2}{\sum f} - \left(\frac{\sum f x}{\sum f}\right)^2$$
Example
• An analysis of production rejects resulted in the following figures. Calculate the mean and variance of the number of rejects per operator.
No. of rejects per operator    No. of operators
21-25                          5
26-30                          15
31-35                          23
36-40                          42
41-45                          12
46-50                          3
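One way to work this example is to represent each class by its midpoint and apply the computational formula for a frequency distribution. The function name `grouped_mean_variance` is hypothetical, and using class midpoints is the usual textbook assumption rather than something the slides spell out.

```python
def grouped_mean_variance(classes, freqs):
    """Mean and variance (divisor = total frequency) of grouped data.

    classes: list of (lower, upper) class limits
    freqs:   list of frequencies, one per class
    Each class is represented by its midpoint (an assumption).
    """
    mids = [(lo + hi) / 2 for lo, hi in classes]
    total = sum(freqs)
    sum_fx = sum(f * m for f, m in zip(freqs, mids))
    sum_fx2 = sum(f * m * m for f, m in zip(freqs, mids))
    mean = sum_fx / total
    variance = sum_fx2 / total - mean ** 2
    return mean, variance


classes = [(21, 25), (26, 30), (31, 35), (36, 40), (41, 45), (46, 50)]
freqs = [5, 15, 23, 42, 12, 3]

mean, variance = grouped_mean_variance(classes, freqs)
print(mean, variance)
```

With these figures the midpoints are 23, 28, ..., 48, giving a mean of 35.5 rejects per operator and a variance of 31.75.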
Example
• Calculate variance from the following data. (Sale is given in thousand Rs.)
Sale     No. of days
10-20    3
20-30    6
30-40    11
40-50    3
50-60    2
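A worked sketch of this example, assuming each class is represented by its midpoint (15, 25, 35, 45, 55) and using the computational formula for a frequency distribution:

```latex
\begin{align*}
\sum f &= 3 + 6 + 11 + 3 + 2 = 25 \\
\sum f x &= 3(15) + 6(25) + 11(35) + 3(45) + 2(55) = 825 \\
\sum f x^2 &= 3(225) + 6(625) + 11(1225) + 3(2025) + 2(3025) = 30025 \\
\bar{x} &= \frac{825}{25} = 33 \\
\sigma^2 &= \frac{30025}{25} - 33^2 = 1201 - 1089 = 112
\end{align*}
```

Since sales are in thousand Rs., the variance is in squared thousand Rs., and the standard deviation is about 10.58 thousand Rs.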
An analysis of production rejects resulted in the following observations. Calculate the mean and standard deviation.
No. of rejects/operator    No. of operators
20-25                      5
25-30                      15
30-35                      28
35-40                      42
40-45                      15
45-50                      12
50-55                      3
Coefficient of Variation
• Measures relative variation
• Always in percentage (%)
• Shows variation relative to mean
• Is used to compare two or more sets of data measured in different units
Comparing Coefficient of Variation
• Stock A:
– Average price last year = $50
– Standard deviation = $5
• Stock B:
– Average price last year = $100
– Standard deviation = $5
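Coefficient of variation:
Stock A:
$$CV_A = \left(\frac{S}{\bar{X}}\right) \cdot 100\% = \left(\frac{\$5}{\$50}\right) \cdot 100\% = 10\%$$
Stock B:
$$CV_B = \left(\frac{S}{\bar{X}}\right) \cdot 100\% = \left(\frac{\$5}{\$100}\right) \cdot 100\% = 5\%$$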
An investment ‘A’ has an expected return of Rs. 1,000 and a standard deviation of Rs. 300. Another investment ‘B’ has a standard deviation of returns of Rs. 400, but its expected return is Rs. 4,000. Determine which investment is riskier.
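A short worked sketch, taking the coefficient of variation (standard deviation relative to expected return) as the measure of risk, which is an assumption consistent with the preceding stock comparison:

```latex
\begin{align*}
CV_A &= \frac{300}{1000} \times 100\% = 30\% \\
CV_B &= \frac{400}{4000} \times 100\% = 10\%
\end{align*}
```

Although B has the larger standard deviation in absolute terms, A varies more relative to its expected return, so on this measure A is the riskier investment.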
Example
• 2. A quality control laboratory received samples of electric bulbs from two companies for testing their lives. The results were as follows:
Length of life (in hrs.)    Company A    Company B
15-20                       16           18
20-25                       26           22
25-30                       8            8
(a) Which company’s bulbs have the greater length of life?
(b) Which company’s bulbs are more uniform with respect to their lives?
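The two questions can be approached by comparing the means (length of life) and the coefficients of variation (uniformity). The sketch below assumes class midpoints of 17.5, 22.5 and 27.5 hours; the helper name `grouped_stats` is hypothetical.

```python
from math import sqrt


def grouped_stats(midpoints, freqs):
    """Return (mean, standard deviation, CV in %) for grouped data."""
    total = sum(freqs)
    mean = sum(f * m for f, m in zip(freqs, midpoints)) / total
    var = sum(f * (m - mean) ** 2 for f, m in zip(freqs, midpoints)) / total
    sd = sqrt(var)
    return mean, sd, 100 * sd / mean


# Midpoints of the classes 15-20, 20-25, 25-30 (an assumption).
midpoints = [17.5, 22.5, 27.5]

mean_a, sd_a, cv_a = grouped_stats(midpoints, [16, 26, 8])
mean_b, sd_b, cv_b = grouped_stats(midpoints, [18, 22, 8])

print(mean_a, sd_a, cv_a)
print(mean_b, sd_b, cv_b)
```

Under these assumptions Company A has the higher mean life (21.7 hrs against about 21.46 hrs) and the lower coefficient of variation (about 15.5% against about 16.4%), so its bulbs last longer on average and are more uniform.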
The share prices of a company in the Mumbai and Kolkata markets during the last 10 months are recorded below:
Month    Mumbai    Kolkata
Jan      105       108
Feb      120       117
March    115       120
April    118       130
May      130       100
June     127       125
July     109       125
Aug      110       120
Sep      104       110
Oct      112       135
Determine the mean and standard deviation of the share prices. In which market are the share prices more stable?
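A minimal sketch of this exercise using Python's standard `statistics` module. It treats the ten months as the whole population (divisor n, via `pstdev`) and uses the coefficient of variation to judge stability; the helper name `summary` is illustrative.

```python
from statistics import fmean, pstdev

mumbai = [105, 120, 115, 118, 130, 127, 109, 110, 104, 112]
kolkata = [108, 117, 120, 130, 100, 125, 125, 120, 110, 135]


def summary(prices):
    """Return (mean, population S.D., CV in %) of a list of prices."""
    mean = fmean(prices)
    sd = pstdev(prices)
    return mean, sd, 100 * sd / mean


mean_m, sd_m, cv_m = summary(mumbai)
mean_k, sd_k, cv_k = summary(kolkata)

print(mean_m, sd_m, cv_m)
print(mean_k, sd_k, cv_k)
```

Mumbai has mean 115 and S.D. about 8.33 (CV about 7.2%), while Kolkata has mean 119 and S.D. about 10.09 (CV about 8.5%), so prices are more stable in the Mumbai market.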
Shape of a Distribution
• Describes how data is distributed
• Measures of shape
– Symmetric or skewed
• Left-skewed: Mean < Median < Mode
• Symmetric: Mean = Median = Mode
• Right-skewed: Mode < Median < Mean
Skewness
• For a positively skewed distribution:
Mean > Median > Mode
• For a negatively skewed distribution:
Mean < Median < Mode
Measure of Skewness
• Karl Pearson coefficient of skewness:
$$S_k = \frac{\text{Mean} - \text{Mode}}{\text{S.D.}}$$
where $-3 \le S_k \le 3$
Calculate the Karl Pearson coefficient of skewness for the given data and comment on the result.
7, 9, 15, 16, 17, 22, 25, 27, 33, 39
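Every value in this data set occurs exactly once, so there is no unique mode. One common workaround, assumed here, is to estimate the mode from the empirical relation Mode = 3 Median – 2 Mean, which turns the coefficient into 3(Mean – Median)/S.D. The sketch uses the population standard deviation (divisor n).

```python
from statistics import fmean, median, pstdev

data = [7, 9, 15, 16, 17, 22, 25, 27, 33, 39]

mean = fmean(data)
med = median(data)
sd = pstdev(data)  # population S.D. (divisor n)

# No value repeats, so estimate the mode from the empirical relation
# Mode = 3 Median - 2 Mean (an assumption, valid for moderate skewness).
mode_est = 3 * med - 2 * mean

# Karl Pearson coefficient: (Mean - Mode) / S.D.
sk = (mean - mode_est) / sd

print(mean, med, mode_est, sd, sk)
```

Here the mean is 21, the median 19.5 and the estimated mode 16.5, giving a coefficient of about 0.46: the data are moderately positively skewed.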
Advantages and disadvantages
Range
• Advantage: Quick and easy to calculate
• Disadvantages: Affected by extreme values (outliers); does not take into account all the values
Standard deviation
• Advantage: More precise measure of dispersion because all values are taken into account
• Disadvantage: Much harder to calculate than the range