Download 1. - NMIMS

Business Statistics
Course Index
S. No. | Reference No. | Particulars | Slide From – To
1. | Chapter 1 | Introduction to Business Statistics | 08 – 21
2. | Chapter 2 | Descriptive Statistics: Collection, Processing and Presentation of Data | 22 – 36
3. | Chapter 3 | Measures of Central Tendency | 37 – 51
4. | Chapter 4 | Measures of Dispersion | 52 – 66
5. | Chapter 5 | Skewness and Kurtosis | 67 – 79
6. | Chapter 6 | Correlation Analysis | 80 – 98
7. | Chapter 7 | Regression Analysis | 99 – 114
8. | Chapter 8 | Theory of Probability | 115 – 134
9. | Chapter 9 | Probability Distribution | 135 – 153
10. | Chapter 10 | Use of Excel Software for Statistical Analysis | 154 – 194
1– 2
Course Introduction

Managerial decision-making can be made efficient and effective by analyzing available data using appropriate statistical tools. Statistical tools not only have application in research (marketing research included) but also in other functional areas like quality management, inventory management, financial analysis, human resource planning and so on.
1– 3
Cont….

The word statistics is derived from the Italian word ‘Stato’, which means ‘state’; and ‘Statista’ refers to a person involved with the affairs of state. Thus, statistics originally was meant for collection of facts useful for affairs of the state, like taxes, land records, population demography, etc.
1– 4
Cont….

Significant contributions have also been made by Indians in the field of statistics. Prof. Prasanta Chandra Mahalanobis was the first to pioneer the study of statistical science in India. He founded the Indian Statistical Institute (ISI) in 1931. Mahalanobis viewed statistics as a tool for increasing the efficiency of all human efforts and also concentrated on sample surveys.

“Statistics are the classified facts representing the conditions of the people in the state…. specially those facts which can be stated in number or in table of numbers or in any tabular or classified arrangement”.
– Webster
1– 5
Cont….

Statistical methods are broadly divided into five categories. These are Descriptive Statistics, Analytical Statistics, Inductive Statistics, Inferential Statistics and Applied Statistics.

Statistics is an indispensable tool of production control and market research. Statistical tools are extensively used in business for time and motion study, consumer behaviour study, investment decisions, performance measurements and compensations, credit ratings, inventory management, accounting, quality control, distribution channel design, etc.
1– 6
Cont….

Statistical analysis is a vital component in every aspect of research. Social surveys, laboratory experiments, clinical trials, marketing research, human resource planning, inventory management, quality management, etc., require statistical treatment before arriving at valid conclusions.

Functions of statistics are Condensation, Comparison, Forecast, Testing of hypotheses, Preciseness and Expectation.

Statistical techniques, because of their flexibility and economy, have become popular and are used in numerous fields. But statistics is not a cure-all technique and has limitations. It cannot be applied to all kinds of situations and cannot be made to answer all queries.
1– 7
Introduction to Business Statistics
S. No. | Reference No. | Particulars | Slide From – To
1. | | Learning Objectives | 09 – 09
2. | Topic 1 | Introduction | 10 – 10
3. | Topic 2 | Development of Statistics | 11 – 11
4. | Topic 3 | Definitions of Statistics | 12 – 12
5. | Topic 4 | Importance of Statistics | 13 – 13
6. | Topic 5 | Classification of Statistics | 14 – 14
7. | Topic 6 | Role of Statistics | 15 – 15
8. | Topic 7 | Functions of Statistics | 16 – 16
9. | Topic 8 | Limitations of Statistics | 17 – 17
10. | Topic 9 | Summary | 18 – 21
1– 8
Learning Objectives
After studying this chapter, you should be able to:

Understand the development, importance and role of statistics

Explain the basic concept of statistical studies

Understand the application of statistics in business and management

Learn about functions and limitations of statistics
1– 9
Introduction

Information derived from good statistical analysis is always precise and never useless.

One of the primary tasks of a manager is decision-making.

Statistical techniques offer powerful tools in the decision-making process.

These tools have the power to interpret quantitative information in a scientific and objective manner.
1– 10
Development of Statistics

The word statistics is derived from the Italian word ‘Stato’, which means ‘state’; and ‘Statista’ refers to a person involved with the affairs of state.

Statistics originally was meant for collection of facts useful for affairs of the state, like taxes, land records, population demography, etc.

During ancient times, even before 300 BC, rulers and kings like Chandragupta Maurya used statistics to maintain the land and revenue records, collection of taxes and registration of births and deaths.
1– 11
Definitions of Statistics

“Statistics are the classified facts representing the conditions of the people in the state…. specially those facts which can be stated in number or in table of numbers or in any tabular or classified arrangement”.
– Webster

“By statistics we mean quantitative data affected to a marked extent by multiplicity of causes”.
– Yule and Kendall

“Statistics may be defined as the science of collection, presentation, analysis and interpretation of data”.
– Croxton and Cowden
1– 12
Importance of Statistics

Identify what information or data is worth collecting,

Decide when and how judgments may be made on the basis of partial information, and

Measure the extent of doubt and risk associated with the use of partial information and stochastic processes.
1– 13
Classification of Statistics
1– 14
Role of Statistics
Role of Statistics in Business
Role of Statistics in Decision Making
Role of Statistics in Research
1– 15
Functions of Statistics
Laws of Statistics
Condensation
Comparison
Forecast
Testing of Hypotheses
Preciseness
Expectation
1– 16
Limitations of Statistics
COMMON STATISTICAL ISSUES
DISTRUST OF STATISTICS
MISUSE OF STATISTICS
1– 17
Summary

Managerial decision-making can be made efficient and effective by analyzing available data using appropriate statistical tools. Statistical tools not only have application in research (marketing research included) but also in other functional areas like quality management, inventory management, financial analysis, human resource planning and so on.

The word statistics is derived from the Italian word ‘Stato’, which means ‘state’; and ‘Statista’ refers to a person involved with the affairs of state. Thus, statistics originally was meant for collection of facts useful for affairs of the state, like taxes, land records, population demography, etc.
Cont….
1– 18

Significant contributions have also been made by Indians in the field of statistics. Prof. Prasanta Chandra Mahalanobis was the first to pioneer the study of statistical science in India. He founded the Indian Statistical Institute (ISI) in 1931. Mahalanobis viewed statistics as a tool for increasing the efficiency of all human efforts and also concentrated on sample surveys.

Statistics are the classified facts representing the conditions of the people in the state…. specially those facts which can be stated in number or in table of numbers or in any tabular or classified arrangement.

Statistical methods are broadly divided into five categories. These are Descriptive Statistics, Analytical Statistics, Inductive Statistics, Inferential Statistics and Applied Statistics.
Cont….
1– 19

Statistics is an indispensable tool of production control and market research. Statistical tools are extensively used in business for time and motion study, consumer behaviour study, investment decisions, performance measurements and compensations, credit ratings, inventory management, accounting, quality control, distribution channel design, etc.

Statistical analysis is a vital component in every aspect of research. Social surveys, laboratory experiments, clinical trials, marketing research, human resource planning, inventory management, quality management, etc., require statistical treatment before arriving at valid conclusions.

Functions of statistics are Condensation, Comparison, Forecast, Testing of hypotheses, Preciseness and Expectation.
Cont….
1– 20

Statistical techniques, because of their flexibility and economy, have become popular and are used in numerous fields. But statistics is not a cure-all technique and has limitations. It cannot be applied to all kinds of situations and cannot be made to answer all queries.

More dangerous than distrust is misuse of statistics to draw convenient conclusions to satisfy selfish or ulterior motives. Arguments and analysis supported by facts, figures, charts, graphs, index numbers, etc. are indeed very appealing and convincing. They can be used to intimidate opposing views. Hence, statistics is open to manipulation.
1– 21
Descriptive Statistics: Collection, Processing and Presentation of Data
S. No. | Reference No. | Particulars | Slide From – To
1. | | Learning Objectives | 23 – 23
2. | Topic 1 | Introduction | 24 – 24
3. | Topic 2 | Descriptive and Inferential Statistics | 25 – 26
4. | Topic 3 | Collection of Data | 27 – 27
5. | Topic 4 | Editing and Coding of Data | 28 – 28
6. | Topic 5 | Classification of Data | 29 – 29
7. | Topic 6 | Tabulation of Data | 30 – 30
8. | Topic 7 | Diagrammatic and Graphical Presentation of Data | 31 – 32
9. | Topic 8 | Summary | 33 – 36
1– 22
Learning Objectives
After studying this chapter, you should be able to:

Describe descriptive and inferential statistics

Explain collection, editing and classification of primary and secondary data

Define tabulation and presentation of data

Understand diagrammatic and graphical presentation

Understand Bar diagram, Histogram, Pie Diagram, Frequency polygons and
Ogives
1– 23
Introduction

Success of any statistical investigation depends on the availability of accurate and reliable data.

These depend on the appropriateness of the method chosen for data collection.

Data collection is a very basic activity in decision-making.

Data may be classified either as primary data or secondary data.

Successful use of the collected data depends to a great extent upon the way it is arranged, displayed and summarized.
1– 24
Descriptive and Inferential Statistics
Descriptive Statistics

Descriptive statistics is the type of statistics that probably comes to the minds of most people when they hear the word “statistics.”
Cont….
1– 25
Inferential Statistics

Inferential statistics studies a statistical sample, and from this analysis we are able to say something about the population from which the sample came.
1– 26
Collection of Data
1– 27
Editing and Coding of Data
Editing Primary Data

Completeness

Consistency

Accuracy

Homogeneity
Editing Secondary Data

Field Editing

Central Editing
Coding of Data
Coding is the process of assigning some symbols, either alphabetical or numeral or both, to the answers so that the responses can be recorded into a limited number of classes or categories.
1– 28
Classification of Data
Classification refers to the grouping of data into homogeneous classes and categories. It is the process of arranging things in groups or classes according to their resemblances and affinities.
Bases of Classification
Rules of Classification
Frequency Distribution
1– 29
Tabulation of Data

Tabulation is arranging the data in a flat table (two dimensional arrays) format by grouping the observations.

A table is a spreadsheet with rows and columns, with headings and stubs indicating the class of the data.
Types of Tabulation
One – Way Tabulation
Two – Way Tabulation
Multi – Way Tabulation
Advantages of Tabulation
1– 30
Diagrammatic and Graphical Presentation of Data
Difference between Diagrams and Graphs
Diagram | Graph
1. Can be drawn on ordinary paper. | 1. Can be drawn on graph paper.
2. Easy to grasp. | 2. Needs some effort to grasp.
3. Not capable of analytical treatment. | 3. Capable of analytical treatment.
4. Can be used only for comparisons. | 4. Can be used to represent a mathematical relation.
5. Data are represented by bars, rectangles, pictures, etc. | 5. Data are represented by lines and curves.
Cont….
1– 31
TYPES OF DIAGRAMS
BAR DIAGRAM
HISTOGRAM
PIE DIAGRAM
FREQUENCY POLYGON
OGIVES
1– 32
Summary

There are two major divisions of the field of statistics, namely descriptive and inferential statistics. Both the segments of statistics are important, and accomplish different objectives.

Data can be obtained through primary source or secondary source according to need, situation, convenience, time, resources and availability. The most important method for primary data collection is through questionnaire. Data must be objective and fact-based so that it helps a decision-maker to arrive at a better decision.

Statistical data is a set of facts expressed in quantitative form. Data is collected through various methods. Sometimes our data set consists of the entire population we are interested in. In other situations, data may constitute a sample from some population.
Cont….
1– 33

Type of research, its purpose, and the conditions under which the data are obtained will determine the method of collecting the data. If relatively few items of information are required quickly, and funds are limited, telephonic interviews are recommended. If respondents are industrial clients, the Internet could also be used. If depth interviews and probing techniques are to be used, it is necessary to employ investigators to collect data.

The quality of information collected through the filling of a questionnaire depends, to a large extent, upon the drafting of its questions. Hence, it is extremely important that the questions be designed or drafted very carefully and in a tactful manner.

Before any processing of the data, editing and coding of data is necessary to ensure the correctness of data. In any research study, the voluminous data can be handled only after classification. Data can be presented through tables and charts.
Cont….
1– 34

Classification refers to the grouping of data into homogeneous classes and
categories. It is the process of arranging things in groups or classes according
to
their resemblances and affinities.

A frequency distribution is the principal tabular summary of either discrete data or continuous data. The frequency distribution may show actual, relative or cumulative frequencies. Actual and relative frequencies may be charted as either a histogram (a bar chart) or a frequency polygon. Two commonly used graphs of cumulative frequencies are the less than ogive and the more than ogive.
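The three kinds of frequency named above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `frequency_table` and the defect counts are hypothetical, and the cumulative column is the "less than or equal to" running total that a less than ogive would plot.

```python
from collections import Counter
from itertools import accumulate


def frequency_table(values):
    """Return rows of (value, actual, relative, cumulative) frequencies."""
    counts = Counter(values)
    total = len(values)
    ordered = sorted(counts)                  # distinct values in ascending order
    actual = [counts[v] for v in ordered]     # actual frequency of each value
    cumulative = list(accumulate(actual))     # running "less than or equal" totals
    return [(v, f, f / total, c) for v, f, c in zip(ordered, actual, cumulative)]


# Hypothetical data: number of defects found in 10 inspected items.
defects = [0, 1, 1, 2, 0, 1, 3, 2, 1, 0]
table = frequency_table(defects)
for value, freq, rel, cum in table:
    print(value, freq, rel, cum)
```

The last cumulative frequency always equals the number of observations, which is a quick check on any hand-built table.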

Once the raw data is collected, it needs to be summarized and presented to the decision-maker in a form that is easy to comprehend. Tabulation not only condenses the data, but also makes it easy to understand. Tabulation is the fastest way to extract information from the mass of data and hence popular even among those not exposed to the statistical method.
Cont….
1– 35

The charts help in grasping the data and analyzing it qualitatively. This also helps managers to effectively present the data as a part of reports. Various types of chart are bar diagram, multiple bar diagrams, component bar diagram, deviation bar diagram, sliding bar diagram, Histogram and Pie charts.

A graphic presentation is another way of representing the statistical data in a simple and intelligible form. There are two types of graphs which we have discussed, line graphs and ogives.
1– 36
Measures of Central Tendency
S. No. | Reference No. | Particulars | Slide From – To
1. | | Learning Objectives | 38 – 38
2. | Topic 1 | Introduction | 39 – 39
3. | Topic 2 | Characteristics of Central Tendency | 40 – 41
4. | Topic 3 | Arithmetic Mean | 42 – 42
5. | Topic 4 | Median | 43 – 43
6. | Topic 5 | Mode | 44 – 44
7. | Topic 6 | Empirical Relationship between Mean, Median and Mode | 45 – 45
8. | Topic 7 | Limitations of Central Tendency | 46 – 46
9. | Topic 8 | Summary | 47 – 51
1– 37
Learning Objectives
After studying this chapter, you should be able to:

Understand the concept and characteristics of central tendency

Describe all the measures of central tendency: mean, median and mode.

Explain merits and demerits of all measures of central tendency.

Discuss partition values or positional measures like quartiles, deciles and
percentiles.
1– 38
Introduction

The concept of central tendency plays a dominant role in the study of
statistics.

In many frequency distributions, the tabulated values show a distinct tendency to cluster or to group around a typical central value.

This behaviour of the data to concentrate the values around a central part of the distribution is called ‘Central Tendency’ of the data.
1– 39
Characteristics of Central Tendency
A good measure of central tendency should possess as far as possible the following
characteristics:

Easy to understand.

Simple to compute.

Based on all observations.

Uniquely defined.

Possibility of further algebraic treatment.

Not unduly affected by extreme values.
Cont….
1– 40
Common Measures of Central Tendency
Mean
Median
Mode
1– 41
Arithmetic Mean

The arithmetic mean of a series is the quotient obtained by dividing the sum of the values by the number of items. In algebraic language, if X1, X2, X3, ..., Xn are the n values of a variate X, then
Mean = (X1 + X2 + X3 + ... + Xn) / n
Properties of Arithmetic Mean
Calculation of Simple Arithmetic Mean
Merits and Demerits of Arithmetic Mean
Weighted Arithmetic Mean
1– 42
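As a small illustration of the simple and weighted arithmetic mean listed on this slide, the sketch below computes both in Python. The function names, the marks and the weights are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def arithmetic_mean(values):
    """Sum of the values divided by the number of items."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)


# Hypothetical marks in five tests and the relative importance of each test.
marks = [40, 50, 60, 70, 80]
weights = [1, 1, 2, 3, 3]

simple = arithmetic_mean(marks)           # 300 / 5
weighted = weighted_mean(marks, weights)  # 660 / 10
print(simple, weighted)
```

Because the later tests carry more weight and have higher marks, the weighted mean comes out above the simple mean.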
Median
Median is the value which divides the distribution of data, arranged in ascending or descending order, into two equal parts. Thus, the ‘Median’ is the value of the middle observation.

Calculation of Median

Merits and Demerits of Median

Partition Values or Positional Measures

Quartiles

Deciles

Percentiles
1– 43
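The definition of the median can be sketched as follows. The slide does not say what to do with an even number of observations; the code assumes the common convention of averaging the two middle values. The function name and the data are hypothetical.

```python
def median(values):
    """Middle value of the ordered data (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd n: single middle observation
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: assumed averaging convention


odd_case = median([7, 3, 9, 1, 5])   # ordered: 1, 3, 5, 7, 9
even_case = median([7, 3, 9, 1])     # ordered: 1, 3, 7, 9
print(odd_case, even_case)
```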
Mode

Mode is the value which has the greatest frequency density. Mode is denoted by Z.

Calculation of Mode

Merits and Demerits of Mode

Graphic Location of Mode
1– 44
Empirical Relationship between Mean, Median and Mode

A distribution in which the mean, the median, and the mode coincide is known as symmetrical (bell shaped) distribution. Normal distribution is one such symmetric distribution, which is very commonly used.

If the distribution is skewed, the mean, the median and the mode are not equal. In a moderately skewed distribution, the distance between the mean and the median is approximately one third of the distance between the mean and the mode. This can be expressed as:
Mean – Median = (Mean – Mode) / 3
Mode = 3 * Median – 2 * Mean
1– 45
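The empirical relationship above can be checked numerically. The sketch below estimates the mode from a hypothetical mean and median and then confirms that both forms of the relationship agree; the function name and the numbers are illustrative only, and the result is an approximation that applies to moderately skewed distributions.

```python
def empirical_mode(mean, median):
    """Mode = 3 * Median - 2 * Mean (empirical relationship)."""
    return 3 * median - 2 * mean


# Hypothetical summary values for a moderately skewed distribution.
mean, median = 52.0, 50.0
mode = empirical_mode(mean, median)

# Both sides of: Mean - Median = (Mean - Mode) / 3
left = mean - median
right = (mean - mode) / 3
print(mode, left, right)
```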
Limitations of Central Tendency

The arithmetic mean (AM) is not a suitable measure of central tendency in the following cases:

In case of highly skewed data.

In case of uneven or irregular spread of the data.

In open end distributions.

When average growth or average speed is required.

When there are extreme values in the data.

Except in these cases AM is widely used in practice.
1– 46
Summary

Measures of the central tendency give one of the very important characteristics of the data. According to the situation, one of the various measures of central tendency may be chosen as the most representative.

Arithmetic mean is widely used and understood. What characterizes the three measures of centrality, and what are the relative merits of each in the given situation, is the question.

Mean summarizes all the information in the data. Mean can be visualized as a single point where all the mass (the weight) of the observations is concentrated. It is like a centre of gravity in physics. Mean also has some desirable mathematical properties that make it useful in the context of statistical inference.
Cont….
1– 47

To simplify the manual calculation, we may sometimes use shift of origin and change of scale. Shifting of origin is achieved by adding or subtracting a constant to all observations. In case of discrete data we add or subtract (usually subtract) a constant to the individual observations. Whereas for grouped data, we add or subtract (usually subtract) the constant to the class mark values.

There are cases where relative importance of the different items is not the same. In such a case, we need to compute the weighted arithmetic mean. The procedure is similar to the grouped data calculations studied earlier, when we consider frequency as a weight associated with the class-mark.

Median is the middle value when the data is arranged in order. The median is resistant to the extreme observations. Median is like the geometric centre in physics. In case we want to guard against the influence of a few outlying observations (called outliers), we may use the median.
Cont….
1– 48

Quantiles are related positional measures of central tendency. These are useful and frequently employed measures. Most familiar quantiles are Quartiles, Deciles, and Percentiles.

Quartiles are position values similar to the Median. There are three quartiles denoted by Q1, Q2 and Q3. Q1 is called the lower quartile or first quartile. The second quartile Q2 is nothing but the median. In a distribution, one fourth of the items are less than Q1 and the other three fourths are greater than Q1. Q3 is called the upper quartile or the third quartile.

Inter-quartile range is defined as the difference between the first and third quartile. It is a measure of spread of the data.

D1, D2, D3… and D9 are the nine deciles. They divide a series into 10 equal parts. One tenth of the items are less than or equal to D1. One tenth of the items are more than or equal to D9 and one tenth of the items lie between any successive pair of deciles when all the items are in ascending order.
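Quartiles, the inter-quartile range and deciles can be illustrated with Python's standard `statistics.quantiles` function. Its default "exclusive" method is only one of several conventions for locating partition values, so results may differ slightly from a hand calculation that follows another rule. The data set (the integers 1 to 19) is hypothetical and chosen so that the cut points fall exactly on observations.

```python
import statistics

# Hypothetical ordered data: the integers 1 to 19.
data = list(range(1, 20))

# n=4 gives the three quartiles Q1, Q2, Q3; n=10 gives the nine deciles D1..D9.
quartiles = statistics.quantiles(data, n=4)
deciles = statistics.quantiles(data, n=10)

q1, q2, q3 = quartiles
iqr = q3 - q1  # inter-quartile range: difference between third and first quartile

print(quartiles, iqr)
print(deciles)
```

The second quartile coincides with the median of the same data, as the text states.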
1– 49
Cont….

The Pth percentile of a group of observations is that observation below which lie P% (P percent) of the observations. The position of the Pth percentile is computed from P and ‘n’, where ‘n’ is the number of data points. If this position is a fraction, we need to interpolate the value.

The Mode of a data set is the value that occurs most frequently. There are many situations in which arithmetic mean and median fail to reveal the true characteristics of a data (most representative figure), for example, most common size of shoes, most common size of garments etc. In such cases, mode is the best-suited measure of the central tendency.
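The shoe-size example can be sketched as below. The sizes are hypothetical, and the helper `modes` returns every value that shares the highest frequency, since a data set may have more than one mode.

```python
from collections import Counter


def modes(values):
    """Return all values that occur with the highest frequency, in ascending order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


# Hypothetical shoe sizes sold in one day.
shoe_sizes = [6, 7, 8, 8, 8, 8, 9, 9, 10, 12]

most_common_sizes = modes(shoe_sizes)
mean_size = sum(shoe_sizes) / len(shoe_sizes)
print(most_common_sizes, mean_size)
```

Here the mean is pulled upward by the single large size, while the mode points to the size that actually sells most often.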

A distribution in which the mean, the median, and the mode coincide is known as symmetrical (bell shaped) distribution. Normal distribution is one such symmetric distribution, which is very commonly used.
Cont….
1– 50
In a moderately skewed distribution, the distance between the mean and the median is approximately one third of the distance between the mean and the mode. This can be expressed as:

Mean – Median = (Mean – Mode) / 3

Mode = 3 * Median – 2 * Mean

No single average can be regarded as the best or most suitable under all circumstances. Each average has its merits and demerits and its own particular field of importance and utility. A proper selection of an average depends on the (1) nature of the data and (2) purpose of enquiry or requirement of the data.
1– 51
Measures of Dispersion
S. No. | Reference No. | Particulars | Slide From – To
1. | | Learning Objectives | 53 – 53
2. | Topic 1 | Introduction | 54 – 54
3. | Topic 2 | Characteristics of Measures of Dispersion | 55 – 55
4. | Topic 3 | Absolute and Relative Measures of Dispersion | 56 – 57
5. | Topic 4 | Range | 58 – 59
6. | Topic 5 | Inter-quartile Range and Deviations | 60 – 60
7. | Topic 6 | Variance and Standard Deviation | 61 – 62
8. | Topic 7 | Summary | 63 – 66
1– 52
Learning Objectives
After studying this chapter, you should be able to:

Understand absolute and relative measures of variation

Learn about range and inter-quartile range

Discuss variance, standard deviation, mean deviation and coefficient of variation

Study the empirical relationship between different measures of variation
1– 53
Introduction

A measure of dispersion or variation in any data shows the extent to which the numerical values tend to spread about an average.
Data is useful:

To compare the current results with the past results.

To compare two or more sets of observations.

To suggest methods to control variation in the data.
1– 54
Characteristics of Measures of Dispersion
It should be simple to understand.
It should be easy to compute.
It should be rigidly defined.
It should be based on each individual item of the distribution.
It should be capable of further algebraic treatment.
It should have sampling stability.
It should not be unduly affected by the extreme items.
1– 55
Absolute and Relative Measures of Dispersion

‘Relative’ or ‘Coefficient’ of dispersion is the ratio or the percentage of a
measure of absolute dispersion to an appropriate average.

A precise measure of dispersion is one which gives the magnitude of the
variation in a series, i.e. it measures in numerical terms, the extent of the
scatter of the values around the average.
Cont….
1– 56
ABSOLUTE AND RELATIVE MEASURES OF DISPERSION
Measures of Dispersion: The Range, The Quartile Deviation, The Mean Deviation, The Median Deviation, The Standard Deviation, Graphical Method
Relative Variability: Relative Range, Relative Quartile Deviation, Relative Mean Deviation, Coefficient of Variation
1– 57
Range
The ‘Range’ of the data is the difference
between the largest value of data and smallest
value of data.
Cont….
1– 58
Merits and Demerits of Range
Merits

Range is the simplest method of studying dispersion.

It takes less time to compute the ‘absolute’ and ‘relative’ range.
Demerits

Range does not take into account all the values of a series, i.e. it considers only the extreme items, and the middle items are not given any importance.

Range cannot be computed in the case of ‘open-end’ distributions, i.e. distributions where the lower limit of the first group and the upper limit of the highest group are not given.
1– 59
Inter – Quartile Range and Deviations
Inter-quartile Range

Inter-quartile range is the difference between the upper quartile (third quartile) and the lower quartile (first quartile).
Quartile Deviation

Quartile Deviation is the average of the difference between the upper quartile and the lower quartile, that is, half of the inter-quartile range: (Q3 – Q1) / 2.
Mean Deviation

Mean deviation is the arithmetic mean of the absolute deviations of the values about their arithmetic mean or median or mode.
1– 60
Variance and Standard Deviation
Variance is defined as the average of squared
deviation of data points from their mean.
Cont….
1– 61
Different Formulae for Calculating Variance
Calculation of Standard Deviation
Properties of Standard Deviation
Merits and Demerits of Standard Deviation
Standard Deviation of Combined Means
Coefficient of Variation
Empirical Relationship Between Different Measures of Variation
1– 62
Summary

Study of distribution is very important for decision-making. Usually, measures of central tendency and variability are adequate for taking decisions. However, if data is quite different from normal distribution then measures of skewness and kurtosis need to be considered. We discussed measures of variability: Range, Variance and Standard Deviation.

A measure of dispersion gives an idea about the extent of lack of uniformity in the sizes and qualities of the items in a series. It helps us to know the degree of uniformity and consistency in the series. If the difference between items is large, the dispersion or variation is large and vice versa.
Cont….
1– 63

The measures of dispersion can be either ‘absolute’ or ‘relative’. Absolute measures of dispersion are expressed in the same units in which the original data are expressed. For example, if the series is expressed as Marks of the students in a particular subject, the absolute dispersion will provide the value in Marks. The only difficulty is that if two or more series are expressed in different units, the series cannot be compared on the basis of dispersion.

The ‘Range’ of the data is the difference between the largest value of data and smallest value of data. This is an absolute measure of variability. However, if we have to compare two sets of data, ‘Range’ may not give a true picture. In such case, relative measure of range, called coefficient of range, is used.
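The point about comparing two series can be illustrated in Python. The slides do not give a formula for the coefficient of range; the sketch assumes the commonly used definition (largest – smallest) / (largest + smallest). The function names and both data sets are hypothetical.

```python
def value_range(values):
    """Absolute range: largest value minus smallest value."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """Relative range, assuming the common definition (L - S) / (L + S)."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)


# Hypothetical series measured in different units.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 60, 70, 80, 90]

print(value_range(heights_cm), value_range(weights_kg))
print(coefficient_of_range(heights_cm), coefficient_of_range(weights_kg))
```

Both series have the same absolute range, but the unit-free coefficient shows that the weights vary more in relative terms.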

Inter-quartile range is the difference between the upper quartile (third quartile) and the lower quartile (first quartile). Quartile Deviation is the average of the difference between the upper quartile and the lower quartile, that is, half of the inter-quartile range.
Cont….
1– 64

Average used for calculating deviation can be the mean, the median or the mode. However, usually the mean is used. There is also an advantage of taking deviations from the median, because ‘Mean Deviation’ from median is lowest as compared to any other ‘Mean Deviations’. Since absolute values of deviations ignoring sign are taken for calculating Mean Deviation, the mean deviation is not amenable to further algebraic treatment.
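The claim that the mean deviation is lowest when taken about the median can be checked on a small example. The function name and the data, which include one deliberately large value, are hypothetical.

```python
import statistics


def mean_deviation(values, centre):
    """Arithmetic mean of the absolute deviations of the values about `centre`."""
    return sum(abs(x - centre) for x in values) / len(values)


# Hypothetical data with one outlying observation.
data = [2, 4, 6, 8, 30]

md_about_mean = mean_deviation(data, statistics.mean(data))      # centre = 10
md_about_median = mean_deviation(data, statistics.median(data))  # centre = 6
print(md_about_mean, md_about_median)
```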

The variance is the average squared deviation of the data from their mean. For sample data, we take the average by dividing with (n-1) where n is the sample size. This is to cater for degrees of freedom. For population data, we average by dividing with the population size N.

The Standard Deviation (SD) of a set of data is the positive square root of the variance of the set. This is also referred to as the Root Mean Square (RMS) value of the deviations of the data points. SD of a sample is the square root of the sample variance.
Cont….
1– 65

There is no effect of shifting origin on standard deviation or variance.
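The statements above about variance, standard deviation and shift of origin can be verified with a short sketch. The function name and the data are hypothetical; the `sample` flag switches between dividing by N (population) and by n - 1 (sample).

```python
import math


def variance(values, sample=False):
    """Average squared deviation from the mean; divide by n - 1 for sample data."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - 1 if sample else n)


# Hypothetical data with mean 5 and sum of squared deviations 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

population_variance = variance(data)            # 32 / 8
sample_variance = variance(data, sample=True)   # 32 / 7
population_sd = math.sqrt(population_variance)  # positive square root of the variance

# Shift of origin: adding a constant to every observation leaves the variance unchanged.
shifted = [x + 100 for x in data]
shifted_variance = variance(shifted)

print(population_variance, sample_variance, population_sd, shifted_variance)
```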

The measures of deviation are very effective in reports and presentations that business executives make to present their data to members of the general public who do not understand statistical methods.

Variance analysis also helps in managing budgets by controlling budgeted versus actual costs. Without the standard deviation, you can’t compare two data sets effectively.
1– 66
Skewness and Kurtosis
S. No. | Reference No. | Particulars | Slide From – To
1. | | Learning Objectives | 68 – 68
2. | Topic 1 | Introduction | 69 – 70
3. | Topic 2 | Karl Pearson’s Coefficient of Skewness (SKP) | 71 – 71
4. | Topic 3 | Bowley’s Coefficient of Skewness (SKB) | 72 – 72
5. | Topic 4 | Kelly’s Coefficient of Skewness (SKK) | 73 – 73
6. | Topic 5 | Measures of Kurtosis | 74 – 74
7. | Topic 6 | Moments | 75 – 75
8. | Topic 7 | Summary | 76 – 79
1– 67
Learning Objectives
After studying this chapter, you should be able to:

Understand the concept and different types of skewness

Discuss various measures of kurtosis

Learn about moments, its properties and coefficients based on moments
1– 68
Introduction
Skewness is a measure that studies the degree and direction of departure from symmetry.
Nature of Skewness
Skewness can be positive or negative or zero.
When the values of mean, median and mode are equal, there is no skewness.

When mean > median > mode, skewness will be positive.

When mean < median < mode, skewness will be negative.
Cont….
1– 69
Characteristic of a Good Measure of Skewness

It should be a pure number in the sense that its value should be independent
of
the unit of the series and also degree of variation in the series.

It should have zero-value, when the distribution is symmetrical.

It should have a meaningful scale of measurement so that we could easily
interpret the measured value.
Mathematical measures of skewness can be calculated by:

Karl-Pearson’s Method

Bowley’s Method

Kelly’s method
1– 70
Karl Pearson’s Coefficient of Skewness (SKP)
Karl Pearson has suggested two formulae:

Where the relationship of mean and mode is established;

Where the relationship between mean and median is not established.
1– 71
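The two formulae themselves did not survive on this slide. As a hedged illustration, the sketch below uses the two forms commonly attributed to Karl Pearson, (Mean – Mode) / SD and 3 * (Mean – Median) / SD; treating these as the slide's formulae is an assumption. The function names and summary values are hypothetical.

```python
def pearson_skewness_mode(mean, mode, sd):
    """Assumed first form: (Mean - Mode) / Standard Deviation."""
    return (mean - mode) / sd


def pearson_skewness_median(mean, median, sd):
    """Assumed second form: 3 * (Mean - Median) / Standard Deviation."""
    return 3 * (mean - median) / sd


# Hypothetical summary values that satisfy Mode = 3 * Median - 2 * Mean.
mean, median, mode, sd = 52.0, 50.0, 46.0, 12.0

sk_from_mode = pearson_skewness_mode(mean, mode, sd)
sk_from_median = pearson_skewness_median(mean, median, sd)
print(sk_from_mode, sk_from_median)
```

The two values agree here only because the chosen figures satisfy the empirical relationship between mean, median and mode exactly; with real data they usually differ somewhat. A positive result matches the rule that mean > median > mode indicates positive skewness.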
Bowley’s Coefficient of Skewness (SKB)

Bowley’s method of skewness is based on the values of median, lower and upper quartiles. This method suffers from the same limitations which are in the case of median and quartiles.

Wherever positional measures are given, skewness should be measured by Bowley’s method. This method is also used in case of ‘open-end series’, where the importance of extreme values is ignored.
Absolute skewness = Q3 + Q1 – 2 Median
Coefficient of Skewness, (SkB) = (Q3 + Q1 – 2 Median) / (Q3 – Q1)
Where, Q is quartile.
1– 72
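Bowley's measure can be sketched directly from the quartiles. The numerator is the absolute skewness given on the slide; dividing by (Q3 – Q1) is the standard form of the coefficient and is assumed here. The function name and quartile values are hypothetical.

```python
def bowley_skewness(q1, median, q3):
    """(Q3 + Q1 - 2 * Median) / (Q3 - Q1), using the standard denominator."""
    return (q3 + q1 - 2 * median) / (q3 - q1)


# Hypothetical symmetric case: the median lies midway between the quartiles.
symmetric = bowley_skewness(q1=5, median=10, q3=15)

# Hypothetical right-skewed case: the upper quartile is far from the median.
right_skewed = bowley_skewness(q1=20, median=25, q3=40)

print(symmetric, right_skewed)
```

Because only quartiles are used, the coefficient always lies between -1 and +1 and ignores the extreme values, which is why it suits open-end series.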
Kelly’s Coefficient of Skewness (SKK)
Kelly’s coefficient of skewness is defined as:
Skk = (P90 + P10 – 2 P50) / (P90 – P10)
Where, P is percentile.
Example: Calculate the Kelly’s coefficient of skewness from the following data:
1– 73
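The positional measures of skewness can be sketched in Python as follows. The sketch assumes the usual forms SkB = (Q3 + Q1 – 2 Median) / (Q3 – Q1) and Skk = (P90 + P10 – 2 P50) / (P90 – P10). The percentile helper uses linear interpolation between sorted values, which is only one of several conventions, and the data are hypothetical.

```python
def percentile(data, p):
    """p-th percentile (0-100) by linear interpolation between sorted values."""
    xs = sorted(data)
    pos = (len(xs) - 1) * p / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def bowley_skewness(data):
    """Quartile-based coefficient of skewness."""
    q1, q2, q3 = (percentile(data, p) for p in (25, 50, 75))
    return (q3 + q1 - 2 * q2) / (q3 - q1)

def kelly_skewness(data):
    """Percentile-based coefficient of skewness."""
    p10, p50, p90 = (percentile(data, p) for p in (10, 50, 90))
    return (p90 + p10 - 2 * p50) / (p90 - p10)

values = [1, 2, 3, 4, 5, 6, 7, 9, 12, 16, 30]   # hypothetical, long right tail
bowley = bowley_skewness(values)
kelly = kelly_skewness(values)
print(round(bowley, 4), round(kelly, 4))
```

Both coefficients are positive for this right-tailed series and are zero for a symmetrical one such as 1, 2, 3, 4, 5.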
Measures of Kurtosis

Kurtosis is a measure of peaked-ness of distribution. Larger the kurtosis, more
and more peaked will be the distribution. The kurtosis is calculated either as
an absolute or a relative value. Absolute kurtosis is always a positive number.
A negative relative kurtosis indicates a flatter distribution than the normal
distribution, and is called platykurtic.
A positive relative kurtosis means a more peaked curve, called leptokurtic.
Peakedness of the normal distribution is called mesokurtic.
1– 74
Moments

The arithmetic mean of various powers of these deviations in any distribution is
called the moments of the distribution about mean.
PROPERTIES OF MOMENTS
COEFFICIENTS BASED ON MOMENTS
1– 75
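A minimal sketch of moments about the mean and of the coefficients built from them is given below. It assumes the standard definitions β1 = μ3² / μ2³ and β2 = μ4 / μ2² (β2 being the absolute kurtosis, with β2 – 3 the relative kurtosis); the function names and data are hypothetical.

```python
def central_moment(data, k):
    """k-th moment about the mean: average of (x - mean) ** k."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

def moment_coefficients(data):
    mu2 = central_moment(data, 2)   # variance
    mu3 = central_moment(data, 3)   # sign indicates direction of skewness
    mu4 = central_moment(data, 4)
    beta1 = mu3 ** 2 / mu2 ** 3     # skewness coefficient
    beta2 = mu4 / mu2 ** 2          # absolute kurtosis
    return {"mu2": mu2, "mu3": mu3, "mu4": mu4,
            "beta1": beta1, "beta2": beta2,
            "relative_kurtosis": beta2 - 3}

observations = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, mean 5
coeffs = moment_coefficients(observations)
print(coeffs)
```

For this series the relative kurtosis is slightly negative, so the distribution would be classed as platykurtic.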
Summary

Measures of Skewness and Kurtosis, like measures of central tendency and
dispersion, study the characteristics of a frequency distribution. Averages tell
us about the central value of the distribution and measures of dispersion tell
us about the concentration of the items around a central value.

When two or more symmetrical distributions are compared, the difference in
them is studied with ‘Kurtosis’. On the other hand, when two or more
asymmetrical distributions are compared, they will give different degrees of
Skewness. These measures are mutually exclusive i.e. the presence of
skewness implies absence of kurtosis and vice-versa.
Cont….
1– 76

Bowley’s method of skewness is based on the values of median, lower and
upper quartiles. This method suffers from the same limitations which are in
the case of median and quartiles. Wherever positional measures are given, skewness
should be measured by Bowley’s method. This method is also used in
case of ‘open-end series’, where the importance of extreme values is ignored.

Kelly’s coefficient of skewness is defined as:
Skk = (P90 + P10 – 2 P50) / (P90 – P10)
Where, P is percentile.
Cont….
1– 77

Kurtosis is a measure of peaked-ness of distribution. Larger the kurtosis, more
and more peaked will be the distribution. The kurtosis is calculated either as
an absolute or a relative value. Absolute kurtosis is always a positive number.
Absolute kurtosis of normal distribution (symmetric bell shaped distribution)
is taken as 3. It is taken as a datum to calculate relative kurtosis as follows:
Absolute kurtosis = μ4 / μ2² (the fourth moment about the mean divided by the square of the second)
Relative kurtosis = Absolute kurtosis – 3
Cont….
1– 78

Moments about mean are generally used in statistics. We use the Greek
letter μ (read as mu) for these moments. Consider a mass attached at each
point proportional to its frequency and take moments about the mean. First,
second, third and fourth moments can be used as a measure of Central Tendency,
Variation (dispersion), asymmetry and peakedness of the curve.
1– 79
Correlation Analysis
S. No. | Reference No. | Particulars | Slide From – To
1. | - | Learning Objectives | 81 – 81
2. | Topic 1 | Introduction | 82 – 83
3. | Topic 2 | Types of Correlation | 84 – 84
4. | Topic 3 | Methods of Calculating Correlation | 85 – 85
5. | Topic 4 | Scatter Diagram Method | 86 – 86
6. | Topic 5 | Co-variance Method – The Karl Pearson’s Correlation Coefficient | 87 – 88
7. | Topic 6 | Rank Correlation Method | 89 – 89
8. | Topic 7 | Correlation Coefficient using Concurrent Deviation | 90 – 91
9. | Topic 8 | Summary | 92 – 98
1– 80
Learning Objectives
After studying this chapter, you should be able to:

Understand the concept of correlation

Study about different types of correlation

Describe various methods of calculating correlation such as scatter diagram
method

Discuss various types of correlation coefficients viz, Karl Pearson correlation
coefficient, rank correlation and coefficient based on concurrent deviations.
1– 81
Introduction
Croxton and Cowden say, “When
the relationship is of a quantitative
nature, the appropriate statistical
tool for discovering and measuring
the relationship and expressing it
in a brief formula is known as
correlation”.
Cont….
1– 82
The study of correlation helps managers in following ways:

To identify relationship of various factors and decision variables.

To estimate value of one variable for a given value of other if both are
correlated. E.g. estimating sales for a given advertising and promotion expenditure.

To understand economic behaviour and market forces.

To reduce uncertainty in decision-making to a large extent.
1– 83
Types of Correlation
Positive or Negative Correlation
Simple or Multiple Correlations
Partial or Total Correlation
Linear and Non-linear Correlation
1– 84
Methods of Calculating Correlation
Scatter Diagram Method
Karl Pearson’s Coefficient of Correlation
Concurrent Deviation Method
Rank Method
1– 85
Scatter Diagram Method
The pattern of points obtained by plotting the observed points is known
as a scatter diagram.
It gives us two types of information.

Whether the variables are related or not.

If so, what kind of relationship or estimating equation describes
the relationship.
1– 86
Co – Variance Method – The Karl Pearson’s Correlation Coefficient
The correlation coefficient measures the degree of association between two variables X and Y.
Karl Pearson’s formula for correlation coefficient is given as,
r = Cov (X, Y) / (σX σY) = Σ (X – X̄)(Y – Ȳ) / √[Σ (X – X̄)² × Σ (Y – Ȳ)²]
Where r is the ‘Correlation Coefficient’ or ‘Product Moment Correlation Coefficient’
between X and Y.
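As an illustration, the product moment correlation coefficient can be computed directly from deviations about the means, r = Σ (X – X̄)(Y – Ȳ) / √[Σ (X – X̄)² × Σ (Y – Ȳ)²]. The sketch below is one way to do it; the function name and the advertising and sales figures are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Karl Pearson's coefficient of correlation between two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

advertising = [1, 2, 3, 4, 5]   # hypothetical X values
sales = [2, 4, 5, 4, 5]         # hypothetical Y values
r = pearson_r(advertising, sales)
print(round(r, 4))
```

A perfectly linear increasing series gives r = 1 and a perfectly linear decreasing series gives r = –1.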
Cont….
1– 87
Assumptions Underlying Karl Pearson’s Correlation Coefficient
Interpretation of R
Estimation of Probable Error
1– 88
Rank Correlation Method
RANK CORRELATION WHEN RANKS ARE GIVEN
RANK CORRELATION WHEN RANKS ARE NOT GIVEN
RANK CORRELATION WHEN EQUAL RANKS ARE GIVEN
1– 89
Correlation Coefficient using Concurrent Deviation

This is the easiest method to find the correlation between two variables. The
method is effective in giving the direction of the correlation as positive or negative
but fails to give the accurate strength of the correlation. In this method we check the
fluctuation in each data series as increasing (+), decreasing (-) or equal values. Then
we count the number of items that increase, decrease or remain equal concurrently and
denote it as c. The correlation coefficient is then calculated as,
rc = ± √[± (2c – n) / n]
The sign of (2c – n) is used both inside and outside the square root.
Where,
n = total number of pairs of deviations compared (one less than the number of observations).
c = Number of concurrent changes
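The procedure can be sketched in Python as below. It assumes the usual form rc = ± √[± (2c – n) / n], where n is the number of pairs of deviations (one less than the number of observations) and the sign of (2c – n) is carried outside the root. The function names and both data series are hypothetical.

```python
import math

def signs(series):
    """Direction of change between consecutive values: +1, -1 or 0."""
    return [(b > a) - (b < a) for a, b in zip(series, series[1:])]

def concurrent_deviation_r(xs, ys):
    dx, dy = signs(xs), signs(ys)
    n = len(dx)                                      # pairs of deviations
    c = sum(1 for a, b in zip(dx, dy) if a == b)     # concurrent changes
    ratio = (2 * c - n) / n
    # Take the root of the absolute value and restore the sign afterwards.
    return math.copysign(math.sqrt(abs(ratio)), ratio)

x = [10, 12, 15, 14, 18, 20, 19, 25, 27, 30]   # hypothetical expenditure
y = [50, 55, 60, 58, 57, 65, 63, 70, 72, 80]   # hypothetical sales
rc = concurrent_deviation_r(x, y)
print(round(rc, 4))
```

In this made-up series 8 of the 9 changes move together, so the coefficient is positive and fairly high, but it says nothing about the size of the changes.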
Cont….
1– 90
Example: The data of advertisement expenditure (X) and sales (Y) of a company for the past
10-year period is given below. Determine the correlation coefficient between these
variables and comment on the correlation.
1– 91
Summary

In this chapter the concept of correlation or the association between two variables
has been discussed. A scatter plot of the variables may suggest that the two variables
are related but the value of the Pearson correlation coefficient r quantifies this
association.

Correlation is a degree of linear association between two random variables. In
these two variables, we do not differentiate them as dependent and independent variables. It
may be the case that one is the cause and other is an effect i.e. independent and
dependent variables respectively. On the other hand, both may be dependent
variables on a third variable.
Cont….
1– 92

In business, correlation analysis often helps manager to take decisions by
estimating the effects of changing the values of the decision variables like
promotion, advertising, price, production processes, on the objective
parameters like costs, sales, market share, consumer satisfaction, competitive
price. The decision becomes more objective by removing subjectivity to certain extent.

The correlation coefficient r may assume values between –1 and 1. The sign
indicates whether the association is direct (+ve) or inverse (-ve). A numerical
value of r equal to unity indicates perfect association while a value of zero
indicates no association.
Cont….
1– 93

The correlation is said to be positive when the increase (decrease) in the value
of one variable is accompanied by an increase (decrease) in the value of other
variable also.
Negative or inverse correlation refers to the movement of the variables in opposite direction.
Correlation is said to be negative, if an increase (decrease) in the value of one variable is
accompanied by a decrease (increase) in the value of other.

In simple correlation the variation is between only two variables under study and
the variation is hardly influenced by any external factor. In other words, if one of the
variables remains same, there won’t be any change in other variable.
Cont….
1– 94

In case of multiple correlation analysis there are two approaches to study the
correlation. In case of partial correlation, we study variation of two variables,
excluding the effects of other variables by keeping them under controlled condition.

When the amount of change in one variable tends to keep a constant ratio to the
amount of change in the other variable, then the correlation is said to be
linear. But if the amount of change in one variable does not bear a constant
ratio to the amount of change in the other variable then the correlation is
said to be non-linear.
Cont….
1– 95

Correlation analysis may also be necessary to eliminate a variable which
shows low or hardly any correlation with the variable of our interest. In statistics, there
are a number of measures to describe degree of association between variables. These
are Karl Pearson’s Correlation Coefficient, Spearman’s rank correlation coefficient,
coefficient of determination, Yule’s coefficient of association, coefficient of
colligation, etc.

The correlation coefficient measures the degree of association between two
variables X and Y.

Karl Pearson’s formula for correlation coefficient is given as,
r = Cov (X, Y) / (σX σY)
Cont….
1– 96

The purpose of computing a correlation coefficient in such situations is to
determine the extent to which the two sets of ranking are in agreement. The
coefficient that is determined from these ranks is known as Spearman’s rank
coefficient, rs. This is defined by the following formula:
rs = 1 – 6 Σd² / [n (n² – 1)]
where d is the difference between the two ranks of an item and n is the number of items ranked.
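A short Python sketch of the rank coefficient follows. It assumes the standard form rs = 1 – 6 Σd² / [n (n² – 1)] for ranks without ties; the function name and the two judges' rankings are hypothetical.

```python
def spearman_rs(rank_x, rank_y):
    """Spearman's rank coefficient for two rankings of the same items (no ties)."""
    n = len(rank_x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

judge_1 = [1, 2, 3, 4, 5]   # hypothetical ranks given by the first judge
judge_2 = [2, 1, 4, 3, 5]   # hypothetical ranks given by the second judge
rs = spearman_rs(judge_1, judge_2)
print(rs)
```

Identical rankings give rs = 1 and completely reversed rankings give rs = –1. Tied (equal) ranks need a correction that this sketch does not include.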
Cont….
1– 97

The concurrent deviation method is effective in giving the direction of
the correlation as positive or negative but fails to give the accurate strength of the correlation.
In this method we check the fluctuation in each data series as increasing (+),
decreasing (–) or equal values. Then we count the number of items that increase,
decrease or remain equal concurrently and denote it as c. The correlation coefficient is
then calculated as,
rc = ± √[± (2c – n) / n]
Where,
n = total number of pairs of deviations compared (one less than the number of observations).
c = Number of concurrent changes
1– 98
Regression Analysis
S. No. | Reference No. | Particulars | Slide From – To
1. | - | Learning Objectives | 100 – 100
2. | Topic 1 | Introduction | 101 – 101
3. | Topic 2 | Regression Analysis | 102 – 103
4. | Topic 3 | Simple Linear Regression | 104 – 106
5. | Topic 4 | Coefficient of Regression | 107 – 108
6. | Topic 5 | Non-linear Regression Models | 109 – 109
7. | Topic 6 | Correlation Analysis vs Regression Analysis | 110 – 110
8. | Topic 7 | Summary | 111 – 114
1– 99
Learning Objectives
After studying this chapter, you should be able to:

Understand the concept of regression analysis

Discuss the applicability of regression

Describe simple linear regression and nonlinear regression model.

Learn about coefficient of regression and linear regression equations
1– 100
Introduction

In regression analysis we develop an equation called as an estimating
equation used to relate known and unknown variables.

Then correlation analysis is used to determine the degree of the
relationship between the variables.

In this chapter we will learn, how to calculate the regression line
mathematically.
1– 101
Regression Analysis
According to Morris Myers Blair, “regression is the measure of the
average relationship between two or more variables in terms of the
original units of the data.”
Cont….
1– 102
Applicability of Regression Analysis

Regression analysis is a branch of statistical theory which is widely used
in all the scientific disciplines. It is a basic technique for measuring or
estimating the relationship among economic variables that constitute the
essence of economic theory and economic life.
1– 103
Simple Linear Regression
This model is used if we have bivariate distribution i.e. only two variables are
considered and the ‘best fit’ curve is approximated to a straight line.

The highest power of x is called as order of the model.
Cont….
1– 104
Simple Linear Regression Model

The linear regression model uses straight line relationship. Equation of a
straight line is of the form,
ŷ = a + bx     (1)

Where ŷ is the predicted value of Y corresponding to x. a and b are constants. Now
if we assume the error (deviation) in Y direction is e, we can write the
relationship of X and Y in data points as,
y = a + bx + e

Error e is the amount by which observation will fall off regression line. Error e is due
to random error. ‘a’ and ‘b’ are called parameters of the linear regression model whose values
are found out from the observed data.
Cont….
1– 105
Linear Regression Equation

Suppose the data points are (x1, y1), (x2, y2), ….., (xn, yn). Then we can write
from regression equation,
yi = a + b xi + ei,  i = 1, 2, ….., n     (2)
Thus, sum square of errors is,
SSE = Σ ei² = Σ (yi – a – b xi)²

To have minimum sum of squares of errors (SSE) we must have the condition,
∂(SSE)/∂a = 0 and ∂(SSE)/∂b = 0
1– 106
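Setting the two partial derivatives of SSE to zero leads to the familiar least squares solution b = Σ (x – x̄)(y – ȳ) / Σ (x – x̄)² and a = ȳ – b x̄. The sketch below assumes that closed form; the function names and the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Least squares estimates (a, b) for the line y = a + b x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

def sum_squared_errors(xs, ys, a, b):
    """SSE: sum of squared deviations of y from the fitted line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

x_values = [1, 2, 3, 4, 5]   # hypothetical independent variable
y_values = [2, 4, 5, 4, 5]   # hypothetical dependent variable
a, b = fit_line(x_values, y_values)
sse = sum_squared_errors(x_values, y_values, a, b)
print(round(a, 2), round(b, 2), round(sse, 2))
```

Any other choice of a and b gives a larger SSE for the same data, which is what the minimum condition expresses.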
Coefficient of Regression
The coefficients of regression are bYX and bXY. They have following implications:

Slopes of regression lines of Y on X and X on Y viz. bYX and bXY must have
same signs (because r² cannot be negative).

Correlation coefficient is geometric mean of bYX and bXY.

If both slopes bYX and bXY are positive correlation coefficient r is positive. If
both bYX and bXY are negative the correlation coefficient r is negative.

If

Both regression lines intersect at point
indicating perfect correlation.
Cont….
1– 107
Properties of Regression Coefficients

The coefficient of correlation is the geometric mean of the two
regression coefficients.

Both the regression coefficients are either positive or negative. It
means that they always have identical sign i.e., either both have positive
sign or negative sign.

The coefficient of correlation and the regression coefficients will
also have same sign.

Regression coefficients are independent of the change in the origin but not
of the scale.
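The first three properties can be checked numerically. The sketch below assumes the usual definitions bYX = Σ (x – x̄)(y – ȳ) / Σ (x – x̄)² and bXY = Σ (x – x̄)(y – ȳ) / Σ (y – ȳ)²; the function name and data are hypothetical.

```python
import math

def regression_coefficients(xs, ys):
    """Return (bYX, bXY): slopes of the regression of Y on X and of X on Y."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sxx, sxy / syy

xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 4, 5, 4, 5]
b_yx, b_xy = regression_coefficients(xs, ys)
# r is the geometric mean of the two coefficients and carries their common sign.
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)
print(round(b_yx, 2), round(b_xy, 2), round(r, 4))
```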
1– 108
Non – Linear Regression Models
Second Degree Model
Other Regression Models
Seasonal Model
Seasonal Model with Trend
Coefficient of Determination
1– 109
Correlation Analysis vs Regression Analysis

Degree and Nature of Relationship

Cause and Effect Relationship

Like in correlation, regression analysis can also be studied as ‘simple and
multiple’, ‘total and partial’, ‘linear and nonlinear’, etc.

In correlation, there is no distinction between independent and dependent
variables.
1– 110
Summary

In this chapter, the concept of regression between dependent and independent
variables has been discussed. Regression provides us a measure of the relationship and also
facilitates to predict one variable for a value of other variable.

Unlike correlation analysis, in regression analysis, one variable is independent
and other dependent. Please note that this relationship need not be a cause-effect
relationship.

Regression analysis is a branch of statistical theory which is widely used in all
the scientific disciplines. It is a basic technique for measuring or
estimating the relationship among economic variables that constitute the
essence of economic theory and economic life. The uses of regression
analysis are not confined to economic and business activities. Its applications
are extended to almost all the natural, physical and social sciences.
Cont….
1– 111

Simple linear regression model is used if we have bivariate distribution i.e. only
two variables are considered and the ‘best fit’ curve is approximated to a straight line.
This describes the linear relationship between two variables. Although it appears to be
too simplistic, in many business situations, it is adequate. At least, initial study can be
based on this model for any decision-making situation.

We have studied simple linear, non-linear and multiple regression models. For
multiple regression and non-linear regression models, MS Excel or any other computer
package would help in reducing voluminous calculations. We also discussed coefficient
of determination as a measure of the strength of relationship.
Cont….
1– 112

Least square principle can also be applied to the fitting of a second degree
polynomial which may be useful in business situation if we have some idea
that the relationship between two variables is parabolic. In any case second
degree polynomial fit is more likely to be better approximation of the actual
relationship. We may use second order model (parabolic trend) if we feel that
the variation is parabolic.

The least square approximation can be calculated easily for low degree polynomials,
like linear, parabolic, cubic, etc. But for higher degrees (more than three), the system of
normal equations becomes ill conditioned. This causes large errors in values of
coefficients. Then the approximation becomes incorrect. To avoid these problems,
‘orthogonal polynomials’ are used for approximation.
Cont….
1– 113

Mean Square Error (MSE) is an estimate of the variance of the regression
error. MSE depends on the values of data and its scales. Hence we need a
measure that calculates relative degree of variation so that it can be compared
for the fits obtained from different models and for different data sets. Coefficient
of determination is such a measure.

Coefficient of determination is a measure of the strength of the regression fit. It is
an estimator of population parameter of correlation and can be obtained directly from a
decomposition of variation in Y into two components, viz. due to error and due to
regression. Error is a deviation of a data point from its respective group mean. Thus error is
the deviation of a data from its predicted values explained by the regression line.
1– 114
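The decomposition of variation in Y can be sketched as follows. The code assumes the common definitions R² = (variation due to regression) / (total variation) and MSE = SSE / (n – 2) for a simple linear fit; the function name and data are hypothetical.

```python
def regression_fit_quality(xs, ys):
    """Return (R squared, MSE) for the least squares line through the data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))   # due to error
    sst = sum((y - mean_y) ** 2 for y in ys)                    # total variation
    ssr = sst - sse                                             # due to regression
    return ssr / sst, sse / (n - 2)

xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 4, 5, 4, 5]
r_squared, mse = regression_fit_quality(xs, ys)
print(round(r_squared, 2), round(mse, 2))
```

Unlike MSE, the coefficient of determination has no units, so it can be compared across models and data sets.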
Theory of Probability
S. No. | Reference No. | Particulars | Slide From – To
1. | - | Learning Objectives | 116 – 116
2. | Topic 1 | Introduction | 117 – 117
3. | Topic 2 | Important Terms in Probability | 118 – 119
4. | Topic 3 | Kinds of Probability | 120 – 120
5. | Topic 4 | Simple Propositions of Probability | 121 – 125
6. | Topic 5 | Addition Theorem of Probability | 126 – 127
7. | Topic 6 | Multiplication Theorem of Probability | 128 – 128
8. | Topic 7 | Conditional Probability | 129 – 129
9. | Topic 8 | Law of Total Probability | 130 – 131
10. | Topic 9 | Independence of Events | 132 – 132
11. | Topic 10 | Combinatorial Concept | 133 – 133
12. | Topic 11 | Summary | 134 – 134
1– 115
Learning Objectives
After studying this chapter, you should be able to:

Understand the meaning and important terms of probability

Learn about addition theorem and multiplicative theorem of probability

Understand the concept of independence of events, combinatorial concepts like
permutation and combination

Solve problems of conditional probability and Bayes’ Theorem and other
concepts of probability
1– 116
Introduction

A probability is a quantitative measure of risk.

This chapter provides exposure to fundamental concepts, since probability is
inseparable from statistical methods.
1– 117
Important Terms in Probability
Probability and sampling are inseparable parts of statistics.
Random Experiment
Random experiment is an experiment whose outcome is not
predictable in advance.
Cont….
1– 118
Sample Space
1– 119

Event

Event Space

Union of events

Intersection of events

Mutually exclusive events

Collectively exhaustive events

Complement of event
Kinds of Probability
Classical Probability
Relative Frequency Probability
Axiomatic Probability
Subjective Probability
1– 120
Simple Propositions of Probability
Proposition 1
P (E^C) = 1 – P (E)
Probability of complement: Let event E^C denote the complement of the event E. By
definition of complement, E^C has all elements from the sample space S that are not in E. Thus,
E and E^C are mutually exclusive and collectively exhaustive. Therefore, by axiom 2 and 3 we
have,
1 = P(S) = P (E ∪ E^C) = P (E) + P (E^C)
or,
P (E^C) = 1 - P (E)
Cont….
1– 121
Proposition 2
If E ⊂ F, then P (E) ≤ P (F)
If the event E is contained in event F, that is, E ⊂ F, then we can express,
F = E ∪ (E^C ∩ F).
However, as events E and (E^C ∩ F) are mutually exclusive, we get,
P (F) = P (E) + P (E^C ∩ F)
But, by axiom 1, P (E^C ∩ F) ≥ 0. Therefore, we have proved the proposition,
P (E) ≤ P (F)
Cont….
1– 122
Proposition 3
P (E ∪ F) = P (E) + P (F) – P (E ∩ F)
Probability of unions: Event E ∪ F can be written as the union of the two
disjoint events namely E and (E^C ∩ F). Thus, from axiom 3,
P (E ∪ F) = P [E ∪ (E^C ∩ F)] = P (E) + P (E^C ∩ F) (1)
Also, F = (E ∩ F) ∪ (E^C ∩ F), hence,
P (F) = P (E ∩ F) + P (E^C ∩ F) (2)
From (1) and (2) we get the proposition 3 as,
P (E ∪ F) = P (E) + P (F) - P (E ∩ F)
Extended statement of this proposition for n events is also called the inclusion-exclusion principle. For three events,
P(E ∪ F ∪ G) = P(E) + P(F) + P(G) – P(E∩F) – P(F∩G) – P(E∩G) + P(E∩F∩G)
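These propositions can be checked on a small finite sample space. The sketch below uses one roll of a fair die with equally likely outcomes; the events E and F and the helper name are hypothetical choices for illustration.

```python
from fractions import Fraction

S = set(range(1, 7))   # sample space: one roll of a fair die
E = {2, 4, 6}          # hypothetical event: an even number
F = {4, 5, 6}          # hypothetical event: a number greater than 3

def prob(event):
    """Classical probability: favourable outcomes over total outcomes."""
    return Fraction(len(event & S), len(S))

union_direct = prob(E | F)
union_by_formula = prob(E) + prob(F) - prob(E & F)
complement_E = S - E
print(union_direct, union_by_formula, prob(complement_E))
```

Exact fractions are used so that both sides of each identity can be compared without rounding.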
Cont….
1– 123
Proposition 4
Mutually exclusive events: When the sets corresponding to two events are
disjoint (have no common elements, or the intersection is null), the two events
are called mutually exclusive.
E ∩ F = Φ Therefore,
P (E ∩ F) = P (Φ) = 0
Also, for mutually exclusive events E and F,
P (E ∪ F) = P (E) + P (F)
Cont….
1– 124
Proposition 5
P (E^C ∩ F) = P (F) – P (E ∩ F)
From set theory, F can be written as a union of two disjoint events E ∩ F and
E^C ∩ F. Hence, by Axiom III, we have, P(F) = P(E ∩ F) + P(E^C ∩ F). By
re-arranging the terms we get the result.
1– 125
Addition Theorem of Probability

The addition theorem in the probability concept is the process of determination
of the probability that either event ‘A’ or event ‘B’ occurs or both occur. The notation
between two events ‘A’ and ‘B’ the addition is denoted as ‘∪’ and pronounced as
Union.
Let A and B be two events defined in a sample space. The union of events
A and B is the collection of all outcomes that belong either to A or to B or
to both A and B and is denoted by A or B.
Cont….
1– 126
The result of this addition theorem is generally written using set notation,
P (A ∪ B) = P (A) + P (B) – P (A ∩ B),
Where, P (A) = probability of occurrence of event ‘A’
P (B) = probability of occurrence of event ‘B’
P (A ∪ B) = probability of occurrence of event ‘A’ or event ‘B’.
P (A ∩ B) = probability of occurrence of both event ‘A’ and event ‘B’.
The addition theorem of probability can be proved as follows: Let ‘A’ and ‘B’ be subsets of a
finite non-empty set ‘S’ of equally likely outcomes, and let n (·) denote the number of outcomes
in a set. Then, by counting,
n (A ∪ B) = n (A) + n (B) – n (A ∩ B),
On dividing both sides by n (S), we get
n (A ∪ B) / n (S) = n (A) / n (S) + n (B) / n (S) – n (A ∩ B) / n (S) (1),
which is the addition theorem, since each ratio n (·) / n (S) is the corresponding probability.
1– 127
Multiplication Theorem of Probability

Probability is the branch of mathematics which deals with the occurrence of
samples. The basic form of Multiplication theorems on probability for two
events ‘X’ and ‘Y’ can be stated as,
P (x . y) = P (y) . P (x / y)

Here p (x) and p (y) are the probabilities of occurrences of events ‘x’ and ‘y’
respectively.
P (x / y) is the Conditional Probability of ‘x’ and the condition is that ‘y’ has
occurred before ‘x’.
P (x / y) is always calculated after ‘y’ has occurred. Here, occurrence of ‘x’
depends on ‘y’. ‘y’ has changed some events already. So, occurrence of ‘x’ also
changes.
1– 128
Conditional Probability

Conditional probability is the probability that an event will occur given that
another event has already occurred. If A and B are two events, then the
conditional probability of A given B is written as P(A/B) and read as “the
probability of A given that B has already occurred.”
1– 129
Law of Total Probability

Consider two events, E and F. Whatsoever be the events, we can
always say that the probability of E is equal to the probability of
intersection of E and F, plus, the probability of the intersection
of E and complement of F. That is,
P (E) = P (E ∩ F) + P (E ∩ F^C)
1– 130
Bayes’s Formula
Let E and F be events.
E = (E ∩ F) ∪ (E ∩ F^C)
Any element in E must be either in both E and F or in E but not in F. (E ∩ F) and
(E ∩ F^C) are mutually exclusive, since the former must be in F and the latter must not be in F, so we have by
Axiom 3,
P (E) = P (E ∩ F) + P (E ∩ F^C) = P (E/F) × P (F) + P (E/F^C) × P (F^C) = P (E/F) × P (F) + P (E/F^C) × [1 – P (F)]
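A small numerical sketch of this result is given below. It also applies the standard form of Bayes' formula, P (F/E) = P (E/F) × P (F) / P (E), which is assumed here rather than taken from the slide. The function names and the machine and defect figures are hypothetical.

```python
from fractions import Fraction

def total_probability(p_f, p_e_given_f, p_e_given_not_f):
    """P(E) = P(E/F) P(F) + P(E/F^C) [1 - P(F)]."""
    return p_e_given_f * p_f + p_e_given_not_f * (1 - p_f)

def bayes(p_f, p_e_given_f, p_e_given_not_f):
    """P(F/E): probability of F once E is known to have occurred."""
    p_e = total_probability(p_f, p_e_given_f, p_e_given_not_f)
    return p_e_given_f * p_f / p_e

# Hypothetical figures: machine F makes 30% of all items; 5% of its items are
# defective (event E), against 2% for items from the other machines.
p_f = Fraction(3, 10)
p_e_given_f = Fraction(1, 20)
p_e_given_not_f = Fraction(1, 50)

p_e = total_probability(p_f, p_e_given_f, p_e_given_not_f)
p_f_given_e = bayes(p_f, p_e_given_f, p_e_given_not_f)
print(p_e, p_f_given_e)
```

With these figures a defective item is slightly more likely than not to have come from machine F, even though that machine makes only 30% of the output.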
1– 131
Independence of Events
1– 132
Combinatorial Concept
1 Product Rule of Counting
2 Sum Rule of Counting
3 Permutation
4 Combination
1– 133
Summary

In this chapter, we discussed the basic idea of probability. We defined probability
in different ways and pointed out serious limitations of each definition.

Then we discussed axioms of probability, which are the backbone of the theory
of probability. Then we studied a number of useful propositions of probability.

We also defined conditional probability, law of total probability, and Bayes’
Theorem. We also defined mutually exclusive events, and independence of
events.

Lastly, we discussed a few important concepts of combinatorial analysis, which
come in very handy while calculating probability of an event.
1– 134
Probability Distribution
S. No. | Reference No. | Particulars | Slide From – To
1. | - | Learning Objectives | 136 – 136
2. | Topic 1 | Introduction | 137 – 137
3. | Topic 2 | Random Variable | 138 – 139
4. | Topic 3 | Probability Distributions of Standard Random Variables | 140 – 140
5. | Topic 4 | Bernoulli Distribution | 141 – 142
6. | Topic 5 | Binomial Distribution | 143 – 145
7. | Topic 6 | Poisson Distribution | 146 – 147
8. | Topic 7 | Normal Distribution | 148 – 149
9. | Topic 8 | Summary | 150 – 153
1– 135
Learning Objectives
After studying this chapter, you should be able to:

Differentiate between discrete and continuous random variables

Discuss probability distributions of standard random variable

Understand discrete probability distribution which include Binomial and Poisson
Distribution

Explain continuous probability distribution which includes Normal distribution
1– 136
Introduction

We will study a few common distributions in this chapter.

Normal distribution has extensive use in statistical tools and therefore readers
are advised to study it in detail.

Knowledge of sequences, series and calculus is expected.
1– 137
Random Variable
A random variable, usually written X, is a variable whose possible values are numerical
outcomes of a random phenomenon.
Cont….
1– 138
Discrete and Continuous Random Variables
Probability Mass Function (P.M.F.)
Probability Density Function
Cumulative Distribution Function
Expectation Value of Random Variables
Expected Value of a Function of a Random Variable
Variance and Standard Deviation of Random Variable
1– 139
Probability Distributions of Standard Random Variables
1 Bernoulli Distribution
2 Binomial Distribution
3 Poisson Distribution
4 Normal Distribution
1– 140
Bernoulli Distribution

It is a basis of many discrete random variables, as it deals with individual
trial. It is a building block for other random variables. It is a single trial
distribution.
Cont….
1– 141
Application of Bernoulli Distribution
Bernoulli trial is fundamental to many discrete distributions like Binomial,
Poisson, Geometric, etc. Situations where Bernoulli distribution is commonly
used are:

Sex of newborn child; Male = 0, Female = 1 say.

Items produced by a machine are Defective or Non-defective.

During next flight an engine will fail or remain serviceable.

Student appearing for examination will pass or fail.
1– 142
Binomial Distribution
A binomial random variable is the number of successes x in n repeated trials
of a binomial experiment. The probability distribution of a binomial random
variable is called a binomial distribution (the Bernoulli distribution is its
special case with a single trial, n = 1).
Cont….
1– 143
Applications of Binomial Distribution

Trials are finite (and not very large), performed repeatedly for ‘n’ times.

Each trial (random experiment) should be a Bernoulli trial, the one that results in either
success or failure.

Probability of success in any trial is ‘p’ and is constant for each trial.

All trials are independent.
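Under these four conditions the probability of exactly x successes is usually written P(X = x) = nCx p^x (1 – p)^(n – x). The sketch below assumes that standard form; the function name and the coin-tossing figures are hypothetical.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent Bernoulli trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n_trials, p_success = 5, 0.5   # hypothetical: five tosses of a fair coin
p_two_heads = binomial_pmf(2, n_trials, p_success)
total = sum(binomial_pmf(x, n_trials, p_success) for x in range(n_trials + 1))
print(p_two_heads, total)
```

The probabilities over x = 0, 1, …, n add up to 1, as they must for a probability distribution.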
Cont….
1– 144
Following are some of the real life examples of applications of binomial distribution.

Number of defective items in a lot of n items produced by a machine.

Number of male births out of n births in a hospital.

Number of correct answers in a multiple-choice test.

Number of seeds germinated in a row of n planted seeds.

Number of re-captured fish in a sample of n fishes.

Number of missiles hitting the targets out of n fired.
1– 145
Poisson Distribution
A random variable X, taking one of the values 0, 1, 2 … is said to
be a Poisson random variable with parameter λ, if for some λ > 0,
P(X = i) = e^(–λ) λ^i / i!,  i = 0, 1, 2, …
P(X = i) is a probability mass function (p.m.f.) of the Poisson random
variable. Its expected value and variance are,
μ = E [X] = λ
Var [X] = λ
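The p.m.f. and the two results for the mean and variance can be checked numerically. The sketch assumes the standard form P(X = i) = e^(–λ) λ^i / i!; the function name and the value λ = 2 are hypothetical, and the infinite sums are truncated at 60 terms, which is more than enough for this λ.

```python
from math import exp, factorial

def poisson_pmf(i, lam):
    """P(X = i) for a Poisson random variable with parameter lam."""
    return exp(-lam) * lam ** i / factorial(i)

lam = 2.0   # hypothetical average number of occurrences per interval
terms = range(60)
mean = sum(i * poisson_pmf(i, lam) for i in terms)
variance = sum((i - mean) ** 2 * poisson_pmf(i, lam) for i in terms)
print(round(poisson_pmf(0, lam), 4), round(mean, 6), round(variance, 6))
```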
Cont….
1– 146
Some of the common examples where Poisson random variable can be used to
define the probability distribution are:

Number of accidents per day on expressway.

Number of earthquakes occurring over fixed time span.

Number of misprints on a page.

Number of arrivals of calls on telephone exchange per minute.

Number of interrupts per second on a server.
1– 147
Normal Distribution
Equation For Normal Probability Curve
Standard Normal Distribution
Properties Of Normal Distribution
Areas Under Standard Normal Probability
Curve
Importance Of Normal Distribution
Cont….
1– 148
Area under the Normal Curve
1– 149
Summary

Random variable is a real valued function defined over a sample space with
probability associated with it. The value of the random variable is outcome of
an experiment. Random variables are neither ‘random’ nor ‘variable’.

In this chapter we discussed several important random variables, the associated
formulae, and problem solving using formulae. A discrete random variable is the one that takes
at the most countable values. A continuous random variable can take any real value.
Cont….
1– 150

We also discussed probability distributions of random variables. Binomial
distribution is used if an experiment is carried out for finite number of n independent
trials; all trials being Bernoulli trials with constant probability of success p.

Random variable will follow Poisson distribution if it is the number of occurrences of a
rare event during a finite period. Waiting time for a rare event is exponentially distributed.
Negative binomial distribution is used if numbers of Bernoulli trials are made to achieve
desired number of successes.
Cont….
1– 151

One of the continuous random variable required often is uniform random
variable. Waiting time for an event that occurs periodically follows uniform
distribution.

Normal probability distribution is the most important distribution in statistics.
We defined normal distribution with parameters (μ, σ) where μ is mean and σ is standard
deviation.

Further, we defined standard normal distribution, which is a special case of
normal distribution with parameters (0, 1).
Cont….
1– 152

We also discussed transformation of normal random variable X to standard
random variable Z using Z = (X – μ) / σ. Z distribution is very convenient for manual
calculation as we can use standard normal tables which are extensively plotted, to find
probability and interval.

Normal distribution is used as a model in many real world situations, both as a
continuous distribution or an approximation to discrete distributions like binomial or
Poisson.
1– 153
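The transformation to the standard normal variable, Z = (X – μ) / σ, and the table lookup that follows it can be sketched in Python. The cumulative probability is computed from the error function, which plays the role of the standard normal table; the function names and the marks example are hypothetical.

```python
from math import erf, sqrt

def z_score(x, mu, sigma):
    """Standardise x from a normal distribution with mean mu and SD sigma."""
    return (x - mu) / sigma

def standard_normal_cdf(z):
    """P(Z <= z) for the standard normal distribution."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Hypothetical example: marks are normal with mean 50 and standard deviation 10.
z = z_score(65, 50, 10)
p_below = standard_normal_cdf(z)
print(z, round(p_below, 4))
```

The printed probability corresponds to the area under the standard normal curve to the left of z, which is what a printed table of areas provides.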
Use of Excel Software for Statistical Analysis
S. No. | Reference No. | Particulars | Slide From – To
1. | - | Learning Objectives | 155 – 155
2. | Topic 1 | Introduction | 156 – 157
3. | Topic 2 | Introduction to Excel | 158 – 168
4. | Topic 3 | Entering Data in Excel | 169 – 169
5. | Topic 4 | Descriptive Statistics | 170 – 172
6. | Topic 5 | Basic Built-in Functions (Average, Mean, Mode, Count, Max and Min) | 173 – 177
7. | Topic 6 | Statistical Analysis | 178 – 182
8. | Topic 7 | Normal Distribution | 183 – 183
9. | Topic 8 | Brief about SPSS | 184 – 189
10. | Topic 9 | Summary | 190 – 194
1– 154
Learning Objectives
After studying this chapter, you should be able to:

Understand the basic concepts of using Microsoft Excel

Discuss how to enter data in Excel and use basic built-in functions

Gain knowledge about SPSS
1– 155
Introduction
The most popular software in the MS Office Suite includes the following:

Microsoft Word

Microsoft Excel

Microsoft PowerPoint

Microsoft Access

Microsoft Project Plan

Microsoft Outlook
1– 156
MICROSOFT OFFICE SUITE
Suite Product | Home and Student | Home and Business | Professional
--- | --- | --- | ---
Word 2010 | Included | Included | Included
Excel 2010 | Included | Included | Included
PowerPoint 2010 | Included | Included | Included
OneNote 2010 | Included | Included | Included
Outlook 2010 | - | Included | Included
Access 2010 | - | - | Included
Publisher 2010 | - | - | Included
1– 157
Introduction to Excel
Opening A Document

Click on File-Open (Ctrl+O) to open/retrieve an existing workbook; change the directory area or drive to look for files in other locations.

To create a new workbook, click on File-New-Blank Workbook.
1– 158
Saving And Closing A Document

To save your document with its current filename, location and file format, click on File - Save.

When you have finished working on a document you should close it. Go to
the File menu and click on Close.
1– 159
Excel Screen
Menu Bar in Excel
1– 160
Excel Screen
1– 161
Workbooks and Worksheets
Cell
Row
Column
Spreadsheet
Workbook
1– 162
Cell Name Box
Spreadsheet Tabs in Excel
1– 163
Moving Around the Worksheet
Margins
Orientation
Paper Size
Print Area
1– 164
Margin Options in Excel
1– 165
Orientation Options in Excel
1– 166
Print Area Selection
1– 167
Moving between Cells

While working with any Office productivity tool, the clipboard functions are
invaluable.

The most common clipboard functions are ‘Cut’, ‘Copy’ and ‘Paste’.

In the Microsoft Office suite, there are keyboard shortcuts for these functions.
KEYBOARD SHORTCUTS
Cut: Ctrl + X
Copy: Ctrl + C
Paste: Ctrl + V
1– 168
Entering Data in Excel

A new worksheet is a grid of rows and columns. The rows are labeled with numbers, and the columns are labeled with letters. Each intersection of a row and a column is a cell.
Entering Labels
Entering Values
Rounding Numbers that Meet Specified Criteria
Sorting by Columns
1– 169
Descriptive Statistics

Excel includes elaborate and customisable toolbars, for example the “standard” toolbar shown here:

Some of the icons are useful for mathematical computation. The “AutoSum” icon enters the formula “=SUM()” to add up a range of cells.

The “Function Wizard” icon gives you access to all the functions available.
1– 170

The “Graph Wizard” icon gives access to all graph types available, as shown in this display:
1– 171
Excel can be used to generate measures of location and variability for a variable. Suppose we wish to find descriptive statistics for the sample data 2, 4, 6 and 8.

Step 1: Select the Tools pull-down menu. If you see Data Analysis, click on this option; otherwise, click on the Add-Ins option to install the Analysis ToolPak.

Step 2: Click on the Data Analysis option.

Step 3: Choose Descriptive Statistics from the Analysis Tools list.

Step 4: When the dialog box appears, enter A1:A4 in the Input Range box. A1 refers to the value in column A and row 1; in this case this value is 2. Using the same technique, enter the other values until you reach the last one.

Step 5: Select an output range, in this case B1. Click on Summary Statistics to see the results. Select OK.
1– 172
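For comparison, the same kind of summary can be sketched outside Excel. The Python snippet below is only an illustration using the sample data 2, 4, 6 and 8 from the text; the function name `describe` and the selection of measures are assumptions, and it does not reproduce every line of Excel's Summary Statistics output.

```python
import statistics

data = [2, 4, 6, 8]  # the sample from the text, as it would sit in cells A1:A4

def describe(values):
    """Return a few common measures of location and variability."""
    return {
        "count": len(values),
        "sum": sum(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sample_variance": statistics.variance(values),  # divides by n - 1
        "sample_std_dev": statistics.stdev(values),
        "minimum": min(values),
        "maximum": max(values),
        "range": max(values) - min(values),
    }

summary = describe(data)
for name, value in summary.items():
    print(f"{name}: {value}")
```

For this sample the mean and median are both 5, and the sample variance is 20/3, roughly 6.67.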
Basic Built-in Functions (Average, Mean, Mode, Count, Max and Min)
Manual Equation Entry
1– 173
Arithmetic Functions in Excel
Function, Syntax and Description
1– 174
SUM Function
The SUM function is probably the most commonly used function in Excel. It comes in three
flavours in Excel, namely:
1 SUM()
2 SUMIF()
3 SUMIFS()
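The difference between the three flavours can be sketched in Python. This is an illustration only, not Excel's implementation: the function names are hypothetical, the sales data are made up, and Python functions stand in for Excel's criteria strings such as ">15".

```python
def excel_sum(values):
    """Like =SUM(range): add every value."""
    return sum(values)

def excel_sumif(criteria_values, predicate, sum_values=None):
    """Like =SUMIF(range, criteria, [sum_range]): add values whose row meets one condition."""
    if sum_values is None:
        sum_values = criteria_values
    return sum(s for c, s in zip(criteria_values, sum_values) if predicate(c))

def excel_sumifs(sum_values, *criteria):
    """Like =SUMIFS(sum_range, range1, criteria1, ...): every condition must hold for a row.

    Each item in criteria is a (values, predicate) pair.
    """
    total = 0
    for i, s in enumerate(sum_values):
        if all(predicate(values[i]) for values, predicate in criteria):
            total += s
    return total

# Hypothetical sales records, one entry per row
regions = ["North", "South", "North", "East"]
units = [10, 20, 30, 40]
amounts = [100, 250, 300, 400]

print(excel_sum(amounts))                                      # 1050
print(excel_sumif(regions, lambda r: r == "North", amounts))   # 400
print(excel_sumifs(amounts,
                   (regions, lambda r: r == "North"),
                   (units, lambda u: u > 15)))                 # 300
```

SUM() adds everything, SUMIF() filters on a single condition, and SUMIFS() adds a row only when all of its conditions are met.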
1– 175
Logical Functions
AND()
FALSE
IF()
TRUE
IFERROR()
OR()
NOT()
1– 176
Statistical Functions

Statistical functions are invaluable in mathematical calculations.

They can provide insights into trends, provide data for detailed analysis, and help identify gaps that need to be plugged.

Excel provides a wide range of functions that can be used to perform basic
statistical analyses.
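To show what the basic built-in functions named in this topic compute, here is a small Python sketch. It is an illustration only; the marks are hypothetical, and the Excel formulas in the comments are the usual counterparts rather than something taken from the slides.

```python
import statistics

marks = [72, 85, 85, 90, 64, 85, 72]  # hypothetical data in a single column

average = statistics.mean(marks)    # comparable to =AVERAGE(range)
median = statistics.median(marks)   # comparable to =MEDIAN(range)
mode = statistics.mode(marks)       # comparable to =MODE(range)
count = len(marks)                  # comparable to =COUNT(range) for numeric cells
maximum = max(marks)                # comparable to =MAX(range)
minimum = min(marks)                # comparable to =MIN(range)

print(average, median, mode, count, maximum, minimum)  # 79 85 85 7 90 64
```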
1– 177
Statistical Analysis
Creating Charts

Select the data range (only numbers) for which the chart needs to be created.

Under the Insert ribbon, in the Charts section, click on the type of chart you want to create and the category. Here the clustered chart has been used.

Select the chart and click on the Select Data button in the Data section of the Design tab.

In the Select Data Source dialog, select ‘Series 1’ and click on the Edit button.
1– 178
Select Data Source
1– 179
This opens the Edit Series dialog, which allows you to change the range of values in the series and provide a series name. For the series name, click on the icon to select the column title of Series 1.
Edit Series
1– 180
Histogram
Now follow the steps given below to draw a histogram.

Select the first two columns, i.e. class interval and frequency, in the Excel sheet.

Click on the ‘Chart Wizard’ icon on the tool bar or select [Insert → Chart…..] from the Insert drop-down menu. A dialogue box with the title ‘Chart Wizard – Step 1 of 4 – Chart Type’ will appear.

In the menu ‘Standard Type’, select ‘Column’. Click on the ‘Next’ button.

Now the next menu with the title ‘Chart Wizard – Step 2 of 4 – Chart Source Data’ will appear. Since we have already selected the source data, select ‘Next’. Don’t forget to check that column is selected in data series.

Now the next menu with the title ‘Chart Wizard – Step 3 of 4 – Chart Options’ will appear.
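The column chart produced by these steps draws one bar per class interval, with height equal to the frequency. The Python sketch below renders the same idea as text. It is an illustration only, not a substitute for the Chart Wizard; the function name and the class intervals and frequencies are hypothetical.

```python
def text_histogram(class_intervals, frequencies, symbol="#"):
    """Return one text line per class interval, with a bar as long as its frequency."""
    if len(class_intervals) != len(frequencies):
        raise ValueError("each class interval needs exactly one frequency")
    width = max((len(label) for label in class_intervals), default=0)
    return [
        f"{label.ljust(width)} | {symbol * freq} ({freq})"
        for label, freq in zip(class_intervals, frequencies)
    ]

# Hypothetical frequency table: the two columns selected in the first step
classes = ["0-10", "10-20", "20-30", "30-40"]
freqs = [3, 7, 5, 2]

for line in text_histogram(classes, freqs):
    print(line)
```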
1– 181
Correlation Plot and Regression Analysis
Using MS Excel for calculating Karl Pearson’s correlation coefficient
Calculating Karl Pearson’s correlation coefficient using MS Excel is very simple. The steps are as follows:

Open an Excel worksheet and enter the data values of X and Y variables as two arrays (columns or rows). Keep these contiguous if possible.

Select the cell where you want to store the result r. Enter the formula with syntax as ‘=CORREL(array1, array2)’, where ‘array1’ is a cell range of values and ‘array2’ is a second cell range of values.
1– 182
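The calculation behind Karl Pearson’s correlation coefficient can be sketched in Python as follows. This is an illustration of the standard formula, not Excel's own code; the function name `correl` simply mirrors the Excel function, and the X and Y values are hypothetical.

```python
import math

def correl(array1, array2):
    """Karl Pearson's correlation coefficient r between two equal-length sequences."""
    if len(array1) != len(array2) or len(array1) < 2:
        raise ValueError("both arrays need the same length, with at least two values")
    n = len(array1)
    mean_x = sum(array1) / n
    mean_y = sum(array2) / n
    # Sums of products and squares of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(array1, array2))
    sxx = sum((x - mean_x) ** 2 for x in array1)
    syy = sum((y - mean_y) ** 2 for y in array2)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data for the X and Y columns
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(correl(x, y), 4))  # 0.7746
```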
Normal Distribution
NORMDIST returns the normal distribution for the specified mean and standard deviation. This
function has a very wide range of applications in statistics, including hypothesis testing.
Syntax: NORMDIST(x,mean,standard_dev,cumulative)

X is the value for which you want the distribution.

Mean is the arithmetic mean of the distribution.

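Standard_dev is the standard deviation of the distribution.

Cumulative is a logical value that determines the form of the function: TRUE returns the cumulative distribution function, and FALSE returns the probability density function.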
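The behaviour of NORMDIST can be approximated with a short Python sketch. This is an assumption-laden illustration, not Excel's implementation: the function name `normdist` is hypothetical, and the last argument follows Excel's convention that TRUE gives the cumulative probability and FALSE gives the density.

```python
import math

def normdist(x, mean, standard_dev, cumulative):
    """Normal distribution at x: cumulative probability if cumulative is True, else the density."""
    if standard_dev <= 0:
        raise ValueError("standard_dev must be positive")
    z = (x - mean) / standard_dev
    if cumulative:
        # P(X <= x), via the error function
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    # Height of the normal curve at x
    return math.exp(-0.5 * z * z) / (standard_dev * math.sqrt(2.0 * math.pi))

print(normdist(50, 50, 10, True))              # 0.5
print(round(normdist(1.96, 0, 1, True), 4))    # 0.975
print(round(normdist(0, 0, 1, False), 4))      # 0.3989
```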
1– 183
Brief about SPSS
SPSS Statistics is a software package used for statistical analysis.
SPSS Files

SPSS uses several types of files. First, there is the file that contains data view and variable view. These have been entered using SPSS Data Editor Window. It is known as an SPSS system file.
1– 184
SPSS Data Editor Window – Data View
1– 185
Data Editor Window – Variable View
1– 186
Define Variable Dialog Box
Student Motivation
Not willing
Undecided
Willing
1– 187
Value Labels – Dialog Box
Value Labels Coded with Value and Value Label
1– 188
SPSS Data Editor Window with all Record Entered
1– 189
Summary

Microsoft Office is one of the most powerful office productivity tools in the market today. The entire suite is vast and covers a wide range of software solutions catering to various aspects of modern businesses.

Microsoft Excel is a powerful accounting and calculation solution. It has a standard tabular layout and it supports a wide range of arithmetic, accounting and statistical functions.

Microsoft Outlook is the mail client that can be set up to download mails from a mail server as well as send and receive emails as desired. Being a part of the Microsoft Office suite, this tool is compatible with other applications in the suite.
1– 190

One of the most popular and widely used Microsoft Office suites is MS Office 2003. Later, Microsoft released two other versions of Office, namely Office 2007 and Office 2010. Although Office 2010 is the latest version, many businesses still continue to use Office 2003. From Office 2003 to Office 2007, Microsoft radically changed the overall look and feel of the office suite.

Excel is built on the concept of cells, rows, columns, spreadsheets and workbooks. The entire structure is hierarchical, and this allows it to be scalable and versatile enough to adapt to varying needs for users from different specialisations. Understanding the following concepts is pretty useful in developing complex reports and models.
1– 191

As long as you work on the soft copies, page layouts are not really important, since you can scroll a spreadsheet to view the contents. However, when it comes to printouts, it is important that one gets the page layouts sorted out. Excel 2010 has all the page layout options under the Page Layout menu item.

While working with any Office productivity tool, the clipboard functions are invaluable. The most common clipboard functions are ‘Cut’, ‘Copy’ and ‘Paste’. In the Microsoft Office suite, there are keyboard shortcuts for these functions. Once you become conversant with the Excel functions, you would prefer to use the keyboard shortcuts as they are faster and easier to use than the mouse.
1– 192

A new worksheet is a grid of rows and columns. The rows are labelled with numbers, and the columns are labelled with letters. Each intersection of a row and a column is a cell. Each cell has an address, which is the column letter and the row number. The arrow on the worksheet to the right points to cell A1, which is currently highlighted, indicating that it is an active cell. A cell must be active to enter information into it.
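The cell-address scheme described above (column letter followed by row number) can be sketched in Python. This is only an illustration of the naming convention; the function names are hypothetical and nothing here comes from Excel itself.

```python
import re

def parse_address(address):
    """Split an address such as 'A1' into (column_number, row_number), both starting at 1."""
    match = re.fullmatch(r"([A-Za-z]+)([1-9][0-9]*)", address)
    if match is None:
        raise ValueError(f"not a valid cell address: {address!r}")
    letters, digits = match.groups()
    column = 0
    for ch in letters.upper():
        # Column letters behave like a base-26 number without a zero digit
        column = column * 26 + (ord(ch) - ord("A") + 1)
    return column, int(digits)

def make_address(column, row):
    """Build an address from 1-based column and row numbers, e.g. (1, 1) gives 'A1'."""
    if column < 1 or row < 1:
        raise ValueError("column and row must be at least 1")
    letters = ""
    while column > 0:
        column, remainder = divmod(column - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return f"{letters}{row}"

print(parse_address("A1"))    # (1, 1)
print(parse_address("AB12"))  # (28, 12)
print(make_address(28, 12))   # AB12
```

After column Z the letters continue with AA, AB and so on, which is why AB is the 28th column.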

Excel is a very powerful accounting tool, but before going to the real complex functions, let us see how to use Excel for simple calculations. There are two ways of using Excel for simple calculations: you can enter the actual arithmetic equations in the cell or use pre-defined Excel formulas to do the same.
1– 193

Statistical calculations for exponential random variables could be calculated using statistical functions available in MS Excel. NORMDIST returns the normal distribution for the specified mean and standard deviation. This function has a very wide range of applications in statistics, including hypothesis testing.
Syntax: NORMDIST(x,mean,standard_dev,cumulative)

SPSS Statistics is a software package used for statistical analysis. Long produced by SPSS Inc., it was acquired by IBM in 2009. The current versions (2014) are officially named IBM SPSS Statistics. Companion products in the same family are used for survey authoring and deployment (IBM SPSS Data Collection), data mining (IBM SPSS Modeler), text analytics, and collaboration and deployment (batch and automated scoring services).
1– 194
1– 195