Subject: BUSINESS ECONOMICS
Paper No. and Title: 2, Applied Business Statistics
Module No. and Title: 9, Chebyshev’s Inequality Theorem, Law of Large Numbers and Central Limit Theorem
Module Tag: BSE_P2_M9
TABLE OF CONTENTS
1. Learning Outcomes
2. Introduction
3. Chebyshev’s Theorem
3.1 Some important statistics
3.2 Relation between mean and standard deviation
3.3 Statement of the theorem
3.4 Area under the curve according to Chebyshev’s theorem and the Normal Distribution
3.5 Examples
4. Law of Large Numbers
4.1 Defining expectation and variance of sum
4.2 Statement of the theorem
4.3 Weak law of large numbers and strong law of large numbers
4.4 Illustration
5. Central Limit Theorem
5.1 Statement of the theorem
5.2 Size of sample
5.3 Significance of the theorem
6. Summary
1. Learning Outcomes
You shall learn about some commonly used statistics in probability:
 The relationship between the mean of a sample, its variance and the sample size.
 Three fundamental theorems which are widely used in obtaining the probability of occurrence of an event: Chebyshev’s theorem, the Law of Large Numbers and the Central Limit Theorem.
 The significance of these theorems, shown through illustrations.
2. Introduction
In earlier modules you have studied discrete and continuous probability distributions. To
recapitulate, the Bernoulli variable and its distribution, the Binomial distribution and the Poisson
distribution are discrete distributions, whereas the Normal and Exponential distributions are
continuous distributions. You also know the characteristics of each of these distributions, that is,
their mean and variance. In this module we will study the relationship between the mean, the variance
and the sample size in a group of data. We will also study how mathematicians of the seventeenth to
nineteenth centuries in Europe and Russia established the importance of large numbers, or bigger samples,
in the study of probability.
3. Chebyshev’s Theorem
3.1 Some important statistics
We know that the Mean, or Expectation, measures the central tendency of a random variable, and the
Variance tells us about the dispersion, or variability, of the random values around the mean. Further, a
measure of variability in the original units is more convenient in applications than one in
squared units. This led to the introduction of the Standard Deviation in statistics, which is the
positive square root of the variance.
3.2 Relation between mean and standard deviation
A small standard deviation tells us that the sample values deviate little from the
mean value. In probability terms, the area under the curve of a continuous distribution
with a large standard deviation indicates greater variability around the mean, and
therefore the area under the curve is spread out, whereas a distribution with a smaller standard
deviation has most of its area close to its mean. The same is true for a discrete probability
histogram. With this knowledge, can we answer problems such as: what is the probability that a
variable X lies in a specified interval centred around its mean value, or how wide an interval
around the mean is required so that 95% of all the values are included in the interval? To answer
such questions it is necessary to learn more about the relationship between the mean and the standard
deviation of a probability distribution.
3.3 Statement of the Theorem
In the nineteenth century the Russian mathematician P. L. Chebyshev (1821–1894) showed that the
fraction of the area between any two values symmetric around the mean is related to the standard
deviation. Since the area under a probability density curve, or in a probability histogram, adds
to one, the area between any two points is the probability that the variable lies between these two
points. The probability that a random variable lies within a given interval can therefore be
measured through the area that lies under its density curve, the graph of f(x), within the specified
interval. Chebyshev gave an estimate of the probability that a random variable assumes a value
within ± k standard deviations of its mean, for any real number ‘k’. This remarkable theorem
due to Chebyshev is popularly known as Chebyshev’s Inequality Theorem. The theorem is
stated as: if X is a random variable with finite mean (expected value) µ and finite standard
deviation σ, then the probability that X assumes a value outside the interval (µ − kσ, µ + kσ)
is never more than 1/k². That is:

P(|X − µ| > kσ) ≤ 1/k² -------------- (1)

In other words, the theorem asserts that the probability that a random variable X deviates
absolutely from its mean by more than kσ is always less than or equal to 1/k². For example, the
probability that a value of a random variable deviates from its mean by more than 2σ is at most ¼ =
0.25. In other words, no matter what the shape of the distribution, at most 25% of the values
fall outside (µ ± 2σ).
An alternative way to express Chebyshev’s inequality theorem is:

P(|X − µ| ≤ kσ) ≥ 1 − 1/k² ----------- (2)

At least (1 − 1/k²) is the probability that any random variable X lies within kσ
of its mean. Note that equations (1) & (2) are equivalent statements. This is true both for normally
distributed data and for data which is not normal, or skewed.
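A quick way to see inequality (1) at work is to test it empirically. The Python sketch below is only an illustration added here; the skewed (exponential) population, the seed and the sample size are assumptions, not part of the module.

import numpy as np

# Minimal sketch: empirically check Chebyshev's bound
# P(|X - mu| > k*sigma) <= 1/k**2 on a skewed (exponential) sample.
rng = np.random.default_rng(42)            # assumed seed, for reproducibility
x = rng.exponential(scale=2.0, size=100_000)

mu, sigma = x.mean(), x.std()
for k in (1.5, 2, 3):
    observed = np.mean(np.abs(x - mu) > k * sigma)   # empirical tail probability
    bound = 1 / k**2                                 # Chebyshev's upper bound
    print(f"k = {k}: observed {observed:.4f} <= bound {bound:.4f}")

For every k the observed tail fraction stays below 1/k², even though the data are far from normal.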
3.4 Area under the curve according to Chebyshev’s theorem and the Normal Distribution
According to Chebyshev’s theorem, at least 75% of all the data points of a group of data fall within
± 2σ of the mean, irrespective of the shape of the distribution. That is, if k = 2 then

1 − 1/k² = 1 − 1/4 = 0.75

When data is normally distributed we know that 95% of the data points lie within the interval
(µ ± 2σ). Further, according to Chebyshev’s theorem at least 89% of data points lie within
three standard deviations of the mean, whereas for the normal distribution 99.7% of the data points
lie within three standard deviations of the mean.
This difference is due to the fact that Chebyshev’s theorem is applicable to all types of data,
whether skewed, multimodal or normal. Figure 1 below illustrates the application of
Chebyshev’s theorem for two and three standard deviations when the data is not normally distributed,
and Figure 2 illustrates the same for two and three standard deviations when the data is normally distributed.
Figure 1: Chebyshev’s bounds for two and three standard deviations when the data is not normally distributed
Figure 2: Areas within two and three standard deviations of the mean when the data is normally distributed
Since Chebyshev’s theorem is relegated to situations where the form of the distribution does not matter
or is unknown, its results are usually weak.
Further, when k < 1, then 1/k² > 1 and 1 − 1/k² < 0. For example, if k = 0.5 then
1/k² = 1/0.25 = 4 and 1 − 1/k² = 1 − 4 = −3, whereas when k = 2, 1/k² = 0.25 and
1 − 1/k² = 1 − 0.25 = 0.75.
Since we know that the probability of occurrence of any event ranges from 0 to 1, Chebyshev’s
inequality theorem is applicable only when ‘k’ is greater than 1, and is trivial for
values of ‘k’ less than or equal to 1.
Examples:
Example 1:
The average age of a professional employed by a software company is 28 years, with a standard
deviation of 5 years. A histogram of the ages of the professionals employed with this company reveals
that the data are not normally distributed but rather skewed towards 20, with a few employees
towards 40. Apply Chebyshev’s theorem to determine within what range of age at least 85% of
the professionals’ ages would fall.
Solution:
According to Chebyshev’s theorem, at least 1 − 1/k² of the values lie within the interval (µ ± kσ).
Therefore 1 − 1/k² = 0.85.
Solving for k gives 1/k² = 0.15, or k² = 6.667, or k = 2.58.
Therefore at least 85% of the values lie within ± 2.58σ of the mean.
Given µ = 28 and σ = 5, at least 85% of the values will be within the age group 28 ± 2.58(5)
= 28 ± 12.9, or between 15.1 and 40.9 years of age.
Example 2:
What is the smallest number of standard deviations from the mean that we must go to ensure
that we capture at least 50% of the data of the distribution?
Solution:
Here we have to use Chebyshev’s inequality and work backwards. We want at least 50% of the data of the
distribution, so 0.50 = 1/2 = 1 − 1/k². Solving for ‘k’, we get 1/2 = 1/k². By cross
multiplying we get k² = 2. Taking the square root of both sides, and ignoring the negative solution
since k is a number of standard deviations, we find that k equals the square root of 2, which is
approximately 1.4. Therefore at least 50% of the data will be within approximately
1.4 standard deviations of the mean.
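Both examples amount to inverting 1 − 1/k² = p for k. The Python sketch below reworks the two examples in code; it is illustrative only and the helper name chebyshev_k is hypothetical.

import math

# Invert Chebyshev's bound 1 - 1/k**2 = p to find the number of
# standard deviations k that guarantees coverage p.
def chebyshev_k(coverage: float) -> float:
    if not 0 <= coverage < 1:
        raise ValueError("coverage must lie in [0, 1)")
    return math.sqrt(1 / (1 - coverage))

# Example 1: 85% coverage around mean 28 with standard deviation 5
k = chebyshev_k(0.85)
print(f"k = {k:.2f}, interval = ({28 - k*5:.1f}, {28 + k*5:.1f})")   # ~ (15.1, 40.9)

# Example 2: 50% coverage
print(f"k = {chebyshev_k(0.50):.2f}")                                # ~ 1.41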
4. Law of Large Numbers
4.1 Defining expectation and variance of sum
Before defining the law of large numbers, recall that if Xi, where i = 1, 2, 3, ..., n, are random
variables, then their sum S = X1 + X2 + ... + Xn is also a random variable. If the Xi are
identically distributed then they possess a common mean µ and a common variance σ². The
expectation of S, E(S), equals the sum of the expectations of all the individual random variables.
That is:
E(S) = E(X1) + E(X2) + ... + E(Xn) = nµ.
Further, if the Xi are not only identically distributed but also independent, then the additive property
also holds for the variance of S. This implies that V(S) is the sum of the variances of the
individual Xi. That is:
V(S) = V(X1) + V(X2) + ... + V(Xn) = nσ².
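These two additive properties can be verified by simulation. The following Python sketch is illustrative only; the normal population, the parameters n = 25, µ = 3 and σ = 2, and the seed are assumptions chosen for the demonstration.

import numpy as np

# Verify E(S) = n*mu and V(S) = n*sigma**2 for the sum of n i.i.d. variables.
rng = np.random.default_rng(0)    # assumed seed
n, mu, sigma = 25, 3.0, 2.0       # assumed parameters

samples = rng.normal(mu, sigma, size=(100_000, n))   # each row: X1, ..., Xn
s = samples.sum(axis=1)                               # the sum S for each replication

print(f"E(S): simulated {s.mean():.2f}  vs  n*mu = {n * mu:.2f}")
print(f"V(S): simulated {s.var():.2f}  vs  n*sigma^2 = {n * sigma**2:.2f}")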
4.2 Statement of the theorem
The first general formulation of the Law of Large Numbers was given by Jacques Bernoulli and was
published posthumously in 1713. Bernoulli proved the law of large numbers, which is now one of
the fundamental theorems in probability, statistics and actuarial science. Bernoulli’s law of large
numbers states that if ‘s’ represents the number of successes in ‘n’ identical Bernoulli trials with
probability of success ‘p’ in each trial, then the relative frequency ‘s/n’ is very likely to be close
to ‘p’ when ‘n’ is a sufficiently large fixed integer. That is:
P(|s/n − p| < δ) → 1 as n → ∞
or
P(|s/n − p| > δ) → 0 as n → ∞,
where δ > 0.
These are two equivalent ways of expressing the theorem.
The general mathematical version of the Law of Large Numbers was first published by Khintchine in 1929.
It states that if Xi, i = 1, 2, ..., n, are independent and identically distributed random
variables and E(Xi) = µ exists, then as ‘n’ becomes a very large integer the probability that
the sample mean X̄ will differ from the common expectation µ of the Xi by more than any
arbitrarily prescribed small difference ε is very close to zero.
Mathematically this is written as P(|X̄ − µ| > ε) → 0 as n → ∞, or alternatively
P(|X̄ − µ| < ε) → 1 as n → ∞. This simply tells us that when ‘n’ is a very large integer, the
probability is very close to 1 that X̄ is close to µ; that is, X̄ converges to µ. The larger the sample
size, the closer the sample mean is to the population mean.
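A small simulation makes the statement concrete. The Python sketch below is illustrative only; the success probability p = 0.3 and the seed are assumed values.

import numpy as np

# As n grows, the relative frequency s/n of i.i.d. Bernoulli(p) trials
# settles close to p, as the law of large numbers states.
rng = np.random.default_rng(7)    # assumed seed
p = 0.3                           # assumed success probability

for n in (10, 100, 10_000, 1_000_000):
    trials = rng.random(n) < p            # n identical Bernoulli trials
    print(f"n = {n:>9}: s/n = {trials.mean():.4f} (p = {p})")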
4.3 Weak law of large numbers and strong law of large numbers
The Bernoulli’s law of large numbers is also known as the ‘weak law of large numbers’ as it does
not say that when n → ∞ mean or s/n or the E(X) gets stabilized at µ or ‘p’. It leaves open that
(| - µ| > ε) can happen a number of times although at infrequent intervals. The strong law of
large numbers states that when n → ∞ mean;
or s/n or the E(X) converges surely to µ or ‘p’.
That is P ( = µ) = 1 as n → ∞. In other words; when n → ∞ sample mean will converge to the
population mean µ with probability 1. Thus law of large numbers are said to be strong because it
guarantees stable long term results of averages of random events. This is certainly a strong
mathematical statement.
4.4 Illustration
Bernoulli may have thought that this concept of large numbers is self-evident, but it is
nevertheless striking. For illustration, consider the game of golf. One cannot say with
any certainty at all where one particular golf ball, once struck, will end up. Still, one can say
with very high accuracy how 1,000 struck balls will end up. Better yet, the more balls that are
struck, the better the prediction will be. The law of large numbers is a powerful tool for taming
randomness.
5. Central Limit Theorem
5.1 Statement of the Theorem
The central limit theorem and the law of large numbers are two fundamental theorems of
probability. The central limit theorem was introduced by De Moivre in the early eighteenth
century. The theorem has been expressed in many forms. One important version treats it in terms
of sums of identically distributed independent variables. According to this version, the central
limit theorem states that the distribution of the sum, or the average, of a large number of
independent and identically distributed variables will be approximately normal regardless of the
underlying distribution. The importance of this theorem lies in the fact that it requires virtually
no conditions on the probability distributions of the individual random variables being summed.
Thus it furnishes a practical method of computing approximate probability values associated with
sums of arbitrary but independently distributed random variables.
The central limit theorem is also stated in terms of the relationship between the shape of the
population distribution, the shape of the sampling distribution of means and the size of the sample.
The theorem says that if the sample size is large enough and the sampling is done from a
population with an unknown distribution, either finite or infinite, then the sampling distribution
of the mean will be approximately normal with mean µ and standard deviation σ/√n.
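The theorem can be illustrated by simulation. The following Python sketch (an illustration added here, not part of the module; the exponential population, the sample size n = 40 and the seed are assumptions) draws many samples from a skewed population and compares the sampling distribution of the mean with µ and σ/√n.

import numpy as np

# Means of samples from a heavily skewed (exponential) population are
# approximately normal, with mean mu and standard deviation sigma/sqrt(n).
rng = np.random.default_rng(1)    # assumed seed
mu = sigma = 2.0                  # exponential(scale=2) has mean = std = 2
n = 40                            # assumed sample size

sample_means = rng.exponential(scale=mu, size=(50_000, n)).mean(axis=1)

print(f"mean of sample means: {sample_means.mean():.3f} (theory: {mu})")
print(f"std of sample means:  {sample_means.std():.3f} (theory: {sigma/np.sqrt(n):.3f})")
# A histogram of sample_means would show a near-normal, bell-shaped curve.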
5.2 Size of sample
Now the question arises: how large should the sample size be for the sampling distribution to be
approximately normal? The answer depends on two main factors. The first is the degree of accuracy
required for drawing inferences about the population under study. Second, the size of the sample
also depends upon the shape of the underlying population. If the population under study is very
close to the normal distribution then fewer sample points will be required. By and large,
statisticians believe that a sample size of 30 is large enough if the population distribution is
roughly bell-shaped. But if the population distribution is highly skewed or has multiple peaks then
the sample size should be much larger, and how much larger will again depend upon the accuracy
required in the results.
5.3 Significance of the theorem
The significance of the central limit theorem is that it permits us to use sample statistics to make
inferences about population parameters without having knowledge of the shape of the frequency
distribution of that population. Its importance also stems from the fact that in many real
applications the random variables of interest are sums of a large number of independent random
variables, where the central limit theorem justifies the use of the normal distribution. For instance,
in finance the percentage changes in the prices of some assets or consumer durable goods are
modelled by normal variables. The central limit theorem is also very useful in the calculation of
probabilities, in the sense that it simplifies the calculations to a very large extent by allowing us
to use normal probabilities.
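As a worked illustration of such a calculation, the Python sketch below approximates a tail probability for a sum of independent variables using the normal distribution; all of the numbers (n = 50, µ = 2.4, σ = 1.1, threshold 130) are assumed purely for the example.

import math

# Approximate P(S > 130) for the sum S of n i.i.d. variables using the CLT.
n, mu, sigma = 50, 2.4, 1.1
mean_s = n * mu                        # E(S) = n*mu
sd_s = sigma * math.sqrt(n)            # SD(S) = sigma*sqrt(n)

z = (130 - mean_s) / sd_s              # standardise the threshold
prob = 1 - 0.5 * (1 + math.erf(z / math.sqrt(2)))   # P(Z > z), standard normal
print(f"P(S > 130) is approximately {prob:.4f}")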
6. Summary
 We learned about three very fundamental and widely applied theorems of probability, namely
Chebyshev’s theorem, the law of large numbers and the central limit theorem.
 Chebyshev’s theorem establishes the relationship between the mean and the standard deviation
of a probability distribution. It holds true for both normally and non-normally distributed data.
 According to Chebyshev’s theorem, at least 75% of the data points of a sample fall within ± 2σ of
the mean irrespective of the shape of the distribution, and at least 89% of data points lie within ± 3σ of
the mean, whereas for a normal distribution 95% of data points lie within the interval (µ ± 2σ)
and 99.7% of the data points lie within the interval (µ ± 3σ).
 Since the probability of occurrence of an event lies between 0 and 1, Chebyshev’s theorem is
trivial when the number of standard deviations ‘k’ is less than 1.
 The law of large numbers is a statistical concept that relates to probability. It is a law which is
largely used by insurance companies to determine their premium rates and by gamblers to find
the probability of winning a game so as to make profits. Thus the law of large numbers is a
powerful tool for taming the randomness of the occurrence of an event.
 Statisticians are of the view that when the population is approximately bell-shaped a sample of
size 30 is large enough for the normal approximation, but if the population is non-normal, having
multiple peaks or being skewed, then the sample size should be much larger. Largely it would
depend upon the required accuracy of the results.
 The central limit theorem gives the remarkable result that when a sample of large size is drawn
from a population with an unknown distribution, either finite or infinite, the shape of the
sampling distribution of the mean will be approximately normal with mean µ and standard
deviation σ/√n.