Department of Mathematics
Faculty of Science and Engineering
City University of Hong Kong
MA 3518: Applied Statistics
Chapter 1: Basic Concepts in Probability and Statistics
In this chapter, we will discuss some basic concepts and techniques
in probability theory and statistical inference. The materials in this
chapter are selective. Students are encouraged to read the first two
chapters of the representative monograph by Bickel and Doksum
(1977) “Mathematical Statistics: Basic Ideas and Selected Topics”
for a more comprehensive review. Topics covered in this chapter
are listed as follows:
Section 1.1: Introductory Statistics
Section 1.2: A Snapshot of Statistical Inference
Section 1.3: Basic Probability Theory
Let’s start our revision!
Section 1.1: Introductory Statistics
1. Question: What is Statistics?
Informally speaking, statistics is considered the science of
studying data. It concerns:
 Collection of data
 Presentation and summarization of data
 Analysis of data
 Interpretation and conclusion
2. The role of real-life data:
 Numerical representations of observable information in real life
 An important source of information for understanding the
unknown nature of the underlying mechanism that generates
them
3. Applications of Statistics:
 Biological Science: Population dynamics of organisms, Life cycles of organisms
 Medical Science: Diagnostic aid, Testing of New Drugs,
Clinical Trials
 Engineering Science: Quality Control, Reliability
 Social Science: Demography, Sample Survey, Census
 Business and Management: Forecasting sales and profit figures,
Marketing Research
 Economics and Finance: Analysis and forecasting of economic indicators and asset price dynamics
4. Descriptive statistics analysis:
 Collect, present and summarize data
 Extract useful information on some distributional
characteristics of a given set of data
(a) Central tendency
(b) Dispersion
(c) Skewness
(d) Tail behavior
5. Statistical inference:
 Understand the unknown underlying mechanism that
generates the observed data based on the information
from the observed data
 Estimate the unknown parameters in a statistical model that
describes the data set
 Test hypotheses on the unknown parameters
6. Population: An entire collection of data from the measurement
of all the subjects to be investigated
 Example: The annual income of all males from the age
group 25-30 in Hong Kong
7. Sample: A set of observations obtained from the population by
 Direct observations or measurement:
Examples: Stock prices, Weights, Heights and Temperature
 Conducting experiment:
Examples: Survey results, Opinion poll and Rates of
chemical reactions
 Simulation:
Examples: Mark Six, Random numbers generated by
computer
8. Random or Probability Sampling:
 The most common way to obtain a sample
 Widely used in practice, for example, by the Census and Statistics Department of the HKSAR
9. Random sampling from a finite population:
 Select individuals at random (i.e. Each individual has equal
probability of being chosen) either with or without
replacement from the finite population
 Sampling with replacement: Obtain a set of independent
observations, namely a random sample
 Sampling without replacement from a finite but large population: Obtain a set of approximately independent observations, which is also conventionally called a random sample
10. Random sampling from an infinite population:
 Identify the probability distribution for the infinite
population
 Generate or draw random numbers from the distribution
 The generated random numbers are called random
samples
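The two sampling schemes above can be sketched with Python's standard `random` module; the population and the sample sizes below are made up purely for illustration:

```python
import random

random.seed(42)  # reproducible illustration

# A hypothetical finite population (e.g. labels of 100 individuals)
population = list(range(100))

# Sampling WITH replacement: independent draws -> a random sample
with_replacement = random.choices(population, k=10)

# Sampling WITHOUT replacement: approximately independent draws
# when the population is large relative to the sample size
without_replacement = random.sample(population, k=10)

# Sampling from an "infinite" population: draw directly from an
# assumed probability distribution, here the standard normal
from_distribution = [random.gauss(0, 1) for _ in range(10)]

print(with_replacement)
print(without_replacement)
print(from_distribution)
```

Note that `random.sample` can never repeat an individual, while `random.choices` can.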
11. Sample size: The number of observations in a random sample
12. Advantages of random sampling:
 Likely to draw representative observations from the
population
 Easy to deal with using standard mathematical and
probabilistic methods
13. Other sampling methods:
 Stratified sampling:
(a) Divide the whole population into mutually exclusive
subgroups, called strata, according to some specified
criteria or certain characteristics of the population
(b) Select individuals randomly from each stratum
 Cluster sampling:
(a) Single individuals are selected from the population in
random sampling and stratified sampling
(b) Samples are selected randomly in groups or clusters in
cluster sampling
(c) Save time and costs when the population is very
dispersed
 Multi-stage sampling:
(a) Combine cluster sampling and stratified sampling
(b) Select random clusters
(c) Divide the clusters into strata
(d) Select individuals randomly from the strata
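Step (b) of stratified sampling, selecting individuals randomly from each stratum, can be sketched as follows; the strata names and sizes are hypothetical:

```python
import random

random.seed(0)  # reproducible illustration

# Hypothetical strata: a population divided by an age characteristic
strata = {
    "age_25_34": list(range(0, 40)),
    "age_35_44": list(range(40, 70)),
    "age_45_54": list(range(70, 100)),
}

def stratified_sample(strata, per_stratum):
    """Select individuals randomly (without replacement) from each stratum."""
    return {name: random.sample(members, per_stratum)
            for name, members in strata.items()}

sample = stratified_sample(strata, per_stratum=5)
print(sample)
```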
14. Review of basic descriptive statistics:
 Measure of central tendency of ungrouped data:
Given a set of sample data or observations {y1, y2, …, yn}
with sample size being n
(a) Mean:
my = (1/n) Σ_{i=1}^{n} yi
(b) Median:
mD = y((n+1)/2) if n is odd
mD = [y(n/2) + y(n/2+1)]/2 if n is even
(c) Mode: The most frequent data point
It does not exist if every data point occurs exactly once
For example, the following data set does not have a mode since each data point appears only once:
{1, 2, 3, 4, 6, 7, 8, 10, 12}
It may not be unique
For instance, the following data set contains two modes, namely “3” and “6”; it is called bimodal:
{1, 2, 3, 3, 4, 5, 6, 6, 7, 10}
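The three measures of central tendency can be computed with a short Python sketch, using the two data sets from the mode examples:

```python
from collections import Counter

def mean(ys):
    return sum(ys) / len(ys)

def median(ys):
    ys = sorted(ys)
    n = len(ys)
    if n % 2 == 1:
        return ys[n // 2]                      # the ((n+1)/2)-th value for odd n
    return (ys[n // 2 - 1] + ys[n // 2]) / 2   # average of the two middle values

def modes(ys):
    counts = Counter(ys)
    top = max(counts.values())
    if top == 1:
        return []            # no mode: every data point appears once
    return sorted(x for x, c in counts.items() if c == top)

print(modes([1, 2, 3, 4, 6, 7, 8, 10, 12]))    # no mode
print(modes([1, 2, 3, 3, 4, 5, 6, 6, 7, 10]))  # bimodal: [3, 6]
```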
 Measure of dispersion of ungrouped data:
(a) Variance:
σ² = (1/n) Σ_{i=1}^{n} (yi - my)²
σ² is the population variance of the data
The sample variance s² is no longer equal to σ² and is given by:
s² = [1/(n-1)] Σ_{i=1}^{n} (yi - my)²
Note that
- The value of the sample variance s² is fixed for a given set of samples. It is subject to variation if a different set of samples is obtained
- If we perform random sampling many times and calculate the sample variance for each random sample, the average of the sample variances is expected to equal the unknown population variance σ². Hence, s² is called an unbiased estimate of σ²
(b) Standard deviation:
σ = [(1/n) Σ_{i=1}^{n} (yi - my)²]^{1/2}
σ is called the population standard deviation of the data
The sample standard deviation s is given by:
s = {[1/(n-1)] Σ_{i=1}^{n} (yi - my)²}^{1/2}
Note that s is a biased estimate of σ
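The divide-by-n versus divide-by-(n-1) distinction can be checked with a short Python sketch on an illustrative data set:

```python
def population_variance(ys):
    n = len(ys)
    m = sum(ys) / n
    return sum((y - m) ** 2 for y in ys) / n        # divide by n

def sample_variance(ys):
    n = len(ys)
    m = sum(ys) / n
    return sum((y - m) ** 2 for y in ys) / (n - 1)  # divide by n - 1

data = [1, 3, 5, 6, 8, 10, 12, 13, 15, 16]
print(population_variance(data))  # 23.69
print(sample_variance(data))      # 236.9 / 9
```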
(c) Percentiles:
- The q-th percentile of an ungrouped data set of size n is defined as the value of the variable y corresponding to the [q(n + 1)/100]-th element when the data are arranged in ascending order
- If q(n + 1)/100 is not an integer, say it lies between k and k + 1, we can find the q-th percentile by interpolation between yk and yk+1
- Special cases: Median (i.e. 50th percentile), First quartile Q1 (i.e. 25th percentile), Third quartile Q3 (i.e. 75th percentile)
- Example:
Consider the following set of data:
{1, 3, 5, 6, 8, 10, 12, 13, 15, 16}
(a) Find the 15th percentile of the given data set
(b) Find the median of the given data set
(c) Find Q1 and Q3 of the given data set
Solution:
(a) First, we notice that
q(n + 1)/100 = 15(10 + 1)/100 = 1.65
Hence, k = 1 and the 15th percentile should lie between y1 = 1 and y2 = 3
By interpolation, the 15th percentile p15 is given by:
p15 = y1 + [(1.65 - 1)/(2 - 1)] (y2 - y1) = 2.3
(b) The median of the data set is located at:
q(n + 1)/100 = 50(10 + 1)/100 = 5.5
Hence, k = 5 and the median should lie between y5 = 8 and y6 = 10
By interpolation, the median, namely p50, is given by:
p50 = y5 + [(5.5 - 5)/(6 - 5)] (y6 - y5) = 0.5(y5 + y6) = 9
(c) The first quartile Q1 of the data set is located at:
q(n + 1)/100 = 25(10 + 1)/100 = 2.75
Hence, k = 2 and the first quartile Q1 should lie between y2 = 3 and y3 = 5
By interpolation, the first quartile Q1 is given by:
Q1 = y2 + [(2.75 - 2)/(3 - 2)] (y3 - y2) = 4.5
The third quartile Q3 of the data set is located at:
q(n + 1)/100 = 75(10 + 1)/100 = 8.25
Hence, k = 8 and the third quartile Q3 should lie between y8 = 13 and y9 = 15
By interpolation, the third quartile Q3 is given by:
Q3 = y8 + [(8.25 - 8)/(9 - 8)] (y9 - y8) = 13.5
(d) Range:
- Definition:
Range = Max – Min
- Example:
Consider the data set in the aforementioned example:
Max = 16, Min = 1
Hence, the range of the data set is given by:
Range = 15
(e) Inter-quartile range (IQ):
- Definition:
IQ = Q3 – Q1
- Example:
Consider the data set in the aforementioned example:
Q1 = 4.5; Q3 = 13.5
Hence, the inter-quartile range of the data set is given by:
IQ = 13.5 - 4.5 = 9
 Measures of central tendency and dispersion of grouped data or frequency tables:
Suppose we have N observations on a variable X and these observations are grouped into M (≤ N) different classes, namely {x1, x2, …, xM}.
Let fk (k = 1, 2, …, M) denote the number of observations falling into the k-th class, i.e. the frequency of occurrence of the k-th class
Note that Σ_{k=1}^{M} fk = N
Then, the frequency distribution can be presented in the following table:

Class value | x1 | x2 | x3 | …… | xM
Frequency   | f1 | f2 | f3 | …… | fM
(a) Example: (Discrete variable X)
Suppose X represents the number of members under 21
in a family in Hong Kong
Then, the class values are given by the distinct values of the discrete variable X
(b) Example: (Continuous variable X)
Suppose X represents the heights of students at a university
Then, the range covered by the observations in the
sample can be divided into several classes, typically but
not necessarily of equal width. The mid-point of the
interval corresponding to the k-th class is used to specify
the representative value xk
The resulting distribution is called a grouped frequency
distribution
(c) Mean of the grouped data:
x̄ = (1/N) Σ_{k=1}^{M} fk xk
It is also called the weighted mean of the data with the
weights being determined by the frequencies {f1, f2, …,
fM} for different classes
(d) Variance and standard deviation of the grouped data
(Sample version)
The sample variance s² of the grouped data is given by:
s² = [1/(N-1)] Σ_{k=1}^{M} fk (xk - x̄)²
The sample standard deviation s of the grouped data is given by:
s = {[1/(N-1)] Σ_{k=1}^{M} fk (xk - x̄)²}^{1/2}
A useful computational formula for the sample variance s² is:
s² = [1/(N-1)] [Σ_{k=1}^{M} fk xk² - (1/N)(Σ_{k=1}^{M} fk xk)²]
15. Other important statistical measures of data:
 Population skewness:
(a) It measures the degree of asymmetry of a set of data
(b) It is defined as follows:
Suppose we have a set of N observations or data {Y1, Y2, …, YN}
Skewness = (1/N) Σ_{i=1}^{N} (Yi - Ȳ)³ / s³,
where Ȳ and s² are the population mean and the variance of the data
(c) Zero skewness for a set of data indicates that the data are nearly symmetric; for instance, the skewness of a normal distribution is zero
(d) Positive skewness indicates that the data are skewed to the right; that is, the right tail is heavier than the left tail
(e) Negative skewness indicates that the data are skewed to the left; that is, the left tail is heavier than the right tail
 Sample skewness:
Suppose we have obtained the following set of sample data {x1, x2, …, xn}
Then, the sample skewness is defined by:
γ = [n/((n - 1)(n - 2))] Σ_{k=1}^{n} (xk - x̄)³ / s³,
where x̄ is the sample mean and s is the sample standard deviation
Prior to sampling, the sample skewness is given by the following random variable Γ:
Γ = [n/((n - 1)(n - 2))] Σ_{k=1}^{n} (Xk - X̄)³ / S³
If {X1, X2, …, Xn} are normally distributed, Γ ~ N(0, 6/n) approximately as the sample size n tends to infinity
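The sample skewness formula above translates into a short Python sketch; the two data sets below are made up to show the sign behaviour:

```python
import math

def sample_skewness(xs):
    """Sample skewness: n/((n-1)(n-2)) * sum(((x - xbar)/s)^3)."""
    n = len(xs)
    xbar = sum(xs) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))  # sample sd
    return (n / ((n - 1) * (n - 2))) * sum(((x - xbar) / s) ** 3 for x in xs)

print(sample_skewness([1, 2, 3, 4, 5]))       # symmetric data -> 0.0
print(sample_skewness([1, 1, 1, 2, 10]) > 0)  # right-skewed -> True
```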
 Population kurtosis:
(a) It measures whether the data are heavy-tailed (fat-tailed) or not relative to a normal distribution
(b) It is defined by:
Kurtosis = (1/N) Σ_{i=1}^{N} (Yi - Ȳ)⁴ / s⁴,
where Ȳ and s² are the population mean and the variance of the data
(c) The kurtosis of a normal distribution is 3
(d) If the kurtosis of a set of data is greater (less) than 3, the data are said to be heavy-tailed or fat-tailed (light-tailed or thin-tailed) relative to a normal distribution
Most real-life returns data for financial assets, like equities, stock indices, commodities and exchange rates, especially daily and intra-day returns, exhibit this heavy-tailed feature
(e) The excess kurtosis of a given data set is defined as the kurtosis of the data minus 3
For instance, the excess kurtosis of a normal distribution is zero
 Sample excess kurtosis:
Suppose we have obtained the following set of sample data {x1, x2, …, xn}
Then, the sample excess kurtosis is defined by:
κ = [n(n + 1)/((n - 1)(n - 2)(n - 3))] Σ_{k=1}^{n} (xk - x̄)⁴ / s⁴ - 3(n - 1)²/[(n - 2)(n - 3)],
where x̄ is the sample mean and s is the sample standard deviation
Prior to sampling, the sample excess kurtosis is given by the following random variable K:
K = [n(n + 1)/((n - 1)(n - 2)(n - 3))] Σ_{k=1}^{n} (Xk - X̄)⁴ / S⁴ - 3(n - 1)²/[(n - 2)(n - 3)]
If {X1, X2, …, Xn} are normally distributed, K ~ N(0, 24/n) approximately as the sample size n tends to infinity
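The sample excess kurtosis formula can likewise be computed directly; for the illustrative data set {1, 2, 3, 4, 5} the formula works out to exactly -1.2:

```python
def sample_excess_kurtosis(xs):
    """n(n+1)/((n-1)(n-2)(n-3)) * sum(((x - xbar)/s)^4) - 3(n-1)^2/((n-2)(n-3))."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)   # sample variance
    term = sum((x - xbar) ** 4 for x in xs) / (s2 ** 2)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))) * term \
           - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))

print(sample_excess_kurtosis([1, 2, 3, 4, 5]))  # -1.2
```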
16. Measure of association between two samples
 Pearson correlation coefficient
Consider the samples {x1, x2,..., xn} and {y1, y2, …, yn} for the
continuous variables X and Y
We can measure the linear association between the variables X and Y by the Pearson correlation coefficient rXY defined as follows:
rXY = Sxy / (Sxx Syy)^{1/2},
where Sxy = Σ_{i=1}^{n} (xi - x̄)(yi - ȳ), Sxx = Σ_{i=1}^{n} (xi - x̄)² and Syy = Σ_{i=1}^{n} (yi - ȳ)²
 Some important properties of rXY:
(a) rXY ∈ [-1, 1]
(b) It can only measure the linear dependence between two continuous variables
(c) rXY = 0 => X and Y are uncorrelated, but not independent in general
(d) If X and Y are jointly normally distributed and rXY = 0, then X and Y are independent
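A Python sketch of rXY; the last call illustrates property (c) with Y = X², which is fully dependent on X yet uncorrelated with it (the data sets are made up for illustration):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient rXY = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2 * x + 1 for x in xs]))         # exact linear -> 1.0
print(pearson_r(xs, [-x for x in xs]))                # exact negative -> -1.0
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # Y = X^2 -> 0.0
```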
Section 1.2: A Snapshot of Statistical Inference
1. Statistic:
 A function of samples that describes and summarizes a
certain distributional characteristic of the samples
Suppose {x1, x2, …, xn} is a random sample
The function T(x1, x2, …,xn) is called a statistic of the
samples {x1, x2, …,xn}
Examples: Sample mean, sample variance and sample
standard deviation
 A very important element in statistical inference
 Unknown prior sampling
 Subject to variations for different samples
2. Sampling distribution:
 Distribution of a statistic
 Useful for constructing interval estimates (i.e. Confidence
interval) of the unknown parameter corresponding to the
statistic
 Useful for constructing tests for hypotheses on the
unknown parameter
3. The essence of statistical inference is to use statistics and their sampling distributions:
 To construct point estimates and interval estimates of the corresponding unknown parameters of the distribution of the population
 To construct tests of hypotheses on the unknown parameters
4. Examples of basic statistical inference:
 Use of the sample mean, variance and standard deviation as
point estimates of the unknown population mean, variance
and standard deviation, respectively
 Use the sampling distributions for the sample mean,
variance and standard deviation to construct the confidence
intervals of and tests of hypotheses on the population mean,
variance and standard deviation, respectively
5. Statistical models:
 A family of probability distributions that are assumed to
describe the random observations
 Assume that the observed data are generated by one
member of the family of probability distributions
 Parametric statistical models:
(a) A family of probability distributions indexed by a finite-dimensional vector of parameters lying in an index set which is a subset of a Euclidean space
(b) The functional form of the probability distributions is
specified
(c) For example, the observed data are from a normal
distribution with unknown mean but known variance
 Non-parametric statistical models:
(a) Difficult to provide a formal definition of non-parametric
models
(b) The functional form of the probability distributions is not
specified
 Semi-parametric statistical models: A half-way house
between parametric models and non-parametric models
6. Parametric statistical inference:
 Suppose the observed data are generated by one member of
the family corresponding to an unknown value of the
parameter called the “true” value of the parameter
 Infer the “true” value of the parameter using the observed
data. For instance, we perform:
(a) Point estimation of the parameter
(b) Interval estimation of the parameter
(c) Testing hypotheses on the parameter
Section 1.3: Basic Probability Theory
1. Some terminologies and notations:
 Random experiment: An experiment whose actual outcome
is not known or given in advance
Examples: Toss a coin, Roll a die
 Sample space Ω: the set of all possible outcomes of a random experiment
Example:
Consider a random experiment of rolling a die:
Sample space Ω = {1, 2, 3, 4, 5, 6}
 Event: A subset of the sample space
Denoted by capital letters A, B, C, …
Example:
Consider a random experiment of rolling a die:
Let A denote the event that an even number appears on the upper face of the die
Then, A = {2, 4, 6}, which is a subset of Ω
 Probability P: A number between zero and one inclusively
that represents the likelihood of the occurrence of an event
(a) If E is an impossible event, P(E) = 0
(b) If E is a sure event, P(E) = 1
(c) Example:
Consider a random experiment of tossing a fair coin
twice
The sample space of the experiment is
{HH, HT, TH, TT}
What is the probability of getting two heads?
The required probability P({HH}) = (0.5)2 = 0.25
 Rules of probability:
(a) Area rule:
- Mutually exclusive or disjoint events A and B:
P(A or B) = P(A) + P(B)
- Non-disjoint events A and B:
P(A or B) = P(A) + P(B) – P(A and B)
(b) Product rule:
- Independent events A and B:
P(A and B) = P(A) P(B)
- Dependent events A and B:
P(A and B) = P(A | B) P(B)
 Conditional probability:
The conditional probability of an event A given the
occurrence of an event B is defined by:
P(A | B) = P(A and B) / P(B)
 Bayes’ Rule:
P(A | B) = [P(B | A) P(A)] / P(B)
It follows from applying the product rule symmetrically: P(A and B) = P(A | B) P(B) = P(B | A) P(A)
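The conditional probability and Bayes' Rule formulas can be checked by enumerating the die-rolling sample space from the earlier example; the event B below is a hypothetical extra event chosen for illustration:

```python
from fractions import Fraction

# The die-rolling sample space from the running example
omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # "an even number appears"
B = {1, 2, 3}   # hypothetical event: "a number of at most 3 appears"

def prob(event):
    """Classical probability: equally likely outcomes."""
    return Fraction(len(event & omega), len(omega))

# Conditional probability: P(A | B) = P(A and B) / P(B)
p_a_given_b = prob(A & B) / prob(B)

# Bayes' Rule: P(A | B) = P(B | A) P(A) / P(B)
p_b_given_a = prob(B & A) / prob(A)
bayes = p_b_given_a * prob(A) / prob(B)

print(p_a_given_b, bayes)  # both equal 1/3
```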
 Random variable X:
(a) Take values on the real line according to the outcome of
a random experiment
(b) Its value is not known before the outcome of the
experiment is realized
(c) Example:
Consider the experiment of tossing a fair coin twice:
Suppose the random variable X represents the number
of heads we get from the experiment
Then, X can take values in the set {0, 1, 2} according to
the outcome of the experiment
More specifically, we have
X({TT}) = 0; X({HT}) = 1; X({TH}) = 1; X({HH}) = 2
 Discrete random variables:
(a) Take values in a set of discrete real values {x1, x2, …, xn}
(b) Determine the probabilistic or statistical properties by a probability distribution, which is defined by a set of probabilities as follows:
P({X = xk}) = pk, k = 1, 2, …, n,
where
(i) pk ≥ 0, k = 1, 2, …, n
(ii) Σ_{k=1}^{n} pk = 1
Note that the probability distribution is called a discrete probability distribution
(c) Different interpretations for each probability pk (k = 1, 2, …, n):
(i) Classical probability:
Equally probable outcomes (i.e. pk = 1/n, for each k = 1, 2, …, n)
(ii) Empirical probability:
Relative frequency of occurrence of an event
Suppose one performs the experiment M times
Let Fk denote the number of times that the event {X = xk} occurs
Then, one can assign the probability pk by:
pk = Fk/M = relative frequency, k = 1, 2, …, n
Note that Σ_{k=1}^{n} Fk = M
By the law of large numbers,
true probability pk = lim_{M → ∞} (Fk/M)
(iii) Subjective probability:
Assign based on the subjective view or belief of an
agent
 Expectation of a discrete random variable:
Consider a random variable X taking values in a set {x1, x2, …, xn} and having the following probability distribution:
P({X = xk}) = pk, k = 1, 2, …, n
Then, the expectation of X, denoted as E(X), is defined as follows:
E(X) := Σ_{k=1}^{n} xk P({X = xk}) = Σ_{k=1}^{n} xk pk
Example:
Consider the experiment of tossing a fair coin twice
Let X = the number of heads
We have mentioned that X can take values in the set {0, 1,2}
The probability distribution for X is evaluated as follows:
P({X = 0}) = P({TT}) = 0.25
P({X = 1}) = P({HT}or{TH}) = P({HT}) + P({TH})
= 0.25 + 0.25 = 0.5
P({X=2}) = P({HH}) = 0.25
Hence, the expectation of X is given by:
E(X) = 0 × 0.25 + 1 × 0.5 + 2 × 0.25 = 1
Note that we can interpret the expectation of a random
variable X as the weighted average value or the weighted
mean value of the random variable with the weights
determined by the corresponding probability values
 Variance of a discrete random variable:
The variance of a discrete random variable X, denoted as
Var(X), is defined by:
Var(X) = E[(X – E(X))2]
It can be interpreted as the weighted average value of
(X – E(X))2 with the weights determined by the
corresponding probability values
We can also calculate Var(X) by the following formula:
Var(X) = Σ_{k=1}^{n} xk² pk - (Σ_{k=1}^{n} xk pk)²
Example:
Consider the experiment of tossing a fair coin twice
Let X = the number of heads
Then, the variance of X is given by:
Var(X) = 0² × 0.25 + 1² × 0.5 + 2² × 0.25 - 1² = 0.5
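The expectation and variance formulas can be sketched in Python using the two-coin-toss distribution from the running example:

```python
# The two-coin-toss distribution: P(X = k) for the number of heads
pmf = {0: 0.25, 1: 0.5, 2: 0.25}

def expectation(pmf):
    """E(X) = sum of xk * pk over all values xk."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = sum of xk^2 * pk minus (sum of xk * pk)^2."""
    ex = expectation(pmf)
    return sum(x ** 2 * p for x, p in pmf.items()) - ex ** 2

print(expectation(pmf))  # 1.0
print(variance(pmf))     # 0.5
```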
 Standard deviation of a discrete random variable:
The standard deviation of a discrete random variable X,
denoted as SD(X), is defined by:
SD(X) = [Var(X)]^{1/2}
 Some important discrete probability distributions:
(a) Binomial distribution:
Consider a random experiment consisting of n trials that satisfies the following conditions:
(i) The trials are independent of each other
(ii) Each trial has only two possible outcomes, namely “Success” and “Failure”
(iii) The probability of “Success”, denoted as p, remains constant from trial to trial (Note that 0 ≤ p ≤ 1)
Let X denote the number of successes in the n trials
X is a discrete random variable taking values in the set {0, 1, 2, …, n}
Under the above three conditions, it can be shown that
pk = P(X = k) = nCk p^k (1 - p)^{n-k}, k = 0, 1, 2, …, n,
where nCk = n!/[k!(n - k)!]
X is said to follow a binomial distribution with parameters
n and p
Mathematically, we can write X ~ Bin (n, p)
(b) Examples:
(i) The number of heads in n tosses of a fair coin
(ii) The number of defaults in n independent firms
in an economy
(c) Mean and variance of binomial distribution:
Suppose X ~ Bin (n, p)
Then, we have
E(X) = n p; Var(X) = n p (1 - p)
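The binomial pmf, mean and variance can be checked numerically; the values n = 10 and p = 0.3 below are made up for illustration:

```python
from math import comb

def binomial_pmf(n, p):
    """P(X = k) = nCk p^k (1 - p)^(n - k), for k = 0, 1, ..., n."""
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

n, p = 10, 0.3
pmf = binomial_pmf(n, p)
mean = sum(k * pk for k, pk in enumerate(pmf))
var = sum(k ** 2 * pk for k, pk in enumerate(pmf)) - mean ** 2

print(sum(pmf))   # probabilities sum to 1
print(mean, var)  # n p = 3.0 and n p (1 - p) = 2.1
```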
(d) Poisson distribution:
Let X denote a random variable taking values in
{0, 1, 2, …} (i.e. the set of all non-negative integers)
X is said to follow a Poisson distribution with intensity parameter λ > 0 if
pk = P(X = k) = (e^{-λ} λ^k)/k!, k = 0, 1, 2, …
Mathematically, we can write X ~ Poi(λ)
(e) Examples:
(i) The number of arrivals of customers at a service
counter over a given period of time
(ii) The number of phone calls you received over a
given period of time
(f) Mean and variance of Poisson distribution:
Suppose X ~ Poi(λ)
Then, we have
E(X) = λ; Var(X) = λ
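The Poisson pmf and its mean can be checked by summing over a truncated range of k (the tail beyond k = 100 is negligible for the illustrative λ = 4 chosen here):

```python
from math import exp, factorial

def poisson_pmf(lam, k):
    """P(X = k) = e^(-lam) * lam^k / k!."""
    return exp(-lam) * lam ** k / factorial(k)

lam = 4.0
ks = range(101)  # truncation: the remaining tail mass is negligible
total = sum(poisson_pmf(lam, k) for k in ks)
mean = sum(k * poisson_pmf(lam, k) for k in ks)

print(total)  # approximately 1
print(mean)   # approximately lam
```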
 Continuous random variables:
(a) Take any values in an interval (a, b) on the real line,
where either, or both, of a and b may be infinite
(b) Determine its probabilistic and statistical properties by
defining a probability density function (p.d.f.) of a
continuous random variable X as follows:
(c) Consider an extremely short interval [x, x + dx] of the values taken by X
A probability density function (p.d.f.) of X, denoted as f(x), is defined by:
P(x < X < x + dx) = f(x) dx
Note that the p.d.f. must satisfy the following properties:
(i) f(x) ≥ 0, for any x in (a, b)
(ii) ∫_a^b f(x) dx = 1
(d) Determine the probability that X takes values in [c, d] by the p.d.f. f(x) (Note that a < c < d < b):
P(c < X < d) = ∫_c^d f(x) dx
(e) Probability distribution function:
The probability distribution function of X, denoted as F(x), is defined by:
F(x) := P(a < X < x) = ∫_a^x f(u) du, x ∈ (a, b)
Note that F(a) = 0 and F(b) = 1
(f) Examples of continuous random variables:
(i) The average temperature in HK tomorrow
(ii) The closing value of Hang Seng Index (HSI)
tomorrow
(g) Mean and variance of a continuous random variable:
Suppose X is a continuous random variable with p.d.f. f(x) taking values on the interval (a, b)
Then, the mean of X is given by:
E(X) = ∫_a^b x f(x) dx
The variance of X is given by:
Var(X) = ∫_a^b (x - E(X))² f(x) dx
Question: What is the mode of X?
 Some important continuous random variables:
(a) Normal distribution (Gaussian distribution or error distribution)
(b) Definition:
A continuous random variable X is said to follow a normal distribution with mean μ and variance σ² (i.e. X ~ N(μ, σ²)) if the p.d.f. of X is given by:
f(x) = [1/(σ √(2π))] exp[-(x - μ)²/(2σ²)], -∞ < x < ∞
Note that X can take any value on the real line
(c) It is a bell-shaped curve symmetrical about its mean
(d) Determine the probability that a normal random variable X takes values in an interval (a, b):
Suppose X ~ N(μ, σ²)
Question: What is the probability P(a < X < b)?
Answer:
Let Z = (X - μ)/σ ~ N(0, 1)
Then, we have
P(a < X < b) = P((a - μ)/σ < Z < (b - μ)/σ)
= P(Z < (b - μ)/σ) - P(Z < (a - μ)/σ)
= Φ((b - μ)/σ) - Φ((a - μ)/σ),
where Φ(z) is the probability distribution function of the standard normal distribution N(0, 1)
Mathematically, Φ(z) := P(Z ≤ z)
Given the value of z, Φ(z) can be determined using the standard normal table, or vice versa
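The standardization argument can be verified numerically with the standard library's `statistics.NormalDist` (which plays the role of the normal table); the values of μ, σ, a and b below are made up for illustration:

```python
from statistics import NormalDist

mu, sigma = 5.0, 2.0
X = NormalDist(mu=mu, sigma=sigma)  # X ~ N(mu, sigma^2)
Z = NormalDist()                    # standard normal N(0, 1)

a, b = 3.0, 9.0
# Direct computation of P(a < X < b)
direct = X.cdf(b) - X.cdf(a)
# Via standardization: Phi((b - mu)/sigma) - Phi((a - mu)/sigma)
standardized = Z.cdf((b - mu) / sigma) - Z.cdf((a - mu) / sigma)

print(direct, standardized)  # identical by the standardization argument
```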
(f) Many distributions in probability and statistics can be well approximated by a normal distribution; for instance, the binomial distribution when the number of trials n becomes large
 Chi-square distribution (χ²-distribution)
(a) Definition:
A continuous random variable X is said to follow a chi-square distribution with degree of freedom ν if the p.d.f. f(x) of X is given by:
f(x) = [1/(2^{ν/2} Γ(ν/2))] x^{ν/2 - 1} e^{-x/2}, x > 0,
where the gamma function Γ(α) := ∫_0^∞ x^{α-1} e^{-x} dx
(b) Mean and variance of a chi-square distribution:
E(X) = ν and Var(X) = 2ν
(c) Applications of the χ²-distribution:
(i) Goodness-of-fit test
(ii) Variance ratio test
(d) The percentiles χ²(α; ν) of a chi-square random variable χ²(ν) with degree of freedom ν can be determined from the chi-square table for the given probability level α and degree of freedom ν
 Student’s t-distribution (or t-distribution, for short)
(a) Definition:
Suppose X ~ N(0, 1) and χ² ~ χ²(ν), where χ²(ν) is the chi-square distribution with degree of freedom ν, and X and χ² are independent
Then, the random variable T defined as:
T = X/(χ²/ν)^{1/2} ~ t(ν),
where t(ν) is the t-distribution with degree of freedom ν
(b) It is a bell-shaped curve symmetrical about the axis t = 0
(c) It approaches the standard normal distribution when ν → ∞
(d) The percentiles t(α; ν) of a t random variable t(ν) with degree of freedom ν can be determined from the Student’s t-table for the given probability level α and degree of freedom ν
 Moment generating function (mgf):
(a) Definition:
The moment generating function of a random variable X, or of its probability distribution, is given by:
MX(θ) := E(e^{θX}),
provided that the expectation exists
(b) Some useful properties:
(i) The distribution of a random variable is determined uniquely by its mgf, since all the moments of the distribution can be calculated from the mgf; that is,
E(X^k) = the k-th moment of X = MX^{(k)}(0)
(ii) Suppose X and Y are independent; then
MX+Y(θ) = MX(θ) MY(θ)
(iii) Suppose a and b are constants; then
MaX+b(θ) = e^{bθ} MX(aθ)
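Property (i), that moments are derivatives of the mgf at θ = 0, can be checked numerically; the sketch below uses the Bernoulli(p) distribution, whose mgf M(θ) = (1 - p) + p e^θ is easy to write down, with p = 0.3 chosen for illustration:

```python
import math

p = 0.3  # illustrative Bernoulli success probability

def mgf(t):
    """mgf of a Bernoulli(p) random variable: M(t) = (1 - p) + p e^t."""
    return (1 - p) + p * math.exp(t)

# First moment via a central difference at t = 0: M'(0) should equal E(X) = p
h = 1e-6
first_moment = (mgf(h) - mgf(-h)) / (2 * h)

print(first_moment)  # approximately 0.3
```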
 Convergence in distribution:
Let X1, X2, …, Xn and X be random variables with probability distribution functions F1, F2, …, Fn and F, respectively
Then, we say that {Xn} converges in distribution to X if Fn(x) → F(x) as n → ∞ at every continuity point x of F
 Weak law of large numbers:
Let X1, X2, … be i.i.d. with mean μ < ∞, and let X̄n = (1/n) Σ_{i=1}^{n} Xi
Then, X̄n → μ in probability as n → ∞
 Central Limit Theorem:
Let X1, X2, … be i.i.d. with mean μ < ∞ and variance σ² < ∞
Then, √n (X̄n - μ)/σ converges in distribution to N(0, 1) as n → ∞
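The Central Limit Theorem can be illustrated by simulation: standardized means of Uniform(0, 1) samples (mean 1/2, variance 1/12) should behave like N(0, 1), so about 95% of them should fall in (-1.96, 1.96). The sample size and repetition count below are arbitrary illustrative choices:

```python
import math
import random

random.seed(0)  # reproducible illustration

n, reps = 50, 2000
mu, sigma = 0.5, math.sqrt(1 / 12)  # mean and sd of Uniform(0, 1)

zs = []
for _ in range(reps):
    xbar = sum(random.random() for _ in range(n)) / n
    zs.append(math.sqrt(n) * (xbar - mu) / sigma)  # standardized mean

# If the CLT holds, roughly 95% of the z-values lie in (-1.96, 1.96)
coverage = sum(abs(z) < 1.96 for z in zs) / reps
print(coverage)
```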
~ End of Chapter 1~