Short Resume of Statistical Terms
Fall 2013
By Yaohang Li, Ph.D.
Review
• Last Class
– Introduction to Monte Carlo
• This Class
– Important Statistics Terms
• Random Events
– Independence of Random Events
– Axioms on Random Events
• Random Variables
– Independence of Random Variables
• CDF
• PDF
• Expectation
– Characteristics of Expectation
• Moments of a Distribution
– rth moment
– rth central moment
• Mean
• Variance
• Standard Deviation
• Covariance
– Characteristics of covariance
• Review of Statistics and Probability Terms
• Important Distributions
• Central Limit Theorem
• Estimand and Estimator
• Next Class
– Monte Carlo for Integration
Random Events and Probability
• Random Event
– An event which has a chance of happening
• Probability
– A numerical measure of that chance
– Lying between 0 and 1, both inclusive
• Terminology
– P(A)
• The probability that an event A occurs
– P(A+B+…)
• The probability that at least one of the events A, B, … occurs
– P(AB…)
• The probability that all the events A, B, … occur
– P(A|B)
• The probability that the event A occurs when it is known that the event B occurs
• Conditional probability of A given B
Axioms in Probability
• $P(A+B+\dots) \le P(A) + P(B) + \dots$
– If at most one of the events A, B, … can occur, they are called exclusive, and the equality holds
– If at least one of the events A, B, … must occur, they are called exhaustive: P(A+B+…)=1
• P(AB)=P(A|B)P(B)
– If P(A|B)=P(A), A and B are independent
• The chance of A occurring is uninfluenced by the occurrence of B
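A small sketch to make these rules concrete: it enumerates the 36 equally likely outcomes of two fair dice (an illustrative choice, not from the slides) and checks the inequality, the multiplication rule and the independence test with exact fractions. The events A, B and C are hypothetical examples.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes (d1, d2) of two fair dice (illustrative assumption).
OUTCOMES = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) for an event given as a predicate on an outcome (d1, d2)."""
    hits = sum(1 for o in OUTCOMES if event(o))
    return Fraction(hits, len(OUTCOMES))

def cond_prob(a, b):
    """P(A|B) = P(AB) / P(B), assuming P(B) > 0."""
    return prob(lambda o: a(o) and b(o)) / prob(b)

A = lambda o: o[0] == 6          # hypothetical event: first die shows 6
B = lambda o: o[1] % 2 == 0      # hypothetical event: second die is even
C = lambda o: o[0] + o[1] >= 11  # hypothetical event: total is 11 or 12

p_union = prob(lambda o: A(o) or B(o))   # P(A+B)
p_ab = prob(lambda o: A(o) and B(o))     # P(AB)

print(p_union, prob(A) + prob(B))          # 7/12 2/3: strict inequality, A and B not exclusive
print(p_ab == cond_prob(A, B) * prob(B))   # True: multiplication rule
print(cond_prob(A, B) == prob(A))          # True: A and B are independent
print(cond_prob(A, C) == prob(A))          # False: A and C are dependent
```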
Random Variables and Distributions
• Random variable ($\eta$)
– A number used to characterize a set of exclusive and exhaustive events
• Cumulative Distribution Function (CDF)
– $F(y) = P(\eta \le y)$
– The probability that the event which occurs has a value not exceeding a prescribed $y$
– $F(+\infty) = 1$ and $F(-\infty) = 0$
– $F(y)$ is a non-decreasing function of $y$
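A sketch of the empirical CDF, assuming samples of the random variable are available (here drawn from Uniform(0, 1) purely for illustration). It estimates F(y) as the fraction of samples not exceeding y and exhibits the limits and monotonicity listed on this slide.

```python
import bisect
import random

def empirical_cdf(samples):
    """Return F_n(y) = (number of samples <= y) / n, an estimate of F(y) = P(eta <= y)."""
    data = sorted(samples)
    n = len(data)

    def F(y):
        # bisect_right counts how many sorted values are <= y
        return bisect.bisect_right(data, y) / n

    return F

random.seed(1)
F = empirical_cdf([random.random() for _ in range(10_000)])  # illustrative: eta ~ Uniform(0, 1)

print(F(float("-inf")), F(float("inf")))  # 0.0 1.0
print(F(0.25))                            # close to 0.25 for the uniform case
grid = [i / 10 for i in range(-5, 16)]
print(all(F(a) <= F(b) for a, b in zip(grid, grid[1:])))  # True: non-decreasing
```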
Expectation
• If $g(\eta)$ is a function of $\eta$, the expectation (or mean value) of $g$ is denoted and defined by
$$E\,g(\eta) = \int g(y)\,dF(y)$$
– Stieltjes integral
– The integral is taken over all values of $y$
• Explanation
– Continuous random events
• $F(y)$ is continuous and its derivative is $f(y) = F'(y)$
$$E\,g(\eta) = \int g(y)\,f(y)\,dy$$
– Discrete random events
• $F(y)$ is a step function with a step of height $f_i$ at each point $y_i$
$$E\,g(\eta) = \sum_i g(y_i)\,f_i$$
• Probability Density Function (pdf)
– $f(y)$ (continuous case) and $f_i$ (discrete case) are the probability density functions
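Both forms of the expectation can be evaluated directly. This sketch assumes a fair die for the discrete case and the uniform density on [0, 1] for the continuous case, and uses a simple midpoint rule (one choice among many quadrature rules) for the integral.

```python
from fractions import Fraction

def expect_discrete(g, points, masses):
    """E g(eta) = sum_i g(y_i) f_i for a discrete distribution."""
    return sum(g(y) * f for y, f in zip(points, masses))

def expect_continuous(g, f, a, b, n=100_000):
    """E g(eta) = integral of g(y) f(y) dy over [a, b], approximated by the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) * f(a + (k + 0.5) * h) for k in range(n))

# Fair die (illustrative): y_i = 1..6, each f_i = 1/6; E(eta^2) = 91/6 exactly.
die = expect_discrete(lambda y: y * y, range(1, 7), [Fraction(1, 6)] * 6)
# Uniform density on [0, 1] (illustrative): f(y) = 1; E(eta^2) = 1/3.
unif = expect_continuous(lambda y: y * y, lambda y: 1.0, 0.0, 1.0)

print(die)             # 91/6
print(round(unif, 6))  # 0.333333
```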
More on Expectation
• The statistical physicist uses another notation for expectation, $\langle g \rangle$
– Suppose $p_i$ is the probability density function; then $\langle g \rangle = \sum_i p_i\, g(x_i)$
• What if $g(x)$ is a constant function?
Linear Combination of the Expectation Values
Multi-dimensional Distribution
• Multi-dimensional Random Variable
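– Represented using a vector $\boldsymbol\eta$
• Multi-dimensional CDF
– $F(\mathbf y) = P(\boldsymbol\eta \le \mathbf y)$
• $\boldsymbol\eta \le \mathbf y$ means that each coordinate of $\boldsymbol\eta$ is not greater than the corresponding coordinate of $\mathbf y$
• Expectation
$$E\,g(\boldsymbol\eta) = \int g(\mathbf y)\,dF(\mathbf y)$$
– Continuous multidimensional events
$$E\,g(\boldsymbol\eta) = \int g(\mathbf y)\,f(\mathbf y)\,d\mathbf y$$
• where
$$f(\mathbf y) = f(y_1, y_2, \dots, y_k) = \frac{\partial^k F(y_1, y_2, \dots, y_k)}{\partial y_1\,\partial y_2 \cdots \partial y_k}$$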
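For k = 2 the density is the mixed partial derivative of the CDF. The sketch below assumes, for illustration only, two independent Exp(1) coordinates, so that $F(y_1, y_2) = (1-e^{-y_1})(1-e^{-y_2})$ and the density should be $e^{-y_1}e^{-y_2}$; it approximates the derivative with a central difference.

```python
import math

def F(y1, y2):
    """Joint CDF of two independent Exp(1) coordinates (illustrative choice), valid for y1, y2 >= 0."""
    return (1 - math.exp(-y1)) * (1 - math.exp(-y2))

def mixed_partial(F, y1, y2, h=1e-3):
    """Central-difference estimate of d^2 F / (dy1 dy2), i.e. the density f(y1, y2) for k = 2."""
    return (F(y1 + h, y2 + h) - F(y1 + h, y2 - h)
            - F(y1 - h, y2 + h) + F(y1 - h, y2 - h)) / (4 * h * h)

y1, y2 = 0.7, 1.3
print(mixed_partial(F, y1, y2))  # numerical f(y1, y2)
print(math.exp(-y1 - y2))        # exact f(y1, y2) = e^{-y1} e^{-y2}
```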
Independence of Random Variables
• Consider a set of exhaustive and exclusive events, each characterized by a pair of numbers $\eta$ and $\zeta$, for which $F(y,z)$ is the distribution. $G(y)$ is the CDF of $\eta$ and $H(z)$ is the CDF of $\zeta$.
– $F(y,z) = P(\eta \le y, \zeta \le z)$
– $G(y) = P(\eta \le y)$
– $H(z) = P(\zeta \le z)$
• If it so happens that
– $F(y,z) = G(y)H(z)$ for all $y$ and $z$
– the random variables $\eta$ and $\zeta$ are called independent
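Independence can be probed empirically by comparing the estimated joint CDF F(y, z) with the product G(y)H(z) of the estimated marginals. This sketch assumes two Uniform(0, 1) variables, once drawn independently and once with the second set equal to the first.

```python
import random

def joint_vs_product(pairs, y, z):
    """Return the empirical F(y, z) and G(y) * H(z) built from the empirical marginals."""
    n = len(pairs)
    F = sum(1 for a, b in pairs if a <= y and b <= z) / n
    G = sum(1 for a, _ in pairs if a <= y) / n
    H = sum(1 for _, b in pairs if b <= z) / n
    return F, G * H

random.seed(2)
n = 50_000
indep = [(random.random(), random.random()) for _ in range(n)]  # illustrative: independent pair
dep = [(u, u) for u, _ in indep]                                # illustrative: second = first

print(joint_vs_product(indep, 0.5, 0.5))  # both near 0.25
print(joint_vs_product(dep, 0.5, 0.5))    # F near 0.5, G*H near 0.25: not independent
```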
Characteristics of Expectations
$$E \sum_i g_i(\eta_i) = \sum_i E\, g_i(\eta_i)$$
• Holds whether or not the random variables $\eta_i$ are independent
$$E \prod_i g_i(\eta_i) = \prod_i E\, g_i(\eta_i)$$
• Holds when the $\eta_i$ are mutually independent
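The difference between the two rules already shows up with dice. This exact-arithmetic sketch (fair dice assumed for illustration) checks the sum rule for the fully dependent pair (η, η), and the product rule both for two independent dice and for that dependent pair.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)
P_FACE = Fraction(1, 6)

def E(g, n_dice):
    """Exact expectation of g(face_1, ..., face_n) over n_dice independent fair dice."""
    return sum(g(*faces) * P_FACE**n_dice for faces in product(FACES, repeat=n_dice))

mean = E(lambda a: a, 1)  # E(eta) = 7/2

# Sum rule: holds even for the fully dependent pair (eta, eta).
print(E(lambda a: a + a, 1) == mean + mean)     # True
# Product rule: holds for two independent dice ...
print(E(lambda a, b: a * b, 2) == mean * mean)  # True (both are 49/4)
# ... but fails for the dependent pair (eta, eta).
print(E(lambda a: a * a, 1), mean * mean)       # 91/6 49/4
```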
Moments of Distribution
• rth moment of a distribution
– $E(\eta^r)$
• Principal (first) moment
– $\mu = E(\eta)$
• rth central moment
– $\mu_r = E\{(\eta - \mu)^r\}$
• Most important moments
– $\mu = E(\eta)$, known as the mean of $\eta$
• Measure of location of a random variable
– $\sigma^2 = \mu_2$, known as the variance of $\eta$ (usually abbreviated "var")
• Measure of dispersion about the mean
– Standard deviation $\sigma = \sqrt{\sigma^2}$
– Coefficient of variation
• $\sigma/\mu$
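A sketch computing these moments exactly for a fair die (an illustrative distribution), using fractions so that the mean 7/2 and variance 35/12 come out exactly; the third central moment of this symmetric distribution is 0.

```python
from fractions import Fraction

def moment(values, probs, r):
    """r-th moment E(eta^r)."""
    return sum(p * y**r for y, p in zip(values, probs))

def central_moment(values, probs, r):
    """r-th central moment mu_r = E{(eta - mu)^r}, where mu = E(eta)."""
    mu = moment(values, probs, 1)
    return sum(p * (y - mu) ** r for y, p in zip(values, probs))

faces = list(range(1, 7))
probs = [Fraction(1, 6)] * 6

mu = moment(faces, probs, 1)           # mean
var = central_moment(faces, probs, 2)  # variance sigma^2 = mu_2
sigma = float(var) ** 0.5              # standard deviation
print(mu, var)                         # 7/2 35/12
print(sigma, sigma / float(mu))        # standard deviation and coefficient of variation
```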
Covariance
• Definition of covariance (usually abbreviated cov)
– If $\eta$ and $\zeta$ are random variables with means $\mu$ and $\nu$, respectively, the quantity $E\{(\eta-\mu)(\zeta-\nu)\}$ is called the covariance of $\eta$ and $\zeta$
– If $\eta$ and $\zeta$ are independent, the covariance is 0
• Why?
– Also, $\mathrm{cov}(\eta, \eta) = \mathrm{var}(\eta)$
• Why?
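An exact check of these covariance statements, assuming two independent fair dice: η is the first die, ζ the second, and their total is included as an example of a variable that depends on η.

```python
from fractions import Fraction
from itertools import product

# Two independent fair dice (illustrative); each outcome (d1, d2) has probability 1/36.
OUTCOMES = list(product(range(1, 7), repeat=2))
P = Fraction(1, 36)

def E(g):
    return sum(P * g(d1, d2) for d1, d2 in OUTCOMES)

def cov(x, y):
    """cov = E{(eta - mu)(zeta - nu)} with mu = E(eta), nu = E(zeta)."""
    mu, nu = E(x), E(y)
    return E(lambda d1, d2: (x(d1, d2) - mu) * (y(d1, d2) - nu))

eta = lambda d1, d2: d1          # first die
zeta = lambda d1, d2: d2         # second die, independent of eta
total = lambda d1, d2: d1 + d2   # depends on eta

print(cov(eta, zeta))   # 0: independent
print(cov(eta, eta))    # 35/12, which is var(eta)
print(cov(eta, total))  # 35/12: positive, since total depends on eta
```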
Important Formula of Covariance
$$\mathrm{var}\Big(\sum_{i=1}^{k} \eta_i\Big) = \sum_{i=1}^{k}\sum_{j=1}^{k} \mathrm{cov}(\eta_i, \eta_j)$$
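A sketch verifying the formula exactly for k = 3, assuming two fair dice and taking the third variable to be their total so that not all covariances vanish. The sum of the variances alone is printed for contrast.

```python
from fractions import Fraction
from itertools import product

# Two independent fair dice (illustrative); each outcome o = (d1, d2) has probability 1/36.
OUTCOMES = list(product(range(1, 7), repeat=2))
P = Fraction(1, 36)

def E(g):
    return sum(P * g(o) for o in OUTCOMES)

def cov(x, y):
    mx, my = E(x), E(y)
    return E(lambda o: (x(o) - mx) * (y(o) - my))

# k = 3 random variables; the third is deliberately correlated with the first two.
etas = [lambda o: o[0], lambda o: o[1], lambda o: o[0] + o[1]]

def total(o):
    return sum(eta(o) for eta in etas)

lhs = cov(total, total)                               # var(eta_1 + eta_2 + eta_3)
rhs = sum(cov(ei, ej) for ei in etas for ej in etas)  # double sum of cov(eta_i, eta_j)
naive = sum(cov(e, e) for e in etas)                  # variances only, ignoring covariances
print(lhs, rhs, naive)  # 70/3 70/3 35/3
```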
Correlation Coefficient
• Definition
$$\rho = \mathrm{cov}(\eta, \zeta) \big/ \sqrt{\mathrm{var}\,\eta \; \mathrm{var}\,\zeta}$$
– Always between +1 and -1
– If $\rho = 0$, they are not correlated
– If $\rho < 0$, they are negatively correlated
– If $\rho > 0$, they are positively correlated
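A sample-based sketch of the correlation coefficient. The normally distributed data are an illustrative assumption: adding independent noise gives a positive coefficient below 1, an exact negative linear relation gives -1, and independent data give a value near 0.

```python
import math
import random

def correlation(xs, ys):
    """Sample estimate of rho = cov(eta, zeta) / sqrt(var(eta) var(zeta))."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)  # the common 1/(n-1) factors cancel

random.seed(3)
eta = [random.gauss(0, 1) for _ in range(20_000)]
noise = [random.gauss(0, 1) for _ in range(20_000)]

print(correlation(eta, [x + e for x, e in zip(eta, noise)]))  # about +0.71: positively correlated
print(correlation(eta, [-2 * x for x in eta]))                # -1 up to rounding
print(correlation(eta, noise))                                # about 0: not correlated
```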
Important Distributions
• Uniform Distribution
• Exponential Distribution
• Binomial Distribution
• Poisson Distribution
• Normal Distribution
Uniform Distribution
• Uniform Distribution (Rectangle Distribution)
– A distribution with constant probability density over an interval [a, b]
– Mean?
– Variance?
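One way to check answers to these two questions is by simulation. This sketch assumes the interval [2, 5] and compares the sample mean and variance with the standard results $(a+b)/2$ and $(b-a)^2/12$.

```python
import random

a, b = 2.0, 5.0  # illustrative interval [a, b]
random.seed(4)
xs = [random.uniform(a, b) for _ in range(100_000)]

n = len(xs)
mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / (n - 1)

print(mean, (a + b) / 2)       # sample mean vs (a + b) / 2 = 3.5
print(var, (b - a) ** 2 / 12)  # sample variance vs (b - a)^2 / 12 = 0.75
```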
Exponential Distribution
• Exponential Distribution
– pdf $f(y) = \lambda e^{-\lambda y}$ for $y \ge 0$
– mean $1/\lambda$
– variance $1/\lambda^2$
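Assuming the density $\lambda e^{-\lambda y}$ for $y \ge 0$, exponential samples are easy to generate by inverse-transform sampling (a technique introduced here for illustration, not on this slide): solving $F(y) = 1 - e^{-\lambda y} = u$ for a uniform $u$ gives $y = -\ln(1-u)/\lambda$. The sketch assumes $\lambda = 2$ and checks the mean $1/\lambda$ and variance $1/\lambda^2$.

```python
import math
import random

def sample_exponential(lam, rng):
    """Inverse-transform sampling: solve F(y) = 1 - exp(-lam * y) = u for y."""
    u = rng.random()                 # u in [0, 1)
    return -math.log(1.0 - u) / lam  # 1 - u lies in (0, 1], so the log is defined

rng = random.Random(5)
lam = 2.0  # illustrative rate
ys = [sample_exponential(lam, rng) for _ in range(200_000)]

n = len(ys)
mean = sum(ys) / n
var = sum((y - mean) ** 2 for y in ys) / (n - 1)
print(mean, 1 / lam)      # both near 0.5
print(var, 1 / lam ** 2)  # both near 0.25
```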
Binomial Distribution
• Binomial Distribution
– Discrete probability distribution $P_p(n|N)$ of obtaining exactly $n$ successes out of $N$ Bernoulli trials
– Each Bernoulli trial is true with probability $p$ and false with probability $q = 1-p$
$$P_p(n|N) = \binom{N}{n} p^n q^{N-n}$$
– Mean $= Np$
– Variance $= Npq$
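A sketch of the binomial probabilities in exact fractions; N = 10 and p = 3/10 are illustrative. The probabilities sum to 1, and the mean and variance computed from them come out as Np and Npq.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, N, p):
    """P_p(n|N) = C(N, n) p^n q^(N - n), with q = 1 - p."""
    q = 1 - p
    return comb(N, n) * p**n * q**(N - n)

N, p = 10, Fraction(3, 10)  # illustrative parameters, exact arithmetic
pmf = [binomial_pmf(n, N, p) for n in range(N + 1)]

total = sum(pmf)
mean = sum(n * P for n, P in enumerate(pmf))
var = sum((n - mean) ** 2 * P for n, P in enumerate(pmf))
print(total, mean, var)  # 1 3 21/10, i.e. 1, Np and Npq
```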
Poisson Distribution
• Poisson Distribution
– The limit of the Binomial Distribution as $N \to \infty$ with $Np = \nu$ held fixed
– Mean is $\nu$
– Variance is $\nu$
$$P_\nu(n) = \lim_{N\to\infty} P_B(n) = \frac{\nu^n e^{-\nu}}{n!}$$
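The limiting process can be watched numerically: hold Np = ν fixed (here ν = 3, an illustrative value) and let N grow; the binomial probability of n = 2 successes approaches the Poisson value.

```python
import math

def binomial_pmf(n, N, p):
    return math.comb(N, n) * p**n * (1 - p) ** (N - n)

def poisson_pmf(n, nu):
    """P_nu(n) = nu^n e^{-nu} / n!"""
    return nu**n * math.exp(-nu) / math.factorial(n)

nu, n = 3.0, 2
for N in (10, 100, 1000, 10_000):
    # Hold the mean N p = nu fixed while N grows.
    print(N, binomial_pmf(n, N, nu / N), poisson_pmf(n, nu))
```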
Normal Distribution
• Normal Distribution (Gaussian Distribution)
– Bell curve
– De Moivre developed the normal distribution as an
approximation to the binomial distribution
Normal Distribution in Data Analysis
• About 68.27% of the data will be found within one SD either side of the mean (±1SD)
• About 95.45% of the data will be found within two SD either side of the mean (±2SD)
• About 99.73% of the data will be found within three SD either side of the mean (±3SD)
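These percentages are $P(|\eta - \mu| < k\sigma)$ for a normal $\eta$, which equals $\mathrm{erf}(k/\sqrt{2})$. A short sketch using Python's `math.erf`:

```python
import math

def within_k_sd(k):
    """P(|eta - mu| < k sigma) for a normally distributed eta; equals erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, f"{100 * within_k_sd(k):.2f}%")
# 1 68.27%
# 2 95.45%
# 3 99.73%
```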
Central Limit Theorem
• Central Limit Theorem
– The sum of $n$ independent random variables has an approximately normal distribution when $n$ is large
• The random variables may follow an arbitrary distribution (under mild conditions, e.g. identically distributed with finite variance)
Central Limit Theorem in Practice
• In practice
– n = 10 is a reasonably large number
– n = 25 is rather large (effectively infinite)
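A sketch of the theorem in action, assuming Uniform(0, 1) terms (a deliberately non-normal choice). It standardizes sums of n terms and reports the share falling within one standard deviation, which for a normal distribution is about 0.683.

```python
import random

def fraction_within_one_sd(n, trials, rng):
    """Share of standardized sums of n Uniform(0, 1) variables with |z| < 1."""
    hits = 0
    for _ in range(trials):
        s = sum(rng.random() for _ in range(n))
        z = (s - n / 2) / (n / 12) ** 0.5  # each term has mean 1/2 and variance 1/12
        hits += abs(z) < 1
    return hits / trials

rng = random.Random(6)
for n in (1, 2, 10, 25):
    # For a normal distribution the share would be about 0.683.
    print(n, round(fraction_within_one_sd(n, 20_000, rng), 3))
```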
Estimation
• Monte Carlo Computation
– Goal: estimating the unknown numerical value of some parameter of some
distribution
• The parameter is called an estimand
• Sample
• The available data (may consist of a number of observed random
variables)
• The number of observations in the sample is called the sample size
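• Estimators of the mean (the estimand)
– Sample mean
• $(\eta_1 + \eta_2 + \dots + \eta_n)/n$
– Weighted average
• $(w_1\eta_1 + w_2\eta_2 + \dots + w_n\eta_n)/(w_1 + w_2 + \dots + w_n)$
• May be a better estimator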
• Connection between the sample and the estimand
– The estimand is a parameter of the distribution of the random variables
constituting the sample
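To illustrate when a weighted average may be the better estimator, this sketch assumes observations with the same mean but different, known noise levels, and uses inverse-variance weights $w_i = 1/\sigma_i^2$; that weighting rule is an assumption chosen for illustration, not something prescribed on these slides. Repeating the experiment shows the weighted estimator has a much smaller spread.

```python
import random

def weighted_average(values, weights):
    """(w1*eta1 + ... + wn*etan) / (w1 + ... + wn)."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

def spread(xs):
    """Sample standard deviation, used to compare the two sampling distributions."""
    m = sum(xs) / len(xs)
    return (sum((x - m) ** 2 for x in xs) / (len(xs) - 1)) ** 0.5

rng = random.Random(7)
theta = 10.0                         # hypothetical estimand: the common mean
sds = [0.5] * 5 + [5.0] * 5          # hypothetical: five precise and five noisy observations
weights = [1 / s ** 2 for s in sds]  # inverse-variance weights (illustrative assumption)

plain, weighted = [], []
for _ in range(20_000):              # repeat the experiment many times
    sample = [rng.gauss(theta, s) for s in sds]
    plain.append(sum(sample) / len(sample))
    weighted.append(weighted_average(sample, weights))

print(spread(plain), spread(weighted))  # roughly 1.12 versus 0.22
```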
Sampling Distribution
• Parent Distribution
– We can represent the sample by a vector $\boldsymbol\eta$ with coordinates $\eta_1, \eta_2, \eta_3, \dots, \eta_n$
– The distribution of $\eta_1, \eta_2, \eta_3, \dots, \eta_n$ is called the Parent Distribution
– To estimate the estimand $\theta$ (a parameter of the Parent Distribution), we use some function $t(\boldsymbol\eta)$
• $t$ is an estimator
• Sampling Distribution
– $\boldsymbol\eta$ is a random variable, so is $t(\boldsymbol\eta)$
• If we repeated the experiment, we should expect to get a different value of $\boldsymbol\eta$
– Since $\boldsymbol\eta$ varies from experiment to experiment, $t(\boldsymbol\eta)$ has a distribution, called the sampling distribution
– If $t(\boldsymbol\eta)$ is to be close to $\theta$, then the sampling distribution ought to be closely concentrated around $\theta$
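A sketch of a sampling distribution: repeat the experiment many times and look at the values of t. It assumes an Exp(1) parent distribution (so the estimand, its mean, is 1 and σ = 1), a sample size of 25, and the sample mean as t.

```python
import random
import statistics

rng = random.Random(8)
n = 25               # sample size (illustrative)
experiments = 10_000  # number of repeated experiments

def t(sample):
    """Estimator t(eta): here the sample mean."""
    return sum(sample) / len(sample)

# Parent distribution Exp(1) (illustrative): estimand theta = 1, sigma = 1.
estimates = [t([rng.expovariate(1.0) for _ in range(n)]) for _ in range(experiments)]

print(statistics.fmean(estimates))  # centre of the sampling distribution, near theta = 1
print(statistics.stdev(estimates))  # its spread, near sigma / sqrt(n) = 0.2
```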
Measuring Sampling Distribution
• The bias of t
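– The difference between the average value of $t(\boldsymbol\eta)$ and the estimand $\theta$
– $\beta = E\{t(\boldsymbol\eta) - \theta\}$
– $t$ is an unbiased estimator if $\beta = 0$
• The sampling variance of $t$
– $\sigma_t^2 = \mathrm{var}\{t(\boldsymbol\eta)\} = E\{[t(\boldsymbol\eta) - E\,t(\boldsymbol\eta)]^2\} = E\{[t - \theta - \beta]^2\}$
• If $\beta$ and $\sigma_t^2$ are small, $t$ is a good estimator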
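Bias and sampling variance can both be estimated by simulation. This sketch assumes a normal parent with variance 1 as the estimand and compares two variance estimators on samples of size 5: dividing by n (biased, by $-\sigma^2/n$) and dividing by n - 1 (unbiased, but with a larger sampling variance).

```python
import random

def var_divide_by_n(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def var_divide_by_n_minus_1(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def bias_and_sampling_variance(t, theta, n, reps, rng):
    """Estimate beta = E{t - theta} and sigma_t^2 = var{t} by repeating the experiment."""
    values = [t([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(reps)]
    mean_t = sum(values) / reps
    sigma2_t = sum((v - mean_t) ** 2 for v in values) / (reps - 1)
    return mean_t - theta, sigma2_t

# Parent distribution N(0, 1), so the estimand theta (its variance) is 1; samples of size n = 5.
rng = random.Random(9)
for t in (var_divide_by_n, var_divide_by_n_minus_1):
    beta, sigma2_t = bias_and_sampling_variance(t, 1.0, 5, 50_000, rng)
    print(t.__name__, round(beta, 3), round(sigma2_t, 3))
```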
Important Estimators
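• Mean of the parent distribution
$$\bar\eta = (\eta_1 + \eta_2 + \dots + \eta_n)/n$$
– standard error
$$\sigma_{\bar\eta} = \sigma/\sqrt{n}$$
• Variance of the parent distribution
$$s^2 = (\eta_1^2 + \eta_2^2 + \dots + \eta_n^2 - n\bar\eta^2)/(n-1)$$
– standard error (for a normal parent distribution)
$$\sigma_{s^2} \approx \sigma^2/\sqrt{0.5n}$$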
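A sketch applying these estimators to one sample. The normal parent with mean 5 and variance 4 is a hypothetical choice, and σ is replaced by s in the standard errors, as is usual in practice. The one-pass formula for s² follows the slide; subtracting nη̄² can lose precision when the mean is large relative to the spread, so the two-pass form Σ(η_i − η̄)²/(n − 1) is often preferred numerically.

```python
import math
import random

def estimate_mean_and_variance(xs):
    """Return (eta_bar, its standard error, s^2, approximate standard error of s^2)."""
    n = len(xs)
    eta_bar = sum(xs) / n
    s2 = (sum(x * x for x in xs) - n * eta_bar ** 2) / (n - 1)  # one-pass formula
    se_mean = math.sqrt(s2 / n)       # sigma / sqrt(n), with sigma estimated by s
    se_s2 = s2 / math.sqrt(0.5 * n)   # sigma^2 / sqrt(0.5 n), assumes a normal parent
    return eta_bar, se_mean, s2, se_s2

rng = random.Random(10)
data = [rng.gauss(5.0, 2.0) for _ in range(400)]  # hypothetical normal parent: mu = 5, sigma^2 = 4
eta_bar, se_mean, s2, se_s2 = estimate_mean_and_variance(data)
print(f"mean {eta_bar:.3f} +/- {se_mean:.3f}, variance {s2:.3f} +/- {se_s2:.3f}")
```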
Efficiency
• Goal of Monte Carlo Work
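– Obtain a respectably small standard error in the final result
– More random samples can lead to better accuracy
• Not very rewarding: the standard error decreases only as $1/\sqrt{n}$, so halving it needs four times as many samples
– Variance Reduction Methods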
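A sketch of the $1/\sqrt{n}$ behaviour, assuming a crude Monte Carlo estimate of $E(U^2) = 1/3$ for $U$ uniform on [0, 1] as the example: each fourfold increase in n roughly halves the estimated standard error.

```python
import math
import random

def mc_estimate(n, rng):
    """Crude Monte Carlo estimate of E(U^2) = 1/3 for U ~ Uniform(0, 1), with its standard error."""
    ys = [rng.random() ** 2 for _ in range(n)]
    mean = sum(ys) / n
    s2 = sum((y - mean) ** 2 for y in ys) / (n - 1)
    return mean, math.sqrt(s2 / n)

rng = random.Random(11)
for n in (1_000, 4_000, 16_000, 64_000):
    est, se = mc_estimate(n, rng)
    print(n, round(est, 4), round(se, 5))  # each 4x increase in n only halves the standard error
```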
Summary
• Important Statistics Terms
– Random Events
• Independence of Random Events
• Axioms on Random Events
– Random Variables
• Independence of Random Variables
– CDF
– PDF
– Expectation
• Characteristics of Expectation
– Moments of a Distribution
• rth moment
• rth central moment
– Mean
– Variance
– Standard Deviation
– Covariance
• Characteristics of covariance
– Correlation Coefficient
Summary (Cont.)
• Important Distributions
– Uniform Distribution
– Exponential Distribution
– Binomial Distribution
– Poisson Distribution
– Normal Distribution
• Estimation
– Sample
– Estimand
– Parent Distribution
– Sampling Distribution
– Estimator
• Important estimators
– Buffon’s Needle
What I want you to do
• Review Slides
• Review basic probability/statistics concepts
• Work on your Assignment 1