Chapter 1: Basic Concepts in Probability and Statistics
Department of Mathematics, Faculty of Science and Engineering
City University of Hong Kong
MA 3518: Applied Statistics

In this chapter, we discuss some basic concepts and techniques in probability theory and statistical inference. The material in this chapter is selective. Students are encouraged to read the first two chapters of the monograph by Bickel and Doksum (1977), "Mathematical Statistics: Basic Ideas and Selected Topics", for a more comprehensive review. The topics covered in this chapter are:

Section 1.1: Introductory Statistics
Section 1.2: A Snapshot of Statistical Inference
Section 1.3: Basic Probability Theory

Let's start our revision!

Section 1.1: Introductory Statistics

1. Question: What is Statistics?
Informally speaking, statistics is the science of studying data. It concerns:
- Collection of data
- Presentation and summarization of data
- Analysis of data
- Interpretation and conclusions

2. The role of real-life data:
- A numerical representation of observable information in real life
- An important source of information for understanding the unknown nature of the underlying mechanism that generates the data

3. Applications of Statistics:
- Biological science: population dynamics of organisms, life cycles of organisms
- Medical science: diagnostic aids, testing of new drugs, clinical trials
- Engineering science: quality control, reliability
- Social science: demography, sample surveys, censuses
- Business and management: forecasting sales and profit figures, marketing research
- Economics and finance: analysis and forecasting of economic indicators and asset price dynamics

4.
Descriptive statistical analysis:
- Collect, present and summarize data
- Extract useful information on some distributional characteristics of a given data set:
(a) Central tendency
(b) Dispersion
(c) Skewness
(d) Tail behaviour

5. Statistical inference:
- Understand the unknown underlying mechanism that generates the observed data, based on the information in the observed data
- Estimate the unknown parameters in a statistical model that describes the data set
- Test hypotheses on the unknown parameters

6. Population: the entire collection of data from the measurement of all the subjects to be investigated
Example: the annual incomes of all males in the age group 25-30 in Hong Kong

7. Sample: a set of observations obtained from the population by
- Direct observation or measurement. Examples: stock prices, weights, heights and temperatures
- Conducting an experiment. Examples: survey results, opinion polls and rates of chemical reactions
- Simulation. Examples: Mark Six, random numbers generated by a computer

8. Random or probability sampling:
- The most common way to obtain a sample
- Widely used in practice, for example by the Census and Statistics Department of the HKSAR

9. Random sampling from a finite population:
- Select individuals at random (i.e. each individual has an equal probability of being chosen), either with or without replacement, from the finite population
- Sampling with replacement yields a set of independent observations, namely a random sample
- Sampling without replacement from a finite but large population yields a set of approximately independent observations, which by convention is also called a random sample

10. Random sampling from an infinite population:
- Identify the probability distribution for the infinite population
- Generate or draw random numbers from that distribution
- The generated random numbers are called a random sample

11.
Sample size: the number of observations in a random sample

12. Advantages of random sampling:
- Likely to draw representative observations from the population
- Easy to handle with standard mathematical and probabilistic methods

13. Other sampling methods:
- Stratified sampling:
(a) Divide the whole population into mutually exclusive subgroups, called strata, according to some specified criteria or characteristics of the population
(b) Select individuals randomly from each stratum
- Cluster sampling:
(a) In random sampling and stratified sampling, single individuals are selected from the population
(b) In cluster sampling, samples are selected randomly in groups or clusters
(c) This saves time and cost when the population is very dispersed
- Multi-stage sampling:
(a) Combines cluster sampling and stratified sampling
(b) Select random clusters
(c) Divide the clusters into strata
(d) Select individuals randomly from the strata

14. Review of basic descriptive statistics:

Measures of central tendency for ungrouped data: given a set of sample data or observations {y1, y2, …, yn} with sample size n,

(a) Mean: m_y = (1/n) Σ_{i=1}^{n} y_i

(b) Median: m_D = y_((n+1)/2) if n is odd; m_D = (y_(n/2) + y_(n/2+1))/2 if n is even, where y_(k) denotes the k-th smallest observation

(c) Mode: the most frequent data point
- It does not exist if each data point occurs exactly once. For example, the following data set has no mode, since each point appears only once: {1, 2, 3, 4, 6, 7, 8, 10, 12}
- It may not be unique. For instance, the following data set contains two modes, namely 3 and 6, and is called bimodal: {1, 2, 3, 3, 4, 5, 6, 6, 7, 10}
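The three measures above can be computed with Python's standard `statistics` module; a minimal sketch using the bimodal data set from the example:

```python
import statistics

data = [1, 2, 3, 3, 4, 5, 6, 6, 7, 10]  # the bimodal example above

mean = statistics.mean(data)        # (1/n) * sum of the data
median = statistics.median(data)    # average of the two middle values, since n is even
modes = statistics.multimode(data)  # all most-frequent points

print(mean, median, modes)  # 4.7 4.5 [3, 6]
```

Note that `multimode` returns every most-frequent value, which matches the convention above that the mode need not be unique.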
Measures of dispersion for ungrouped data:

(a) Variance:
σ² = (1/n) Σ_{i=1}^{n} (y_i − m_y)² is the population variance of the data.
The sample variance s² is no longer equal to σ²; it is given by:
s² = (1/(n−1)) Σ_{i=1}^{n} (y_i − m_y)²
Note that:
- The value of the sample variance s² is fixed for a given set of samples; it is subject to variation if a different set of samples is obtained
- If we perform random sampling many times and calculate the sample variance for each random sample, the average of the sample variances is expected to equal the unknown population variance σ². Hence, s² is called an unbiased estimate of σ²

(b) Standard deviation:
σ = [(1/n) Σ_{i=1}^{n} (y_i − m_y)²]^{1/2} is called the population standard deviation of the data.
The sample standard deviation s is given by:
s = [(1/(n−1)) Σ_{i=1}^{n} (y_i − m_y)²]^{1/2}
Note that s is a biased estimate of σ

(c) Percentiles:
- The q-th percentile of an ungrouped data set of size n is defined as the value of the variable y corresponding to the q(n+1)/100-th element when the data are arranged in ascending order
- If q(n+1)/100 is not an integer, say it lies between k and k+1, we find the q-th percentile by interpolation between y_(k) and y_(k+1)
- Special cases: the median (50th percentile), the first quartile Q1 (25th percentile) and the third quartile Q3 (75th percentile)
- Example: Consider the following data set: {1, 3, 5, 6, 8, 10, 12, 13, 15, 16}
(a) Find the 15th percentile of the data set
(b) Find the median of the data set
(c) Find Q1 and Q3 of the data set

Solution:
(a) First, note that q(n+1)/100 = 15(10+1)/100 = 1.65. Hence k = 1, and the 15th percentile lies between y_(1) = 1 and y_(2) = 3. By interpolation, the 15th percentile p15 is:
p15 = y_(1) + ((1.65 − 1)/(2 − 1)) (y_(2) − y_(1)) = 2.3
(b) For the median, q(n+1)/100 = 50(10+1)/100 = 5.5. Hence k = 5, and the median lies between y_(5) = 8 and y_(6) = 10. By interpolation, the median p50 is:
p50 = y_(5) + ((5.5 − 5)/(6 − 5)) (y_(6) − y_(5)) = 0.5 (y_(5) + y_(6)) = 9
(c) For the first quartile, q(n+1)/100 = 25(10+1)/100 = 2.75. Hence k = 2, and Q1 lies between y_(2) = 3 and y_(3) = 5. By interpolation:
Q1 = y_(2) + ((2.75 − 2)/(3 − 2)) (y_(3) − y_(2)) = 4.5
For the third quartile, q(n+1)/100 = 75(10+1)/100 = 8.25. Hence k = 8, and Q3 lies between y_(8) = 13 and y_(9) = 15. By interpolation:
Q3 = y_(8) + ((8.25 − 8)/(9 − 8)) (y_(9) − y_(8)) = 13.5

(d) Range:
- Definition: Range = Max − Min
- Example: for the data set above, Max = 16 and Min = 1, so Range = 15

(e) Inter-quartile range (IQ):
- Definition: IQ = Q3 − Q1
- Example: for the data set above, Q1 = 4.5 and Q3 = 13.5, so IQ = 9
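The q(n+1)/100 interpolation rule from the worked example can be written as a small helper function; the name `percentile` is our own choice, and the code reproduces the answers p15 = 2.3, p50 = 9, Q1 = 4.5 and Q3 = 13.5 computed above:

```python
def percentile(data, q):
    # q-th percentile by the q(n+1)/100 interpolation rule
    ys = sorted(data)
    n = len(ys)
    pos = q * (n + 1) / 100       # 1-based position of the percentile
    k = int(pos)                  # integer part: lies between y_(k) and y_(k+1)
    if k <= 0:
        return ys[0]
    if k >= n:
        return ys[-1]
    frac = pos - k                # fractional part used for interpolation
    return ys[k - 1] + frac * (ys[k] - ys[k - 1])

data = [1, 3, 5, 6, 8, 10, 12, 13, 15, 16]
p15 = percentile(data, 15)   # 2.3
p50 = percentile(data, 50)   # 9.0
q1 = percentile(data, 25)    # 4.5
q3 = percentile(data, 75)    # 13.5
rng = max(data) - min(data)  # Range = 15
iq = q3 - q1                 # IQ = 9.0
```

Be aware that statistical software packages implement several slightly different percentile conventions; this function follows the q(n+1)/100 rule used in these notes.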
Measures of central tendency and dispersion for grouped data (frequency tables):
Suppose we have N observations on a variable X, and these observations are grouped into M (≤ N) different classes with class values {x1, x2, …, xM}. Let f_k (k = 1, 2, …, M) denote the number of observations falling into the k-th class, i.e. the frequency of occurrence of the k-th class.
Note that Σ_{k=1}^{M} f_k = N

The frequency distribution can then be presented in the following table:

Class value:  x1   x2   x3   ……   xM
Frequency:    f1   f2   f3   ……   fM

(a) Example (discrete variable X): Suppose X represents the number of members under 21 in a family in Hong Kong. Then the class values are the distinct values of the discrete variable X

(b) Example (continuous variable X): Suppose X represents the heights of students at a university. Then the range covered by the observations in the sample can be divided into several classes, typically but not necessarily of equal width. The mid-point of the interval corresponding to the k-th class is used as the representative class value x_k. The resulting distribution is called a grouped frequency distribution

(c) Mean of the grouped data:
x̄ = (1/N) Σ_{k=1}^{M} f_k x_k
This is also called the weighted mean of the data, with the weights determined by the frequencies {f1, f2, …, fM} of the different classes

(d) Variance and standard deviation of the grouped data (sample version):
The sample variance s² of the grouped data is given by:
s² = (1/(N−1)) Σ_{k=1}^{M} f_k (x_k − x̄)²
The sample standard deviation s of the grouped data is given by:
s = [(1/(N−1)) Σ_{k=1}^{M} f_k (x_k − x̄)²]^{1/2}
A useful formula for the sample variance s² is:
s² = (1/(N−1)) [Σ_{k=1}^{M} f_k x_k² − (1/N)(Σ_{k=1}^{M} f_k x_k)²]
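A quick numerical check of the grouped-data formulas, using a small made-up frequency table (the class values and frequencies are illustrative only). Both variance formulas should agree with each other and with the ungrouped sample variance of the expanded data:

```python
import statistics

# Hypothetical frequency table: class values x_k and frequencies f_k
xs = [1, 2, 3, 4]
fs = [3, 5, 6, 2]
N = sum(fs)

mean = sum(f * x for x, f in zip(xs, fs)) / N  # weighted mean x-bar

# sample variance by the definition
s2 = sum(f * (x - mean) ** 2 for x, f in zip(xs, fs)) / (N - 1)

# sample variance by the shortcut formula
s2_short = (sum(f * x * x for x, f in zip(xs, fs))
            - sum(f * x for x, f in zip(xs, fs)) ** 2 / N) / (N - 1)

# expand the table back into raw observations for a cross-check
expanded = [x for x, f in zip(xs, fs) for _ in range(f)]
s2_raw = statistics.variance(expanded)
```

All three values coincide, confirming that the grouped formulas are just the ungrouped ones with repeated observations collected by class.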
15. Other important statistical measures of data:

Population skewness:
(a) It measures the degree of asymmetry of a set of data
(b) For a set of N observations {Y1, Y2, …, YN}, it is defined as:
skewness = (1/N) Σ_{k=1}^{N} (Y_k − Ȳ)³ / s³,
where Ȳ and s² are the population mean and variance of the data
(c) Zero skewness indicates that the data are nearly symmetric; for instance, the skewness of a normal distribution is zero
(d) Positive skewness indicates that the data are skewed to the right; that is, the right tail is heavier than the left tail
(e) Negative skewness indicates that the data are skewed to the left; that is, the left tail is heavier than the right tail

Sample skewness:
Suppose we have obtained a set of sample data {x1, x2, …, xn}. The sample skewness is defined by:
γ = n/((n−1)(n−2)) Σ_{k=1}^{n} (x_k − x̄)³ / s³,
where x̄ is the sample mean and s is the sample standard deviation.
Prior to sampling, the sample skewness is given by the random variable:
Γ = n/((n−1)(n−2)) Σ_{k=1}^{n} (X_k − X̄)³ / S³
If {X1, X2, …, Xn} are normally distributed, Γ ~ N(0, 6/n) approximately as the sample size n tends to infinity

Population kurtosis:
(a) It measures whether the data are heavy-tailed (fat-tailed) or not, relative to a normal distribution
(b) It is defined by:
kurtosis = (1/N) Σ_{k=1}^{N} (Y_k − Ȳ)⁴ / s⁴,
where Ȳ and s² are the population mean and variance of the data
(c) The kurtosis of a normal distribution is 3
(d) If the kurtosis of a set of data is greater (less) than 3, the data are said to be heavy-tailed or fat-tailed (light-tailed or thin-tailed) relative to a normal distribution. Most real-life returns data for financial assets, especially daily and intra-day returns on equities, stock indices, commodities and exchange rates, exhibit this heavy-tailed feature
(e) The excess kurtosis of a data set is defined as its kurtosis minus 3. For instance, the excess kurtosis of a normal distribution is zero
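The sample skewness and sample excess kurtosis formulas can be sketched directly in Python. For the symmetric data set below, the skewness is exactly zero (the cubed deviations cancel), and the excess kurtosis is negative, i.e. lighter-tailed than a normal distribution:

```python
import statistics

def sample_skewness(data):
    # gamma = n/((n-1)(n-2)) * sum((x_k - xbar)^3) / s^3
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation, (n - 1) divisor
    return n / ((n - 1) * (n - 2)) * sum((x - xbar) ** 3 for x in data) / s ** 3

def sample_excess_kurtosis(data):
    # K = n(n+1)/((n-1)(n-2)(n-3)) * sum((x_k - xbar)^4)/s^4
    #     - 3(n-1)^2/((n-2)(n-3))
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)
    lead = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    return (lead * sum((x - xbar) ** 4 for x in data) / s ** 4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

symmetric = [1, 2, 3, 4, 5]
g = sample_skewness(symmetric)        # 0.0: the data are symmetric
k = sample_excess_kurtosis(symmetric) # -1.2: lighter-tailed than normal
```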
Sample excess kurtosis:
Suppose we have obtained a set of sample data {x1, x2, …, xn}. The sample excess kurtosis is defined by:
κ = n(n+1)/((n−1)(n−2)(n−3)) Σ_{k=1}^{n} (x_k − x̄)⁴ / s⁴ − 3(n−1)²/((n−2)(n−3)),
where x̄ is the sample mean and s is the sample standard deviation.
Prior to sampling, the sample excess kurtosis is given by the random variable K:
K = n(n+1)/((n−1)(n−2)(n−3)) Σ_{k=1}^{n} (X_k − X̄)⁴ / S⁴ − 3(n−1)²/((n−2)(n−3))
If {X1, X2, …, Xn} are normally distributed, K ~ N(0, 24/n) approximately as the sample size n tends to infinity

16. Measure of association between two samples: the Pearson correlation coefficient

Consider the samples {x1, x2, …, xn} and {y1, y2, …, yn} for the continuous variables X and Y. We can measure the linear association between X and Y by the Pearson correlation coefficient r_XY, defined as:
r_XY = S_xy / (S_xx S_yy)^{1/2},
where S_xy = Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ), S_xx = Σ_{i=1}^{n} (x_i − x̄)² and S_yy = Σ_{i=1}^{n} (y_i − ȳ)²

Some important properties of r_XY:
(a) r_XY ∈ [−1, 1]
(b) It can only measure the linear dependence between two continuous variables
(c) r_XY = 0 means that X and Y are uncorrelated, but not, in general, independent
(d) If X and Y are normally distributed and r_XY = 0, then X and Y are independent

Section 1.2: A Snapshot of Statistical Inference

1. Statistic:
- A function of the samples that describes and summarizes a certain distributional characteristic of the samples. Suppose {x1, x2, …, xn} is a random sample; then a function T(x1, x2, …, xn) is called a statistic of the sample {x1, x2, …, xn}
- Examples: sample mean, sample variance and sample standard deviation
- A very important element in statistical inference
- Unknown prior to sampling
- Subject to variation across different samples
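As a concrete example of a statistic, the Pearson correlation coefficient defined above can be computed from two samples. The data below are illustrative only; a perfect linear relation gives r = 1, and a perfect negative one gives r = −1, matching property (a):

```python
import math

def pearson_r(xs, ys):
    # r_XY = S_xy / sqrt(S_xx * S_yy)
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
r_pos = pearson_r(xs, [2 * x + 1 for x in xs])  # exact linear relation: 1.0
r_neg = pearson_r(xs, [-x for x in xs])         # exact negative relation: -1.0
```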
2. Sampling distribution:
- The distribution of a statistic
- Useful for constructing interval estimates (i.e. confidence intervals) of the unknown parameter corresponding to the statistic
- Useful for constructing tests of hypotheses on the unknown parameter

3. The essence of statistical inference is to use statistics and their sampling distributions:
- To construct point estimates and interval estimates of the corresponding unknown parameters of the distribution of the population
- To construct tests of hypotheses on the unknown parameters

4. Examples of basic statistical inference:
- Use of the sample mean, variance and standard deviation as point estimates of the unknown population mean, variance and standard deviation, respectively
- Use of the sampling distributions of the sample mean, variance and standard deviation to construct confidence intervals for, and tests of hypotheses on, the population mean, variance and standard deviation, respectively

5. Statistical models:
- A family of probability distributions that is assumed to describe the random observations
- We assume that the observed data are generated by one member of the family of probability distributions
- Parametric statistical models:
(a) A family of probability distributions indexed by a finite-dimensional vector of parameters ranging over an index set which is a subset of a Euclidean space
(b) The functional form of the probability distributions is specified
(c) For example, the observed data come from a normal distribution with unknown mean but known variance
- Non-parametric statistical models:
(a) It is difficult to provide a formal definition of non-parametric models
(b) The functional form of the probability distributions is not specified
- Semi-parametric statistical models: a half-way house between parametric and non-parametric models

6.
Parametric statistical inference:
- Suppose the observed data are generated by one member of the family, corresponding to an unknown value of the parameter called the "true" value of the parameter
- Infer the "true" value of the parameter from the observed data. For instance, we may perform:
(a) Point estimation of the parameter
(b) Interval estimation of the parameter
(c) Testing of hypotheses on the parameter

Section 1.3: Basic Probability Theory

1. Some terminology and notation:

Random experiment: an experiment whose actual outcome is not known or given in advance
Examples: tossing a coin, rolling a die

Sample space Ω: the set of all possible outcomes of a random experiment
Example: for the random experiment of rolling a die, the sample space is Ω = {1, 2, 3, 4, 5, 6}

Event: a subset of the sample space, denoted by capital letters A, B, C, …
Example: for the random experiment of rolling a die, let A denote the event that an even number appears on the upper face of the die. Then A = {2, 4, 6}, which is a subset of Ω

Probability P: a number between zero and one inclusive that represents the likelihood of occurrence of an event
(a) If E is an impossible event, P(E) = 0
(b) If E is a sure event, P(E) = 1
(c) Example: Consider a random experiment of tossing a fair coin twice. The sample space of the experiment is {HH, HT, TH, TT}. What is the probability of getting two heads?
The required probability is P({HH}) = (0.5)² = 0.25

Rules of probability:
(a) Addition (area) rule:
- For mutually exclusive (disjoint) events A and B: P(A or B) = P(A) + P(B)
- For non-disjoint events A and B: P(A or B) = P(A) + P(B) − P(A and B)
(b) Product rule:
- For independent events A and B: P(A and B) = P(A) P(B)
- For dependent events A and B: P(A and B) = P(A | B) P(B)

Conditional probability: the conditional probability of an event A given the occurrence of an event B is defined by:
P(A | B) = P(A and B) / P(B)

Bayes' rule:
P(A | B) = P(B | A) P(A) / P(B)
Proof: by the symmetry of the joint event, P(A and B) = P(A | B) P(B) = P(B | A) P(A); dividing by P(B) gives the result

Random variable X:
(a) Takes values on the real line according to the outcome of a random experiment
(b) Its value is not known before the outcome of the experiment is realized
(c) Example: Consider the experiment of tossing a fair coin twice. Suppose the random variable X represents the number of heads we get from the experiment. Then X can take values in the set {0, 1, 2} according to the outcome of the experiment. More specifically, we have:
X({TT}) = 0; X({HT}) = 1; X({TH}) = 1; X({HH}) = 2

Discrete random variables:
(a) Take values in a set of discrete real values {x1, x2, …, xn}
(b) Their probabilistic or statistical properties are determined by a probability distribution, defined by a set of probabilities as follows:
P({X = x_k}) = p_k, k = 1, 2, …, n,
where (i) p_k ≥ 0 for k = 1, 2, …, n, and (ii) Σ_{k=1}^{n} p_k = 1
Such a probability distribution is called a discrete probability distribution
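The probability rules above can be verified by enumerating the sample space for two tosses of a fair coin. The events A and B below are our own illustrative choices; exact fractions avoid any rounding:

```python
from itertools import product
from fractions import Fraction

# Sample space for two tosses of a fair coin
space = [a + b for a, b in product("HT", repeat=2)]  # ['HH', 'HT', 'TH', 'TT']

def prob(event):
    # probability of an event (a subset of the sample space), equally likely outcomes
    return Fraction(sum(1 for o in space if o in event), len(space))

A = {o for o in space if o[0] == "H"}        # first toss is a head
B = {o for o in space if "H" in o}           # at least one head

p_two_heads = prob({"HH"})                   # 1/4, as in the example above
p_A_given_B = prob(A & B) / prob(B)          # conditional probability P(A | B)
p_bayes = prob(A & B) / prob(A) * prob(A) / prob(B)  # P(B | A) P(A) / P(B)
```

Here P(A | B) and the Bayes'-rule expression coincide exactly, as the proof by symmetry requires.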
(c) Interpretations of the probabilities p_k (k = 1, 2, …, n):
(i) Classical probability: an equally probable space (i.e. p_k = 1/n for each k = 1, 2, …, n)
(ii) Empirical probability: the relative frequency of occurrence of an event. Suppose one performs the experiment M times, and let F_k denote the number of times the event {X = x_k} occurs. Then one can assign the probability p_k by:
p_k = F_k / M = relative frequency, k = 1, 2, …, n
Note that Σ_{k=1}^{n} F_k = M
By the law of large numbers, the true probability is p_k = lim_{M→∞} (F_k / M)
(iii) Subjective probability: assigned based on the subjective view or belief of an agent

Expectation of a discrete random variable:
Consider a random variable X taking values in a set {x1, x2, …, xn} with probability distribution P({X = x_k}) = p_k, k = 1, 2, …, n. Then the expectation of X, denoted E(X), is defined as:
E(X) := Σ_{k=1}^{n} x_k P({X = x_k}) = Σ_{k=1}^{n} x_k p_k

Example: Consider the experiment of tossing a fair coin twice, and let X be the number of heads. We have seen that X takes values in the set {0, 1, 2}. The probability distribution of X is evaluated as follows:
P({X = 0}) = P({TT}) = 0.25
P({X = 1}) = P({HT} or {TH}) = P({HT}) + P({TH}) = 0.25 + 0.25 = 0.5
P({X = 2}) = P({HH}) = 0.25
Hence, the expectation of X is:
E(X) = 0 × 0.25 + 1 × 0.5 + 2 × 0.25 = 1
Note that we can interpret the expectation of a random variable X as the weighted average (weighted mean) of the values of the random variable, with the weights determined by the corresponding probabilities

Variance of a discrete random variable:
The variance of a discrete random variable X, denoted Var(X), is defined by:
Var(X) = E[(X − E(X))²]
It can be interpreted as the weighted average of (X − E(X))², with the weights determined by the corresponding probabilities. We can also calculate Var(X) by the following formula:
Var(X) = Σ_{k=1}^{n} x_k² p_k − (Σ_{k=1}^{n} x_k p_k)²

Example: Consider the experiment of tossing a fair coin twice, and let X be the number of heads.
Then the variance of X is given by:
Var(X) = 0² × 0.25 + 1² × 0.5 + 2² × 0.25 − 1² = 0.5

Standard deviation of a discrete random variable:
The standard deviation of a discrete random variable X, denoted SD(X), is defined by:
SD(X) = [Var(X)]^{1/2}

Some important discrete probability distributions:

(a) Binomial distribution: consider a random experiment consisting of n trials that satisfies the following conditions:
(i) The trials are mutually independent
(ii) Each trial has only two possible outcomes, namely "Success" and "Failure"
(iii) The probability of "Success", denoted p, remains constant from trial to trial (note that 0 ≤ p ≤ 1)
Let X denote the number of successes in the n trials. X is a discrete random variable taking values in the set {0, 1, 2, …, n}. Under the above three conditions, it can be shown that:
p_k = P(X = k) = C(n, k) p^k (1 − p)^{n−k}, k = 0, 1, 2, …, n,
where C(n, k) = n!/[k!(n − k)!]
X is said to follow a binomial distribution with parameters n and p. Mathematically, we write X ~ Bin(n, p)

(b) Examples:
(i) The number of heads in n tosses of a fair coin
(ii) The number of defaults among n independent firms in an economy

(c) Mean and variance of the binomial distribution: suppose X ~ Bin(n, p). Then:
E(X) = np; Var(X) = np(1 − p)
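The coin-tossing example is exactly Bin(2, 0.5), so the binomial p.m.f. should reproduce the distribution, expectation and variance computed above. A minimal sketch:

```python
import math

def binom_pmf(n, p, k):
    # P(X = k) = C(n, k) p^k (1 - p)^(n - k)
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 2, 0.5                                      # two tosses of a fair coin
pmf = [binom_pmf(n, p, k) for k in range(n + 1)]   # [0.25, 0.5, 0.25]

mean = sum(k * q for k, q in enumerate(pmf))               # E(X) = np = 1.0
var = sum(k * k * q for k, q in enumerate(pmf)) - mean**2  # np(1-p) = 0.5
```

The mean and variance agree with the formulas E(X) = np and Var(X) = np(1 − p).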
(d) Poisson distribution: let X denote a random variable taking values in {0, 1, 2, …} (i.e. the set of all non-negative integers). X is said to follow a Poisson distribution with intensity parameter λ > 0 if:
p_k = P(X = k) = e^{−λ} λ^k / k!, k = 0, 1, 2, …
Mathematically, we write X ~ Poi(λ)

(e) Examples:
(i) The number of arrivals of customers at a service counter over a given period of time
(ii) The number of phone calls you receive over a given period of time

(f) Mean and variance of the Poisson distribution: suppose X ~ Poi(λ). Then:
E(X) = λ; Var(X) = λ

Continuous random variables:
(a) Take any value in an interval (a, b) on the real line, where either, or both, of a and b may be infinite
(b) Their probabilistic and statistical properties are determined by the probability density function (p.d.f.) of the continuous random variable X, defined as follows
(c) Consider an extremely short interval [x, x + dx] of the values taken by X. The probability density function (p.d.f.) of X, denoted f(x), is defined by:
P(x < X < x + dx) = f(x) dx
Note that the p.d.f. must satisfy the following properties:
(i) f(x) ≥ 0 for any x in (a, b)
(ii) ∫_a^b f(x) dx = 1
(d) The probability that X takes values in [c, d] (where a < c < d < b) is determined from the p.d.f. f(x) by:
P(c < X < d) = ∫_c^d f(x) dx
(e) Probability distribution function: the probability distribution function of X, denoted F(x), is defined by:
F(x) := P(a < X < x) = ∫_a^x f(u) du, a ≤ x ≤ b
Note that F(a) = 0 and F(b) = 1
(f) Examples of continuous random variables:
(i) The average temperature in HK tomorrow
(ii) The closing value of the Hang Seng Index (HSI) tomorrow
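Both the Poisson p.m.f. and the p.d.f. properties above can be checked numerically. The intensity λ = 3 and the uniform density below are illustrative choices; truncating the Poisson sum at 60 terms leaves a negligible tail:

```python
import math

# Poisson check: probabilities sum to ~1, and mean = variance = lambda
lam = 3.0
pmf = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(60)]
total_p = sum(pmf)
mean = sum(k * p for k, p in enumerate(pmf))
var = sum(k * k * p for k, p in enumerate(pmf)) - mean ** 2

# p.d.f. property (ii) for a uniform density f(x) = 1/(b - a) on (a, b):
# the midpoint rule approximates the integral of f over (a, b), which is 1
a, b = 0.0, 2.0
f = lambda x: 1.0 / (b - a)
n = 10_000
integral = sum(f(a + (i + 0.5) * (b - a) / n) * (b - a) / n for i in range(n))
```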
(g) Mean and variance of a continuous random variable: suppose X is a continuous random variable with p.d.f. f(x), taking values in the interval (a, b). Then the mean of X is given by:
E(X) = ∫_a^b x f(x) dx
The variance of X is given by:
Var(X) = ∫_a^b (x − E(X))² f(x) dx
Question: what is the mode of X?

Some important continuous distributions:

Normal distribution (Gaussian distribution, or error distribution):
(a) Definition: a continuous random variable X is said to follow a normal distribution with mean μ and variance σ² (i.e. X ~ N(μ, σ²)) if the p.d.f. of X is given by:
f(x) = (1/(σ√(2π))) exp(−(x − μ)²/(2σ²)), −∞ < x < ∞
Note that X can take any value on the real line
(b) Its p.d.f. is a bell-shaped curve, symmetric about the mean μ
(c) Determining the probability that a normal random variable X takes values in an interval [c, d]: suppose X ~ N(μ, σ²). What is the probability P(c < X < d)?
Answer: let Z = (X − μ)/σ ~ N(0, 1). Then we have:
P(c < X < d) = P((c − μ)/σ < Z < (d − μ)/σ)
= P(Z < (d − μ)/σ) − P(Z < (c − μ)/σ)
= Φ((d − μ)/σ) − Φ((c − μ)/σ),
where Φ(z) is the probability distribution function of the standard normal distribution N(0, 1). Mathematically, Φ(z) := P(Z ≤ z). Given the value of z, Φ(z) can be determined from the standard normal table, or vice versa
(d) Many distributions in probability and statistics can be well approximated by a normal distribution; for instance, the binomial distribution when the number of trials n becomes large
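The standardization argument above can be checked with Python's `statistics.NormalDist`, which provides Φ as the standard normal `cdf`. The values of μ, σ, c and d are arbitrary illustrative choices; an interval of one standard deviation either side of the mean should carry probability about 0.6827:

```python
from statistics import NormalDist

mu, sigma = 100.0, 15.0
c, d = 85.0, 115.0          # one standard deviation either side of the mean

phi = NormalDist().cdf       # Phi(z) = P(Z <= z) for Z ~ N(0, 1)

# via standardization: P(c < X < d) = Phi((d - mu)/sigma) - Phi((c - mu)/sigma)
p = phi((d - mu) / sigma) - phi((c - mu) / sigma)

# the same probability computed directly from the N(mu, sigma^2) object
dist = NormalDist(mu, sigma)
p_direct = dist.cdf(d) - dist.cdf(c)
```

Both computations agree, and replace the standard normal table lookup for any z.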
Chi-square distribution (χ²-distribution):
(a) Definition: a continuous random variable X is said to follow a chi-square distribution with ν degrees of freedom if the p.d.f. f(x) of X is given by:
f(x) = (1/(2^{ν/2} Γ(ν/2))) x^{ν/2 − 1} e^{−x/2}, x > 0,
where the gamma function is Γ(α) := ∫_0^∞ e^{−x} x^{α−1} dx
(b) Mean and variance of a chi-square distribution: E(X) = ν and Var(X) = 2ν
(c) Applications of the χ²-distribution:
(i) Goodness-of-fit tests
(ii) Variance ratio tests
(d) The percentiles χ²(α; ν) of a chi-square random variable χ²(ν) with ν degrees of freedom can be determined from the chi-square table for the given probability level α and degrees of freedom ν

Student's t-distribution (or t-distribution, for short):
(a) Definition: suppose X ~ N(0, 1) and χ² ~ χ²(ν) are independent, where χ²(ν) is the chi-square distribution with ν degrees of freedom. Then the random variable T defined by:
T = X/(χ²/ν)^{1/2} ~ t(ν),
where t(ν) is the t-distribution with ν degrees of freedom
(b) Its p.d.f. is a bell-shaped curve, symmetric about the axis t = 0
(c) It approaches the standard normal distribution as ν → ∞
(d) The percentiles t(α; ν) of a t random variable t(ν) with ν degrees of freedom can be determined from the Student's t-table for the given probability level α and degrees of freedom ν

Moment generating function (mgf):
(a) Definition: the moment generating function of a random variable X, or of its probability distribution, is given by:
M_X(θ) := E(e^{θX}),
provided that the expectation exists
(b) Some useful properties:
(i) The distribution of a random variable is determined uniquely by its mgf, since all the moments of the distribution can be calculated from the mgf; that is,
E(X^k) = the k-th moment of X = M_X^{(k)}(0)
(ii) If X and Y are independent, then M_{X+Y}(θ) = M_X(θ) M_Y(θ)
(iii) If a and b are constants, then M_{aX+b}(θ) = e^{bθ} M_X(aθ)

Convergence in distribution:
Let X1, X2, …, Xn and X be random variables with probability distribution functions F1, F2, …, Fn and F, respectively. Then we say that {Xn} converges in distribution to X if Fn(x) → F(x) as n → ∞ at every continuity point x of F
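The mgf properties can be illustrated numerically on the coin-toss variable X from earlier (X ∈ {0, 1, 2} with probabilities 0.25, 0.5, 0.25). Property (i) is checked by differentiating M_X at 0 with a finite difference, and property (ii) by writing X as the sum of two independent Bernoulli(0.5) tosses:

```python
import math

# mgf of the coin-toss X: M_X(theta) = E(e^(theta X))
xs, ps = [0, 1, 2], [0.25, 0.5, 0.25]

def mgf(theta):
    return sum(p * math.exp(theta * x) for x, p in zip(xs, ps))

# property (i): the first derivative of the mgf at 0 is E(X) = 1
h = 1e-6
first_moment = (mgf(h) - mgf(-h)) / (2 * h)   # central finite difference

# property (ii): X = sum of two independent Bernoulli(0.5) tosses,
# so M_X(theta) = (M_Bernoulli(theta))^2
bern_mgf = lambda t: 0.5 + 0.5 * math.exp(t)
lhs, rhs = mgf(0.3), bern_mgf(0.3) ** 2
```

The finite-difference derivative is only an approximation to M_X'(0), which is why the check below uses a loose tolerance.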
Weak law of large numbers:
Let X1, X2, … be i.i.d. with mean μ < ∞, and let X̄n = (1/n) Σ_{i=1}^{n} X_i. Then X̄n → μ in probability as n → ∞

Central Limit Theorem:
Let X1, X2, … be i.i.d. with mean μ < ∞ and variance σ² < ∞. Then √n (X̄n − μ)/σ converges in distribution to N(0, 1) as n → ∞

~ End of Chapter 1 ~
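As a supplementary illustration of the two limit theorems above, a quick simulation with Uniform(0, 1) draws (μ = 1/2, σ² = 1/12); the sample size, replication count and seed are arbitrary choices:

```python
import random
import statistics

random.seed(0)

n, reps = 1_000, 2_000
mu = 0.5                   # mean of Uniform(0, 1)
sigma = (1 / 12) ** 0.5    # standard deviation of Uniform(0, 1)

# WLLN: a single sample mean of n uniforms settles near mu
xbar = statistics.mean(random.random() for _ in range(n))

# CLT: standardized sample means sqrt(n)(Xbar - mu)/sigma look N(0, 1);
# about 68.3% of them should fall within one unit of zero
zs = []
for _ in range(reps):
    m = statistics.mean(random.random() for _ in range(n))
    zs.append((m - mu) / (sigma / n ** 0.5))

frac_within_1 = sum(abs(z) <= 1 for z in zs) / reps  # near 2*Phi(1) - 1
```

The empirical fraction within one standard unit approximates 0.6827, matching the standard normal probability computed in the normal-distribution example.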