JOINT AND CONDITIONAL DISTRIBUTIONS
RANDOM VARIABLES, EXPECTATIONS, VARIANCES ETC.

Variable
• Recall:
• Variable: A characteristic of a population or sample that is of interest to us.
• Random variable: A function defined on the sample space S that associates a real number with each outcome in S.

DISCRETE RANDOM VARIABLES
• If the set of all possible values of a r.v. X is a countable set, then X is called a discrete r.v.
• The function $f(x)=P(X=x)$ for $x=x_1,x_2,\dots$ that assigns a probability to each value x is called the probability mass function (p.m.f.); these slides also refer to it as the probability density function (p.d.f.).

Example
• Discrete Uniform distribution: $P(X=x)=\frac{1}{N}$; $x=1,2,\dots,N$; $N=1,2,\dots$
• Example: throw a fair die. P(X=1)=…=P(X=6)=1/6

CONTINUOUS RANDOM VARIABLES
• When the sample space is uncountable (continuous)
• Example: Continuous Uniform(a,b): $f(x)=\frac{1}{b-a}$, $a\le x\le b$.

CUMULATIVE DISTRIBUTION FUNCTION (C.D.F.)
• The CDF of a r.v. X is defined as F(x)=P(X≤x).
• Note that P(a<X≤b)=F(b)-F(a).
• A function F(x) is a CDF for some r.v. X iff it satisfies
  – $\lim_{x\to-\infty}F(x)=0$
  – $\lim_{x\to\infty}F(x)=1$
  – $\lim_{h\to0^+}F(x+h)=F(x)$ (F(x) is continuous from the right)
  – $a<b$ implies $F(a)\le F(b)$ (F(x) is non-decreasing).

Example
• Consider tossing three fair coins.
• Let X=number of heads observed.
• S={TTT, TTH, THT, HTT, THH, HTH, HHT, HHH}
• P(X=0)=P(X=3)=1/8; P(X=1)=P(X=2)=3/8

  x          F(x)
  (-∞,0)     0
  [0,1)      1/8
  [1,2)      1/2
  [2,3)      7/8
  [3,∞)      1

Example
• Let $f(x)=2(1+x)^{-3}$ for $x>0$. Then
  $F(x)=P(X\le x)=\int_0^x 2(1+t)^{-3}\,dt=1-(1+x)^{-2}$ for $x>0$, and $F(x)=0$ for $x\le 0$.
  $P(0.4<X\le 0.45)=\int_{0.4}^{0.45}f(x)\,dx=F(0.45)-F(0.4)\approx 0.035$

JOINT DISTRIBUTIONS
• In many applications there is more than one random variable of interest, say X1, X2,…,Xk.

JOINT DISCRETE DISTRIBUTIONS
• The joint probability mass function (joint pmf) of the k-dimensional discrete rv X=(X1, X2,…,Xk) is
  $f(x_1,x_2,\dots,x_k)=P(X_1=x_1,X_2=x_2,\dots,X_k=x_k)$ for all possible values $(x_1,x_2,\dots,x_k)$ of X.
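The two CDF examples above (three coin tosses, and the density $f(x)=2(1+x)^{-3}$) can be checked numerically. The following is a minimal sketch, not part of the original slides; the function names `coin_cdf` and `cont_cdf` are illustrative choices.

```python
from fractions import Fraction
from itertools import product

# Enumerate the 8 equally likely outcomes of three fair coin tosses
# and build the pmf of X = number of heads.
outcomes = ["".join(t) for t in product("HT", repeat=3)]
pmf = {}
for o in outcomes:
    heads = o.count("H")
    pmf[heads] = pmf.get(heads, Fraction(0)) + Fraction(1, len(outcomes))

def coin_cdf(x):
    """F(x) = P(X <= x) for the coin example (a step function)."""
    return sum((p for k, p in pmf.items() if k <= x), Fraction(0))

def cont_cdf(x):
    """F(x) = 1 - (1 + x)^(-2) for x > 0, and 0 otherwise."""
    return 1 - (1 + x) ** -2 if x > 0 else 0.0

# P(0.4 < X <= 0.45) = F(0.45) - F(0.4)
prob = cont_cdf(0.45) - cont_cdf(0.4)

print([str(coin_cdf(x)) for x in (-1, 0, 1, 2, 3)])  # ['0', '1/8', '1/2', '7/8', '1']
print(round(prob, 3))  # 0.035
```

Exact fractions are used for the discrete case so that the table values 1/8, 1/2, 7/8 are reproduced without rounding.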
JOINT DISCRETE DISTRIBUTIONS
• A function f(x1, x2,…, xk) is the joint pmf for some vector valued rv X=(X1, X2,…,Xk) iff the following properties are satisfied:
  $f(x_1,x_2,\dots,x_k)\ge 0$ for all $(x_1,x_2,\dots,x_k)$, and
  $\sum_{x_1}\cdots\sum_{x_k}f(x_1,x_2,\dots,x_k)=1$.

Example
• Tossing two fair dice → 36 possible sample points
• Let X: sum of the two dice; Y: |difference of the two dice|
• E.g.:
  – For (3,3), X=6 and Y=0.
  – For both (4,1) and (1,4), X=5, Y=3.

Example
• Joint pmf of (x,y)

y \ x  2     3     4     5     6     7     8     9     10    11    12
0      1/36        1/36        1/36        1/36        1/36        1/36
1            1/18        1/18        1/18        1/18        1/18
2                  1/18        1/18        1/18        1/18
3                        1/18        1/18        1/18
4                              1/18        1/18
5                                    1/18

Empty cells are equal to 0.
e.g. P(X=7,Y≤4)=f(7,0)+f(7,1)+f(7,2)+f(7,3)+f(7,4)=0+1/18+0+1/18+0=1/9

MARGINAL DISCRETE DISTRIBUTIONS
• If the pair (X1,X2) of discrete random variables has the joint pmf f(x1,x2), then the marginal pmfs of X1 and X2 are
  $f_1(x_1)=\sum_{x_2}f(x_1,x_2)$ and $f_2(x_2)=\sum_{x_1}f(x_1,x_2)$

Example
• In the previous example,
  – $P(X=2)=\sum_{y=0}^{5}P(X=2,Y=y)=P(X=2,Y=0)+\dots+P(X=2,Y=5)=1/36$
  – $P(Y=2)=\sum_{x=2}^{12}P(X=x,Y=2)=4/18$

JOINT DISCRETE DISTRIBUTIONS
• JOINT CDF: $F(x_1,x_2,\dots,x_k)=P(X_1\le x_1,\dots,X_k\le x_k)$.
• F(x1,x2) is a cdf iff
  – $\lim_{x_1\to-\infty}F(x_1,x_2)=F(-\infty,x_2)=0,\ \forall x_2$.
  – $\lim_{x_2\to-\infty}F(x_1,x_2)=F(x_1,-\infty)=0,\ \forall x_1$.
  – $\lim_{x_1\to\infty,\,x_2\to\infty}F(x_1,x_2)=F(\infty,\infty)=1$
  – $P(a<X_1\le b,\ c<X_2\le d)=F(b,d)-F(b,c)-F(a,d)+F(a,c)\ge 0,\ \forall a<b$ and $c<d$.
  – $\lim_{h\to0^+}F(x_1+h,x_2)=\lim_{h\to0^+}F(x_1,x_2+h)=F(x_1,x_2),\ \forall x_1$ and $x_2$.

JOINT CONTINUOUS DISTRIBUTIONS
• A k-dimensional vector valued rv X=(X1, X2,…,Xk) is said to be continuous if there is a function f(x1, x2,…, xk), called the joint probability density function (joint pdf) of X, such that the joint cdf can be given as
  $F(x_1,x_2,\dots,x_k)=\int_{-\infty}^{x_k}\cdots\int_{-\infty}^{x_2}\int_{-\infty}^{x_1}f(t_1,t_2,\dots,t_k)\,dt_1\,dt_2\cdots dt_k$

JOINT CONTINUOUS DISTRIBUTIONS
• A function f(x1, x2,…, xk) is the joint pdf for some vector valued rv X=(X1, X2,…,Xk) iff the following properties are satisfied:
  $f(x_1,x_2,\dots,x_k)\ge 0$ for all $(x_1,x_2,\dots,x_k)$, and
  $\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}f(x_1,x_2,\dots,x_k)\,dx_1\,dx_2\cdots dx_k=1$.
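The two-dice joint pmf and its marginals from the slides above can be reproduced by enumerating the 36 sample points. This is a small illustrative sketch; the helper names `marginal_x` and `marginal_y` are not from the original.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Joint pmf of X = sum and Y = |difference| of two fair dice.
joint = defaultdict(Fraction)
for d1, d2 in product(range(1, 7), repeat=2):
    joint[(d1 + d2, abs(d1 - d2))] += Fraction(1, 36)

def marginal_x(x):
    """f1(x): sum the joint pmf over all y."""
    return sum(p for (xx, _), p in joint.items() if xx == x)

def marginal_y(y):
    """f2(y): sum the joint pmf over all x."""
    return sum(p for (_, yy), p in joint.items() if yy == y)

# P(X = 7, Y <= 4) from the joint pmf.
p_7_le4 = sum(p for (x, y), p in joint.items() if x == 7 and y <= 4)

print(p_7_le4)        # 1/9
print(marginal_x(2))  # 1/36
print(marginal_y(2))  # 2/9, i.e. 4/18
print(sum(joint.values()))  # 1
```

Cells that are empty in the table simply never appear as keys, which matches the remark that empty cells are equal to 0.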
JOINT CONTINUOUS DISTRIBUTIONS
• If the pair (X1,X2) of continuous random variables has the joint pdf f(x1,x2), then the marginal pdfs of X1 and X2 are
  $f_1(x_1)=\int_{-\infty}^{\infty}f(x_1,x_2)\,dx_2$ and $f_2(x_2)=\int_{-\infty}^{\infty}f(x_1,x_2)\,dx_1$.

JOINT DISTRIBUTIONS
• If X1, X2,…,Xk are independent of each other, then the joint pdf can be given as
  $f(x_1,x_2,\dots,x_k)=f_1(x_1)\,f_2(x_2)\cdots f_k(x_k)$
  and the joint cdf can be written as
  $F(x_1,x_2,\dots,x_k)=F_1(x_1)\,F_2(x_2)\cdots F_k(x_k)$

CONDITIONAL DISTRIBUTIONS
• If X1 and X2 are discrete or continuous random variables with joint pdf f(x1,x2), then the conditional pdf of X2 given X1=x1 is defined by
  $f(x_2\mid x_1)=\frac{f(x_1,x_2)}{f_1(x_1)}$ for $x_1$ such that $f_1(x_1)>0$, and 0 elsewhere.
• For independent rvs, $f(x_2\mid x_1)=f_2(x_2)$ and $f(x_1\mid x_2)=f_1(x_1)$.

Example
Statistical Analysis of Employment Discrimination Data (Example from Dudewicz & Mishra, 1988; data from Dawson, Hankey and Myers, 1982)

% promoted (number of employees)
  Pay grade   Affected class   Others
  5           100 (6)          84 (80)
  7           88 (8)           87 (195)
  9           93 (29)          88 (335)
  10          7 (102)          8 (695)
  11          7 (15)           11 (185)
  12          10 (10)          7 (165)
  13          0 (2)            9 (81)
  14          0 (1)            7 (41)

The affected class might be a minority group or, e.g., women.

Example, cont.
• Do these data indicate discrimination against the affected class in promotions in this company?
• Let X=(X1,X2,X3) where X1 is the pay grade of an employee; X2 is an indicator of whether the employee is in the affected class or not; X3 is an indicator of whether the employee was promoted or not.
• x1 ∈ {5,7,9,10,11,12,13,14}; x2 ∈ {0,1}; x3 ∈ {0,1}

Example, cont.
  Pay grade   Affected class   Others
  10          7 (102)          8 (695)
• E.g., in pay grade 10 of this occupation (X1=10) there were 102 members of the affected class and 695 members of the other classes. Seven percent of the affected class in pay grade 10 had been promoted, that is, (102)(0.07)≈7 individuals out of 102 had been promoted.
• Out of 1950 employees, only 173 are in the affected class; this is not atypical in such studies.

Example, cont.
  Pay grade   Affected class   Others
  10          7 (102)          8 (695)
• E.g.
probability of a randomly selected employee being in pay grade 10, being in the affected class, and promoted: P(X1=10,X2=1,X3=1)=7/1950=0.0036 (probability function of a discrete 3-dimensional r.v.)
• E.g. probability of a randomly selected employee being in pay grade 10 and promoted: P(X1=10, X3=1)=(7+56)/1950=0.0323 (Note: 8% of 695 ≈ 56) (marginal probability function of X1 and X3)

Example, cont.
• E.g. probability that an employee is in the other class (X2=0) given that the employee is in pay grade 10 (X1=10) and was promoted (X3=1):
  P(X2=0| X1=10, X3=1)=P(X1=10,X2=0,X3=1)/P(X1=10, X3=1)=(56/1950)/(63/1950)=0.89 (conditional probability)
• Probability that an employee is in the affected class (X2=1) given that the employee is in pay grade 10 (X1=10) and was promoted (X3=1):
  P(X2=1| X1=10, X3=1)=(7/1950)/(63/1950)=0.11

Production problem
• Two companies manufacture a certain type of sophisticated electronic equipment for the government; to avoid lawsuits, let's call them company C and company D. In the past, company C has had 5% good output, whereas D had 50% good output (i.e., 95% of C's output and 50% of D's output is not of acceptable quality). The government has just ordered 10,100 of these devices from company D and 11,000 from C (maybe for political reasons, maybe because company D does not have a large enough capacity for more orders). Before the production of these devices starts, government scientists develop a new manufacturing method that they believe will almost double the % of good devices received. Companies C and D are given this information, but its use is optional: they must each use the new method for at least 100 of their devices, but its use beyond that point is left to their discretion.

Production problem, cont.
• When the devices are received and tested, the following table is observed:

              Production method
  Results     Standard       New
  Bad         5950           9005
  Good        5050 (46%)     1095 (11%)

• Officials blame the scientists and the companies for producing with the lousy new method, which looks clearly inferior.
• The scientists still claim that the new method has almost doubled the % of good items.
• Which one is right?

Production problem, cont.
• Answer: the scientists rule!

              Company C                  Company D
  Results     Standard    New            Standard     New
  Bad         950         9000           5000         5
  Good        50 (5%)     1000 (10%)     5000 (50%)   95 (95%)

• The new method nearly doubled the % of good items for both companies.
• Company D knew that its production under the standard method was already good, so it used the new method for only the minimum allowed.
• This is called Simpson's paradox. Do not combine the results for the 2 companies in such cases.

Describing the Population
• We're interested in describing the population by computing various parameters.
• For instance, we calculate the population mean and population variance.

EXPECTED VALUES
Let X be a rv with pdf fX(x) and g(X) be a function of X. Then, the expected value (or the mean or the mathematical expectation) of g(X) is
  $E[g(X)]=\sum_x g(x)f_X(x)$, if X is discrete
  $E[g(X)]=\int_{-\infty}^{\infty}g(x)f_X(x)\,dx$, if X is continuous
provided the sum or the integral exists, i.e., $-\infty<E[g(X)]<\infty$.

EXPECTED VALUES
• E[g(X)] is finite if E[|g(X)|] is finite:
  $E[|g(X)|]=\sum_x |g(x)|f_X(x)<\infty$, if X is discrete
  $E[|g(X)|]=\int_{-\infty}^{\infty}|g(x)|f_X(x)\,dx<\infty$, if X is continuous

Population Mean (Expected Value)
• Given a discrete random variable X with values xi that occur with probabilities p(xi), the population mean of X is
  $E(X)=\mu=\sum_{\text{all }x_i}x_i\,p(x_i)$

Population Variance
– Let X be a discrete random variable with possible values xi that occur with probabilities p(xi), and let $E(X)=\mu$.
The variance of X is defined by
  $V(X)=\sigma^2=E[(X-\mu)^2]=\sum_{\text{all }x_i}(x_i-\mu)^2\,p(x_i)$ (units: Unit*Unit)
The standard deviation is $\sigma=\sqrt{\sigma^2}$ (units: Unit)

EXPECTED VALUE
• The expected value or mean value of a continuous random variable X with pdf f(x) is
  $\mu=E(X)=\int_{\text{all }x}x\,f(x)\,dx$
• The variance of a continuous random variable X with pdf f(x) is
  $\sigma^2=Var(X)=E[(X-\mu)^2]=\int_{\text{all }x}(x-\mu)^2f(x)\,dx=E(X^2)-\mu^2=\int_{\text{all }x}x^2f(x)\,dx-\mu^2$

EXAMPLE
• The pmf for the number of defective items in a lot is as follows:
  p(x) = 0.35, x = 0
         0.39, x = 1
         0.19, x = 2
         0.06, x = 3
         0.01, x = 4
  Find the expected number and the variance of defective items.

EXAMPLE
• Let X be a random variable. Its pdf is f(x)=2(1-x), 0< x < 1. Find E(X) and Var(X).

Laws of Expected Value
• Let X be a rv and a, b, and c be constants. Then, for any two functions g1(x) and g2(x) whose expectations exist,
  a) $E[a\,g_1(X)+b\,g_2(X)+c]=aE[g_1(X)]+bE[g_2(X)]+c$
  b) If $g_1(x)\ge 0$ for all x, then $E[g_1(X)]\ge 0$.
  c) If $g_1(x)\ge g_2(x)$ for all x, then $E[g_1(X)]\ge E[g_2(X)]$.
  d) If $a\le g_1(x)\le b$ for all x, then $a\le E[g_1(X)]\le b$.

Laws of Expected Value and Variance
Let X be a rv and c be a constant.
  Laws of Expected Value      Laws of Variance
  E(c) = c                    V(c) = 0
  E(X + c) = E(X) + c         V(X + c) = V(X)
  E(cX) = cE(X)               V(cX) = c^2 V(X)

EXPECTED VALUE
  $E\left(\sum_{i=1}^{k}a_iX_i\right)=\sum_{i=1}^{k}a_iE(X_i)$.
If X and Y are independent,
  $E[g(X)h(Y)]=E[g(X)]\,E[h(Y)]$
The covariance of X and Y is defined as
  $Cov(X,Y)=E[(X-E(X))(Y-E(Y))]=E(XY)-E(X)E(Y)$

EXPECTED VALUE
If X and Y are independent, $Cov(X,Y)=0$.
The reverse is usually not correct! It does hold when (X,Y) is jointly normal: if (X,Y)~Normal, then X and Y are independent iff Cov(X,Y)=0.

EXPECTED VALUE
  $Var(X_1+X_2)=Var(X_1)+Var(X_2)+2\,Cov(X_1,X_2)$
If X1 and X2 are independent,
  $Var(X_1+X_2)=Var(X_1)+Var(X_2)$

CONDITIONAL EXPECTATION AND VARIANCE
  $E(Y\mid x)=\sum_y y\,f(y\mid x)$, if X and Y are discrete.
  $E(Y\mid x)=\int_{-\infty}^{\infty}y\,f(y\mid x)\,dy$, if X and Y are continuous.
  $Var(Y\mid x)=E(Y^2\mid x)-[E(Y\mid x)]^2$

CONDITIONAL EXPECTATION AND VARIANCE
  $E[E(Y\mid X)]=E(Y)$
  $Var(Y)=E_X(Var(Y\mid X))+Var_X(E(Y\mid X))$ (EVVE rule)
Proofs available in Casella & Berger (1990), pgs.
154 & 158

Example
• An insect lays a large number of eggs, each surviving with probability p. Consider a large number of mothers. X: number of survivors in a litter; Y: number of eggs laid
• Assume:
  X | Y ~ Binomial(Y, p)
  Y | Λ ~ Poisson(Λ)
  Λ ~ Exponential(β)
• Find: expected number of survivors, i.e. E(X)

Example - solution
  EX = E(E(X|Y)) = E(Yp) = p E(Y) = p E(E(Y|Λ)) = p E(Λ) = pβ

SOME MATHEMATICAL EXPECTATIONS
• Population Mean: $\mu=E(X)$
• Population Variance: $\sigma^2=Var(X)=E[(X-\mu)^2]=E(X^2)-\mu^2\ge 0$ (measure of the deviation from the population mean)
• Population Standard Deviation: $\sigma=\sqrt{\sigma^2}\ge 0$
• Moments:
  $\mu_k^*=E[X^k]$, the k-th moment
  $\mu_k=E[(X-\mu)^k]$, the k-th central moment

SKEWNESS
• Measure of lack of symmetry in the pdf.
  $\text{Skewness}=\frac{E[(X-\mu)^3]}{\sigma^3}=\frac{\mu_3}{\mu_2^{3/2}}$
If the distribution of X is symmetric around its mean μ, then $\mu_3=0$ and Skewness=0.

KURTOSIS
• Measure of the peakedness of the pdf. Describes the shape of the distribution.
  $\text{Kurtosis}=\frac{E[(X-\mu)^4]}{\sigma^4}=\frac{\mu_4}{\mu_2^{2}}$
  Kurtosis=3: Normal
  Kurtosis>3: Leptokurtic (peaked and fat tails)
  Kurtosis<3: Platykurtic (less peaked and thinner tails)

KURTOSIS
• What is the range of kurtosis?
• Claim: Kurtosis ≥ 1. Why?
• Proof: Recall $Var(Y)=E(Y^2)-(EY)^2$ and let $Y=(X-\mu)^2$. Then
  $E[(X-\mu)^4]=Var((X-\mu)^2)+\left[E((X-\mu)^2)\right]^2=Var((X-\mu)^2)+\sigma^4$
  $\text{Kurtosis}=\frac{Var((X-\mu)^2)}{\sigma^4}+1\ge 1$

Measures of Central Location
• Usually, we focus our attention on two types of measures when describing population characteristics:
  – Central location
  – Variability or spread

Measures of Central Location
• The measure of central location reflects the locations of all the data points.
• How?
  – With one data point, clearly the central location is at the point itself.
  – With two data points, the central location should fall in the middle between them (in order to reflect the location of both of them).
  – But if the third data point appears on the left hand-side of the midrange, it should "pull" the central location to the left.
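Returning to the moment definitions given earlier in this part (mean, variance, skewness, kurtosis), the sketch below evaluates them for the defective-items pmf from the earlier example. It is an illustration only; `raw_moment` and `central_moment` are hypothetical helper names, and the slides themselves do not give a worked solution.

```python
# pmf of the number of defective items in a lot (from the earlier example)
pmf = {0: 0.35, 1: 0.39, 2: 0.19, 3: 0.06, 4: 0.01}

def raw_moment(pmf, k):
    """k-th moment E[X^k] of a discrete distribution."""
    return sum(x ** k * p for x, p in pmf.items())

def central_moment(pmf, k):
    """k-th central moment E[(X - mu)^k]."""
    mu = raw_moment(pmf, 1)
    return sum((x - mu) ** k * p for x, p in pmf.items())

mean = raw_moment(pmf, 1)                        # E(X)
variance = central_moment(pmf, 2)                # E[(X - mu)^2]
shortcut = raw_moment(pmf, 2) - mean ** 2        # E(X^2) - mu^2
skewness = central_moment(pmf, 3) / variance ** 1.5
kurtosis = central_moment(pmf, 4) / variance ** 2

print(round(mean, 4))      # 0.99
print(round(variance, 4))  # 0.8699
print(skewness > 0, kurtosis >= 1)  # True True
```

The two variance formulas agree up to floating-point error, and the kurtosis respects the lower bound of 1 proved above.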
The Arithmetic Mean
• This is the most popular measure of central location
  Mean = (Sum of the observations) / (Number of observations)

The Arithmetic Mean
  Sample mean: $\bar{x}=\frac{\sum_{i=1}^{n}x_i}{n}$ (n: sample size)
  Population mean: $\mu=\frac{\sum_{i=1}^{N}x_i}{N}$ (N: population size)

The Arithmetic Mean
• Example
  The reported times on the Internet of 10 adults are 0, 7, 12, 5, 33, 14, 8, 0, 9, 22 hours. Find the mean time on the Internet.
  $\bar{x}=\frac{\sum_{i=1}^{10}x_i}{10}=\frac{x_1+x_2+\dots+x_{10}}{10}=\frac{0+7+\dots+22}{10}=11.0$

The Arithmetic Mean
• Drawback of the mean: It can be influenced by unusual observations, because it uses all the information in the data set.

The Median
• The Median of a set of observations is the value that falls in the middle when the observations are arranged in order of magnitude. It divides the data in half.
  Example: Find the median of the time on the internet for the 10 adults of the previous example.
    Even number of observations: 0, 0, 5, 7, 8, 9, 12, 14, 22, 33. The median is (8+9)/2 = 8.5.
  Comment: Suppose only 9 adults were sampled (exclude, say, the longest time (33)).
    Odd number of observations: 0, 0, 5, 7, 8, 9, 12, 14, 22. The median is 8.

The Median
• Depth of median = (n+1)/2
  $\text{Median}=X_{((n+1)/2)}$ if n is odd
  $\text{Median}=\frac{X_{(k)}+X_{(k+1)}}{2}$ if n is even (n = 2k)

The Mode
• The Mode of a set of observations is the value that occurs most frequently.
• A set of data may have one mode (or modal class), or two or more modes.
  [Figure: histogram with the modal class marked]

The Mode
• Find the mode for the data in the Example. Here are the data again: 0, 7, 12, 5, 33, 14, 8, 0, 9, 22
Solution
• All observations except "0" occur once. There are two "0"s. Thus, the mode is zero.
• Is this a good measure of central location?
• The value "0" does not reside at the center of this set (compare with the mean = 11.0 and the median = 8.5).
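The three measures of central location for the Internet-hours data can be reproduced with Python's standard `statistics` module. This is a brief sketch for checking the numbers above, not part of the original slides.

```python
import statistics

# Reported hours on the Internet for the 10 adults in the example.
hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]

mean_hours = statistics.mean(hours)      # arithmetic mean
median_hours = statistics.median(hours)  # average of the two middle values (n even)
mode_hours = statistics.mode(hours)      # most frequent value

# Drop the longest time (33) to get an odd number of observations.
nine_adults = sorted(hours)[:-1]
median_nine = statistics.median(nine_adults)

print(mean_hours, median_hours, mode_hours)  # 11 8.5 0
print(median_nine)                           # 8
```

Dropping the single large value 33 moves the mean from 11.0 to 77/9 ≈ 8.6 but the median only from 8.5 to 8, which illustrates the drawback of the mean noted above.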
Relationship among Mean, Median, and Mode
• If a distribution is symmetrical and bell shaped, the mean, median and mode coincide:
  Mean = Median = Mode
• If a distribution is asymmetrical, and skewed to the left or to the right, the three measures differ.
  – A positively skewed distribution ("skewed to the right"): Mode < Median < Mean
  – A negatively skewed distribution ("skewed to the left"): Mean < Median < Mode
  [Figure: a positively skewed and a negatively skewed density with the positions of the mode, median and mean marked]

Measures of variability
• Measures of central location fail to tell the whole story about the distribution.
• A question of interest still remains unanswered: How much are the observations spread out around the mean value?

Measures of Variability
Observe two hypothetical data sets:
  – Small variability: The average value provides a good representation of the observations in the data set.
  – Larger variability: The same average value does not provide as good a representation of the observations in the data set as before.

The Range
  – The range of a set of observations is the difference between the largest and smallest observations.
  – Its major advantage is the ease with which it can be computed.
  – Its major shortcoming is its failure to provide information on the dispersion of the observations between the two end points.
But how do all the observations spread out? The range cannot assist in answering this question.
  [Figure: the range, drawn from the smallest observation to the largest observation, with the positions of the observations in between unknown]
The Variance
This measure reflects the dispersion of all the observations.
The variance of a population of size N, x1, x2,…,xN, whose mean is μ is defined as
  $\sigma^2=\frac{\sum_{i=1}^{N}(x_i-\mu)^2}{N}$
The variance of a sample of n observations x1, x2, …,xn whose mean is $\bar{x}$ is defined as
  $s^2=\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}$

Why not use the sum of deviations?
Consider two small populations:
  A: 8, 9, 10, 11, 12
  B: 4, 7, 10, 13, 16
The mean of both populations is 10, but the measurements in B are more dispersed than those in A. A measure of dispersion should agree with this observation.
Can the sum of deviations be a good measure of dispersion?
  A: 8-10 = -2, 9-10 = -1, 10-10 = 0, 11-10 = +1, 12-10 = +2; Sum = 0
  B: 4-10 = -6, 7-10 = -3, 10-10 = 0, 13-10 = +3, 16-10 = +6; Sum = 0
The sum of deviations is zero for both populations; therefore, it is not a good measure of dispersion.

The Variance
Let us calculate the variance of the two populations:
  $\sigma_A^2=\frac{(8-10)^2+(9-10)^2+(10-10)^2+(11-10)^2+(12-10)^2}{5}=2$
  $\sigma_B^2=\frac{(4-10)^2+(7-10)^2+(10-10)^2+(13-10)^2+(16-10)^2}{5}=18$
Why is the variance defined as the average squared deviation? Why not use the sum of squared deviations as a measure of variation instead? After all, the sum of squared deviations increases in magnitude when the variation of a data set increases!

The Variance
Let us calculate the sum of squared deviations for both data sets. Which data set has a larger dispersion?
  [Figure: data set A, several observations at 1 and at 3, mean 2; data set B, one observation at 1 and one at 5, mean 3]
Data set B is more dispersed around the mean.

The Variance
  SumA = (1-2)^2 +…+ (1-2)^2 + (3-2)^2 +…+ (3-2)^2 = 10 (ten terms, each equal to 1)
  SumB = (1-3)^2 + (5-3)^2 = 8
SumA > SumB. This is inconsistent with the observation that set B is more dispersed.

The Variance
However, when calculated on a "per observation" basis (variance), the data set dispersions are properly ranked.
  $\sigma_A^2$ = SumA/N = 10/10 = 1
  $\sigma_B^2$ = SumB/N = 8/2 = 4

The Variance
• Example
  – The following sample consists of the number of jobs six students applied for: 17, 15, 23, 7, 9, 13.
Find its mean and variance
• Solution
  $\bar{x}=\frac{\sum_{i=1}^{6}x_i}{6}=\frac{17+15+23+7+9+13}{6}=\frac{84}{6}=14$ jobs
  $s^2=\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}=\frac{1}{6-1}\left[(17-14)^2+(15-14)^2+\dots+(13-14)^2\right]=33.2$ jobs^2

The Variance – Shortcut method
  $s^2=\frac{1}{n-1}\left[\sum_{i=1}^{n}x_i^2-\frac{\left(\sum_{i=1}^{n}x_i\right)^2}{n}\right]=\frac{1}{6-1}\left[\left(17^2+15^2+\dots+13^2\right)-\frac{(17+15+\dots+13)^2}{6}\right]=33.2$ jobs^2

Standard Deviation
• The standard deviation of a set of observations is the square root of the variance.
  Sample standard deviation: $s=\sqrt{s^2}$
  Population standard deviation: $\sigma=\sqrt{\sigma^2}$

Standard Deviation
• Example
  – To examine the consistency of shots for a new innovative golf club, a golfer was asked to hit 150 shots, 75 with a currently used (7-iron) club, and 75 with the new club.
  – The distances were recorded.
  – Which club is better?

Standard Deviation
• Example – solution
Excel printout, from the "Descriptive Statistics" submenu:

                        Current      Innovation
  Mean                  150.5467     150.1467
  Standard Error        0.668815     0.357011
  Median                151          150
  Mode                  150          149
  Standard Deviation    5.792104     3.091808
  Sample Variance       33.54847     9.559279
  Kurtosis              0.12674      -0.88542
  Skewness              -0.42989     0.177338
  Range                 28           12
  Minimum               134          144
  Maximum               162          156
  Sum                   11291        11261
  Count                 75           75

The innovation club is more consistent, and because the means are close, it is considered a better club.

Interpreting Standard Deviation
• The standard deviation can be used to
  – compare the variability of several distributions
  – make a statement about the general shape of a distribution.
• The empirical rule: If a sample of observations has a mound-shaped distribution, the interval
  – $(\bar{x}-s,\ \bar{x}+s)$ contains approximately 68% of the measurements
  – $(\bar{x}-2s,\ \bar{x}+2s)$ contains approximately 95% of the measurements
  – $(\bar{x}-3s,\ \bar{x}+3s)$ contains approximately 99.7% of the measurements

Interpreting Standard Deviation
• Example
  A practitioner wants to describe the way returns on investment are distributed.
  – The mean return = 10%
  – The standard deviation of the return = 8%
  – The histogram is bell shaped.

Interpreting Standard Deviation
Example – solution
• The empirical rule can be applied (bell shaped histogram)
• Describing the return distribution
  – Approximately 68% of the returns lie between 2% and 18% [10 – 1(8), 10 + 1(8)]
  – Approximately 95% of the returns lie between -6% and 26% [10 – 2(8), 10 + 2(8)]
  – Approximately 99.7% of the returns lie between -14% and 34% [10 – 3(8), 10 + 3(8)]

Chebyshev's Theorem
• For any value of k ≥ 1, at least 100(1-1/k^2)% of the data lie within the interval from $\bar{x}-ks$ to $\bar{x}+ks$.
• This theorem is valid for any set of measurements (sample, population) of any shape!

  k   Interval                          Chebyshev                     Empirical Rule
  1   ($\bar{x}-s$, $\bar{x}+s$)        at least 0% (1-1/1^2)         approximately 68%
  2   ($\bar{x}-2s$, $\bar{x}+2s$)      at least 75% (1-1/2^2)        approximately 95%
  3   ($\bar{x}-3s$, $\bar{x}+3s$)      at least 89% (1-1/3^2)        approximately 99.7%

Chebyshev's Theorem
• Example
  – The annual salaries of the employees of a chain of computer stores produced a positively skewed histogram. The mean and standard deviation are $28,000 and $3,000, respectively. What can you say about the salaries at this chain?
Solution
  At least 75% of the salaries lie between $22,000 and $34,000 [28000 – 2(3000), 28000 + 2(3000)]
  At least 88.9% of the salaries lie between $19,000 and $37,000 [28000 – 3(3000), 28000 + 3(3000)]

The Coefficient of Variation
• The coefficient of variation of a set of measurements is the standard deviation divided by the mean value.
  Sample coefficient of variation: $cv=\frac{s}{\bar{x}}$
  Population coefficient of variation: $CV=\frac{\sigma}{\mu}$
• This coefficient provides a proportionate measure of variation. A standard deviation of 10 may be perceived as large when the mean value is 100, but only moderately large when the mean value is 500.

Percentiles
• Example from http://www.ehow.com/how_2310404_calculate-percentiles.html
• Your test score, e.g. 70%, tells you how many questions you answered correctly.
However, it doesn't tell you how well you did compared to the other people who took the same test.
• If the percentile of your score is 75, then you scored higher than 75% of the other people who took the test.

Sample Percentiles and Box Plots
• Percentile
  – The pth percentile of a set of measurements is the value for which
    • p percent of the observations are less than that value
    • (100 − p) percent of all the observations are greater than that value.

Sample Percentiles
• Find the 10th percentile of 6 8 3 6 2 8 1
• Order the data: 1 2 3 6 6 8 8
• 7*(0.10) = 0.70; round up to 1
  The first observation, 1, is the 10th percentile.

• Commonly used percentiles
  – First (lower) quartile, Q1 = 25th percentile
  – Second (middle) quartile, Q2 = 50th percentile
  – Third quartile, Q3 = 75th percentile
  – Fourth quartile, Q4 = 100th percentile
  – First (lower) decile = 10th percentile
  – Ninth (upper) decile = 90th percentile

Quartiles and Variability
• Quartiles can provide an idea about the shape of a histogram
  [Figure: positions of Q1, Q2 and Q3 for a positively skewed histogram and for a negatively skewed histogram]

Interquartile Range
• A large value indicates a large spread of the observations
  Interquartile range = Q3 – Q1

Box Plot
  – This is a pictorial display that provides the main descriptive measures of the data set:
    • L - the largest observation
    • Q3 - The upper quartile
    • Q2 - The median
    • Q1 - The lower quartile
    • S - The smallest observation
  [Figure: box from Q1 to Q3 with Q2 marked inside; whiskers extend toward S and L, each at most 1.5(Q3 – Q1) long]

Box Plot
  – The following data give noise levels measured at 36 different times directly outside of Grand Central Station in Manhattan.
  NOISE: 82, 89, 94, 110, . . .
  Smallest = 60
  Q1 = 75
  Median = 90
  Q3 = 107
  Largest = 125
  IQR = 32
  Outliers = (none)
  Lower fence: 75 - 1.5(IQR) = 27
  Upper fence: 107 + 1.5(IQR) = 155
  [Figure: box plot of NOISE on an axis from 60 to 130]

Box Plot
NOISE - continued
  [Figure: box plot with 25% of the data between 60 and Q1 = 75, 50% between Q1 = 75 and Q3 = 107 (Q2 = 90), and 25% between 107 and 125]
  – Interpreting the box plot results
• The scores range from 60 to 125.
• About half the scores are smaller than 90, and about half are larger than 90.
• About half the scores lie between 75 and 107.
• About a quarter lie below 75 and a quarter above 107.
• The data are slightly positively skewed.

Box Plot
Example: A study was organized to compare the service time in 5 drive-through restaurants.
  [Figure: side-by-side box plots of service time (axis C6, roughly 100 to 300) by restaurant (axis C7): 1 Popeyes, 2 Wendy's, 3 McDonalds, 4 Hardee's, 5 Jack in the Box; annotations mark one plot "Times are symmetric" and another "Times are positively skewed"]
  – Jack in the Box is the slowest in service.
  – Hardee's service time variability is the largest.
  – Wendy's service time appears to be the shortest on average and most consistent.

Paired Data Sets and the Sample Correlation Coefficient
• The covariance and the coefficient of correlation are used to measure the direction and strength of the linear relationship between two variables.
  – Covariance - is there any pattern to the way two variables move together?
  – Coefficient of correlation - how strong is the linear relationship between two variables?

Covariance
  Population covariance: $COV(X,Y)=\frac{\sum_{i=1}^{N}(x_i-\mu_x)(y_i-\mu_y)}{N}$
  $\mu_x$ ($\mu_y$) is the population mean of the variable X (Y). N is the population size.
  Sample covariance: $cov(x,y)=\frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{n-1}$
  $\bar{x}$ ($\bar{y}$) is the sample mean of the variable X (Y). n is the sample size.

Covariance
• If the two variables move in the same direction (both increase or both decrease), the covariance is a large positive number.
• If the two variables move in opposite directions (one increases when the other one decreases), the covariance is a large negative number.
• If the two variables are unrelated, the covariance will be close to zero.
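The sample covariance formula above translates directly into code. The sketch below is illustrative only: `sample_cov` is a hypothetical helper name, and the small paired data set (x = 2, 6, 7; y = 13, 20, 27) is chosen just to show the sign behaviour described in the bullets. As a side check, the covariance of a variable with itself equals its sample variance, here for the job-application data from the variance example.

```python
def sample_cov(xs, ys):
    """Sample covariance: sum of (x - xbar)(y - ybar), divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with n >= 2")
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)

x = [2, 6, 7]
y_up = [13, 20, 27]    # y increases with x
y_down = [27, 20, 13]  # y decreases as x increases

cov_up = sample_cov(x, y_up)
cov_down = sample_cov(x, y_down)

# cov(x, x) is the sample variance s^2.
jobs = [17, 15, 23, 7, 9, 13]
var_jobs = sample_cov(jobs, jobs)

print(cov_up, cov_down)    # 17.5 -17.5
print(round(var_jobs, 1))  # 33.2
```

Note that the magnitude of the covariance depends on the units of x and y, which is why the slides go on to introduce the coefficient of correlation.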
Covariance
• Compare the following three sets

  Set 1
  xi    yi    (x – x̄)   (y – ȳ)   (x – x̄)(y – ȳ)
  2     13    -3         -7         21
  6     20    1          0          0
  7     27    2          7          14
  x̄ = 5, ȳ = 20, Cov(x,y) = 17.5

  Set 2
  xi    yi    (x – x̄)   (y – ȳ)   (x – x̄)(y – ȳ)
  2     27    -3         7          -21
  6     20    1          0          0
  7     13    2          -7         -14
  x̄ = 5, ȳ = 20, Cov(x,y) = -17.5

  Set 3
  xi    yi
  2     20
  6     27
  7     13
  x̄ = 5, ȳ = 20, Cov(x,y) = -3.5

The coefficient of correlation
  Population coefficient of correlation: $\rho=\frac{COV(X,Y)}{\sigma_x\sigma_y}$
  Sample coefficient of correlation: $r=\frac{cov(X,Y)}{s_x s_y}$
  – This coefficient answers the question: How strong is the association between X and Y?

The coefficient of correlation
  ρ or r = +1: Strong positive linear relationship (COV(X,Y)>0)
  ρ or r = 0: No linear relationship (COV(X,Y)=0)
  ρ or r = -1: Strong negative linear relationship (COV(X,Y)<0)

The Coefficient of Correlation
• If the two variables are very strongly positively related, the coefficient value is close to +1 (strong positive linear relationship).
• If the two variables are very strongly negatively related, the coefficient value is close to -1 (strong negative linear relationship).
• No straight line relationship is indicated by a coefficient close to zero.

Correlation and causation
• Recognize the difference between correlation and causation: just because two things occur together, that does not necessarily mean that one causes the other.
• For random processes, causation means that if A occurs, that causes a change in the probability that B occurs.

Correlation and causation
• Existence of a statistical relationship, no matter how strong it is, does not imply a cause-and-effect relationship between X and Y. For example, let X be size of vocabulary, and Y be writing speed for a group of children. There will most probably be a positive relationship, but this does not imply that an increase in vocabulary causes an increase in the speed of writing. Other variables such as age, education etc. will affect both X and Y.
• Even if there is a causal relationship between X and Y, it might be in the opposite direction, i.e.
from Y to X. For example, let X be the thermometer reading and let Y be the actual temperature. Here Y will affect X.

Example
Dr. Leonard Eron, professor at the University of Illinois at Chicago, has conducted a longitudinal study of the long-term effects of violent television programming. In 1960, he asked 870 third grade children their favorite television shows. He found that children judged most violent by their peers also watched the most violent television. Dr. Eron noted, however, that it was not clear which came first: the child's behavior or the influence of television. In follow-up interviews at ten-year intervals, Eron found that youngsters who at age eight were nonaggressive but were watching violent television were more aggressive than children who at age eight were aggressive and watched non-violent television. Eron claims that this establishes a cause-and-effect relationship between watching violent television and aggressive behavior. Can you think of any other possible causes?

Example - solution
• It could be that the difference in aggressive behavior is due to other familial influences. Perhaps children who are permitted to watch violent programming are more likely to come from violent or abusive families, which could also lead to more aggressive behavior.
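The vocabulary and writing-speed remark above (a third variable such as age driving both) can be illustrated with a small simulation. Everything in this sketch is an assumption made for illustration: the data are synthetic, the variables are standardized, and `pearson_r` is a hypothetical helper implementing the sample coefficient of correlation. It is not an analysis of any real study.

```python
import math
import random

def pearson_r(xs, ys):
    """Sample coefficient of correlation r = cov(x, y) / (s_x * s_y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000

# Synthetic model (assumed): age drives both variables;
# vocabulary has no direct effect on writing speed.
age = [rng.gauss(0, 1) for _ in range(n)]
vocab = [a + rng.gauss(0, 0.5) for a in age]
speed = [a + rng.gauss(0, 0.5) for a in age]

# Strong correlation even though neither variable causes the other.
r_raw = pearson_r(vocab, speed)

# Remove the age effect (its coefficient is 1 by construction here):
# the leftover parts are essentially uncorrelated.
r_given_age = pearson_r(
    [v - a for v, a in zip(vocab, age)],
    [s - a for s, a in zip(speed, age)],
)

print(r_raw > 0.6, abs(r_given_age) < 0.1)  # True True
```

In this construction the correlation between the two observed variables comes entirely from the shared dependence on age, which is the pattern the slides warn about.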