Grading
• Homework 25 %
• In-class quiz 5 % (Jan. 29, 9:00 a.m.)
• First exam 35 % (Feb. 12 and due Feb. 17, 9:00 a.m.)
• Second exam 35 % (March 12 and due March 17, 5:00 p.m.)

Measures of Central Tendency and Dispersion [ST&D p. 16-27]

Individual values of a population are designated $Y_i$, $i = 1, \dots, N$, where N = size of the population. Individual values of a sample are also denoted $Y_i$, $i = 1, \dots, n$, where n = size of the sample. Greek letters are used for population parameters (µ = pop. mean; σ² = pop. variance).

Mean or average (measure of central tendency)
Pop. mean: $\mu = \frac{1}{N} \sum_{i=1}^{N} Y_i$
Sample mean: $\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i$

Variance (measure of dispersion of the individuals about the mean)
Pop. variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (Y_i - \mu)^2}{N}$
Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}$
The quantities $(Y_i - \bar{Y})$ are called deviations.

To express these measures of dispersion in the original units of observation:
Pop. standard deviation: $\sigma = \sqrt{\sigma^2}$
Sample standard deviation: $s = \sqrt{s^2}$

To express the standard deviation in units of the mean (or %):
Pop. coeff. of variation: $CV = \sigma / \mu$
Sample coeff. of variation: $CV = s / \bar{Y}$

Visualization of central tendency and dispersion using boxplots
[Figure: box plot. Labeled elements: median, mean, interquartile (IQ) range, 1.5 IQ range, and outliers marked 0 (>1.5 IQ and <3 IQ) or * (>3 IQ).]

Review ST&D p. 58
Estimation and inference, p. 53: 3.8 Distribution of means

Measures of dispersion of sample means

An important population parameter is the variance of the mean ($\sigma^2_{\bar{Y}}$). If you repeatedly sample a population by taking samples of size n, the variance of those sample means is what we call the variance of the mean. It relates very simply to the population variance:

Variance of the mean: $\sigma^2_{\bar{Y}} = \frac{\sigma^2}{n}$

We can estimate $\sigma^2_{\bar{Y}}$ for a population by taking r independent, random samples of size n from that population, calculating the sample means $\bar{Y}_i$, and then calculating the variance of those sample means:

$s^2_{\bar{Y}} = \frac{\sum_{i=1}^{r} (\bar{Y}_i - \bar{\bar{Y}})^2}{r - 1} \approx \sigma^2_{\bar{Y}}$

where $\bar{\bar{Y}}$ is the mean of the r sample means. The square root of $s^2_{\bar{Y}}$ is called the standard error (or standard deviation of a mean).
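The relationship between the variance of the mean and σ²/n can be checked numerically. The sketch below is only an illustration under stated assumptions: the population is taken to be normal, and the values of MU, SIGMA, N and R are hypothetical numbers chosen for the example, not values from ST&D. It draws R samples of size N, computes each sample mean, and compares the variance of those means (denominator r - 1) with σ²/n.

```python
import random
import statistics

# Hypothetical population parameters, chosen only for illustration.
MU = 10.0      # population mean
SIGMA = 2.0    # population standard deviation (sigma^2 = 4)
N = 25         # size of each sample
R = 5000       # number of independent samples

rng = random.Random(42)  # fixed seed so the run is repeatable

# Draw R samples of size N and keep the mean of each sample.
sample_means = []
for _ in range(R):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    sample_means.append(statistics.fmean(sample))

# Variance of the sample means, with r - 1 in the denominator.
var_of_means = statistics.variance(sample_means)

# Theoretical variance of the mean: sigma^2 / n.
theoretical = SIGMA ** 2 / N

print(f"estimated variance of the mean: {var_of_means:.4f}")
print(f"sigma^2 / n:                    {theoretical:.4f}")
print(f"square root of the estimate:    {var_of_means ** 0.5:.4f}")
```

With a large number of samples the first two printed values should be close to each other; the third is the corresponding estimate of the standard deviation of a mean.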
Standard error: $s_{\bar{Y}} = \sqrt{s^2_{\bar{Y}}} = \frac{s}{\sqrt{n}}$

As with the standard deviation, this is a quantity in the original units of observation. The SE is important in determining confidence intervals and the powers of tests.

The Normal distribution (~N)

If you measure a quantitative trait, most of the measurements will cluster near the population mean (µ), and as you consider values further and further from µ, individuals exhibiting those values become rarer.

[Figure: bell-shaped curve centered at µ; x-axis: observed value; y-axis: frequency of observation.]

Some basic characteristics of this kind of distribution are:
1) The maximum value occurs at µ;
2) The dispersion is symmetric about µ (i.e. the mean, median, and mode of the population are equal); and
3) The "tails" asymptotically approach zero.

A distribution which meets these basic criteria is known as a normal distribution. The following conditions tend to result in a normal distribution:
1) There are many factors which contribute to the observed value of the trait;
2) These many factors act independently of one another; and
3) The individual effects of the factors are additive and of comparable magnitude.

Many biological and ecological variables are approximately normally distributed. The bell-shaped normal distribution is also known as a Gaussian curve, named after Friedrich Gauss who figured out the formal mathematics:

$Z(Y) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{1}{2} \left( \frac{Y - \mu}{\sigma} \right)^2}$

Z(Y) is the height of the curve at a given observed value Y. The location and shape are uniquely determined by only two parameters, µ and σ².

If we set µ = 0 and σ² = 1, we obtain a standard normal curve [N(0,1)]. By varying the value of µ, one can center Z(Y) anywhere on the x-axis. By varying σ², one can freely adjust the width of the central hump.

[Figure: three panels, Normal (0, 1), Normal (1, 1), and Normal (0, 2); x-axis: Sigma, from -5 to 5; y-axis: Freq.]
To convert any ~N into a standard N curve:

Standard N curve (µ = 0, σ = 1): $Z_i = \frac{Y_i - \mu}{\sigma}$

where subtracting µ centers the distribution at 0 and dividing by σ puts the variation in units of σ.

Location and Scale transformation (when µ ≠ 0 and/or σ ≠ 1)

[Figure: location and scale transformations, Z = (Y - µ)/σ. Subtracting µ shifts N(1,1) to N(0,1); dividing by σ rescales N(0,2) to N(0,1). x-axis: Sigma, from -5 to 5; y-axis: Freq.]

The following % of items lie within the indicated limits:
µ ± σ contains 68.27% of the items
µ ± 2σ contains 95.45% of the items
µ ± 3σ contains 99.73% of the items

Conversely:
50% of the items fall between µ ± 0.674σ
95% of the items fall between µ ± 1.960σ
99% of the items fall between µ ± 2.576σ

[Figure: normal curve with the central areas of 68.27%, 95.45%, and 99.73% marked.]

Q1: From a ~N population of finches with mean weight µ = 17.2 g and variance σ² = 36 g², what is the probability of randomly selecting an individual finch weighing more than 22 g?

Solution: To answer this, first convert the value 22 g to its corresponding normal score:

$Z_i = \frac{Y_i - \mu}{\sigma} = \frac{22\ \text{g} - 17.2\ \text{g}}{6\ \text{g}} = 0.8$

Table A14: 21.19% of the area lies to the right of Z = 0.8. Thus, 22 g is not an unusual weight for a finch in this population (less than 1 SD from the mean).

Question: What is this area? Or: P(Y ≥ 22) = X
Answer: P(Y ≥ 22) = P(Z ≥ 0.8) = 0.2119

[Figure: normal curve with the area in question marked; the Y axis (17.2, 22.0) is aligned with the Z axis (0, 0.8).]

Q2: From the same population, what is the probability of randomly selecting a sample of 20 finches with an average weight of more than 22 g?

This question is asking for the probability of selecting a sample of a certain average value.
For a sample of size n = 20, the appropriate distribution to consider is the normal distribution of sample means for sample size n = 20, with µ = 17.2 g and

$\sigma^2_{\bar{Y}(n=20)} = \frac{\sigma^2}{n} = \frac{36\ \text{g}^2}{20} = 1.8\ \text{g}^2$

so that $\sigma_{\bar{Y}(n=20)} = \sqrt{1.8\ \text{g}^2} = 1.34\ \text{g}$. With this in mind, we proceed as before:

$Z_i = \frac{\bar{Y}_i - \mu}{\sigma_{\bar{Y}(n=20)}} = \frac{22\ \text{g} - 17.2\ \text{g}}{1.34\ \text{g}} = 3.58$

Table A14: only 0.02% of the area lies to the right of Z = 3.58 (only a 0.02% chance). 22 g is an extremely unusual mean weight for a sample of twenty finches in this population (it is >3 SE from the mean!).

One final word about the wide applicability of the normal distribution: The central limit theorem states that, as sample size increases, the distribution of sample means drawn from a population of any distribution will approach a normal distribution with mean µ and variance σ²/n.

Use of the normal distribution table (page 612, Appendix A4)

For any value of Z, the table reports the area under the curve to the right of Z. This area to the right of Z is the theoretical probability of randomly picking an individual from N(0,1) whose value is greater than Z.

From the Table: P(Z ≥ 1.17) = 0.121 (the probability is read inside the Table)
If asked: P(Z ≤ 1.17) = 1 - P(Z ≥ 1.17) = 0.879
P(0.42 ≤ Z ≤ 1.61) = P(Z ≥ 0.42) - P(Z ≥ 1.61) = 0.3372 - 0.0537 = 0.2835
P(-1.61 ≤ Z ≤ 0.42) = P(Z ≥ -1.61) - P(Z ≥ 0.42) = [1 - P(Z ≥ 1.61)] - P(Z ≥ 0.42) = [1 - 0.0537] - 0.3372 = 0.9463 - 0.3372 = 0.6091
P(|Z| ≥ 1.05) = 2 * P(Z ≥ 1.05) = 2 * 0.1469 = 0.2938

Normal probability plot (Q-Q plot) ST&D p. 566

The normal probability plot is a graphic tool for assessing normality.

14 malt extract values: 77.7, 76.0, 76.9, 74.6, 74.7, 76.5, 74.2, 75.4, 76.0, 76.0, 73.9, 77.4, 76.6, 77.3 (ST&D p. 30, Lab1). N = 14

Divide the ~N curve into 14 intervals of equal area.
Normal line (y = a + bx): slope = s = 1.227, intercept = $\bar{Y}$ = 75.943.
Since $z = \frac{Y - \bar{Y}}{s}$, a normal score z corresponds to $Y = \bar{Y} + (z \cdot s)$. For z = 2: Y = 75.943 + (2 * 1.227) = 78.4

Shapiro-Wilk test for ~N
Correlation coefficient between the data and the normal scores.
W = 1: perfect ~N
W = 0.8: ~N?
SAS PROC UNIVARIATE NORMAL; Pr<W should be lower than 0.05 to reject normality.
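The normal line and normal scores for the 14 malt extract values can be reproduced with a short script. This is a minimal sketch, not the SAS procedure: it assumes one normal score per observation taken at the midpoint (i - 0.5)/n of each equal-area interval (other plotting positions are in common use), and the helper pearson_r is a hypothetical name introduced here. The correlation it prints is a probability-plot correlation related in spirit to the Shapiro-Wilk W, but it is not the W statistic or the Pr<W value reported by PROC UNIVARIATE.

```python
import math
import statistics
from statistics import NormalDist

# The 14 malt extract values quoted in the notes.
malt = [77.7, 76.0, 76.9, 74.6, 74.7, 76.5, 74.2,
        75.4, 76.0, 76.0, 73.9, 77.4, 76.6, 77.3]

n = len(malt)
y_bar = statistics.mean(malt)   # intercept of the normal line
s = statistics.stdev(malt)      # slope of the normal line (n - 1 denominator)

# Assumption: normal scores are taken at the midpoint (i - 0.5) / n
# of each of the n equal-area intervals of N(0, 1).
std_normal = NormalDist(0.0, 1.0)
scores = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
ordered = sorted(malt)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(scores, ordered)

# Point on the normal line at z = 2: Y = y_bar + z * s.
line_at_z2 = y_bar + 2 * s

print(f"mean = {y_bar:.3f}, s = {s:.3f}")
print(f"normal line at z = 2: {line_at_z2:.1f}")
print(f"correlation between data and normal scores: {r:.3f}")
for z, y in zip(scores, ordered):
    # Observed value next to the value expected on the normal line.
    print(f"z = {z:6.3f}  observed = {y:5.1f}  line = {y_bar + z * s:6.2f}")
```

Plotting the observed values against the normal scores gives the Q-Q plot; points lying close to the normal line suggest approximate normality.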