Scientific Practice: The Detail of the Normal Distribution
@UWE_JT9 @dave_lush

The Binomial Distribution
- This distribution can be seen when the outcomes have discrete values… e.g. rolling dice
- Assumptions…
  - a fixed number of trials, e.g. we will roll the dice 10 times
  - independent trials: one roll cannot influence another
  - two different classifications, e.g. rolled/didn't roll a 12 = 'success/failure'
  - the probability of success stays the same for all trials: we didn't add extra dice halfway through

Rolling Dice
- One die…
  - outcome values are 1, 2, 3, 4, 5 or 6, each equally probable (1 in 6)
  - distribution is… boring!

Rolling Dice
- Two dice…
  - outcome values are 2 to 12, not equally probable
  - 36 ways of making these: only 1 way to get 2 (1+1), 3 ways to get 4 (1+3, 2+2, 3+1), etc.
  - distribution is… slightly less boring!

Rolling Dice
- Three dice…
  - outcome values are 3 to 18, not equally probable
  - 216 ways of making these: 27 ways to throw a 10 or an 11, only 1 way to get a 3 or an 18
  - distribution is… starting to curve

Rolling Dice
- Four dice…
  - outcome values are 4 to 24, not equally probable
  - 1296 (6^4) ways of making these
  - distribution is… looking familiar!

Rolling Dice
- 24 dice…
  - outcome values are 24 to 144, not equally probable
  - 6^24 ≈ 4.73838134 × 10^18 ways of making these!
  - distribution is… looking very familiar!
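The dice counts above can be checked with a short Python sketch. It is an illustration, not code from this deck. `ways_to_total` and `binomial_pmf` are hypothetical helper names. The totals are built up one die at a time (a repeated convolution) rather than by listing every roll, because listing all of the roughly 4.7 × 10^18 rolls of 24 dice is impractical. The last lines apply the standard binomial formula to the 'rolled/didn't roll a 12' example, assuming two fair dice per trial (so p = 1/36) and 10 trials.

```python
from math import comb

def ways_to_total(n_dice: int) -> dict[int, int]:
    """Count how many of the 6**n_dice equally likely rolls give each total."""
    counts = {0: 1}  # with zero dice there is exactly one way to make a total of 0
    for _ in range(n_dice):
        new_counts: dict[int, int] = {}
        for total, ways in counts.items():
            for face in range(1, 7):  # add one more die showing 1..6
                new_counts[total + face] = new_counts.get(total + face, 0) + ways
        counts = new_counts
    return counts

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

for n in (1, 2, 3, 4, 24):
    counts = ways_to_total(n)
    print(f"{n} dice: totals {min(counts)} to {max(counts)}, {sum(counts.values())} ways")

# Crude text histogram for three dice: the counts 'start to curve'
for total, ways in sorted(ways_to_total(3).items()):
    print(f"{total:2d} {'#' * ways}")

# Rolling a 12 with two dice over 10 trials: chance of exactly 0, 1 or 2 successes
p_twelve = 1 / 36
for k in range(3):
    print(k, round(binomial_pmf(k, 10, p_twelve), 4))
```

The convolution keeps only one small table of totals per die, so even 24 dice finish instantly.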
Even with 24 dice, the outcomes are still discrete.

Rolling Dice
- An infinite number of dice…
  - outcome values are no longer discrete but continuous
  - the Binomial Distribution becomes known as… the Normal Distribution/Bell Curve
- Replace the dice with something like height, and…
  - height, being determined by the summed effect of a large number of factors (genes, nutrition, etc.), looks like a continuous variable that approximates the Normal Distribution
  - i.e. its variation becomes definable/predictable: we can expect our data to behave in a certain way

The Normal Distribution
- Represents the idealised distribution of a large number of things we measure in biology: many parameters approximate the ND
- Is defined by just two things…
  - the population mean, µ (mu): the centre of the distribution (mean = median = mode)
  - the population standard deviation (SD), σ (sigma): the distribution's 'width', i.e. the distance from the mean to the point of inflexion; one σ either side of the mean encompasses 68% of the area under the curve
- 95% of the area is found within 1.96 σ either side of the mean

The Normal Distribution
- Is symmetrical: mean = median = mode

The Normal Distribution
- One SD either side of the mean includes 68% of the represented population
  - the SD boundary is the point of inflexion, where the curvature changes direction (the 's' bit)
- 2 SD cover about 95%
- 3 SD cover 99.7%

The Normal Distribution
- All Normal Distributions are similar; they differ only in terms of…
  - mean
  - SD (governs how 'spiky' the curve is)
- (Figure: 4 different SDs, 2 different means)

Standardising Normal Distributions
- Regardless of what they measure, all Normal Distributions can be made identical by…
  - subtracting the mean from every reading
  - dividing each reading by the SD
  - the mean then becomes zero, and a reading one SD bigger becomes +1
- These are called Standard Scores or z-scores: amazing! Different measurements, same 'view'

Standard (z) Scores
- A 'pure' way to represent a data distribution: the actual measurements (mg, m, sec) disappear!
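A minimal Python sketch of two ideas from these slides, assuming the standard library's `math.erf` and `statistics.pstdev` (population SD). The first computes the fraction of a Normal Distribution that lies within k SDs of the mean. The second standardises readings into z-scores. The function names are hypothetical. The readings passed to `standardise` are arbitrary illustrative numbers, not data from this deck.

```python
import math
import statistics

def area_within(k: float) -> float:
    """Fraction of a Normal Distribution lying within k SDs either side of the mean."""
    # For a standard normal variable Z, P(-k < Z < k) = erf(k / sqrt(2))
    return math.erf(k / math.sqrt(2))

def standardise(readings: list[float]) -> list[float]:
    """Convert readings to z-scores: subtract the mean, then divide by the SD."""
    mu = statistics.mean(readings)
    sigma = statistics.pstdev(readings)  # population SD; statistics.stdev gives the sample SD
    return [(x - mu) / sigma for x in readings]

# Prints 68.3%, 95.0%, 95.4% and 99.7%
for k in (1, 1.96, 2, 3):
    print(f"within {k} SD: {area_within(k):.1%}")

# Illustrative readings only; after standardising, the mean is 0 and the SD is 1
z = standardise([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(z, statistics.mean(z), statistics.pstdev(z))
```

The output shows why the slides quote both 1.96 σ and '2 SD' for 95%. Exactly 95% lies within 1.96 SDs, while 2 SDs cover slightly more (about 95.4%).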
In a z-score, the units are replaced by the number of SDs from the mean (zero). For any reading:

  z = (x - µ) / σ

Example: a survey of daily travel time had these results (in minutes):
  26, 33, 65, 28, 34, 55, 25, 44, 50, 36, 26, 37, 43, 62, 35, 38, 45, 32, 28, 34
- The mean is 38.8 min, and the SD is 11.4 min (the population SD, treating these 20 values as the whole population)
- To convert the values to z-scores, e.g. to convert 26…
  - first subtract the mean: 26 - 38.8 = -12.8
  - then divide by the SD: -12.8 / 11.4 = -1.12
  - so 26 is 1.12 SDs below the mean (z = -1.12)

Familiarity with the Normal Distribution
- 95% of the class are between 1.1 and 1.7 m tall: what are the mean and SD?
- Assuming a Normal Distribution…
  - the distribution is symmetrical, so the mean height is (1.1 + 1.7) / 2 = 1.4 m
  - the range 1.1 to 1.7 m covers 95% of the class, which is about ±2 SDs (more precisely ±1.96), i.e. 4 SDs in total
  - so one SD = (1.7 - 1.1) / 4 = 0.6 / 4 = 0.15 m

Familiarity with the Normal Distribution
- One member of that class is 1.85 m tall: what is the z-score of that measurement?
- Assuming a Normal Distribution…
  - z = (x - µ) / σ = (1.85 m - 1.4 m) / 0.15 m = 0.45 m / 0.15 m = 3 (note there are no units)
  - 3 SDs cover 99.7% of the population, so only about 1.5 in 1000 of the class (half of the remaining 0.3%) will be as tall or taller: a big class, with fractional students!

Familiarity with the Normal Distribution
- 36 students took a test; you were 0.5 SD above the average. How many students did better?
  - from the curve, 50% sit above zero
  - from the curve, 19.1% sit between 0 and 0.5 SD
  - so 30.9% sit above you
  - 30.9% of 36 is about 11 students

Familiarity with the Normal Distribution
- Need to have a 'feel' for this…

Populations and Samples – a Diversion
- A couple of seemingly pedantic but important points about distributions…
- population: the potentially infinite group on which measurements might be made
  - we don't often measure the whole population
- sample: a subset of the population on which measurements are actually made
  - most studies will sample the population
  - n is the number studied; n - 1 is called the 'degrees of freedom'
  - we often extrapolate sample results to the population

Populations and Samples – so what?
- The two are described/calculated differently…
  - µ is the population mean, x̄ is the sample mean
  - σ or σn is the population SD, s or σn-1 is the sample SD
- Calculating the SD is different for each
  - most calculators do it for you… as long as you choose the right type (population vs sample)

Populations and Samples – choosing
- Analysing the results of a class test…
  - population, since you don't intend extrapolating the results to all students everywhere
- Analysing the results of a drug trial…
  - sample, since you expect the conclusions to apply to the larger population
- A national census collects information about age…
  - population, since by definition the census is about the population taking part in the survey
- If in doubt, use the sample SD: as n increases, the difference between the two decreases

Populations and Samples – implications
- The sample mean and SD are estimates of the population mean and the population SD
  - i.e. you calculate σn-1 (or s)
- If the sample observed is the population, then the mean and SD of that sample are the population mean and the population SD
  - i.e. you calculate σ (or σn)

Implications of Estimating the Population Mean
- For a sample, the 'quality' of the estimate of the population mean and SD depends on the number of observations made
  - if you sampled, say, 1 member of the population, it's unlikely to be close to the population mean
  - if you sampled the whole population, your estimate is the population mean
  - in between, adding extra observations will improve the estimate
- Sampling repeatedly gives a variety of means
  - that set of means will have its own SD (!), called the Standard Error of the Mean (SEM)

The Standard Error of the Mean
- Recap…
  - each sampling of a distribution will produce a different estimate of the population mean
  - the variation in those estimates is called the SEM
- Surprisingly easy to calculate
  - SEM = sample standard deviation / square root of the number of observations in the sample
  - SEM = s / √N
  - e.g. if N = 16, then the SEM is 4x smaller than the SD

Summary
- The Binomial Distribution is a basic distribution, e.g. rolling dice
  - with lots of dice, the Binomial Distribution becomes the Normal Distribution
- The Normal Distribution is fully defined by just its mean and SD
  - transformation to z-scores makes all Normal Distributions identical
- SD calculation differs for a sample vs a population
  - a sample is a subset of the whole population
  - the population is, erm, the whole population
- Estimation of the population mean from a sample is always prone to uncertainty
  - the Standard Error of the Mean (s/√N) reflects that uncertainty
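The worked numbers in this deck can be re-checked with a short Python sketch that uses only the standard library (`statistics`, `math`, `random`). It uses the travel-time data exactly as given. The 11.4 min SD quoted for that survey is the population SD; the sample SD of the same 20 values is about 11.7 min. The exact tail above z = 3 is about 1.35 in 1000, which the '1.5 in 1000' on the slide rounds via 99.7%. `upper_tail` and `sd_of_sample_means` are hypothetical helper names. The simulation setup (a normal population with µ = 0 and σ = 10, samples of N = 16) is an illustrative assumption. It shows that the SD of many sample means comes out close to σ/√N.

```python
import math
import random
import statistics

def upper_tail(z: float) -> float:
    """Fraction of a Normal Distribution lying more than z SDs above the mean."""
    return 0.5 * (1 - math.erf(z / math.sqrt(2)))

# Travel-time survey (minutes), as given in the deck
times = [26, 33, 65, 28, 34, 55, 25, 44, 50, 36,
         26, 37, 43, 62, 35, 38, 45, 32, 28, 34]
mean = statistics.mean(times)
pop_sd = statistics.pstdev(times)  # sigma_n: divides by n
samp_sd = statistics.stdev(times)  # s or sigma_(n-1): divides by n - 1
print(f"mean {mean:.1f}, population SD {pop_sd:.1f}, sample SD {samp_sd:.1f}")
print(f"z for 26 min: {(26 - mean) / pop_sd:.2f}")
print(f"SEM = s / sqrt(N): {samp_sd / math.sqrt(len(times)):.2f}")

# Class heights: 95% between 1.1 m and 1.7 m, treated as +/-2 SD as on the slide
height_mean = (1.1 + 1.7) / 2
height_sd = (1.7 - 1.1) / 4
z_tall = (1.85 - height_mean) / height_sd
print(f"mean {height_mean:.2f} m, SD {height_sd:.2f} m, z(1.85 m) = {z_tall:.2f}")
print(f"fraction at least that tall: {upper_tail(z_tall):.4f}")

# Test score 0.5 SD above average, 36 students
print(f"students expected to do better: {upper_tail(0.5) * 36:.1f}")

def sd_of_sample_means(mu: float, sigma: float, n: int,
                       repeats: int = 10_000, seed: int = 1) -> float:
    """Draw many samples of size n from a normal population; return the SD of their means."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
             for _ in range(repeats)]
    return statistics.pstdev(means)

# With sigma = 10 and N = 16, sigma / sqrt(N) predicts a spread of about 2.5
print(f"SD of sample means: {sd_of_sample_means(0.0, 10.0, 16):.2f}")
```

The simulation is seeded so that repeated runs print the same value. Changing `seed` or `repeats` will move the result slightly around 2.5.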