EART20170 Computing, Data Analysis & Communication skills
Lecturer: Dr Paul Connolly (F18 – Sackville Building), [email protected]

1. Data analysis (statistics)
   - 3 lectures & practicals
   - statistics open-book test (2 hours)
2. Computing (Excel statistics/modelling)
   - 2 lectures
   - assessed practical work

Course notes etc: http://cloudbase.phy.umist.ac.uk/people/connolly
Recommended reading: Cheeney (1983) Statistical Methods in Geology. George Allen & Unwin.

Recap – last lecture
- The four measurement scales: nominal, ordinal, interval and ratio.
- There are two types of errors: random errors (precision) and systematic errors (accuracy).
- Basic graphs: histograms, frequency polygons, bar charts, pie charts.
- Gaussian statistics describe random errors.
- The central limit theorem.
- Central values, dispersion, symmetry.
- Weighted mean.

Some common problems

$X = 1, 4, 6, 3, 7, 4 = [x_1, x_2, x_3, x_4, x_5, x_6]$

$$\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N} \qquad \sum_{i=1}^{N} (x_i - \bar{x})^2$$

Use tables:

    x      x - \bar{x}    (x - \bar{x})^2
    1      -3.1667        10.0278
    4      -0.1667         0.0278
    6       1.8333         3.3611
    3      -1.1667         1.3611
    7       2.8333         8.0278
    4      -0.1667         0.0278
    Sum:
    25      0             22.8333

$\bar{x} = 25/6 \approx 4.1667$

Lecture 2
- Correlation between two variables
- Classical linear regression
- Reduced major axis regression
- Propagation of errors in compound quantities

Correlation
- Many real-life quantities depend on something else, e.g. the dependence of rock permeability on porosity.
- How can we quantify the strength and direction of a linear relationship between X and Y variables?

Correlation
- Linear correlation (Pearson's coefficient):

$$r = \frac{\sum xy - \dfrac{\sum x \sum y}{N}}{\sqrt{\left(\sum x^2 - \dfrac{(\sum x)^2}{N}\right)\left(\sum y^2 - \dfrac{(\sum y)^2}{N}\right)}}$$

  where
  $\sum y$ = sum of all y-values
  $\sum x$ = sum of all x-values
  $\sum x^2$ = sum of all squared x-values
  $\sum y^2$ = sum of all squared y-values
  $\sum xy$ = sum of the x times y values

- Like other numerical measures, the population correlation coefficient is denoted by $\rho$ (the Greek letter "rho") and the sample correlation coefficient is denoted by r.
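The sums in Pearson's formula map directly onto a few lines of code. The sketch below is only an illustration: the function name `pearson_r` and the porosity and permeability values are made up for this example and are not course data.

```python
import math


def pearson_r(x, y):
    """Sample correlation coefficient r, computed from the sums formula."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)              # sum of squared x-values
    sum_y2 = sum(v * v for v in y)              # sum of squared y-values
    sum_xy = sum(a * b for a, b in zip(x, y))   # sum of x times y
    numerator = sum_xy - sum_x * sum_y / n
    denominator = math.sqrt(
        (sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n)
    )
    return numerator / denominator


# Hypothetical measurements, invented purely for illustration.
porosity = [5.0, 10.0, 15.0, 20.0, 25.0]
permeability = [1.2, 2.9, 4.1, 6.2, 7.8]

r = pearson_r(porosity, permeability)
print(f"r = {r:.4f}")
```

Points lying exactly on a rising straight line give r = +1 and points on a falling line give r = -1, which is a quick sanity check on any implementation.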
Correlation: values of r

[Figure: three scatter plots of y against x. r = +1: perfect positive correlation; r = -1: perfect negative correlation; r = 0: no correlation.]

Correlation
- $r^2$ is the fraction of the variation in x and y that is explained by the linear relationship. It is often called the "goodness of fit".
- E.g. if r = 0.97 is obtained then $r^2 \approx 0.94$, so 100 × 0.94 = 94% of the total variation in x and y is explained by the linear relationship, and the remaining 6% is due to "other" causes.

[Figure: $r^2$, the fraction of explained variation (0.0 to 1.0), plotted against the correlation coefficient r (+1.0 to -1.0).]

Regression analysis
- How can we fit an equation to a set of numerical data x, y such that it yields the best fit for all the data?

Classical linear regression
- An approximate fit yields a straight line that passes through the set of points in the best possible manner, without being required to pass exactly through any of the points.

Classical linear regression

$$y = mx + c$$

[Figure: straight line fitted through scattered data points, showing the intercept c, the gradient m and the deviation $e_i$ of one data point from the line.]

- Here $e_i$ is the deviation of the data point from the fit line, c is the intercept and m is the gradient.
- Assumes that the error is present only in y.

How do we define a good fit?
- If the sum of all deviations is a minimum? $\sum e_i$
- If the sum of all the absolute deviations is a minimum? $\sum |e_i|$
- If the maximum deviation is a minimum? $e_{max}$
- If the sum of all the squares of the deviations is a minimum? $\sum e_i^2$

Classical linear regression
- The best way is to minimise the sum of the squares of the deviations. Formally this involves some mathematics:
- At each value of $x_i$, the fitted line gives: $y_i = m x_i + c$
- Therefore the deviations of the data $Y_i$ from the line are: $e_i = Y_i - y_i$
- The sum of the squares:

$$S(c, m) = \sum_{i=1}^{N} e_i^2 = \sum_{i=1}^{N} (Y_i - c - m x_i)^2$$

Classical linear regression
- How do you find the minimum of a function?
- Use calculus
- Differentiate and set to zero:

$$\frac{\partial S(c, m)}{\partial c} = \sum_{i=1}^{N} 2 (Y_i - c - m x_i)(-1) = 0$$

$$\frac{\partial S(c, m)}{\partial m} = \sum_{i=1}^{N} 2 (Y_i - c - m x_i)(-x_i) = 0$$

- Two simultaneous equations:

$$c N + m \sum_{i=1}^{N} x_i = \sum_{i=1}^{N} Y_i$$

$$c \sum_{i=1}^{N} x_i + m \sum_{i=1}^{N} x_i^2 = \sum_{i=1}^{N} x_i Y_i$$

Classical linear regression
- Solving the two equations yields:

$$c = \frac{\sum_{i=1}^{N} Y_i \sum_{i=1}^{N} x_i^2 - \sum_{i=1}^{N} x_i \sum_{i=1}^{N} x_i Y_i}{N \sum_{i=1}^{N} x_i^2 - \left(\sum_{i=1}^{N} x_i\right)^2}$$

$$m = \frac{N \sum_{i=1}^{N} x_i Y_i - \sum_{i=1}^{N} x_i \sum_{i=1}^{N} Y_i}{N \sum_{i=1}^{N} x_i^2 - \left(\sum_{i=1}^{N} x_i\right)^2}$$

Classical linear regression: table of sums to fill in

    x      y      xy     x^2
    ?      ?      ?      ?

Classical linear regression
- Classical linear regression only considers errors in the Y values of the data.
- How can we consider errors in both x and y values?
- Use reduced major axis regression.

Reduced major axis regression

[Figure: fitted line with intercept c; a data point is offset from the line by dx in the x direction and dy in the y direction.]

- A method to quantify a linear relationship where both variables are dependent and have errors.
- Instead of minimising $e^2 = (Y - y)^2$ we minimise $e^2 = dy^2 + dx^2$.

Reduced major axis regression

$$m = \sqrt{\frac{\sum y^2 - \dfrac{(\sum y)^2}{N}}{\sum x^2 - \dfrac{(\sum x)^2}{N}}} \qquad c = \bar{y} - m \bar{x}$$

Reduced major axis regression: table to fill in

    x      y      x - x'    y - y'    (x - x')^2    (y - y')^2
    ?      ?      ?         ?         ?             ?

Error propagation
- Every measurement of a variable has an error.
- Often the error quoted is one standard deviation of the mean (mean ± standard deviation).
- The sample standard deviation is usually our best estimate of the population standard deviation.

Error propagation
- Error propagation is a way of combining two or more random errors together to get a third. The equations assume that the errors are Gaussian in nature.
- It can be used when you need to measure more than one quantity to get at your final result, for example if you wanted to predict permeability from a measured porosity and grain size. The equations introduced here let you propagate the uncertainties on your data through the calculation and come up with an uncertainty on your results.
- How then do we combine variables which have errors?
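Looking back at the two regression methods above before moving on, both fits can be sketched in a few lines of code. This is a minimal illustration: the function names `classical_fit` and `rma_fit` and the data values are hypothetical. The slide formula for the reduced major axis gradient gives only the positive square root; taking the sign from the covariance of x and y is an addition made here so that falling trends are handled too.

```python
import math


def classical_fit(x, y):
    """Least-squares gradient m and intercept c, with errors in y only."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_xy = sum(a * b for a, b in zip(x, y))
    det = n * sum_x2 - sum_x ** 2            # shared denominator
    m = (n * sum_xy - sum_x * sum_y) / det
    c = (sum_y * sum_x2 - sum_x * sum_xy) / det
    return m, c


def rma_fit(x, y):
    """Reduced major axis gradient m and intercept c."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    ss_x = sum(v * v for v in x) - sum_x ** 2 / n
    ss_y = sum(v * v for v in y) - sum_y ** 2 / n
    s_xy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    m = math.sqrt(ss_y / ss_x)
    if s_xy < 0:                             # assumption: sign follows the covariance
        m = -m
    c = sum_y / n - m * sum_x / n            # c = mean(y) - m * mean(x)
    return m, c


# Hypothetical data, invented for illustration.
x_data = [1.0, 2.0, 3.0, 4.0, 5.0]
y_data = [2.1, 3.9, 6.2, 7.8, 10.1]

m_cls, c_cls = classical_fit(x_data, y_data)
m_rma, c_rma = rma_fit(x_data, y_data)
print(f"classical: y = {m_cls:.3f} x + {c_cls:.3f}")
print(f"RMA:       y = {m_rma:.3f} x + {c_rma:.3f}")
```

For data that lie exactly on a line the two methods agree. With scatter, the reduced major axis gradient is never smaller in magnitude than the classical one, because the classical gradient equals r times the reduced major axis gradient.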
Error propagation: quoted formulae

    Relationship                 Error propagation
    z = x + y                    $\sigma_z^2 = \sigma_x^2 + \sigma_y^2$
    z = x - y                    $\sigma_z^2 = \sigma_x^2 + \sigma_y^2$
    z = xy                       $(\sigma_z / z)^2 = (\sigma_x / x)^2 + (\sigma_y / y)^2$
    z = x / y                    $(\sigma_z / z)^2 = (\sigma_x / x)^2 + (\sigma_y / y)^2$
    z = kx (k = constant)        $\sigma_z = k \sigma_x$
    z = x^n                      $\sigma_z / z = n \sigma_x / x$
    z = log_e x                  $\sigma_z = \sigma_x / x$
    z = e^x                      $\sigma_z / z = \sigma_x$

Example of propagation of error
- Suppose we measure the thickness of a rock bed using a tape measure.
- The tape measure is shorter than the bed thickness, so we have to do it in two steps, x and y.
- We repeat the measurements 100 times and obtain the following mean and standard deviation values for x and y:
  x = 12.1 ± 0.3 cm
  y = 4.2 ± 0.2 cm
- The thickness of the bed should be simply: x + y = 16.3 cm
- But what about the error on the total thickness?

Example of propagation of error
- It is given by propagating the individual errors as follows:

$$\sigma_z = \sqrt{\sigma_x^2 + \sigma_y^2} = \sqrt{0.3^2 + 0.2^2} \approx 0.36 \text{ cm}$$

- So the final answer for the total thickness of the bed is: 16.3 ± 0.4 cm
- Error propagation formulae are non-intuitive, and understanding how they are derived requires some mathematical knowledge.

More complex examples
- What if we have several functions of several variables?
- E.g. calculating density using Archimedes' Principle:

$$\text{Density} = \frac{\text{wt. in air } (A)}{\text{wt. in air } (A) - \text{wt. in water } (W)}$$

- This equation contains two functions and two variables.

Density
- Error propagation is best done in parts, so first work out the value and error of the denominator:

$$x = A - W$$

- Then the value and error of:

$$\text{Density} = \frac{A}{x}$$

- In a few weeks we will use a Monte Carlo method for solving more complex functions.

Reminder: Statistics practical #2
- Those not taking BIOL20451: Roscoe 3.5, 1100 – 1300 Tuesday
- Those taking BIOL20451: Williamson 1.12, 1400 – 1600 Tuesday

Some common problems
- Weighted mean (computed from frequencies f and values x)
- What does adding two variables really mean?
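One way to see what adding two variables with errors means is to propagate the errors with the quadrature formulae and then check the result by simulation. The sketch below uses the bed-thickness values from this lecture. The helper names `sigma_sum` and `sigma_ratio` are hypothetical, and so are the weights A and W in the density part, which are invented only to show the in-parts calculation.

```python
import math
import random
import statistics


def sigma_sum(sigma_x, sigma_y):
    """Error on z = x + y or z = x - y: absolute errors add in quadrature."""
    return math.sqrt(sigma_x ** 2 + sigma_y ** 2)


def sigma_ratio(z, x, sigma_x, y, sigma_y):
    """Error on z = x * y or z = x / y: relative errors add in quadrature."""
    return abs(z) * math.sqrt((sigma_x / x) ** 2 + (sigma_y / y) ** 2)


# Bed thickness example from the lecture (cm).
x, sigma_x = 12.1, 0.3
y, sigma_y = 4.2, 0.2
thickness = x + y
sigma_thickness = sigma_sum(sigma_x, sigma_y)
print(f"thickness = {thickness:.1f} +/- {sigma_thickness:.2f} cm")

# Monte Carlo check: draw many Gaussian (x, y) pairs, add them, and
# measure the spread of the totals. The seed is fixed for repeatability.
rng = random.Random(0)
totals = [rng.gauss(x, sigma_x) + rng.gauss(y, sigma_y) for _ in range(100_000)]
sigma_mc = statistics.stdev(totals)
print(f"Monte Carlo spread of x + y = {sigma_mc:.2f} cm")

# Density by Archimedes' Principle, done in parts.
# A and W are hypothetical weights (g), not values from the lecture.
A, sigma_A = 30.0, 0.1
W, sigma_W = 18.0, 0.1
denominator = A - W                                # first part: x = A - W
sigma_denominator = sigma_sum(sigma_A, sigma_W)
density = A / denominator                          # second part: A / x
# The quotient formula is applied as in the in-parts recipe, which
# treats A and the denominator as if their errors were independent.
sigma_density = sigma_ratio(density, A, sigma_A, denominator, sigma_denominator)
print(f"density = {density:.2f} +/- {sigma_density:.2f}")
```

The simulated spread of x + y should agree closely with the quadrature result, which is smaller than the simple sum 0.3 + 0.2 because the two random errors partly cancel.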
