Statistics and Error

How many jelly beans fill a 0.5-liter bottle? Take 60 seconds to calculate on your own.

Fermi Problems
• Fermi problems challenge us to ask more questions, not just provide “an answer.”
• Enrico Fermi (1901-1954): Italian physicist best known for contributions to nuclear physics and the development of quantum theory.
• Fermi used a process of “zeroing in” on a problem: the value in question is certainly larger than one number and smaller than some other number, which yields a quantified answer within identified limits.
• The goal is an answer good to an order of magnitude, reached by making reasonable assumptions about the situation rather than relying on definite knowledge for an “exact” answer.

Fermi Problems
• Solutions have algorithmic approaches as well as intuitive approaches.
• A very rough answer is better than no answer.
• What was your model for a jelly bean? For a 0.5-liter bottle?
  – A model is a partial rather than complete representation.
  – The design of a model depends as much on circumstances and constraints ($, time, materials, data, personnel, etc.) as it does on the problem being solved.
  – A symbolic representation is clean and powerful. It communicates, simply and clearly, what the modeler thinks is important, what information is needed, and how that information is used. (Model the jelly bean as a cylinder or a box? Rounded ends?)
• What other simplifications or assumptions did you make?

Fermi Problems
• Get with a partner and come up with a final estimate.

Describe 3 entirely different (but practical) ways for determining the area (in cm²) of the darkened region below (the design is on a piece of paper) to within 0.1%.
1. Superimpose a finely spaced grid over the figure and count squares.
2. Cut out the figure and weigh it. Compare that weight to the weight of a piece of paper of known area. If the cutout is too light, transfer the image to another uniformly dense material.
3. Divide the figure into local regions that can be integrated numerically.
4. Scan the image into a computer and count pixels.
5. Build a container whose cross-section is that of the darkened figure. Fill it with 1000 cc of water and measure the level (area = volume / water height).
6. Use a “polar planimeter,” a gadget that mechanically integrates the area enclosed by a closed curve.
7. “Throw darts.” Draw a rectangle (of calculable area) that encloses the image. Pick random points within the rectangle and count how many fall within the darkened figure. The ratio of hits to total points, times the rectangle's area, estimates the area.

Why Study Statistics?
• Statistics: a mathematical science concerned with data collection, presentation, analysis, and interpretation.
• Statistics can tell us about sports, population, and the economy.
[Figures: New York City, NY. Employment rates, 2006 (categories: Unemployed, Employed, Other; values shown: 4.6%, 32.3%, 63.1%). Estimated population by year, 2000-2006 (axis from 7,900,000 to 8,250,000).]

Why Study Statistics?
• Statistical analysis is also an integral part of scientific research!
• Are your experimental results believable?
• Example: tensile strength of spaghetti. The data suggest a relationship between type (size) and breaking strength. The relationship is not perfect: the data have random error.

Why Study Statistics?
• Responses and measurements are variable! This is due to:
  – Systematic error: the same error value appears every time an instrument is used the same way.
  – Random error: may vary from observation to observation, perhaps because measurements cannot be performed in exactly the same way every time.
• The goal of statistics is to find the model that best describes a target population by taking sample data.
• Represent randomness using probability.

Probability
• Experiment of chance: a phenomenon whose outcome is uncertain.
[Diagram: Probability Model (Chances / Probabilities; Sample Space; Events; Probability of Events)]
• Sample space: the set of all possible outcomes.
• Event: a set of outcomes (a subset of the sample space).
• An event E occurs if any of its outcomes occurs.
• Examples of experiments of chance: rolling dice, measuring, performing an experiment, etc.
• Probability: the likelihood that an experiment of chance will produce a certain outcome.
• Independence: events are independent if the occurrence of one does not affect the probability of the occurrence of another. Why is this important?

Probability
Consider a deck of playing cards…
• Sample space? The set of 52 cards.
• Event?
  – R: The card is red.
  – F: The card is a face card.
  – H: The card is a heart.
  – 3: The card is a 3.
• Probability? P(R) = 26/52, P(F) = 12/52, P(H) = 13/52, P(3) = 4/52

Events and variables
Can be described as random or deterministic:
• The outcome of a random event cannot be predicted:
  – The sum of two numbers on two rolled dice.
  – The time of emission of the ith particle from radioactive material.
• The outcome of a deterministic event can be predicted:
  – The measured length of a table to the nearest cm.
  – Motion of macroscopic objects (projectiles, planets, spacecraft) as predicted by classical mechanics.

Extent of randomness
A variable can be more random or more deterministic depending on the degree to which you account for relevant parameters:
• Mostly deterministic: only a small fraction of the outcome cannot be accounted for. Example: the length of a table, which is affected by
  – Temperature/humidity variation
  – Measurement resolution
  – Instrument/observer error
  – Quantum-level intrinsic uncertainty
• Mostly random: most of the outcome cannot be accounted for. Example: the trajectory of a given molecule in a solution.

Random variables
Can be described as discrete or continuous:
• A discrete variable has a countable number of values. Example: the number of customers who enter a store before one purchases a product.
• The values of a continuous variable cannot be listed. Example: the distance between two oxygen molecules in a room.

Consider data collected for undergraduate students:
Random Variable     Possible Values
Gender              Male, Female
Class               Fresh, Soph, Jr, Sr
Height (inches)     Integer in the interval [30, 90]
College             Arts, Education, Engineering, etc.
Shoe Size           3, 3.5 … 18
• Is the height a discrete or continuous variable?
• How could you measure height and shoe size to make them continuous variables?

Probability Distributions
If a random event is repeated many times, it will produce a distribution of outcomes (statistical regularity). (Think about scores on an exam.)
The distribution can be represented in two ways:
• Frequency distribution function: represents the distribution as the number of occurrences of each outcome.
• Probability distribution function: represents the distribution as the percentage of occurrences of each outcome.

Discrete Probability Distributions
Consider a discrete random variable, X:
• $f(x_i)$ is the probability distribution function. What is the range of values of $f(x_i)$?
• Therefore, $\Pr(X = x_i) = f(x_i)$.

Discrete Probability Distributions
Properties of discrete probabilities:
• $\Pr(X = x_i) = f(x_i) \ge 0$ for all $i$
• $\sum_{i=1}^{k} \Pr(X = x_i) = \sum_{i=1}^{k} f(x_i) = 1$ for $k$ possible discrete outcomes
• $\Pr(a < X \le b) = F(b) - F(a) = \sum_{a < x_i \le b} f(x_i)$, where $F(x) = \Pr(X \le x)$

Discrete Probability Distributions
Example: Waiting for a success
Consider an experiment in which we toss a coin until heads turns up.
• Outcomes: w = {H, TH, TTH, TTTH, TTTTH, …}
• Let X(w) be the number of tails before a heads turns up. Then $f(x) = \left(\tfrac{1}{2}\right)^{x+1}$ for x = 0, 1, 2, …
[Figure: bar chart of f(x) versus waiting time, x = 0 to 6, vertical axis from 0 to 0.5]
(Board example)

Cumulative Discrete Probability Distributions
• $\Pr(X \le x') = F(x') = \sum_{i=1}^{j} f(x_i)$, where $x_j$ is the largest discrete value of X less than or equal to $x'$
• $\Pr(X \le x_k) = 1$

Discrete Probability Distributions
Example: Distribution Function for Die/Dice
Distribution function for throwing a die:
• Outcomes: w = {1, 2, 3, 4, 5, 6}, so $f(x_i) = 1/6$ for $i = 1, \dots, 6$
[Figure: bar chart of f(x) for outcomes 1 to 6, vertical axis from 0 to 0.18]

Discrete Probability Distributions
Example: Distribution Function for Die/Dice
Distribution function for the sum of two thrown dice:
• $f(x_i) = 1/36$ for $x_1 = 2$, $2/36$ for $x_2 = 3$, …
[Figure: bar chart of f(x) for sums 2 to 12, vertical axis from 0 to 0.18]

Continuous Probability Density Function
• Cumulative distribution function (cdf): gives the fraction of the total probability that lies at or to the left of each x: $F(x) = \Pr(X \le x)$
• Probability density (distribution) function (pdf): gives the density of concentration of probability at each point x.

Continuous Probability Distributions
Properties of the cumulative distribution function:
• $F(-\infty) = 0$
• $0 \le F(x) \le 1$, where $F(x) = \Pr(X \le x)$
• $F(\infty) = 1$
Properties of the probability density function:
• $\Pr(a \le X \le b) = F(b) - F(a) = \int_a^b f(x)\,dx$

Continuous Probability Distributions
• For continuous variables, the events of interest are intervals rather than isolated values.
• Consider the waiting time T for a bus that is equally likely to arrive at any time in the next ten minutes. We are not interested in the probability that the bus will arrive in exactly 3.451233 minutes, but rather in the probability that it will arrive in the subinterval (a, b) minutes:
  $P(a \le T \le b) = F(b) - F(a) = \frac{b - a}{10}$
[Figure: F(t) versus t, reaching 1 at t = 10]

Continuous Probability Distributions
Example: Gaussian (normal) distribution:
• $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right]$
• Standardized variable: $Z = \frac{X - \mu}{\sigma}$, so that $X = \mu + \sigma Z$
• Each member of the normal distribution family is described by the mean ($\mu$) and variance ($\sigma^2$).
• Standard normal curve: $\mu = 0$, $\sigma = 1$.

Normal / Gaussian Distribution
• Normal (Gaussian) distribution: can be used to approximately describe any variable that tends to cluster around the mean.
• Central Limit Theorem: the sum of a (sufficiently) large number of independent random variables will be approximately normally distributed.
• Importance: used as a simple model for complex phenomena in statistics, natural science, and social science.
• Examples of experiments/measurements that will produce a Gaussian distribution? E.g., observational error is assumed to follow a normal distribution.

Standard Error
• Deviation: $d_i = x_i - \mu$
• Standard deviation: $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
• Variance: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
• Variance is the average squared distance of the data from the mean. Therefore, the standard deviation measures the spread of the data about the mean.
• Standard error: $\frac{\sigma}{\sqrt{N}}$

Standard Error
How do we reduce the size of our standard error?
1) Repeated measurements
2) Different measurement strategy
Jacob Bernoulli (1713): “For even the most stupid of men, by some instinct of nature, by himself and without any instruction (which is a remarkable thing), is convinced the more observations have been made, the less danger there is of wandering from one’s goal” (Stigler, 1986).
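The standard deviation and standard error formulas above can be sketched with Python's standard `statistics` module. This is an illustration under stated assumptions, not part of the original slides: the six readings are the Team A pole heights that appear later in these slides, and the variable names are chosen here for illustration. The formulas above divide by N (`pstdev`); the sample standard deviation divides by N − 1 (`stdev`), which appears to be the convention behind the "std dev = 1.41" quoted for Team A.

```python
import math
import statistics

# Six height readings in cm (the Team A values used later in these slides).
readings_cm = [183, 182, 185, 181, 183, 184]
n = len(readings_cm)

mean = statistics.mean(readings_cm)

# Population form: sqrt(sum((x - mean)^2) / N), as in the formula above.
pop_sd = statistics.pstdev(readings_cm)

# Sample form: sqrt(sum((x - mean)^2) / (N - 1)).
sample_sd = statistics.stdev(readings_cm)

# Standard error of the mean: sigma / sqrt(N).
std_error = pop_sd / math.sqrt(n)

print(f"mean = {mean} cm")
print(f"standard deviation (divide by N)     = {pop_sd:.2f} cm")
print(f"standard deviation (divide by N - 1) = {sample_sd:.2f} cm")
print(f"standard error = {std_error:.2f} cm")

# With the same spread, more repeated measurements shrink the standard error.
for count in (6, 24, 96):
    print(f"N = {count:3d}: sigma / sqrt(N) = {pop_sd / math.sqrt(count):.3f} cm")
```

Because the standard error falls as 1/√N, quadrupling the number of readings only halves it, which is one reason a different measurement strategy can be the better option.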
Moments
Other values in terms of the moments:
• Skewness: $\mu_3 / \mu_2^{3/2}$, the ‘lopsidedness’ of the distribution
  – A symmetric distribution has a skewness of 0.
  – Negative skewness: the longer tail extends to the left.
  – Positive skewness: the longer tail extends to the right.
• Kurtosis: describes the shape of the distribution with respect to the height and width of the curve (‘peakedness’).

Central Limit Theorem
As the sample size goes to infinity, the distribution function of the standardized variable converges to the normal distribution function!
http://www.jhu.edu/virtlab/prob-distributions/

Moments
• In physics, the moment refers to the force applied to a system at a distance from the axis of rotation (as in a lever).
• In mathematics, the moment is a measure of how far a function is from the origin.
• The 1st moment about the origin (mean): the average value of x.
• The 2nd moment about the mean (variance): a measure of the ‘spread’ of the data.

Two teams measure the height of a pole.
Height in cm
Team A: 183, 182, 185, 181, 183, 184 (avg = 183, std dev = 1.41)
Team B: 183.0, 183.5, 182.7, 182.5, 183.1, 183.3 (avg = 183.0, std dev = 0.37)
• Which team did the better job?
• Why do you think so?

If you measure something many times, you get random error.
(Same Team A and Team B data as above.)
• The positive and negative errors should balance out.
• The average should be closer to the true value than any one measurement might be.
• The deviations from the average for individual measurements give an indication of the reliability of that average value.
• Standard deviation measures the reliability of the average.

How do we make lines?
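One common way to make a line through data is an ordinary least-squares fit of y = m x + b. That choice is an assumption here, since the slides do not name a fitting method. The sketch below uses hypothetical data points (not taken from the slides) and a hypothetical helper `fit_line`, then computes the errors $e_i = y_i - (m x_i + b)$ that one would plot against $x_i$.

```python
# Hypothetical (x, y) data for illustration only; not from the slides.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [4.1, 4.4, 5.1, 5.4, 6.0]


def fit_line(xs, ys):
    """Return the least-squares slope m and intercept b for y = m*x + b."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    m = sxy / sxx
    # The fitted line passes through the point of means.
    b = y_mean - m * x_mean
    return m, b


m, b = fit_line(xs, ys)

# Errors (residuals): e_i = y_i - (m * x_i + b).
residuals = [y - (m * x + b) for x, y in zip(xs, ys)]

print(f"m = {m:.2f}, b = {b:.2f}")
for x, e in zip(xs, residuals):
    print(f"x = {x:.2f}  e = {e:+.2f}")
```

For a least-squares line with an intercept, the residuals always sum to zero, so the useful check is whether they show a trend or pattern when plotted against x.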
[Figure: data points and a fitted line for x from 0 to 1, with the errors e1 … e6 marked between each point and the line]
• Each point is the line plus an error: $y_i = (m x_i + b) + \delta y_i$
• The error for each point is $\delta y_i = y_i - (m x_i + b)$ (labeled $e_1, \dots, e_6$ in the figure).
• Plot $e_i$ vs. $x_i$.
[Figure: errors e1 … e6 plotted against x from 0 to 1]
• Good lines have random, uncorrelated errors.

Error, Precision, Accuracy
• Error: the difference between an observed/measured value and a true value.
  – We usually don’t know the true value.
  – We usually do have an estimate.
• Systematic errors
  – Faulty calibration
  – User bias
  – Change in conditions, e.g., temperature rise
• Random errors
  – Statistical variation
  – Precision of measurement

Error, Precision, Accuracy
• Accuracy: a measure of how close the result is to the true value.
  – Measure of correctness
• Precision: a measure of how well the result is determined.
  – Measure of variation in the data, within itself, not relative to the true value

Error, Precision, Accuracy
[Figure: three panels labeled (1) low precision, low accuracy; (2) low precision, high accuracy; (3) high precision, low accuracy]

Calculators and significant digits
Let the uncertain digit determine the precision to which you quote a result.
• Calculator: 12.6892
• Estimated error: ± 0.07
• Quote: 12.69 ± 0.07

What is an error?
• In data analysis, engineers use
  – error = uncertainty
  – error ≠ mistake.
• Mistakes in calculation and measurements should always be corrected before calculating experimental error.
• Measured value of x = $x_{best} \pm \delta x$
  – $x_{best}$ = best estimate or measurement of x
  – $\delta x$ = uncertainty or error in the measurements

All measurements have errors.
• What are some sources of measurement errors?
  – Instrument uncertainty (caliper vs. ruler)
    • Use half the smallest division.
    • Examples: L = 9 ± 0.5 cm, L = 8.5 ± 0.3 cm, L = 11.8 ± 0.1 cm
  – Measurement error (using an instrument incorrectly)
    • Example: measuring your height without holding the ruler level.
  – Variations in the size of the object (spaghetti is bumpy)
    • Statistical uncertainty

If no error is given, assume half the last significant figure.
• That's why you don't write 25.367941 mm.

How do you account for errors in calculations?
• The way you combine errors depends on the math function:
  – added or subtracted
  – multiplied or divided
  – other functions
• The sum of two lengths is $L_{eq} = L_1 + L_2$. What is the error in $L_{eq}$?
• The area of a room is $A = L \times W$. What is the error in A?
• A simple error calculation gives the largest probable error.

Sum or difference
• What is the error if you add or subtract numbers? Given $x \pm \delta x$, $y \pm \delta y$, $z \pm \delta z$ and $w = x + y - z$:
• The absolute error is the sum of the absolute errors: $\delta w = \delta x + \delta y + \delta z$ (upper bound)

What is the error in length of molding to put around a room?
• $L_1$ = 5.0 cm ± 0.5 cm and $L_2$ = 6.0 cm ± 0.3 cm.
• The perimeter is $L = L_1 + L_2 + L_1 + L_2$ = 5.0 cm + 6.0 cm + 5.0 cm + 6.0 cm = 22 cm
• The error (upper bound) is $\delta L = \delta L_1 + \delta L_2 + \delta L_1 + \delta L_2$ = 0.5 cm + 0.3 cm + 0.5 cm + 0.3 cm = 1.6 cm

Errors can be large when you subtract similar values.
• Weight of container = 30 ± 5 g
• Weight of container plus nuts = 35 ± 5 g
• Weight of nuts?
  – Weight = 35 g − 30 g = 5 g
  – Error = 5 g + 5 g = 10 g
  – Result = 5 g ± 10 g, a relative error of 200%

What is the error in the area of a room?
• L = 5.0 cm ± 0.5 cm and W = 6.0 cm ± 0.3 cm.
• $A = L \times W$ = 5.0 cm × 6.0 cm = 30.0 cm²
• What is the relative error? (Board derivation)
  $\frac{\delta A}{A} = \frac{\delta L}{L} + \frac{\delta W}{W} = \frac{0.5\,\text{cm}}{5.0\,\text{cm}} + \frac{0.3\,\text{cm}}{6.0\,\text{cm}} = 0.15$ or 15%
• What is the absolute error?
  $\delta A = A \times 0.15$ = 30.0 cm² × 0.15 = 4.5 cm²

Product or quotient
• What is the error if you multiply or divide? Given $x \pm \delta x$, $y \pm \delta y$, $z \pm \delta z$ and $w = \frac{x\,y}{z}$:
  $w \pm \delta w = \frac{(x \pm \delta x)(y \pm \delta y)}{z \pm \delta z}$
• The relative error is the sum of the relative errors: $\frac{\delta w}{w} = \frac{\delta x}{x} + \frac{\delta y}{y} + \frac{\delta z}{z}$ (upper bound)

Multiply by constant
• What if you multiply a variable x by a constant B? $w = Bx$
• The error is the constant times the absolute error: $\delta w = B\,\delta x$

What is the error in the circumference of a circle?
• $C = 2\pi R$
  – For R = 2.15 ± 0.08 cm
  – $\delta C = 2\pi\,\delta R = 2\pi\,(0.08\,\text{cm}) = 0.50$ cm

Powers and exponents
• What if you square or cube a number? $w = x^n$
• The relative error is the exponent times the relative error: $\frac{\delta w}{w} = n\,\frac{\delta x}{x}$

What is the error in the volume of a sphere?
• $V = \frac{4}{3}\pi R^3$
  – For R = 2.15 ± 0.08 cm
  – V = 41.6 cm³
• $\delta V / V$ = 3 × (0.08 cm / 2.15 cm) = 0.11
• $\delta V$ = 0.11 × 41.6 cm³ = 4.6 cm³

Lab “Calculus of Errors” Explanation
How much error did you have in your remote measurement result?
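The propagation rules in these slides (sum or difference, product or quotient, constant multiple, power) can be sketched in a few lines of Python. The helper names below are hypothetical and chosen only for illustration. The numbers are the molding, room-area, circumference, and sphere examples from the slides, and each result is an upper bound in the same sense as the slides.

```python
import math


def sum_error(*abs_errors):
    """Upper-bound absolute error of a sum or difference: add absolute errors."""
    return sum(abs(e) for e in abs_errors)


def product_rel_error(*pairs):
    """Upper-bound relative error of a product or quotient.

    Each pair is (value, absolute_error); relative errors are added.
    """
    return sum(abs(err / value) for value, err in pairs)


def power_rel_error(value, abs_error, n):
    """Relative error of value**n: the exponent times the relative error."""
    return abs(n) * abs(abs_error / value)


# Molding around a room: L = L1 + L2 + L1 + L2.
L1, dL1 = 5.0, 0.5  # cm
L2, dL2 = 6.0, 0.3  # cm
perimeter = L1 + L2 + L1 + L2
d_perimeter = sum_error(dL1, dL2, dL1, dL2)

# Area of the room: A = L * W, using the same two lengths.
area = L1 * L2
rel_area = product_rel_error((L1, dL1), (L2, dL2))
d_area = area * rel_area

# Circle and sphere with R = 2.15 +/- 0.08 cm.
R, dR = 2.15, 0.08
d_circumference = 2 * math.pi * dR  # constant (2*pi) times the absolute error
volume = 4.0 / 3.0 * math.pi * R**3
rel_volume = power_rel_error(R, dR, 3)
d_volume = volume * rel_volume

print(f"perimeter = {perimeter:.0f} +/- {d_perimeter:.1f} cm")
print(f"area = {area:.1f} +/- {d_area:.1f} cm^2 ({rel_area:.0%})")
print(f"circumference error = {d_circumference:.2f} cm")
print(f"volume = {volume:.1f} +/- {d_volume:.1f} cm^3 ({rel_volume:.0%})")
```

Keeping each rule in its own small function mirrors the slides: absolute errors add for sums and differences, while relative errors add for products, quotients, and powers.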
