Statistics
What Is Engineering?
July 16, 2007

Why Study Statistics?
Statistics: A mathematical science concerned with data collection, presentation, analysis, and interpretation.
Statistics can tell us about…
• Sports
• Population
• Economy
[Figure: Employment Rates, New York City, NY, 2006. Pie chart with categories Employed, Unemployed, Other; values shown: 63.1%, 4.6%, 32.3%]
[Figure: Estimated Population by Year, 2000 to 2006; vertical axis from 7,900,000 to 8,250,000]

Why Study Statistics?
Statistical analysis is also an integral part of scientific research! Are your experimental results believable?
Example: Breaking strength of welds
(To make a weld, the operator stops a rotating part by forcing it into a stationary part; the resulting friction generates heat that produces a hot-pressure weld.)

Control: Velocity (10^2 ft/min)    Response: Breaking Strength (ksi)
2.00                               89
2.50                               97, 91
2.75                               98
3.00                               100, 104, 97

The data suggest a relationship between velocity and breaking strength! The relationship is not perfect: the measurements include random error.

Why Study Statistics?
Responses and measurements are variable! Due to…
• Randomness (or individual differences) in the sampled population.
• Inability to perform measurements in exactly the same way every time.
The goal of statistics is to find the model that best describes a target population by taking sample data. Randomness is represented using probability.

Probability
Experiment of chance: a phenomenon whose outcome is uncertain.
[Diagram: Probabilities (Chances); Probability Model; Sample Space; Events; Probability of Events]
Sample Space: The set of all possible outcomes.
Event: A set of outcomes (a subset of the sample space). An event E occurs if any of its outcomes occurs.
Probability: The likelihood that an event occurs.

Probability
Consider a deck of playing cards…
Sample space? The set of 52 cards.
Event?
• R: The card is red.
• F: The card is a face card.
• H: The card is a heart.
• 3: The card is a 3.
Probability?
P(R) = 26/52    P(F) = 12/52    P(H) = 13/52    P(3) = 4/52

Events and variables
Can be described as random or deterministic:
The outcome of a random event cannot be predicted:
• The sum of two numbers on two rolled dice.
• The time of emission of the ith particle from radioactive material.
The outcome of a deterministic event can be predicted:
• The measured length of a table to the nearest cm.
• Motion of macroscopic objects (projectiles, planets, spacecraft) as predicted by classical mechanics.

Extent of randomness
A variable can be more random or more deterministic depending on the degree to which you account for relevant parameters:
Mostly deterministic: Only a small fraction of the outcome cannot be accounted for.
Length of a table:
• Temperature/humidity variation
• Measurement resolution
• Instrument/observer error
• Quantum-level intrinsic uncertainty
Mostly random: Most of the outcome cannot be accounted for.
• Trajectory of a given molecule in a solution

Random variables
Can be described as discrete or continuous:
• A discrete variable has a countable number of values. Example: the number of customers who enter a store before one purchases a product.
• The values of a continuous variable cannot be listed. Example: the distance between two oxygen molecules in a room.
Consider data collected for undergraduate students:

Random Variable     Possible Values
Gender              Male, Female
Class               Fresh, Soph, Jr, Sr
Height (inches)     # in interval {30, 90}
College             Arts, Education, Engineering, etc.
Shoe Size           3, 3.5, …, 18

Is the height a discrete or continuous variable? How could you measure height and shoe size to make them continuous variables?

Probability Distributions
If a random event is repeated many times, it will produce a distribution of outcomes (statistical regularity).
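The statistical regularity described above can be seen in a small simulation of the two-dice example. This is only an illustrative sketch: the function names, the number of rolls, and the fixed seed are assumptions made for the example and are not part of the original slides.

```python
import random
from collections import Counter


def exact_two_dice_distribution():
    """Probability of each sum of two fair six-sided dice, by enumeration."""
    counts = Counter(a + b for a in range(1, 7) for b in range(1, 7))
    return {total: n / 36 for total, n in sorted(counts.items())}


def simulated_two_dice_distribution(n_rolls, seed=0):
    """Relative frequency of each sum over n_rolls simulated rolls.

    The seed is fixed only so that the sketch is reproducible.
    """
    rng = random.Random(seed)
    counts = Counter(
        rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n_rolls)
    )
    return {total: counts[total] / n_rolls for total in range(2, 13)}


exact = exact_two_dice_distribution()
simulated = simulated_two_dice_distribution(20_000)

# Print the two distributions side by side for comparison.
for total in range(2, 13):
    print(f"sum={total:2d}  exact={exact[total]:.4f}  simulated={simulated[total]:.4f}")
```

No single roll can be predicted, yet with enough repetitions the simulated relative frequencies should settle close to the exact values.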
(Think about scores on an exam.)
The distribution can be represented in two ways:
• Frequency distribution function: represents the distribution as the number of occurrences of each outcome
• Probability distribution function: represents the distribution as the percentage of occurrences of each outcome

Discrete Probability Distributions
Consider a discrete random variable, X:
f(x_i) is the probability distribution function.
What is the range of values of f(x_i)?
Therefore, $\Pr(X = x_i) = f(x_i)$

Discrete Probability Distributions
Properties of discrete probabilities:
$\Pr(X = x_i) = f(x_i) \ge 0$ for all i
$\sum_{i=1}^{k} \Pr(X = x_i) = \sum_{i=1}^{k} f(x_i) = 1$ for k possible discrete outcomes
$\Pr(a < X \le b) = F(b) - F(a) = \sum_{a < x_i \le b} f(x_i)$
where $F(x) = \Pr(X \le x)$

Discrete Probability Distributions
Example: Waiting for a success
Consider an experiment in which we toss a coin until heads turns up.
Outcomes, w = {H, TH, TTH, TTTH, TTTTH, …}
Let X(w) be the number of tails before a heads turns up.
$f(x) = \left(\frac{1}{2}\right)^{x+1}$ for x = 0, 1, 2, …
[Figure: plot of f(x) against waiting time x = 0, …, 6; vertical axis from 0 to 0.5]

Cumulative Discrete Probability Distributions
$\Pr(X \le x') = F(x') = \sum_{i=1}^{j} f(x_i)$
where x_j is the largest discrete value of X less than or equal to x'.
$\Pr(X \le x_k) = 1$

Cumulative Continuous Probability Distributions
For continuous variables, the events of interest are intervals rather than isolated values.
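Returning briefly to the coin-tossing example above, its probability and cumulative distribution functions can be checked numerically. The sketch below assumes a fair coin, so that f(x) = (1/2)^(x+1); the function names are hypothetical and chosen only for this illustration.

```python
def pmf(x):
    """f(x) = (1/2)**(x + 1): probability of exactly x tails before the first head."""
    return 0.5 ** (x + 1)


def cdf(x):
    """F(x) = Pr(X <= x), the sum of f(0), f(1), ..., f(x)."""
    return sum(pmf(i) for i in range(x + 1))


def prob_between(a, b):
    """Pr(a < X <= b) = F(b) - F(a)."""
    return cdf(b) - cdf(a)


# Tabulate the first few values of f(x) and F(x).
for x in range(7):
    print(f"x={x}  f(x)={pmf(x):.5f}  F(x)={cdf(x):.5f}")

# The probabilities over all outcomes sum to 1; a long partial sum gets very close.
print("partial sum over x = 0..60:", cdf(60))
```

Each f(x) is non-negative, the partial sums F(x) increase toward 1, and Pr(a < X ≤ b) is obtained as a difference of two values of F, matching the properties listed for discrete distributions.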
Consider the waiting time for a bus which is equally likely to arrive at any time in the next ten minutes. We are not interested in the probability that the bus will arrive in exactly 3.451233 minutes, but rather in the probability that the bus will arrive in the subinterval (a, b) minutes:
$P(a < T < b) = F(b) - F(a) = \frac{b - a}{10}$
[Figure: c.d.f. F(t) plotted against t, reaching 1 at t = 10]

Continuous Probability Density Function
c.d.f.: Gives the fraction of the total probability that lies at or to the left of each x.
p.d.f.: Gives the density of concentration of probability at each point x.
In terms of the c.d.f.:
$P(x < X \le x + \Delta x) = F(x + \Delta x) - F(x) = \Delta F(x)$
When F(x) is differentiable at x, and Δx is small, we can approximate ΔF by the differential of F:
$\Delta F(x) \approx dF(x) = F'(x)\,\Delta x$
The p.d.f. is this derivative: $f(x) = F'(x)$.

Continuous Probability Distributions
Properties of the cumulative distribution function:
$F(-\infty) = 0$
$0 \le F(x) \le 1$
$F(\infty) = 1$
Properties of the probability density function:
$\Pr(a \le X \le b) = F(b) - F(a) = \int_a^b f(x)\,dx$

Continuous Probability Distributions
Example: Gaussian (normal) distribution:
$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$
Each member of the normal distribution family is described by the mean (μ) and variance (σ^2).
Standard normal curve: μ = 0, σ = 1.

Central Limit Theorem
As the sample size goes to infinity, the distribution function of the standardized sample mean converges to the standard normal distribution function (for independent, identically distributed observations with finite variance)!
http://www.jhu.edu/virtlab/prob-distributions/

Moments
In physics, the moment refers to the force applied to a system at a distance from the axis of rotation (as in a lever).
In mathematics, a moment measures how far the values of a distribution lie from a reference point, such as the origin or the mean.
The 1st moment about the origin (mean): $\mu = E[X]$, the average value of x.
The 2nd moment about the mean (variance): $\mu_2 = \sigma^2 = E[(X - \mu)^2]$, a measure of the 'spread' of the data.
More generally, $\mu_k = E[(X - \mu)^k]$ denotes the kth moment about the mean.

Moments
Other values in terms of the moments:
Skewness: $\mu_3 / \mu_2^{3/2}$
• 'Lopsidedness' of the distribution
• A symmetric distribution has skewness = 0
• Negative skewness: the longer tail is on the left
• Positive skewness: the longer tail is on the right
Kurtosis: $\mu_4 / (\mu_2)^2$
• Describes the shape of the distribution with respect to the height and width of the curve ('peakedness')

Standard Error
Standard Deviation: $\sigma = \sqrt{\sigma^2}$
Variance is the average squared distance of the data from the mean. Therefore, the standard deviation measures the spread of data about the mean.
Standard Error: $\sigma / \sqrt{N}$, where N is the sample size.
How do we reduce the size of our standard error?

Independence
A measure of whether two variables are related. Random variation or correlation?
Consider data collected for arrowhead breakage:

Cause    Base   Middle   Tip   Total
Fire     21     8        18    47
Other    15     11       4     30
Total    36     19       22    77

Does the location of the fracture depend on the cause of fracture? Or, in other words, is the location of fracture independent of the cause?
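One common way to examine this question is Pearson's chi-square test of independence. The slides do not name a method, so the choice of test here is an assumption, and the sketch below is only an illustration in plain Python. It compares each observed count with the count expected under independence (row total × column total / grand total). The function name is hypothetical, and the p-value shortcut exp(-x/2) holds only for 2 degrees of freedom, which is the case for a 2 × 3 table.

```python
import math

# Observed counts from the arrowhead table (rows: cause; columns: Base, Middle, Tip).
observed = {
    "Fire": [21, 8, 18],
    "Other": [15, 11, 4],
}


def chi_square_independence(table):
    """Return (Pearson chi-square statistic, degrees of freedom) for a two-way table."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(row_totals)

    statistic = 0.0
    for r, row in enumerate(rows):
        for c, obs in enumerate(row):
            # Count expected in this cell if cause and location were independent.
            expected = row_totals[r] * col_totals[c] / n
            statistic += (obs - expected) ** 2 / expected

    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return statistic, dof


statistic, dof = chi_square_independence(observed)

# For exactly 2 degrees of freedom, the chi-square upper-tail probability is exp(-x / 2).
# This shortcut is NOT valid for other degrees of freedom.
p_value = math.exp(-statistic / 2)

print(f"chi-square = {statistic:.3f}, dof = {dof}, p-value = {p_value:.4f}")
```

For this table the statistic comes out near 6.97 with 2 degrees of freedom, giving a p-value of roughly 0.03. At the conventional 5% level this would suggest that fracture location is not independent of the cause, although the small count in the Other/Tip cell is a reason for caution.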