MAT 135 Introductory Statistics and Data Analysis
Adjunct Instructor Kenneth R. Martin
Lecture 12
November 16, 2016

Agenda
• Housekeeping
– Quiz #3
– Exam #2 Review
– HW #7
– Readings
• Chapter 1, 14, 10, 2, 3, 4, 5 & 6

Confidential - Kenneth R. Martin

Housekeeping
• Quiz #3
• Exam #2 Review
• Read, Chapter 1.1 – 1.4
• Read, Chapter 14.1 – 14.2
• Read, Chapter 10.1
• Read, Chapter 2
• Read, Chapter 3
• Read, Chapter 4
• Read, Chapter 5
• Read, Chapter 6

Continuous vs. Discrete vs. Attribute Data
• Continuous: infinite # of possible measurements in a continuum
• Discrete, Count: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
• Discrete, Ordinal: “low”/“small”/“short”, “medium”/“mid”, “high”/“large”/“tall”
• Discrete, Nominal or Categorical: Group A, Group B, Group C, Group D, Group E, Group F (defines several groups, no order)
• Attribute, Binary: “good”/“go”/“group #2”, “bad”/“no-go”/“group #1” (defines TWO groups, no order)

Discrete Probability Distributions
Poisson Distribution:
• Use for discrete single-point (integer) probabilities.
• A Poisson probability distribution applies when the sample size n is large and the probability p0 of the event is small.
• Used for applications of observations per time, or observations per quantity.

Poisson Distribution:
$P(X) = \frac{\lambda^{X} e^{-\lambda}}{X!}$
X = occurrences of events in a sample.
λ = average count of events occurring per unit.
e = 2.718281

[Figures: Poisson distribution worked example; Poisson distribution table]

Poisson Distribution (alternate):
$P(c) = \frac{(np_0)^{c}}{c!}\, e^{-np_0}$
c = count of some event occurring in a sample, i.e. count of non-conformities.
np0 = average count of events occurring in population.
e = constant = 2.718281

Discrete Probability Distributions
Poisson Distribution:
• The Poisson distribution formula can be used directly to find probability estimates, or Table C can be used.
– The table gives point values and cumulative values (in parentheses, accumulated from the top down).
• Mean = np0
• SD = $\sqrt{np_0}$

[Figures: Poisson distribution worked examples; Poisson distribution table]

Probability - Review
Theorem 1:
• Probability lies between 0 and 1.
– A probability of 1.000 means an event is certain to occur.
– A probability of 0 means the event is certain NOT to occur.

Theorem 2:
If P(H) = probability of H occurring,
then P(not H) = 1.000 - P(H), or $P(\bar{H}) = 1.000 - P(H)$

Theorem 5:
• The sum of the probabilities of all possible outcomes, for any discrete distribution, equals 1.000.

Definition, Theorem 5:
• Correspondingly, the total area under a continuous probability distribution (e.g. the normal curve) also equals 1.000, even though the tails of the curve never touch the x-axis. Thus, area can be used to estimate probabilities.

Statistics
Histogram
• By increasing the quantity of data, and thus the # of bins, the fitted line becomes smoother and more accurate,
• until it begins to resemble a smooth polygon or curve.
• By increasing data, you approach the population, and ultimately get a smooth polygon.

Area Under Curve
• We can find the area under any curve by 2 methods.
1. We can make a large quantity of really narrow bins, find each individual bin area / rectangle area (under the curve), and add them all up.
2. We can integrate under the curve, to find the area bounded by the curve and the X-axis.
– This method is simpler, and gives more accurate results.

Equation of a Normal Distribution
$Y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(X-\mu)^{2}}{2\sigma^{2}}}$

Cumulative Distribution Function – Cross Section
f(X) = PDF
$\int_{-\infty}^{+\infty} f(X)\,dX = 1.000$
• Sum under entire curve = 1.000
[Figure: PDF curve f(X) versus X]

Area Under Curve
• We may wish to find the area under the curve when, for example:
1. We want to find the number of students whose final semester grade falls between the cutoffs of a standard letter-grade scheme, and we have a collection of student scores.
2. We want to find the number of people who arrive at a fast food restaurant chain after 11 am, and we have the associated data.
3. Etc.

Continuous Probability Distribution (aka. CRV)
• A function of a Continuous Random Variable that describes the likelihood that the variable falls within a given range of values, found by integrating its probability density function over that range (i.e. the corresponding area under the f(x) curve).
– We shall calculate CRV probabilities over ranges.
• So we are seeking the area under some curve, y = f(x), bounded by the x-axis, between two values along the x-axis.

Probability Density Function (cont. prob. dist.)
f(X) = PDF
[Figure: PDF curve f(X) versus X, with points a and b marked on the X-axis]
$P(a \le X \le b) = P(X \le b) - P(X \le a) = F(b) - F(a)$
= entire area under the curve up to section (b) minus entire area under the curve up to section (a)
• Sum under entire curve = 1.0
• Curve typically read left to right

Cumulative Distribution Function
f(X) = PDF
$P(X < t) = \int_{-\infty}^{t} f(X)\,dX = F(t)$
F(t) + R(t) = 1.0, where F(t) is the area to the left of t and R(t) is the area to the right of t
[Figure: PDF curve f(X) versus X, split at t into areas F(t) and R(t)]

Normal Curve
• AKA, Gaussian distribution of a CRV.
• Mean, Median, and Mode have approximately the same value.
– Associated with mean (μ) at center and dispersion (σ)
– X ~ N(μ, σ) [when a random variable X is distributed normally]
• Observations have equal likelihood on both sides of the mean
– *** When normally distributed, the mean is used to describe central tendency
• The graph of the associated probability density function is called “Bell Shaped”

Statistics
Normal Curve
• Developed from a frequency histogram: as the sample size and the number of intervals increase (bin width decreases), the associated curve becomes smooth.
• Typical of much data and many distributions in reality.
• The basis for most quality control techniques, formulas, and assumptions.
• However, different Normal Distributions can have varying means and SD’s.
– The means and SD’s are independent (i.e. the mean does not affect the SD, and vice versa)

[Figures: Various Normal Curves (different means, common SD); Various Normal Curves (different SD’s, common means); Various Normal Curves]

Standardized Normal Value
• There are infinitely many combinations of mean and SD for normal curves.
– Thus, two normal curves will generally differ in location, spread, or both.
• To find the area under any normal curve, we can use the two methods previously described (rectangles or integration).
– Or, we can use the Standard Normal Approach, using tables to find the area under the curve, and thus probabilities.
Standard Normal Distribution: N(0,1)

Standardized Normal Value
• The Standard Normal Distribution has a mean = 0 and an SD = 1.
• The Standard Normal Transformation (z-Transformation) converts any normal distribution, with any mean and any SD, to a Standard Normal Distribution with mean 0 and SD 1.
• The Standard Normal Distribution is expressed in “z-score” units along the associated x-axis.
– A z-score specifies the number of SD units a value is above or below the mean (e.g. z = +1 indicates a value 1 SD above the mean).
• A formula is used to convert a value, given the mean and SD, to a z-score: $z = \frac{X - \mu}{\sigma}$

[Figures: Normal Curve - Distribution of Data; Standard Normal Curve - Distribution of Data (z-scores); Standard Normal Distribution (z-scores); Normal distribution example; Standard Normal Distribution example; Standardized Normal Table; worked examples]

Statistics
Example
A medical device catheter must have a diameter of 12.50 mm, with a tolerance of ±0.05 mm, to function properly. If the process is centered at 12.50 mm, with a dispersion (SD) of 0.02 mm, what percent of catheters must be scrapped and what percent can be reworked? How can the process center be changed to eliminate the scrap? What is the associated rework percentage?

Statistics
Standardized Normal Value
Example: Lightbulb burnout time is estimated by monitoring 50 bulbs.
Xbar = 60 days; s = 20 days.
*** Assume the average and sample SD represent the population, thus μ = 60 days & σ = 20 days. Assume a normal distribution.
How many bulbs work 100 or more days?
[Figure: worked example on the normal curve, x-axis running from -∞ to +∞]
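
The lightbulb question can be sketched in a few lines of Python. This is a minimal illustration rather than the worked solution from the slides: it assumes, as the example states, that the sample mean and SD stand in for μ and σ and that burnout time is normally distributed. The function names `z_score` and `upper_tail` are hypothetical helpers introduced here, and the standard library's `statistics.NormalDist` takes the place of the printed Standardized Normal Table.

```python
from statistics import NormalDist

# Values from the lightbulb example. Treating the sample mean and SD as the
# population parameters is the example's own stated assumption.
MU = 60.0      # mean burnout time, days
SIGMA = 20.0   # standard deviation, days
N_BULBS = 50   # bulbs monitored
T = 100.0      # threshold of interest, days


def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of SD units that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


def upper_tail(z: float) -> float:
    """R(t) = 1 - F(t): area under the standard normal curve to the right of z."""
    return 1.0 - NormalDist().cdf(z)  # NormalDist() defaults to N(0, 1)


z = z_score(T, MU, SIGMA)                 # (100 - 60) / 20
p_100_or_more = upper_tail(z)             # right-tail area beyond z
expected_bulbs = N_BULBS * p_100_or_more  # expected count out of 50 bulbs

print(f"z = {z:.2f}")
print(f"P(X >= 100 days) = {p_100_or_more:.4f}")
print(f"Expected bulbs lasting 100+ days: {expected_bulbs:.2f}")
```

Under these assumptions the threshold sits at z = 2, the right-tail area is about 0.0228, and roughly 1 of the 50 bulbs would be expected to last 100 or more days. The same two helpers apply to any "area to the right of a value" question once the mean and SD are known.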