MAT 135 Introductory Statistics and Data Analysis
Adjunct Instructor Kenneth R. Martin
Lecture 13
November 30, 2016

Agenda
• Housekeeping
  – HW #8
  – Readings
  – Final Exam

Confidential - Kenneth R. Martin

Housekeeping
• HW #8 – Due Monday, December 5, noon, electronically
• HW #8 – Solution posted December 5, 1pm

Housekeeping
• Read, Chapter 1.1 – 1.4
• Read, Chapter 14.1 – 14.2
• Read, Chapter 10.1
• Read, Chapter 2
• Read, Chapter 3
• Read, Chapter 4
• Read, Chapter 5
• Read, Chapter 6
• Read, Chapter 8

Housekeeping
• Final Exam
  – Wednesday, December 7
  – Open book, open notes

Continuous vs. Discrete vs. Attribute Data
• Continuous: infinite # of possible measurements in a continuum (0 to 10 scale)
• Discrete: Count: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
• Discrete: Ordinal: “low”/“small”/“short”, “medium”/“mid”, “high”/“large”/“tall”
• Discrete: Nominal or Categorical: Group A, Group B, Group C, Group D, Group E, Group F (defines several groups - no order)
• Attribute: Binary: “good”/“go”/“group #2”, “bad”/“no-go”/“group #1” (defines TWO groups - no order)

Probability - Review
Theorem 1:
• A probability always lies between 0 and 1
  – Probability of 1.000 means an event is certain to occur
  – Probability of 0 means the event is certain to NOT occur

Probability - Review
Theorem 2:
If P(H) = probability of H occurring,
then P(not H) = 1.000 - P(H), or equivalently P(H) = 1.000 - P(not H)

Statistics: Histogram
– until it begins to resemble a smooth polygon or curve.

Probability - Review
Definition, Theorem 5:
• Correspondingly, the total area under a continuous probability distribution (normal curve) is equal to 1.000 also. However, the tails of the curve never touch the x-axis. Thus, area can be used to estimate probabilities.

Statistics: Cumulative Distribution Function – Cross Section
f(X) = PDF
$\int_{-\infty}^{+\infty} f(X)\,dx = 1.000$
• Sum under entire curve = 1.000
[figure: curve of f(X) plotted against X]

Statistics: Continuous Probability Distribution (of a CRV)
• A function of a Continuous Random Variable (CRV) that describes the likelihood that the variable falls within a given range of values, obtained as the integral of its probability density function (i.e. the corresponding area under the f(x) curve).
  – We shall calculate CRV probabilities over ranges

Statistics: Probability Density Function (cont. prob. dist.)
f(X) = PDF
$P(a < X \le b) = P(X \le b) - P(X \le a) = F(b) - F(a)$
= entire area under the curve up to section b, minus entire area under the curve up to section a
• Sum under entire curve = 1.0
• Curve typically read left to right
[figure: curve of f(X) plotted against X, with sections a and b marked]

Statistics: Cumulative Distribution Function
f(X) = PDF
$P(X < t) = \int_{-\infty}^{t} f(X)\,dx = F(t)$
[figure: curve of f(X) plotted against X, with the area F(t) to the left of t]

Statistics: Cumulative Distribution Function
f(X) = PDF
$F(t) + R(t) = 1.0$
[figure: curve of f(X) plotted against X, with F(t) to the left of t and R(t) to the right]

Statistics: Normal Curve
• AKA, Gaussian distribution of a CRV.
• Mean, Median, and Mode have approximately the same value.
  – Associated with mean (μ) at center and dispersion (σ)
    $X \sim N(\mu, \sigma)$ [when a random variable X is distributed normally]
  – Observations have equal likelihood on both sides of the mean
  – *** When normally distributed, Mean is used to describe Central Tendency
• The graph of the associated probability density function is called “Bell Shaped”

Statistics: Various Normal Curves [figure]

Statistics: Standardized Normal Value
• There are infinitely many combinations of mean and SD for normal curves.
  – Thus, two normal curves with different parameters will have different shapes.
• To find the area under any normal curve, we can use the two methods previously described (rectangles or integration).
  – Or, we can use the Standard Normal Approach, using tables to find the area under the curve, and thus probabilities.
Standard Normal Distribution: N(0,1)

Statistics: Standardized Normal Value
• Standard Normal Distribution has a Mean = 0 and an SD = 1
• Standard Normal Transformation (z-Transformation) converts any normal distribution with any mean and any SD to a Standard Normal Distribution with mean 0 and SD 1
• Standard Normal Distribution is distributed in “z-score” units along the associated x-axis. A z-score specifies the number of SD units a value is above or below the mean (i.e. z = +1 indicates a value 1 SD above the mean).
• A formula is used to convert a value, given your mean and SD, to a z-score: $z = (x - \mu)/\sigma$

Statistics: Normal Curve - Distribution of Data [figure]
Statistics: Standard Normal Curve - Distribution of Data (z-scores) [figure]
Statistics: Normal Curve - Distribution of Data [figure]
Statistics: Standard Normal Distribution (z-scores) [figure]
Statistics: Standardized Normal Value [figure]
Statistics: Normal distribution example [figure]
Statistics: Standard Normal Distribution example [figure]
Statistics: Standardized Normal Table [table, two slides]
Statistics: Example [figures, eight slides]

Statistics: Example
A medical device catheter must have a diameter of 12.50 mm, with a tolerance of 0.05 mm, to function properly. If the process is centered at 12.50 mm, with a dispersion of 0.02 mm, what percent of catheters must be scrapped and what percent can be reworked? How can the process center be changed to eliminate the scrap? What is the associated rework percentage?

Statistics: Example [figures, two slides]

Statistics: Standardized Normal Value
Example: Lightbulb burnout time is estimated by monitoring 50 bulbs. Xbar = 60 days; s = 20 days.
*** Assume the average and sample SD represent the population, thus μ & σ. Assume normal dist.
How many bulbs work 100 or more days?
See Example:

Statistics: Example [figures, three slides; the last shows a curve running from -∞ to +∞]

Inferential Statistics & Sampling Distributions [figures, two slides]

Hypothesis Testing
• Hypothesis – a statement or proposed explanation for an observation, phenomenon, or a problem that can be tested.
• Hypothesis Testing – a method for testing a hypothesis about a parameter in a population, using data measured in a sample.

Hypothesis Testing
Hypothesis testing helps us decide whether the evidence is sufficiently strong, by determining how likely it is that the observed sample statistic would be obtained if the hypothesis regarding the population were true.

Hypothesis Testing – Role and Purpose
• To provide an OBJECTIVE BASIS for evaluating the evidence in our data
• To help us determine if what we THINK WE SEE in the graphical displays is STRONGLY SUPPORTED by the data
• To quantify the RISK that our conclusions might be incorrect
• Hypothesis tests help us answer the practical question: Is there a real difference between:
  – the mean (average) of two or more groups
  – the spread (variation) in one group and the spread in another group
  – the proportion of defects in one group and the proportion of defects in another group
  – the average count (or rate of occurrence) in one group and the average count in another group

Hypothesis Testing – Role and Purpose
POPULATION → Sampling Scheme → SAMPLE → Measure → Data!
Hypothesis Testing helps determine if what we see in the sample is likely to be true for the whole population

Hypothesis Testing – 4 Steps
• Step 1: State the null and alternative Hypothesis
• Null Hypothesis (H0) – a statement about the population parameter (such as the mean) that is assumed to be true
  – Starting point, which we will test, to determine if the null is likely to be true or not. There is not a difference between (2) parameters.
  – Example: Children in the U.S. watch an average of 30 hours of TV per week. H0: µ = 30
• Alternative Hypothesis (Ha) – a statement that contradicts the null hypothesis
  – We think the null is wrong; Ha allows us to state what we think is wrong. There is a difference between (2) parameters.
  – Example: Children in the U.S. watch more or less than 30 hours of TV per week. Ha: µ ≠ 30
• In any case, Ha can be stated as <, > or ≠ relative to the value in H0

Hypothesis Testing – 4 Steps
• Step 2: Set the criteria for a decision
• Done by stating the level of significance
  – Criterion of judgment upon which a decision is made regarding the value stated in a null hypothesis
  – Typically the level is set at 5% in research studies
  – Based on the probability of obtaining a statistic measured in a sample if the value stated in the null hypothesis were true
• When the probability of obtaining a sample mean is less than 5%, if the null hypothesis were true, we conclude the sample selected is too unlikely and reject the null hypothesis

Hypothesis Testing – 4 Steps
• Step 3: Compute the test statistic
• The value of the test statistic can be used to make a decision regarding the null hypothesis
  – A mathematical formula that identifies how far a sample outcome is from the value stated in the null hypothesis
  – It helps determine how likely the sample outcome is if the population mean stated in the null is true
• The larger the value of the test statistic, the further a sample mean deviates from the population mean stated in the null hypothesis

Hypothesis Testing – 4 Steps
• Step 4: Make a decision
• Based on the probability of obtaining a sample outcome, given that the value stated in the null hypothesis is true (represented by the p value)
  1. Reject the null hypothesis - the sample mean is associated with a low probability of occurrence if the null is true
     • p value < .05; “reached significance”
  2. Retain the null hypothesis - the sample mean is associated with a high probability of occurrence when the null is true
     • p value > .05; “failed to reach significance”

Hypothesis Testing
• H0 represents our “assumed working hypothesis” (even if we don’t really think it’s true!)
• WHY? “Burden of proof” is placed on Ha.
  – i.e. Need to have strong evidence that Ha is true before we will “believe” it.
  – Ha is sometimes called the “research hypothesis” or the “research claim”
• Two possible outcomes:
  – Reject H0 and accept Ha (“statistically significant” results)
  – Fail to reject H0 (“not statistically significant” results)
• Reject H0 only if the data provide highly convincing evidence that H0 is false
• How convincing? Typically look for at least 95% confidence that H0 is false

Hypothesis Testing
When a citizen is placed on trial for a given crime, the U.S. legal system operates on the following principle: “The defendant is presumed innocent until proven guilty beyond a reasonable doubt.”
Under such an approach, what is the null hypothesis, and what is the alternative hypothesis?

Hypothesis Testing
• All statistical tests calculate something called a “P-value”
• 1 – “P-value” = a Confidence Level we have that H0 is false (and therefore that Ha is true)
• “P-value” = probability of obtaining a result at least as extreme as the observed one through random chance alone, assuming the null hypothesis is true
• Decision rule: We will reject H0 only if the P-value is less than a chosen threshold (often .05, or 5%)
  – Assures that we have at least 95% confidence that Ha is true.
• Want more confidence? Specify a lower threshold for the P-value
  – Threshold P-value = significance level (α level)
  – Lower threshold values mean:
    • Higher confidence when we reject H0
    • More difficult to reject H0
When P-value is “Low”, the “Null must Go”

Hypothesis Testing - Summary
p-value
- Probability that the observed behavior can be explained purely by random variation.
Significance Level / Producer’s Risk = α
- Threshold which your p-value must be below to reject the null.
- Represents the risk assumed for “incorrectly rejecting the null”, or detecting a difference when one does not actually exist.
Consumer’s Risk = β
- Represents the risk assumed for “incorrectly not rejecting the null”, or not detecting a difference when one actually exists.
Confidence Level (of test) = 1 – α
- Confidence you have in rejecting H0, or claiming that a difference exists, “going into” the test. When rejecting the null, the actual confidence in your conclusion is 1 – p (value).
Power (of test) = 1 – β
- The probability that the test will detect a difference (result in a p-value less than your α) when there is truly a difference, for a given “practical difference” and standard deviation.
- You decide the magnitude of the “practical” difference you want to detect.
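
The four steps and the summary terms above can be tied together in a small numerical sketch. It reuses the lecture's TV example (H0: µ = 30, Ha: µ ≠ 30), but everything else is an assumption made only for illustration: the sample mean of 32 hours, the known population SD of 8 hours, the sample size of 64, and the choice of a two-sided z-test with known σ (the slides do not say which test the course uses). The function names are hypothetical as well.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)  # standard normal, N(0, 1)


def z_test_two_sided(sample_mean, mu0, sigma, n):
    """Return (z, p_value) for H0: mu = mu0 vs Ha: mu != mu0, with sigma known."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Two-sided p-value: area in both tails beyond |z|.
    p_value = 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))
    return z, p_value


def power_two_sided(delta, sigma, n, alpha):
    """Power (1 - beta) of the two-sided z-test for a true shift of size delta."""
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(delta) / (sigma / sqrt(n))
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


# Step 1: H0: mu = 30, Ha: mu != 30 (from the lecture's TV example).
# Step 2: significance level alpha = 0.05.
alpha = 0.05
# Step 3: test statistic from hypothetical sample data (assumed values).
z, p_value = z_test_two_sided(sample_mean=32.0, mu0=30.0, sigma=8.0, n=64)
# Step 4: decision. Reject H0 when the p-value is below alpha.
reject_null = p_value < alpha
# Power to detect a "practical difference" of 2 hours (assumed value).
power = power_two_sided(delta=2.0, sigma=8.0, n=64, alpha=alpha)

print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_null}")
print(f"power = {power:.3f}, beta = {1.0 - power:.3f}")
```

With these assumed numbers the p-value falls just under 0.05, so the null is rejected, while the power to detect a 2-hour difference is only about one half. Lowering α or shrinking the practical difference would reduce the power further, which matches the trade-off described in the summary.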

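The lightbulb question from the Standardized Normal Value example (Xbar = 60 days, s = 20 days, 50 bulbs, normal distribution assumed) can be checked numerically with the z-transformation. This is a minimal sketch that computes the tail area directly instead of reading it from the standardized normal table, so the result may differ slightly from a table-based answer because of rounding. Variable names are illustrative.

```python
from statistics import NormalDist

# Values stated in the lecture example; the sample mean and SD are
# assumed to represent the population mu and sigma, as the slide says.
mu = 60.0         # mean burnout time, days
sigma = 20.0      # standard deviation, days
n_bulbs = 50      # bulbs monitored
threshold = 100.0 # days

# z-transformation: number of SD units the threshold lies above the mean.
z = (threshold - mu) / sigma

# Area under the standard normal curve to the right of z, i.e. P(X >= 100).
tail_probability = 1.0 - NormalDist().cdf(z)

# Expected number of the 50 bulbs lasting 100 or more days.
expected_bulbs = n_bulbs * tail_probability

print(f"z = {z:.2f}")
print(f"P(X >= 100 days) = {tail_probability:.4f}")
print(f"expected bulbs out of {n_bulbs}: {expected_bulbs:.2f}")
```

The threshold sits 2 SD above the mean, so only a little over 2% of the area lies to its right, which works out to roughly one bulb out of the 50.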