Università degli Studi di Milano
Corso di Laurea Magistrale in Farmacia
Aspetti di economia e marketing dei medicinali e medicinali generici - Modulo: Medicinali generici
Prof. Andrea Gazzaniga

Richiami di Statistica (Review of Statistics) - Basic Statistics
Dott. Matteo Cerea, PhD

Why Statistics? Two Purposes
1. Descriptive: finding ways to summarize the important characteristics of a dataset.
2. Inferential: how (and when) to generalize from a sample dataset to the larger population.

Descriptive Statistics
Provides graphical and numerical ways to organize, summarize, and characterize a dataset.

VARIABLE: a characteristic or property that can vary in value among subjects in a sample or a population.
Ex:
• The weight of tablets in a batch
• The concentration of drug in plasma in patients after the administration of a fixed dose

Types of Variables
Predictor variable: the antecedent condition used to predict the outcome of interest. In an experimental study it is called an "independent variable" (x).
Outcome variable: the variable you want to be able to predict. In an experimental study it is called a "dependent variable": y = f(x).

Continuous variable: can assume an infinite number of possible values falling between any two observed values (the lowest and the highest). Ex: the drug content of tablets in a batch, expressed in micrograms.

Ranked variable: a continuous-style scale that does not represent a physical measurement; the scale is a numerically ordered system. Ex (stent encrustation score):
0 - no encrustation
1 - microscopic deposits on <50% of the stent
2 - microscopic deposits on >50% of the stent
3 - small macroscopic deposits on <50% of the stent
4 - small macroscopic deposits on >50% of the stent
5 - heavy macroscopic deposits

Discrete (discontinuous, meristic) variable: consists of separate, indivisible categories.
Discrete variables take integer values. Ex: number of asthma attacks, number of fatalities, number of colonies of microorganisms.
Nominal (categorical) variable: cannot be measured because of its qualitative nature. Ex: sex, gender, side effects associated with the treatment.
Ordinal (nominal ranked) variable: Ex: side effects associated with the treatment, if ordered by severity.

How to present data?
1. Describing data with tables and graphs (quantitative or categorical variables)
2. Numerical descriptions of center, variability, position (quantitative variables)
3. Bivariate descriptions (in practice, most studies have several variables)

1. Tables and Graphs
There are several types of graphs or plots employed to display scientific data:
• Graphs or plots that describe relationships between a fixed (independent) variable and a dependent variable
• Graphs that pictorially describe distributions of data

A frequency distribution lists the possible values of a variable (or intervals of values) and the number of times each occurs.
Example: Pharmaceutical Statistics, David Jones, Pharmaceutical Press
http://books.google.co.ve/books?id=oXZD1GPOJIcC&printsec=frontcover&hl=it&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false

Frequency tables and histograms
Histogram: bar graph of frequencies, percentages, or relative frequencies.
[Figure: frequency distribution of tablet weights in 1.0 mg intervals from 290.1 to 310.0 mg, and the corresponding relative frequency distribution (proportion of tablets in each interval, y-axis 0.000 to 0.180).]
Cumulative frequency distributions can be tabulated and plotted as "less than" or "more than" curves.

Qualitative (nominal) variables
Ex: civil status X, a discrete qualitative variable measured on a nominal scale.
There are 4 possible values: x1 = N, x2 = C, x3 = V, x4 = S.
xi: the k values of X; ni: frequency of xi; n: total number of observations.
fi = ni/n is the relative frequency; pi = fi · 100% is the percent frequency.
The frequency distribution of X is:

xi    ni    fi      pi (%)
N      6    0.30     30
C      7    0.35     35
V      4    0.20     20
S      3    0.15     15
      n=20  1.00    100

Such a distribution can be displayed as a pie chart.

Ex: annual income in thousands of Euro. W is a continuous quantitative variable.
The data (n = 20) are divided into k = 4 classes.
ai: width (amplitude) of each class; li = ni/ai: frequency density; Ni: cumulative frequency.
Frequency table:

class      ni    fi     Ni    ai    li
40 ⊣ 50     3    0.15    3    10    0.30
50 ⊣ 58     6    0.30    9     8    0.75
58 ⊣ 70     4    0.20   13    12    0.33
70 ⊣ 95     7    0.35   20    25    0.28
           20    1.00

2. Descriptive Measures
Numerical descriptions. Let X denote a quantitative variable, with observations X1, X2, X3, ..., Xn.
a. Central tendency measures: computed to give a "center" around which the measurements in the data are distributed.
b. Variation (variability) measures: describe the "data spread", i.e. how far the measurements are from the center.
c. Relative standing measures: describe the relative position of specific measurements in the data.

a. Measures of Central Tendency
• Mean (average): the sum of all measurements divided by the number of measurements, X̄ = ΣXi/N, where N is the number of observations.
• Weighted mean: used when each data point does not contribute proportionally, X̄w = ΣwiXi/Σwi, where w is the frequency (weight).
• Median: the central number of a set of data arranged in order of magnitude.
• Mode: the most frequent measurement in the data.
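These central-tendency measures can be sketched with the Python standard library. The ten observations are those of the worked example in these notes; the weighted-mean values and weights are hypothetical:

```python
from statistics import mean, median, mode

# Ten observations from the worked example in these notes
data = [2, 0, 0, 6, 4, 24, 9, 6, 1, 0]

x_mean = mean(data)      # (2+0+0+6+4+24+9+6+1+0)/10 = 5.2
x_median = median(data)  # middle of the ordered data: (2+4)/2 = 3
x_mode = mode(data)      # most frequent value: 0

# Weighted mean: each value xi weighted by its frequency wi (hypothetical numbers)
values = [1, 2, 3]
weights = [6, 7, 7]
w_mean = sum(w * x for w, x in zip(weights, values)) / sum(weights)

print(x_mean, x_median, x_mode, round(w_mean, 2))
```

Note how the single large value 24 pulls the mean (5.2) well above the median (3), anticipating the skewness discussion that follows.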
Calculation of the mean: (2+0+0+6+4+24+9+6+1+0)/10 = 5.2
Calculation of the median: 0, 0, 0, 1, 2, 4, 6, 6, 9, 24 → median = 3

Properties of mean and median
• For symmetric distributions, mean = median.
• For skewed distributions, the mean is drawn in the direction of the longer tail, relative to the median.
• The mean is valid for continuous scales, the median for continuous or ordinal scales.
• The mean is sensitive to "outliers" (the median is often preferred for highly skewed distributions).
• When the distribution is symmetric, mildly skewed, or discrete with few values, the mean is preferred because it uses the numerical values of the observations.
Ex: Tmax is reported as median (range); AUC as mean (std. dev.).

In other words:
• When the mean is greater than the median, the data distribution is skewed to the right.
• When the median is greater than the mean, the data distribution is skewed to the left.
• When mean and median are very close to each other, the data distribution is approximately symmetric.

b. Describing variability
Range: the difference between the largest and smallest observations (highly sensitive to outliers, insensitive to shape). Used for non-normally distributed data (Ex: tmax).

Mean deviation: the average distance from the mean. The deviation of observation j from the mean is Xj − X̄.
MD = Σ|Xj − X̄| / N

#    drug content    |deviation from mean|
1       100.6             0.1
2        98.3             2.2
3        98.9             1.6
4        95.1             5.4
5       104.5             4.0
6       105.5             5.0
mean    100.5     sum    18.3
                  MD      3.1

Variance: based on the "sum of squares" (SS) of the sample:
s² = Σ(yi − ȳ)²/(n − 1) = [(y1 − ȳ)² + ... + (yn − ȳ)²]/(n − 1)
It is a measure of "spread": the larger the deviations (positive or negative), the larger the variance.

The sum of squares of a sample:   SS = Σ(Yj − Ȳ)²
The variance of a population:     σ² = Σ(Yj − µ)²/N
The variance of a sample drawn from a population:   s² = Σ(Yj − Ȳ)²/(N − 1)

Standard deviation: s is the square root of the variance, s = √s². For a population, σ = √σ².

Ex (sample of a population): 100 tablets (sample) are removed from a batch of 1,000,000 tablets (population) and tested.
The variance of a single random sample of measurements does not provide a good estimate of the variance of the population from which the sample was derived. A better estimate of the population variance can be achieved from sample data if an average of several sample variances is calculated.

Ex: concentration of a penicillin antibiotic in 5 bottles.

Bottle #    Concentration (mg/5 mL)    Contribution to the variance
1              125                        134.56
2              124                        123.21
3              121                         92.16
4              123                        112.36
5               16                       1840.41
mean (mg/5 mL)  101.8      total variance (sample)  2302.7
standard deviation 48.0    median 123

Note how a single aberrant bottle (16 mg/5 mL) inflates the mean-based summaries, while the median (123) is unaffected.

Standard error of the mean (SEM)
Ex: concentration of amoxicillin in 5 aliquots, each tested 5 times (N = observations in each sample):

         Aliquot 1   Aliquot 2   Aliquot 3   Aliquot 4   Aliquot 5
           25.1        27.6        24.3        23.9        25.7
           25.4        25.5        26.4        24.9        23.5
           21.9        25.6        25.1        26.1        24.2
           24.5        25.0        27.1        27.0        25.7
           23.1        24.2        25.0        25.2        24.3
mean       24.0        25.6        25.6        25.4        24.7
s           1.5         1.3         1.1         1.2         1.0

Overall mean: 25.1    SEM (std. dev. of the 5 aliquot means): 0.70    SEM estimated as s/√N: 0.66

SEM = s/√N, where s is the standard deviation of the sample and N is the number of observations in the sample.

The standard deviation of a sample is an estimate of the variability of a population: its value does not decrease if the number of observations increases. The standard error of the mean is a measure of the variability (precision) of the estimate of a defined population parameter (i.e. the mean).
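A minimal Python sketch of sample variance, standard deviation, and SEM, using the Aliquot 1 column of the amoxicillin table above:

```python
import math

# Aliquot 1 of the amoxicillin table in these notes (mg)
y = [25.1, 25.4, 21.9, 24.5, 23.1]

n = len(y)
mean = sum(y) / n
ss = sum((yi - mean) ** 2 for yi in y)  # sum of squares, SS
var = ss / (n - 1)                      # sample variance, s^2
s = math.sqrt(var)                      # sample standard deviation
sem = s / math.sqrt(n)                  # standard error of the mean, s/sqrt(N)

print(round(mean, 1), round(s, 2), round(sem, 2))
```

The result reproduces the table's column summary (mean 24.0, s ≈ 1.5) and gives SEM ≈ 0.66, matching the s/√N estimate quoted above.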
As the size of the sample increases, the magnitude of the standard error decreases.

Coefficient of variation (CV):  CV (%) = (s / X̄) × 100

Accuracy: the closeness of a measured value to the true value (the value in the absence of error).
Absolute error: error_abs = O − E, where E is the true (exact) value and O is the observed value (or mean).
Relative error: error_rel = error_abs / E = (O − E)/E

Precision: describes the dispersion (variability) of a set of measurements. Precision is typically associated with low dispersion of the values around a central value (low standard deviation).

Accuracy vs precision:
• Accuracy is how close a measurement is to the "true" value. In a laboratory setting this is often how far a measured value is from a standard with a known value that was measured by a different technology or on a different instrument.
• Precision is how close repeated measurements are to each other. Precision has no bearing on a target value; it is simply how close multiple measurements are to one another. Reproducibility is key to scientific research, and precision is important in this respect.

c. Measures of position
pth percentile: p percent of observations fall below it, (100 − p)% above it.
• Example: if in a certain dataset the 85th percentile is 340, then 15% of the measurements are above 340 and 85% are below 340.
• The median is the 50th percentile.
p = 50: median
p = 25: lower quartile (LQ)
p = 75: upper quartile (UQ)
Interquartile range: IQR = UQ − LQ

Quartiles are portrayed graphically by box plots (John Tukey).
Example: weekly TV watching for n = 60 from a student survey data file; 3 outliers.
Box plots have a box from LQ to UQ, with the median marked. They portray a five-number summary of the data: minimum, LQ, median, UQ, maximum, except for outliers identified separately.
Outlier: an observation falling below LQ − 1.5(IQR) or above UQ + 1.5(IQR).
Ex:
If LQ = 2 and UQ = 10, then IQR = 8 and outliers lie above 10 + 1.5(8) = 22 (or below 2 − 1.5(8) = −10).

Normal distribution curve
In a normal distribution of data, also known as a bell curve:
• the majority of the data, approximately 68%, fall within plus or minus one standard deviation of the mean. This means that if the standard deviation of a data set is 2, for example, the majority of the data will fall between the mean − 2 and the mean + 2;
• about 95.4% of normally distributed data fall within two standard deviations of the mean;
• over 99% fall within three.

For any data (Chebyshev's inequality):
• At least 75% of the measurements differ from the mean by less than twice the standard deviation.
• At least 89% of the measurements differ from the mean by less than three times the standard deviation.

Inferential Statistics
Mathematical tools that permit the researcher to generalize to a population of individuals based upon information obtained from a limited number of research participants (observations).

Sample statistics / Population parameters
• We distinguish between summaries of samples (statistics) and summaries of populations (parameters).
• It is common to denote statistics by Roman letters and parameters by Greek letters: population mean = µ, standard deviation = σ, proportion = π.
• In practice, parameter values are unknown; we make inferences about their values using sample statistics.

Sample proportion
Definition: the statistic that estimates the parameter π, the proportion of a population that has some property, is the sample proportion
p̂ = (number of successes in the sample)/(total number of observations in the sample)

• The sample mean X̄ estimates the population mean µ (quantitative variable).
• The sample standard deviation s estimates the population standard deviation σ (quantitative variable).
• The sample proportion p̂ estimates the population proportion π (categorical variable).

Ex: from a population of n individuals, 2 samples of 100 individuals each are extracted.
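Returning to the measures of position: the quartile and Tukey-fence outlier rules can be sketched with the standard library. The data below are hypothetical (not the n = 60 student survey mentioned in the notes):

```python
from statistics import quantiles

# Hypothetical weekly TV-watching hours (illustrative values only)
data = [1, 2, 2, 3, 4, 5, 6, 8, 10, 25]

# quantiles(..., n=4) returns the three quartile cut points [Q1, Q2, Q3]
lq, med, uq = quantiles(data, n=4, method="inclusive")
iqr = uq - lq

# Tukey's rule: outliers fall below LQ - 1.5*IQR or above UQ + 1.5*IQR
low_fence = lq - 1.5 * iqr
high_fence = uq + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]

print(lq, med, uq, iqr, outliers)
```

With these values the only point outside the fences is 25, which a box plot would mark separately, just as described above.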
Mean height of the first sample: X̄1 = 168 cm; mean height of the second sample: X̄2 = 162 cm; ... and so on until all the samples are extracted.
The sample mean estimates the population mean, but with uncertainty. The uncertainty depends upon:
1 - the sample size
2 - the variability of the population
The sample means differ, but how are they distributed? The sampling distribution of the mean is narrower than the population distribution, and narrower for a larger sample (n2 > n1) than for a smaller one.

The standard deviation of a sampling distribution is called the standard error:
SE of the mean: SEM = s/√n
SE of a proportion: √(p(1 − p)/n)

4. Probability Distributions
Probability: with random sampling or a randomized experiment, the probability that an observation takes a particular value is the proportion of times that outcome would occur in a long sequence of observations. It usually corresponds to a population proportion (and thus falls between 0 and 1) for some real or conceptual population.
A probability distribution lists all the possible values and their probabilities (which add to 1.0).

Basic probability rules
Let A, B denote possible outcomes:
• P(not A) = 1 − P(A)
• For distinct (mutually exclusive) outcomes A and B: P(A or B) = P(A) + P(B)
• In general: P(A and B) = P(A) × P(B given A)
• For independent outcomes, P(B given A) = P(B), so P(A and B) = P(A) × P(B).
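The claim that the standard error of the mean shrinks as 1/√n can be checked with a small simulation. This is a sketch under the assumption of a hypothetical normal population of tablet weights (µ = 300 mg, σ = 5 mg):

```python
import random
from statistics import mean, stdev

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: tablet weights ~ N(300 mg, 5 mg)
population_mu, population_sigma = 300.0, 5.0

def sample_mean(n):
    """Draw one random sample of size n and return its mean."""
    return mean(random.gauss(population_mu, population_sigma) for _ in range(n))

# Empirical standard error of the mean for two sample sizes
for n in (10, 100):
    means = [sample_mean(n) for _ in range(2000)]
    se_empirical = stdev(means)          # spread of the sampling distribution
    se_theory = population_sigma / n ** 0.5
    print(n, round(se_empirical, 2), round(se_theory, 2))
```

The empirical spread of the 2000 sample means tracks σ/√n closely: roughly 1.58 for n = 10 and 0.5 for n = 100.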
Probability distribution of a variable
Lists the possible outcomes of the "random variable" and their probabilities.
Discrete variable: assign probabilities P(y) to individual values y, with 0 ≤ P(y) ≤ 1 and Σ P(y) = 1.
In practice, probability distributions are often estimated from sample data, and then have the form of frequency distributions.
Like frequency distributions, probability distributions have descriptive measures such as mean and standard deviation.
Expected value (mean): µ = E(Y) = Σ y P(y)
Standard deviation: a measure of the "typical" distance of an outcome from the mean, denoted by σ.
If a distribution is approximately bell-shaped, then:
• all or nearly all of the distribution falls between µ − 3σ and µ + 3σ
• probability of about 0.68 falls between µ − σ and µ + σ

Continuous variables: probabilities are assigned to intervals of numbers.
• The most important probability distribution for continuous variables is the normal distribution.
• Symmetric, bell-shaped.
• Characterized by the mean (µ) and standard deviation (σ), representing center and spread.
• The probability within any particular number of standard deviations of µ is the same for all normal distributions.
An individual observation from an approximately normal distribution has probability
• 0.68 of falling within 1 standard deviation of the mean
• 0.95 of falling within 2 standard deviations
• 0.997 of falling within 3 standard deviations

The normal curve is often called the Gaussian distribution, after Carl Friedrich Gauss, who discovered many of its properties. Gauss, commonly viewed as one of the greatest mathematicians of all time, was honoured by Germany on the 10 Deutschmark bill.

Properties of the standard normal distribution
• Mean = 0 and standard deviation = 1.
• General relationships:
±1 σ ≈ 68.26%
±2 σ ≈ 95.44%
±3 σ ≈ 99.72%

Notes about z-scores
z-scores are a way of determining the position of a single score under the normal curve, measured in standard deviations relative to the mean of the curve.
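A z-score converts to a probability through the standard normal CDF, which can be written with the standard library's error function. The numbers reuse the 200 ± 10 mg tablet-content example worked through in these notes:

```python
from math import erf, sqrt

def normal_cdf(y, mu, sigma):
    """P(Y <= y) for a normal distribution, via the error function."""
    z = (y - mu) / sigma
    return 0.5 * (1 + erf(z / sqrt(2)))

# Tablet batch example from these notes: content ~ N(200 mg, 10 mg)
mu, sigma = 200.0, 10.0
z = (180 - mu) / sigma          # z-score: -2.00
p = normal_cdf(180, mu, sigma)  # P(content <= 180 mg)

print(z, round(p, 4))
```

This replaces a z-probability table lookup: for z = −2.00 the lower-tail probability is about 0.0228, i.e. roughly 2.3% of tablets.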
The z-score can be used to determine an area under the curve, i.e. a probability:
z = (y − µ)/σ
• The standard normal distribution is the normal distribution with µ = 0, σ = 1. For that distribution, z = (y − µ)/σ = (y − 0)/1 = y, i.e. the original score equals the z-score, and y = µ + zσ = 0 + z(1) = z.
• Why is the normal distribution so important? If different studies take random samples and calculate a statistic (e.g. the sample mean) to estimate a parameter (e.g. the population mean), the collection of statistic values from those studies usually has approximately a normal distribution.

Ex: 5000 tablets are produced and assayed for content. The mean (µ ± σ) is 200 ± 10 mg and the content is normally distributed. Calculate the proportion of tablets that contain 180 mg or less.
z = (y − µ)/σ = (180 − 200)/10 = −2.00
From a z-probability table (e.g. http://www.bucknam.com/zprob.html): for z = −2.00, the probability below is 0.023, i.e. about 2.3% of tablets.

A sampling distribution lists the possible values of a statistic (e.g., sample mean or sample proportion) and their probabilities.
How close is the sample mean Ȳ to the population mean µ? To answer this, we must be able to answer: "What is the probability distribution of the sample mean?"

Sampling distribution of the sample mean
• ȳ is a variable, its value varying from sample to sample about the population mean µ.
• The standard deviation of the sampling distribution of ȳ is called the standard error of ȳ.
• For random sampling, the sampling distribution of ȳ has mean µ and standard error
σ_ȳ = σ/√n, where σ is the population standard deviation and n is the sample size.

Central Limit Theorem: for random sampling with "large" n, the sampling distribution of the sample mean is approximately a normal distribution.
• Approximate normality applies no matter what the shape of the population distribution.
• How "large" n needs to be depends on the skew of the population distribution, but usually n ≥ 30 is sufficient.

5. Statistical Inference: Estimation
Goal: how can we use sample data to estimate the values of population parameters?
Point estimate: a single statistic value that is the "best guess" for the parameter value.
Interval estimate: an interval of numbers around the point estimate that has a fixed "confidence level" of containing the parameter value. This is called a confidence interval (and is based on the sampling distribution of the point estimate).

Point estimators - it is most common to use sample values:
• The sample mean estimates the population mean µ:  µ̂ = ȳ = Σyi/n
• The sample standard deviation estimates the population standard deviation σ:  σ̂ = s = √[Σ(yi − ȳ)²/(n − 1)]
• The sample proportion p̂ estimates the population proportion π.

Confidence Intervals
• A confidence interval (CI) is an interval of numbers believed to contain the parameter value.
Ex: when public health practitioners use health statistics, sometimes they are interested in the actual number of health events, but more often they use the statistics to assess the true underlying risk of a health problem in the community. Statistical sampling theory is used to compute a confidence interval that estimates the potential discrepancy between the true population parameters and the observed rates. Understanding the potential size of that discrepancy provides information about how to interpret the observed statistic.
• The probability that the method produces an interval containing the parameter is called the confidence level. Most studies use a confidence level close to 1, such as 0.95 or 0.99.
• Most CIs have the form: point estimate ± margin of error, with the margin of error based on the spread of the sampling distribution of the point estimator; e.g., margin of error ≈ 2(standard error) for 95% confidence.
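A 95% CI of the form point estimate ± margin of error can be sketched for a proportion. The counts are hypothetical; the 1.96 multiplier and the estimated standard error √(p̂(1 − p̂)/n) are the large-sample values discussed in these notes:

```python
from math import sqrt

# Hypothetical example: 120 of 400 sampled patients report a side effect
successes, n = 120, 400
p_hat = successes / n               # sample proportion, 0.30

se = sqrt(p_hat * (1 - p_hat) / n)  # estimated standard error
z = 1.96                            # z-score for 95% confidence
ci_low, ci_high = p_hat - z * se, p_hat + z * se

print(round(p_hat, 2), round(ci_low, 3), round(ci_high, 3))
```

Here the 95% CI is roughly 0.255 to 0.345: we are 95% confident this interval contains the population proportion π.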
Confidence Intervals
A 95% confidence interval for a proportion is constructed so that, across repeated samples from the same population, 95% of the intervals produced would contain the population value.
• The sampling distribution of a sample proportion for large random samples is approximately normal (Central Limit Theorem).
• So, with probability 0.95, the sample proportion p̂ falls within 1.96 standard errors of the population proportion π (z = 1.96 corresponds to a cumulative probability of 0.975).
• That is, with probability 0.95, p̂ falls between π − 1.96σ_p̂ and π + 1.96σ_p̂.
• Once the sample is selected, we are 95% confident that the interval from p̂ − 1.96σ_p̂ to p̂ + 1.96σ_p̂ contains π. This is the CI for the population proportion π (almost).

Finding a CI in practice
• Complication: the true standard error σ_p̂ = √(π(1 − π)/n) itself depends on the unknown parameter!
In practice, we estimate σ_p̂ by se = √(p̂(1 − p̂)/n) and then find the 95% CI using the formula
p̂ − 1.96(se)  to  p̂ + 1.96(se)     (with q = 1 − p)
• Greater confidence requires a wider CI.
• Greater sample size gives a narrower CI (quadruple n to halve the width of the CI).

Some comments about CIs
• The effects of n and of the confidence coefficient hold for CIs for other parameters also.
• If we repeatedly took random samples of some fixed size n and each time calculated a 95% CI, in the long run about 95% of the CIs would contain the population proportion π.
• The probability that the CI does not contain π is called the error probability, denoted by α.
• α = 1 − confidence coefficient:

(1 − α)100%    90%      95%      99%
α              .10      .05      .01
α/2            .050     .025     .005
z(α/2)         1.645    1.96     2.58

Confidence Interval for the Mean
• In large random samples, the sample mean has approximately a normal sampling distribution with mean µ and standard error σ_ȳ = σ/√n.
• Thus, P(µ − 1.96σ_ȳ ≤ ȳ ≤ µ + 1.96σ_ȳ) = 0.95.
• We can be 95% confident that the sample mean lies within 1.96 standard errors of the (unknown) population mean.
• Problem: the standard error is unknown (σ is also a parameter).
It is estimated by replacing σ with its point estimate from the sample data: se = s/√n.
95% confidence interval for µ:  ȳ ± 1.96(se), which is ȳ ± 1.96 s/√n
This works well for "large n", because s is then a good estimate of σ (and the CLT applies). But for small n, replacing σ by its estimate s introduces extra error, and the CI is not quite wide enough unless we replace the z-score by a slightly larger "t-score".

The t distribution (Student's t)
The t distribution is used instead of the normal distribution whenever the standard deviation is estimated.
• Bell-shaped, symmetric about 0.
• Standard deviation a bit larger than 1 (slightly thicker tails than the standard normal distribution, which has mean = 0, standard deviation = 1).
• Its precise shape depends on the degrees of freedom (df). For inference about a mean, df = n − 1.
• It gets narrower and more closely resembles the standard normal distribution as df increases (nearly identical when df > 30).
• The CI for a mean has margin of error t(se) (instead of z(se) as in the CI for a proportion).

Part of a t table:

            Confidence Level
df          90%       95%       98%       99%
            t.050     t.025     t.010     t.005
1           6.314     12.706    31.821    63.657
10          1.812     2.228     2.764     3.169
16          1.746     2.120     2.583     2.921
30          1.697     2.042     2.457     2.750
100         1.660     1.984     2.364     2.626
infinity    1.645     1.960     2.326     2.576

df = ∞ corresponds to the standard normal distribution.

CI for a population mean
• For a random sample from a normal population distribution, a 95% CI for µ is
ȳ ± t.025(se), with se = s/√n, where df = n − 1 for the t-score.
• The normal population assumption ensures that the sampling distribution has a bell shape for any n.

Comments about the CI for a population mean µ
• The method is robust to violations of the assumption of a normal population distribution (but be careful if the sample data distribution is very highly skewed or there are severe outliers: look at the data).
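A sketch of the t-based CI for a mean. The data are hypothetical, and the critical value 2.228 is t.025 for df = 10 from the t table above:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample: drug content (mg) of n = 11 tablets
y = [99.2, 100.8, 101.5, 98.7, 100.1, 99.9, 102.3, 97.8, 100.4, 101.0, 99.6]

n = len(y)
y_bar = mean(y)
s = stdev(y)       # sample standard deviation (divides by n - 1)
se = s / sqrt(n)   # estimated standard error of the mean

t_crit = 2.228     # t.025 for df = n - 1 = 10 (from the t table above)
ci_low, ci_high = y_bar - t_crit * se, y_bar + t_crit * se

print(round(y_bar, 2), round(ci_low, 2), round(ci_high, 2))
```

Using 2.228 instead of 1.96 widens the interval slightly, compensating for the extra uncertainty introduced by estimating σ with s.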
• Greater confidence requires a wider CI.
• Greater n produces a narrower CI.
• t methods were developed by the statistician William Gosset of Guinness Breweries, Dublin (1908).

Choosing the Sample Size
• Determine the parameter of interest (population mean or population proportion).
• Select a margin of error (M) and a confidence level (which determines the z-score).
Proportion (to be "safe", set p = 0.50):  n = p(1 − p)(z/M)²
Mean (need a guess for the value of σ):   n = σ²(z/M)²
• We have seen that n depends on the confidence level (higher confidence requires larger n) and on the population variability (more variability requires larger n).
• In practice, determining n is not so easy, because (1) there are many parameters to estimate, and (2) resources may be limited and we may need to compromise.
• CIs can be formed for any parameter.

Using CI Inference in Practice
• What is the variable of interest?
quantitative - inference about a mean
categorical - inference about a proportion
• Are the conditions satisfied? Randomization (why? It is needed so that the sampling distribution and its standard error are as advertised.)

6. Statistical Inference: Significance Tests
Goal: use statistical methods to test hypotheses such as:
"For treating anorexia, cognitive behavioral and family therapies have the same mean weight change as placebo" (no effect)
"Mental health tends to be better at higher levels of socioeconomic status (SES)" (i.e., there is an effect)
"Spending money on other people has a more positive impact on happiness than spending money on oneself."

Hypotheses: for statistical inference, these are predictions about a population expressed in terms of parameters (e.g., population means, proportions, or correlations) for the variables considered in a study.
A significance test uses data to evaluate a hypothesis by comparing sample point estimates of parameters to the values predicted by the hypothesis.
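The sample-size formulas above can be sketched directly (the margins and the σ guess are hypothetical):

```python
from math import ceil

def n_for_proportion(margin, z=1.96, p=0.5):
    """Sample size to estimate a proportion within +/- margin; p=0.5 is the 'safe' choice."""
    return ceil(p * (1 - p) * (z / margin) ** 2)

def n_for_mean(margin, sigma, z=1.96):
    """Sample size to estimate a mean within +/- margin, given a guess for sigma."""
    return ceil(sigma ** 2 * (z / margin) ** 2)

# Hypothetical examples at 95% confidence
n_prop = n_for_proportion(0.03)        # proportion to within +/- 3 percentage points
n_mean = n_for_mean(2.0, sigma=10.0)   # mean to within +/- 2 units when sigma ~ 10
print(n_prop, n_mean)
```

The proportion case gives the familiar "about 1,070 respondents" of opinion polling; note how halving the margin of error would quadruple n.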
We answer a question such as: "If the hypothesis were true, would it be unlikely to get data such as we obtained?"

Five Parts of a Significance Test
• Assumptions: about the type of data (quantitative, categorical), the sampling method (random), the population distribution (e.g., normal, binary), and the sample size (large enough?).
• Hypotheses:
Null hypothesis (H0): a statement that the parameter(s) take specific value(s) (usually: "no effect").
Alternative hypothesis (Ha): states that the parameter value(s) fall in some alternative range of values (an "effect").
• Test statistic: compares the data to what the null hypothesis H0 predicts, often by finding the number of standard errors between the sample point estimate and the H0 value of the parameter.
• P-value (P): a probability measure of the evidence about H0: the probability (under the presumption that H0 is true) that the test statistic equals the observed value or a value even more extreme in the direction predicted by Ha. The smaller the P-value, the stronger the evidence against H0.
• Conclusion:
- If no decision is needed, report and interpret the P-value.
- If a decision is needed, select a cutoff point (such as 0.05 or 0.01) and reject H0 if the P-value ≤ that value.
- The most widely accepted cutoff point is 0.05, and the test is said to be "significant at the .05 level" if the P-value ≤ 0.05.
- If the P-value is not sufficiently small, we fail to reject H0 (H0 is then not necessarily true, but it is plausible).
- The process is analogous to the American judicial system:
H0: the defendant is innocent
Ha: the defendant is guilty

(End of the first part)

Significance Test for a Mean
• Assumptions: randomization, quantitative variable, normal population distribution (robustness?).
• Null hypothesis: H0: µ = µ0, where µ0 is a particular value for the population mean (typically "no effect" or "no change" from a standard).
• Alternative hypothesis: Ha: µ ≠ µ0 (the 2-sided alternative includes both > and < the H0 value).
• Test statistic: the number of standard errors that the sample mean falls from the H0 value:
t = (ȳ − µ0)/se, where se = s/√n
When H0 is true, the sampling distribution of the t test statistic is the t distribution with df = n − 1.
• P-value: under the presumption that H0 is true, the probability that the t test statistic equals the observed value or one even more extreme (i.e., larger in absolute value), smaller values providing stronger evidence against H0. This is a two-tail probability for the two-sided Ha.
• Conclusion: report and interpret the P-value; if needed, make a decision about H0.

Making a decision: the α-level (also called the significance level) is a fixed number such that
if the P-value ≤ α, we "reject H0";
if the P-value > α, we "do not reject H0".
Note: we say "do not reject H0" rather than "accept H0" because the H0 value is only one of many plausible values.
A high significance level (large α) means there is a large chance that the experiment "proves" something that is not true; a very small significance level leaves little room to doubt a significant result.

Effect of sample size on tests
• With large n (say, n > 30), the assumption of a normal population distribution is not important because of the Central Limit Theorem.
• For small n, the two-sided t test is robust against violations of that assumption; a one-sided test, however, is not robust.
• For a given observed sample mean and standard deviation, the larger the sample size n, the larger the test statistic (because the se in the denominator is smaller) and the smaller the P-value
(i.e., we have more evidence with more data).
• We are more likely to reject a false H0 when we have a larger sample size (the test then has more "power").
• With large n, "statistical significance" is not the same as "practical significance".

Significance Test for a Proportion π
• Assumptions:
- categorical variable
- randomization
- large sample (but the two-sided test is ok for nearly all n)
• Hypotheses:
- Null hypothesis: H0: π = π0
- Alternative hypothesis: Ha: π ≠ π0 (2-sided), or Ha: π > π0 / Ha: π < π0 (1-sided)
- Set up the hypotheses before getting the data.
• Test statistic:
z = (p̂ − π0)/se0, with se0 = √(π0(1 − π0)/n)
Note that the test uses se0 = √(π0(1 − π0)/n), not se = √(p̂(1 − p̂)/n) as in a CI.
As in the test for a mean, the test statistic has the form
(estimate of parameter − H0 value)/(standard error) = number of standard errors the estimate falls from the H0 value.
• P-value:
Ha: π ≠ π0 → P = two-tail probability from the standard normal distribution
Ha: π > π0 → P = right-tail probability from the standard normal distribution
Ha: π < π0 → P = left-tail probability from the standard normal distribution
• Conclusion: as in the test for a mean (e.g., reject H0 if the P-value ≤ α).

Decisions in Tests
α-level (significance level): a pre-specified "hurdle"; one rejects H0 if the P-value falls below it (typically 0.05 or 0.01).

P-value    H0 conclusion      Ha conclusion
≤ .05      Reject             Accept
> .05      Do not reject      Do not accept

• Rejection region: the values of the test statistic for which we reject the null hypothesis. For 2-sided tests with α = 0.05, we reject H0 if |z| ≥ 1.96.

Error Types
• Type I error: reject H0 when it is true.
• Type II error: do not reject H0 when it is false.

                     Reality: H0 true     Reality: H0 false
Reject H0            Type I error         Correct
Don't reject H0      Correct              Type II error

P(Type I error)
• Suppose the α-level is 0.05. Then P(Type I error) = P(reject null, given it is true) = P(|z| > 1.96) = 0.05.
• i.e., the α-level is the P(Type I error).
• Since we "give the benefit of the doubt to the null" in doing the test, it is traditional to take α small, usually 0.05, or 0.01 to be very cautious not to reject the null when it may be true.
• As with CIs, do not make α too small: as α goes down, β = P(Type II error) goes up (think of the analogy with a courtroom trial).
• It is better to report the P-value than merely whether H0 was rejected.

P(Type II error)
• P(Type II error) = β depends on the true value of the parameter (from the range of values in Ha).
• The farther the true parameter value falls from the null value, the easier it is to reject the null, and P(Type II error) goes down.
• Power of the test = 1 − β = P(reject the null, given that it is false).
• In practice, you want a large enough n for your study so that P(Type II error) is small for the size of effect you expect.

Practical Applications of Statistics
Researchers use a test of significance to determine whether to reject or fail to reject the null hypothesis. This involves pre-selecting a level of probability, α (e.g., α = .05), that serves as the criterion for the decision.

Steps in using inferential statistics:
1. select the test of significance
2. determine whether the significance test will be two-tailed or one-tailed
3. select α (alpha), the probability level
4. compute the test of significance
5. consult a table to determine the significance of the results

Tests of significance
• Statistical formulas that enable the researcher to determine whether there was a real difference between the sample means.
• Different tests of significance account for different factors, including: the scale of measurement represented by the data, the method of participant selection, the number of groups being compared, and the number of independent variables.
• The researcher must first decide whether a parametric or a nonparametric test must be selected.

Parametric test:
• assumes that the variable measured is normally distributed in the population
• the selection of participants is independent
• the variances of the population comparison groups are equal
• used when the data represent an interval or ratio scale

nonparametric test…
• makes no assumption about the distribution of the variable in the population, that is, about the shape of the distribution
• used when the data represent a nominal or ordinal scale, when a parametric assumption has been greatly violated, or when the nature of the distribution is not known
• usually requires a larger sample size to reach the same level of significance as a parametric test

The most common tests of significance…
t-test, z-test, ANOVA, Chi Square

t-test…
…used to determine whether two means are significantly different at a selected probability level
…adjusts for the fact that the distribution of scores for small samples departs increasingly from the normal distribution as sample sizes become smaller
…the strategy of the t-test is to compare the actual mean difference observed to the difference expected by chance

t-test…
…forms a ratio where the numerator is the difference between the sample means and the denominator is the chance difference that would be expected if the null hypothesis were true
…after the numerator is divided by the denominator, the resulting t value is compared to the appropriate t table value, depending on the probability level and the degrees of freedom
…if the t value is equal to or greater than the table value, then the null hypothesis is rejected because the difference is greater than would be expected due to chance (t stat > t tab → H0 rejected)

t-test…
there are two types of t-tests:
• the t-test for independent samples (randomly formed groups)
• the t-test for nonindependent samples (nonrandomly formed groups, e.g., matching, performance on a pre-/post-test, different treatments)
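The t-test ratio described above can be sketched as follows (a minimal standard-library example with hypothetical tablet data; the pooled-variance formula for independent samples is assumed):

```python
from statistics import mean, stdev

def t_independent(x, y):
    """Pooled-variance t statistic for two independent samples:
    (observed mean difference) / (difference expected by chance under H0)."""
    nx, ny = len(x), len(y)
    # pooled variance combines the two sample variances, weighted by df
    sp2 = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / (sp2 * (1 / nx + 1 / ny)) ** 0.5
    return t, nx + ny - 2  # t value and degrees of freedom

# Hypothetical example: tablet disintegration times (min) for two batches
a = [12.1, 11.8, 12.5, 12.0, 11.9]
b = [12.9, 13.1, 12.7, 13.3, 12.8]
t, df = t_independent(a, b)
# |t| ≈ 5.56 with df = 8; the tabled value at alpha = .05 (two-tailed)
# is 2.306, so t stat > t tab and H0 is rejected
```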
ANOVA (Analysis of Variance)
…used to determine whether two or more means are significantly different at a selected probability level
…avoids the need to compute duplicate t-tests to compare the groups (when there are more than 2)
…the strategy of ANOVA is that the total variation, or variance, can be divided into two sources:
• treatment variance, "between groups": variance caused by the treatment groups
• error variance, "within groups": variance within the groups

ANOVA (Analysis of Variance)
…forms a ratio, the F ratio, with the treatment variance as the numerator (between-group variance) and the error variance as the denominator (within-group variance)
…the assumption is that randomly formed groups of participants are chosen and are essentially the same at the beginning of the study on a measure of the dependent variable
…at the study's end, the question is whether the variance between the groups differs from the error variance by more than what would be expected by chance
• if the treatment variance is sufficiently larger than the error variance, a significant F ratio results; that is, the null hypothesis is rejected and it is concluded that the treatment had a significant effect on the dependent variable
• if the treatment variance is not sufficiently larger than the error variance, a nonsignificant F ratio results; that is, the null hypothesis is not rejected and it is concluded that the treatment had no significant effect on the dependent variable
when the F ratio is significant and more than two means are involved, researchers use multiple-comparison procedures (e.g., the Scheffé test, Tukey's HSD test, Duncan's multiple range test)
ANOVA may be one-way (one independent variable) or two-way (two independent variables)

Chi Square (χ²), Pearson's test
…a nonparametric test of significance appropriate for nominal or ordinal data that can be converted to frequencies
It can be applied to interval or ratio data that have been categorized into a small number of groups.
It assumes that the observations are randomly sampled from the population.
All observations are independent (an individual can appear only once in a table, and there are no overlapping categories).
It does not make any assumptions about the shape of the distribution nor about the homogeneity of variances.

Chi Square (χ²)…
…compares the proportions actually observed (O) to the proportions expected (E) to see if they are significantly different
…the chi-square value increases as the difference between observed and expected frequencies increases
• Ho: The two variables are independent
• Ha: The two variables are associated

One- and two-tailed tests of significance…
• tests of significance that indicate the direction in which a difference may occur
…the word "tail" indicates the area of rejection beneath the normal curve

Chi Square (χ²) calculations
• Contrasts the observed frequencies in each cell of a contingency table with the expected frequencies.
• The expected frequencies represent the number of cases that would be found in each cell if the null hypothesis were true (i.e., the nominal variables are unrelated).
• The expected frequency of two unrelated events is the product of the row and column totals divided by the number of cases.
Fe = (Fr × Fc) / N

χ² = Σ (Fo − Fe)² / Fe

Determine the degrees of freedom
• df = (R − 1)(C − 1)

Compare the computed test statistic against a tabled/critical value
• The computed value of the Pearson chi-square statistic is compared with the critical value to determine if the computed value is improbable
• The critical tabled values are based on the sampling distributions of the Pearson chi-square statistic
• If the calculated χ² is greater than the χ² table value, reject Ho

One- and two-tailed tests of significance…
• A = B: no difference between the means; the difference can be in a positive or a negative direction
  the direction can be in either tail of the normal curve
  called a "two-tailed" test; it divides the α level between the two tails of the normal curve
• A > B or A < B: there is a difference between the means; the direction is either positive or negative
  called a "one-tailed" test; the α level is found in one tail of the normal curve

3. Bivariate description
• Usually we want to study associations between two or more variables (e.g., how does the number of close friends depend on gender, income, education, age, working status, rural/urban residence, religiosity…)
• Response variable: the outcome variable
• Explanatory variable(s): define(s) the groups to compare
Ex.: the number of close friends is a response variable, while gender, income, … are explanatory variables
The response variable is also called the "dependent variable"
The explanatory variable is also called the "independent variable"

Summarizing associations:
• Categorical variables: show the data using contingency tables
• Quantitative variables: show the data using scatterplots
• Mixture of a categorical and a quantitative variable (e.g., number of close friends and gender): give numerical summaries (mean, standard deviation) or side-by-side box plots for the groups

Contingency Tables
• Cross classifications of categorical variables in which rows (typically) represent categories of the explanatory variable and columns represent categories of the response variable.
• Counts in the "cells" of the table give the numbers of individuals at the corresponding combination of levels of the two variables

Another Example: Heparin Lock Placement
Placement time group: 1 = 72 hrs, 2 = 96 hrs

Complication Incidence × Heparin Lock Placement Time Group Crosstabulation

                                        Group 1    Group 2    Total
Had complications     Count             9          11         20
                      Expected Count    10.0       10.0       20.0
                      % within group    18.0%      22.0%      20.0%
Had NO complications  Count             41         39         80
                      Expected Count    40.0       40.0       80.0
                      % within group    82.0%      78.0%      80.0%
Total                 Count             50         50         100
                      Expected Count    50.0       50.0       100.0
                      % within group    100.0%     100.0%     100.0%

Hypotheses in Heparin Lock Placement
• Ho: There is no association between complication incidence and length of heparin lock placement. (The variables are independent.)
• Ha: There is an association between complication incidence and length of heparin lock placement. (The variables are related.)
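Putting the chi-square slides together, the Pearson statistic for the heparin crosstabulation above can be computed from Fe = (row total × column total)/N. A minimal standard-library sketch (the function name is illustrative):

```python
def chi_square(table):
    """Pearson chi-square for a contingency table given as a list of rows.
    Expected count for each cell: Fe = (row total * column total) / N."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi2 = sum(
        (fo - row_tot[i] * col_tot[j] / n) ** 2 / (row_tot[i] * col_tot[j] / n)
        for i, row in enumerate(table)
        for j, fo in enumerate(row)
    )
    df = (len(table) - 1) * (len(col_tot) - 1)  # df = (R - 1)(C - 1)
    return chi2, df

# Heparin lock data: rows = complications yes/no, columns = 72 h / 96 h group
chi2, df = chi_square([[9, 11], [41, 39]])
# chi2 = 0.25 with df = 1, well below the tabled critical value 3.84
# at alpha = .05, so Ho (independence of the two variables) is not rejected
```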