CONFIDENCE INTERVALS

The Aim
By the end of this lecture, the students will be aware of confidence intervals.

The Goals
• To define the confidence interval
• To calculate the confidence interval for the mean
• To calculate the confidence interval for the proportion
• To define the confidence interval's relation to the theoretical distribution
• To calculate the standard error of differences
• To calculate relative deviates
  * RD = (mean1 – mean2) / standard error of differences
• To interpret confidence intervals
  * When examining the difference between groups: whether the confidence interval contains 0
  * In risk assessment (odds ratio): whether the confidence interval contains 1
• To explain degrees of freedom
• To calculate the confidence intervals of the means and of the difference of the means by using SPSS

Confidence intervals
● Confidence interval for the mean
  - Using the Normal distribution
  - Using the t-distribution
● Confidence interval for the proportion
● Confidence interval for the differences
  - for numerical data
  - for proportions
● Confidence intervals for the odds ratio
● Interpretation of confidence intervals
● Degrees of freedom = df
● Applying the relative deviate formula for confidence interval of the differences

• Once we have taken a sample from our population, we obtain a point estimate of the parameter of interest, and calculate its standard error to indicate the precision of the estimate.
• However, to most people the standard error is not, by itself, particularly useful.
• It is more helpful to incorporate this measure of precision into an interval estimate for the population parameter.
• We do this by making use of our knowledge of the theoretical probability distribution of the sample statistic to calculate a confidence interval for the parameter.
• Generally the confidence interval extends either side of the estimate by some multiple of the standard error; the two values (the confidence limits) which define the interval are generally separated by a comma, a dash or the word 'to' and are contained in brackets.

Confidence interval for the mean
Using the Normal distribution
• In the previous lecture we stated that the sample mean follows a Normal distribution if the sample size is large.
• Therefore we can make use of the properties of the Normal distribution when considering the sample mean.
• In particular, 95% of the distribution of sample means lies within 1.96 standard deviations (SD) of the population mean.
• We call this SD the standard error of the mean (SEM), and when we have a single sample, the 95% confidence interval (CI) for the mean is:
  (Sample mean - (1.96 x SEM) to Sample mean + (1.96 x SEM))
• If we were to repeat the experiment many times, the range of values determined in this way would contain the true population mean on 95% of occasions.
• This range is known as the 95% confidence interval for the mean.
• We usually interpret this confidence interval as the range of values within which we are 95% confident that the true population mean lies.
• Although not strictly correct (the population mean is a fixed value and therefore cannot have a probability attached to it), we will interpret the confidence interval in this way as it is conceptually easier to understand.

Confidence interval for the mean
Using the t-distribution
• Strictly, we should only use the Normal distribution in the calculation if we know the value of the variance, σ², in the population.
• Furthermore, if the sample size is small, the sample mean only follows a Normal distribution if the underlying population data are Normally distributed.
• Where the data are not Normally distributed, and/or we do not know the population variance but estimate it by s², the sample mean follows a t-distribution.
• We calculate the 95% confidence interval for the mean as
  Sample mean ± t0.05 x (s / √n)
  where t0.05 is the percentage point (percentile) of the t-distribution with (n - 1) degrees of freedom which gives a two-tailed probability of 0.05, s is the sample standard deviation and n is the sample size.
• This generally provides a slightly wider confidence interval than that using the Normal distribution, to allow for the extra uncertainty that we have introduced by estimating the population standard deviation and/or because of the small sample size.
• When the sample size is large, the difference between the two distributions is negligible.
• Therefore, we always use the t-distribution when calculating a confidence interval for the mean, even if the sample size is large.
• By convention we usually quote 95% confidence intervals.
• We could calculate other confidence intervals, e.g. a 99% confidence interval for the mean.
• Instead of multiplying the standard error by the tabulated value of the t-distribution corresponding to a two-tailed probability of 0.05, we multiply it by that corresponding to a two-tailed probability of 0.01.
• The 99% confidence interval is wider than a 95% confidence interval, to reflect our increased confidence that the range includes the true population mean.

Calculation of the confidence interval for the mean by using SPSS
• www.aile.net/agep/istat/diyabet.sav
• Let's calculate the 95% confidence interval of the mean for the age variable.
• Analyze > Descriptive Statistics > Explore [Put the "Age" variable in the "Dependent List" box. Make sure "Statistics" is selected under "Display".] > OK.
We obtain the following output:

                                                  Statistic   Std. Error
Age   Mean                                          54.44       .603
      95% Confidence Interval     Lower Bound       53.26
      for Mean                    Upper Bound       55.63
      5% Trimmed Mean                               54.33
      Median                                        54.00
      Variance                                     156.069
      Std. Deviation                                12.493
      Minimum                                       22
      Maximum                                       99
      Range                                         77
      Interquartile Range                           18
      Skewness                                        .157       .118
      Kurtosis                                        .048       .235

• The limits of the 95% confidence interval are 53.26 and 55.63.
• We should use the Normal distribution only if we know the variance of the population (σ²).
• We should remember that, when the sample size is small, the empirical distribution of the sample mean resembles the Normal distribution only if the data are Normally distributed in the population.

Let's calculate the confidence intervals of the first and second groups for age by using the Table 1 data.

                             Group 1                     Group 2
Mean                         51                          43.76
Standard deviation           15.39                       15.07
n                            25                          25
df                           24                          24
Tabulated t for p = 0.05     2.064                       2.064
SEM                          15.39 / √25 = 3.078         15.07 / √25 = 3.014
95% CI                       51 ± 2.064 x 3.078          43.76 ± 2.064 x 3.014
                             [44.65 – 57.35]             [37.54 – 49.98]

                             Women                       Men
Mean                         5.95                        0.75
Standard deviation           5.19                        1.14
n                            38                          12
df                           37                          11
Tabulated t for p = 0.05     ~ 2.02                      2.201
SEM                          5.19 / √38 = 0.84           1.14 / √12 = 0.329
95% CI                       5.95 ± 2.02 x 0.84          0.75 ± 2.201 x 0.329
                             [4.25 – 7.65]               [0.03 – 1.47]

Confidence interval for the proportion
• The sampling distribution of a proportion follows a Binomial distribution.
• However, if the sample size, n, is reasonably large, then the sampling distribution of the proportion is approximately Normal with mean μ.
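The Table 1 confidence intervals for the mean worked out above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `mean_confidence_interval` is hypothetical, and the multiplier is read from a t-table and passed in by hand (2.064 for 24 degrees of freedom, as in the lecture) rather than computed.

```python
import math


def mean_confidence_interval(mean, sd, n, t_value):
    """Return (lower, upper) limits of mean ± t_value x SEM, with SEM = sd / sqrt(n)."""
    sem = sd / math.sqrt(n)      # standard error of the mean
    margin = t_value * sem       # half-width of the interval
    return mean - margin, mean + margin


# Table 1 summary statistics; 2.064 is the tabulated t for df = 24, p = 0.05
low1, high1 = mean_confidence_interval(51.0, 15.39, 25, 2.064)
low2, high2 = mean_confidence_interval(43.76, 15.07, 25, 2.064)

print(f"Group 1: [{low1:.2f}, {high1:.2f}]")  # Group 1: [44.65, 57.35]
print(f"Group 2: [{low2:.2f}, {high2:.2f}]")  # Group 2: [37.54, 49.98]
```

Passing the tabulated value as an argument keeps the sketch close to the hand calculation; a statistics package would normally look up the percentage point itself.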
p = r / n
μ: mean of the sampling distribution of the proportion (the population proportion)
p: proportion in the sample
n: sample size
r: the number of individuals in the sample with the characteristic of interest
• We estimate μ by the proportion in the sample, p = r/n, and its standard error is
  SE(p) = √[p(1 - p) / n]
• With the Normal approximation, the 95% confidence interval for the proportion is
  p ± 1.96 x √[p(1 - p) / n]
• If the sample size is small (usually when np or n(1 - p) is less than 5) then we have to use the Binomial distribution to calculate exact confidence intervals.
• Note that if p is expressed as a percentage, we replace (1 - p) by (100 - p).

Confidence intervals for the differences (Numerical data)
• The standard error of the difference between the means of two groups of numerical data is:
  SEM(diff) = √[(s1²/n1) + (s2²/n2)]
• If we apply this to the example in Table 1:
  SEM(diff) = √[(15.39 x 15.39 / 25) + (15.07 x 15.07 / 25)] = 4.30
  95% CI = (Mean 1 – Mean 2) ± t0.05 x SEM(diff)
  Here the degrees of freedom are calculated as df = (n1 - 1) + (n2 - 1) = 24 + 24 = 48.
  The t value is approximately 2.009 for the 5% level of significance.
  95% CI = (51 - 43.76) ± 2.009 x 4.30
  95% CI for the difference between the two means: [-1.40 – 15.88]
• Note: When interpreting the CI for the difference between two means, if the CI contains zero we conclude that there is no significant difference between the means. The true difference could be positive, negative or zero, so we cannot claim that one mean is bigger than the other.

Calculation of the confidence interval for the difference between means by using SPSS
• www.aile.net/agep/istat/diyabet.sav
• Let us calculate the 95% confidence interval of the mean difference between men and women for the age variable.
• Analyze > Compare Means > Independent-Samples T Test [Put the "Age" variable in the "Test variables" box and the "sex" variable in the "Grouping variable" box. Click the "Define Groups" button and enter 1 for "Group 1" and 2 for "Group 2"] > Continue > OK.
We obtain the following output:

      Sex of the patient     N      Mean     Std. Deviation   Std. Error Mean
Age   Male                   235    56.20    12.662           .826
      Female                 194    52.31    11.975           .860

• The confidence interval of the age difference between men and women is [1.53 – 6.24].

Confidence intervals for the differences (Proportions)
- When we deal with categorical data, the standard error of the difference between two proportions is calculated with the formula
  SEM(diff) = √[(p1q1/n1) + (p2q2/n2)], where q = 1 - p.
- According to Table 1:
  Group 1 contains 20 women.
  Group 2 contains 18 women.
- Let us calculate the CI for the difference in the proportion of women between these groups:
  SEM(diff) = √[((20/25) x (5/25) / 25) + ((18/25) x (7/25) / 25)] = 0.12
  95% CI = (0.80 - 0.72) ± 1.96 x 0.12
  The CI for the difference between the two proportions: [-0.16 – 0.32]

Confidence interval for the odds ratio
• The odds of an event are defined as
  - the ratio of the probability that the event occurs to
  - the probability that the event does not occur.
  - The odds ratio compares the odds in two groups, e.g. the odds of developing cancer among smokers / the odds of developing cancer among non-smokers.
• This ratio is an important parameter that is used to quantify risk factors.
• For example, if the odds ratio of lung cancer is 10, we can say that the odds of developing lung cancer are 10 times higher in smokers than in non-smokers.
• It is better to report confidence intervals together with the odds ratio in research.
• If the confidence interval contains 1, the risk does not differ significantly between the groups.

Interpretation of confidence intervals
When interpreting a confidence interval we are interested in a number of issues.
• How wide is it?
  - A wide interval indicates that the estimate is imprecise; a narrow one indicates a precise estimate.
  - The width of the confidence interval depends on the size of the standard error, which in turn depends on the sample size and, when considering a numerical variable, the variability of the data.
  - Therefore, small studies on variable data give wider confidence intervals than larger studies on less variable data.
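To illustrate how the width of a confidence interval depends on the sample size and on the variability of the data, here is a small Python sketch. It rests on assumptions that are not part of the lecture: it uses the Normal multiplier (about 1.96) instead of the t-distribution for simplicity, the function name `half_width` is hypothetical, and the sample sizes 100 and 400 are invented for illustration. Only the standard deviation 15.39 and n = 25 come from Table 1.

```python
import math
from statistics import NormalDist


def half_width(sd, n, confidence=0.95):
    """Half-width of a Normal-based confidence interval for a mean."""
    # Two-tailed multiplier, about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sd / math.sqrt(n)


sd = 15.39  # standard deviation of Group 1 in Table 1
widths = {n: half_width(sd, n) for n in (25, 100, 400)}  # 100 and 400 are illustrative

for n, w in widths.items():
    print(f"n = {n:3d}: mean ± {w:.2f}")

# A higher confidence level widens the interval for the same data
wider_99 = half_width(sd, 25, confidence=0.99)
```

Because the standard error is sd / √n, each fourfold increase in the sample size halves the half-width, while a smaller standard deviation narrows the interval at any fixed n.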
• What clinical implications can be derived from it?
  - The upper and lower limits provide a way of assessing whether the results are clinically important.
• Does it include any values of particular interest?
  - We can check whether a hypothesized value for the population parameter falls within the confidence interval.
  - If so, then our results are consistent with this hypothesized value.
  - If not, then it is unlikely (for a 95% confidence interval, the chance is at most 5%) that the parameter has this value.

Degrees of freedom
• You will come across the term 'degrees of freedom' in statistics.
• In general they can be calculated as the sample size minus the number of constraints in a particular calculation; these constraints may be the parameters that have to be estimated.
• As a simple illustration, consider a set of three numbers which add up to a particular total (T).
• Two of the numbers are 'free' to take any value but the remaining number is fixed by the constraint imposed by T.
• Therefore the numbers have two degrees of freedom.
• Similarly, the degrees of freedom of the sample variance,
  s² = Σ(x - x̄)² / (n - 1),
  are the sample size minus one, because we have to calculate the sample mean (x̄), an estimate of the population mean, in order to evaluate s².

Applying the relative deviate formula for confidence interval of the differences
Relative Deviate (RD) = (Mean 1 – Mean 2) / standard error of the differences
• For the age difference between Group 1 and Group 2 in Table 1:
  (51.0 - 43.76) / 4.30 = 1.68
  Since the result is less than 1.96, the difference is not statistically significant.
• For the gender difference between Group 1 and Group 2 in Table 1:
  (0.80 - 0.72) / 0.12 = 0.67
  Since the result is less than 1.96, the difference is not statistically significant.
• For the age difference between males and females in the diyabet.sav data set:
  (56.20 - 52.31) / 1.199 = 3.24
  Since the result is higher than 1.96, the difference is statistically significant.

Summary
Confidence intervals
● Confidence interval for the mean
  - Using the Normal distribution
  - Using the t-distribution
● Confidence interval for the proportion
● Confidence interval for the differences
  - for numerical data
  - for proportions
● Confidence intervals for the odds ratio
● Interpretation of confidence intervals
● Degrees of freedom = df
● Applying the relative deviate formula for confidence interval of the differences
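The standard errors of differences, the confidence intervals for differences and the relative deviates for the Table 1 data can be sketched in Python as follows. All function names are hypothetical, and the multipliers (2.009 from the t-table, 1.96 from the Normal distribution) are taken from the lecture rather than computed. Standard errors are kept unrounded here, so the second decimal place can differ slightly from the hand calculations.

```python
import math


def se_diff_means(s1, n1, s2, n2):
    """Standard error of the difference between two means."""
    return math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)


def se_diff_proportions(p1, n1, p2, n2):
    """Standard error of the difference between two proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


def relative_deviate(value1, value2, se_diff):
    """(value1 - value2) / standard error of the difference."""
    return (value1 - value2) / se_diff


def difference_interval(value1, value2, se_diff, multiplier):
    """Confidence limits for value1 - value2."""
    diff = value1 - value2
    return diff - multiplier * se_diff, diff + multiplier * se_diff


# Age, Group 1 vs Group 2 (Table 1); 2.009 is the tabulated t used in the lecture
se_age = se_diff_means(15.39, 25, 15.07, 25)
rd_age = relative_deviate(51.0, 43.76, se_age)
ci_age = difference_interval(51.0, 43.76, se_age, 2.009)

# Proportion of women, Group 1 (20 of 25) vs Group 2 (18 of 25)
se_sex = se_diff_proportions(20 / 25, 25, 18 / 25, 25)
rd_sex = relative_deviate(20 / 25, 18 / 25, se_sex)
ci_sex = difference_interval(20 / 25, 18 / 25, se_sex, 1.96)

for label, rd, ci in (("age", rd_age, ci_age), ("sex", rd_sex, ci_sex)):
    significant = abs(rd) > 1.96
    print(f"{label}: RD = {rd:.2f}, 95% CI = [{ci[0]:.2f}, {ci[1]:.2f}], "
          f"significant: {significant}")
```

In both comparisons the relative deviate is below 1.96 and the interval contains 0, so the two ways of judging significance agree.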