Students' Scientific Research Center, Tehran University of Medical Sciences
Statistics and SPSS Workshop

Statistics
Arsia Jamali, Medicine & MPH Student
Students' Scientific Research Center, Tehran University of Medical Sciences
6/23/2009

Outline
• Descriptive Statistics
• Analytical Statistics
• Statistical Tests

Descriptive Statistics Outline
• Overview
• Variables
• Data Presentation
• Exercise

Types of Statistics
• Descriptive statistics: used to describe the characteristics of our sample
• Inferential (analytical) statistics: used to generalise from our sample to our population

Variables
• Definition
• Types:
  Qualitative
    Nominal: blood group
    Ordinal: staging
  Quantitative
    Interval: temperature in °C
    Ratio: temperature in K

Presentation of Data
• Frequency tables
• Graphical techniques
• Measures of central tendency
• Measures of spread (variability)

Charts
[Figures: pie chart and bar chart, "Distribution of Stage of the Pancreatic Cancer in Patients" (stages I to IV)]
[Figures: histogram; area chart]
[Figures: box plot; error bar]
[Figures: clustered bar chart, "Distribution of Stage of the Pancreatic Cancer According to Age in Patients" (number of patients by stage, male and female); scatter plot of birth weight against gestational week]

Charts
• Pie chart (qualitative variables)
• Bar chart (qualitative variables)
• Histogram (quantitative variables)
• Area (polygon) (quantitative variables)
• Clustered bar (two variables)
• Box plot (two variables)
• Error bar (two variables)
• Scatter plot (two variables)

Significance of Cluster Bar
[Figures: bar charts "Distribution of Stage in Pancreatic Cancer Patients" (stages I to IV) and "Distribution of Gender in Pancreatic Cancer Patients" (male, female), both in percent]
[Figure: clustered bar chart of the percentage of male and female patients in stages I to IV]

Measures of Central Tendency
• Mean (average)
• Median
• Mode

Mean
• µ = ∑X / N
• Takes into account all values
• If describing a population, denoted µ ("mu")
• If describing a sample, denoted x̄ ("x-bar")
• Appropriate for describing measurement data
• Seriously affected by outliers

Median
• The 50th percentile
• Appropriate for describing measurement data
• Not affected much by unusual values
• Calculation differs for odd and even samples: the middle value when n is odd, the mean of the two middle values when n is even

Mode
• The most frequent value
• One data set can have many modes
• Appropriate for all types of data, but most useful for categorical data or discrete data with only a few possible values
• Unaffected by extreme scores
• Not useful when several values occur equally often in a set

Exercise
4, 6, 3, 7, 5, 7, 8, 4, 5, 10, 10, 6, 8, 9, 3, 5, 6, 4, 11, 6
Mode = 6
Median = 6
Mean = 6.35

Note
The mean is the preferred measure of central tendency, except when there are:
• Extreme scores or skewed distributions
• Non-interval data
• Discrete variables

Measures of Spread (Variability)
• Range = highest − lowest
• Variance
• Standard deviation (SD)
• Coefficient of variation (CV) = SD / mean × 100%

CV Example
Consider the following:
• The mean body weight in our sample is 70 and its SD is 5.
• The mean height in our sample is 165 and its SD is 8.
Which variable is less variable?
CV(weight) = 5/70 = 7.1% and CV(height) = 8/165 = 4.8%, so height is the less variable of the two.

Real Examples
• Article 1 (Table)
• Article 2 (Table & Graph)
• Your Turn!
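The exercise and the CV example above can be checked with a few lines of Python. This is only a sketch using the standard library; the variable names and the helper `coefficient_of_variation` are illustrative and do not come from the slides.

```python
from statistics import mean, median, multimode

# The 20 values listed on the Exercise slide.
scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10, 10, 6, 8, 9, 3, 5, 6, 4, 11, 6]

def coefficient_of_variation(mean_value, sd):
    """CV in percent: the SD expressed relative to the mean (hypothetical helper)."""
    return 100 * sd / mean_value

modes = multimode(scores)      # list of all most-frequent values
middle = median(scores)        # mean of the two middle values, since n is even
average = mean(scores)         # sum of the values divided by n

cv_weight = coefficient_of_variation(70, 5)    # body weight: mean 70, SD 5
cv_height = coefficient_of_variation(165, 8)   # height: mean 165, SD 8

print(f"mode={modes}, median={middle}, mean={average:.2f}")
print(f"CV weight={cv_weight:.1f}%, CV height={cv_height:.1f}%")
```

Because the CV is unit-free, it allows weight (in kg) and height (in cm) to be compared directly, which the raw SDs do not.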
Analytical Statistics

Frequency Distribution
[Figure: frequency distribution of heads and tails in 100 coin tosses]

Characteristics of a Normal Distribution (1)
• Bell shaped
• Symmetric
• Unimodal
• Mean = mode = median
• Extends to ±infinity
• Area under the curve = 1

Characteristics of a Normal Distribution (2)
Can be completely specified by two parameters:
• Mean (µ)
• Standard deviation (σ)
The mean µ controls the centre and σ controls the spread.
[Figures: two frequency distribution curves]

Standard Normal Distribution
• The standard normal distribution has mean µ = 0 and standard deviation σ = 1

Characteristics of a Normal Distribution (3)
For any normal curve with mean µ and standard deviation σ:
• About 68 percent of the observations fall within one standard deviation of the mean (µ ± 1σ)
• About 95 percent of the observations fall within two standard deviations (µ ± 2σ)
• About 99.7 percent of the observations fall within three standard deviations (µ ± 3σ)

Z Transformation
[Figures: series of frequency curves illustrating the Z transformation, centred at 0]

Z Transformation
• Z = (Xi − µ) / σ
• Application
• Use of the Z table

Question
• Ali's score is 17 in math and 15 in science. Is he better at math or science? Is this information enough?
  Means: math 16, science 14. SDs: math 1, science 0.5.
  Z(math) = (17 − 16)/1 = 1 and Z(science) = (15 − 14)/0.5 = 2, so relative to the group he is better at science.
• Consider that the mean systolic BP in humans is 120 mmHg and the standard deviation of the population is 10 mmHg. In what percentage of people is systolic BP less than 134 mmHg?
  Z = (134 − 120)/10 = 1.40, so we need P(Z < 1.40).
  Answer = 0.5 + P(0 < Z < 1.40) = 0.5 + 0.4192 = 0.9192, i.e. about 92%.

Distribution of Means

Central Limit Theorem
Regardless of the distribution of the population, the distribution of the means of random samples approaches a normal distribution for a large sample size.
• Z = (x̄ − µ) / (σ/√n)
• SEM = σ/√n
[Figure: mean distribution chart]

Estimation
• In many situations, conducting a census is very difficult and expensive.
• In practice it is not possible to repeat a study many times.
• So what should we do? We may estimate.

Confidence Interval (CI)
• An example: the mean height in a sample is 175 cm and the standard error of the mean is 5 cm. Can we guess the mean of the population?
• With 95% confidence, the mean of the population lies in the interval 175 ± 1.96 × 5, i.e. approximately 165 to 185 cm.
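The Z-transformation questions and the confidence-interval example above can be reproduced without a Z table. The sketch below assumes Python 3.8 or later (for `statistics.NormalDist`); the helper names `z_score` and `confidence_interval` are illustrative, not part of the workshop material.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, SD 1

def z_score(x, mu, sigma):
    """Z = (x - mu) / sigma (hypothetical helper)."""
    return (x - mu) / sigma

def confidence_interval(sample_mean, sem, level=0.95):
    """Normal-based CI: sample_mean +/- z * SEM (hypothetical helper)."""
    z = std_normal.inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    return sample_mean - z * sem, sample_mean + z * sem

# Ali's scores relative to the group means and SDs.
z_math = z_score(17, 16, 1)
z_science = z_score(15, 14, 0.5)

# Proportion of people with systolic BP below 134 mmHg (mean 120, SD 10).
p_below_134 = std_normal.cdf(z_score(134, 120, 10))

# 95% CI for a sample mean of 175 cm with SEM 5 cm.
ci_low, ci_high = confidence_interval(175, 5)

print(f"z_math={z_math}, z_science={z_science}")
print(f"P(BP < 134) = {p_below_134:.4f}")
print(f"95% CI = ({ci_low:.1f}, {ci_high:.1f})")
```

Using `cdf` replaces the table lookup of P(0 < Z < 1.40) plus 0.5, and `inv_cdf` supplies the 1.96 multiplier instead of hard-coding it.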
Confidence Interval (CI)
• Parameters: interval limits and probability (confidence level)
• In applied practice, confidence intervals are typically stated at the 95% confidence level
• Remember that there is a 5% probability that the mean of the population does not lie in this interval

Hypothesis
In each problem considered, the question of interest is simplified into two competing claims (hypotheses) between which we have a choice: the null hypothesis against the alternative hypothesis.

Hypothesis Example
Consider that the goal of a study is to show: our drug is more effective in reducing pain than morphine.
So the hypotheses would be:
• H0: Our drug is not more effective than morphine in relieving pain
• H1: Our drug is more effective than morphine in relieving pain

Testing the Hypothesis
• Null hypothesis (H0): the hypothesis to be tested; it states that no difference is seen
• Alternative hypothesis (H1 = Ha): the hypothesis to be considered as an alternative to the null hypothesis
• Note: the alternative hypothesis is the one believed to be true, or what you are trying to prove is true

Types of Hypothesis Testing (1)
• Two-tailed test
• Example: the mean height of boys and girls does not differ.
  H0: height(boys) = height(girls)
  H1: height(boys) ≠ height(girls)

Types of Hypothesis Testing (2)
• Left-sided test
• Example: the plasma albumin level in cirrhotic patients is lower than 3.5 g/dL.
  H0: albumin(patients) ≥ 3.5
  H1: albumin(patients) < 3.5

Types of Hypothesis Testing (3)
• Right-sided test
• Example: the systolic blood pressure of diabetic patients is higher than 120 mmHg.
  H0: SBP(patients) ≤ 120
  H1: SBP(patients) > 120
• Note: a hypothesis test is called a one-tailed (directional) test if it is either right- or left-tailed

Results of Hypothesis Testing
The possible conclusions from a hypothesis test are "reject H0" or "fail to reject H0". But we may make mistakes in the test:
• Type I error: rejecting the null hypothesis when it is in fact true; that is, H0 is wrongly rejected. The probability of a type I error is denoted by α.
• Type II error: failing to reject the null hypothesis when it is in fact false. The probability of a type II error is denoted by β.

Types of Errors
                      H0 is true          H0 is false
Reject H0             Type I error (α)    Correct (power, 1 − β)
Fail to reject H0     Correct             Type II error (β)

Type I Error Example
In a clinical trial of a new drug, the null hypothesis might be that the new drug is no better, on average, than the current drug; H0: there is no difference between the two drugs on average. A type I error would occur if we concluded that the two drugs produced different effects when in fact there was no difference between them.

Type I Error
• A type I error is often considered to be more serious, and therefore more important to avoid, than a type II error.
• The hypothesis test procedure is therefore adjusted so that there is a guaranteed low probability of rejecting the null hypothesis wrongly; this probability is never 0.
• This probability of a type I error is fixed in advance: P(type I error) = significance level = α

Type II Error
A type II error occurs when the null hypothesis H0 is not rejected when it is in fact false. For example, in a clinical trial of a new drug, the null hypothesis might be that the new drug is no better, on average, than the current drug. A type II error would occur if it was concluded that the two drugs produced the same effect, i.e. there is no difference between the two drugs on average, when in fact they produced different ones.
P(type II error) = β

Types of Errors
• If we do not reject the null hypothesis, it may still be false (a type II error), as the sample may not be big enough to identify the falseness of the null hypothesis (especially if the truth is very close to the hypothesis).
• For any given set of data, type I and type II errors are inversely related: the smaller the risk of one, the higher the risk of the other.

Power
• Measures the test's ability to reject the null hypothesis when it is actually false
• In other words, the power of a hypothesis test is the probability of not committing a type II error. It is calculated by subtracting the probability of a type II error from 1
• Power = 1 − P(type II error) = 1 − β
• The maximum power a test can have is 1, the minimum is 0. Ideally we want high power, close to 1.

Interrelationship between α, β, and n
[Figure: interrelationship between α, β, and n]

Sample Size
• How many individuals will I need to study?
• An appropriate sample size generally depends on five study design parameters:
  Minimum expected difference (also known as the effect size)
  Measurement variability
  Statistical power
  Significance level
  One- or two-tailed analysis
Ref: Eng J. Sample size estimation: how many individuals should be studied? Radiology. 2003;227:309-13

Sample Sizes For Comparative Studies (1)
N = 4σ²(Zcrit + Zpwr)² / D²
• N is the total sample size (the sum of the sizes of both comparison groups)
• σ is the assumed SD of each group (assumed to be equal for both groups)
• Zcrit is the value for the chosen significance criterion
• Zpwr is the value for the chosen statistical power
• D is the minimum expected difference between the two means
Ref: Eng J. Radiology. 2003;227:309-13

Sample Sizes For Comparative Studies (2)
Ref: Eng J. Radiology. 2003;227:309-13

Sample Sizes For Descriptive Studies (1)
N = 4σ²(Zcrit)² / D²
• N is the sample size of the single study group
• σ is the assumed SD for the group
• Zcrit is the value for the chosen significance criterion
• D is the total width of the expected CI
• Note: this equation does not depend on statistical power because this concept only applies to statistical comparisons
Ref: Eng J. Radiology. 2003;227:309-13

Sample Sizes For Descriptive Studies (2)
Ref: Eng J. Radiology. 2003;227:309-13

P value
• Suppose we want to investigate whether the SBP of diabetic patients is higher than that of the general population. We know that the mean SBP in the general population is 120 mmHg and its standard deviation is 10 mmHg.
• In our study we find that the mean SBP in our 25 diabetic patients is 125 mmHg.
• What can we conclude?
P value
• Z = (125 − 120) / (10/√25) = 5/2 = 2.5
• α = 0.05
• P value = P(Z > 2.5) = 0.006 (one-sided)
• Because 0.006 < 0.05, H0 is rejected.

P value
• P value: the probability of getting a value of the test statistic as extreme as or more extreme than that observed by chance alone, if the null hypothesis H0 is true.
• Rejecting H0 only when the P value is below α keeps the probability of wrongly rejecting a true null hypothesis at α.
• The P value is compared with the significance level of our test and, if it is smaller, the result is significant.

P value
• If the P value is 0.03, there is a 3% chance of observing a difference as large as the one you observed even if the two population means are identical (the null hypothesis is true).
• It does not show a causal association.

P value
• Another example: in a study, the heights of 25 girls and 25 boys were measured. The mean height of the girls was 170 cm and the mean height of the boys was 175 cm. We want to see whether the difference is significant or was seen by chance.

P value
• The 95% CIs overlap, so H0 (the no-difference hypothesis) is not rejected. Remember that it is not accepted either.

P value
• The 95% CIs do not overlap, so H0 (the no-difference hypothesis) is rejected. The difference is therefore unlikely to be due to chance alone.
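The systolic blood pressure example from the P value slides can be recomputed as a one-sample, right-sided Z test. This is a minimal sketch assuming a known population SD and Python 3.8 or later; the function name `one_sided_z_test` is illustrative and not taken from the slides.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_z_test(sample_mean, pop_mean, pop_sd, n):
    """Right-sided one-sample Z test with known population SD (hypothetical helper).

    Returns the Z statistic and the one-sided P value P(Z > z).
    """
    sem = pop_sd / sqrt(n)                 # standard error of the mean
    z = (sample_mean - pop_mean) / sem     # test statistic
    p_value = 1 - NormalDist().cdf(z)      # upper-tail probability under H0
    return z, p_value

# 25 diabetic patients, mean SBP 125 mmHg; population mean 120, SD 10.
z, p_value = one_sided_z_test(125, 120, 10, 25)
alpha = 0.05
reject_h0 = p_value < alpha

print(f"Z = {z:.2f}, P = {p_value:.4f}, reject H0: {reject_h0}")
```

For a two-tailed hypothesis the upper-tail probability would be doubled, which is why the choice between a one- and two-tailed test has to be made before looking at the data.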
Measures for Comparing Rates
• Odds Ratio (OR)
• Relative Risk (RR)

OR
• Odds definition: probability of A divided by probability of not-A, P(A) / (1 − P(A))
• OR: exposure odds in the disease group divided by exposure odds in the non-disease group
• Used in case-control studies
• Interpretation:
  OR = 1: no association between exposure and disease
  OR > 1: exposure is associated with higher odds of disease
  OR < 1: exposure is associated with lower odds of disease

                 Lung Cancer    Healthy    Total
Smoking              500          1800      2300
Not smoking          500          7200      7700
Total               1000          9000     10000

Exposure odds in cases: 500:500 = 1
Exposure odds in controls: 1800:7200 = 1/4
OR = (500:500) / (1800:7200) = 4

                 Disease+    Control
Exposure+           a           b
Exposure−           c           d
Exposure odds      a/c         b/d
Odds ratio = ad/bc

RR
• Risk of the disease in the exposed group divided by the risk of the disease in the non-exposed group
• Used in cohort studies
• Interpretation:
  RR = 1: the risk is the same in both groups
  RR > 1: the risk is higher in the exposed group
  RR < 1: the risk is lower in the exposed group

                 Lung Cancer    Healthy    Total
Smoker              1000          4000      5000
Not smoker           100          4900      5000
Total               1100          8900     10000

Risk of cancer in smokers: 1000:5000 = 1/5
Risk of cancer in non-smokers: 100:5000 = 1/50
RR = (1000:5000) / (100:5000) = 10

                 Disease+    Control    Risk
Exposure+           a           b       a/(a+b)
Exposure−           c           d       c/(c+d)
Relative risk = {a/(a+b)} / {c/(c+d)}

Websites & Real Examples
Some websites:
• http://www.hutchon.net/ConfidORselect.htm
• http://www.metanumerics.net/Samples/ContingencyCalculator.aspx
Example 1: The Mom Song

Real Examples
• Your Turn!
• Female Driver

Statistical Tests
• Allow us to estimate the likelihood that the apparent differences between groups are real and not due to chance.
• Since there are two types of variables (qualitative and quantitative), we need two types of statistical tests.

T test
• Compares the means of two groups
• T test prerequisites:
  1) The two groups of samples should be independent
  2) Samples should have a normal distribution
     Note: samples larger than 30 do not need to have a normal distribution
  3) Samples should have similar standard deviations

Types of T Tests
1) Independent-samples: tests the relationship between two independent populations
2) Paired-samples: tests the relationship between two linked samples, e.g. means obtained in two conditions by a single group of participants
Example: comparing the SBP of a group of patients before and after administration of propranolol

ANOVA
[Figure: comparison of several groups labelled Ali, Nasim, Omid, Vahid, Akram]

Chi square Test (1)
Used for qualitative data, odds, and risk.
Chi square test prerequisites:
1) The variable must be in the form of actual counts, not proportions or percentages
2) The frequency data must have a precise numerical value and must be organized into categories or groups
3) The expected frequency in any one cell of the table must be greater than 5
4) The total number of observations must be greater than 20
5) Observations must be independent

An Example of Chi square Test
• Consider the following study: we have assessed the depression status and gender of 100 people. The results of the study are summarized in the following table:

                 Male    Female    Total
Depressed         10       20        30
Not depressed     40       30        70
Total             50       50       100

Chi square Test (2)
χ² = ∑ (Obs − Exp)² / Exp
• χ² = the value of chi square
• Obs = the observed value
• Exp = the expected value
• ∑ (Obs − Exp)² / Exp = each (Obs − Exp) squared, divided by Exp, then added together
• df = (R − 1)(C − 1)

An Example of Chi square Test
• If there is no difference between the two genders (in depressed/not depressed people), we expect the cells to be filled as follows (expected values in parentheses):

                 Male       Female     Total
Depressed        10 (15)    20 (15)      30
Not depressed    40 (35)    30 (35)      70
Total            50         50          100

An Example of Chi square Test
• χ² = (10 − 15)²/15 + (20 − 15)²/15 + (40 − 35)²/35 + (30 − 35)²/35 = 4.76
• Critical χ² with 1 df (at p = .05) = 3.84
• Reject H0: depression and gender are NOT independent; they are associated.

[Flowchart for choosing a test: Is your dependent variable (DV) continuous? (yes/no) Is your independent variable (IV) continuous? (yes/no) Do you have only two groups? (yes/no)]

The road to happiness lies in two simple principles: find what it is that interests you and that you can do well, and when you find it put your whole soul into it, every bit of energy and ambition and natural ability you have.
John D. Rockefeller III
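The depression-by-gender chi square example from the Statistical Tests section can be verified with a short, dependency-free sketch. The function name `chi_square` and the table layout (rows: depressed, not depressed; columns: male, female) are illustrative choices, and the critical value 3.84 is the one quoted on the slide for 1 df at p = .05.

```python
def chi_square(table):
    """Pearson chi square statistic and degrees of freedom for an R x C table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Rows: depressed, not depressed. Columns: male, female.
observed = [[10, 20],
            [40, 30]]

stat, df = chi_square(observed)
critical_value = 3.84  # chi square critical value for df = 1 at p = .05 (from the slide)
reject_h0 = stat > critical_value

print(f"chi2 = {stat:.2f}, df = {df}, reject H0: {reject_h0}")
```

The expected counts computed inside the loop (15, 15, 35, 35) match the values shown in parentheses in the example table.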