BME STATS WORKSHOP
Introduction to Statistics
Part 1 of workshop

The way to think about inferential statistics
• They are tools that allow us to make black-and-white statements even though the data do not clearly provide answers.
– That is, we work with probabilities, which speak in shades of grey, but we make statements in terms of rejecting or failing to reject some null hypothesis.

Drawing inferences from data analysis
• As scientists we have the unique privilege of using ingenious tools and methods that help us make informed decisions.
• One of those tools is statistical analysis. It allows us to determine more accurately what our data are telling us.
• This workshop should help you draw better conclusions from your data by using simple but effective statistical tools to cut through the shades of grey often encountered in research.

The Essence of Inferential Statistics
1. We compare a statistic obtained from acquired data to a theoretical distribution of that statistic. Thus, relativity is important in statistics.
• You will surely have conducted t-tests in the past to compare measures from a control group with an experimental group.
• That t value is evaluated against a distribution of ts.
• In statistics, size does matter. Large t values increase the likelihood that the investigator can claim a significant result.

Essence cont'd
2. Signal-to-noise ratio.
• Most statistics used in this workshop, such as the t statistic, are made up of differences due to treatment and differences due to individuals (also called error). Error is simply random variation.

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\text{standard error of the difference}}$$

Essence cont'd
3. Rare events
• This relates directly to point one.
• For a treatment to be declared successful, the obtained statistic has to be sufficiently rare.
• We will find that large statistical values are considered rare.
• For a better understanding of these points we will describe a Monte Carlo experiment.

The Plan!
1. Constructing a distribution.
2. How to apply a statistic obtained from an experiment.
3. Interpretation of a result.
4. What does a significant result mean?

Constructing a Distribution: Some Definitions
• Sample distribution:
– A distribution of values from some measurement.
• The measurement can be of anything, such as height, weight or age, to name a few.
• Sampling distribution:
– A distribution of a statistic obtained from a sample distribution.
• The statistic can be a mean, mode, median, variance or anything else that is calculated from individual measures.
• As we will see, the t statistic can be used to construct a sampling distribution.

Distributions
• Sample distributions are often bell shaped or normal, but this is not guaranteed. On occasion exponential, rectangular or odd-shaped distributions are observed.
• Sampling distributions, on the other hand, are almost always normally shaped. This is true even if the measurements used to calculate the statistic come from non-normal distributions.

How to construct a sampling distribution of the t statistic: an example under the null hypothesis of equal means
• We first need a sample distribution of some measure from a population with specific parameters, such as 25-year-old women. The measurement of interest could be height.
• We then randomly sample from this distribution to make up two groups of a specified sample size.
– Ex. two groups of ten individuals.
• From these two groups a t value is calculated. This t value is then plotted.
• After this calculation, the individuals are returned to the sample distribution.
• The process of "sampling" with replacement is repeated as many times as possible. Using computers you might opt for 1000 or more samplings. Thus, you would have a sampling distribution of 1000 ts.
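The procedure just described can be sketched in a few lines of code. A minimal sketch in Python with NumPy (the population parameters are made up for illustration; the workshop itself does this conceptually, not in code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative population: heights of 25-year-old women (made-up parameters).
population = rng.normal(loc=165, scale=7, size=100_000)

n = 10          # individuals per group
n_reps = 1000   # number of resamplings

ts = np.empty(n_reps)
for i in range(n_reps):
    # Sample two groups (with replacement) from the SAME population,
    # so the null hypothesis of equal means is true by construction.
    g1 = rng.choice(population, size=n, replace=True)
    g2 = rng.choice(population, size=n, replace=True)
    # t = mean difference / standard error of the difference
    se = np.sqrt(g1.var(ddof=1) / n + g2.var(ddof=1) / n)
    ts[i] = (g1.mean() - g2.mean()) / se

# 'ts' now holds a sampling distribution of t built under the null.
print(np.percentile(ts, [2.5, 97.5]))  # empirical two-tailed 5% cutoffs
```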
How to use a sampling distribution of ts
• In any sampling distribution there are a number of values that are extreme. This is normal, and we will use this fact to make decisions about our experiments.
• Traditionally, we determine the t value beyond which lie 5% of all values in that distribution. If we are concerned with both tails of the distribution, we find the value beyond which lie 2.5% of values on the positive tail and 2.5% on the negative tail.

How to use cont'd
• We then conduct an experiment in which we have a control group and an experimental group.
• We calculate a t statistic from this experiment.
• This t value is evaluated against the sampling distribution of ts we have constructed.
• If our obtained value is greater than the value from the distribution that marks the 5% cutoff, we state that the experiment produced a significant result. In other words, the control group was significantly different from the experimental group.

[Figure: a t distribution with the rejection regions ("Sig.") in both tails and the non-significant region ("Not Sig.") in the middle.]

Some specifics about using a t distribution: what does stating significance really mean?
• First of all, when we find a t value that lies outside the critical values of a distribution, we should really start by saying, "the obtained value would be rare if calculated from two groups drawn from the same population."
• We would then follow up with, "Since that value is rare and was obtained from an experiment, it is reasonable to conclude that the groups do not come from the same population."
– This amounts to saying that the treatment was effective. Thus, we have a significant result.

Monte Carlo: how will building a distribution help us understand statistics?

Monte Carlo: building a t distribution
How do you build a distribution of a statistic, in this case t?
1) You start with a population of interest.
2) Calculate means from two samples, each with a specific number of individuals.
3) Calculate the t statistic using those two samples.
4) Do this again and again, possibly 1000 times or more.
Remember that these distributions are built under the null hypothesis.
[Figure: flowchart — draw samples of size n1 and n2 from the population, compute the two sample means, calculate t, and repeat the process as often as you can.]

Family of ts
• The larger the sample size used, the less variability in the results. As we can see here, the greater the degrees of freedom (df), the less extreme the obtained values, resulting in a tighter distribution.
• Note: degrees of freedom for the t-test are calculated as n1 + n2 − 2. Thus, for a sample size of 10 per group the dfs are 18.

Theoretical distribution of ts
• We use this table to determine the critical values. The computer uses the density functions.

Variables
• Independent variable:
– The variable you manipulate.
• Subjects are allocated to groups.
• Dependent variable:
– The variable which depends on the manipulation.
• Measures such as weight or height or some other quantity that varies depending on treatment.

Cause and effect
• Cause can only be inferred when subjects are randomly allocated to groups.
– Random allocation ensures that all characteristics are evenly distributed across the groups.
– This way, differences between groups cannot be due to biases in subject selection, a very important element of experimental design.
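Before moving to a worked example: the critical-value lookup described above (a table for us, density functions for the computer) is a one-liner in most statistics software. A minimal sketch assuming Python with SciPy:

```python
from scipy import stats

# Two-tailed 5% critical values: the point below which 97.5% of all
# t values fall, leaving 2.5% in each tail.
for df in (10, 18):
    crit = stats.t.ppf(0.975, df)
    print(f"df={df}: reject the null if |t| > {crit:.3f}")
# Prints 2.228 for df=10 and 2.101 for df=18, matching standard t tables.
```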
An example of data analysis: comparing reaction time following alcohol consumption
• University males were recruited to participate in an experiment in which they consumed a specific amount of alcohol.
• The males were randomly separated into two groups. One group consumed the alcohol and the other a non-alcoholic drink.
• Ten minutes after the second drink was consumed, the subjects were asked to push a button on a box the moment they heard a buzzer.
• When the button was pushed, the buzzer stopped. The investigator recorded the amount of time the buzzer sounded, in milliseconds.

Hypotheses
• We state hypotheses in terms of populations. That is, we are making statements about what we think exists in the real world. From our sample we will reject or fail to reject the null hypothesis.
• Here we have a situation in which we are predicting differences only. This is a nondirectional hypothesis.

$$H_0: \mu_c = \mu_a \qquad H_1: \mu_c \neq \mu_a$$

The data (time in ms)

Control:        150  110  200  135   90  111
Alcohol group:  200  250  220  225  250  234

Results from an output provided by SPSS
• The probability of a Type 1 error is provided inside the red box, added by myself (not SPSS). Commonly, investigators call this the significance level; it should be noted that statisticians would not label that value as such.

Critical values
• A critical value is the value in a theoretical distribution that marks the point beyond which less than a specific percentage of values can be found.
– We typically use 5%.
• In our example we have 12 scores from 12 individuals, thus 10 degrees of freedom.
– From the distribution of all ts we can determine how large a calculated t from our experiment must be for us to reject the null hypothesis of equal means.
– That value (see the table shown previously) is 2.228.
– Our obtained t (−5.465) is larger in magnitude than the critical value, so we reject the null hypothesis in favour of the alternative.
– You will notice that the t value is negative for our experiment. What matters is the magnitude, not the sign: if we reversed the groups in our calculations, the value would be positive.

Interpretation of the results
• Alcohol increases the amount of time needed to turn off the buzzer, suggesting that the subjects' reactions are impaired.
• We can make this statement because the t value obtained here would be rare if the samples came from the same population. For that reason, we give ourselves permission to reject the null hypothesis of equal means in the population.

Some important concepts: the standard deviation
• The concepts of variance and standard deviation (SD) are everything in statistics.
• They are used to determine whether individuals or samples fall inside or outside the normal range.
• Anyone who is more than 1.96 SD away from the population mean on some measure is said not to belong to that population. However, this is only true when we have population parameters (more on this later).

A few formulas to help us along

Variance: $S^2 = \dfrac{\sum (X - \bar{X})^2}{n - 1}$

Standard deviation: $SD = \sqrt{S^2}$

Standard error of the mean: $SEM = \dfrac{SD}{\sqrt{n}}$

Variability is important
• The greater the variability, the greater the noise. Note that with greater variability in the data, more overlap of the sample distributions is observed.
• This results in smaller signal-to-noise ratios.
• Thus, when we have more variability we will need larger sample sizes to detect mean differences (more on this later). Keep this in mind when reviewing the upcoming slides.
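For readers who want to check the reaction-time example by hand, the whole test takes a couple of lines in Python with SciPy (the data are the twelve scores listed above):

```python
from scipy import stats

control = [150, 110, 200, 135, 90, 111]   # buzzer time in ms
alcohol = [200, 250, 220, 225, 250, 234]

# Pooled-variance two-sample t-test (SciPy's default, equal_var=True).
t, p = stats.ttest_ind(control, alcohol)
print(t, p)  # t ≈ -5.465 on 10 df; |t| > 2.228, so we reject H0
```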
T-Test
• Two-sample t-test: comparing two sample means.

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$

It is evident from the formula that the smaller the variability, the larger the t value.

Hypothesis testing revisited
• We always determine whether or not a statistic is rare given the null hypothesis, never the alternate hypothesis. You might remember this from the Monte Carlo studies.
• Thus we have to deal with the concepts of the Type 1 and the Type 2 error.

Type 1 error
• The probability of being wrong when stating that samples are from different populations.
• This is the p < .05 that we use to reject the null hypothesis of equal means in the population.
– If we have a p of .02, it means that the probability of being wrong when stating that the two samples come from different populations is .02.
– The .05 is a cutoff that is considered acceptable.

Type 2 error
• The probability of failing to reject the null hypothesis when the null is not true.
• In truth, the samples are most likely from different populations. Often we simply don't have enough power, or the tools are not sensitive enough, to detect these differences.

Assumptions of a distribution: what are they and why are they important?

Assumptions are rules
• They are the rules by which distributions are constructed.
• These rules must be followed in order for a statistic obtained from an experiment to be compared to the theoretical distribution.
• If your experiment breaks these rules, you may be either too conservative or too liberal when making a statement about the reality of the population.

Assumptions
1. Samples come from a normally distributed population.
2. Both samples have equal variances (homogeneity of variance).
3. Samples are made up of randomly selected individuals.
4. Both samples should be of equal sample size.

What to do when we violate assumptions
1. We can transform the data so that the sample has the desired characteristics.
2. We can use distribution-free statistics.
– These statistics are insensitive to violations of assumptions.
• However, they do have limitations (more in later sessions).
(A code sketch of these assumption checks appears at the end of this part.)

Part 2 of workshop
Starting out with PASW (formerly SPSS but now SPSS again): an introduction

What is SPSS?
• It is the "Statistical Package for the Social Sciences".
• It started life as a text-driven program (SPSSx), migrated to the PC as line code and finally made it to the Windows environment. This is the version we enjoy today.

Do you need the latest version?
• No.
• With each new version there are graphical changes and, on occasion, additional statistical tools.
– However, the basics do not change. An analysis of variance conducted with version 10 will produce the same results as one conducted with version 19 (the latest at the time of this workshop).

Latest version cont'd
• One problem is with the output of different versions.
– Older versions of SPSS cannot read the output of newer versions; the outputs are not backward compatible.
– One way around this is to use the export function in the newer versions to save the output as PDF, DOC, or PPT so that the results can be read.

Getting started
• If you've used Excel in the past, then you have a base from which to work.
• SPSS uses a worksheet that is similar but not identical to Excel.
– However, the similarities end there.
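As promised above, here is a minimal sketch of the assumption checks and a distribution-free fallback, in Python with SciPy and reusing the reaction-time data (the choice of these particular tests is mine, not the workshop's):

```python
from scipy import stats

control = [150, 110, 200, 135, 90, 111]
alcohol = [200, 250, 220, 225, 250, 234]

# Assumption 1: normality of each sample (Shapiro-Wilk test).
print(stats.shapiro(control))
print(stats.shapiro(alcohol))

# Assumption 2: homogeneity of variance (Levene's test).
print(stats.levene(control, alcohol))

# A distribution-free alternative if the assumptions look doubtful.
print(stats.mannwhitneyu(control, alcohol))
```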
Learning curve
• If you use SPSS on a regular basis, you should be somewhat proficient in a week or two.
– Developing an expertise will take somewhat longer, depending on your interest and statistics knowledge.
– Let's get started!

This is what you see when you start the program. In front of you is the worksheet in the "data view". You enter all your data in the worksheet. You also have the option of "variable view" by clicking on the tab below or by clicking on the column heading "var".

The variable view is where you write down the name of your variable (variable name). Also in this view you have the option of providing variable labels and other descriptors that can help you recognize your data. Name your variable.

Let's start with a short review of variables.
• Independent variable (IV): the variable which is manipulated.
• Dependent variable (DV): the variable whose measures depend on some manipulation.
• Any experiment can have more than one IV or DV.
• These variables have to be set up correctly in the worksheet in order to properly analyze the data.

Let's say the study is designed to determine if a certain drug facilitates weight loss.
• We will need an independent variable, say Drug Type.
– We could have two groups based on drug treatment:
• Drug 1
• Drug 2
• We will also need a dependent variable, say weight.
– In the worksheet we will indicate the weight of each individual after being on the drug for a period of time.

Entering data
• We simply click on an empty cell and begin typing as appropriate. Shown here are the designations for group membership for the IV in our fictitious two-group experiment.
• Back in the variable view, we change the variable name and add a label which will help us remember what that variable means for future reference. The variable label is also the text that will be printed on the output following an analysis.
• Clicking on the empty square under Values allows the user to specify group names: each number value is assigned a label by the user.
• On returning to the worksheet, the group labels and the variable name you specified replace the default labels. We then add the dependent variable (DV) with its data alongside the IV.

Some descriptive statistics
• PASW easily allows us to produce descriptive statistics:
– Mean
– Standard deviation
– Standard error
– Median
– Etc.
• You conduct all analyses from the Analyze menu. Here we ask PASW to show descriptive statistics using the Means sub-option. Many options for descriptive statistics are available.
• The relevant output table is shown here. Note that the statistics requested in the earlier slide are displayed in this table.

Report (dependent variable: weight; independent variable: drug group)

Group    Mean      N    Std. Deviation   Std. Error of Mean
Drug 1   48.3000   10   7.95892          2.51683
Drug 2   80.8000   10   9.17484          2.90134
Total    64.5500   20   18.65046         4.17037

Graphs
• Graphs can be constructed from a number of options. You may wish to use the Chart Builder option, but users who are familiar with older versions of this program sometimes find it a difficult change. I like the Legacy option, which retains the old method.
• In the next slide we will see a graph using the error-bar option. Here we have the 95% confidence intervals, but typically you would want the error bars to represent one standard error.

Finally, an analysis
• We will conduct a two-sample independent t-test.
• Here we specify the test of means in the Compare Means option.
• You must indicate which groups will be compared, using the numbers assigned to the groups.
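As a cross-check on the output shown next, the same t-test can be reproduced from the Report table's summary values alone. A sketch assuming SciPy's ttest_ind_from_stats:

```python
from scipy import stats

# Mean, SD and n per group, read off the Report table above.
res = stats.ttest_ind_from_stats(mean1=48.3, std1=7.95892, nobs1=10,
                                 mean2=80.8, std2=9.17484, nobs2=10,
                                 equal_var=True)
print(res)  # t ≈ -8.462 on 18 df, matching the SPSS output that follows
```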
Independent Samples Test (dependent variable: weight)
• Levene's test determines if the variance in one group is different from the other. This is an important assumption.

                              Levene's Test         t-test for Equality of Means
                              F       Sig.    t       df      Sig. (2-tailed)
Equal variances assumed       1.138   .300    -8.462  18      .000
Equal variances not assumed                   -8.462  17.648  .000

                              Mean Difference  Std. Error Difference  95% CI Lower  95% CI Upper
Equal variances assumed       -32.50000        3.84086                -40.56935     -24.43065
Equal variances not assumed   -32.50000        3.84086                -40.58090     -24.41910

• The results are significant. Sig. (2-tailed) is the Type 1 error.

Let's add a third group
• The same method used to build the database in the first place applies to adding a group.
• With the addition of a third group we will need to perform an analysis of variance (ANOVA) with post-hoc tests.

ANOVA (dependent variable: weight)

Source           Sum of Squares   df   Mean Square   F        Sig.
Between Groups   14581.400        2    7290.700      78.973   .000
Within Groups    2492.600         27   92.319
Total            17074.000       29

Multiple Comparisons (Tukey HSD; dependent variable: weight)

(I) group   (J) group   Mean Difference (I−J)   Std. Error   Sig.   95% CI Lower   95% CI Upper
Drug 1      Drug 2      -32.50000*              4.29694      .000   -43.1539       -21.8461
Drug 1      Drug 3      -53.60000*              4.29694      .000   -64.2539       -42.9461
Drug 2      Drug 1       32.50000*              4.29694      .000    21.8461        43.1539
Drug 2      Drug 3      -21.10000*              4.29694      .000   -31.7539       -10.4461
Drug 3      Drug 1       53.60000*              4.29694      .000    42.9461        64.2539
Drug 3      Drug 2       21.10000*              4.29694      .000    10.4461        31.7539
*. The mean difference is significant at the 0.05 level.

Significant results: interpretation
• The ANOVA indicates that there are differences between the groups.
• This result allows us to conduct a post-hoc Tukey test.
– All groups are considered different from one another, as shown by the observation that all comparisons are significant.
(A code sketch reproducing this kind of analysis appears at the end of this part.)

A graph of the results obtained from the Univariate sub-option is shown here. Adding a second IV will allow us to conduct an interaction analysis using the Univariate sub-option.

Tests of Between-Subjects Effects (dependent variable: weight)

Source            Type III Sum of Squares   df   Mean Square   F          Sig.
Corrected Model   14581.400 (a)             2    7290.700      78.973     .000
Intercept         177870.000                1    177870.000    1926.699   .000
IndependentVar    14581.400                 2    7290.700      78.973     .000
Error             2492.600                  27   92.319
Total             194944.000                30
Corrected Total   17074.000                 29
a. R Squared = .854 (Adjusted R Squared = .843)

• We observe a significant main effect for IV1 but not IV2. Also, there is no significant interaction between IV1 and IV2 on the dependent variable. See graph.

After all this you might want to explore the interaction
• You would run a simple main effects analysis, which can be done through a syntax window.
• You write a program. This was the norm when PASW was SPSSx: SPSS was text driven.
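For readers working outside SPSS, the one-way ANOVA and Tukey follow-up shown above can be run in a few lines. A sketch in Python with SciPy and statsmodels, using made-up data for the three drug groups (only the group means are chosen to resemble the output; everything else is hypothetical):

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical weights for three groups of 10; the real data live in
# the SPSS worksheet described earlier.
rng = np.random.default_rng(1)
drug1 = rng.normal(48, 9, 10)
drug2 = rng.normal(81, 9, 10)
drug3 = rng.normal(102, 9, 10)

print(stats.f_oneway(drug1, drug2, drug3))  # omnibus one-way ANOVA

weights = np.concatenate([drug1, drug2, drug3])
groups = ["Drug 1"] * 10 + ["Drug 2"] * 10 + ["Drug 3"] * 10
print(pairwise_tukeyhsd(weights, groups))   # Tukey HSD post-hoc test
```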
Syntax
• This program allows us to determine if there are differences on the dependent variable for one IV at each level (group) of another variable.

Results of the simple main effects analysis (next slide).

The default error term in MANOVA has been changed from WITHIN CELLS to WITHIN+RESIDUAL. Note that these are the same for all full factorial designs.

***************** Analysis of Variance *****************
30 cases accepted.
0 cases rejected because of out-of-range factor values.
0 cases rejected because of missing data.
6 non-empty cells.
1 design will be processed.

Tests of Significance for DependentVar using UNIQUE sums of squares

Source of Variation                         SS         DF   MS        F       Sig of F
WITHIN+RESIDUAL                             2469.20    24   102.88
INDEPENDENTVAR2 WITHIN INDEPENDENTVAR(1)    16.90      1    16.90     .16     .689
INDEPENDENTVAR2 WITHIN INDEPENDENTVAR(2)    6.40       1    6.40      .06     .805
INDEPENDENTVAR2 WITHIN INDEPENDENTVAR(3)    .10        1    .10       .00     .975
INDEPENDENTVAR                              14581.40   2    7290.70   70.86   .000
(Model)                                     14604.80   5    2920.96   28.39   .000
(Total)                                     17074.00   29   588.76

R-Squared = .855   Adjusted R-Squared = .825

Here is how you would set up a database for a repeated measures design
1. Arrange groups in columns so that group one has data in column 1, group 2 in column 2, and so on.
2. Specify the IV in PASW.
3. Define the groups by specifying which column belongs to which group.
4. Click on OK.

• Use the Repeated Measures option. Group data are in columns. Give the variable a name and indicate the number of groups (3 in this case). Click on Add to get this popup.

• The results are significant. We can say that there are mean differences between the groups, but we cannot say which pairs of groups differ.

Tests of Within-Subjects Effects (Measure: MEASURE_1)

Source        Correction           Type III SS   df      Mean Square   F         Sig.
Drug          Sphericity Assumed   5806.333      2       2903.167      102.344   .000
              Greenhouse-Geisser   5806.333      1.276   4549.675      102.344   .000
              Huynh-Feldt          5806.333      1.519   3821.923      102.344   .000
              Lower-bound          5806.333      1.000   5806.333      102.344   .000
Error(Drug)   Sphericity Assumed   283.667       10      28.367
              Greenhouse-Geisser   283.667       6.381   44.455
              Huynh-Feldt          283.667       7.596   37.344
              Lower-bound          283.667       5.000   56.733

Always interpret using the Greenhouse-Geisser row.

Source   Correction           Partial Eta Squared   Noncent. Parameter   Observed Power (a)
Drug     Sphericity Assumed   .953                  204.689              1.000
         Greenhouse-Geisser   .953                  130.613              1.000
         Huynh-Feldt          .953                  155.483              1.000
         Lower-bound          .953                  102.344              1.000
a. Computed using alpha = .05
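To close, the within-subjects ANOVA just shown can also be run outside SPSS. A minimal sketch assuming statsmodels' AnovaRM, with made-up long-format data (6 subjects, each measured under all three drugs). Note that AnovaRM reports only the sphericity-assumed F; the Greenhouse-Geisser correction recommended above would have to come from another package:

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical long-format data: one row per subject per condition.
data = pd.DataFrame({
    "subject": list(range(6)) * 3,
    "drug":    ["Drug 1"] * 6 + ["Drug 2"] * 6 + ["Drug 3"] * 6,
    "weight":  [48, 50, 43, 55, 47, 46,       # Drug 1
                78, 84, 80, 86, 79, 77,       # Drug 2
                100, 104, 99, 108, 101, 98],  # Drug 3
})

res = AnovaRM(data, depvar="weight", subject="subject",
              within=["drug"]).fit()
print(res)  # sphericity-assumed F test for the drug effect
```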