Statistics made simple
Dr. Jennifer Capers
Modified from Dr. Tammy Frank’s presentation, NOVA

Why do we need statistics?
• Example:
  – A chemical may increase the growth of an animal
  – It will be tested on houseflies
  – A colony of 20,000 houseflies is divided into 2 groups
  – Group 1 gets the chemical in its food
  – Group 2 gets a placebo in the same food

What comes next?
• 2 weeks later, take a random sample of 25 houseflies from each group and measure their wingspans
• What are the results?

Housefly results
• 25 houseflies from each group
• Group 1 (with chemical): mean wingspan 7.5 mm
• Group 2 (without chemical): mean wingspan 7.2 mm
• What does this mean?
• Are group 1 flies really bigger?
• Some might say yes, some might say no
• Did you, by chance, happen to pick some larger flies from group 1 (or smaller ones from group 2)?
• Was there sampling error or bias?
• One way to be sure is to measure all 20,000 flies, but that is not feasible
• So what do we do?

Statistics
• You say the flies are bigger; I say they are not
• Statistics provide rules to help us find out
• Statistics help tell us whether these differences are significant (real)
• Were bigger flies picked for group 1 just by chance?
• Statistics tell us how likely it is that results like these would arise from sampling error or random chance alone

Significant Difference
• A real difference
• Due to the chemical, not to chance
• If the test shows that the probability of getting these results by chance or random error is ≤5%, we accept the claim that the chemical produced larger flies
• If the test shows that this probability is >5%, we reject the claim that the chemical produced larger flies
• 5% is an arbitrary cut-off point that is generally accepted
• However, if the cost of making an incorrect decision is very high, a stricter (lower) cut-off such as 1% is used
    » for example, in research on cancer drugs
• This probability value is the p-value
• The p-value measures how likely it would be to see a difference at least as large as the one in our data through sampling error or random chance alone, if the chemical had no real effect

Scientific Method
• Remember that we cannot “prove” anything.
• We can only accept or reject a hypothesis
• A theory is the closest that a biologist can come to “proving” a hypothesis
  – A theory is supported and validated by data and by the scientific community

Null and Alternative Hypotheses
• For any experiment, survey, or study, there must be a null hypothesis and an alternative hypothesis
• They are set up so that one of them must be true and the other must be false
• The null hypothesis (H0) uses =, ≤, or ≥
• Example:
  – The average weight of hermit crab group A is the same as that of hermit crab group B (=)
  – OR
  – The average weight of hermit crab group A is the same as or greater than that of hermit crab group B (≥)
  – OR
  – The average weight of hermit crab group A is the same as or less than that of hermit crab group B (≤)

If the null is true, then the alternative must be false
• H0: average weight of hermit crab group A = average weight of group B
• HA: average weight of hermit crab group A ≠ average weight of group B

Two-tailed hypotheses
• Use if you have no expectations
  – You are trying to find out whether the weights are different but have no reason to expect either group to be heavier
• H0: average weight of hermit crab group A = average weight of group B
• HA: average weight of hermit crab group A ≠ average weight of group B

One-tailed hypothesis
• Use if you have an expectation of the outcome, based on previous studies or information
• For example, previous studies have demonstrated that the Group A area has more hermit crab food than the Group B area
• H0: average weight of hermit crab group A ≤ average weight of group B
• HA: average weight of hermit crab group A > average weight of group B
• The alternative hypothesis corresponds to what you expect
• Always reject or accept the null hypothesis; never reject the alternative
• If you accept (support) the null, then don’t mention the alternative
• If you reject the null, then accept (support) the alternative
• We never prove a hypothesis
• We just gain a measure of how confident we are in our hypothesis

p-value
• The probability of seeing a pattern at least as strong as the one in our data through random chance or sampling error alone, assuming the null hypothesis is true
• 0.05 is the most commonly used cut-off
• If the p-value is >0.05 (high p-value), accept the null
    » Weight is not significantly different
• If the p-value is ≤0.05 (low p-value), reject the null and accept the alternative
    » Weight is significantly different

Important terms
• $x$ = a measurement value
• $\sum$ = “sum of”
• $n$ = sample size
• $df$ = degrees of freedom = $n - 1$
• $\bar{x}$ = mean (average) = $\sum x / n$
• $s^2$ = variance = $\sum (x - \bar{x})^2 / df$, the sum of squared deviations from the mean divided by the degrees of freedom
  – Tells you how much your values varied from the mean
  – A large variance means there is a large spread in the data; a small variance means the data points are closer to the mean
• $s = \sqrt{s^2}$ = standard deviation, roughly the typical distance of a value from the mean

What test do you use to get p?
• It depends on what type of data you are collecting
  – Measurement variables or nominal variables?

Measurement variables
• Something that can be counted or measured
• Involves numbers
• Examples: length, weight, quantity
• What are examples of tests that can be used?

t-test
• Used to determine whether two sets of data have the same mean
• Paired t-test: when measurements are linked
  – e.g., each patient measured before and after using a drug
  – The null would state that there is no difference
• Unpaired t-test: when the measurements come from 2 different, independent groups
  – e.g., patients with the drug (group 1) and patients without the drug (group 2)

What do you do when there are more than 2 sets of data?
• ANOVA (analysis of variance)
• The null would state that the means are all equal
• Example: 5 groups of patients, each group taking the drug at a different dosage
• Single-factor ANOVA
  – Only one parameter varies: drug dosage
• Two-factor ANOVA, with or without replication
  – Two parameters vary: dosage and time of day

Nominal variables
• Usually involve categories
• The value of a nominal variable is often a word; results are often summarized as counts or percentages
• Examples: color, sex, genotype
• What are examples of tests that can be used?
• Goodness-of-fit test
• Chi-square test

Graphing your results
• Using standard deviation bars versus standard error bars
  – Standard error (SE) = SD divided by the square root of the sample size
http://mathbench.umd.edu/modules/prob-stat_bargraph/page01.htm

SD bars or SE bars on your graph?
Standard Deviation (SD) | Standard Error (SE)
How far members of the population deviate from the average | How far off your estimate of the mean is
Quantifies the population | Quantifies your experiment
Does NOT depend on sample size | DOES depend on sample size (a lot!)
Use to characterize the population | Use to test your results (see next row)
Overlap does not necessarily mean the difference is not significant | Overlap means the difference is not significant (most of the time)

Graphing Data in Excel
• Enter the data points
• Highlight the data set
• Click on Charts
• Then click on the type of graph you want
[Figure: the resulting chart]
• Right-click on one of the lines, then click on Format Data Series
• Click on Secondary Axis
[Figure: the finished graph; the figure title goes here]
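The housefly slides ask how likely it is that a difference between the two groups arose by chance alone. One way to see that idea in action is a permutation test, sketched below: if the chemical did nothing, the group labels are arbitrary, so the sketch shuffles them many times and counts how often a difference in mean wingspan at least as large as the observed one turns up. This is a swapped-in illustration, not necessarily the test the presentation has in mind. The wingspans are simulated, hypothetical values (no raw data appear in the slides), and the function name `permutation_p_value` and its parameters are likewise made up for this example.

```python
import random


def mean(xs):
    """Average: sum of x divided by n."""
    return sum(xs) / len(xs)


def permutation_p_value(group1, group2, n_shuffles=10_000, seed=0):
    """One-tailed permutation test (hypothetical helper).

    Estimates how often randomly reassigning flies to the two groups
    produces a difference in means (group1 - group2) at least as large
    as the observed one, i.e. a p-value for "group 1 is larger".
    """
    rng = random.Random(seed)
    observed = mean(group1) - mean(group2)
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # if the chemical did nothing, labels are arbitrary
        if mean(pooled[:n1]) - mean(pooled[n1:]) >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_shuffles


# HYPOTHETICAL wingspans (mm): simulated, not data from the presentation.
sim = random.Random(42)
group1 = [sim.gauss(7.5, 0.4) for _ in range(25)]  # with chemical (assumed)
group2 = [sim.gauss(7.2, 0.4) for _ in range(25)]  # placebo (assumed)

p = permutation_p_value(group1, group2)
print(f"observed difference = {mean(group1) - mean(group2):.2f} mm, p = {p:.4f}")
if p <= 0.05:
    print("p <= 0.05: the difference is significant")
else:
    print("p > 0.05: the difference could be due to chance")
```

The test is one-tailed because the claim is that the chemical produces larger flies; a two-tailed version would compare absolute differences instead.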
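The formulas on the “Important terms” slide, together with the standard error from the graphing slides, translate directly into code. This is a minimal sketch using only Python’s standard library; the helper names are hypothetical, and the wingspans are made-up values. Repeating the same five values four times illustrates the SD vs SE comparison: the spread of the values (SD) stays about the same, while SE, which depends on sample size, shrinks.

```python
import math
import statistics


def sample_mean(xs):
    """x-bar = (sum of x) / n"""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """s^2 = sum((x - x-bar)^2) / df, where df = n - 1"""
    xbar = sample_mean(xs)
    df = len(xs) - 1
    return sum((x - xbar) ** 2 for x in xs) / df


def sample_sd(xs):
    """s = square root of the variance"""
    return math.sqrt(sample_variance(xs))


def standard_error(xs):
    """SE = SD / sqrt(n): how far off the estimate of the mean is likely to be"""
    return sample_sd(xs) / math.sqrt(len(xs))


# HYPOTHETICAL wingspans (mm), for illustration only.
small = [7.1, 7.6, 7.3, 7.8, 7.2]
large = small * 4  # the same values, four times as many flies

for label, data in [("n = 5", small), ("n = 20", large)]:
    print(f"{label}: mean = {sample_mean(data):.2f}, "
          f"SD = {sample_sd(data):.3f}, SE = {standard_error(data):.3f}")

# Cross-check: statistics.variance also divides by n - 1.
print(math.isclose(sample_variance(small), statistics.variance(small)))
```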
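The t-test and ANOVA slides describe which test to choose; the sketch below shows what each one computes from the data. It uses only Python’s standard library and stops at the test statistic and its degrees of freedom. Turning t or F into a p-value requires the t or F distribution, either from a printed table or from library routines such as SciPy’s `scipy.stats.ttest_rel`, `scipy.stats.ttest_ind`, and `scipy.stats.f_oneway` if SciPy is installed. The unpaired version assumes the two groups have equal variances (Student’s pooled t-test), the function names are hypothetical, and all numbers are invented for illustration.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    """Sample variance: sum of squared deviations divided by df = n - 1."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def paired_t(before, after):
    """Paired t statistic: a one-sample t on the per-subject differences.
    H0: mean difference = 0. Returns (t, df)."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    se = math.sqrt(variance(diffs) / n)
    return mean(diffs) / se, n - 1


def unpaired_t(group1, group2):
    """Unpaired (Student's) t statistic with a pooled variance.
    Assumes equal variances. H0: the two means are equal. Returns (t, df)."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / se, df


def one_way_anova_f(*groups):
    """Single-factor ANOVA: F = between-group mean square / within-group mean square.
    H0: all group means are equal. Returns (F, df_between, df_within)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n_total - k
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


# HYPOTHETICAL measurements, for illustration only.
before = [140, 152, 138, 147, 150]  # each patient before the drug
after = [135, 148, 136, 140, 146]   # the same patients after the drug
print("paired:", paired_t(before, after))
print("unpaired:", unpaired_t([12.1, 13.4, 11.8, 12.9], [11.2, 11.9, 12.0, 11.5]))
print("ANOVA:", one_way_anova_f([5.1, 5.4, 4.9], [5.8, 6.0, 5.7], [6.5, 6.9, 6.6]))
```

To reach a decision, compare |t| (or F) with the critical value for the returned degrees of freedom at the chosen cut-off, such as 0.05; a larger statistic corresponds to a smaller p-value.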
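For nominal variables, the chi-square goodness-of-fit test compares observed category counts with the counts expected under the null hypothesis. The sketch below computes χ² = Σ(O − E)²/E with df = number of categories − 1, and compares the result with the standard 0.05 critical values of the chi-square distribution for 1 to 3 degrees of freedom (3.841, 5.991, 7.815). The function name is hypothetical, and the flower-color counts and the 3:1 expected ratio are invented for illustration.

```python
# Critical values of chi-square at p = 0.05 for df = 1, 2, 3 (standard table values).
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815}


def chi_square_gof(observed, expected_ratio):
    """Chi-square goodness-of-fit statistic (hypothetical helper).

    observed: counts in each category.
    expected_ratio: the ratio predicted by the null hypothesis (e.g. [3, 1]).
    Returns (chi2, df), where chi2 = sum((O - E)^2 / E) and df = k - 1.
    """
    total = sum(observed)
    ratio_sum = sum(expected_ratio)
    expected = [total * r / ratio_sum for r in expected_ratio]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(observed) - 1


# HYPOTHETICAL counts: 290 purple and 110 white flowers, tested against a 3:1 ratio.
chi2, df = chi_square_gof([290, 110], [3, 1])
verdict = "reject H0" if chi2 > CRITICAL_05[df] else "accept H0"
print(f"chi-square = {chi2:.3f}, df = {df}, critical value = {CRITICAL_05[df]}: {verdict}")
```

With these made-up counts the expected values are 300 and 100, so χ² = 100/300 + 100/100 ≈ 1.33. That is below 3.841, so the 3:1 null hypothesis would be accepted.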