Statistical Analysis

Scientists analyze data collected in an experiment to look for patterns or relationships among variables. To determine that the patterns we observe are real, and not due to chance or our own preconceived notions, we must test the perceived pattern for significance. We can never be 100% sure, but we can set a level of certainty for our observations. The level of certainty accepted by most scientists is 95%. We will be using tests that allow us to say we are 95% confident in our results.

Types of Data

Quantitative Data - represented by a number
  Continuous quantitative - measurement scale divisible into partial units
    Example - Distance in kilometers, volume in liters
  Discrete quantitative - measurement scale with whole integers only
    Example - Number of people who can touch their toes, number of wolves born in a given year

Quantitative data can also be subdivided based on the zero point of the measuring scale.
  Ratio data - measured using a standard scale with equal, divisible intervals and an absolute zero
    Example - Temperature of a gas on the Kelvin scale, velocity of an object in m/sec
  Interval data - measured using a scale with equal intervals but no absolute zero
    Example - Temperature of a substance on the Celsius scale, pH

Qualitative Data (Nominal or Ordinal)
  Nominal - objects are named or placed into categories that cannot be ranked
    Example - Gender (male/female), color of hair (red, black, brown)
  Ordinal - objects are placed into categories that can be ranked
    Example - Activity of an animal on a scale of 1 to 5, Mohs hardness scale for minerals

Describing Data

Statisticians describe a set of data in 2 ways:
1. Compute a measure of central tendency (the number that is most typical of the entire set of data)
  Mode - value that occurs most often (in a tie, use both)
  Median - middle value when the data are ranked from highest to lowest
  Mean - mathematical average
2. 
Describe variation (the spread within the data: how closely the individual data points cluster around the mean)
  For quantitative data - Range (difference between the smallest and largest values of the dependent variable), Standard Deviation (σ), Variance (σ²)
    Example - Dog height
  For qualitative data - Frequency distribution (number of cases falling into each category of the variable)
    Example - Color of tomatoes produced with different ground colors

Making Decisions about Descriptive Statistics & Graphs

Quantitative data
  Ratio data
    Type of data: collected using a scale with equal intervals and an absolute zero (distance, velocity)
    Central tendency: Mean
    Variation: Range, standard deviation, variance
    Degrees of freedom: Total # of samples - 2 (ex. 15 + 15 - 2 = 28)
    Level of significance: 0.025
  Interval data
    Type of data: collected using a scale with equal intervals but no absolute zero (temperature in °C, pH)
    Central tendency: Mean
    Variation: Range, standard deviation, variance
    Degrees of freedom: Total # of samples - 2 (ex. 15 + 15 - 2 = 28)
    Level of significance: 0.025

Qualitative data
  Nominal data
    Type of data: objects are placed into categories that cannot be ranked (male/female or brown, black, red hair)
    Central tendency: Mode
    Variation: Frequency distribution
    Degrees of freedom: (#rows - 1) x (#columns - 1)
    Level of significance: 0.05
  Ordinal data
    Type of data: objects are placed into categories that can be ranked (Mohs hardness scale or color ranked 1-10)
    Central tendency: Median
    Variation: Frequency distribution
    Degrees of freedom: (#rows - 1) x (#columns - 1)
    Level of significance: 0.05

Inferential Statistics - used to determine whether the data is statistically significant. It limits the possibility that the data differences occurred by random chance or due to some unknown, uncontrolled variable. If the data is shown to be statistically significant, then the data differences can be explained by changes in the independent variable.

Statistical Tests and Graphs

1. The t-test (or Analysis of Variance): when you have two or more groups/sets and you want to compare measurements of each group.
2. The Chi-square test: when you have counts that can be placed into yes or no categories, or other simple categories such as quadrants.
3. 
The Pearson R Correlation: to test how the values of one event or object relate to the values of another event or object (for example, comparing nighttime temperatures and the number of patients in an emergency room).

Is the Dependent Variable (DV) continuous, ordinal, or nominal?

Dependent Variable (DV) Continuous
  Continuous IV: T-test or ANOVA; graph: scatter plot
  Nominal IV: T-test or ANOVA; graph: bar graph of means
  Ordinal IV: T-test; graph: bar graph of means

Dependent Variable (DV) Ordinal
  Continuous IV: Chi-square; graph: scatter plot or histogram
  Nominal IV: Mann-Whitney's test; graph: bar graph of means
  Ordinal IV: Spearman's test; graph: scatter plot

Dependent Variable (DV) Nominal
  Continuous IV: T-test or F-test; graph: bar graph of means
  Nominal IV: McNemar's test (paired) or Chi-square (unpaired); graph: bar graph of proportions
  Ordinal IV: Spearman's test; graph: scatter plot

Null Hypothesis (H0) - Basically states that there is no difference between the mean of your control group and the mean of your experimental group. Therefore, any observed difference between the two sample means occurred by chance and is not significant. If you can reject your null hypothesis, then there is a significant difference between your control and experimental groups.

Write your null hypothesis here: ________________________________________________________________________________
__________________________________________________________________________________

Level of significance (α) - It communicates the probability of error in rejecting the null hypothesis. Whether a result reaches significance is affected by sample size. Each test statistic has an associated p-value that reflects how ‘comfortable’ the researcher is in rejecting the null hypothesis in support of the alternative hypothesis. This ‘comfort zone’ is attained when the p-value of a test statistic is below 0.05. 
If the p-value is not in the ‘comfort zone’, then it is concluded that there is no statistical evidence to reject the null hypothesis. We will use p-value < 0.05, which means that we accept a probability of 5 in 100 of rejecting a null hypothesis that is actually true (95% confidence).

Degrees of freedom (df) - The number of independent observations in a sample.
  For t-test: df = (n1 - 1) + (n2 - 1)
  For Chi-square test: df = (#rows - 1) x (#columns - 1)
  For Pearson R correlation: df = (n - 2); subtract 2 from the number of comparisons made.

The larger the sample (df), the smaller the difference between means required for significance. Scientists have more confidence in experiments with larger sample sizes and repeated trials. The influence of the level of significance (α) can be seen by examining numerical relationships in a horizontal row of the same excerpted t sampling distribution. The smaller the level of significance (α) and error rate, the larger the difference between means required for significance.

Use the tables for the t-test and the Chi-square test to find the table value. Use your calculated degrees of freedom and the level of significance of 0.05 (95%) to find the correct value. Determine if the calculated value is greater or less than the table value. If the calculated value is smaller than the table value, the null hypothesis is NOT rejected. When the calculated value equals or exceeds the table value, the null hypothesis is rejected.
  For t-test: Refer to the null hypothesis descriptions for the decision to accept or reject the null hypothesis.
  For Chi-square: If calculated χ² > table χ², then the null hypothesis is rejected.
  For Pearson R Correlation: If the calculated value is greater than the table value, then reject the null hypothesis.

If r = 0.00, there is zero correlation. If r = 1.00, there is a perfect correlation. Values can be + or -. Positive values indicate that an increase in X corresponds to an increase in Y. Negative values indicate that increases in one value are associated with decreases in the other.
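The degrees-of-freedom formulas and the table comparison described above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, the sample sizes and data are invented, and the table value 2.048 is an assumed reading from a standard two-tailed t table at 0.05 with df = 28, so check it against the table you are actually given.

```python
from math import sqrt


def df_t_test(n1, n2):
    """Degrees of freedom for a t-test: (n1 - 1) + (n2 - 1)."""
    return (n1 - 1) + (n2 - 1)


def df_chi_square(rows, columns):
    """Degrees of freedom for a Chi-square test: (#rows - 1) x (#columns - 1)."""
    return (rows - 1) * (columns - 1)


def df_pearson(n):
    """Degrees of freedom for a Pearson R correlation: n - 2."""
    return n - 2


def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator


def reject_null(calculated, table_value):
    """Reject when the calculated value equals or exceeds the table value.

    The magnitude is compared because t and r can be negative.
    """
    return abs(calculated) >= table_value


print(df_t_test(15, 15))                 # 28, as in the handout's example
print(df_chi_square(2, 2))               # 1
print(df_pearson(10))                    # 8
print(pearson_r([1, 2, 3], [2, 4, 6]))   # 1.0, a perfect positive correlation
print(reject_null(1.5, 2.048))           # False: the null hypothesis is NOT rejected
```

Comparing the magnitude of the calculated value is a design choice for this sketch; a Chi-square value is never negative, so the same helper works for all three tests.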
What Does it Mean to Accept or Reject the Null Hypothesis?

The null hypothesis generally states that there is no significant difference between your two sets of data. If it is accepted (that is, not rejected), it means that any differences in your data are not significant and are probably due to random chance. If the null hypothesis is rejected, it means that there is a significant difference between your two sets of data and that these differences can be attributed to the factor (independent variable) that you changed. Rejection of the null hypothesis supports the alternative decision that a true difference exists between the means, which in turn supports the original research hypothesis.

Decide whether to accept or reject your null hypothesis. Accept _________ Reject ________
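As a rough end-to-end illustration of the accept/reject decision, the sketch below computes a t value for two small groups and compares it with a table value. It assumes the pooled-variance form of the t-test, which matches df = (n1 - 1) + (n2 - 1); the handout does not spell out the formula. The measurements are made up, and 2.776 is an assumed reading from a standard two-tailed t table at 0.05 with df = 4.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Return (t, df) for a two-group t-test using a pooled variance."""
    n1, n2 = len(group_a), len(group_b)
    df = (n1 - 1) + (n2 - 1)
    # Weighted average of the two sample variances
    pooled_var = ((n1 - 1) * variance(group_a) + (n2 - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(group_a) - mean(group_b)) / standard_error
    return t, df


# Hypothetical measurements, for illustration only
control = [1, 2, 3]
experimental = [4, 5, 6]

t, df = pooled_t(control, experimental)
table_value = 2.776  # assumed table value: two-tailed, 0.05, df = 4

# Reject when the calculated value equals or exceeds the table value
decision = "Reject" if abs(t) >= table_value else "Accept"
print(f"t = {t:.3f}, df = {df}, decision: {decision}")
# t = -3.674, df = 4, decision: Reject
```

With real data, replace the two lists with your control and experimental measurements and look up the table value for your own degrees of freedom.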