Analysing and Understanding Learning Assessment for Evidence-based Policy Making
Introduction to statistics
Dr Alvin Vista, ACER
Bangkok, 14-18 Sept. 2015
Australian Council for Educational Research

Structure of workshop
• Lecture/presentation: focus on concepts, brief review
• Practical exercises: using the most common and accessible software (Excel); hands-on, to maximise transfer of knowledge and develop skill
• Discussion and interpretation of sample studies

Collaborative setting
• If you don’t know how to do something, seek help.
• If you know how to do something, provide help.
• If you’re not sure, interact.

Drawing knowledge from reality
[Figure: Measurement Theory → Data → Statistical Theory]
Statistics allows us to draw knowledge or conclusions from the data.
Measurement theory allows us to draw meaningful data from reality.

What is Measurement?
A formal definition: ‘Measurement may be regarded as the construction of homomorphisms (scales) from empirical relational structures of interest into numerical relational structures that are useful.’ (Krantz et al., 1971, p. 9)
In other words: measurement is a process in which a variable (or construct) is converted into a number in a consistent manner.

Data and Models
• DATA: observations, measurements, sensory perceptions
• MODELS: theories, interpretations, generalisations

Data and Models
• DATA: what you see (observed)
• MODELS: what Google Maps says

Data and Models
If there is a mismatch between data and model, which is more likely to be wrong?
Data cannot be changed, but the methods used to collect data can be improved to increase the quality of data.

Drawing knowledge from reality
We can do statistical analysis ONLY AFTER we’re confident that our data are reliable.
[Figure: Measurement Theory → Data → Statistical Theory]
• Better data = better-fitting models
• Better-fitting models = better understanding of reality

Fundamentals: Statistics
• Statistics is the study of data.
• It is concerned with the collection, analysis, presentation, and interpretation of data.
Data in education research
• In an educational context:
  – records are usually students, schools, or parents
  – variables are usually
    • responses of students to the test items, or
    • responses of students, school principals, or parents to the questionnaire items

Data: values for variables & records
• In educational data:
  – responses to a particular item from all respondents form the values for the corresponding variable (a column of values in our imagined table)
  – responses from a particular respondent to all items form the values for the corresponding record (a row of values in our imagined table)

Levels of measurement
• Nominal: denotes a category; statistics include counts such as the mode and frequency distributions.
• Ordinal: rank order is described, but successive categories do not denote equal differences in the measured attribute; statistics include the median.
• Interval: the measurement is presumed to denote equal intervals between scores. Both the base point and the unit of measurement are arbitrary.
• Ratio: ratio scales have a natural base value that cannot be changed (i.e., a zero in one unit means the same in all other units). Only the unit of measurement is arbitrary.
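As a minimal sketch of the record/variable layout and the level-of-measurement rules above, the Python below stores a few invented student records (hypothetical data, not taken from the TIMSS file) as rows and pulls out one variable as a column. The helper name `variable` is illustrative only; the summary statistic for each variable is chosen to suit its level of measurement, using the standard-library `statistics` module.

```python
from statistics import mean, median, mode

# Hypothetical mini data set: each dict is one record (a student, i.e. a row);
# each key is one variable (a column). Values are invented for illustration.
records = [
    {"student": "S1", "gender": "girl", "rank": 2, "age": 14.1},
    {"student": "S2", "gender": "boy", "rank": 1, "age": 13.8},
    {"student": "S3", "gender": "girl", "rank": 3, "age": 14.5},
    {"student": "S4", "gender": "girl", "rank": 5, "age": 13.9},
    {"student": "S5", "gender": "boy", "rank": 4, "age": 14.2},
]


def variable(name):
    """Return one variable: the values of `name` across all records (a column)."""
    return [r[name] for r in records]


# A record is a row: all values for one respondent.
print(records[0])

# Pick the statistic that matches each variable's level of measurement.
print(mode(variable("gender")))  # nominal: mode (most frequent category)
print(median(variable("rank")))  # ordinal: median (middle case in rank order)
print(mean(variable("age")))     # ratio: mean (arithmetic average)
```

In a spreadsheet the same layout is simply one row per record and one column per variable.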
• Non-metric: categorical measures that describe differences in type or kind; arithmetical operations are not applicable.
• Metric: continuous measures that reflect differences in amount or degree.

In a nutshell
The level of measurement has direct implications for how relationships within and between variables can be contained and identified.

Levels of measurement and measures of distribution characteristics
Level            Central tendency      Spread
Nominal          Mode                  Percent distribution
Ordinal          Median, Mode          Minimum/Maximum, Range, Percentiles, Percent distribution
Interval/Ratio   Mean, Median, Mode    Variance, Standard deviation, Minimum/Maximum, Range, Percentiles, Percent distribution

Measures of central tendency
• Mode: the attribute of a variable that occurs most often in the data set.
  Example: Variable = Nationality; Mode = Indonesian
• Median: the value of the middle case when the cases have been placed in order from low to high.
  Example: Variable = Rank (1st, 2nd, 3rd, … 7th); Median = 5th
• Mean: the arithmetic mean or average, computed by adding all the valid cases together and dividing by the number of valid cases.
  Example: Variable = Age; Mean = 24.35

Levels of measurement determine the possible statistical analyses
Nominal
  – Cross-tabulations
  – Chi-square
  – Frequencies
Example: cross-tabulation of INCOME by MARITAL STATUS (count, with expected count in parentheses)
INCOME     SINGLE      MARRIED      Total
1.00        7 (16)      64 (55)       71
2.00        3 (9)       36 (30)       39
3.00        5 (18)      74 (61)       79
4.00       13 (19)      71 (65)       84
5.00        7 (10)      39 (36)       46
6.00       11 (8)       25 (28)       36
7.00       17 (8)       19 (28)       36
8.00       12 (4)        7 (15)       19
9.00       12 (3)        2 (11)       14
10.00      12 (3)        3 (12)       15
Total      99 (99)     340 (340)     439

Levels of measurement determine the possible statistical analyses
Ordinal
  – Spearman correlations
  – Non-parametric analyses
Interval and ratio
  – Pearson correlations
  – Parametric analyses

Measures of central tendency
• Mode [=MODE(target range)]
• Median [=MEDIAN(target range)]
• Mean [=AVERAGE(target range)]
• Let’s try that using the sample data!
  – TIMSS Country_X Grade 8 data.xlsx

Measures of spread
• Standard deviation [=STDEV(target range)]
• Min [=MIN(target range)]
• Max [=MAX(target range)]
• Percentiles [=PERCENTILE(target range, k)], where k is the percentile expressed as a proportion from 0.00 to 1.00

Characteristics of a distribution
Skewness: a measure of the asymmetry of a distribution [=SKEW(target range)]. The normal distribution is symmetric and has a skewness value of zero.
  – Positive skewness: a long right tail.
  – Negative skewness: a long left tail.

Characteristics of a distribution
Kurtosis: a measure of the extent to which observations cluster around a central point.
• For a normal distribution, the value of the kurtosis statistic is zero.
• [=KURT(target range)]
  – Leptokurtic data values are more peaked (positive kurtosis).
  – Platykurtic data values are flatter and more dispersed along the X axis (negative kurtosis).

Measures of spread
• Frequencies [=FREQUENCY(target range, bins)], where the bins define groups that include values up to and including each bin value:
  – Bins = 10, 20, 30 will result in 4 groups (number of bins + 1)
  – Group 1 = less than or equal to 10
  – Group 2 = more than 10, up to 20
  – Group 3 = more than 20, up to 30
  – Group 4 = more than 30
  – Enter as an array formula: write the formula in the first cell of the output range, select an output range equal to the number of groups, press F2, then CTRL+SHIFT+ENTER.
• Percent distribution can be computed by dividing each frequency by the total number of cases. The count for a single value can also be obtained with [=COUNTIF(target range, value)].
• Let’s try that using the sample data!

Practical exercise!
• TIMSS Country_X grade 8.xlsx
• Complete the descriptive statistics for boys, girls, and the whole sample.
• Save your results, as we will use them in later sessions.

Inferential statistics and hypothesis testing

Confidence intervals
Standard error of the mean
The standard error is an indicator of how precise the statistic is, and how close it is ‘probabilistically’ to the parameter (e.g., the true mean). Confidence intervals are based on the SE, where s is the sample standard deviation and n is the sample size:
$SE_{\bar{X}} = \dfrac{s}{\sqrt{n}}$
$CI_{\text{lower}} = \bar{X} - Z \cdot SE_{\bar{X}}$
$CI_{\text{upper}} = \bar{X} + Z \cdot SE_{\bar{X}}$
Z = 1.96 corresponds to a 95% CI.

Confidence intervals
• Let’s try computing SEs and CIs with data!
  – Means and SDs for boys and girls on Math achievement
  – Standard errors of the means
  – Confidence intervals

Inferential Statistics
Estimating population parameters
Inferential statistics can show how closely the sample statistics approximate the parameters of the overall population.
• The sample is randomly chosen and representative of the total population.
• The means we might obtain from an infinite number of samples form a normal distribution.
Source: Johnson, B. & Christensen, L. (2012). Educational Research: Quantitative, Qualitative, and Mixed Approaches. Thousand Oaks, CA: Sage.

Inferential Statistics
What can we say if we have a sample, and its confidence interval does not overlap with the confidence interval of another sample?

Inferential Statistics
Testing Hypotheses (1)
Research hypothesis vs. statistical hypothesis
Statistical hypothesis testing: comparing the distribution of data collected by a researcher with an ideal, or hypothetical, distribution
  – Significance level/alpha (α), e.g., .05 or .01, must be set before testing!
  – “Statistically significant” means there is sufficient evidence to reject the null hypothesis.
  – It does NOT mean that the alternative hypothesis is true.

Statistical hypothesis testing
Testing Hypotheses (2)
Making errors in hypothesis testing
  – Type I error: alpha error, or false positive
  – Type II error: beta error, or false negative
[Figure: Relationship between α and β]

Statistical hypothesis testing
                       Our conclusion: Not guilty        Our conclusion: Guilty
Reality: Not guilty    Correct conclusion                Type I error (false positive)
Reality: Guilty        Type II error (false negative)    Correct conclusion

Sometimes, a false positive is worse
“It’s better for ten guilty men to go free than for one innocent man to be executed” (anonymous wise lawyer)
                       Our conclusion: Not guilty        Our conclusion: Guilty
Reality: Not guilty    Correct conclusion                Condemned innocent
Reality: Guilty        Freed guilty                      Correct conclusion

In fire alarms, a false positive is better
“A false negative would REALLY suck.” (anonymous wise homeowner)
                       Our conclusion: No fire           Our conclusion: Fire
Reality: No fire       Correct conclusion                Unnecessary panic
Reality: Fire          Burned to death                   Correct conclusion

Statistical hypothesis testing
Testing for significance
• Remember: statistical significance is NOT the same as practical significance!
• In inferential statistics, we are only concerned with statistical significance.
• Practical significance is a judgment call to be made by the researcher and the audience.

Statistical hypothesis testing
Making errors in hypothesis testing
Increase the power of a statistical test:
• Use as large a sample size as is reasonably possible.
• Maximize the validity and reliability of your measures.
• Use parametric rather than non-parametric statistics whenever possible.
• Whenever we test more than one statistical hypothesis, we increase the probability of making at least one Type I error.
• For multiple hypotheses, a correction (e.g., Bonferroni) needs to be applied to our statistical tests.

Statistical hypothesis testing
Steps in hypothesis testing
1. State the null and alternative hypotheses.
2. Set the significance level before the research study. (Most educational researchers use .05 as the significance level. Note that the significance level is also called the alpha level or, more simply, alpha.)
3. Obtain the probability value using a computer program such as SPSS.
4. Compare the probability value to the significance level and make the statistical decision.
   Rule 1: If the probability value is less than alpha, reject the null hypothesis and conclude that the finding is statistically significant.
   Rule 2: If the probability value is greater than alpha, fail to reject the null hypothesis and conclude that the finding is not statistically significant.
5. Interpret the results. That is, make a substantive real-world decision and determine practical significance.
Source: Johnson, B. & Christensen, L. (2012). Educational Research: Quantitative, Qualitative, and Mixed Approaches. Thousand Oaks, CA: Sage.
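Steps 3 and 4, together with the Bonferroni correction for multiple hypotheses, can be sketched in Python. This is an illustration under stated assumptions: the test statistic is taken to be a large-sample z score, so the two-sided p-value comes from the standard normal distribution via `math.erfc`, and the function names are hypothetical. In the workshop itself the p-value would come from SPSS or Excel.

```python
import math


def two_sided_p_from_z(z):
    """Two-sided p-value for a standard normal test statistic z.

    Assumes a large-sample z test: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    """
    return math.erfc(abs(z) / math.sqrt(2))


def decide(p_value, alpha=0.05, n_tests=1):
    """Apply Rule 1 / Rule 2 with a Bonferroni-adjusted alpha.

    Bonferroni divides alpha by the number of hypotheses tested, which keeps
    the probability of at least one Type I error across all tests <= alpha.
    A p-value exactly equal to alpha is treated here as not significant.
    """
    adjusted_alpha = alpha / n_tests
    if p_value < adjusted_alpha:  # Rule 1
        return "reject H0 (statistically significant)"
    return "fail to reject H0 (not statistically significant)"  # Rule 2


p = two_sided_p_from_z(1.96)  # about .05, matching Z = 1.96 for a 95% CI
print(round(p, 4))
print(decide(0.01))             # one hypothesis, alpha = .05
print(decide(0.03, n_tests=3))  # three hypotheses: adjusted alpha = .05 / 3
```

The last call shows why the correction matters: p = .03 would be significant for a single test but not once alpha is shared across three tests.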
Statistical hypothesis testing
Exercise 1
Let’s compare two sets of data and test the hypothesis that their means are equal or different.
• Simple t-test for independent means [=T.TEST(data range 1, data range 2, tails, type)]; Excel returns the p-value, with tails = 2 for a two-tailed test and type = 2 (equal variances) or type = 3 (unequal variances)
• Data: data check.xlsx
• Sets to compare: Science scores for sets A vs B, and B vs D

Statistical hypothesis testing
Exercise 2
Let’s compare the science achievement of students whose parents completed only lower secondary (PHEL=4) with those whose parents have a university degree (PHEL=1), and test the hypothesis that their means are equal or different.
• Simple t-test for independent means [=T.TEST(data range 1, data range 2, tails, type)]
• Data: data check.xlsx
• Sets to compare: Science scores for PHEL=4 vs PHEL=1
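For readers who want to see what an independent-means t-test computes, here is a pure-Python sketch. It assumes Welch’s unequal-variance version (comparable in spirit to Excel’s T.TEST with type = 3, though results may differ slightly). The score lists are hypothetical placeholders, not values from data check.xlsx. The p-value is obtained by numerically integrating the Student t density with Simpson’s rule rather than from a statistics library, and all function names are illustrative.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_from_t(t, df):
    """P(|T| >= |t|) = 1 - 2 * (area under the density from 0 to |t|)."""
    b = abs(t)
    if b == 0:
        return 1.0
    steps = 2 * max(1000, math.ceil(b * 100))  # even count, step width <= 0.005
    h = b / steps
    # Composite Simpson's rule: weights 1, 4, 2, 4, ..., 2, 4, 1
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1 - 2 * area)


def welch_t_test(a, b):
    """Independent-samples t-test without assuming equal variances (Welch).

    Returns (t statistic, Welch-Satterthwaite df, two-sided p-value).
    """
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df, two_sided_p_from_t(t, df)


# Hypothetical science scores standing in for two columns of the exercise file
set_a = [512, 498, 530, 475, 505, 521]
set_b = [488, 470, 495, 460, 482, 479]
t, df, p = welch_t_test(set_a, set_b)
print(f"t = {t:.2f}, df = {df:.1f}, p = {p:.4f}")
```

The p-value is then compared with alpha exactly as in step 4 of the hypothesis-testing procedure.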