Introduction to Statistics and Statistical Inference

STATISTICAL TOOLS NEEDED IN ANALYZING TEST RESULTS
Prof. Yonardo Agustin Gabuyo

Statistics is a branch of science that deals with the collection, presentation, analysis, and interpretation of quantitative data.

Branches of Statistics
- Descriptive statistics: methods concerned with collecting, describing, and analyzing a set of data without drawing conclusions (or inferences) about a larger group.
- Inferential statistics: methods concerned with the analysis of a subset of data leading to predictions or inferences about the entire set of data, or population.

Examples of Descriptive Statistics
- Presenting the Philippine population by constructing a graph indicating the total number of Filipinos counted during the last census, by age group and sex.
- The Department of Social Welfare and Development (DSWD) cited statistics showing an increase in the number of child abuse cases during the past five years.

Examples of Inferential Statistics
Source: Pilot Training Course on Teaching Basic Statistics by Statistical Research and Training Center, Philippine Statistical Association, Inc.
- A new milk formulation designed to improve the psychomotor development of infants was tested on randomly selected infants. Based on the results, it was concluded that the new milk formulation is effective in improving the psychomotor development of infants.

Example
Teacher Ron-nick gave a personality test measuring shyness to 25,000 students. The average degree of shyness and the degree to which the students differ in shyness are the concerns of _________ statistics.
A. inferential
B. graphic
C. correlational
D. descriptive

Example
This is a type of statistics that gives information about the sample being studied.
a. Inferential and correlational
b. Inferential
c. Descriptive
d. Correlational

Inferential Statistics
Source: Pilot Training Course on Teaching Basic Statistics by Statistical Research and Training Center, Philippine Statistical Association, Inc.
[Figure: a larger set (N units/observations) and a smaller set (n units/observations), linked by inferences and generalizations]

Types of Variables
[Diagram: variables are either qualitative or quantitative; quantitative variables are either discrete or continuous]
- Qualitative variables: variables that can be expressed in terms of properties, characteristics, or classifications (non-numerical values).
- Quantitative variables: variables that can be expressed in terms of numerical values.
  a) Discrete: variables that can be expressed in terms of whole numbers.
  b) Continuous: variables that can be expressed in terms of whole numbers, fractions, or decimal numbers.

Levels of Measurement
1. Nominal scale: numbers or symbols used to classify
2. Ordinal scale: accounts for order; no indication of distance between positions
3. Interval scale: equal intervals; no absolute zero
4. Ratio scale: has an absolute zero

Methods of Collecting Data
- Objective method
- Subjective method
- Use of existing records

Methods of Presenting Data
- Textual
- Tabular
- Graphical

Summary Measures
- Location: Maximum, Minimum; Central Tendency (Mean, Median, Mode); Percentile, Quartile, Decile
- Variation: Range, Inter-quartile Range, Variance, Standard Deviation, Coefficient of Variation
- Skewness
- Kurtosis

Measures of Location
A measure of location summarizes a data set by giving a "typical value" within the range of the data values that describes its location relative to the entire data set.
Some common measures:
- Minimum, Maximum
- Central Tendency
- Percentiles, Deciles, Quartiles

Maximum and Minimum
- The minimum is the smallest value in the data set, denoted as MIN.
- The maximum is the largest value in the data set, denoted as MAX.

Measure of Central Tendency
- A single value that is used to identify the "center" of the data.
- It is thought of as a typical value of the distribution.
- Precise yet simple.
- The most representative value of the data.

Mean
- Most common measure of the center
- Also known as the arithmetic average
- Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
- Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

Properties of the Mean
- May not be an actual observation in the data set.
- Can be applied to data measured on at least the interval level.
- Easy to compute.
- Every observation contributes to the value of the mean.
- Subgroup means can be combined to come up with a group mean.
- Easily affected by extreme values.
[Figure: two dot plots; the first (axis 0 to 10) has Mean = 5 and the second (axis extended to 14) has Mean = 6]

Median
- Divides the observations into two equal parts.
- If n is odd, the median is the middle number.
- If n is even, the median is the average of the 2 middle numbers.
- The sample median is denoted as $\tilde{x}$, while the population median is denoted as $\tilde{\mu}$.

Properties of a Median
- May not be an actual observation in the data set.
- Can be applied to data measured on at least the ordinal level.
- A positional measure; not affected by extreme values.
[Figure: the same two dot plots (axis 0 to 10 and axis extended to 14); Median = 5 in both]

Mode
- The score or scores that occur most frequently.
- Nominal average.
- Computation of the mode for ungrouped or raw data:
[Figure: two dot plots; one (axis 0 to 14) has Mode = 9 and the other (axis 0 to 6) has no mode]

Properties of a Mode
- Can be used for qualitative as well as quantitative data.
- May not be unique.
- Not affected by extreme values.
- May not exist.

Mean, Median & Mode
Use the mean when:
- sampling stability is desired
- other measures are to be computed
Use the median when:
- the exact midpoint of the distribution is desired
- there are extreme observations
Use the mode when:
- the "typical" value is desired
- the data set is measured on a nominal scale

Example
Which measure(s) of central tendency is (are) most appropriate when the score distribution is skewed?
A. Mode
B. Mean and mode
C. Median
D. Mean

Example
In a one-hundred-item test, what does Jay-R's score of 70 mean?
A. He surpassed 70 of his classmates in terms of score.
B. He surpassed 30 of his classmates in terms of score.
C. He got a score above the mean.
D. He got 70 items correct.

Example
Which of the following measures is more affected by an extreme score?
A. Semi-interquartile range
B. Median
C. Mode
D. Mean

Example
The sum of all the scores in a distribution always equals
a. the mean times the interval size
b. the mean divided by the interval size
c. the mean times N
d. the mean divided by N

Example
Teacher B is researching family income distribution, which is symmetrical. Which measure(s) of central tendency will be most informative and appropriate?
A. Mode
B. Mean
C. Median
D. Mean and Median

Example
What measure(s) of central tendency does the number 16 represent in the following score distribution: 14, 15, 17, 16, 19, 20, 16, 14, 16?
a. Mode only
b. Mode and median
c. Median only
d. Mean and mode

Example
What is the mean of this score distribution: 40, 42, 45, 48, 50, 52, 54, 68?
a. 51.88
b. 50.88
c. 49.88
d. 68

Example
Which is correct about the MEDIAN?
a. It is a measure of variability.
b. It is the most stable measure of central tendency.
c. It is the 50th percentile.
d. It is significantly affected by extreme values.

Example
Which measure(s) of central tendency can be determined by mere inspection?
a. Median
b. Mode
c. Mean
d. Mode and Median

Example
Here is a score distribution: 98, 93, 93, 93, 90, 88, 88, 85, 85, 85, 86, 70, 70, 51, 34, 34, 34, 20, 18, 15, 12, 9, 8, 3, 1. Which is a characteristic of the score distribution?
A. Bi-modal
B. Tri-modal
C. Skewed to the right
D. No discernible pattern

Example
Which is true of a bimodal score distribution?
a. The group tested has two identical scores that appeared most.
b. The scores are either high or low.
c. The scores are high.
d. The scores are low.

Example
STUDY THE TABLE THEN ANSWER THE QUESTIONS:

Scores                 0-59   60-69   70-79   80-89   90-100
Percent of Students    2%     8%      39%     38%     13%

In which score interval is the median?
a. In the interval 80 to 89
b. In between the intervals of 60-69 and 70-79
c. In the interval 70-79
d. In the interval 60-69

What percent of the students got a score below 70?
a. 2%
b. 8%
c. 10%
d. 39%

Percentiles
- Numerical measures that give the position of a data value relative to the entire data set.
- The percentage of students in the reference group who fall below a student's raw score.
- Divide the scores in the distribution into 100 equal parts (raw data arranged in increasing or decreasing order of magnitude).
- The jth percentile, denoted as Pj, is the data value in the data set that separates the bottom j% of the data from the top (100-j)%.

EXAMPLE
Suppose JM was told that, relative to the other scores on a certain test, his score was the 97th percentile.
- This means that 97% of those who took the test had scores less than JM's score, while 3% had scores higher than JM's.

Deciles
- Divide the scores in the distribution into ten equal parts, each part having ten percent of the distribution of the data values below the indicated decile.
- The 1st decile is the 10th percentile; the 2nd decile is the 20th percentile; ... the 9th decile is the 90th percentile.

Quartiles
- Divide the scores in the distribution into four equal parts, each part having 25% of the scores in the distribution of the data values below the indicated quartile.
- The 1st quartile is the 25th percentile; the 2nd quartile is the 50th percentile, which is also the median; and the 3rd quartile is the 75th percentile.

Example
Robert Joseph's raw score in the mathematics class is 45, which is equal to the 96th percentile. What does this mean?
a. 96% of Robert Joseph's classmates got a score higher than 45.
b. 96% of Robert Joseph's classmates got a score lower than 45.
c. Robert Joseph's score is less than 45% of his classmates.
d. Robert Joseph's score is higher than 96% of his classmates.

Example
Which one describes the percentile rank of a given score?
a. The percent of cases of a distribution below and above a given score.
b. The percent of cases of a distribution below the given score.
c. The percent of cases of a distribution above the given score.
d. The percent of cases of a distribution within the given score.

Example
Biboy obtained a score of 85 in a Mathematics multiple-choice test. What does this mean?
a. He has a rating of 85.
b. He answered 85 items in the test correctly.
c. He answered 85% of the test items correctly.
d. His performance is 15% better than the group.

Example
Median is to the 50th percentile as Q3 is to the
a. 45th percentile
b. 70th percentile
c. 75th percentile
d. 25th percentile

Example
Karl Vince obtained a NEAT percentile rank of 95. This means that
a. They have a zero reference point
b. They have scales of equal units
c. They indicate an individual's relative standing in a group
d. They indicate specific points in the normal curve

Example
Markie obtained a NEAT percentile rank of 95. This means that
a. He got a score of 95.
b. He answered 95 items correctly.
c. He surpassed in performance 95% of his fellow examinees.
d. He surpassed in performance 5% of his fellow examinees.

Example
What is/are important to state when explaining percentile-ranked tests to parents?
I. What group took the test
II. That the scores show how students performed in relation to other students
III. That the scores show how students performed in relation to an absolute measure
A. II only
B. I & III
C. I & II
D. III only

Measures of Variation
- A measure of variation is a single value that is used to describe the spread of the distribution.
- A measure of central tendency alone does not uniquely describe a distribution.

A look at dispersion...
Source: Pilot Training Course on Teaching Basic Statistics by Statistical Research and Training Center, Philippine Statistical Association, Inc.
[Figure: dot plots of Sections A, B, and C on a common scale from 11 to 21. Section A: Mean = 15.5, s = 3.338. Section B: Mean = 15.5, s = 0.9258. Section C: Mean = 15.5, s = 4.57]

Two Types of Measures of Dispersion
Absolute measures of dispersion:
- Range
- Inter-quartile Range
- Variance
- Standard Deviation
Relative measure of dispersion:
- Coefficient of Variation

Range (R)
The difference between the maximum and minimum value in a data set, i.e. R = MAX - MIN
Example: Scores of 15 students in a mathematics quiz.
54 58 58 60 62 65 66 71 74 75 77 78 80 82 85
R = 85 - 54 = 31

Some Properties of the Range
- The larger the value of the range, the more dispersed the observations are.
- It is quick and easy to understand.
- A rough measure of dispersion.

Inter-Quartile Range (IQR)
The difference between the third quartile and the first quartile, i.e. IQR = Q3 - Q1
Example: Scores of 15 students in a mathematics quiz.
54 58 58 60 62 65 66 71 74 75 77 79 82 82 85
IQR = Q3 - Q1 = 78 - 61 = 17

Some Properties of IQR
- Reduces the influence of extreme values.
- Not as easy to calculate as the range.
- Considers only the middle 50% of the scores in the distribution.

Quartile deviation (QD) is based on the range of the middle 50% of the scores, instead of the range of the entire set. It indicates the distance we need to go above and below the median to include approximately the middle 50% of the scores.

Variance
- Important measure of variation
- Shows variation about the mean
- Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
- Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

Standard Deviation (SD)
- Most important measure of variation
- Square root of the variance
- Has the same units as the original data
- Is the average of the degree to which a set of scores deviates from the mean value
- It is the most stable measure of variability
- Population SD: $\sigma = \sqrt{\sigma^2}$
- Sample SD: $s = \sqrt{s^2}$

Computation of Standard Deviation
Data: 10 12 14 15 17 18 18 24 are the scores of students in a mathematics quiz.
n = 8, Mean = 16
$s = \sqrt{\frac{(10-16)^2 + (12-16)^2 + (14-16)^2 + (15-16)^2 + (17-16)^2 + (18-16)^2 + (18-16)^2 + (24-16)^2}{7}} = \sqrt{\frac{130}{7}} \approx 4.309$

Remarks on Standard Deviation
- If there is a large amount of variation, then on average, the data values will be far from the mean. Hence, the SD will be large.
- If there is only a small amount of variation, then on average, the data values will be close to the mean. Hence, the SD will be small.
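The range and standard deviation examples above can be checked with a short Python sketch. The function names `value_range` and `sample_sd` are illustrative choices, not part of the original slides; the data are the quiz scores from the range example and the standard deviation example. The sample standard deviation divides by n - 1, matching the divisor of 7 used in the hand computation.

```python
import math
import statistics

# Scores from the range example (15 students) and the SD example (8 students).
quiz_scores = [54, 58, 58, 60, 62, 65, 66, 71, 74, 75, 77, 78, 80, 82, 85]
sd_scores = [10, 12, 14, 15, 17, 18, 18, 24]


def value_range(data):
    """Range: R = MAX - MIN."""
    return max(data) - min(data)


def sample_sd(data):
    """Sample standard deviation: square root of the sum of squared
    deviations from the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = sum((x - mean) ** 2 for x in data)
    return math.sqrt(squared_deviations / (n - 1))


print(value_range(quiz_scores))               # 31
print(round(sample_sd(sd_scores), 3))         # 4.309
# The standard library gives the same sample SD.
print(round(statistics.stdev(sd_scores), 3))  # 4.309
```

Writing the formula out by hand mirrors the slide computation step by step; in practice `statistics.stdev` (sample SD) and `statistics.pstdev` (population SD) from the standard library do the same work.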
Comparing Standard Deviation
[Figure: dot plots of Sections A, B, and C on a common scale from 11 to 21. Section A: Mean = 15.5, s = 3.338. Section B: Mean = 15.5, s = 0.9258. Section C: Mean = 15.5, s = 4.57]

Comparing Standard Deviation
Example: Team A - Heights of five marathon players in inches
65" 65" 65" 65" 65"
Mean = 65", s = 0

Comparing Standard Deviation
Example: Team B - Heights of five marathon players in inches
62" 67" 66" 70" 60"
Mean = 65", s = 4.0"

Properties of Standard Deviation
- It is the most widely used measure of dispersion (Chebyshev's Inequality).
- It is based on all the items and is rigidly defined.
- It is used to test the reliability of measures calculated from samples.
- The standard deviation is sensitive to the presence of extreme values.
- It is not easy to calculate by hand (unlike the range).

Chebyshev's Rule
- It permits us to make statements about the percentage of observations that must be within a specified number of standard deviations from the mean.
- The proportion of any distribution that lies within k standard deviations of the mean is at least $1 - \frac{1}{k^2}$, where k is any positive number larger than 1.
- This rule applies to any distribution.

Chebyshev's Rule
For any data set with mean (µ) and standard deviation (SD), the following statements apply:
- At least 75% of the observations are within 2 SD of its mean.
- At least 88.9% of the observations are within 3 SD of its mean.
[Illustration: at least 75% of the observations are within 2 SD of the mean]

Example
The pre-test scores of the 125 LET reviewees last year had a mean of 70 and a standard deviation of 7 points. Applying Chebyshev's Rule, we can say that:
1. At least 75% of the students had scores between 56 and 84.
2. At least 88.9% of the students had scores between 49 and 91.
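Chebyshev's Rule and the LET pre-test example can be reproduced with a minimal Python sketch. The helper names `chebyshev_bound` and `chebyshev_interval` are assumptions made for illustration; the mean of 70 and SD of 7 come from the example above.

```python
def chebyshev_bound(k):
    """Minimum proportion of observations within k SDs of the mean: 1 - 1/k^2."""
    if k <= 1:
        raise ValueError("Chebyshev's Rule requires k > 1")
    return 1 - 1 / k**2


def chebyshev_interval(mean, sd, k):
    """Interval from k SDs below the mean to k SDs above it."""
    return (mean - k * sd, mean + k * sd)


# LET pre-test example: mean = 70, SD = 7.
for k in (2, 3):
    low, high = chebyshev_interval(70, 7, k)
    print(f"k={k}: at least {chebyshev_bound(k):.1%} of scores lie between {low} and {high}")
# k=2: at least 75.0% of scores lie between 56 and 84
# k=3: at least 88.9% of scores lie between 49 and 91
```

Because the rule holds for any distribution, these percentages are lower bounds; the actual share of scores inside each interval may be higher.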
Coefficient of Variation (CV)
- Measure of relative variation
- Usually expressed in percent
- Shows variation relative to the mean
- Used to compare 2 or more groups
- Formula: $CV = \left( \frac{SD}{\text{Mean}} \right) \times 100\%$

Comparing CVs
- Group A: Average Score = 90, SD = 15, CV = 16.67%
- Group B: Average Score = 92, SD = 10, CV = 10.87%

Example
Mark Erick is one-half standard deviation above the mean of his group in Math and one standard deviation above the mean in English. What does this imply?
a. He excels in both English and Math.
b. He is better in Math than English.
c. He does not excel in English nor in Math.
d. He is better in English than Math.

Example
Which statement about the standard deviation is CORRECT?
a. The lower the standard deviation, the more spread out the scores are.
b. The higher the standard deviation, the less spread out the scores are.
c. The higher the standard deviation, the more spread out the scores are.
d. It is a measure of central tendency.

Example
Which group of scores is most varied? The group with ________.
a. sd = 9
b. sd = 5
c. sd = 1
d. sd = 7

Example
Mean is to measure of central tendency as ___________ is to measure of variability.
a. Quartile Deviation
b. Quartile
c. Correlation
d. Skewness

Example
HERE ARE TWO SETS OF SCORES:
SET A: 1, 2, 3, 4, 5, 6, 7, 8, 9
SET B: 3, 4, 4, 5, 5, 6, 6, 7, 9
Which statement correctly applies to the two sets of score distributions?
a. The scores in Set A are more spread out than those in Set B.
b. The range for Set B is 5.
c. The range for Set A is 8.
d. The scores in Set B are more spread out than those in Set A.

Measure of Skewness
- Describes the degree of departure of the distribution of the data from symmetry.
- The degree of skewness is measured by the coefficient of skewness, denoted as SK and computed as $SK = \frac{3(\text{Mean} - \text{Median})}{SD}$

What is Symmetry?
A distribution is said to be symmetric about the mean if the distribution to the left of the mean is the "mirror image" of the distribution to the right of the mean.
Likewise, a symmetric distribution has SK = 0, since its mean is equal to its median and its mode.

Measure of Skewness
- SK > 0: positively skewed
- SK < 0: negatively skewed

[Figure: Areas Under the Normal Curve]

Correlation
- Refers to the extent to which the distributions are related or associated.
- The extent of correlation is indicated numerically by the coefficient of correlation.
- The coefficient of correlation ranges from -1 to +1.

Types of Correlation
1. Positive Correlation
   a) High scores in distribution A are associated with high scores in distribution B.
   b) Low scores in distribution A are associated with low scores in distribution B.
2. Negative Correlation
   a) High scores in distribution A are associated with low scores in distribution B.
   b) Low scores in distribution A are associated with high scores in distribution B.
3. Zero Correlation
   a) No association between distribution A and distribution B. No discernible pattern.

[Figure: three scatter plots of Science Score against Math Score, titled Positive Correlation, Negative Correlation, and No Correlation]

Example
Skewed score distribution means:
a. The scores are normally distributed.
b. The mean and the median are equal.
c. Consist of academically poor students.
d. The scores are concentrated more at one end or the other end.

Example
What would most likely be the distribution if a class is composed of bright students?
a. platykurtic
b. skewed to the right
c. skewed to the left
d. very normal

Example
All the students who took the examination got scores above the mean. What is the graphical representation of the score distribution?
a. normal curve
b. mesokurtic
c. positively skewed
d. negatively skewed

Example
A class is composed of academically poor students. The distribution is most likely to be ______________.
a. skewed to the right
b. a bell curve
c. leptokurtic
d. skewed to the left

Z-SCORE
- In statistics, a standard score (also called z-score) is a dimensionless quantity derived by subtracting the population mean from an individual (raw) score and then dividing the difference by the population standard deviation.
- The z-score reveals how many units of the standard deviation a case is above or below the mean. The z-score allows us to compare the results of different normal distributions, something done frequently in research.
- The standard score is $z = \frac{X - \mu}{\sigma}$, where
  X is a raw score to be standardized,
  σ is the standard deviation of the population,
  µ is the mean of the population.

The quantity z represents the distance between the raw score and the population mean in units of the standard deviation. z is negative when the raw score is below the mean and positive when it is above.

A key point is that calculating z requires the population mean and the population standard deviation, not the sample mean or sample standard deviation. It requires knowing the population parameters, not the statistics of a sample drawn from the population of interest.

N) T-SCORE
It is equivalent to ten times the z-score plus fifty: T = 10Z + 50

EXAMPLE: Based on the table shown, who performed better, JR or JM? Assume a normal distribution.

Student   Raw Score   Mean   Standard Deviation
JR        75          65     4
JM        58          52     2

For JR: $z = \frac{75 - 65}{4} = 2.5$
For JM: $z = \frac{58 - 52}{2} = 3$
JM performed better than JR due to a greater value of z.

From the previous example, the T-score of JR is T(JR) = 10(2.5) + 50 = 75, while the T-score of JM is T(JM) = 10(3) + 50 = 80. Therefore, JM performed better than JR due to a higher T-score.

O) STANINE
Stanine (Standard NINE) is a method of scaling test scores on a nine-point standard scale in a normal distribution.
STANINE                              1     2     3     4     5     6     7     8     9
Percentage Distribution              4%    7%    12%   17%   20%   17%   12%   7%    4%
Cumulative Percentage Distribution   4%    11%   23%   40%   60%   77%   89%   96%   100%

Example
Study this group of tests, which was administered to the class to which Jay-R belongs, then answer the questions:

Subject   Mean   SD      Jay-R's Score
Math      56     10      43
Physics   55     9.5     51
English   80     11.25   88
PE        75     9.75    82

In which subject(s) did Jay-R perform most poorly in relation to the group's mean performance?
A. English
B. Physics
C. PE
D. Math

Based on the data given, what type of learner is Jay-R?
A. Logical
B. Spatial
C. Linguistic
D. Bodily-Kinesthetic

Based on the data given, in which subject(s) were the scores most widespread?
A. Math
B. Physics
C. PE
D. English

References
- Pilot Training Course on Teaching Basic Statistics by Statistical Research and Training Center, Philippine Statistical Association, Inc. (PowerPoint presentation on the different concepts of Statistics)
- Elementary Statistics by Yonardo A. Gabuyo et al., Rex Book Store
- Assessment of Learning I and II by Dr. Rosita De Guzman-Santos, LORIMAR Publishing, 2007 Ed.
- Measurement and Evaluation Concepts and Principles by Abubakar S. Asaad and Wilham M. Hailaya, Rex Book Store
- LET Reviewer by Yonardo A. Gabuyo, MET Review Center
