Statistics for Education Research
Lecture 1: Scales/Graphs/Central Tendency
Instructor: Dr. Tung-hsien He

Types of Data:
1. Nominal (名詞性) Scales
2. Ordinal (序數) Scales
3. Interval (區間) Scales
4. Ratio (比例) Scales

Population vs. Samples

Types of Graphs

Distribution of Frequency
1. Shape
2. Kurtosis (峭度)/Skewness (偏度)

Types of Scores in Distribution of Frequency
1. Percentiles (百分位數)
2. Percentile Rank (PR; 百分等級)

Measures of Central Tendency
1. Mode (眾數)
2. Median (中位數)
3. Mean (平均數)

Measures of Variation
1. Variance (變異數)
2. Standard Deviation, SD (標準差)
3. Standard Scores (標準分數): e.g., T scores; z scores

Nominal (名詞性) Scales: Demographic Variables
1. Objects are classified based on defined characteristics (e.g., gender);
2. There is no logical ordering among categories;
3. Cases or categories are mutually exclusive.

Ordinal (序數) Scales:
1. Objects are classified and given a logical order among categories (e.g., letter grades; ratings such as 特優 (outstanding) and 優 (excellent));
2. Cases or categories are mutually exclusive;
3. Differences between categories are not in equal units (e.g., the difference between Grade A and Grade B may or may not be the same as that between Grade B and Grade C).

Interval (Equal Unit) (區間) Scales:
1. Objects are classified and given a logical order (e.g., temperatures measured by thermometers);
2. Differences between levels of categories are in identical units;
3. Point 0 is only a point on the scale rather than its starting point;
4. Categories are mutually exclusive.

Ratio (比例) Scales:
1. The highest level in the hierarchy of measurement;
2. Objects are classified and given a logical order;
3. Differences between levels of categories are in identical units;
4. Point 0 reflects an absence of the characteristic (a true zero point);
5. Data show the proportional amounts of the characteristic.

Differences in Interval & Ratio Data: E.g.:
1. Weight scales give ratio data because 25 kg is half as heavy as 50 kg. 0 kg means “0” (no weight).
2.
Thermometers give interval data because 50°F is not twice as warm as 25°F. 0 degrees does not mean “no temperature”.

Population: All members of a specific group
Sample: A subset of the population
Parameter: A characteristic of a population (for instance, the population mean)
Statistic: A characteristic of a sample (for instance, the sample mean)

Notation Systems:
1. Greek letters for population measures; Roman letters for sample measures
2. $\mu$ for the population mean; $\bar{x}$ for the sample mean
Why sample means rather than population means?

Descriptive vs. Inferential Statistics
1. Descriptive Statistics: Classify/summarize numerical data
2. Inferential Statistics: Make generalizations about a population by studying a sample drawn from this population

Bar Graphs (長條圖): Relation between two variables (X axis: independent variable; Y axis: dependent variable) measured by nominal scales (p. 36)

Scattergrams (分散圖) (Scatterplots): Relation between two quantitative variables (p. 38)

Frequency Distribution Graphs (次數分佈圖) (Three-Quarter-High Rule)
1. Histograms (長條統計圖): Use exact limits; bars are not separated (p. 39)
2. Frequency Polygons (次數分佈多邊圖): Use midpoints (p. 40)
3. Ogive (頻率曲線圖) (cumulative frequency polygon) (p. 42)

1. Shapes of Frequency Distributions (p. 44)
   Uniform (Rectangular) Distribution
   Positively Skewed Distribution (skewed to the right)
   Negatively Skewed Distribution (skewed to the left)
   Normal Distribution (Mesokurtic)
   Leptokurtic Distribution (高狹峰) -> A non-normal distribution with more outliers (極端值) than a normal distribution
   Platykurtic Distribution (低濶峰分配) -> A non-normal distribution with fewer outliers (極端值) than a normal distribution
   Bimodal Distribution (雙峰分佈): If many outliers are clustered at the two extreme ends of a distribution, it will form a bimodal distribution.
2. If there are too many outliers, the distribution will not be normal. In other words, many statistical techniques may not yield trustworthy results.
3. How to locate outliers? Just look at the frequency graphs to find the “lone wolves”.
4.
Degrees of Peakedness: A computed value that shows the degree of homogeneity of a set of data; the higher the score, the more homogeneous the data are.

Kurtosis (峭度) & Skewness (偏度) are used to measure “normality”.

Kurtosis & Skewness
1. Expected values: between -1 and 1 for a normal distribution; 0 is the perfect score for normality
2. Kurtosis > 0: Leptokurtic distribution
3. Kurtosis < 0: Platykurtic distribution
4. Skewness > 0: Positively skewed
5. Skewness < 0: Negatively skewed
6. Some researchers use the range -3 to 3 to indicate normal distributions

To describe a distribution is to indicate its shape, average score (mean score), and variation. Graphs determine shapes, means indicate central tendency, and variation indicates how widely scores are spread.

Percentiles (百分位數):
1. The point (score) in the distribution at or below which a given percentage of scores is found;
2. For example, the 75th percentile of 180 scores is the score at or below which 75 percent (3/4) of all the scores fall.

Percentile Rank (PR; different from a percentile: the percentile-rank value of a particular score):
1. The percentage of scores less than or equal to that score
2. PR63 means the percentage of scores in the distribution that fall at or below a score of 63 [the PR of a score of 63 is . . . ]
   Example: a score of 250 on the Basic Competence Test (基測) has a PR of 98 -> PR250 = 98
3. At the center of a normal distribution, percentile scores would underestimate actual differences: if the Basic Competence Test score at PR 50 is 200 and the score at PR 51 may be 210, the actual scale-score difference is 10 points, but the PR values differ by only 1.
4. Percentiles cannot be used to determine differences in ranks: the difference between two adjacent percentiles does not represent the difference between the two interval scores -> the Basic Competence Test score at PR 50 is 200 and the score at PR 51 is 210; the PR values differ by only 1, but the actual scale-score difference is 10 points.

Measures of Central Tendency:
1. Mode (眾數):
a. The most frequent score in a distribution
b. The simplest index of central tendency
2. Median (中位數):
a. The 50th percentile
b. If the number of scores is odd, choose the middle one; if the number of scores is even, compute the mean of the two central scores
3. Mean (平均數), X bar ($\bar{x}$):
a.
Arithmetic average
b. Formula: See p. 60
c. Properties: (See Table 3.4, p. 61)
(1) The sum of the deviations of all scores from the mean is 0
(2) The sum of squared deviations from the mean is smaller than the sum of squared deviations from any other value

Comparisons of Measures of Central Tendency
1. Which measures suit which data:
   Mode: nominal data (the most frequently occurring value)
   Median & mode: ordinal data
   All three: interval or ratio data
2. Mode/Median vs. Sample Sizes
   Rule of thumb: the bigger, the better!

Measures of Variation: Lengths of intervals that indicate the spread of scores in a distribution
1. Variance (變異數):
(1) Variance for a population: $\sigma^2 = SS/N = \sum (x_i - \mu)^2 / N$
(2) Variance for a sample: $s^2 = SS/(N-1) = \sum (x_i - \bar{x})^2 / (N-1)$
2. Standard Deviation (標準差): Formula on p. 74
(1) The square root of the variance
(2) Roughly, the typical distance of the scores from the mean
(3) The greater the value, the greater the variability of scores around the mean

Standard Scores:
1. z score: Formula on p. 75
2. Meaning: indicates how many SD units a raw score falls above or below the mean.
3. Properties:
(1) The shape of the standard-score distribution is identical to that of the raw scores;
(2) The mean of the z-score distribution is 0;
(3) The variance and SD of the z-score distribution are both 1.
4. T score: $T = 10z + 50$ (Why T scores?)
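
The measures of variation and the standard scores above can be worked through on a small data set. The following is a minimal Python sketch, not the textbook's own procedure. The five raw scores are invented for illustration, the function names are hypothetical, and the z score uses the usual definition (raw score minus mean, divided by SD), since the lecture only points to the formula on p. 75. Dividing by the sample SD rather than the population SD is an assumption of this sketch.

```python
import math

# Hypothetical raw scores, invented for illustration (not from the lecture).
scores = [60, 70, 80, 90, 100]


def mean(values):
    """Arithmetic average."""
    return sum(values) / len(values)


def sum_of_squares(values):
    """SS: sum of squared deviations of each score from the mean."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values)


def population_variance(values):
    """sigma^2 = SS / N."""
    return sum_of_squares(values) / len(values)


def sample_variance(values):
    """s^2 = SS / (N - 1)."""
    return sum_of_squares(values) / (len(values) - 1)


def z_scores(values):
    """z = (x - mean) / SD. Assumption: SD here is the sample SD."""
    m = mean(values)
    sd = math.sqrt(sample_variance(values))
    return [(x - m) / sd for x in values]


def t_scores(values):
    """T = 10z + 50, as given in the lecture."""
    return [10 * z + 50 for z in z_scores(values)]


z = z_scores(scores)
t = t_scores(scores)

print("mean:", mean(scores))
print("population variance:", population_variance(scores))
print("sample variance:", sample_variance(scores))
print("sample SD:", round(math.sqrt(sample_variance(scores)), 2))
print("z scores:", [round(v, 2) for v in z])
print("T scores:", [round(v, 2) for v in t])
```

With these scores, SS is 1000, so the population variance is 1000/5 = 200 and the sample variance is 1000/4 = 250. The z scores sum to 0 and have a sample variance of 1, which matches properties (2) and (3) of standard scores. The T scores are centered on 50, so none of them is negative.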