```
Statistics for Education Research
Lecture 1: Scales / Graphs / Central Tendency
Instructor: Dr. Tung-hsien He
Types of Data:
1. Nominal (名詞性) Scales
2. Ordinal (序數) Scales
3. Interval (區間) Scales
4. Ratio (比例) Scales
• Population vs. Samples
• Types of Graphs
• Distribution of Frequency
1. Shape
2. Kurtosis (峭度) / Skewness (偏度)

Types of Scores in Distribution of Frequency
1. Percentiles (百分位數)
2. Percentile Rank (PR) (百分等級)
• Measures of Central Tendency
1. Mode (眾數)
2. Median (中位數)
3. Mean (平均數)

Measures of Variation
1. Variance (變異數)
2. Standard Deviation (SD) (標準差)
3. Standard Scores (標準分數): e.g., z scores; T scores

Nominal (名詞性) Scales: e.g., Demographic Variables
1. Objects are classified based on defined
characteristics (e.g., gender);
2. There is no logical ordering among categories;
3. Cases or categories are mutually exclusive.
• Ordinal (序數) Scales:
1. Objects are classified and given a logical order
among categories (e.g., letter grades; “Outstanding,” “Excellent”);
2. Cases or categories are mutually exclusive;
3. Differences between categories are not in equal units
(e.g., the difference between Grade A & B may or may
not be the same as that between Grade B & C).
• Interval (Equal-Unit) (區間) Scales:
1. Objects are classified and given a logical order
(e.g., temperatures measured by thermometers);
2. Differences between levels of categories are in
identical units;
3. Zero is only an arbitrary point on the scale, not
its true starting point (it does not mean an absence
of the characteristic);
4. Categories are mutually exclusive.
• Ratio (比例) Scales:
1. The highest level in the hierarchy of measurement;
2. Objects are classified and given a logical order;
3. Differences between levels of categories are in
identical units;
4. Zero reflects an absence of the characteristic (a
true zero point);
5. Ratios are meaningful: the data show proportional
amounts of the characteristic.
• Differences between Interval & Ratio Data:
E.g.:
1. Weight scales give ratio data because 25 kg is
half of 50 kg, and 0 kg means “no weight.”
2. Thermometers give interval data because 25°F is
not half as warm as 50°F, and 0°F does not mean
“no temperature.”
• Population: all members of a specific group
• Sample: a subset of the population
• Parameter: a characteristic of a population, for
instance, the population mean
• Statistic: a characteristic of a sample, for
instance, the sample mean
• Notation Systems:
1. Greek letters for population measures; Roman letters for sample measures
2. μ for the population mean; x̄ for the sample mean
• Why use sample means rather than population means?

Descriptive vs. Inferential Statistics
1. Descriptive Statistics: Classify and summarize
numerical data
2. Inferential Statistics: Make generalizations about a
population by studying a sample drawn from this
population



Bar Graphs (長條圖):
Show the relation between two variables (X axis:
independent variable, measured on a nominal scale;
Y axis: dependent variable) (p. 36)
Scattergrams (分散圖) (Scatterplots):
Show the relation between two quantitative variables (p. 38)
Frequency Distribution Graphs (次數分佈圖)
(Three-Quarter-High Rule: the height of the Y axis should be
about three-quarters of the length of the X axis)
1. Histograms (長條統計圖): bars are drawn at the exact
limits of each interval, so adjacent bars are not separated (p. 39)
2. Frequency Polygons (次數分佈多邊圖): points are plotted
at the midpoint of each interval and connected (p. 40)
3. Ogive (頻率曲線圖) (cumulative frequency polygon)
(p. 42)
1. Shapes of Frequency Distributions (p. 44)
• Uniform (Rectangular) Distribution
• Positively Skewed Distribution (skewed to the right)
• Negatively Skewed Distribution (skewed to the left)
• Normal Distribution (Mesokurtic)
• Leptokurtic Distribution (高狹峰): a non-normal
distribution with more outliers (極端值) than a
normal distribution
• Platykurtic Distribution (低濶峰分配): a non-normal
distribution with fewer outliers (極端值) than a
normal distribution
• Bimodal Distribution (雙峰分佈): a distribution with two peaks (modes).
If many outliers are clustered at the two extreme
ends of a distribution, it will form a bimodal
distribution.
2. If there are too many outliers, the distribution will
not be normal. In other words, many statistical techniques
that assume normality may not yield trustworthy results.
3. How to locate outliers? Inspect the frequency
graphs for the “lone wolves” (isolated scores).
4. Degree of Peakedness: a computed value that
shows the degree of homogeneity of a set of data;
the higher the value, the more homogeneous the
data are. Kurtosis (峭度) & Skewness (偏度) are
used to assess “normality.”
Kurtosis & Skewness
1. Expected values: between −1 and +1; for both indices,
0 (a normal distribution) is the perfect score for
normality (kurtosis here means excess kurtosis)
2. Kurtosis > 0: Leptokurtic distribution
3. Kurtosis < 0: Platykurtic distribution
4. Skewness > 0: Positively skewed
5. Skewness < 0: Negatively skewed
6. Some researchers accept values between −3 and +3 as
indicating a normal distribution

To describe a distribution is to indicate its shape,
average score (mean score), and variation. Graphs
determine shapes, means indicate central
tendency, and variation indicates how widely the
scores are spread.
• Percentiles (百分位數):
1. the point (score) in the distribution at or below
which a given percentage of scores is found;
2. For example, the 75th percentile of 180 scores is
the score at or below which 75 percent (3/4) of all
the scores, i.e., 135 scores, fall.

Percentile Rank (PR) of a given score (different from a percentile):
1. the percentage of scores less than or equal to that
score
2. PR63 means the percentage of scores in the
distribution that fall at or below a score of 63.

• Example: on the Basic Competence Test (基測), a score of 250 has a PR of 98 -> PR250 = 98

2. Near the center of a normal distribution, differences between
percentiles exaggerate the actual differences between raw scores
(in the tails, they understate them).

3. Percentiles cannot be used to determine the size of
differences between ranks: the difference between two
percentiles does not represent the difference between the
two corresponding interval scores (e.g., Basic Competence Test (基測) scores).

Measures of Central Tendency:
1. Mode (眾數):
a. the most frequent score in a distribution
b. the simplest index of central tendency
2. Median (中位數):
a. the 50th percentile
b. after ordering the scores: if the number of scores is odd,
choose the middle one; if it is even, compute the
mean of the two central scores

3. Mean (平均數) (x̄, “X bar”):
a. the arithmetic average
b. Formula: x̄ = Σx / n (see p. 60)
c. Properties (see Table 3.4, p. 61):
(1) the sum of the deviations of all scores from the mean
is 0
(2) the sum of the squared deviations from the
mean is smaller than the sum of squared deviations from
any other value
Comparisons of Measures of Central Tendency
1. Mode -> Nominal data (the most frequently occurring value)
Median & Mode -> Ordinal data
All three -> Interval or ratio data
2. Mode/Median vs. Sample Sizes
Rule of thumb: the bigger the sample, the better!

Measures of Variation: Lengths of intervals that
indicate the spread of scores in a distribution
1. Variance (變異數):
(1) Variance for a population: σ² = SS/N = Σ(xᵢ − μ)²/N
(2) Variance for a sample: s² = SS/(n − 1) = Σ(xᵢ − x̄)²/(n − 1)

2. Standard Deviation (標準差) (formula on p. 74):
(1) the square root of the variance
(2) roughly, the typical distance of scores from the
mean (strictly, the square root of the mean squared distance)
(3) the greater the value, the greater the variability of
scores around the mean
Standard Scores:
1. z score: z = (x − x̄)/s (for a population, z = (x − μ)/σ); formula on p. 75
2. Meaning: indicates how many standard deviations a
raw score falls above or below the mean.
3. Properties:
(1) the shape of the z-score distribution is identical to that of
the raw scores;
(2) the mean of a z-score distribution is 0;
(3) its variance and SD are both 1.
4. T score: T = 10z + 50 (Why T scores?)

```
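The interval vs. ratio distinction in the slides above can be checked numerically. The Python sketch below uses hypothetical helper names (`kg_to_lb`, `f_to_c`) and an approximate kg-to-lb factor of 2.20462. It converts the same pairs of values into another unit: the weight ratio stays 2 because converting kilograms only multiplies, while the temperature ratio changes because converting °F to °C also shifts the zero point.

```python
# Compare how ratios behave under a change of units on a ratio scale
# (weight) and on an interval scale (temperature).

KG_TO_LB = 2.20462  # approximate kilograms-to-pounds factor


def kg_to_lb(kg: float) -> float:
    """Ratio scale: conversion only multiplies, so 0 kg stays 0 lb."""
    return kg * KG_TO_LB


def f_to_c(f: float) -> float:
    """Interval scale: conversion shifts the zero point as well as rescaling."""
    return (f - 32) * 5 / 9


weight_ratio_kg = 50 / 25
weight_ratio_lb = kg_to_lb(50) / kg_to_lb(25)
temp_ratio_f = 50 / 25
temp_ratio_c = f_to_c(50) / f_to_c(25)  # 10 °C divided by about -3.9 °C

print(f"weight ratio: {weight_ratio_kg:.2f} (kg), {weight_ratio_lb:.2f} (lb)")
print(f"temperature ratio: {temp_ratio_f:.2f} (°F), {temp_ratio_c:.2f} (°C)")
# weight ratio: 2.00 (kg), 2.00 (lb)
# temperature ratio: 2.00 (°F), -2.57 (°C)
```

Only a scale with a true zero keeps its ratios under a change of units, which is why "twice as heavy" is meaningful but "twice as warm" is not.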
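To make the parameter/statistic distinction from the slides concrete, here is a small Python sketch with a made-up population of ten test scores. Taking every other student as the "sample" is an arbitrary choice for illustration, not a recommended sampling method. μ is computed from every member and x̄ only from the subset, so the two values generally differ.

```python
from statistics import mean

# Hypothetical population: scores of all ten students in one class.
population = [62, 75, 81, 90, 55, 68, 77, 84, 71, 93]

# Hypothetical sample: every other student (chosen only for illustration).
sample = population[::2]  # [62, 81, 55, 77, 71]

mu = mean(population)  # parameter: population mean, written with Greek μ
x_bar = mean(sample)   # statistic: sample mean, written x̄

print(f"parameter μ = {mu}, statistic x̄ = {x_bar}")
# parameter μ = 75.6, statistic x̄ = 69.2
```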
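The histogram, frequency polygon, and ogive described above are all drawn from one grouped frequency table. This minimal Python sketch assumes whole-number scores grouped into intervals of width 5 starting at 50 (the data and the names `WIDTH`, `LOWEST`, and `rows` are hypothetical). It computes what each graph needs: exact limits (apparent limits ± 0.5) for the histogram bars, midpoints for the frequency polygon, and cumulative frequencies for the ogive.

```python
from collections import Counter

# Hypothetical whole-number scores, grouped into intervals of width 5.
scores = [52, 55, 57, 61, 63, 63, 64, 68, 70, 71, 72, 72, 75, 78, 81, 84, 88]
WIDTH = 5
LOWEST = 50  # lower apparent limit of the first interval (an assumption)

# Frequency of each interval, keyed by its lower apparent limit.
counts = Counter(LOWEST + (s - LOWEST) // WIDTH * WIDTH for s in scores)

rows = []
cumulative = 0
for low in range(LOWEST, max(scores) + 1, WIDTH):
    high = low + WIDTH - 1                  # upper apparent limit, e.g. 50-54
    exact = (low - 0.5, high + 0.5)         # exact limits: histogram bar edges
    midpoint = (low + high) / 2             # x-position of the polygon point
    freq = counts[low]                      # Counter returns 0 for empty intervals
    cumulative += freq                      # ogive height at the exact upper limit
    rows.append((f"{low}-{high}", exact, midpoint, freq, cumulative))

print(f"{'interval':>8}  {'exact limits':>11}  {'mid':>4}  {'f':>2}  {'cf':>2}")
for label, (lo, hi), mid, f, cf in rows:
    print(f"{label:>8}  {lo:>5.1f}-{hi:<5.1f}  {mid:>4.0f}  {f:>2}  {cf:>2}")
```

Plotting is left out on purpose; any charting tool could draw the bars, points, and cumulative line from these columns.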
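The kurtosis and skewness rules in the slides can be sketched in Python with the simple moment-based formulas: skewness g1 = m3 / m2^(3/2) and excess kurtosis g2 = m4 / m2² − 3, where mk is the k-th central moment. Choosing these formulas is an assumption for illustration; statistical packages such as SPSS report bias-adjusted versions that differ somewhat in small samples. The function names and the `limit` parameter are hypothetical; `limit` lets you compare the ±1 and ±3 conventions.

```python
import math
from statistics import fmean


def skewness_and_kurtosis(data: list[float]) -> tuple[float, float]:
    """Moment-based skewness g1 and excess kurtosis g2 (both 0 for a normal curve)."""
    n = len(data)
    m = fmean(data)
    m2 = sum((x - m) ** 2 for x in data) / n  # 2nd central moment
    m3 = sum((x - m) ** 3 for x in data) / n  # 3rd central moment
    m4 = sum((x - m) ** 4 for x in data) / n  # 4th central moment
    g1 = m3 / (m2 * math.sqrt(m2))            # m3 / m2^(3/2)
    g2 = m4 / m2 ** 2 - 3                     # subtract 3 so a normal curve gives 0
    return g1, g2


def describe_shape(g1: float, g2: float, limit: float = 1.0) -> str:
    """Apply the slide's sign rules and a ±limit cutoff for normality."""
    skew = "positively skewed" if g1 > 0 else "negatively skewed" if g1 < 0 else "symmetric"
    peak = "leptokurtic" if g2 > 0 else "platykurtic" if g2 < 0 else "mesokurtic"
    verdict = "within" if abs(g1) <= limit and abs(g2) <= limit else "outside"
    return f"{skew}, {peak}, {verdict} ±{limit}"


g1, g2 = skewness_and_kurtosis([1, 1, 1, 1, 6])  # one high "lone wolf"
print(g1, g2)                             # 1.5 0.25
print(describe_shape(g1, g2))             # positively skewed, leptokurtic, outside ±1.0
print(describe_shape(g1, g2, limit=3.0))  # positively skewed, leptokurtic, within ±3.0
```

The same five scores fail the stricter ±1 rule but pass the looser ±3 rule, which shows why reports should state which cutoff was used.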
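The percentile and percentile-rank definitions above translate directly into code. In this Python sketch (hypothetical function names), `percentile_rank` follows the slide's definition, the percentage of scores at or below a given score. `percentile` uses the nearest-rank method, which is only one of several conventions; textbooks and software often interpolate between scores, so their results can differ slightly.

```python
import math


def percentile_rank(scores: list[float], x: float) -> float:
    """PR of score x: percentage of scores less than or equal to x."""
    at_or_below = sum(1 for s in scores if s <= x)
    return 100 * at_or_below / len(scores)


def percentile(scores: list[float], p: float) -> float:
    """Nearest-rank p-th percentile: the smallest score with at least p% of
    the scores at or below it (one convention among several)."""
    ordered = sorted(scores)
    rank = max(1, math.ceil(p / 100 * len(ordered)))  # 1-based position
    return ordered[rank - 1]


scores = [40, 45, 50, 55, 60, 60, 65, 70, 80, 95]  # hypothetical scores
print(percentile_rank(scores, 60))  # 60.0, i.e. PR60 = 60
print(percentile(scores, 75))       # 70

# The slide's example: with 180 scores (here simply 1..180), the 75th
# percentile is the 135th score, so 3/4 of the scores fall at or below it.
print(percentile(list(range(1, 181)), 75))  # 135
```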
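Point 2 about percentiles near the center of a normal distribution can be illustrated with Python's `statistics.NormalDist`. The sketch assumes a hypothetical test whose scores are normal with mean 100 and SD 15, and compares two 5-point raw-score gaps, one at the mean and one in the upper tail. The same raw difference spans roughly 13 percentile points near the center but only about 1 point in the tail, so percentile differences do not track raw-score differences evenly.

```python
from statistics import NormalDist

# Hypothetical test whose scores are normal with mean 100 and SD 15.
test = NormalDist(mu=100, sigma=15)


def pr(score: float) -> float:
    """Percentile rank of a score under the assumed normal distribution."""
    return 100 * test.cdf(score)


# Two pairs of scores, each exactly 5 raw points apart.
center_gap = pr(105) - pr(100)  # right at the mean
tail_gap = pr(135) - pr(130)    # two or more SDs above the mean

print(f"100 -> 105: {center_gap:.1f} percentile points")
print(f"130 -> 135: {tail_gap:.1f} percentile points")
```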
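The mode and median rules above can be sketched in Python as follows. The function names are hypothetical; the standard library's `statistics.multimode` and `statistics.median` compute the same quantities. `modes` returns every most-frequent score, since a bimodal distribution has two.

```python
from collections import Counter


def modes(scores: list[float]) -> list[float]:
    """All most-frequent scores; a bimodal distribution returns two."""
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(s for s, c in counts.items() if c == top)


def median(scores: list[float]) -> float:
    """Middle score if n is odd; mean of the two central scores if n is even."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(modes([3, 7, 7, 2, 9]), median([3, 7, 7, 2, 9]))  # [7] 7
print(modes([1, 1, 2, 3, 3]), median([2, 4, 6, 8]))     # [1, 3] 5.0
```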
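The two properties of the mean listed in the slides (Table 3.4, p. 61) can be checked exactly with Python's `fractions.Fraction`, which avoids floating-point rounding. The scores and the helper `sum_sq_dev` are hypothetical. Property (2) follows from the identity SS(c) = SS(x̄) + n(c − x̄)², where SS(c) is the sum of squared deviations from any value c; the sketch verifies that identity for a range of c values.

```python
from fractions import Fraction

# Hypothetical scores; Fraction keeps every computation exact.
scores = [Fraction(s) for s in (4, 8, 6, 5, 3, 10)]
n = len(scores)
mean = sum(scores) / n  # x̄ = Σx / n


def sum_sq_dev(c: Fraction) -> Fraction:
    """Sum of squared deviations of all scores from the value c."""
    return sum((x - c) ** 2 for x in scores)


# Property (1): deviations from the mean sum to exactly 0.
total_deviation = sum(x - mean for x in scores)

# Property (2): SS about the mean is smaller than SS about any other value c,
# because SS(c) = SS(x̄) + n(c - x̄)^2 and the last term is positive when c != x̄.
ss_mean = sum_sq_dev(mean)
others = [Fraction(k, 2) for k in range(25) if Fraction(k, 2) != mean]  # 0, 0.5, ..., 12
identity_holds = all(sum_sq_dev(c) == ss_mean + n * (c - mean) ** 2 for c in others)
mean_is_minimum = all(sum_sq_dev(c) > ss_mean for c in others)

print(f"mean = {mean}, sum of deviations = {total_deviation}, SS about mean = {ss_mean}")
print(identity_holds, mean_is_minimum)
# mean = 6, sum of deviations = 0, SS about mean = 34
# True True
```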
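The population and sample variance formulas in the slides differ only in the divisor (N vs. n − 1). This Python sketch with hypothetical scores computes both from SS, takes square roots for the SDs, and cross-checks against `statistics.pvariance` and `statistics.variance`. It also computes the plain average distance from the mean, which here is 1.5 while the population SD is 2.0: the SD is the square root of the average squared distance, not the average distance itself.

```python
import math
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical raw scores
N = len(scores)
mean = sum(scores) / N  # 5.0

ss = sum((x - mean) ** 2 for x in scores)  # SS = Σ(x - mean)^2 = 32.0

pop_variance = ss / N           # σ² = SS / N
sample_variance = ss / (N - 1)  # s² = SS / (n - 1)
pop_sd = math.sqrt(pop_variance)
sample_sd = math.sqrt(sample_variance)

# Plain average distance from the mean, for comparison with the SD.
mean_abs_distance = sum(abs(x - mean) for x in scores) / N

# Cross-check against the standard library.
assert math.isclose(pop_variance, statistics.pvariance(scores))
assert math.isclose(sample_variance, statistics.variance(scores))

print(f"σ² = {pop_variance}, σ = {pop_sd}")               # σ² = 4.0, σ = 2.0
print(f"s² = {sample_variance:.3f}, s = {sample_sd:.3f}")  # s² = 4.571, s = 2.138
print(f"mean distance from the mean = {mean_abs_distance}")  # 1.5
```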
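The z- and T-score formulas and the properties listed under Standard Scores can be verified with a short Python sketch. The raw scores are hypothetical and treated as a whole population, so μ and σ are used; with sample data you would use x̄ and s instead. T = 10z + 50 is the formula given in the slides.

```python
import statistics

raw = [55, 60, 65, 70, 75, 80, 85]  # hypothetical raw scores, treated as a population

mu = statistics.fmean(raw)
sigma = statistics.pstdev(raw)  # population SD (use x̄ and s for sample data)

z_scores = [(x - mu) / sigma for x in raw]  # z = (x - μ) / σ
t_scores = [10 * z + 50 for z in z_scores]   # T = 10z + 50

print(z_scores)  # [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
print(t_scores)  # [35.0, 40.0, 45.0, 50.0, 55.0, 60.0, 65.0]

# Properties from the slide: the z scores have mean 0 and SD (and variance) 1.
print(statistics.fmean(z_scores), statistics.pstdev(z_scores))  # 0.0 1.0
```

Because z and T are linear transformations of the raw scores, the spacing and ordering of the scores (and hence the shape of the distribution) are unchanged; only the center and the unit move.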