APPENDIX B
Data Preparation and Univariate Statistics
• How are computers used in data collection and analysis?
• How are collected data prepared for statistical analysis?
• How are missing data treated in statistical analyses?
• When is it appropriate to delete data before they are analyzed?
• What are descriptive statistics and inferential statistics?
• What determines how well the data in a sample can be used to
predict population parameters?
Preparing Data for Analysis
• Collecting the data
1. Ask participants to fill out a questionnaire.
2. Ask participants to enter their responses via keyboard into a computer.
• Analyzing the data
1. SPSS contains a spreadsheet data editor, an output editor, and a syntax editor.
2. SPSS contains subprograms to compute statistical analyses such as frequency
distributions, descriptive statistics, ANOVA, correlation, and regression.
• Entering the data into the computer
1. Use a coding system: label the variables.
2. Keep notes: you will forget which variable name refers to which data.
3. Save and back up the data.
4. Check and clean the data.
Missing Data
• When the respondent has decided not to answer a question
because it is inappropriate or because the respondent has personal
reasons for not doing so.
1. Think carefully about whether all questions are appropriate
2. Save respondents from embarrassing situations.
• When the respondent forgot to answer the question or completely
missed an entire page of the questionnaire.
1. Test the research procedure before you carry it out.
2. Check the respondents' answers before they leave.
• When the research requires the respondents to participate on more
than one occasion (the attrition problem).
Deleting and Retaining Data
• When do we delete variables?
Cases in which the reliability analysis indicates that the variable did not
measure the same thing that the other variables measured.
• When do we delete responses?
Cases in which the respondent gave a very extreme score (an outlier).
• When do we delete participants?
Cases in which the respondent did not understand the instructions or was not
able to perform the task.
• How do we trim the data?
Remove scores that are more than 3 standard deviations above or
below the variable's mean.
• When do we transform the data?
Cases in which items are reverse-scored, or the data are skewed.
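The trimming and reverse-scoring rules above can be sketched in Python. This is only an illustration under stated assumptions: the scores, the function names `trim_outliers` and `reverse_score`, and the 1-to-5 response scale are hypothetical, and the standard deviation is computed with N in the denominator.

```python
from statistics import mean, pstdev

def trim_outliers(scores, cutoff=3.0):
    """Split scores into (kept, removed) using a +/- cutoff-SD rule."""
    m = mean(scores)
    sd = pstdev(scores)  # SD computed with N in the denominator
    kept, removed = [], []
    for x in scores:
        if sd > 0 and abs(x - m) > cutoff * sd:
            removed.append(x)
        else:
            kept.append(x)
    return kept, removed

def reverse_score(score, scale_min=1, scale_max=5):
    """Reverse-score an item on an assumed scale_min..scale_max scale."""
    return scale_min + scale_max - score

# Hypothetical data: 19 ordinary scores and one extreme score (40).
scores = [4, 5, 6, 5, 4, 6, 5, 5, 4, 6, 5, 4, 6, 5, 5, 4, 6, 5, 5, 40]
kept, removed = trim_outliers(scores)
print(removed)                                 # the trimmed score(s)
print([reverse_score(x) for x in [1, 2, 5]])   # reversed responses
```

With very small samples no score can lie 3 standard deviations from the mean, so the rule only has an effect once the sample is reasonably large.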
Conducting Statistical Analysis
Descriptive Statistics
A statistical approach in which
the researcher summarizes the
pattern of scores observed on
a measured variable.
Inferential Statistics
A statistical approach in which
the researcher infers statistical
significance in the total population
based on the pattern of scores
observed in the sample of
respondents.
[Figure: descriptive statistics apply the analysis to your data; inferential statistics extend the analysis of your data to the population.]
Summation Notation
Sample data: $(X_1, X_2, X_3, X_4, X_5)$
X1 = 6
X2 = 5
X3 = 2
X4 = 7
X5 = 3
$$\sum_{i=1}^{N} X_i = X_1 + X_2 + X_3 + X_4 + X_5 = 6 + 5 + 2 + 7 + 3 = 23$$
The summation starts from $i = 1$ and runs to $N$ (in this case, $N = 5$).
The same sum is often abbreviated as $\sum X = 23$.
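As a small illustration, the summation can be written as an explicit loop over the index i = 1, ..., N. The variable names here are illustrative only; the data are the five sample scores above.

```python
scores = [6, 5, 2, 7, 3]   # X1 ... X5
n = len(scores)            # N = 5

total = 0
for i in range(1, n + 1):      # i runs from 1 to N
    total += scores[i - 1]     # Python lists are 0-indexed, so X_i is scores[i - 1]

print(total)  # 23
```

In practice the built-in `sum(scores)` gives the same result.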
Rounding
The APA Publication Manual generally suggests rounding the presented
figures (including both descriptive and inferential statistics) to
two decimal places.
$\pi = 3.14159265\ldots \rightarrow 3.14$
$\sqrt{3} = 1.732\ldots \rightarrow 1.73$
$p = 0.0041\ldots \rightarrow .004$
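The three roundings above can be reproduced with Python string formatting. The helper name `apa_round` and its parameters are hypothetical, and the sketch only handles non-negative values when dropping the leading zero.

```python
import math

def apa_round(value, places=2, leading_zero=True):
    """Format a number to a fixed number of decimal places.

    With leading_zero=False, the "0" before the decimal point is dropped,
    as in the p-value example (intended for values between 0 and 1).
    """
    text = f"{value:.{places}f}"
    if not leading_zero and text.startswith("0."):
        text = text[1:]
    return text

print(apa_round(math.pi))                               # 3.14
print(apa_round(math.sqrt(3)))                          # 1.73
print(apa_round(0.0041, places=3, leading_zero=False))  # .004
```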
Computing Descriptive Statistics
Frequency Distribution:
A table that indicates how many, and in most cases what percentage,
of the individuals in the sample fall into each of a set of categories.
It can also be displayed graphically (e.g. bar chart, grouped frequency
distribution, histogram, frequency curve, stem and leaf plot).
Central Tendency:
The point in the distribution around which the data are centered
(e.g. mean, median, mode).
Dispersion:
The extent to which the scores are all tightly clustered around the
central tendency (e.g. range, variance, standard deviation).
Frequency Distribution
X1 = 6
X2 = 5
X3 = 2
X4 = 7
X5 = 3
X6 = 4
X7 = 6
X8 = 2
X9 = 1
X10 = 8
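A frequency table for these ten scores can be built with the standard library's `collections.Counter`. This is a minimal sketch; the table layout is an arbitrary choice.

```python
from collections import Counter

scores = [6, 5, 2, 7, 3, 4, 6, 2, 1, 8]
freq = Counter(scores)   # maps each score to how often it occurs
n = len(scores)

print("Score  Frequency  Percent")
for value in sorted(freq):
    percent = 100 * freq[value] / n
    print(f"{value:>5}  {freq[value]:>9}  {percent:>6.1f}%")
```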
[Figures: bar chart, histogram, and frequency curve of the ten sample scores (scores 1 to 8 on the horizontal axis, frequency on the vertical axis)]
Central Tendency
Sample Data
X1 = 6
X2 = 5
X3 = 2
X4 = 7
X5 = 3
X6 = 4
X7 = 6
X8 = 2
X9 = 1
X10 = 8
The Mean (average): the sum of all of the scores divided by the sample size.
$$\bar{X} = \frac{\sum X}{N} = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N}$$
$$\bar{X} = \frac{6 + 5 + 2 + 7 + 3 + 4 + 6 + 2 + 1 + 8}{10} = \frac{44}{10} = 4.4$$
The Median: the score at which half of the
observations are greater and half are smaller.
1, 2, 2, 3, 4, 5, 6, 6, 7, 8
With an even number of scores, the median is the average of the two middle scores:
$$\frac{4 + 5}{2} = 4.5$$
The Mode: the most frequently occurring value in
a variable.
1, 2, 2, 3, 4, 5, 6, 6, 7, 8
Here 2 and 6 each occur twice, so the data have two modes.
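The three measures of central tendency can be checked with Python's standard `statistics` module. This is a sketch only; `multimode` requires Python 3.8 or later and returns every value tied for the highest frequency.

```python
from statistics import mean, median, multimode

scores = [6, 5, 2, 7, 3, 4, 6, 2, 1, 8]

print(mean(scores))               # 4.4
print(median(scores))             # 4.5
print(sorted(multimode(scores)))  # [2, 6]
```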
Dispersion
The Range
The distance between the largest (the maximum) and the
smallest (the minimum) observed values of the variable.
The Variance ($S^2$)
The sum of squares, $\sum (X_i - \bar{X})^2$, divided by $N$.
The Standard Deviation ($S$)
The square root of the variance.
The variance and the Standard Deviation
Mean Deviation Score ($\bar{X} = 4.4$)
X           X - mean
X1 = 6      (6 - 4.4) = 1.6
X2 = 5      (5 - 4.4) = 0.6
X3 = 2      (2 - 4.4) = -2.4
X4 = 7      (7 - 4.4) = 2.6
X5 = 3      (3 - 4.4) = -1.4
X6 = 4      (4 - 4.4) = -0.4
X7 = 6      (6 - 4.4) = 1.6
X8 = 2      (2 - 4.4) = -2.4
X9 = 1      (1 - 4.4) = -3.4
X10 = 8     (8 - 4.4) = 3.6
$$\sum (X - \bar{X}) = 0$$
Sum of Squares
From the squared deviations:
$$SS = \sum (X - \bar{X})^2 = 50.4$$
(6 - 4.4)^2 = 2.56
(5 - 4.4)^2 = 0.36
(2 - 4.4)^2 = 5.76
(7 - 4.4)^2 = 6.76
(3 - 4.4)^2 = 1.96
(4 - 4.4)^2 = 0.16
(6 - 4.4)^2 = 2.56
(2 - 4.4)^2 = 5.76
(1 - 4.4)^2 = 11.56
(8 - 4.4)^2 = 12.96
Equivalently, from the raw scores:
$$SS = \sum X^2 - \frac{(\sum X)^2}{N} = 244 - \frac{1936}{10} = 50.4$$
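Both ways of computing the sum of squares can be verified numerically. In this sketch the variable names are illustrative, and results are rounded for display because floating-point arithmetic leaves tiny residues.

```python
scores = [6, 5, 2, 7, 3, 4, 6, 2, 1, 8]
n = len(scores)
mean = sum(scores) / n   # 4.4

# Mean deviation scores: these sum to zero (up to floating-point error).
deviations = [x - mean for x in scores]

# SS from the squared deviations.
ss_from_deviations = sum(d ** 2 for d in deviations)

# SS from the raw scores: sum of X^2 minus (sum of X)^2 / N.
ss_from_raw_scores = sum(x ** 2 for x in scores) - sum(scores) ** 2 / n

print(round(sum(deviations), 10))    # 0.0
print(round(ss_from_deviations, 2))  # 50.4
print(round(ss_from_raw_scores, 2))  # 50.4
```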
Variance and Standard Deviation
Variance
$$S^2 = \frac{SS}{N} = \frac{50.4}{10} = 5.04$$
Standard Deviation (SD)
$$S = \sqrt{S^2} = \sqrt{5.04} = 2.24$$
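The same dispersion measures can be obtained from Python's `statistics` module. Its `pvariance` and `pstdev` functions divide by N, which matches the formulas used here; the variable names are illustrative.

```python
from statistics import pstdev, pvariance

scores = [6, 5, 2, 7, 3, 4, 6, 2, 1, 8]

value_range = max(scores) - min(scores)  # largest minus smallest score
variance = pvariance(scores)             # SS / N
sd = pstdev(scores)                      # square root of the variance

print(value_range)         # 7
print(round(variance, 2))  # 5.04
print(round(sd, 2))        # 2.24
```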
Standard Score
(Z score)
The distance of a score from the mean of the variable, expressed in
standard deviation units.
Standard scores make it possible to compare two scores that come from
distributions with different means and different standard deviations (SD).
$$Z = \frac{X - \bar{X}}{s}$$
Taro received a score of 80 on a test. The average was 50, and the
standard deviation was 15.
$$Z_{Taro} = \frac{80 - 50}{15} = 2.0$$
Susan received a score of 75 on a test. The average was 60,
and the standard deviation was 10.
$$Z_{Susan} = \frac{75 - 60}{10} = 1.5$$
Relative to the other scores on their own tests, Taro's score (Z = 2.0) is further above the mean than Susan's (Z = 1.5).
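The two standard scores can be reproduced with a one-line function. The name `z_score` is hypothetical, and the guard against a non-positive standard deviation is an added safety check.

```python
def z_score(x, mean, sd):
    """Distance of x from the mean, in standard deviation units."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

z_taro = z_score(80, mean=50, sd=15)
z_susan = z_score(75, mean=60, sd=10)

print(z_taro)   # 2.0
print(z_susan)  # 1.5
```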
Standard Normal Distribution
Hypothetical population distribution of standard scores when
the original scores are normally distributed.
$\mu = 0$, $\sigma = 1$
Percentage of scores falling in each interval:
-1 < Z < 0, or 0 < Z < 1: 34.13% each
-2 < Z < -1, or 1 < Z < 2: 13.59% each
-3 < Z < -2, or 2 < Z < 3: 2.15% each
Z < -3, or 3 < Z: 0.13% each
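These areas can be approximated with `statistics.NormalDist` (Python 3.8 or later). The helper `area_between` is a hypothetical name. The computed area for 2 < Z < 3 comes out at about 2.14%, so the 2.15% in the list above appears to be a rounding that makes the four percentages add up to 50%.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def area_between(low, high):
    """Proportion of the standard normal distribution between two Z values."""
    return std_normal.cdf(high) - std_normal.cdf(low)

for low, high in [(0, 1), (1, 2), (2, 3)]:
    print(f"{low} < Z < {high}: {100 * area_between(low, high):.2f}%")

upper_tail = 1 - std_normal.cdf(3)
print(f"Z > 3: {100 * upper_tail:.2f}%")
```

Because the distribution is symmetric, the same areas apply to the matching negative intervals.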
Working with Inferential Statistics
Example. A researcher estimates the average GPA of all of
the psychology majors at UM.
Population (all psychology majors):
Mean of the population: $\mu$
Standard deviation of the population: $\sigma$
Sample (descriptive statistics of 100 students):
Mean of the sample: $\bar{X} = 3.40$
Standard deviation of the sample: $S = 2.23$
Unbiased Estimator
The sample mean ($\bar{X}$) is an unbiased estimator of the population
mean $\mu$.
The sample standard deviation ($s$), however, is not an unbiased
estimator of the population standard deviation $\sigma$.
How can we estimate $\sigma$ using the sample standard deviation?
$$\hat{S} = \sqrt{\frac{SS}{N - 1}}$$
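To see the effect of dividing by N - 1 rather than N, the two formulas can be compared on the ten sample scores used earlier in these notes. This is a sketch with illustrative variable names; `statistics.stdev` from the standard library also divides by N - 1.

```python
import math
from statistics import stdev

scores = [6, 5, 2, 7, 3, 4, 6, 2, 1, 8]
n = len(scores)
mean = sum(scores) / n
ss = sum((x - mean) ** 2 for x in scores)   # sum of squares, about 50.4

s = math.sqrt(ss / n)            # sample SD, dividing by N
s_hat = math.sqrt(ss / (n - 1))  # estimate of sigma, dividing by N - 1

print(round(s, 2))      # 2.24
print(round(s_hat, 2))  # 2.37
```

The N - 1 version is always slightly larger, and the gap shrinks as N grows.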
The standard error
If we take all possible samples of N = 100 from a given
population, the resulting distribution of the sample means
has a mean equal to the population mean $\mu$.
The distribution would be normally distributed, with a
standard deviation known as the standard error of the mean (or
simply the standard error). The standard error is symbolized
as $S_{\bar{X}}$:
$$S_{\bar{X}} = \frac{s}{\sqrt{N - 1}}$$
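Applying this formula to the GPA example (s = 2.23, N = 100) gives the standard error used in the confidence interval. The function name `standard_error` is hypothetical.

```python
import math

def standard_error(s, n):
    """Standard error of the mean, where s is the sample SD computed with N."""
    if n < 2:
        raise ValueError("need at least two observations")
    return s / math.sqrt(n - 1)

se = standard_error(2.23, 100)
print(round(se, 2))  # 0.22
```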
Confidence Intervals
The range of scores within which the population mean is
likely to fall.
The exact width of the confidence interval is determined with
a statistic known as Student's t.
Example. Now, we sampled 100 students.
Degrees of freedom = 100 - 1 = 99
If we set alpha = .05,
the appropriate t value = 1.99 (see Table C, Appendix E).
Lower limit of $\mu$ = $\bar{X} - t(s_{\bar{X}})$ = 3.40 - 1.99(.22) = 2.96
Upper limit of $\mu$ = $\bar{X} + t(s_{\bar{X}})$ = 3.40 + 1.99(.22) = 3.84
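The two limits can be computed directly. In this sketch the function name `confidence_interval` is hypothetical, and the t value of 1.99 is taken from the table cited above rather than computed, since Python's standard library has no t distribution.

```python
def confidence_interval(mean, t_value, se):
    """Return (lower, upper) limits: mean -/+ t * standard error."""
    margin = t_value * se
    return mean - margin, mean + margin

lower, upper = confidence_interval(mean=3.40, t_value=1.99, se=0.22)
print(f"{lower:.2f} to {upper:.2f}")  # 2.96 to 3.84
```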