Emily Leak
Project 2c
Part One
There were 374,521 valid responses to this survey question, with 9,294 (2.4%) missing responses.
The category with the highest frequency was “College Grad” (109,217 responses, 28.5% of
total responses), indicating that 28.5% of the respondents' mothers had graduated from college.
The second most frequent response was “HS Graduate” with 83,449 responses (21.7%). The mean is
5.21, which falls between Some College (5) and College Grad (6), and the median is 6.0, the
middle value of all the responses, which is consistent with College Grad being the most
frequently selected value. The minimum is 1 (Grammar or less) and the maximum is 8 (Graduate
degree). These are the lowest and highest values that could be selected for this question, so
any value outside the range 1-8 would indicate a possible data entry error. The skewness value
is -.147 with a standard error of .004, which gives a range of -.008 to +.008 (two standard
errors on either side of zero). Because the skewness value falls outside this range, the results
are not normally distributed; the negative sign indicates a longer tail toward the lower values.
This is visible in the histogram, where the results extend beyond the normal curve on the left
side. In addition, the kurtosis value is -.863 with a standard error of .008, which gives a range
of -.016 to +.016. This also indicates that the results are not normally distributed, because the
kurtosis value does not fall within that range. In the histogram this is evident in the flat,
wide spread of the columns, whose peaks fall outside the normal curve.
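The standard error ranges quoted above can be reproduced from the sample size alone. The sketch below is written in Python rather than SPSS, and it assumes the commonly cited formulas for the standard errors of sample skewness and kurtosis; the function names are illustrative only. It applies the same rule of thumb as the paragraph, treating two standard errors on either side of zero as the range expected under normality.

```python
import math

def se_skewness(n: int) -> float:
    """Standard error of sample skewness (commonly cited formula, assumed here)."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def se_kurtosis(n: int) -> float:
    """Standard error of sample excess kurtosis, built on the skewness standard error."""
    return 2.0 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))

n_valid = 374521   # valid responses reported in Part One
skewness = -0.147  # values reported in the Statistics table
kurtosis = -0.863

ses = se_skewness(n_valid)
sek = se_kurtosis(n_valid)

# Rule of thumb used in the text: a statistic outside +/- 2 standard errors
# is taken as evidence against a normal distribution.
skew_outside = abs(skewness) > 2 * ses
kurt_outside = abs(kurtosis) > 2 * sek

print(f"SE skewness = {ses:.3f}, range = +/-{2 * ses:.3f}, outside: {skew_outside}")
print(f"SE kurtosis = {sek:.3f}, range = +/-{2 * sek:.3f}, outside: {kurt_outside}")
```

With a sample this large the standard errors are very small, so even a modest departure from symmetry falls outside the range.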
Statistics
MOTHERS EDUCATION
N     Valid                  374521
      Missing                  9294
Mean                           5.21
Median                         6.00
Std. Deviation                1.872
Skewness                      -.147
Std. Error of Skewness         .004
Kurtosis                      -.863
Std. Error of Kurtosis         .008
Minimum                           1
Maximum                           8
MOTHERS EDUCATION
                          Frequency  Percent  Valid Percent  Cumulative Percent
Valid    GRAMMAR OR LESS       7932      2.1            2.1                 2.1
         SOME HS              11601      3.0            3.1                 5.2
         HS GRADUATE          83449     21.7           22.3                27.5
         POSTSECONDARY        19638      5.1            5.2                32.7
         SOME COLLEGE         64514     16.8           17.2                50.0
         COLLEGE GRAD        109217     28.5           29.2                79.1
         SOME GRAD SCHL       13710      3.6            3.7                82.8
         GRAD DEGREE          64460     16.8           17.2               100.0
         Total               374521     97.6          100.0
Missing  System                9294      2.4
Total                        383815    100.0
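The summary statistics for MOTHERS EDUCATION can be recomputed directly from the frequency counts. The following Python sketch is not the SPSS procedure used for the report; the `describe` helper is a hypothetical name, and the bias-adjusted skewness and kurtosis formulas are an assumption about what the statistics package reports. With a sample this large the adjustment is negligible.

```python
import math

# Frequency counts from the MOTHERS EDUCATION table, keyed by category code 1..8.
counts = {1: 7932, 2: 11601, 3: 83449, 4: 19638,
          5: 64514, 6: 109217, 7: 13710, 8: 64460}

def describe(freq):
    """Return n, mean, median, sample SD, skewness and excess kurtosis for grouped counts."""
    n = sum(freq.values())
    mean = sum(v * f for v, f in freq.items()) / n

    def value_at(rank):
        # Value of the rank-th observation (1-based) when responses are sorted.
        running = 0
        for v in sorted(freq):
            running += freq[v]
            if running >= rank:
                return v
        raise ValueError("rank exceeds number of observations")

    # Average of the two middle observations (they coincide when n is odd).
    median = (value_at((n + 1) // 2) + value_at((n + 2) // 2)) / 2

    # Central moments about the mean.
    m2 = sum(f * (v - mean) ** 2 for v, f in freq.items()) / n
    m3 = sum(f * (v - mean) ** 3 for v, f in freq.items()) / n
    m4 = sum(f * (v - mean) ** 4 for v, f in freq.items()) / n

    sd = math.sqrt(m2 * n / (n - 1))
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3.0
    # Bias-adjusted sample estimators (assumed to match the package output).
    skew = g1 * math.sqrt(n * (n - 1)) / (n - 2)
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)
    return {"n": n, "mean": mean, "median": median,
            "sd": sd, "skewness": skew, "kurtosis": kurt}

stats = describe(counts)
for name, value in stats.items():
    print(f"{name:>9}: {value:.3f}")
```

The same helper can be reused on any recoded version of the variable by passing in the collapsed counts.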
Part Two
RECODE motheduc (3=2) (6=4) (1 thru 2=1) (4 thru 5=3) (7 thru 8=5) INTO mothereduc5cat.
The new variable name is mothereduc5cat. There are 5 categories. Category 1 is grammar school or
less, or some high school. Category 2 is high school graduate. Category 3 is postsecondary or some
college. Category 4 is college graduate, and Category 5 is some graduate school or a graduate degree.
The original 8 categories were collapsed into 5. They were grouped based on having completed only
some of a level of education versus having completed it. For example, Category 1 contains those with
a grammar school education or less, or some high school education, while Category 2 contains those
who graduated from high school.
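The effect of the RECODE rules on the frequency counts can be checked by hand. The Python sketch below mirrors the mapping in the RECODE command and applies it to the Part One counts; the names `recode_map` and `collapse` are illustrative, not part of the SPSS output.

```python
# Original frequency counts by motheduc code (from the Part One frequency table).
original_counts = {1: 7932, 2: 11601, 3: 83449, 4: 19638,
                   5: 64514, 6: 109217, 7: 13710, 8: 64460}

# Mirror of the RECODE rules: original code -> new 5-category code.
recode_map = {1: 1, 2: 1,   # grammar or less, some HS
              3: 2,         # HS graduate
              4: 3, 5: 3,   # postsecondary, some college
              6: 4,         # college grad
              7: 5, 8: 5}   # some grad school, grad degree

def collapse(counts, mapping):
    """Sum the counts of all original codes that map to the same new code."""
    collapsed = {}
    for code, freq in counts.items():
        new_code = mapping[code]
        collapsed[new_code] = collapsed.get(new_code, 0) + freq
    return collapsed

five_cat = collapse(original_counts, recode_map)
for code in sorted(five_cat):
    print(code, five_cat[code])
```

Because every original code maps to exactly one new code, the total number of valid responses is unchanged by the recode.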
Because of the recoding, new values are assigned to the categories. The number of valid responses and
the number of missing responses stayed the same. The most frequent response (the highest frequency)
continues to be College Graduate with 109,217 responses, which is 28.5% of the total responses. High
School Graduate had 83,449 (21.7%) and Postsecondary or Some College had 84,152 (21.9%). The
Postsecondary or Some College percentage is higher than either original category because it combines
Postsecondary (5.1%) and Some College (16.8%). The mean is 3.3819, which falls between Category 3,
Postsecondary or Some College, and Category 4, College Graduate. The median is 4.0, which is
consistent with the fourth option, College Graduate. The minimum is 1.0 and the maximum is 5.0.
The results are essentially the same as in the previous analysis. College Graduate received the most
responses to the survey question. Modifying the categories in Part Two helps validate the results from
Part One. The skewness value is -.210 and the skewness standard error range is -.008 to +.008. The
kurtosis value is -1.009 and the kurtosis standard error range is -.016 to +.016. These numbers
differ from those in Part One. They indicate greater negative skewness (the values fall outside the
normal distribution toward the negative side) and greater negative kurtosis, which is represented by
a flatter and wider distribution in the histogram than in Part One.
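One way to make the comparison between the two parts concrete is to express each statistic as a multiple of its standard error. This is a small Python sketch using only the values reported in the two Statistics tables; the `se_ratio` helper and the dictionary labels are illustrative names.

```python
# Reported values: (skewness, SE of skewness, kurtosis, SE of kurtosis).
reported = {
    "Part One (8 categories)": (-0.147, 0.004, -0.863, 0.008),
    "Part Two (5 categories)": (-0.210, 0.004, -1.009, 0.008),
}

def se_ratio(statistic, std_error):
    """Number of standard errors by which the statistic differs from zero."""
    return statistic / std_error

ratios = {
    label: (se_ratio(skew, ses), se_ratio(kurt, sek))
    for label, (skew, ses, kurt, sek) in reported.items()
}

for label, (skew_ratio, kurt_ratio) in ratios.items():
    print(f"{label}: skewness/SE = {skew_ratio:.1f}, kurtosis/SE = {kurt_ratio:.1f}")
```

Both ratios are further from zero for the recoded variable, which matches the observation that the skewness and kurtosis are more negative in Part Two.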
Statistics
motheredrecodedto5cat
N     Valid                  374521
      Missing                  9294
Mean                         3.3819
Median                       4.0000
Std. Deviation              1.18830
Skewness                      -.210
Std. Error of Skewness         .004
Kurtosis                     -1.009
Std. Error of Kurtosis         .008
Minimum                        1.00
Maximum                        5.00
motheredrecodedto5cat
                                          Frequency  Percent  Valid Percent  Cumulative Percent
Valid    grammar or some hs                   19533      5.1            5.2                 5.2
         hs grad                              83449     21.7           22.3                27.5
         post secondary or some college       84152     21.9           22.5                50.0
         college grad                        109217     28.5           29.2                79.1
         some grad school or grad degree      78170     20.4           20.9               100.0
         Total                               374521     97.6          100.0
Missing  System                                9294      2.4
Total                                        383815    100.0
Part Three
This section was added to see if a better configuration could be obtained. The categories were
recoded again. The new categories are Category 1, grammar school or less or some high school;
Category 2, high school graduate or postsecondary; Category 3, some college; and Category 4,
college graduate, some graduate school, or graduate degree. The new configuration indicates a
greater negative skewness than the previous configurations. The skewness for this configuration
is -.571 with a standard error range of -.008 to +.008. The kurtosis is -1.098 with a standard
error range of -.016 to +.016. Once again the kurtosis is more negative than in the previous
configurations. The mean is 3.1208 and the median is 4.0; the median falls in Category 4, which
includes College Graduate, so it remains consistent with the previous finding that College
Graduate was the value most often selected by the participants of the survey.
RECODE motheduc (5=3) (1 thru 2=1) (3 thru 4=2) (6 thru 8=4) INTO
mothereducationredo5.
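A quick way to see which original categories end up in each new code is to apply the RECODE rules to the Part One counts and list the members of each group. This Python sketch assumes the labels and counts from the Part One frequency table; the variable names are illustrative only.

```python
# Labels and counts for the original 8 codes (from the Part One frequency table).
labels = {1: "GRAMMAR OR LESS", 2: "SOME HS", 3: "HS GRADUATE", 4: "POSTSECONDARY",
          5: "SOME COLLEGE", 6: "COLLEGE GRAD", 7: "SOME GRAD SCHL", 8: "GRAD DEGREE"}
counts = {1: 7932, 2: 11601, 3: 83449, 4: 19638,
          5: 64514, 6: 109217, 7: 13710, 8: 64460}

# Mirror of the RECODE rules: (5=3) (1 thru 2=1) (3 thru 4=2) (6 thru 8=4).
recode_map = {1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 4, 7: 4, 8: 4}

members = {}  # new code -> original labels that fall into it
totals = {}   # new code -> combined frequency
for code in sorted(counts):
    new_code = recode_map[code]
    members.setdefault(new_code, []).append(labels[code])
    totals[new_code] = totals.get(new_code, 0) + counts[code]

n_valid = sum(totals.values())
mean = sum(new_code * freq for new_code, freq in totals.items()) / n_valid

for new_code in sorted(totals):
    print(new_code, totals[new_code], ", ".join(members[new_code]))
print(f"mean = {mean:.4f}")
```

Listing the members alongside the totals makes it easy to confirm that the recoded frequencies and the category descriptions agree.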
Statistics
attempt5toredomother education
N     Valid                  374521
      Missing                  9294
Mean                         3.1208
Median                       4.0000
Std. Deviation               .98469
Skewness                      -.571
Std. Error of Skewness         .004
Kurtosis                     -1.098
Std. Error of Kurtosis         .008
Minimum                        1.00
Maximum                        4.00
attempt5toredomother education
                  Frequency  Percent  Valid Percent  Cumulative Percent
Valid    1.00         19533      5.1            5.2                 5.2
         2.00        103087     26.9           27.5                32.7
         3.00         64514     16.8           17.2                50.0
         4.00        187387     48.8           50.0               100.0
         Total       374521     97.6          100.0
Missing  System        9294      2.4
Total                383815    100.0
I am afraid to add but I will at great risk…
According to the Trochim text, “many statistical analyses are based on the assumption that the
data is distributed normally-that the population from which it is drawn would be distributed
according to a normal or bell-shaped curve. If that assumption is not true for your data and you
use that statistical test, you are likely to get an incorrect estimate of the true relationship”. If this
is taken into consideration, I would conclude that there is an incorrect assumption that the
majority of the mothers of the students who took the survey have an education level of college
graduate, because the data were not distributed normally across the bell-shaped curve.