Download None of the above!!

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Bootstrapping (statistics) wikipedia , lookup

History of statistics wikipedia , lookup

Psychometrics wikipedia , lookup

Time series wikipedia , lookup

Student's t-test wikipedia , lookup

Misuse of statistics wikipedia , lookup

Regression toward the mean wikipedia , lookup

Transcript
Why stats in psych?
Student notes
Fill in the blanks on your slides as I proceed through the lecture!
1. Stats allow psychologists to make sense of their data and draw
more ____________________ than _______________
observations would provide
2. They create an ____________, ___________ way to prove
whether a set of data from an experiment presents us with
significant or insignificant results
3. They help psychologists determine if their ______________ was
correct or incorrect
Descriptive Statistics
Descriptive statistics: Describe the qualities of a set of data
1. __________________
a) Is there a _________________________ between 2 factors? Does the
presence or absence of 1 thing predict the presence or absence of the other?
2. Measurement of ____________________
a) How common/frequent was a given score?
b) What was the __________ performance for the group?
c) How might the data be __________ by an unusually high or low score
(________)?
3. Measure of _________________
a) By how much do the scores in a data set vary from the average (mean)?
b) How big is the _____________ of scores (what is the difference between the
lowest and highest score?)
c) How can I compare scores from 2 different distributions? ________________
Inferential Statistics
Inferential statistics: Tell whether data can be applied to general population
(e.g. “What can we infer about human behavior or mental processes in
general based on the data from this study?”)
1. Measure of _____________________
a) Does the data allow us to make conclusions related to the
hypothesis?
b) How certain are we that the data proves the hypothesis rather than
just being coincidence or chance results?
Big dog to remember with inferential stats :______________!
Descriptive Stats Part 1 Correlations: Fill in the blanks in the statements.
1. ______________________: number between -1 and +1 indicates the ___________ of the
correlation between 2 variables (coefficient of 0 means _______________!)
2. Correlation graphs are called: ____________________
3. Line of best fit (AKA ______________ ) drawn near and in direction of majority of dots on
graph
4. Upward sloping = ___________ correlation (correlation coefficient between +0.1 and +1)

the presence or increase of one indicates the presence or increase of the other
5. Downward sloping = ____________ correlation (correlation coefficient between -0.1 and -1)
•
the presence or increase of one indicates the absence or decrease of the other
6. Describe the correlations you see in graphs the 3 below using the following terms:
 weak, strong, or perfect
 none, negative or positive
Descriptive Stats Part 2
Measurements of Central Tendency:
Mean, mode, and median
A. ________: most frequently occurring # in a set of data
B. __________: “average”-- sum of all values divided by total # of
values in a set of data
C.
_________: the middle value in a range of data when data is lined
up from lowest value to highest—half of the data are higher in
value and half are lower in value than the median
Inferential Stats
____________( _______ is the magic number! It means we can say we are
95% sure the results were NOT just a matter of chance.)
•
•
•
Scientists have chosen a p value of 0.05 as the maximum p value to be able to say
results of a study are statistically significant rather than due to chance
p values must be greater than 0 and smaller than 1.
The smaller the p value, the more we can say that random chance did NOT create
the results
You do NOT need to know how to calculate a p value; just remember the magic
number and what it means
Questions we might ask about John’s experiment to determine if his results were just a
result of chance:
a)Is 20 participants really enough for him to draw a conclusion? What might be a better
sample size for this type of study?
b)Is a 4 second difference between the means of the females and males statistically
significant for this kind of task?
–As a ratio of the total time on average, 4 seconds is approximately 7%.
–If the task had taken 24 hours on average, 4 seconds difference would probably be
considered statistically insignificant!
Descriptive Stats Part 3: Standard deviation and z scores!
A better way to compare classes than looking at the mean
To put it simply, standard of deviation states the average distance away from __________
that the other scores fall
a) A large standard deviation means the spread/range of data is large
b) A small standard deviation means the scores (data) are all bunched near the mean
c) If you want data that is statistically significant, a SMALL standard deviation is what
you want!!
d) How are standard deviation and p value related?
•Small standard deviation (derived in descriptive stats) is used to determine low p
value (part of inferential stats!!)
a)___________: a way to compare data in units of standard deviation
a) can be negative (means score falls to left of the mean on a graph)
b) can be positive (score falls to right of the mean on a graph)
John’s Experiment
John runs an experiment in which he is recording how long (in
seconds) it takes each male and female participant to sort
shapes into a Venn diagram.
His hypothesis is that girls will perform the task faster than
boys.
Here’s what you should have done to compute mode:
1.
2.
3.
John’s data
Female Mode: 55 is only score that appears
more than once so it is the mode for the
females times.
Male mode: The male times have no repeat
values, so there is no mode.
42, 55 and 66 (each of these numbers
appears twice)
So what?
4.What do the modes in these sets of data tell us about female and male performance on
this test? Nothing meaningful in terms of hypothesis.
5.Would the mode be more helpful to John if the score repeated nearly every trial?
•
No. Having a repeated score is not relevant to his hypothesis.
6.What kind of data sets would you want the mode to be a very frequently occurring
number?
•
When you are looking for consistency!
Here’s what you should have done to compute the mean...
1.
Compute the mean for the females. 59.4
1.
Compute the mean for the males. 63.4
1.
Compute the mean for the males and females
combined.
(594+634)/20=61.4
4. What conclusions can be drawn about female vs. male
time on the sorting task based on the means we just
computed?
There is a difference between average time of girls
and boys; girls sort 4 seconds faster on average.
5. Can John conclude that his hypothesis was correct based
on the values of the means?
NO. Mean alone is not a statistically valid measure.
It must be looked at in conjunction with other
Here’s what you should have done to compute
median...
Median:
First you must rearrange the data for each column in
ascending order.
Because there is an even amount of data (10 values per
gender), you must find the median by adding the values at
positions 5 and 6 and dividing by 2.
Females: (55+56)/2 = 55.5
Males: (63+65)/2 = 64
3. What conclusions can you draw about these data sets by knowing the medians?
• Not many. You only know that half of the girls times were faster and half slower than
55.5 seconds while half of the boys times were faster and half slower than 64
seconds. We need to know the shape of the data in order for median to have any
value to us.
4. Do the medians allow you to know which group of the 2 had the fastest or slowest times?
• NO.
5. Do the medians allow you to determine if your hypothesis that girls will perform the task
faster than boys is correct?
• NO.
Skews...Tell the shape of the curve
Positive (AKA skewed
right): The whale is
swimming to
California; it is happy
(positive)!!
1. On which side is the majority of the data in
graph A? Left side.
1. On which side of the graph are their outliers?
Right side
3. If the graph A represented the scores on a
unit test, and the teacher relied on the
“mean” of the test scores to determine how
well her students did, would the teacher be
happy or sad to have positive skew?
Happy because it makes her test scores look
better than they actually were!
Negative (AKA
skewed left): The
whale is swimming
to New York. It is
angry (negative!!)
1.On which side is the majority of the data in
graph B? Right side
1.On which side of the graph are their
outliers? Left side.
1.If graph B represented the scores on a unit
test, and the teacher relied on the “mean” of
the test scores to determine how well her
students did, would the teacher be happy or
sad to have negative skew? WHY?
Sad because it makes her test scores look
worse than they actually were!
Why “mean” can be deceptive...
Mr. Smith and Mrs. Anderson gave the same final exam for their AP Economics class. Both
teachers have 100 students in their classes. Both teachers reported an average score
(mean) for their classes of 75 out of 100 on the final exam.
From this set of data, which of the following can we conclude?
A. Mr. Smith’s and Mrs. Anderson’s students performed equally well on the final
exam.
B. The majority of the students in both classes scored a C on the final exam
(assuming they grade on a standard grade scale.)
C. None of the above!!
Why none of the above?
In order to compare the 2 classes we need to know the range of scores.
We would also want to know if the distribution (and therefore the mean) was
skewed due to an extreme high or extreme low score.
Descriptive Stats Part 3
Measurements of variability:
Range
Which of the teacher’s has the
biggest range?
•Mrs. Anderson! Her range is 100!
(+/- 10)
•Mr. Smith’s range is only 39,
meaning the scores in his data set
all fall within 39 points of each
other (+/- 10)
SO WHAT? Which of the following can we conclude based on knowing just the range?
A. Mr. Smith did a better job of preparing his students for the test
B. Mr. Smith has students that are better test takers
C. Mr. Smith’s students studied more effectively for the test
D. On that particular day, Mr. Smith’s students were able to score better.
E. In essence, we can’t conclude much!! The data don’t tell us WHY the classes
received the scores they did; it just paints a picture of how the class scores
compare.
Descriptive Stats Part 3 continued
Measurements of variability: Standard deviation and z scores...
The mean on Mrs. Hunter’s chemistry test is 75. The standard deviation for her data is 15.
Her data is skewed right.
1.Let’s first draw a rough graph of her data.
1.What does the “15” tell us about the scores in her set of data?
They are spread out quite far from the mean—the average distance of all scores less
than and greater than 75 is 15
3.What does the right skew tell us about outliers and the mean?
•A few high scores are making her mean artificially high
4.Horace has a z score of +1.0. What was his actual test score?
•the z score tells us Horace’s test score falls to the right of the mean 1 unit of standard
deviation. Standard dev. (15) + mean (75)= real score. His score was 90.
5.Jillian has a z score of -1.5. What was her actual test score?
•The z score tells us Jillian’s test score falls 1.5 units of standard deviation to the left of
the mean.
•1.5x15 = 22.5
•Mean of 75 – 22.5 = 52.5
Meet your new best friends: The Normal Curve and the Empirical Rule
If you add percentages, you will see that approximately:
• ________ % of the distribution lies within one standard deviation of the mean.
• _______ % of the distribution lies within two standard deviations of the mean.
• ________ % of the distribution lies within three standard deviations of the mean.
These percentages are known as the "empirical rule"
What you need to know and be able to do...
•
•
For purposes of this class and the AP exam, you do not have to understand when or why a
normal curve is used
You DO NEED TO KNOW
– the empirical rule values
– how much data falls within 1, 2, and 3 standard deviations of a normal curve
– the percentiles associated with each z score
John uses the means of both the male and female groups to conclude that girls are faster than
boys at the sorting task so his hypothesis was correct. His teacher tells him to analyze whether
his experiment was designed to eliminate sampling error.
1. Did he use proper sampling techniques to generate his test participants (sample)?
A. first identify the population to which this hypothesis applies
a) the characteristics required for a sample may be dependent upon the
independent variable (if the ind. var is a human characteristics such as height,
weight, gender, race, age) OR
b) the goal of the sample may be to make it diverse and representative of the
general population
c) to ensure representative sampling, an experimenter may use stratified
sampling to make sure that the sample proportionally represents the general
population for a given characteristic
2. Once the sample population was identified, did he use random assignment to produce
his female and male participant lists?
A. Out of 1,000 students at his school, did he randomly choose 10 male and 10
female students?
3. Was the study blind or double-blind in order to eliminate participant or experimenter
bias?
Solutions to “Andrea’s” results
1.
Andrea just got her ACT
results (which are
distributed normally)
and they say she scored
in the 99.87 percentile
for reading.
a) What does that tell
her about her
score compared to
the rest of the ACT
test takers?
b) Approx how many
a. She got a higher
standard
score than 99.87% of
deviations is she
those who took the
from the mean?
same test.
b. Her score is 3 standard
deviations from the mean
(z score = +3.0)
ANSWERS: Reviewing the rules of the normal curve
2.
What percent of data
always falls within 1
standard deviation of
the mean on a normal
curve?
68% You have
to account for
1 standard
deviation in
both
directions.
ANSWERS: Reviewing the rules of the normal curve
2.
What percent of data
always falls within 2
standard deviations of
the mean on a normal
curve?
95% You have
to account for
2 standard
deviation in
both
directions.
ANSWERS: Reviewing the rules of the normal curve
3. What kind of skew does
a normal curve have?
NO skew. It is not
being pulled left or
right because it is
SYMMETRICAL
Because the mean =
the median, we know
it is not skewed.
5.
Dana just got back is CSAP
scores which were normally
distributed. His z score for
math was -3.0. His raw
score was 316. His brother
had a z score of +3.0 and a
raw score of 516.
a)
b)
Solutions Dana’s CSAP scores
What was the mean for the
CSAP math test?
What was the standard
deviation?
a. Because mean will be
symmetrically located between z
scores or -3 and + 3, mean must be
half way between 316 and 516.
Mean = (316+516)/2 = 416
b. Measure the difference (distance) between one of the scores and the mean.
•516-416=100
• The difference, 100, represents 3 standard deviations from the mean
•100/3 = 33.3 Standard deviation is 33.3
Computing standard of deviation in 6 simple steps!!
Step 1: Calculate the mean for the data set
Step 2: find the distance of
each score from the mean
Step 3: Square each
deviation
Step 4: Sum the squared deviations
Step 5: Divide your sum by total # of scores
Step 6: Square root of
the answer from step 5
SUM= 2324
2324/10 = 232.4
sq root of 232.4 = standard of deviation of 15.2