Edpsy 511
Basic concepts
Exploratory Data Analysis
Populations vs. Samples
• Population
– The complete set of individuals
• Characteristics are called parameters
• Sample
– A subset of the population
• Characteristics are called statistics.
– In most cases we cannot study all the
members of a population
Descriptive vs. Inferential
• Descriptive statistics
– Summarize/organize a group of numbers from
a research study
• Inferential statistics
– Draw conclusions/make inferences that go
beyond the numbers from a research study
– Determine if a causal relationship exists
between the IV and DV
Random Sampling vs. Random
Assignment
• Simple random sampling
– Each member of the population has an equal
likelihood of being selected.
• Helps ensure that our sample will represent the
population of interest.
• Random assignment
– Assigning subjects to different conditions in a
way that they have equal chance of being
placed in either condition.
• Controls for confounding
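The difference between the two ideas can be sketched in a few lines of Python. This is only an illustration: the population of 100 ID numbers, the sample size of 20, and the seed are hypothetical choices, not values from the course.

```python
import random

rng = random.Random(511)  # fixed seed so the sketch is repeatable (arbitrary choice)

# Hypothetical population: 100 people identified by ID number.
population = list(range(1, 101))

# Simple random sampling: every member has the same chance of being selected.
sample = rng.sample(population, 20)

# Random assignment: shuffle the sampled subjects, then split them into
# two conditions so each subject is equally likely to land in either one.
shuffled = sample[:]
rng.shuffle(shuffled)
treatment, control = shuffled[:10], shuffled[10:]

print("Treatment:", sorted(treatment))
print("Control:  ", sorted(control))
```

Sampling decides who is in the study; assignment decides which condition each sampled person receives.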
Goals of Scientific Research
• Exploratory
– What is out there?
• Descriptive
– What does this group look like?
• Explanatory
– Why and how are these constructs related?
• Evaluation
– Does this program work?
• Prediction
– Who will become depressed?
Common Research Designs
• Correlational
– Do two qualities “go together”?
• Comparing intact groups
– a.k.a. causal-comparative and ex post facto designs.
• Quasi-experiments
– Researcher manipulates IV
• True experiments
– Must have random assignment.
• Why?
– Researcher manipulates IV
Measurement
• Is the assignment of numerals to objects.
• Nominal
– Examples: Gender, party affiliation, and place of birth
• Ordinal
– Examples: SES, student rank, and place in race
• Interval
– Examples: Test scores, personality and attitude scales
• Ratio
– Examples: Weight, length, reaction time, and number of responses
Categorical, Continuous and
Discontinuous
• Categorical (nominal)
– Gender, party affiliation, etc.
• Discontinuous
– No intermediate values
• Children, deaths, accidents, etc.
• Continuous
– Variable may assume any value
• Age, weight, blood sugar, etc.
Values
• Exhaustive
– Must be able to assign a value to all objects.
• Mutually Exclusive
– Each object can only be assigned one of a set
of values.
• A variable with only one value is not a
variable.
– It is a constant.
Statistical Notation
• Nouns, Adjectives, Verbs and Adverbs.
– Say what?
• Here’s what you need to know
– X
• $X_i$ = a specific observation
– N
• # of observations
– ∑ (Sigma)
• Means to sum
– Work from left to right
• Perform operations in parentheses first
• Exponentiation and square roots
• Perform summing operations
• Simplify numerator and divisor
• Multiplication and division
• Addition and subtraction
$\sum_{i=1}^{N} X_i$
• Pop Quiz (non graded)
– In groups of three or four
• Perform the indicated operations.
• What was that?
$\frac{N \sum X^2 - (\sum X)^2}{N(N - 1)}$
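The pop-quiz expression appears to be the computational formula for the sample variance, (N ΣX² − (ΣX)²) / (N(N − 1)). The sketch below works through it in the order of operations listed earlier. The three scores are hypothetical, chosen only to keep the arithmetic easy to follow, and the function name is an illustrative choice.

```python
import math
import statistics


def computational_variance(scores):
    """Sample variance via (N * sum(X^2) - (sum X)^2) / (N * (N - 1))."""
    n = len(scores)
    sum_x = sum(scores)                     # sum of X
    sum_x_sq = sum(x ** 2 for x in scores)  # sum of X squared (square first, then sum)
    numerator = n * sum_x_sq - sum_x ** 2   # simplify the numerator
    divisor = n * (n - 1)                   # simplify the divisor
    return numerator / divisor


scores = [2, 4, 6]  # hypothetical data, not from the slides
variance = computational_variance(scores)
print(variance)

# Cross-check against the standard library's sample variance.
assert math.isclose(variance, statistics.variance(scores))
```

For these scores, ΣX = 12 and ΣX² = 56, so the numerator is 3(56) − 144 = 24 and the divisor is 3(2) = 6.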
Exploratory Data Analysis
• A set of tools to help us examine data
– Visually representing data makes it easy to
see patterns.
• 49, 10, 8, 26, 16, 18, 47, 41, 45, 36, 12, 42, 46, 6,
4, 23, 2, 43, 35, 32
– Can you see a pattern in the above data?
• Imagine if the data set was larger.
– 100 cases
– 1000 cases
Three goals
• Central tendency
– What is the most common score?
– What number best represents the data?
• Dispersion
– What is the spread of the scores?
• What is the shape of the distribution?
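As a rough illustration of the first two goals, the sketch below summarizes the 20 values listed earlier using Python's standard `statistics` module. Which summaries to report (mean, median, range, standard deviation) is an assumption made for this example, not a prescription from the slides.

```python
import statistics

data = [49, 10, 8, 26, 16, 18, 47, 41, 45, 36,
        12, 42, 46, 6, 4, 23, 2, 43, 35, 32]

# Central tendency: which number best represents the data?
mean = statistics.mean(data)
median = statistics.median(data)

# Dispersion: how spread out are the scores?
score_range = max(data) - min(data)
std_dev = statistics.stdev(data)  # sample standard deviation (N - 1 in the divisor)

print(f"mean={mean}, median={median}, range={score_range}, sd={std_dev:.2f}")
```

Shape, the third goal, is easier to judge from a plot than from any single number.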
Frequency Tables
• Let’s say a teacher gives her students a
spelling test and wants to understand the
distribution of the resultant scores.
– 5, 4, 6, 3, 5, 7, 2, 4, 3, 4
Value   F   Cumulative F   %     Cum%
7       1   1              10%   10%
6       1   2              10%   20%
5       2   4              20%   40%
4       3   7              30%   70%
3       2   9              20%   90%
2       1   10             10%   100%
N=10
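The table above can be reproduced with a short Python sketch. The function name and column layout are illustrative choices. Values are listed from highest to lowest to match the slide.

```python
from collections import Counter


def frequency_table(scores):
    """Return rows of (value, F, cumulative F, %, cumulative %), highest value first."""
    counts = Counter(scores)
    n = len(scores)
    rows = []
    cumulative = 0
    for value in sorted(counts, reverse=True):
        f = counts[value]
        cumulative += f
        rows.append((value, f, cumulative, 100 * f / n, 100 * cumulative / n))
    return rows


spelling = [5, 4, 6, 3, 5, 7, 2, 4, 3, 4]
table = frequency_table(spelling)

print("Value   F   Cum F     %   Cum%")
for value, f, cum_f, pct, cum_pct in table:
    print(f"{value:>5} {f:>3} {cum_f:>7} {pct:>4.0f}% {cum_pct:>5.0f}%")
print(f"N={len(spelling)}")
```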
As groups
• Create a frequency table using the
following values.
– 20, 19, 17, 16, 15, 14, 12, 11, 10, 9
Banded Intervals
• A.k.a. Grouped frequency tables
• With the previous data the frequency table
did not help.
– Why?
• Solution: Create intervals
• Try building a table using the following
intervals
<=13, 14 – 18, 19+
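A minimal sketch of how scores could be sorted into the three intervals above, assuming whole-number scores. The helper name `band` is hypothetical.

```python
from collections import Counter


def band(score):
    """Map a whole-number score to one of the intervals <=13, 14-18, 19+."""
    if score <= 13:
        return "<=13"
    if score <= 18:
        return "14-18"
    return "19+"


scores = [20, 19, 17, 16, 15, 14, 12, 11, 10, 9]
banded = Counter(band(s) for s in scores)

# Print the intervals from highest to lowest, as in a frequency table.
for interval in ("19+", "14-18", "<=13"):
    print(f"{interval:>6}  {banded[interval]}")
```

Because every ungrouped value occurs only once, counting per interval is what makes the distribution visible.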
Stem-and-leaf plots
• Babe Ruth
– Hit the following number of Home Runs from 1920 –
1934.
• 54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22
– As a group, let’s build a stem-and-leaf plot.
– With two classes’ spelling scores on a 50-item
test.
• Class 1: 49, 46, 42, 38, 34, 33, 32, 30, 29, 25
• Class 2: 39, 38, 38, 36, 36, 31, 29, 29, 28, 19
– As a group, let’s build a stem-and-leaf plot.
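One possible way to check a hand-built plot is sketched below for the Babe Ruth data, assuming two-digit values so that the tens digit is the stem and the ones digit is the leaf. The function name is an illustrative choice, and stems with no leaves are simply omitted.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group two-digit values by tens digit (stem); ones digits are the leaves."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)


home_runs = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]
plot = stem_and_leaf(home_runs)

for stem in sorted(plot):
    leaves = " ".join(str(leaf) for leaf in plot[stem])
    print(f"{stem} | {leaves}")
```

The same function could be run once per class and the two outputs placed side by side to compare the spelling scores.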
Landmarks in the data
• Quartiles
– We’re often interested in the 25th, 50th and 75th
percentiles.
• 39, 38, 38, 36, 36, 31, 29, 29, 28, 19
– Steps
• First, order the scores from least to greatest.
• Second, add 1 to the sample size.
– Why?
• Third, multiply (sample size + 1) by the percentile, expressed as a
proportion, to find the location.
– Q1 = (10 + 1) * .25
– Q2 = (10 + 1) * .50
– Q3 = (10 + 1) * .75
» If the location is not a whole number, take the average of the
two adjacent X values.
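The steps above can be sketched as follows. This follows the slide's rule of averaging the two adjacent scores. Statistical software often interpolates instead, so its quartiles can differ slightly. The function name is hypothetical, and the sketch assumes the computed location falls between 1 and N.

```python
import math


def quartile(scores, proportion):
    """Locate a percentile with the (N + 1) * p rule from the slide."""
    ordered = sorted(scores)                     # step 1: order the scores
    location = (len(ordered) + 1) * proportion   # steps 2-3: 1-based location
    lower = math.floor(location)
    if location == lower:
        return ordered[lower - 1]                # whole number: take that score
    # Fractional location: average the two adjacent scores.
    return (ordered[lower - 1] + ordered[lower]) / 2


class2 = [39, 38, 38, 36, 36, 31, 29, 29, 28, 19]
q1 = quartile(class2, 0.25)   # location 2.75: average of the 2nd and 3rd scores
q2 = quartile(class2, 0.50)   # location 5.50: average of the 5th and 6th scores
q3 = quartile(class2, 0.75)   # location 8.25: average of the 8th and 9th scores
print(q1, q2, q3)
```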
Box-and-Whiskers Plots (a.k.a.,
Boxplots)
Shapes of Distributions
• Normal distribution
• Positive Skew
– Or right skewed
• Negative Skew
– Or left skewed
How is this variable distributed?
[Histogram: x-axis “score”, y-axis “Frequency”. Mean = 4.3, Std. Dev. = 1.494, N = 10]
How is this variable distributed?
[Histogram: x-axis “right”, y-axis “Frequency”. Mean = 2.80, Std. Dev. = 1.75119, N = 10]
How is this variable distributed?
[Histogram: x-axis “left”, y-axis “Frequency”. Mean = 5.40, Std. Dev. = 1.42984, N = 10]
A little on SPSS
• The assignments require hand
calculations and SPSS practice
– Typically I have you check your answers
using SPSS
– Do not buy SPSS
– Do not leave the SPSS work for the night before
the due date.
– You will need a TEC center account
• Do that after class today