Variable
 An item of data
 Examples:
– gender
– test scores
– weight
 Value varies from one observation to
another
Types/Classifications of Variables
 Qualitative
 Quantitative
– Discrete
– Continuous
Qualitative Data
 Describes the quality
 Non-numerical format
 Counts
 Cannot order or measure
 Examples
– gender
– marital status
– geographical region
– job title….
Categorical data
 Non-overlapping categories or
characteristics
 Examples:
– Completes/Incompletes
– Professions
– Gender
Quantitative Data
 Frequencies
 Measurements
Discrete
 Measurements are integers
 Examples:
– number of employees of a company
– number of incorrect answers on a test
– number of participants in a program…
Continuous
 Measurements can take on any value -
usually within some range
 Examples:
– Age
– Income
 Arithmetic operations such as differences
and averages make sense.
Qualitative or Quantitative?
Discrete or Continuous?
Score on a placement exam
Preferred restaurant
Dollar amount of a loan
Height
Salary
Length of time to complete a task
Number of applicants
Ethnic origin
Treatment as Ranks
 Natural order
 Not strictly measured
 Examples:
– Age group
– Likert Scale data
 Distinction between adjacent points on the
scale is not necessarily the same
Analysis
Qualitative Data
 Frequency tables
 Modes - most frequently occurring
 Graphs: Bar Charts and Pie Charts
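As a small illustration of the frequency table and mode for qualitative data, the sketch below counts categories with Python's standard library. The `responses` list is hypothetical data, not taken from the slides.

```python
from collections import Counter

# Hypothetical responses for a qualitative variable (marital status).
responses = ["single", "married", "married", "divorced", "single", "married"]

# Frequency table: category -> number of observations.
frequency_table = Counter(responses)

# Mode: the most frequently occurring category.
mode_category, mode_count = frequency_table.most_common(1)[0]

for category, count in frequency_table.most_common():
    print(f"{category:10s} {count}")
print("Mode:", mode_category)
```

The same counts could feed a bar chart or pie chart.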
Analysis
Quantitative Data
 Any form
 Create groups or categories and generate
frequency tables
 All descriptive statistics
Effective Graphs:
Quantitative Data
 Histograms
 Stem-and-Leaf plots
 Dot Plots
 Box plots
 XY Scatter Plots (2 variables).
Examples of Graphs
[Figure: pie chart, "Performance Appraisals"; categories: Much Easier, Easier, Same, Difficult, More Difficult]
[Figure: bar chart by quarter (1st Qtr to 4th Qtr); series: East, West, North]
[Figure: histogram; vertical axis: Frequency; horizontal axis: Score]
[Figure: boxplot of C1]
[Figure: stem-and-leaf plot, "Weight of Meat"]
Analyze Ranked Data
 Frequency tables
 Mode, Median, Quartiles
 Graphs:
– Bar Charts
– Dot Plots, Pie Charts
– Line Charts (2 variables)
Data Example
Suggest some ways you could analyze these items.
Score on a placement exam
Preferred restaurant
Dollar amount of a loan
Height
Salary
Length of time to complete a task
Number of applicants
Ethnic origin
Tables and Graphs
Note: Excel will create any graph that you specify.
Consider the type of data before selecting your graph.
Frequency Table/Frequency
Distribution
Summarize data:
 categorical
 nominal
 Continuous data - the data set has been
divided into meaningful groups
Frequency Distribution
Count the number of observations that fall
into each category.
Frequency: the number associated with each
category
Relative Frequency Distribution
Proportion of observations falling in a given
category
Report relative frequencies or percentages
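A minimal sketch of turning a frequency distribution into a relative frequency distribution follows. The category names echo the earlier Completes/Incompletes example, but the counts are made up for illustration.

```python
# Hypothetical frequency distribution: category -> number of observations.
frequencies = {"Completes": 45, "Incompletes": 15}

total = sum(frequencies.values())

# Relative frequency: proportion of observations falling in each category.
relative_frequencies = {
    category: count / total for category, count in frequencies.items()
}

for category, proportion in relative_frequencies.items():
    print(f"{category:12s} {frequencies[category]:3d} {proportion:6.1%}")
```

The relative frequencies always sum to 1 (or 100% when reported as percentages).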
Example
Frequency Distribution
No. of Defective Parts     n
 0 & < 2                   0
 2 & < 4                   4
 4 & < 6                   5
 6 & < 8                  12
 8 & < 10                 14
10 & < 12                  9
12 & < 14                 10
14 & < 16                  6
Graphs
Categorical/Qualitative Data
Pie Charts
 Circle - divided
proportionately
 Segment - percentage of
the whole that falls into
each category
[Figure: pie chart, "Native Language"; English 55%, Spanish 25%, Vietnamese 15%, Swedish 5%]
Bar Charts
 Bar charts - % in various categories
 Vertical scale: frequencies or relative frequencies
 Horizontal scale: categories
 Allows comparisons
[Figure: bar chart, "Average Units Sold (per person) by Product"; vertical axis: Average Sold/Person; horizontal axis: Product (B41, BA 42, B41F, C21, Other); series: Before Training, After Training]
Constructing Bar Charts
 All boxes should have the same width
 Gaps between the boxes - no connection
between categories
 Any order.
 Use to represent two categorical variables
simultaneously
Graphs: Measured
Continuous Quantitative Data
 Histograms
 Stem and Leaf
 Box plots
 Line Graphs
 XY Scatter Charts (2 variables)
Histograms
 Frequency distributions of continuous variables
 Drawn without gaps between the bars
[Figure: histogram, "Grade Distribution"; vertical axis: Frequency; horizontal axis: Grade]
Constructing Histograms
 Non-overlapping intervals
 Intervals - generally the same length
 Number of values in each interval - class frequency
 Relative frequencies
[Figure: histogram, "Grade Distribution"; vertical axis: Frequency; horizontal axis: Grade]
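The construction steps for a histogram can be sketched in a few lines: pick non-overlapping intervals of equal length, then count the values in each one. The grades and the interval boundaries below are assumptions chosen for illustration; they are not the data behind the slide's chart.

```python
# Hypothetical exam grades; the interval boundaries are also assumptions.
grades = [52, 58, 61, 65, 67, 70, 72, 74, 75, 78, 81, 83, 85, 88, 91, 95]

# Non-overlapping intervals of equal length: [50, 60), [60, 70), ..., [90, 100).
width = 10
lower_bounds = range(50, 100, width)

# Class frequency: number of values falling in each interval.
class_frequencies = {}
for lower in lower_bounds:
    upper = lower + width
    class_frequencies[(lower, upper)] = sum(lower <= g < upper for g in grades)

n = len(grades)
for (lower, upper), count in class_frequencies.items():
    # Print the interval, its class frequency, its relative frequency, and a text bar.
    print(f"{lower:3d} to <{upper:3d}  {count:2d}  {count / n:5.1%}  {'#' * count}")
```

Because each value falls in exactly one interval, the class frequencies add up to the number of observations.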
XY Scatter Chart
 Two variables
 Variables: quantitative and continuous.
 Plot pairs - rectangular coordinate system
 Examine the relationship between two variables
[Figure: XY scatter chart, "Absent by Age"; vertical axis: Days Absent; horizontal axis: Age]
Line Chart
 Similar to the scatter chart
 Values of the independent variable (shown on the horizontal axis) can be ranked values (i.e., they do not have to be continuous variables).
[Figure: line chart, "1997 Monthly Sales"; vertical axis: Sales (x$10,000); horizontal axis: Month (Jan to June)]
Basic Principles for Constructing
All Plots
 Data should stand out clearly from
background
 The information should be clearly labeled
– title
– axes, bars, pie segments, etc. - include units that
are needed to interpret data
– scale including starting points.
Principles cont.
 Source
 No clutter
 Minimize information or data on one graph.
 Try several approaches
Describing Data
 Shape of the Distribution
– Symmetry
– Skewness
– Modality: most frequently occurring value
– Unimodal or bimodal or uniform
[Figures: three histograms of Frequency by Grade, labeled Right Skewed, Left Skewed, and Symmetrical]
Describing Data
 Centrality
 Spread
 Extreme values
Measures of Centrality
 Mean
 Median
 Mode
Mean
 Most common measure
 Extremely large values in a data set will
increase the value of the mean
 Extremely low values will decrease it.
Calculating the Mean
         T1     T2     T3
         85     85     85
         90     90     90
         75     35     75
         90     90    110
Sum     340    300    360
Mean     85     75     90
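The three columns of scores can be checked with a short sketch. It uses the values from the slide; the variable names `t1`, `t2`, and `t3` are just labels for this example. T2 contains one unusually low score and T3 one unusually high score, which shows how a single extreme value pulls the mean.

```python
from statistics import mean

# Score sets from the slide: T2 has one very low score, T3 one very high score.
t1 = [85, 90, 75, 90]
t2 = [85, 90, 35, 90]
t3 = [85, 90, 75, 110]

for name, scores in [("T1", t1), ("T2", t2), ("T3", t3)]:
    # Mean = sum of the values divided by the number of values.
    print(name, "sum =", sum(scores), "mean =", mean(scores))
```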
Median
 Central point.
 Half of the data has a lower value than the median
 Half of the data has a higher value than the
median
 Not affected by extremely large or small
values
Find the Median
Data: 85, 90, 75, 92, 95
Sorted Data: 75, 85, 90, 92, 95
Median is 90.
Find the Median
Data: 95, 90, 92, 85
Sorted Data: 85, 90, 92, 95
Median:
(90 + 92)/2 = 91
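Both median examples follow the same procedure: sort the data, then take the middle value, or average the two middle values when the count is even. The helper `median_by_hand` below is a hypothetical name for a sketch of that procedure, compared against the standard library's `statistics.median`.

```python
from statistics import median

def median_by_hand(values):
    """Sort the data, then take the middle value (or average the two middle values)."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

odd_data = [85, 90, 75, 92, 95]   # five values: the median is the third sorted value
even_data = [95, 90, 92, 85]      # four values: average the second and third sorted values

print(median_by_hand(odd_data), median(odd_data))
print(median_by_hand(even_data), median(even_data))
```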
Measures of Spread
Range
 Subtract the smallest value from the largest
 Report the smallest and largest values.
Scores: 85, 90, 75, 92, 95
Range: 75 to 95, or 20
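Variance/Standard Deviation
 Average variation of the data values from
the mean of the values
 Variance: the average of the squared deviations from the mean
 Standard deviation: the square root of the variance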
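A rough sketch of the measures of spread, using the five scores from the range example. It assumes the population form of the variance (divide by n); many textbooks and software packages report the sample form (divide by n - 1), which the standard library exposes as `statistics.variance`.

```python
from math import sqrt
from statistics import mean, pstdev, pvariance, variance

scores = [85, 90, 75, 92, 95]

# Range: largest value minus smallest value.
value_range = max(scores) - min(scores)

# Population variance: average squared deviation from the mean.
m = mean(scores)
population_variance = sum((x - m) ** 2 for x in scores) / len(scores)

# Standard deviation: square root of the variance.
population_sd = sqrt(population_variance)

print("range:", value_range)
print("population variance:", population_variance, "library:", pvariance(scores))
print("population sd:", population_sd, "library:", pstdev(scores))
print("sample variance (n - 1):", variance(scores))
```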
The Empirical Rule
 Symmetrical, bell-shaped data
 Approximately:
68% of the data values are within one standard
deviation of the mean
95% of the data values are within two standard
deviations of the mean
99.7% of the data values are within three standard
deviations of the mean
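One way to see the empirical rule at work is to simulate roughly bell-shaped data and count how much of it falls within one, two, and three standard deviations of the mean. The mean of 100, the standard deviation of 15, and the sample size are arbitrary assumptions for this sketch.

```python
import random
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the sketch is repeatable
data = [random.gauss(100, 15) for _ in range(10_000)]

m, s = mean(data), pstdev(data)

def share_within(k):
    """Proportion of values within k standard deviations of the mean."""
    return sum(abs(x - m) <= k * s for x in data) / len(data)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

The shares are approximate and vary a little from sample to sample.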
Chebyshev’s Inequality
 Any data, including skewed data
 At least:
75% of the data values are within two standard
deviations of the mean.
89% of the data values are within three standard
deviations of the mean.
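Chebyshev's inequality guarantees that at least 1 - 1/k² of the values lie within k standard deviations of the mean, for any k greater than 1 and any shape of distribution. The sketch below compares that lower bound with what is observed in a small right-skewed data set; the data and the helper name `chebyshev_lower_bound` are hypothetical.

```python
from statistics import mean, pstdev

def chebyshev_lower_bound(k):
    """Minimum share of values within k standard deviations of the mean (k > 1)."""
    return 1 - 1 / k ** 2

# Hypothetical right-skewed data (for example, incomes in thousands).
data = [20, 22, 25, 25, 27, 30, 31, 35, 40, 48, 60, 95, 150]

m, s = mean(data), pstdev(data)
for k in (2, 3):
    observed = sum(abs(x - m) <= k * s for x in data) / len(data)
    print(f"k = {k}: bound {chebyshev_lower_bound(k):.1%}, observed {observed:.1%}")
```

The observed share is never below the bound, but it is often well above it, since the inequality is a worst-case guarantee.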
Measures of Relative Standing
 Percentiles
 Quartiles
Quartiles
 The lower quartile is the same as the 25th
percentile.
– 25% of the scores are lower and
– 75% of the scores are higher than the lower
quartile.
 The upper quartile is the same as the 75th
percentile.
– 75% of the scores are lower and
– 25% of the scores are higher than the upper quartile.
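Quartiles can be computed with the standard library, as in the sketch below. The scores are hypothetical, and quartile conventions differ slightly between textbooks and software, so the cut points from `statistics.quantiles` (default method) may not match a hand calculation exactly.

```python
from statistics import quantiles

# Hypothetical test scores.
scores = [55, 61, 64, 68, 70, 73, 75, 78, 82, 85, 91]

# quantiles(..., n=4) returns three cut points:
# lower quartile (25th percentile), median, upper quartile (75th percentile).
lower_q, middle, upper_q = quantiles(scores, n=4)

print("lower quartile:", lower_q)
print("median:", middle)
print("upper quartile:", upper_q)
```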
Correlation
Describes the strength of the relationship
between two (or more) variables
Pearson Product-moment Correlation
Coefficient - assumes continuous
quantitative data
Relationship between Variables
 Positive
 Negative
 No relationship.
Interpreting Correlation Coefficients
 0.20 to 0.35 - a slight relationship (little value in practical prediction situations)
 0.50 - crude group prediction (correlations this low do not suggest a good relationship)
 0.65 to 0.85 - group predictions that are good
 Over 0.85 - a close relationship between the two variables
Even a high correlation
coefficient does not establish a
cause-and-effect
relationship!
Coefficient of Determination
 Square of the correlation coefficient
 Gives the percent of variation in the
dependent variable that is ‘explained’ by the
independent variable.
 Look at an XY scatter plot
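As an illustration, the Pearson correlation coefficient r and the coefficient of determination (r squared) can be computed directly from paired data. The function name `pearson_r` and the x and y values are assumptions made for this sketch.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
r_squared = r ** 2  # coefficient of determination

print(f"r = {r:.3f}, r squared = {r_squared:.3f}")
```

Here r squared is the share of the variation in y that is 'explained' by x under a straight-line relationship.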
Least Squares Line
 Describe the relationship between the two
variables
 Make predictions of the dependent variable
from the independent variable
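A minimal sketch of fitting a least squares line and using it for prediction is shown below. The function name `least_squares_line`, the data, and the prediction point x = 6 are all assumptions for illustration.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the line that minimizes the squared vertical errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical paired observations: x is the independent variable, y the dependent one.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope, intercept = least_squares_line(x, y)

# Predict the dependent variable for a new value of the independent variable.
predicted = slope * 6 + intercept

print(f"y = {intercept:.2f} + {slope:.2f} * x")
print(f"predicted y at x = 6: {predicted:.2f}")
```

Predictions far outside the range of the observed x values should be treated with caution.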
Positive Relationship
r will be a positive number.
[Figure: XY scatter plot of Y against X showing a positive relationship]
Negative Relationship
r will be a negative number.
[Figure: XY scatter plot of Y against X showing a negative relationship]