Data
Variables & Units
1
Statistics
(The field of) Statistics is the systematic study of
data.
The word “data” is plural…
“The data are the price gains of 200 stocks on the
NYSE.”
Singular? “Datum.” (Uncommon.)
Shares of Exxon-Mobil gained 2.3%. The datum is 2.3%.
What characterizes data is variability.
2
Variables / Statistical Units
Units of observation: Set of entities (things /
objects) being studied
Variable: An attribute of each unit
Suppose X describes a variable and U describes
the units.
“X varies among the (statistical) units.”
3
Units:
Math 158-800 students.
Variable:
Gender.
Gender is a Categorical Variable
Gender varies among Math 158-800 students.
4
Units:
Math 158-800 students.
Variable:
Number of FB friends.
Number of FB friends is a Quantitative Variable
Number of FB friends varies among Math 158-800 students.
5
1. An experiment was conducted to test the performance
of four brands of batteries in three different
environments (room temperature; hot and humid; cold).
For each combination of brand and environment,
batteries were put into a flashlight. The flashlight was
then turned on and allowed to run until the light went
out. The amount of time until the flashlight stopped
shining (in minutes) was recorded. Do brand and
environment play a role in the lifetime of these
batteries?
Minutes are measurement units. Most quantitative variables
have a measurement unit. If I want the measurement unit,
I’ll say exactly that. By “unit” I mean “unit of observation”
= thing / object that is studied.
6
2. 55-year-old men are recruited into a study about heart
attacks. The heart rate of each man is recorded. Each is
tracked for a one-year period, and whether or not he has
a heart attack is determined.
7
3. A student runs an experiment to study the effect of
tire pressure on gas mileage. He devises a system so that
his car uses gasoline from a one-liter container. Each
time the container is filled, he randomly selects a
tire pressure between 20 and 35 psi, then drives the car
at 60 mph on a divided highway. When he runs out of
gas, he records the distance driven on that fill. Does tire
pressure impact the distance driven?
Something like “drives” would also suffice for the units.
8
Variables / Statistical Units
The units are the countries (of the world).
Describe a variable.
Write the sentence
_________ varies among __________.
Is the variable quantitative or categorical?
9
Variables / Statistical Units
The units are the countries (of the world).
Describe a variable.
Write the sentence
_________ varies among countries.
Is the variable quantitative or categorical?
10
GDP per capita and Longevity
Country          GDP / person   Longevity (years)
Qatar            $86006         75.6
U. S.            $47440         78.2
Spain            $30589         80.9
[world average]  $10433         67.2
Haiti            $1317          60.9
:                :              :
11
GDP per capita and Longevity
Country          GDP / person   Longevity (years)
Qatar            $86006         75.6
U. S.            $47440         78.2
Spain            $30589         80.9
[world average]  $10433         67.2               <- NOT a unit
Haiti            $1317          60.9
:                :              :
12
Types of variables
Quantitative Variable
Naturally measured as numbers for which ordering and
at least some of the usual operations (addition,
multiplication, subtraction, etc.) make sense.
Discrete
All the possible values are easily listed
Frequent “ties”
Often count or related to counts
Continuous
Technically: “ties” are impossible
In practice ties are uncommon
13
Types of variables
Categorical Variable
Not quantitative (usually verbal, but sometimes
expressed as numbers having little or no number
meaning).
Virtually all categorical variables are discrete. So,
the term discrete is rarely used in speaking about
categorical variables – it is redundant.
14
Distribution
A variable’s distribution is a description of what
values it takes and how often it takes them.
Categorical Variables
Distributions are always summarized in terms of
percents (falling into each category).
Quantitative Variables
There are many ways to summarize quantitative
variables. Among them:
Mean + Standard Deviation
Median + Interquartile Range
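As a rough illustration of these summaries, the sketch below uses Python's standard library. The quantitative data are the 50 monthly internet fees listed later in this transcript, and the categorical data are the transmission types of the four car models in the car table excerpt. The function names `percent_distribution` and `quantitative_summary` are made up for this example, and the quartile rule is simply the default of `statistics.quantiles`; other conventions for the interquartile range give slightly different values.

```python
from collections import Counter
import statistics


def percent_distribution(values):
    """Percent of units falling into each category."""
    counts = Counter(values)
    total = len(values)
    return {category: 100 * n / total for category, n in counts.items()}


def quantitative_summary(values):
    """Mean + standard deviation and median + interquartile range."""
    # Quartiles use the library default ('exclusive') method: an assumption.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "median": statistics.median(values),
        "iqr": q3 - q1,
    }


# Transmission types of the four car models in the car table excerpt.
transmissions = ["Automatic", "Manual", "Automatic", "Automatic"]

# Monthly fees (in $) of the 50 sampled internet users.
fees = [
    42, 32, 41, 32, 36, 31, 36, 46, 42, 34,
    33, 31, 39, 31, 45, 34, 42, 38, 43, 45,
    65, 32, 34, 40, 42, 47, 32, 31, 32, 35,
    37, 72, 41, 37, 39, 38, 42, 51, 34, 83,
    32, 45, 42, 44, 30, 40, 37, 37, 41, 39,
]

print(percent_distribution(transmissions))
print(quantitative_summary(fees))
```

The categorical summary is a set of percents, one per category, while the quantitative summary pairs a measure of center with a measure of spread.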
15
Purposes of variables
Explanatory and Response Variable
Changing the value of the explanatory variable
(EV) results in a change in the distribution of the
response variable (RV).
Loosely: A change in the explanatory variable
alters the prediction of the response variable.
16
17
Variable:
Form of study.
Units:
The (200) college students involved in the
experiment.
Form of study varies from student to student.
18
Variable:
Score on the short answer test.
Units:
The (200) college students involved in the
experiment.
Score on the short answer test varies from
student to student.
19
20
Experimental study
The explanatory variable is assigned (often by
the people conducting the study).
Units do not enter the study with a value for
this variable.
Observational study
The explanatory variable is a characteristic of
the unit.
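Assignment of the explanatory variable can be sketched with the tire-pressure experiment described earlier, where each fill of the one-liter container (or each drive) is a unit and a pressure between 20 and 35 psi is drawn for it. The function name `assign_pressures`, the number of fills, the seed, and the use of a uniform draw over the whole interval are assumptions for illustration; the source only says the pressure is selected randomly.

```python
import random


def assign_pressures(n_fills, low=20.0, high=35.0, seed=None):
    """Randomly assign a tire pressure (psi) to each fill."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(n_fills)]


# Hypothetical run: 8 fills, fixed seed so the assignment can be reproduced.
pressures = assign_pressures(n_fills=8, seed=1)
for fill, psi in enumerate(pressures, start=1):
    print(f"fill {fill}: {psi:.1f} psi")
```

The units do not arrive with a tire pressure of their own; the experimenter's random draw supplies it, which is what makes the study experimental rather than observational.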
21
Statistics
Data vary
A population is a collection of all the units of
interest. If we have information on all the units of
a population, we have a complete description of the
variation in the data. Such a description of a
population is a census. Characteristics of
populations are parameters.
A sample is an incomplete collection of units from
the population. A sample necessarily provides
incomplete information. Characteristics of samples
are called (the word) statistics.
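The distinction between a parameter and a statistic can be sketched in a few lines. The population below is a hypothetical list of 100 numbers invented for this example (it is not data from the course), and the sample size and seed are likewise arbitrary.

```python
import random
import statistics

# Hypothetical population: one value for each of 100 units.
population = list(range(1, 101))

# A parameter is computed from every unit in the population (a census).
parameter = statistics.mean(population)

# A sample is an incomplete collection of units from the population.
rng = random.Random(0)
sample = rng.sample(population, k=10)

# A statistic is computed from the sample only.
statistic = statistics.mean(sample)

print("parameter (population mean):", parameter)
print("statistic (sample mean):", statistic)
```

Changing the seed changes the sample and therefore the statistic, while the parameter stays fixed.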
22
In Class Survey
G: Your gender (M or F)
A: Guess your instructor’s age
F: Which finger is longer?
I = Index R = Ring S = same
S: How many people are there who have the
same mother and father as you do?
C: What company is your cell phone carrier?
D: How long was the last call you received on
your phone?
23
In a data table each unit takes a row;
each variable occupies a column.
Column headers identify variable names.
There are other ways to organize data,
and some are preferable when the idea is
to display the data efficiently. However,
in most cases, a data table is how data are
organized in a spreadsheet.
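One way to hold such a data table in code is a list of rows, one per unit, with each variable stored under its column name. The sketch below uses the four car models from the excerpt shown later in this transcript; the helper `column` is a made-up name for this example.

```python
# Each row is one unit (a car model); each key is a variable (a column).
cars = [
    {"Car model": "BMW 3030CI", "Vehicle type": "Subcompact",
     "Transmission type": "Automatic", "Number of cylinders": 6,
     "City MPG": 19, "Highway MPG": 27},
    {"Car model": "BMW 3030CI", "Vehicle type": "Subcompact",
     "Transmission type": "Manual", "Number of cylinders": 6,
     "City MPG": 21, "Highway MPG": 30},
    {"Car model": "Buick Century", "Vehicle type": "Midsize",
     "Transmission type": "Automatic", "Number of cylinders": 6,
     "City MPG": 20, "Highway MPG": 29},
    {"Car model": "Chevrolet Blazer", "Vehicle type": "4-wheel drive",
     "Transmission type": "Automatic", "Number of cylinders": 6,
     "City MPG": 15, "Highway MPG": 20},
]


def column(table, name):
    """Return the values of one variable, in row order."""
    return [row[name] for row in table]


print(column(cars, "City MPG"))
print(column(cars, "Transmission type"))
```

Reading down a column gives the values of one variable across the units; reading across a row gives everything recorded for one unit.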
24
Here are the monthly fees (in $) paid by a random
sample of 50 users of internet service providers in
2008:
42 32 41 32 36 31 36 46 42 34
33 31 39 31 45 34 42 38 43 45
65 32 34 40 42 47 32 31 32 35
37 72 41 37 39 38 42 51 34 83
32 45 42 44 30 40 37 37 41 39
VARIABLE: ____________
UNITS: ____________
25
Here are the monthly fees (in $) paid by a random
sample of 50 users of internet service providers in
2008:
42 32 41 32 36 31 36 46 42 34
33 31 39 31 45 34 42 38 43 45
65 32 34 40 42 47 32 31 32 35
37 72 41 37 39 38 42 51 34 83
32 45 42 44 30 40 37 37 41 39
VARIABLE: Monthly fee (for use of internet)
UNITS: Users of internet service
26
User      Monthly Fee ($)
User 1*   42
User 2*   31
User 3*   33
:
You can start almost any problem in this course by
first asking:
What are the units?
What is the variable?
*Perhaps identified by name? (Names aren’t given here.)
Often, unit identifiers will not be given or displayed.
27
Variables / Statistical Units
The units are the companies listed on the New
York Stock Exchange.
Describe a variable.
Write the sentence
_________ (variable) varies from company to company.
Is the variable quantitative or categorical?
If quantitative, is it discrete or continuous?
28
GDP per capita and Longevity
Country          GDP / person   Longevity (years)
Qatar            $86006         75.6
U. S.            $47440         78.2
Spain            $30589         80.9
[world average]  $10433         67.2               <- NOT a unit; these values are parameters, not statistics
Haiti            $1317          60.9
:                :              :
29
Distribution
The distribution of a variable tells us what values it takes
and the likelihood of those values.
What the fees are.
How often those fees occur.
User      Monthly Fee ($)
User 1*   42
User 2*   31
User 3*   33
:
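A distribution of this kind can be tabulated directly. The sketch below counts how often each fee occurs among the 50 sampled users listed earlier; the variable names are chosen for this example.

```python
from collections import Counter

# Monthly fees (in $) of the 50 sampled internet users.
fees = [
    42, 32, 41, 32, 36, 31, 36, 46, 42, 34,
    33, 31, 39, 31, 45, 34, 42, 38, 43, 45,
    65, 32, 34, 40, 42, 47, 32, 31, 32, 35,
    37, 72, 41, 37, 39, 38, 42, 51, 34, 83,
    32, 45, 42, 44, 30, 40, 37, 37, 41, 39,
]

# What values the variable takes, and how often it takes them.
counts = Counter(fees)
for fee in sorted(counts):
    share = counts[fee] / len(fees)
    print(f"${fee}: {counts[fee]} users ({share:.0%})")
```

Counts above one are the "ties" mentioned earlier, which is what one expects when fees are recorded in whole dollars.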
30
Car model         Vehicle type   Transmission type  Number of cylinders  City MPG  Highway MPG
:
BMW 3030CI        Subcompact     Automatic          6                    19        27
BMW 3030CI        Subcompact     Manual             6                    21        30
Buick Century     Midsize        Automatic          6                    20        29
Chevrolet Blazer  4-wheel drive  Automatic          6                    15        20
:
31
UNITS             VARIABLES (there are 5)
Car model         Vehicle type   Transmission type  Number of cylinders  City MPG  Highway MPG
:
BMW 3030CI        Subcompact     Automatic          6                    19        27
BMW 3030CI        Subcompact     Manual             6                    21        30
Buick Century     Midsize        Automatic          6                    20        29
Chevrolet Blazer  4-wheel drive  Automatic          6                    15        20
:
X (Variable) varies from unit to unit.
32
City MPG varies from car model to car model.
33
Number of cylinders varies from car model to car model.
34
Transmission type varies from car model to car model.
35
Transmission type: CATEGORICAL VARIABLE
36
City MPG: QUANTITATIVE VARIABLE
37
Mutual Fund                Category         Net assets ($ millions)  2008 return  Expense Ratio
:
Fidelity Low-Priced Stock  Small cap value  19,378                   -36.2%       0.98%
Price International Stock  International    3,828                    -48.0%       0.87%
Vanguard 500 Index         Large cap blend  74,886                   -37.0%       0.15%
:
38
UNITS                      VARIABLES (there are 4)
Mutual Fund                Category         Net assets ($ millions)  2008 return  Expense Ratio
:
Fidelity Low-Priced Stock  Small cap value  19,378                   -36.2%       0.98%
Price International Stock  International    3,828                    -48.0%       0.87%
Vanguard 500 Index         Large cap blend  74,886                   -37.0%       0.15%
:
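A data table like this can be checked mechanically for which columns look quantitative. The sketch below stores the three funds shown and labels a variable quantitative when every value is a number. That rule is only a heuristic assumed for this example, since numbers are sometimes used as category labels with little or no number meaning. Percent values are stored as plain numbers (for example, -36.2 for -36.2%), and the names `funds` and `variable_type` are made up for illustration.

```python
# Each row is one unit (a mutual fund); the other keys are the variables.
funds = [
    {"Mutual Fund": "Fidelity Low-Priced Stock", "Category": "Small cap value",
     "Net assets ($ millions)": 19378, "2008 return (%)": -36.2,
     "Expense ratio (%)": 0.98},
    {"Mutual Fund": "Price International Stock", "Category": "International",
     "Net assets ($ millions)": 3828, "2008 return (%)": -48.0,
     "Expense ratio (%)": 0.87},
    {"Mutual Fund": "Vanguard 500 Index", "Category": "Large cap blend",
     "Net assets ($ millions)": 74886, "2008 return (%)": -37.0,
     "Expense ratio (%)": 0.15},
]


def variable_type(table, name):
    """Heuristic: all-numeric columns are treated as quantitative."""
    values = [row[name] for row in table]
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "quantitative" if numeric else "categorical"


unit_column = "Mutual Fund"
variables = [name for name in funds[0] if name != unit_column]
for name in variables:
    print(f"{name} varies from fund to fund: {variable_type(funds, name)}")
```

The loop prints one "varies from fund to fund" sentence per variable, mirroring the sentence pattern used for the car models.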
39