Day 2 Lecture
Review of Descriptive Statistics
Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc..
Agenda
Procedures Review: MyCourses, Smam320 website, WebAssign, Groups
Lecture: Introduction to univariate data analysis
Learning
“Learning is defined as stabilizing,
through repeated use, certain
appropriate and desirable synapses
in the brain.” (Leamnson, 2000, p. 5)
Active Learning
“Knowing and learning are communal
acts. They require a continual cycle
of discussion, disagreement, and
consensus over what has been seen and
what it all means.” (Palmer)
The Quiet Signal
In order to use active
learning, we have to have a
quiet signal.
I will raise my hand when it is
time to stop group work and
resume all class activity.
The Quiet Signal
Complete your sentence.
Raise your hand.
Alert classmates that it's
time to focus up front again.
Interview
A interviews B
At signal, reverse roles and B
interviews A. (same number
of minutes and same
questions.)
Interview Questions
Name, etc.
How comfortable are you with Minitab?
How do you learn best?
* * * * * * * Extra Time? * * * * * * * *
What are your greatest fears about this class?
In Every Statistical Study:
Questions are posed
Characteristics are observed
Major Branches of Statistics
Descriptive Statistics
  Organizing, Summarizing Information
  Graphical techniques (know which graph goes with
  which type of variable: qualitative, quantitative; and
  which are univariate techniques and which are
  bivariate)
  Numerical techniques (know which descriptive
  summary measure is appropriate for summarizing
  qualitative data and which for quantitative data)
Inferential Statistics
  Estimation
  Decision making
Descriptive Statistics
Methods for summarizing data
Summaries usually consist of graphs and
numerical summaries of the data
Summarize the raw data
Present the data in a useful format
Inference
Methods of making decisions or predictions
about a population based on sample
information.
Most of this class (320) is about Inference.
Parameter and Statistic
A parameter is a numerical summary of the
population
A statistic is a numerical summary of a sample
taken from the population
Variables and Sample Statistics
VARIABLE TYPE: QUALITATIVE (categorical)
  What it measures: A quality, a category, or the occurrence of a particular type that you should be able to identify in each question.
  Variable Name: Shorthand way of referring to the variable.
  Values: 0,1 or Y,N or S,F
  NUMERIC Summary Measures: 1) proportion
  Population Parameter: $\pi$
  Sample Statistic: $p = x/n$, where x is the # of Successes in n trials

VARIABLE TYPE: QUANTITATIVE (numeric)
  What it measures: A measure of something that you should be able to identify in each question.
  Variable Name: Shorthand way of referring to the variable.
  Values: #'s
  NUMERIC Summary Measures: 2) mean
  Measures of Location: Population Parameter $\mu$; Sample Statistic $\bar{X}$
  Measures of Variability: Population Parameter $\sigma_X^2$ or $\sigma_X$; Sample Statistic $s_X^2$ or $s_X$
Randomness
Simple Random Sampling: each subject in the
population has the same chance of being
included in the sample
Randomness is crucial to experimentation
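A minimal Python sketch of drawing a simple random sample. The roster of 30 labels and the sample size of 5 are made up for illustration, and the seed is fixed only so the sketch gives the same result on each run.

```python
import random

# Hypothetical population: 30 made-up student labels (not class data).
population = [f"student_{i:02d}" for i in range(1, 31)]

# Fixed seed only so that repeated runs of this sketch give the same sample.
rng = random.Random(320)

# random.sample draws without replacement, so every subject has the same
# chance of being included and nobody can be picked twice.
sample = rng.sample(population, k=5)
print(sample)
```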
Variability
Measurements vary from person to person
Measurements vary from sample to sample
Variability is the name of the game in Data
Analysis!
Characteristics are Variables
A Variable is any characteristic that is recorded
for subjects in the study.
Note that our text uses the word characteristic for
parameter.
Variation in Data
The term variable highlights the fact that
data values vary.
Data values are called observations
Each observation can be:
Quantitative
Categorical
Categorical Data - Sample Proportion
The sample proportion of
successes, denoted by p, is
$p = \text{sample proportion of successes} = \dfrac{\text{number of S's in the sample}}{n}$
where S is the label used for the
response designated as success.
The population proportion of
successes is denoted by $\pi$.
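As a quick illustration, the sample proportion can be computed by counting the responses labeled S and dividing by n. The response list below is made up for this sketch and is not taken from the class data.

```python
# Hypothetical responses: "S" marks a success, "F" a failure (made-up data).
responses = ["S", "F", "S", "S", "F", "F", "S", "F", "S", "S"]

n = len(responses)                 # sample size
successes = responses.count("S")   # number of S's in the sample
p = successes / n                  # sample proportion of successes

print(f"n = {n}, successes = {successes}, p = {p}")
```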
Describing the Center of a Data Set with
the arithmetic mean
The population mean, denoted by µ, is the
average of all x values in the entire population.
The sample median is obtained by first ordering
the n observations from smallest to largest (with
any repeated values included, so that every sample
observation appears in the ordered list). Then
$\text{sample median} = \begin{cases} \text{the single middle value} & \text{if } n \text{ is odd} \\ \text{the mean of the middle two values} & \text{if } n \text{ is even} \end{cases}$
Mean
The sum of the observations divided by the
number of observations
$\bar{x} = \dfrac{\sum x}{n}$
Median
The midpoint of the observations when they are
ordered from the smallest to the largest (or from
the largest to the smallest)
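The two definitions can be sketched in Python as follows. The data values are made up, and `sample_median` is a hypothetical helper written for this illustration; the standard library's `statistics.median` gives the same answers. The second sample adds one extreme value (40) to show how the mean is pulled upward while the median moves only a little.

```python
from statistics import mean, median

def sample_median(values):
    """Median by the ordering recipe: sort, then take the middle value(s)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # single middle value, n odd
    return (ordered[mid - 1] + ordered[mid]) / 2  # mean of middle two, n even

odd_sample = [7, 3, 9, 4, 12]        # made-up data, n = 5
even_sample = [7, 3, 9, 4, 12, 40]   # same data plus one extreme value, n = 6

x_bar = sum(odd_sample) / len(odd_sample)   # sum of observations divided by n
print("odd sample:  mean =", x_bar, " median =", sample_median(odd_sample))
print("even sample: mean =", mean(even_sample), " median =", sample_median(even_sample))
```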
Comparing the Sample Mean & Sample Median
When a distribution is skewed positively, the mean is larger
than the median,
when a distribution is skewed negatively, the mean is
smaller than the median, and
when a distribution is symmetric, the mean and the median
are equal.
Illustrated Distribution Shapes
[Figure: example distribution shapes labeled Unimodal, Bimodal, Multimodal, Symmetric, Skew positively, and Skew negatively]
Measuring Spread: Standard
Deviation
Creates a measure of variation by summarizing the
deviations of each observation from the mean and
calculating an adjusted average of these deviations
$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
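A short Python sketch of the sample standard deviation, computed step by step and then checked against `statistics.stdev`, which also divides by n - 1. The data values are made up for illustration.

```python
import math
from statistics import stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up observations

n = len(data)
x_bar = sum(data) / n                             # sample mean
squared_devs = [(x - x_bar) ** 2 for x in data]   # squared deviations from the mean
s = math.sqrt(sum(squared_devs) / (n - 1))        # "adjusted average" uses n - 1

print("by hand:", s)
print("statistics.stdev:", stdev(data))
```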
Empirical Rule
For bell-shaped data sets:
Approximately 68% of the observations fall within 1
standard deviation of the mean
Approximately 95% of the observations fall within 2
standard deviations of the mean
Nearly all (about 99.7%) of the observations fall within 3
standard deviations of the mean
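The percentages in the rule can be checked against an exactly normal curve, which is an assumption of this sketch since real bell-shaped data only approximate it. The code uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

# Assumption: a standard normal curve stands in for "bell-shaped" data.
standard_normal = NormalDist(mu=0, sigma=1)

within = {}
for k in (1, 2, 3):
    # Share of the distribution between k standard deviations below
    # and k standard deviations above the mean.
    within[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} standard deviation(s): {within[k]:.4f}")
```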
Z Scores
We can measure any observation on a variable in
deviations from its mean and measure these deviations in
units of the standard deviation. The resulting value is the
Z-score.
Computing the z score is often referred to as
standardization and the z score is called a standardized
score.
The formula used with sample data is
$z \text{ score} = \dfrac{x - \bar{x}}{s}$
Z Scores
The z score is how many standard deviations the
observation is from the mean.
A positive z score indicates the observation is above
the mean
A negative z score indicates the observation is below
the mean.
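A minimal Python sketch of standardization, using made-up data. The helper name `z_scores` is hypothetical; it uses the sample mean and the sample standard deviation (the n - 1 version) from the standard library.

```python
from statistics import mean, stdev

def z_scores(values):
    """Return (x - x_bar) / s for every observation in values."""
    x_bar = mean(values)
    s = stdev(values)          # sample standard deviation (divides by n - 1)
    return [(x - x_bar) / s for x in values]

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up observations with mean 5
z = z_scores(data)

for x, score in zip(data, z):
    side = "above" if score > 0 else "below" if score < 0 else "at"
    print(f"x = {x}: z = {score:+.2f} ({side} the mean)")
```

Observations below the mean get negative z scores and observations above it get positive ones, and the standardized scores themselves have mean 0 and standard deviation 1.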
Multivariate Data
A multivariate data set consists of
measurements or observations on each of
two or more variables.
The data we collected from the gas use survey
represents multivariate data. When we looked at a
scatterplot diagram of gas use and distance of
residence and distinguished between commuters
and those who live on-campus, that was multivariate
data analysis.
Bivariate Data
A bivariate data set consists of measurements or
observations on each of two variables.
When we produced comparative descriptive statistics
and comparative box-plots, those were examples of
bivariate data analysis where one of the variables
was a quantitative variable and the other was a
qualitative variable.
Examples with one Quantitative
Variable and one Qualitative.
[Figure: Boxplot of "How much do you spend per week" (vertical axis 0 to 40), one box each for commuter and on-campus. Panel variable (label truncated in the source): "Do you live on-campus or are yo"]
Comparative Descriptive Statistics
Variable: How much do you (label truncated in the source), by year
year    N     Mean         StDev   Median
1st     6     8.00          6.78     6.50
2nd    10    18.70         11.89    17.50
3rd     6     3.67          4.32     2.50
4th     2   10.000   0.000000000   10.000
Example: Food Type and Pesticide
Status
What proportion of all sampled items contain pesticide residues?
                   Pesticides:
Food Type:        Yes       No
Organic            29       98
Conventional    19485     7086
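The question can be answered directly from the counts. This sketch assumes the flattened counts read as Organic: 29 Yes and 98 No, Conventional: 19485 Yes and 7086 No; the variable names are made up for illustration.

```python
# Counts from the Food Type and Pesticide Status table
# (cell assignment is an assumption based on the order of the counts).
table = {
    "Organic": {"Yes": 29, "No": 98},
    "Conventional": {"Yes": 19485, "No": 7086},
}

total_items = sum(sum(row.values()) for row in table.values())
items_with_pesticide = sum(row["Yes"] for row in table.values())
overall_proportion = items_with_pesticide / total_items

# Proportion with pesticide residues within each food type.
by_food_type = {
    food: row["Yes"] / (row["Yes"] + row["No"]) for food, row in table.items()
}

print(f"overall: {items_with_pesticide} / {total_items} = {overall_proportion:.3f}")
for food, prop in by_food_type.items():
    print(f"{food}: {prop:.3f}")
```

Under that reading, roughly 73% of all sampled items contain residues, and the proportion is much lower for organic items than for conventional ones.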
Contingency Table
The Food Type and Pesticide Status Table is
called a contingency table
A contingency table:
Displays 2 categorical variables
The rows list the categories of 1 variable
The columns list the categories of the other variable
Entries in the table are frequencies
Scatterplots
A scatterplot is a plot of pairs of observed values
(both quantitative) of two different variables.
Often one of the variables is considered to be a
response variable (y) and the other an explanatory
variable (x).
Use y for the variable whose variation we want to
understand. The x will be the variable that we think
might help us explain the variation in y.
Multivariate Data Analysis
[Figure: Scatterplot of "How much do you spend per week" (vertical axis 0 to 40) vs "How far is your residence from" (horizontal axis 0 to 30; label truncated in the source), with points grouped by "Do you live on-campus or are yo" (truncated): commuter, on-campus]
Association
Positive Association - Two variables are positively
associated when above-average values of one tend
to accompany above-average values of the other
and below-average values tend similarly to occur
together. (i.e., Generally speaking, the y values tend
to increase as the x values increase.)
Negative Association - Two variables are negatively
associated when above-average values of one
accompany below-average values of the other, and
vice versa. (i.e., Generally speaking, the y values
tend to decrease as the x values increase.)
The Pearson Correlation Coefficient
A measure of the strength of the linear relationship
between the two variables is called the Pearson correlation
coefficient, denoted by r.
The value of r does not depend on
the unit of measurement for each variable.
The value of r does not depend on which of the
two variables is labeled x.
The value of r is between –1 and +1.
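The slide does not give a formula for r, so this sketch uses the standard definition (sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations). The data and the function name `pearson_r` are made up; the point is to check the three listed properties numerically.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up paired observations (not the class survey data).
distance_miles = [1, 3, 5, 8, 12, 20]
spending = [5, 9, 8, 14, 13, 20]

r = pearson_r(distance_miles, spending)

# Changing the unit of measurement (miles to kilometers) leaves r unchanged.
distance_km = [1.609 * d for d in distance_miles]
r_km = pearson_r(distance_km, spending)

# Swapping which variable is labeled x also leaves r unchanged.
r_swapped = pearson_r(spending, distance_miles)

print(r, r_km, r_swapped)
```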
Some Correlation Pictures
Regression Analysis
Regression involves “fitting a line” through the
scatterplot of the data to represent the paired
observations (x,y).
The relationship y = a + bx is the equation of a straight
line.
The value b, called the slope of the line, is the amount
by which y increases when x increases by 1 unit.
The value of a, called the intercept (or sometimes the
vertical intercept) of the line, is the height of the line
above the value x = 0.
Variation
The Total Variation in the dependent variable can be
split into two parts
That which is EXPLAINED by the independent or
explanatory variable and
That which is NOT EXPLAINED by the independent
variable.
Least squares Regression is a method for getting an
equation for a line through the data that minimizes
the sum of the squared unexplained deviations
(residuals) for all the observations in the data set.
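A minimal sketch of a least squares fit for the line y = a + bx, using the usual closed-form slope and intercept. The data and the helper names `least_squares_line` and `sum_squared_residuals` are made up for illustration. The last lines check that nudging the fitted intercept or slope makes the unexplained sum of squares larger.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = y_bar - b * x_bar      # intercept
    return a, b

def sum_squared_residuals(a, b, xs, ys):
    """Unexplained variation: sum of squared vertical distances to the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]   # made-up explanatory values
ys = [2, 4, 5, 4, 5]   # made-up response values

a, b = least_squares_line(xs, ys)
best = sum_squared_residuals(a, b, xs, ys)
print(f"y = {a:.2f} + {b:.2f} x, unexplained sum of squares = {best:.2f}")

# Any other line does worse on this criterion.
print(sum_squared_residuals(a + 0.1, b, xs, ys) > best)
print(sum_squared_residuals(a, b + 0.1, xs, ys) > best)
```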
Using the Regression Equation
The regression equation is
How much do you spend per week on gas
Gas Spending = 9.32 + 0.422 (Distance of residence)
y = 9.32 + 0.422 x
We can use this to predict how much someone spends, if we know
how far they live from RIT. We just plug that value in for x and
solve for y.
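Plugging in a value can be sketched in Python. The coefficients come from the equation above; the function name and the example distances are made up, and the units of distance and spending are not stated in the slides.

```python
INTERCEPT = 9.32   # a in y = a + b x, from the fitted equation
SLOPE = 0.422      # b in y = a + b x, from the fitted equation

def predict_gas_spending(distance):
    """Predicted weekly gas spending for a given distance of residence."""
    return INTERCEPT + SLOPE * distance

for distance in (0, 10, 25):   # hypothetical distances
    print(f"distance = {distance}: predicted spending = "
          f"{predict_gas_spending(distance):.2f}")
```

Predictions are most trustworthy for distances inside the range observed in the survey.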
How Good Is the Line?
The coefficient of determination, denoted by $r^2$, gives the
proportion of variation in y that can be attributed
to an approximate linear relationship between
x and y.
The $r^2$ is between 0 and +1. The higher the value, the
greater the % of variation explained by the regression
equation. In our example, we were not able to explain
much of the variation in the amount spent on gasoline.
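One way to compute the coefficient of determination is as 1 minus the ratio of unexplained variation to total variation. The sketch below does this for a least squares line on made-up data; `r_squared` is a hypothetical helper name, and the numbers are not the gas survey results.

```python
def r_squared(xs, ys):
    """Share of the variation in y explained by the least squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    total = sum((y - y_bar) ** 2 for y in ys)                           # total variation
    unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))   # not explained
    return 1 - unexplained / total

xs = [1, 2, 3, 4, 5]   # made-up explanatory values
ys = [2, 4, 5, 4, 5]   # made-up response values

r2 = r_squared(xs, ys)
print(f"r^2 = {r2:.2f}")   # about 0.60: 60% of the variation in y is explained
```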