Survey

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Document related concepts

Forecasting wikipedia, lookup

Regression analysis wikipedia, lookup

Data assimilation wikipedia, lookup

Linear regression wikipedia, lookup

Coefficient of determination wikipedia, lookup

Transcript
```AS
3.9
A
Investigate
bivariate
measurement
data
M
Investigate
bivariate
measurement
data, with
justification
E
Investigate
bivariate
measurement
data, with
statistical insight
One of the principles:
Grade distinctions should not be based on the
candidate being required to acquire and retain
more subject-specific knowledge.
Statistical enquiry cycle (PPDAC)
PPDAC is vital
Using the statistical enquiry cycle to …
investigate bivariate measurement data involves:
•
posing an appropriate relationship question
using a given multivariate data set
•
•
•
•
selecting and using appropriate displays
•
•
using the model to make a prediction
identifying features in data
finding an appropriate model
describing the nature and strength of the
relationship and relating this to the context
communicating findings in a conclusion
Using the statistical enquiry cycle to …
investigate bivariate measurement data involves:
•
posing an appropriate relationship question
using a given multivariate data set
•
•
•
•
selecting and using appropriate displays
•
•
using the model to make a prediction
identifying features in data
finding an appropriate model
describing the nature and strength of the
relationship and relating this to the context
communicating findings in a conclusion
Posing relationship questions
Possibly the most important component of the
investigation
•
Time spent on this component can determine
the overall quality of the investigation
•
This component provides an opportunity to
show justification (M) and statistical insight (E)
Posing relationship questions
What makes a good relationship question?
•
•
•
•
•
It is written as a question.
•
•
The question is related to the purpose of the task.
It is written as a relationship question.
It can be answered with the data available.
The variables of interest are specified.
It is a question whose answer is useful or
interesting.
Think about the population of interest. Can the
results be extended to a wider population?
Developing question posing skills
•
Pose several relationship questions (written
with reasons/justifications)
•
Possibly critique the questions
•
The precise meaning of some variables may
need to be researched
Different relationship questions
Is there a relationship between variable 1 and
variable 2 for Hector’s dolphins?
What is the nature of the relationship between
variable 1 and variable 2 for Hector’s dolphins?
Can variable 1 be used to predict variable 2 for
Hector’s dolphins?
Developing question posing skills
It is expected that you do some research:
•
Improve knowledge of variables and context
•
May find some related studies that creates
potential for integration of statistical and
contextual knowledge
Developing question posing skills
•
Draw some scatter plots to start to investigate
•
Reduce, add to and/or prioritise their list of
questions
•
Possibly critique the questions again
Appropriate displays
Which variable goes on the x-axis and which goes
on the y-axis?
•
It depends on the question and on the
variables of interest
•
Is there a relationship between zygomatic width
and rostrum length for Hector’s dolphins?
Variables on axes
 Is there a relationship between zygomatic width
and rostrum length for Hector’s dolphins?”
This is a straight forward relationship
question.
 Is there a relationship between zygomatic width
and rostrum length for Hector’s dolphins?”
This is a ‘correlation’ type question.
 Is there a relationship between zygomatic width
and rostrum length for Hector’s dolphins?”
Which variable goes on the x-axis and
which goes on the y-axis?
It depends on the question and on the variables of
interest
•
Is there a relationship between zygomatic width
and rostrum length for Hector’s dolphins?
•
Is there a relationship between rostrum width at
midlength and rostrum width at base for
Hector’s dolphins?
Variables on axes
Is there a relationship between rostrum width at midlength
and rostrum width at the base for Hector’s dolphins?
“I think that the width at the base could help
determine (or influence) the width at midlength.”
Is there a relationship between rostrum width at midlength
and rostrum width at the base for Hector’s dolphins?
Rostrum length at base (RWB) the explanatory variable and
rostrum width at midlength the response variable..
Is there a relationship between rostrum width at midlength
and rostrum width at the base for Hector’s dolphins?
Which variable goes on the x-axis and which goes
on the y-axis?
•
It depends on the question and on the
variables of interest
•
•
•
Is there a relationship between zygomatic width
and rostrum length for Hector’s dolphins?
Is there a relationship between rostrum width at
midlength and rostrum width at the base for
Hector’s dolphins?
For Hector’s dolphins, can rostrum length be
used to predict mandible length?
Variables on axes
For Hector’s dolphins, can rostrum length be used to
predict mandible length?
This is a regression question
For Hector’s dolphins, can rostrum length be used to
predict mandible length?
It is clear that we are interested in how mandible length
responds to rostrum length, so rostrum length is the
explanatory variable and mandible length is the response
variable.
Using the statistical enquiry cycle to …
investigate bivariate measurement data involves:
•
posing an appropriate relationship question
using a given multivariate data set
•
•
•
•
selecting and using appropriate displays
•
•
using the model to make a prediction
identifying features in data
finding an appropriate model
describing the nature and strength of the
relationship and relating this to the context
communicating findings in a conclusion
Same approach whether linear or non-linear
investigate bivariate measurement data involves:
•
posing an appropriate relationship question
using a given multivariate data set
•
•
•
•
selecting and using appropriate displays
•
•
using the model to make a prediction
identifying features in data
finding an appropriate model
describing the nature and strength of the
relationship and relating this to the context
communicating findings in a conclusion
Features, model, nature and strength
Generate the scatter plot using iNZight
(or excel)
RWM vs RWB
Features, model, nature and strength
Generate the scatter plot
• Let the data speak
• Use your eyes (visual aspects)
Write about what you see, not what you don’t see
DON’T fit a model yet
Features, model, nature and strength
Template for features (but allow flexibility)
• Trend
• Association (nature)
• Strength (degree of scatter)
• Groupings/clusters
• Unusual observations
• Other (e.g., variation in scatter)
• If there are groupings evident in the plot then it
may be better to explore the groups separately
at this stage.
Trend
From the scatter plot it
appears that there is a
linear trend between
rostrum width at base
and rostrum width at
midlength.
Use descriptions of variables rather than
variable names- keeps it contextual.
This is a reasonable expectation because two
different measures on the same body part of an
animal could be in proportion to each other.
Association
The scatter plot also
shows that as the
rostrum width at base
increases the rostrum
width at midlength tends
to increase.
This is to be expected because dolphins with small
rostrums would tend to have small values for rostrums
widths at base and midlength and dolphins with large
rostrums would tend to have large values for rostrums
widths at base and midlength.
Association
The scatter plot also
shows that as the
rostrum width at base
increases the rostrum
width at midlength tends
to increase.
A contextual description is preferable to one using
technical terms.
However it is appropriate to use terms such as
positive, negative or no association, but they are
better used after the contextual description.
Higher level considerations
This is to be expected because dolphins with small
rostrums would tend to have small values for rostrums
widths at base and midlength and dolphins with large
rostrums would tend to have large values for rostrums
widths at base and midlength.
You should reflect on the nature of the relationship with
respect to the context.
At this stage you could acknowledge (if the data does
not come from a randomised experiment) that they
have found only a statistical relationship and that this
does not necessarily imply a causal relationship
between the variables.
Higher level considerations
Alternatively, if the data comes from a suitable
experiment you could make a causation claim in
You may acknowledge that other variables (which
they must name) would impact on the (response)
variable, and suggest how they might impact on the
variable. For example, gender, age, etc. and
perhaps show these and compare.
Find a model
Because the trend is
linear I will fit a linear
model to the data.
You must state why you
have selected this
particular model.
The line is a good model for the data because for
all values of rostrum width at base, the number
of points above the line are about the same as the
number below it.
Find a model
Because the trend is
linear I will fit a linear
model to the data.
You must state why you
have selected this
particular model.
Don’t show the equation yet. There are still features
in the data to comment on.
Find a model
Because the trend is
linear I will fit a linear
model to the data.
You must state why you
have selected this
particular model.
A discussion of fit throughout the range of x-values
is required.
If the number of observations is small that casts
some doubt on the reliability of the model.
Find a model
The line is a good model for the data because for
all values of rostrum width at base, the number
of points above the line are about the same as the
number below it.
A discussion of fit throughout the range of x-values
is required.
If the number of observations is small that casts
some doubt on the reliability of the model.
Strength
The points on the graph
are reasonably close to
the fitted line so the
relationship between
rostrum width at
midlength and rostrum
width at base is
reasonably strong. This is supported by the
correlation coefficient r =
This must refer to visual aspects of the display.
Appropriate descriptors are: strong, moderate or weak.
Strength
The points on the graph
are reasonably close to
the fitted line so the
relationship between
rostrum width at
midlength and rostrum
width at base is
reasonably strong. This is supported by the
correlation coefficient r =
You must refer to the degree of scatter about the
trend or, equivalently, the closeness of the points to
the trend.
Groupings
Groupings
Now it seems that NI Hector’s
dolphins seem to have larger rostrum
widths at base than SI ones.
This is now helpful for doing
predictions.
There are some common values of
RWB for both species of dolphin.
This allows more comments to be
data points.
Groupings
Now it seems that NI
Hector’s dolphins seem to
have larger rostrum widths at
base than SI ones.
Many NI points are above the
line.
Why is that?
It could be the effect on the
fitted line of the unusual point
(RWB = 86, RWM = 50).
Groupings
This now provides an opportunity to comment
on these models.
South Island dolphins: Model seems appropriate.
North Island: Only 13 data points
– model may not be useful for reliable conclusions.
Points with lower RWB values tend to be below the line,
points in the middle tend to be above.
Could try a quadratic model but small number of data points
is an issue
Unusual points
One dolphin, one of
those with a rostrum
width at base of 86mm,
width at midlength
compared to dolphins
with the same, or similar,
rostrum widths at base.
Unusual points
One dolphin, one of
those with a rostrum
width at base of 86mm,
width at midlength
compared to dolphins
with the same, or similar,
rostrum widths at base.
Refer to actual data points.
Unusual points
One dolphin, one of
those with a rostrum
width at base of 86mm,
width at midlength
compared to dolphins
with the same, or similar,
rostrum widths at base.
Comment on the effect any unusual values would
have on the model.
Anything else
Variation in scatter?
Anything else
Variation in scatter?
Constant versus non-constant scatter relates to
assumptions of the linear regression i.e. that the
residuals are normally distributed.
Anything else
Variation in scatter?
Variation in scatter is relevant when discussing
precision of predictions.
Anything else
Variation in scatter?
Refer to visual aspects of the display and keep it
contextual.
For example: As the values of x increase the
amount (or degree) of variation in y tends to
increase (for fanning out)
Prediction
Try to use relevant values of x. In this case any
RWB value from 81mm to 86mm is sensible.
Prediction
Linear Trend
RWM = 0.77 * RWB + -8.72
Summary for Island = 1
Linear Trend
RWM = 0.48 * RWB + 19.19
Summary for Island = 2
Linear Trend
RWM = 0.46 * RWB + 14.37
Use all three models with RWB = 85mm (all, NI only,
SI only)
Linear Trend
RWM = 0.77 * RWB + -8.72
Summary for Island = 1
Linear Trend
RWM = 0.48 * RWB + 19.19
Summary for Island = 2
Linear Trend
RWM = 0.46 * RWB + 14.37
Using RWB = 85mm
All points: RWM = 0.77 x 85 – 8.72 = 56.73
NI dolphins: RWM = 0.48 x 85 + 19.19 = 59.99
SI dolphins: RWM = 0.46 x 85 + 14.37 = 53.47
Interpret in context, with units.
Linear Trend
RWM = 0.77 * RWB + -8.72
Summary for Island = 1
Linear Trend
RWM = 0.48 * RWB + 19.19
Summary for Island = 2
Linear Trend
RWM = 0.46 * RWB + 14.37
Use sensible rounding.
Linear Trend
RWM = 0.77 * RWB + -8.72
Summary for Island = 1
Linear Trend
RWM = 0.48 * RWB + 19.19
Summary for Island = 2
Linear Trend
RWM = 0.46 * RWB + 14.37
Using RWB = 85mm
All points: RWM = 0.77 x 85 – 8.72 = 56.73
NI dolphins: RWM = 0.48 x 85 + 19.19 = 59.99
SI dolphins: RWM = 0.46 x 85 + 14.37 = 53.47
Dangers – predicting outside the range of observed
x-values.
Linear Trend
RWM = 0.77 * RWB + -8.72
Summary for Island = 1
Linear Trend
RWM = 0.48 * RWB + 19.19
Summary for Island = 2
Linear Trend
RWM = 0.46 * RWB + 14.37
How good is a prediction?
Linear Trend
RWM = 0.77 * RWB + -8.72
Summary for Island = 1
Linear Trend
RWM = 0.48 * RWB + 19.19
Summary for Island = 2
Linear Trend
RWM = 0.46 * RWB + 14.37
A prediction is an estimate so I
need to consider bias and
precision.
How good is a prediction?
Linear Trend
RWM = 0.77 * RWB + -8.72
Summary for Island = 1
Linear Trend
RWM = 0.48 * RWB + 19.19
Summary for Island = 2
Linear Trend
RWM = 0.46 * RWB + 14.37
Bias:
If a model is good, then a
prediction is likely to be accurate
How good is a prediction?
Linear Trend
RWM = 0.77 * RWB + -8.72
Summary for Island = 1
Linear Trend
RWM = 0.48 * RWB + 19.19
Summary for Island = 2
Linear Trend
RWM = 0.46 * RWB + 14.37
The number of data points used to
form the model is relevant.
How good is a prediction?
Linear Trend
RWM = 0.77 * RWB + -8.72
Summary for Island = 1
Linear Trend
RWM = 0.48 * RWB + 19.19
Summary for Island = 2
Linear Trend
RWM = 0.46 * RWB + 14.37
Precision – Relates to the degree of scatter
Don’t relate a prediction to observed yvalues.
Statistical enquiry cycle (PPDAC)
Communicating findings in a conclusion
Each component of the cycle must be
communicated
Summary
Basic principles
•
•
•
Each component
Context
Visual aspects
Higher level considerations
•
•
•
Justify
Extend
Reflect
Other issues (if time)
•
The use or articles or reports to assist
contextual understanding
•
How to develop understanding of outliers on a
model
•
•
The place of residuals and residual plots
Is there a place for transforming variables?
```
Related documents