Chapter 5 Review: Summarizing Bivariate Data
Name _______________________________
Period__________
1. During the first 3 centuries AD, the Roman Empire
produced coins in the Eastern provinces. Some
historians argue that not all these coins were produced
in Roman mints, and further that local provincial mints
struck some of them. Because the "style" of coins is
difficult to analyze, the historians would like to use
metallurgical analysis as one tool to identify the source
mints of these coins. Investigators studied 8 coins
known to have been produced by the mint in Rome in
an attempt to identify a trace element profile for these
coins, and have identified gold and lead as possible
factors in identifying other coins as having been minted
in Rome. The gold and lead content, measured as a %
of weight of each coin, is given in the table at right, and
a scatter plot of these data is presented below.
Gold and Lead Content (% by weight)

Gold    Lead
0.22    0.41
0.24    0.31
0.20    0.89
0.23    0.62
0.18    0.41
0.15    0.88
0.17    0.67
0.17    0.59
[Scatter plot: %Lead vs. %Gold; x-axis: Gold, y-axis: Lead]
a) What is the equation of the least squares best fit line?
b) Graph the best fit line on the scatter plot.
c) What is the value of the correlation coefficient? Interpret this value.
d) What is the value of the coefficient of determination? Give an interpretation of this
value.
Chapter 5 Test, Form B
Page 1 of 9
2. Suppose that the coins analyzed in problem 1 are representative of the metallurgical
content of coins minted in Rome during the first 300 years AD.
a) If a Roman coin is selected at random and its gold content is 0.20% by weight,
calculate the predicted lead content. Be sure to use correct notation and units.
b) One of the coins used to calculate the regression equation has a gold content of
0.200%. Calculate the residual for this coin. Be sure to use correct notation and
units.
c) Assess the line. (3 parts)
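The calculations asked for in problems 1 and 2 can be checked with a short script. The following is a minimal sketch in plain Python, assuming the gold and lead values pair up in the order listed in the table; the helper name least_squares is hypothetical and not part of the test.

```python
import math

# Data from the table in problem 1, assumed to be paired in the order listed.
gold = [0.22, 0.24, 0.20, 0.23, 0.18, 0.15, 0.17, 0.17]  # % by weight
lead = [0.41, 0.31, 0.89, 0.62, 0.41, 0.88, 0.67, 0.59]  # % by weight


def least_squares(x, y):
    """Return (intercept, slope, r) for the least squares line y-hat = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r


a, b, r = least_squares(gold, lead)
r_squared = r ** 2

# Problem 2a: predicted lead content for a coin with 0.20% gold.
predicted = a + b * 0.20

# Problem 2b: residual = observed - predicted for the coin with 0.20% gold
# (its observed lead content in the table is 0.89%).
residual = 0.89 - predicted

print(f"lead-hat = {a:.3f} + ({b:.3f}) * gold")
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
print(f"predicted = {predicted:.3f}%, residual = {residual:.3f}%")
```

The residual is taken as observed minus predicted, so a positive value means the coin contains more lead than the line predicts.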
3. The Des Moines Register recently reported the ratings of high school sportsmanship as
compiled by the Iowa High School Athletic Association. For each school the spectators
and participants were rated by referees, where 1 = superior, and 5 = unsatisfactory. A
regression analysis of the average scores given to wrestling spectators and wrestlers is
shown below.
Linear Fit
WrestSpectators = 0.667 + 0.701 Wrestlers

[Scatter plot: Wrestling Spectators vs. Wrestlers; x-axis: WrestParticipants (1 to 3.5), y-axis: WrestSpectators (1 to 3.5)]

Summary of Fit
RSquare        0.467
RSquare Adj    0.465
s              0.322

Analysis of Variance
Source      DF     SS        MS        F Ratio
Model       1      26.437    26.437    254.2274
Error       290    30.157    0.104     Prob > F
C. Total    291    56.594              <.0001
a) Identify and interpret the correlation between the ratings of spectators and wrestlers.
b) Identify and interpret the coefficient of determination.
c) Identify and interpret the value of the standard deviation about the least squares line.
d) Identify a possible influential point by circling it on the graph. How would we tell if
the point is an influential point?
e) Identify a possible outlier by putting a box around it on the graph. Estimate the ordered
pair of this possible outlier, and find its residual.
f) Describe the difference between an outlier and an influential point for bivariate data.
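The quantities in the computer output for problem 3 are tied together by a few standard identities, which can be checked numerically. The sketch below is an illustration only; the variable names are hypothetical, and it uses nothing beyond the numbers printed in the Analysis of Variance table and the Linear Fit line.

```python
import math

# Values copied from the Analysis of Variance table in problem 3.
ss_model, df_model = 26.437, 1
ss_error, df_error = 30.157, 290
ss_total = 56.594
slope = 0.701  # from the Linear Fit line; its sign gives the sign of r

# Coefficient of determination: share of total variation explained by the line.
r_squared = ss_model / ss_total

# Correlation: square root of r^2, carrying the sign of the slope.
r = math.copysign(math.sqrt(r_squared), slope)

# Standard deviation about the least squares line: sqrt(SS_error / df_error).
s = math.sqrt(ss_error / df_error)

# F ratio: mean square for the model divided by mean square for error.
f_ratio = (ss_model / df_model) / (ss_error / df_error)

print(f"r^2 = {r_squared:.3f}, r = {r:.3f}, s = {s:.3f}, F = {f_ratio:.2f}")
```

Note that the correlation itself does not appear in the output; it has to be recovered from RSquare and the sign of the slope.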
4. The preservation of objects made of organic material is a constant concern to those
caring for items of historical interest. For example, some delicate fabrics are natural
silks--they are made of protein and are biodegradable. Many silks in museum collections
are in danger of crumbling. It would be of great benefit to be able to assess the delicacy
of the fabric before making decisions about displaying it. One possibility is chemical
analysis, which might give some evidence about the brittle nature of a fabric. To
investigate this possibility, bio-chemical data in the form of a ratio of the amount of
certain amino acids in the fibers was acquired from the linings of sixteen 19th and early
20th century Japanese kimonos, and the tenacity (breaking stress) of the fabric was also
recorded.
Using the data from the Japanese kimonos, construct the least
squares best fit line predicting tenacity using amino acid ratio
as a predictor.

Amino acid ratio and tenacity for linings for 16 Japanese kimonos

Amino acid ratio    Tenacity
2.05                1.20
1.78                1.60
2.08                1.30
2.62                0.90
2.00                1.80
1.92                1.60
1.89                1.20
1.32                2.40
1.20                3.10
1.63                2.80
1.05                4.40
1.60                3.00
1.60                2.10
1.98                3.10
2.16                1.40
2.10                1.90

[Scatter plot: Tenacity Vs. AminoAcid Ratio; x-axis: AminoAcidRatio (0 to 3), y-axis: Tenacity (0 to 6)]

a) What is the equation of the least-squares line?
b) Identify and interpret the slope and y-intercept.
c) Approximately what proportion of the variability in
tenacity is explained by the amino acid ratio?
d) Is a line the best way to summarize the data? Explain.
5. The theory of fiber strength suggests that the relationship between fiber tenacity and
amino acid ratio is logarithmic, i.e. $\hat{T} = a + b \log(R)$, where T is the tenacity and R is
the amino acid ratio. Perform the appropriate transformation of variable(s) and fit this
logarithmic model to the data.
a) What is the resulting best fit line using this model?
b) For an amino acid ratio of R = 1.5 , what is the predicted tenacity?
c) Using your results so far, does it appear that the transformed model in question (5) is
no improvement, a slight improvement, or a significant improvement over the linear
model in question (4)? Justify your response with an appropriate statistical
argument.
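The transformation in problem 5 amounts to replacing R by log(R) and then fitting an ordinary least squares line. The sketch below shows one way to do this in plain Python. It assumes the amino acid ratios and tenacities pair up in the order listed in the table and that "log" means base 10 (the source does not say which base is intended); fit_line is a hypothetical helper name.

```python
import math

# Kimono data from problem 4, assumed to be paired in the order listed.
ratio = [2.05, 1.78, 2.08, 2.62, 2.00, 1.92, 1.89, 1.32,
         1.20, 1.63, 1.05, 1.60, 1.60, 1.98, 2.16, 2.10]
tenacity = [1.20, 1.60, 1.30, 0.90, 1.80, 1.60, 1.20, 2.40,
            3.10, 2.80, 4.40, 3.00, 2.10, 3.10, 1.40, 1.90]


def fit_line(x, y):
    """Return (intercept, slope, r_squared) for the least squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope, sxy ** 2 / (sxx * syy)


# Linear model from problem 4: T-hat = a + b * R
lin_a, lin_b, lin_r2 = fit_line(ratio, tenacity)

# Logarithmic model from problem 5: T-hat = a + b * log10(R)  (base 10 assumed)
log_ratio = [math.log10(v) for v in ratio]
log_a, log_b, log_r2 = fit_line(log_ratio, tenacity)

# Problem 5b: predicted tenacity at R = 1.5 under the logarithmic model.
pred_log = log_a + log_b * math.log10(1.5)

print(f"linear: T-hat = {lin_a:.3f} + ({lin_b:.3f}) R, r^2 = {lin_r2:.3f}")
print(f"log:    T-hat = {log_a:.3f} + ({log_b:.3f}) log10(R), r^2 = {log_r2:.3f}")
print(f"predicted tenacity at R = 1.5: {pred_log:.3f}")
```

Because both models predict tenacity on its original scale, their r^2 values can be compared directly; the residual plots should still be inspected before judging the improvement.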
Transformed Scatter plot
Transformed Residual Plot
6. Paleontology, the study of forms of prehistoric life, can sometimes be aided by modern
biology. The study of prehistoric birds depends on fossil information, which typically
consists of imprints in stone of a prehistoric creature’s remains. To study the
productivity of an ancient ecosystem it would be useful to know the actual mass of the
individual birds, but this information is not preserved in the fossil record. It seems
reasonable that the biomechanics of birds operates much the same today as in the past.
For example, the relationship between the wing length and total weight of a bird should be
very similar today to the relationship in the distant past. The wing lengths of ancient
birds are readily obtainable from the fossil record, but the weight is not. Assuming
similar biomechanical development for ancient birds and modern birds, a regression
model expressing the relationship between wing length and total weight of a modern bird
could be used to estimate the mass of similar prehistoric birds and thus gauge some
aspects of the ancient ecosystem.
Data is available for some modern birds of prey. Specifically, data on the mean wing
length and mean total weight of species of hawk-like birds of prey is given below.
Wing length and total weight of modern species of birds of prey

Bird species                 Wing length (cm)    Total weight (kilograms)
Gyps fulvus                  69.8                7.27
Gypaetus barbatus grandis    71.7                5.39
Catharista atrata            50.2                1.70
Aguila chrysatus             68.2                3.71
Hieraeus fasciatus           56.0                2.06
Helotarsus ecaudatus         51.2                2.10
Geranoatus melanoleucus      51.5                2.12
Circatus gallicus            53.3                1.66
Buteo bueto                  40.4                1.03
Pernis apivorus              45.1                0.62
Pandion haliatus             49.6                1.11
Circus aeruginosos           41.3                0.68
Circus cyaneus (female)      37.4                0.472
Circus cyaneus (male)        33.9                0.331
Circus pygargus              35.9                0.237
Circus macrurus              35.7                0.386
Milvus milvus                50.7                0.927
Using these data, construct the least squares best-fit line for predicting total weight using
wing length as a predictor.
a) What is the equation of the least-squares line?
b) Approximately what proportion of the variability in weight is explained by the
wing length?

[Scatter plot: Bivariate Fit of Weight(kg) By WingLength(cm); x-axis: WingLength(cm), 30 to 70; y-axis: Weight(kg), 0 to 8]

7. Biological theory suggests that the relationship between the weight of these
animals and their wing length is exponential, i.e. $W = a(10)^{bL}$ or $W = a e^{bL}$,
where W is the weight and L is the wing length. Perform the appropriate
transformation of variable(s) and fit an exponential model to the data.
a) What is the resulting best fit line using the transformed model?
b) For the data point where the wing length is L = 56.0 (Hieraeus fasciatus), what is the
predicted bird weight? Show your work below.
c) How would you evaluate your transformed model in question (7) to see if it is an
improvement over the linear model in question (6)?
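Fitting the exponential model in problem 7 means regressing log(W) on L and then undoing the logarithm to get a prediction in kilograms. The sketch below illustrates this in plain Python, assuming the species, wing lengths, and weights pair up in the order listed in the table and using base-10 logarithms (the first of the two forms given in the problem); fit_line is a hypothetical helper name.

```python
import math

# Bird data from problem 6, assumed to be paired in the order listed.
wing_cm = [69.8, 71.7, 50.2, 68.2, 56.0, 51.2, 51.5, 53.3, 40.4,
           45.1, 49.6, 41.3, 37.4, 33.9, 35.9, 35.7, 50.7]
weight_kg = [7.27, 5.39, 1.70, 3.71, 2.06, 2.10, 2.12, 1.66, 1.03,
             0.62, 1.11, 0.68, 0.472, 0.331, 0.237, 0.386, 0.927]


def fit_line(x, y):
    """Return (intercept, slope, r_squared) for the least squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope, sxy ** 2 / (sxx * syy)


# Linear model from problem 6: W-hat = a + b * L
lin_a, lin_b, lin_r2 = fit_line(wing_cm, weight_kg)

# Exponential model: W = a * 10^(b*L)  <=>  log10(W) = log10(a) + b * L
log_weight = [math.log10(w) for w in weight_kg]
exp_a, exp_b, exp_r2 = fit_line(wing_cm, log_weight)

# Problem 7b: predict log10(W) at L = 56.0, then back-transform to kilograms.
pred_log_w = exp_a + exp_b * 56.0
pred_w = 10 ** pred_log_w

print(f"log10(W)-hat = {exp_a:.3f} + {exp_b:.4f} L, r^2 = {exp_r2:.3f}")
print(f"predicted weight at L = 56.0 cm: {pred_w:.2f} kg")
```

The r^2 of the transformed fit describes variation in log10(W), not in W itself, so it is not directly comparable with the r^2 of the linear model; comparing residual plots, or residuals computed on the kilogram scale, is a fairer check.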
8. One of the problems when estimating the size of animal populations from aerial
surveys is that animals may bunch together, making it difficult to distinguish and
count them accurately. For example, a horse standing alone is easy to spot; if seven
horses huddle close together, some may be missed, resulting in an undercount. The
relative frequency of undercounts is typically reported as a percent. For example, if
there are 10 horses in a group, a person in the plane may typically count fewer than
10 horses 20% of the time. In a recent study, the percent of sightings that resulted in
an undercount was related to the size of the "group" of horses and donkeys; the
following data were gathered:
% Undercount vs. Group Size for Horses and Donkeys

Group Size    % Occurrence Undercount
2             5
3             5
4             6
5             10
6             5
7             7
8             5
9             6
10            7
11            5
12            5
14            14
16            13
18            23
After fitting a straight line model, $\hat{P} = a + bG$, significant curvature was detected in the
residual plot, and two nonlinear models were chosen for further analysis, the exponential
and the power models. The computer output for these models is given below, and the
residual plots are on the next page.

Exponential model: $\log P = a + bG$
Bivariate Fit of Log%U By GroupSize
Linear Fit
Log%U = 0.586 + 0.031 GroupSize
Summary of Fit
RSquare                  0.499
RSquare Adj              0.458
St. Dev. Of Residuals    0.156

Power model: $\log P = a + b \log G$
Bivariate Fit of Log%U By LogGS
Linear Fit
Log%U = 0.476 + 0.440 LogGS
Summary of Fit
RSquare                  0.337
RSquare Adj              0.282
St. Dev. Of Residuals    0.180
Residual Plots
c) Generally speaking, which of the two models, power or exponential, is better at
predicting the log (Percent Undercount)? Provide statistical justification for your
choice.
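The two fits in problem 8 can be reproduced from the data table, which makes the comparison concrete. The sketch below is a plain-Python illustration, assuming the group sizes and undercount percentages pair up in the order listed and that the logarithms in the output are base 10; fit_line is a hypothetical helper name.

```python
import math

# Data from problem 8, assumed to be paired in the order listed.
group_size = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 16, 18]
pct_under = [5, 5, 6, 10, 5, 7, 5, 6, 7, 5, 5, 14, 13, 23]


def fit_line(x, y):
    """Return (intercept, slope, r_squared, s) for the least squares line.

    s is the standard deviation of the residuals, sqrt(SSE / (n - 2)).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)
    s = math.sqrt(syy * (1 - r_squared) / (n - 2))
    return intercept, slope, r_squared, s


log_p = [math.log10(p) for p in pct_under]   # base 10 assumed
log_g = [math.log10(g) for g in group_size]

# Exponential model: log P = a + b G
exp_a, exp_b, exp_r2, exp_s = fit_line(group_size, log_p)

# Power model: log P = a + b log G
pow_a, pow_b, pow_r2, pow_s = fit_line(log_g, log_p)

print(f"exponential: {exp_a:.3f} + {exp_b:.3f} G, r^2 = {exp_r2:.3f}, s = {exp_s:.3f}")
print(f"power:       {pow_a:.3f} + {pow_b:.3f} log G, r^2 = {pow_r2:.3f}, s = {pow_s:.3f}")
```

Both models predict the same response, log(Percent Undercount), so their r^2 values and residual standard deviations can be compared directly, alongside the residual plots.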