Package ‘bstats’
February 15, 2013
Version 1.0-12-3
Date 2011-10-31
Title Basic statistical functions for R
Author Bin Wang <[email protected]>.
Maintainer Bin Wang <[email protected]>
Description This package collects commonly used procedures or
algorithms for general data analysis. In addition, routines
for linear regression analysis, statistical computing and
graphics, and many others have been implemented in R for some
courses taught at the University of South Alabama.
License Unlimited
Repository CRAN
Date/Publication 2011-12-04 09:26:34
NeedsCompilation yes
R topics documented:
    ac
    birth
    bptest
    bstats
    dw.test
    edf
    edu75
    influential.plot
    ld50.logit
    ld50.logitfit
    lm.ci
    mediation.test
    model.check
    model.test
    oddsratio
    predictor.plot
    residual.plot
    river
    scb
    supervisor
    vif
    white.test
    wls
    Index
ac
Autocorrelation
Description
Removal of autocorrelation by transformation.
Usage
ac(lmobj, type='cochrane', ...)
## S3 method for class 'lm'
ac(lmobj, type='cochrane', ...)
Arguments
lmobj
an object that inherits from class lm, such as an lm or glm object.
type
method selection: ’iterative’, ’cochrane’.
...
not used.
Details
'iterative': simultaneously estimates the regression coefficients and rho by minimizing the sum of
squared errors. A grid search is used.
'cochrane':
1. Fit a linear regression model and compute the OLS estimates.
2. Calculate the residuals and use them to estimate rho.
3. Using the estimated rho, fit the transformed model to obtain new estimates of the regression coefficients.
4. Check whether autocorrelation still exists. If it does, repeat from step 2, using the coefficient
estimates from step 3 in place of the OLS estimates from step 1.
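The 'cochrane' steps can be sketched in base R as follows. This is only an illustration under stated assumptions, not the code used by ac: the data are simulated, the names rho_hat and beta are hypothetical, and the stopping rule in step 4 is replaced by a simple check that the estimate of rho has stopped changing.

```r
# Minimal Cochrane-Orcutt sketch (illustrative; not the bstats implementation)
set.seed(1)
n <- 300
x <- rnorm(n)
# simulated AR(1) disturbances with rho = 0.7 (an assumption of this sketch)
e <- as.numeric(stats::filter(rnorm(n), 0.7, method = "recursive"))
y <- 1 + 2 * x + e

beta <- coef(lm(y ~ x))                 # step 1: OLS estimates
rho <- 0
for (iter in 1:50) {
  r <- y - beta[1] - beta[2] * x        # step 2: residuals on the original scale
  rho_new <- sum(r[-1] * r[-n]) / sum(r[-n]^2)   # regress r_t on r_{t-1}
  ys <- y[-1] - rho_new * y[-n]         # step 3: quasi-differenced data
  xs <- x[-1] - rho_new * x[-n]
  fit_t <- lm(ys ~ xs)
  # the intercept of the transformed model equals beta0 * (1 - rho)
  beta <- c(coef(fit_t)[1] / (1 - rho_new), coef(fit_t)[2])
  if (abs(rho_new - rho) < 1e-8) break  # step 4 (simplified): stop once rho settles
  rho <- rho_new
}
rho_hat <- rho_new
```

In practice a Durbin-Watson test on the residuals of the transformed fit would replace the convergence check used here.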
Value
coefficients, rhohat, dwtest, re-fitted model.
Author(s)
Wang, B.
References
Cochrane and Orcutt (1949)
St 335 text
Examples
data(edu75)
lm0 = lm(Y~X1+X2+X3, data=edu75)
ac(lm0, type='iterative')
ac(lm0, type='cochrane')
birth
Birth data
Description
Birth data for singleton live births with gestational age at least 38 weeks.
Usage
data(birth)
Format
A data frame with 400 observations on 9 variables.
Sex            character   'male' or 'female'.
Gestation      numeric     gestational age (in weeks).
Weight         numeric     birth weight.
Length         numeric     height.
Head           numeric     head size.
Chest          numeric     chest size.
Mother.s.age   numeric     mother’s age.
type           factor      'r' = rural or 'u' = urban.
region         factor      region of the birth.
References
Wang, CSDA and JSS papers.
bptest
Breusch-Pagan Test
Description
Performs the Breusch-Pagan test against heteroskedasticity.
Usage
bptest(formula, varformula = NULL, studentize = TRUE, data = list())
Arguments
formula
a symbolic description for the model to be tested (or a fitted "lm" object).
varformula
a formula describing only the potential explanatory variables for the variance
(no dependent variable needed). By default the same explanatory variables are
taken as in the main regression model.
studentize
logical. If set to TRUE, Koenker’s studentized version of the test statistic will be
used.
data
an optional data frame containing the variables in the model. By default the
variables are taken from the environment which bptest is called from.
Details
The Breusch-Pagan test fits a linear regression model to the squared residuals of a linear regression model
(by default using the same explanatory variables as the main regression model) and rejects if
too much of the variance is explained by these explanatory variables.
Under H0 the test statistic of the Breusch-Pagan test asymptotically follows a chi-squared distribution with
degrees of freedom equal to the number of regressors in the model, not counting the constant.
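The auxiliary regression described above can be written out by hand. The sketch below computes the studentized (Koenker) form of the statistic as n times the R-squared of the auxiliary fit; the data are simulated and the names bp_stat, bp_df and bp_p are hypothetical, so it should be read as an illustration rather than as the internals of bptest.

```r
# Hand-computed studentized Breusch-Pagan statistic (illustrative sketch)
set.seed(2)
n <- 100
x <- rep(c(-1, 1), n / 2)
y <- 1 + x + rnorm(n, sd = rep(c(1, 2), n / 2))   # error variance depends on x

fit <- lm(y ~ x)
aux <- lm(residuals(fit)^2 ~ x)         # auxiliary regression on squared residuals
bp_stat <- n * summary(aux)$r.squared   # Koenker's statistic: n * R^2
bp_df <- length(coef(aux)) - 1          # regressors without the constant
bp_p <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)
```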
Value
A list with class "htest" containing the following components:
statistic
the value of the test statistic.
p.value
the p-value of the test.
parameter
degrees of freedom.
method
a character string indicating what type of test was performed.
data.name
a character string giving the name(s) of the data.
References
T.S. Breusch & A.R. Pagan (1979), A Simple Test for Heteroscedasticity and Random Coefficient
Variation. Econometrica 47, 1287–1294
R. Koenker (1981), A Note on Studentizing a Test for Heteroscedasticity. Journal of Econometrics
17, 107–112.
W. Krämer & H. Sonnberger (1986), The Linear Regression Model under Test. Heidelberg: Physica.
Examples
## generate a regressor
x <- rep(c(-1,1), 50)
## generate heteroskedastic and homoskedastic disturbances
err1 <- rnorm(100, sd=rep(c(1,2), 50))
err2 <- rnorm(100)
## generate a linear relationship
y1 <- 1 + x + err1
y2 <- 1 + x + err2
## perform Breusch-Pagan test
bptest(y1 ~ x)
bptest(y2 ~ x)
bstats
R package: bstats
Description
This package provides R functions written for classroom use, especially for my courses
ST 210, ST 315, ST 335, and ST 475/575.
Author(s)
B. Wang <[email protected]>
dw.test
Durbin-Watson Test
Description
Performs the Durbin-Watson test for autocorrelation of disturbances.
Usage
dw.test(formula, order.by = NULL, alternative = c("greater", "two.sided", "less"),
iterations = 15, exact = NULL, tol = 1e-10, data = list())
Arguments
formula
a symbolic description for the model to be tested (or a fitted "lm" object).
order.by
Either a vector z or a formula with a single explanatory variable like ~ z. The
observations in the model are ordered by the size of z. If set to NULL (the default)
the observations are assumed to be ordered (e.g., a time series).
alternative
a character string specifying the alternative hypothesis.
iterations
an integer specifying the number of iterations when calculating the p-value with
the "pan" algorithm.
exact
logical. If set to FALSE, a normal approximation is used to compute the
p-value; if TRUE, the "pan" algorithm is used. The default is to use "pan" if the
sample size is < 100.
tol
tolerance. Eigenvalues computed have to be greater than tol to be treated as
non-zero.
data
an optional data frame containing the variables in the model. By default the
variables are taken from the environment which dw.test is called from.
Details
The Durbin-Watson test has the null hypothesis that the autocorrelation of the disturbances is 0. It is
possible to test against the alternative that it is greater than, not equal to, or less than 0, respectively.
This can be specified by the alternative argument.
Under the assumption of normally distributed disturbances, the null distribution of the Durbin-Watson
statistic is the distribution of a linear combination of chi-squared variables. The p-value is
computed using the Fortran version of Applied Statistics Algorithm AS 153 by Farebrother (1980,
1984). This algorithm is called "pan" or "gradsol". For large sample sizes the algorithm might fail to
compute the p-value; in that case a warning is printed and an approximate p-value is given, which
is computed using a normal approximation with the mean and variance of the Durbin-Watson
test statistic.
For an overview on R and econometrics see Racine & Hyndman (2002).
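For orientation, the Durbin-Watson statistic itself is easy to compute from the OLS residuals. The sketch below uses simulated data and hypothetical names (dw, r1); it does not reproduce the "pan" p-value calculation, only the statistic and its link to the lag-1 residual autocorrelation.

```r
# Durbin-Watson statistic from OLS residuals (illustrative sketch)
set.seed(3)
x <- rep(c(-1, 1), 50)
err <- as.numeric(stats::filter(rnorm(100), 0.9, method = "recursive"))
y <- 1 + x + err

e <- residuals(lm(y ~ x))
dw <- sum(diff(e)^2) / sum(e^2)               # always between 0 and 4
r1 <- sum(e[-1] * e[-length(e)]) / sum(e^2)   # lag-1 residual autocorrelation
# dw is close to 2 * (1 - r1): values near 2 suggest no autocorrelation,
# values near 0 suggest positive autocorrelation
```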
Value
An object of class "htest" containing:
statistic
the test statistic.
p.value
the corresponding p-value.
method
a character string with the method used.
data.name
a character string with the data name.
References
J. Durbin & G.S. Watson (1950), Testing for Serial Correlation in Least Squares Regression I.
Biometrika 37, 409–428.
J. Durbin & G.S. Watson (1951), Testing for Serial Correlation in Least Squares Regression II.
Biometrika 38, 159–178.
J. Durbin & G.S. Watson (1971), Testing for Serial Correlation in Least Squares Regression III.
Biometrika 58, 1–19.
R.W. Farebrother (1980), Pan’s Procedure for the Tail Probabilities of the Durbin-Watson Statistic
(Corr: 81V30 p189; AS R52: 84V33 p363-366; AS R53: 84V33 p366-369). Applied Statistics
29, 224–227.
R. W. Farebrother (1984), [AS R53] A Remark on Algorithms AS 106 (77V26 p92-98), AS 153
(80V29 p224-227) and AS 155: The Distribution of a Linear Combination of χ2 Random Variables
(80V29 p323-333) Applied Statistics 33, 366–369.
W. Krämer & H. Sonnberger (1986), The Linear Regression Model under Test. Heidelberg: Physica.
J. Racine & R. Hyndman (2002), Using R To Teach Econometrics. Journal of Applied Econometrics
17, 175–189.
See Also
lm
Examples
## generate two AR(1) error terms with parameter
## rho = 0 (white noise) and rho = 0.9 respectively
err1 <- rnorm(100)
## generate regressor and dependent variable
x <- rep(c(-1,1), 50)
y1 <- 1 + x + err1
## perform Durbin-Watson test
dw.test(y1 ~ x)
err2 <- filter(err1, 0.9, method="recursive")
y2 <- 1 + x + err2
dw.test(y2 ~ x)
edf
To compute the empirical distribution function.
Description
To compute the empirical distribution function.
Usage
edf(x,y=NULL)
Arguments
x
A sample. ’NA’ values will be automatically removed.
y
A grid of points where the edf will be evaluated.
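As a point of comparison, base R’s ecdf computes the same quantity. The sketch below is an assumption about what edf returns conceptually (the step function evaluated on a grid), not a description of its actual return value; the sample and grid are made up.

```r
# Empirical distribution function with base R (illustrative sketch)
x <- c(2.1, NA, 0.5, 3.3, 0.5)
x <- x[!is.na(x)]          # drop NA values, as documented for edf
Fn <- ecdf(x)              # step function Fn(t) = mean(x <= t)
grid <- c(0, 0.5, 2, 4)    # points at which to evaluate the function
Fn(grid)                   # 0.0 0.5 0.5 1.0
```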
Author(s)
B. Wang <[email protected]>
See Also
scb.
Examples
x = rnorm(100)
(out = edf(x))
plot(out)
(out2= scb(out))
lines(out2)
edu75
Education expenditure data (1975)
Description
Education expenditure data for all 50 U.S. states in 1975.
Usage
data(edu75)
Format
A data frame with 50 observations on 6 variables.
States   character   Initial of state names.
Y        numeric     Educational expenditure.
X1       numeric     X1.
X2       numeric     X2.
X3       numeric     X3.
Region   character   region, 1=northwest, 2,3,4.
References
Stat 335 text
influential.plot
Draw plots for the influence measures
Description
Draw plots for the influence measures.
Usage
influential.plot(lmobj, type='hadi', ID=FALSE, col=1)
Arguments
lmobj
An R object obtained by fitting an OLS model to a data set.
type
Plot type. 'hadi': Hadi’s influence measure; 'potential-residual': potential-residual plot; 'dfits': DFITS plot; 'hat': leverage plot; 'cook': Cook’s distance.
ID
Whether to identify points in the plots. Default: FALSE
col
Color of the plot.
Value
Returns the influence measures: leverage values (Leverage), Hadi’s measure (Hadi), the Welsch
and Kuh measure (DFIT) and Cook’s distance (CookD). The standardized residuals are also
returned.
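Most of these measures are also available from base R, which can be useful for cross-checking. The sketch below uses the built-in cars data and a hypothetical data frame infl; Hadi’s measure is omitted because base R has no function for it, and the column names merely mirror those listed above.

```r
# Standard influence measures from base R (illustrative sketch)
fit <- lm(dist ~ speed, data = cars)   # built-in data, used only for illustration
infl <- data.frame(
  Leverage = hatvalues(fit),           # diagonal of the hat matrix
  DFIT     = dffits(fit),              # Welsch and Kuh measure
  CookD    = cooks.distance(fit),      # Cook's distance
  StdRes   = rstandard(fit)            # standardized residuals
)
head(infl[order(-infl$CookD), ], 3)    # the three most influential observations
```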
Author(s)
B. Wang <[email protected]>
See Also
residual.plot.
Examples
data(river)
lm0 = lm(Nitrogen~Agr+Forest+Rsdntial+ComIndl, data=river)
influential.plot(lm0)
influential.plot(lm0, type='hadi')
influential.plot(lm0, type='potential')
influential.plot(lm0, type='leve')
influential.plot(lm0, type='dfit')
influential.plot(lm0, type='cook')
influential.plot(lm0, type='potential', ID=TRUE)
ld50.logit
Predict Doses for Binomial Assay model (using counts)
Description
Calibrate binomial assays, generalizing the calculation of LD50 based on a logistic regression
model.
Usage
ld50.logit(ndead, ntotal, dose, cf = 1:2, p = 0.5)
Arguments
ndead
A vector of number of failures.
ntotal
Total number of trials.
dose
A vector of dosages.
cf
The terms in the coefficient vector giving the intercept and coefficient of (log)dose
p
Probabilities at which to predict the dose needed.
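The calculation behind an LD50 estimate can be sketched with glm. This is a hedged illustration using the data from the example below, not the body of ld50.logit: the fitted logistic model gives logit(p) = b1 + b2 * dose, which is solved for the dose at each target probability. The names fit, b and dose_p are hypothetical.

```r
# Dose at a given response probability from a logistic fit (illustrative sketch)
ldose <- rep(0:5, 2)
numdead <- c(1, 4, 9, 13, 18, 20, 0, 2, 6, 10, 12, 16)
ntotal <- 20

fit <- glm(cbind(numdead, ntotal - numdead) ~ ldose, family = binomial)
b <- coef(fit)                        # b[1]: intercept, b[2]: slope (cf = 1:2)
p <- c(0.25, 0.5, 0.75)
dose_p <- (qlogis(p) - b[1]) / b[2]   # solve logit(p) = b[1] + b[2] * dose
dose_p[2]                             # the LD50 on the scale of ldose
```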
References
Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Springer.
Examples
ldose <- rep(0:5, 2)
numdead <- c(1, 4, 9, 13, 18, 20, 0, 2, 6, 10, 12, 16)
n=20
ld50.logit(numdead,n,ldose,p = 0.5)
ld50.logitfit
Predict Doses for Binomial Assay model (using rates)
Description
Calibrate binomial assays, generalizing the calculation of LD50 based on a logistic regression
model.
Usage
ld50.logitfit(rate, dose, p = 0.5)
Arguments
rate
A vector of percentages of successes among all trials.
dose
A vector of dosages.
p
Probabilities at which to predict the dose needed.
References
Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Springer.
Examples
ldose <- rep(0:5, 2)
rate <- c(1, 4, 9, 13, 18, 20, 0, 2, 6, 10, 12, 16)/20
ld50.logitfit(rate,ldose,p = 0.5)
lm.ci
To compute the confidence interval of the regression parameters.
Description
To compute the confidence interval of the regression parameters.
Usage
lm.ci(lmobj,level=0.95)
Arguments
lmobj
An R object obtained by fitting a linear regression model to a data set.
level
Confidence level. Default: 0.95.
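Assuming lm.ci returns the usual t-based intervals (the manual does not say), the computation can be sketched as estimate plus or minus a t quantile times the standard error. The cars data and the names est, tq and ci are illustrative only.

```r
# t-based confidence intervals for regression coefficients (illustrative sketch)
fit <- lm(dist ~ speed, data = cars)
level <- 0.95
est <- coef(summary(fit))                         # estimates and standard errors
tq <- qt(1 - (1 - level) / 2, df.residual(fit))   # two-sided t quantile
ci <- cbind(lower = est[, "Estimate"] - tq * est[, "Std. Error"],
            upper = est[, "Estimate"] + tq * est[, "Std. Error"])
ci
```

The result agrees with confint(fit, level = 0.95) from the stats package.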
Author(s)
B. Wang <[email protected]>
See Also
model.test.
Examples
data(birth)
lm0 = lm(Head~Weight, data=birth)
lm.ci(lm0)
lm1 = lm(Head~Weight+Gestation, data=birth)
lm.ci(lm1, level=0.99)
mediation.test
The Sobel mediation test
Description
To compute statistics and p-values for the Sobel test. Results for three versions of "Sobel test" are
provided: Sobel test, Aroian test and Goodman test.
Usage
mediation.test(mv,iv,dv)
Arguments
mv
The mediator variable.
iv
The independent variable.
dv
The dependent variable.
Details
Tests whether a mediator carries the influence of an independent variable (IV) to a dependent variable (DV).
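The three statistics differ only in the variance term under the square root. The sketch below follows the standard formulas, with a the coefficient of the IV in the mediator model and b the coefficient of the mediator in the DV model; the simulated data and all variable names are assumptions of the example, and the output format of mediation.test is not reproduced.

```r
# Sobel, Aroian and Goodman z statistics (illustrative sketch)
set.seed(4)
iv <- rnorm(200)
mv <- 0.6 * iv + rnorm(200)
dv <- 0.5 * mv + 0.2 * iv + rnorm(200)

m1 <- coef(summary(lm(mv ~ iv)))        # path a: IV -> mediator
m2 <- coef(summary(lm(dv ~ mv + iv)))   # path b: mediator -> DV, adjusting for IV
a <- m1["iv", "Estimate"]; sa <- m1["iv", "Std. Error"]
b <- m2["mv", "Estimate"]; sb <- m2["mv", "Std. Error"]

v <- b^2 * sa^2 + a^2 * sb^2            # first-order variance of a * b
z <- c(Sobel   = a * b / sqrt(v),
       Aroian  = a * b / sqrt(v + sa^2 * sb^2),
       Goodman = a * b / sqrt(v - sa^2 * sb^2))
p <- 2 * pnorm(abs(z), lower.tail = FALSE)   # two-sided p-values
```

The Goodman variance can become negative when both paths are weak, in which case that statistic is undefined.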
Value
Missing values are not allowed.
Author(s)
B. Wang <[email protected]>
References
MacKinnon, D. P., & Dwyer, J. H. (1993). Estimating mediated effects in prevention studies.
Evaluation Review, 17, 144-158.
MacKinnon, D. P., Warsi, G., & Dwyer, J. H. (1995). A simulation study of mediated effect measures.
Multivariate Behavioral Research, 30, 41-62.
Preacher, K. J., & Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in
simple mediation models. Behavior Research Methods, Instruments, & Computers, 36, 717-731.
Preacher, K. J., & Hayes, A. F. (2008). Asymptotic and resampling strategies for assessing and
comparing indirect effects in multiple mediator models. Behavior Research Methods, 40, 879-891.
Examples
mv = rnorm(100)
iv = rnorm(100)
dv = rnorm(100)
mediation.test(mv,iv,dv)
model.check
Linear Regression Model Check
Description
Performs tests to check the least squares assumptions for a linear regression model.
Usage
model.check(lmobj)
Arguments
lmobj
A fitted model
Details
This function checks the normality, independence, and constant variance assumptions of the error
terms, and the presence of multicollinearity.
Value
A list with class "htest" containing the following components:
statistic
the value of the test statistic.
p.value
the p-value of the test.
parameter
degrees of freedom.
method
a character string indicating what type of test was performed.
data.name
a character string giving the name(s) of the data.
References
To be updated.
Examples
data(river)
lm0 = lm(Nitrogen~Agr+Forest+Rsdntial+ComIndl, data=river)
model.check(lm0)
model.test
To compare two models and determine which one is adequate.
Description
To compare a full model and reduced model to test whether the reduced model is adequate or not.
Usage
model.test(fmobj,rmobj,alpha=0.05)
Arguments
fmobj
An R object obtained by fitting a full linear regression model (FM) to a data set.
rmobj
An R object obtained by fitting a reduced linear regression model (RM) to a data set.
alpha
Significance level. Default: alpha=0.05.
Details
To test the null hypothesis "H0: the RM is adequate" against "H1: the FM is adequate". The values
of the test statistic, p-value and critical value based on an F test are given.
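The F test for nested models can be computed directly from the residual sums of squares of the two fits. The sketch below uses the built-in mtcars data and hypothetical names (F_stat, p_val, F_crit); it illustrates the test described above rather than the output layout of model.test.

```r
# F test comparing a reduced model with a full model (illustrative sketch)
full <- lm(mpg ~ wt + hp + qsec, data = mtcars)   # full model (FM)
red  <- lm(mpg ~ wt, data = mtcars)               # reduced model (RM)
alpha <- 0.05

sse_f <- deviance(full); sse_r <- deviance(red)   # residual sums of squares
df_f <- df.residual(full); df_r <- df.residual(red)
F_stat <- ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)
p_val  <- pf(F_stat, df_r - df_f, df_f, lower.tail = FALSE)
F_crit <- qf(1 - alpha, df_r - df_f, df_f)        # reject H0 if F_stat > F_crit
```

The same statistic and p-value are reported by anova(red, full).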
Value
Missing values are not allowed.
Author(s)
B. Wang <[email protected]>
See Also
lm.ci.
Examples
data(supervisor)
lm0 = lm(Y~X1+X3, data=supervisor)
lm1 = lm(Y~X1+X2+X3+X4+X5+X6, data=supervisor)
model.test(lm1,lm0)
oddsratio
Odds Ratio and Relative Risk
Description
To compute the odds ratio and relative risk based on a 2 X 2 table.
Usage
oddsratio(x,alpha=0.05,n,...)
Arguments
x
A vector of length 2 of the number of events from the case and control studies.
n
A vector of length 2 of the sample sizes.
alpha
The significance level. Default: 0.05.
...
Controls
Details
x can be a matrix or a data.frame, with the first column showing the number of events and the second
column showing the sample sizes.
Exact confidence limits for the odds ratio are computed using an algorithm based on Thomas (1971). See also
Gart (1971). If the sample sizes are too large, the exact confidence interval may fail because of
numerical overflow.
Asymptotic confidence limits are computed according to the SAS/STAT(R) 9.2 User’s Guide, Second
Edition.
Score method: code has been published for generating confidence intervals by inverting a score test.
It is available from http://web.stat.ufl.edu/~aa/cda/R/two_sample/R2/
See also "riskratio" and "oddsratio" in R package epitools.
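For reference, the point estimates and the simplest asymptotic (Wald) limits can be computed by hand. The sketch below takes its counts from the twin convictions example further down; the names se_logOR, se_logRR, ORCI and RRCI are illustrative, and the exact and score intervals mentioned above are not reproduced.

```r
# Odds ratio, relative risk and Wald limits from a 2 x 2 table (illustrative sketch)
x <- c(2, 10)      # events in the two groups
n <- c(17, 13)     # sample sizes
alpha <- 0.05

p <- x / n
RR <- p[1] / p[2]
OR <- (x[1] / (n[1] - x[1])) / (x[2] / (n[2] - x[2]))
z <- qnorm(1 - alpha / 2)
se_logOR <- sqrt(sum(1 / x) + sum(1 / (n - x)))   # Woolf's standard error of log(OR)
se_logRR <- sqrt(sum(1 / x) - sum(1 / n))         # standard error of log(RR)
ORCI <- exp(log(OR) + c(-1, 1) * z * se_logOR)
RRCI <- exp(log(RR) + c(-1, 1) * z * se_logRR)
```

These Wald limits are undefined when a cell count is zero, which is one reason exact or score intervals are preferred for sparse tables.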
Value
OR
an estimate of the odds ratio;
RR
an estimate of the relative risk;
ORCI
a table showing various 100(1-alpha)% confidence limits for OR;
RRCI
a table showing various 100(1-alpha)% confidence limits for RR.
References
Agresti, A. (1990) Categorical Data Analysis. New York: Wiley. Pages 59-66.
Agresti, A. (1992) A Survey of Exact Inference for Contingency Tables. Statistical Science, 7(1), 131-153.
Agresti, A. (2002) Categorical Data Analysis, Second Edition. New York: John Wiley & Sons.
Fisher, R. A. (1935) The logic of inductive inference. Journal of the Royal Statistical Society
Series A, 98, 39-54.
Fisher, R. A. (1962) Confidence limits for a cross-product ratio. Australian Journal of Statistics,
4, 41.
Fisher, R. A. (1970) Statistical Methods for Research Workers. Oliver & Boyd.
Mehta, C. R. and Patel, N. R. (1986) Algorithm 643. FEXACT: A Fortran subroutine for Fisher’s
exact test on unordered r*c contingency tables. ACM Transactions on Mathematical Software,
12, 154-161.
Clarkson, D. B., Fan, Y. and Joe, H. (1993) A Remark on Algorithm 643: FEXACT: An Algorithm
for Performing Fisher’s Exact Test in r x c Contingency Tables. ACM Transactions on Mathematical
Software, 19, 484-488.
Patefield, W. M. (1981) Algorithm AS159. An efficient method of generating r x c tables with given
row and column totals. Applied Statistics, 30, 91-97.
Stokes, M. E., Davis, C. S., and Koch, G. G. (2000) Categorical Data Analysis Using the SAS
System, Second Edition. Cary, NC: SAS Institute Inc.
See Also
fisher.test, chisq.test
Examples
# library(bstats)
x = c(1,0)
n = c(72370,73058)
oddsratio(x,n=n)
Convictions <- matrix(c(2, 10, 15, 3),
                      nrow = 2,
                      dimnames =
                        list(c("Dizygotic", "Monozygotic"),
                             c("Convicted", "Not convicted")))
Convictions
fisher.test(Convictions, conf.level = 0.95)$conf.int
x = matrix(c(2,10,17,13), ncol=2)
oddsratio(x)
Convictions <- matrix(c(8, 492, 0, 500), nrow = 2, byrow=TRUE)
fisher.test(Convictions, conf.level = 0.95)$conf.int
x = c(8,0)
n = c(500,500)
oddsratio(x,n=n)
predictor.plot
Draw plots for predictor impacts on the dependent variable
Description
Draw an added-variable (av) plot or a residual plus component (rc) plot.
Usage
predictor.plot(lmobj, type='av', ID=FALSE, col=1)
Arguments
lmobj
An R object obtained by fitting an OLS model to a data set.
type
Plot type. 'av': added-variable plot; 'rc': residual plus component plot.
ID
Whether to identify points in the plots. Default: FALSE
col
Color of the plot.
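An added-variable plot for one predictor can be built from two auxiliary regressions. The sketch below uses the built-in mtcars data with hypothetical names (ry, rx, av_slope); it illustrates the general construction and is not taken from predictor.plot.

```r
# Added-variable plot for wt in the model mpg ~ wt + hp (illustrative sketch)
fit <- lm(mpg ~ wt + hp, data = mtcars)
ry <- residuals(lm(mpg ~ hp, data = mtcars))   # response adjusted for the other predictor
rx <- residuals(lm(wt ~ hp, data = mtcars))    # wt adjusted for the other predictor
av_slope <- coef(lm(ry ~ rx))[["rx"]]          # equals the coefficient of wt in fit

plot(rx, ry, xlab = "wt | others", ylab = "mpg | others")
abline(lm(ry ~ rx))
```

The slope of the fitted line in the plot equals the coefficient of wt in the full model, which is what makes the plot useful for judging the contribution of a single predictor.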
Value
Missing values are not allowed.
Author(s)
B. Wang <[email protected]>
See Also
residual.plot.
Examples
data(river)
lm0 = lm(Nitrogen~Agr+Forest+Rsdntial+ComIndl, data=river)
predictor.plot(lm0)
predictor.plot(lm0, type='rc')
residual.plot
Draw residual plots for an ordinary regression model.
Description
Draw residual plots for an ordinary regression model.
Usage
residual.plot(lmobj, type='fitted', col=1)
Arguments
lmobj
An R object obtained by fitting an OLS model to a data set.
type
Type of residual plot(s): 'fitted', residuals against fitted values; 'index', residuals
against index; 'predictor', residuals against each of the predictors in the fitted
model; 'qqplot', Q-Q plot of the standardized residuals to check the normality
assumption.
col
Color of the plot.
Value
Missing values are not allowed.
Author(s)
B. Wang <[email protected]>
See Also
influential.plot.
Examples
data(river)
lm0 = lm(Nitrogen~Agr+Forest+Rsdntial+ComIndl, data=river)
residual.plot(lm0)
residual.plot(lm0, type='index')
residual.plot(lm0, type='predictor')
river
New York river data
Description
This data set is taken from the book "Regression Analysis by Example" by Samprit Chatterjee and Ali S.
Hadi.
Usage
data(river)
Format
In a 1976 study exploring the relationship between water quality and land use, Haith (1976) obtained
the measurements on 20 river basins in New York State. A question of interest here is how the
land use around a river basin contributes to the water pollution as measured by the mean nitrogen
concentration (mg/liter).
River      character   River names
Agr        numeric     percentage of land area currently in agricultural use
Forest     numeric     percentage of forest land
Rsdntial   numeric     percentage of land area in residential use
ComIndl    numeric     percentage of land area either in commercial or industrial use
Nitrogen   numeric     mean nitrogen concentration
References
"Regression Analysis by Example" by Samprit Chatterjee and Ali S. Hadi, Wiley. ISBN: 978-0-471-74696-6.
scb
To compute the simultaneous confidence bands.
Description
To compute the simultaneous confidence bands.
Usage
scb(x,alpha=0.05)
Arguments
x
An R object. Currently, only 'edf' objects are supported.
alpha
Significance level. Default 0.05 for a 95 percent confidence level.
Author(s)
B. Wang <[email protected]>
See Also
edf.
Examples
x = rnorm(100)
(out = edf(x))
plot(out)
(out2= scb(out))
lines(out2)
supervisor
Supervisor performance data
Description
This data set is taken from the book "Regression Analysis by Example" by Samprit Chatterjee and Ali S.
Hadi.
Usage
data(supervisor)
Format
A data frame with 28829 observations on 8 variables.
Y        numeric   overall rating of job being done by supervisor
X1--X6   numeric   average score for six different aspects
References
"Regression Analysis by Example" by Samprit Chatterjee and Ali S. Hadi, Wiley. ISBN: 978-0-471-74696-6.
vif
Variance Inflation Factors
Description
Calculates variance-inflation and generalized variance-inflation factors for linear and generalized
linear models.
Usage
vif(object, ...)
## S3 method for class 'lm'
vif(object, ...)
Arguments
object
an object that inherits from class lm, such as an lm or glm object.
...
not used.
Details
If all terms in an unweighted linear model have 1 df, then the usual variance-inflation factors are
calculated.
If any terms in an unweighted linear model have more than 1 df, then generalized variance-inflation
factors (Fox and Monette, 1992) are calculated. These are interpretable as the inflation in size of
the confidence ellipse or ellipsoid for the coefficients of the term in comparison with what would
be obtained for orthogonal data.
The generalized vifs are invariant with respect to the coding of the terms in the model (as long as
the subspace of the columns of the model matrix pertaining to each term is invariant). To adjust for
the dimension of the confidence ellipsoid, the function also prints GVIF^(1/(2×df)), where df is the
degrees of freedom associated with the term.
Through a further generalization, the implementation here is applicable as well to other sorts of
models, in particular weighted linear models and generalized linear models, that inherit from class
lm.
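For the simple case where every term has 1 df, the variance-inflation factor of a predictor is 1/(1 - R²), with R² taken from regressing that predictor on the others. The sketch below checks this against the diagonal of the inverse correlation matrix; the mtcars columns and the names vif_manual and vif_corr are chosen only for illustration, and generalized VIFs are not covered.

```r
# Variance-inflation factors for 1-df terms (illustrative sketch)
X <- mtcars[, c("wt", "hp", "disp")]

# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing x_j on the other predictors
vif_manual <- sapply(names(X), function(v) {
  f <- reformulate(setdiff(names(X), v), response = v)
  1 / (1 - summary(lm(f, data = X))$r.squared)
})

# equivalent shortcut: diagonal of the inverse correlation matrix
vif_corr <- diag(solve(cor(X)))
```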
Value
A vector of vifs, or a matrix containing one row for each term in the model, and columns for the
GVIF, df, and GVIF^(1/(2×df)).
Author(s)
Henric Nilsson and John Fox <[email protected]>
References
Fox, J. and Monette, G. (1992) Generalized collinearity diagnostics. JASA, 87, 178–183.
Fox, J. (2008) Applied Regression Analysis and Generalized Linear Models, Second Edition. Sage.
Fox, J. and Weisberg, S. (2011) An R Companion to Applied Regression, Second Edition, Sage.
Examples
data(edu75)
lm0 = lm(Y~X1+X2+X3, data=edu75)
vif(lm0)
white.test
White test of constant variance
Description
Perform a test to check the common variance assumption for a linear regression model.
Usage
white.test(lmobj)
Arguments
lmobj
A fitted model
Details
This function checks the constant variance assumption of the error terms.
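A common textbook form of White’s test regresses the squared residuals on the predictors, their squares and their cross-products, and uses n times the R-squared of that regression as the statistic. The sketch below follows that form with the built-in mtcars data; whether white.test includes the cross-product terms is not stated in this manual, so the choice of auxiliary terms and the names white_stat, white_df and white_p are assumptions.

```r
# White's test via an auxiliary regression (illustrative sketch)
fit <- lm(mpg ~ wt + hp, data = mtcars)
e2 <- residuals(fit)^2                 # squared OLS residuals

# auxiliary regression: levels, squares and cross-product of the predictors
aux <- lm(e2 ~ wt + hp + I(wt^2) + I(hp^2) + I(wt * hp), data = mtcars)
n <- nobs(fit)
white_stat <- n * summary(aux)$r.squared
white_df <- length(coef(aux)) - 1      # auxiliary regressors without the constant
white_p <- pchisq(white_stat, white_df, lower.tail = FALSE)
```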
Value
A list with class "htest" containing the following components:
statistic
the value of the test statistic.
p.value
the p-value of the test.
parameter
degrees of freedom.
method
a character string indicating what type of test was performed.
data.name
a character string giving the name(s) of the data.
References
White test, From Wikipedia, the free encyclopedia.
Examples
data(river)
lm0 = lm(Nitrogen~Agr+Forest+Rsdntial+ComIndl, data=river)
white.test(lm0)
wls
Weighted least squares estimate by groups
Description
Weighted least squares estimate by groups.
Usage
wls(lmobj,group)
Arguments
lmobj
An R object obtained by fitting an OLS model to a data set.
group
Grouping variable used to cluster the data. Can be a factor or a numeric vector.
Value
The regression model refitted by WLS.
Author(s)
B. Wang <[email protected]>
See Also
residual.plot.
Examples
data(edu75)
lm0 = lm(Y~X1+X2+X3, data=edu75)
wls(lm0,group=edu75$Region)
Index
∗Topic datasets
    birth
    edu75
    river
    supervisor
∗Topic htest
    bptest
    dw.test
    model.check
    oddsratio
    white.test
∗Topic models
    ld50.logit
    ld50.logitfit
∗Topic regression
    ac
    ld50.logit
    ld50.logitfit
    vif
∗Topic stats
    bstats
    edf
    influential.plot
    lm.ci
    model.test
    predictor.plot
    residual.plot
    scb
    wls
∗Topic test
    mediation.test
ac
birth
bptest
bstats
chisq.test
dw.test
edf
edu75
fisher.test
influential.plot
ld50.logit
ld50.logitfit
lines.glm.dose (ld50.logit)
lines.scb (scb)
lm
lm.ci
mediation.test
model.check
model.test
oddsratio
plot.edf (edf)
plot.glm.dose (ld50.logit)
plot.scb (scb)
predictor.plot
print.edf (edf)
print.glm.dose (ld50.logit)
print.odds (oddsratio)
print.scb (scb)
residual.plot
river
scb
supervisor
vif
white.test
wls