Exploring R and the Group Assignment
Analyzing the dataset ChickWeight: Students will form groups of 4 or 5, and each group will be
handed a list of 10-15 R functions drawn from the list below, possibly with a few similar
functions added. Each group will have a different set of functions. Groups will explore their
functions and share their results at the end.
Data Organization Functions: Try these commands.
str(ChickWeight)
[Gives the structure of the dataset ChickWeight]
head(ChickWeight)
[Shows the first six observations with all the variables]
summary(ChickWeight)
[Summarizes each variable: min, max, mean, and quartiles for numeric columns, counts for factors]
ChickWeight$weight
[shows the column weight]
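table(ChickWeight$weight) [Counts how many times each distinct weight value occurs]
nrow(ChickWeight)
[gives the number of rows in the dataset]
ChickWeight[["Time"]]=NULL
[Removes the Time variable from your working copy of the data. The regression section below uses
Time, so run data(ChickWeight) to restore the original before moving on]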
Diet1=subset(ChickWeight,Diet=="1") [Subsets the dataset to all the chicks on diet level 1]
Diet1
[Shows the subset just formed]
Overweight=subset(Diet1,weight>=90) [Subsets the chicks on diet level 1 that weigh 90 or more;
weight is numeric, so 90 is written without quotes]
Overweight
[Displays the subset stored in the variable Overweight]
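When a numeric column is compared with a quoted value such as "90", R converts the numbers to text and compares them alphabetically, which gives a different subset. A minimal sketch of the difference, using only the built-in ChickWeight data (the names numeric_cmp and string_cmp are illustrative, not part of the assignment):

```r
# Diet is a factor, so comparing it with the string "1" is fine.
Diet1 <- subset(datasets::ChickWeight, Diet == "1")

# weight is numeric: compare it with the number 90.
numeric_cmp <- subset(Diet1, weight >= 90)

# With "90" in quotes the weights are compared as text,
# so "100" sorts before "90" and is dropped.
string_cmp <- subset(Diet1, weight >= "90")

nrow(numeric_cmp)        # chicks on diet 1 weighing 90 or more
nrow(string_cmp)         # fewer rows
range(string_cmp$weight) # only weights from 90 to 99 pass the text comparison
```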
Data Summarization Functions:
which.min(ChickWeight$weight) Ans: 196 [The minimum weight is at the 196th entry]
ChickWeight$weight[196] Ans: 35 [The 196th entry is 35, which is the minimum weight]
mean(ChickWeight$weight) Ans: 121.8183 [Gives the mean of the weight variable]
sd(ChickWeight$weight) Ans: 71.07196 [Standard deviation of the weight variable]
tapply(ChickWeight$weight,ChickWeight$Chick,mean) [Gives the mean weight for each chick]
tapply(ChickWeight$weight,ChickWeight$Chick,min) [Gives the minimum weight for each chick]
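tapply splits its first argument by the grouping factor and applies the function to each group. A short sketch that groups by Diet instead of Chick (an illustrative variation, not one of the assigned commands) and checks one group by hand:

```r
cw <- datasets::ChickWeight

# Mean weight for each of the four diet levels.
diet_means <- tapply(cw$weight, cw$Diet, mean)
diet_means

# The result is named by group, so a level can be looked up as a string.
diet_means["1"]

# Cross-check: the same value, computed by subsetting first.
mean(cw$weight[cw$Diet == "1"])
```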
Drawing Graphs:
plot(ChickWeight$weight,ChickWeight$Diet) [Draws a scatterplot with weight on the x-axis and the Diet level (1 to 4) on the y-axis]
hist(ChickWeight$weight)
[Plots Histogram of the numerical data column weight]
boxplot(ChickWeight$weight ~ ChickWeight$Diet, xlab = "Diet", ylab = "weight", main = "Weight of
Chicks by Diet") [Draws one boxplot of weight per diet level, with the given title and x & y labels]
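The boxplot call can also be written with a data argument, and the value it returns holds the numbers behind the picture. A small sketch (the name bp and the title text are illustrative):

```r
# Same kind of boxplot, using data = instead of repeating ChickWeight$.
bp <- boxplot(weight ~ Diet, data = datasets::ChickWeight,
              xlab = "Diet", ylab = "weight",
              main = "Weight of Chicks by Diet")

# boxplot() invisibly returns the statistics it drew.
bp$names  # the diet levels shown on the x-axis
bp$n      # number of observations in each box
bp$stats  # rows: lower whisker, lower hinge, median, upper hinge, upper whisker
```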
A Linear Regression Model in R
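LinReg=lm(weight~Time+Chick+Diet, data = ChickWeight) [Fits a linear regression model of weight on Time, Chick, and Diet. Each chick stays on one diet, so Diet is fully determined by Chick and its coefficients are reported as NA]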
summary(LinReg)
[Displays a summary of the model]
LinReg$residuals
[Computes and displays the residuals]
SSE= sum(LinReg$residuals^2) [Computes the sum of squared errors]
SSE
[Displays the SSE]
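As a cross-check on the SSE computation, the same quantity can be read off a fitted lm object with deviance(). A minimal sketch using a smaller model (weight on Time and Diet only, chosen here purely for illustration):

```r
# Use the original data in case Time was removed from a working copy.
cw <- datasets::ChickWeight

# Smaller illustrative model: weight explained by Time and Diet.
fit <- lm(weight ~ Time + Diet, data = cw)

# SSE computed by hand from the residuals ...
sse_manual <- sum(residuals(fit)^2)

# ... matches the residual sum of squares that deviance() reports for lm fits.
sse_builtin <- deviance(fit)

all.equal(sse_manual, sse_builtin)
```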
For the assigned R Programming lab:
A Logistic Regression Model: [A logistic regression is used when the outcome variable is categorical
with two levels, such as yes/no]
Students will make a logistic regression model using the data set "framingham".
In this case we use the glm function as follows:
[Note: glm stands for generalized linear model]
LogLin= glm(outcome variable ~ ., family = binomial, data = framingham) [Note: Replace "outcome
variable" with the name of the outcome column. The . after the ~ symbol uses all the other
variables in framingham as independent variables]
summary(LogLin) [Summary of the model]
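predLogLin= predict(LogLin, newdata = framingham, type = "response") [This is a prediction function;
type = "response" returns the predicted probability of the outcome for each row of framingham]
SSE = sum((predLogLin - framingham$outcome variable)^2, na.rm = TRUE) [Computes SSE; the square goes
inside sum(), and na.rm = TRUE skips rows with missing values, if any]
SSE
[Displays SSE]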
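The framingham data is not bundled with R, so the steps above can be tried first on a built-in stand-in. A minimal sketch under that assumption: mtcars, its binary column am, and the single predictor wt are substitutes chosen for illustration, not the lab's actual variables.

```r
# Stand-in data: mtcars ships with R; am (0 = automatic, 1 = manual) is binary.
log_fit <- glm(am ~ wt, family = binomial, data = mtcars)
summary(log_fit)

# type = "response" gives predicted probabilities between 0 and 1;
# the default (type = "link") would give log-odds instead.
pred_prob <- predict(log_fit, newdata = mtcars, type = "response")

# Sum of squared differences between predicted probability and the 0/1 outcome.
sse <- sum((pred_prob - mtcars$am)^2)
sse
```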