Training activities day 1
E : Exercise on sales numbers
We will do a few further calculations on the sales numbers from the textbook by Navarro. Results from
questions marked with an A need to be entered in Absalon. Find the tests in the folder named Training 1 in
the Regression 2016 folder. In this item, the tests are numbered 1-E-3A etc. (1 = training 1, E = item E,
3A = question 3A).
1. Type in the sales.by.month vector from Section 3.7.1 of Navarro.
2. Assume that 1 Australian dollar corresponds to 4.81 Danish Kroner (DKR). Use R to calculate how much
money Navarro earned in DKR each month.
3A. What is the total amount of DKR earned? Use the sum()-function.
4A. What is the average amount of DKR earned per month? You may use the mean()-function.
5A. We can calculate the mean in various ways. In the third question of the Day 1 training tests on Absalon,
you find several ways to calculate the mean number of sales. Which one is wrong?
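As a minimal sketch of the vector arithmetic these questions rely on, the lines below use a made-up vector. The name sales.aud and its four values are hypothetical and are not Navarro's sales figures; only the exchange rate of 4.81 is taken from the exercise.

```r
# Hypothetical monthly amounts in Australian dollars (illustrative values only)
sales.aud <- c(100, 200, 0, 50)

# Multiplying a vector by a single number scales every element
sales.dkr <- sales.aud * 4.81

sales.dkr          # 481.0 962.0 0.0 240.5
sum(sales.dkr)     # total over all months
mean(sales.dkr)    # average, the same as sum(sales.dkr) / length(sales.dkr)
```

The same pattern should carry over to the sales.by.month vector once you have typed it in.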
F : Exercise on seq()
You might not find the seq()-function very useful at the moment, but I promise you will need it if you continue
working with R (and I will probably define an exercise later on where you will need it). This exercise should
make you more familiar with vector calculations in R, which is important when we start working on real data.
The function seq() generates vectors containing sequences of numbers, the syntax being:
- seq(from,to) generating a sequence from from to to with steps of size 1. Example:
> seq(1,5)
[1] 1 2 3 4 5
- seq(from,to,by=) generating a sequence from from to to with steps of the size specified by the
argument by. Example (type yourself): seq(11,34,by=3)
- seq(from,to,length=) generating a sequence from from to to with the number of elements given by the
argument length. Example (type yourself): seq(11,33,length=3)
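The three forms can be compared side by side in a short sketch. The numbers and the names s1 to s4 are chosen only for illustration. Note that the full name of the length argument is length.out; R accepts length= as an abbreviation of it.

```r
s1 <- seq(2, 6)                    # 2 3 4 5 6, the same as 2:6
s2 <- seq(2, 20, by = 4)           # 2 6 10 14 18
s3 <- seq(0, 1, length.out = 5)    # 0.00 0.25 0.50 0.75 1.00
s4 <- seq(10, 1, by = -3)          # 10 7 4 1, a negative step counts downwards
```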
We will now do a few calculations based on the seq()-function. Results from questions marked with an A
have to be entered in Absalon.
If you need some more words on the seq()-function, watch the first 3 minutes
of https://www.youtube.com/watch?v=ljdm4oXD_o4 . Note that you can choose a faster speed by clicking the
gear icon.
1. How can you generate the numbers in the image above using the seq()-function?
2. Create a vector named x containing (1,2,3,...,19,20)
3. Create a vector named y containing (20,19,18,...,2,1)
4A. Create a vector named z with elements (1,2,3,...,19,20,19,...,2,1). Print / view z in R. When
defining z you might need the combine function c(). Among the 5 possible choices on Absalon, which one does
not give you the right sequence?
5A. Generate a vector containing the elements (2, (2^2)/2, (2^3)/3,..., (2^7)/7 ). First try typing 2^c(1,2). What happens? What is the mean of these 7 numbers?
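The questions above build on two ideas: arithmetic on vectors works element by element, and c() glues vectors together. A small sketch with other numbers than those in the questions follows. The names p, q, up and both are hypothetical, and rev() is a base R function, not mentioned in this item, that reverses a vector.

```r
# Arithmetic on vectors is applied element by element
p <- 3^c(1, 2)              # 3 9
q <- c(10, 20, 30) / 1:3    # 10 10 10

# c() combines vectors; rev() reverses a vector
up   <- seq(1, 4)
both <- c(up, rev(up))      # 1 2 3 4 4 3 2 1
```

Note that both repeats the largest value 4, so this exact pattern is not yet the sequence asked for in 4A.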
G : Exercise on rep()
You might not find the rep()-function very useful at the moment either, but you will definitely need it if you
continue working with R.
The rep()-function creates a vector by repeating specific elements a specified number of times. rep() takes
two arguments: The element to repeat, x, and the number of times to repeat the element, times, the syntax
being rep(x,times). Examples:
> rep(x=1,times=5)
[1] 1 1 1 1 1
> rep(1,5) # we don't need to specify names of arguments when we put them in the right order
[1] 1 1 1 1 1
> rep(c(1,2),5)
[1] 1 2 1 2 1 2 1 2 1 2
> rep(c(1,2),c(2,5))
[1] 1 1 2 2 2 2 2
If you want more words on the rep() function watch the
video https://www.youtube.com/watch?v=ljdm4oXD_o4 (also given in item F) from 5:30 to 7:15 minutes
(skip the part on paste).
1. Define a vector containing elements (4,6,3,4,6,3,...,4,6,3) where there are 5 occurrences of each number.
2. Define a vector containing elements (4,4,...,4,6,...,6,3,...,3) with 5 occurrences of 4, 7 occurrences of 6 and
10 occurrences of 3.
3A. You can specify an alternative argument length to the rep()-function, rep(x,length=), which specifies
the length of the vector. Define a vector containing the elements (4,6,3,4,6,3,...,4,6,3,4) with 5 occurrences of
4 and 4 occurrences each of 6 and 3. Choose the appropriate code on Absalon.
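To see how the length argument differs from times, here is a small sketch with other numbers than those in the questions. The names r1 to r3 are hypothetical. The full name of the argument is length.out (length= is accepted as an abbreviation), and each is a further rep() argument that is not used in this item.

```r
r1 <- rep(c(1, 2), times = 3)         # 1 2 1 2 1 2, the whole vector repeated 3 times
r2 <- rep(c(1, 2), each = 3)          # 1 1 1 2 2 2, every element repeated 3 times
r3 <- rep(c(1, 2), length.out = 5)    # 1 2 1 2 1, repeated until the result has 5 elements
```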
H : Exercise on indexing
This exercise will help you become familiar with the indexing of vectors. This is important when you start
working with data. You might need to read Section 3.10 of Navarro one more time.
First, copy the following commands into R to generate a sequence of random integer numbers:
set.seed(10)
x <- sample( 0:1000, 50, replace=T )
y <- sample( 0:1000, 50, replace=T )
The sample()-function is defined with arguments sample(x, size, replace = FALSE) where x is the
elements to sample from, size is the number of elements to sample, and replace specifies whether the
sampling should be with replacement (i.e. whether it is allowed to sample the same number several times).
The sample() call used above thus samples 50 random numbers with replacement from the vector (0,1,...,1000). The replacement is due to replace=T; if it is omitted, there will be no replacement. Note that 0:1000 generates the same
sequence as seq(0,1000).
The set.seed() command ensures that you will generate the exact same random numbers next time you
enter these three lines. The number 10 is an arbitrary number I've chosen; you should use the same number.
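The effect of set.seed() and of the replace argument can be checked with a small sketch. The seed 123 and the names a, b and p are arbitrary illustrative choices and are not part of the exercise.

```r
set.seed(123)                             # 123 is an arbitrary illustrative seed
a <- sample(0:1000, 5, replace = TRUE)

set.seed(123)                             # resetting the seed restarts the random number stream
b <- sample(0:1000, 5, replace = TRUE)

identical(a, b)                           # TRUE: same seed, same draws

# Without replacement no value can be drawn twice
p <- sample(1:10, 10)                     # replace = FALSE is the default
anyDuplicated(p)                          # 0, meaning there are no duplicates
```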
To see the elements of x and y simply type
x
y
IMPORTANT : To be sure that you obtain the correct numbers, you have to run the three lines immediately after
each other. As a check, the first five elements of x should be
> x[1:5]
[1] 507 307 427 693 85
and the means should equal
> mean(x)
[1] 458.6
> mean(y)
[1] 431.94
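If your numbers differ from these check values even though you ran the three lines together, one possible cause is your R version. From R 3.6.0 onwards sample() uses a different default algorithm, so the same seed gives other draws than in earlier versions. The sketch below assumes R 3.6.0 or later and requests the older sampler through the sample.kind argument of set.seed(). R warns that the 'Rounding' sampler is non-uniform; the warning is suppressed here.

```r
# Assumes R >= 3.6.0; earlier versions do not have the sample.kind argument
suppressWarnings(set.seed(10, sample.kind = "Rounding"))
x <- sample(0:1000, 50, replace = TRUE)
y <- sample(0:1000, 50, replace = TRUE)

x[1:5]     # compare with the check values above
mean(x)
mean(y)

# Optionally switch back to the current default sampler afterwards
RNGkind(sample.kind = "Rejection")
```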
1. Make a plot of y vs x by typing plot( y ~ x ) or plot( x , y ). Are there any systematic tendencies? Is this to
be expected?
2A. Pick out the values in y which are > 600 and save them in a vector z. How many such elements are
there in y? Sort the numbers in z (simply use the sort()-function). If you need some more words on indexing
(than given in Navarro) you can watch the two videos below (4:10 and 4:40 mins, respectively). The first video is very basic;
the second is more useful for solving this exercise:
https://www.youtube.com/watch?v=RNOKBoYalRc
http://youtu.be/SZII4zuqHN0
3. What are the index positions in y of the values which are > 600?
Hint: If you define a vector named numbers containing the numbers (1,2,...,50), you need to find the
elements of numbers for which y > 600.
4. What are the values in x which correspond to the values in y which are > 600?
5A. How many values in y are within 200 of the maximum value of the terms in y?
6A. Determine the mean of the values of y at index positions 1,8,15,...,43,50.
NB NB NB : Having completed test 6A on Absalon, you are given a link to a video where I
give the solution to all the questions in this exercise.
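The indexing techniques used in this exercise can be tried on a small vector first. The vectors v and w below and the threshold 15 are hypothetical and are chosen only so that the results are easy to check by eye. which() is a base R function that returns the positions where a condition is TRUE.

```r
v <- c(12, 7, 30, 18, 25)           # hypothetical values
w <- c("a", "b", "c", "d", "e")     # a second vector of the same length

big <- v > 15                       # FALSE FALSE TRUE TRUE TRUE
v[big]                              # 30 18 25, the values fulfilling the condition
sum(big)                            # 3, since TRUE counts as 1
(1:5)[big]                          # 3 4 5, the index positions (as in the hint)
which(big)                          # 3 4 5, the same positions
w[big]                              # "c" "d" "e", matching elements of another vector
v[seq(1, 5, by = 2)]                # 12 30 25, every second element
sort(v[big])                        # 18 25 30
```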
I : How to load a data set
In the video given at http://youtu.be/armBhT854RY I show you how to read a data set into R. In general I will
work with data files in dBase format, as this format is easy to transfer between various statistical packages. I
will further demonstrate how to define a new variable and how to calculate the mean of a variable.
I also show how to export data to a dBase file.
If the video quality is poor, click the gear icon and choose the highest resolution.
The data file used in the video (sundby0.dbf) can be found at the link
http://publicifsv.sund.ku.dk/~sr/regression/datasets/
Right-click on the file to save it on your computer.
The commands I use in the video are given below. You can copy them into your own R-script and try to
run them.
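# First set your own working directory to tell R where your dbf-file is located
library( foreign )                  # provides read.dbf() and write.dbf()
d <- read.dbf( 'sundby0.dbf' )      # read the dBase file into a data frame named d
head( d )                           # show the first rows
d$bmi <- d$wgt / ( d$ht/100 )^2     # new variable: weight divided by squared height (ht/100 presumably converts cm to m)
hist( d$bmi )                       # histogram of the new variable
range( d$bmi, na.rm=T )             # smallest and largest value, ignoring missing values
write.dbf( d, 'sundby1.dbf' )       # export the data, including bmi, to a new dBase file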
If you obtain the following error message when trying to load the data into R
> setwd("~/name-and-location-of-your-folder-goes-here")
> library (foreign)
> d <- read.dbf('sunby0.dbf')
Error in read.dbf("sunby0.dbf") : unable to open DBF file
your working directory does not contain the file. Either you've specified the wrong working directory or the file
has been saved with a wrong name. To ask R which files your working directory contains you can type dir()
in the console. If the data file is not listed, you've saved it in another folder.
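A short sketch of how you might check this from the console is given below. The names wd, dbf.files and found are hypothetical; getwd(), dir() and file.exists() are base R functions.

```r
wd <- getwd()                                          # the folder R is currently looking in
dbf.files <- dir(pattern = "\\.dbf$", ignore.case = TRUE)  # only the files ending in .dbf
found <- file.exists("sundby0.dbf")                    # TRUE if the file is in the working directory

wd
dbf.files
found
```

If found is FALSE, compare the file names in dbf.files with the name you typed in read.dbf().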
J : Exercise on Sundby data
You will be working with the Sundby data set I introduced in the video in item I. If you have not yet loaded
the file into R, load the data set as described in item I.
1A : Calculate the mean height for males (gender==1) and females (gender==2), respectively.
Tip : The height variable for males only is d$ht[ d$gender==1 ], i.e. in the data set named d, the height
variable for those observations where the gender variable equals 1. Similarly for females. Remember
the na.rm=T option shown in the video in item I when calculating the mean.
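The tip can be tried out on a small made-up data set before you use the real one. The data frame toy and its values are hypothetical and have nothing to do with the Sundby data; only the variable names gender and ht are borrowed from it.

```r
# Hypothetical data: five persons, one of them with a missing height
toy <- data.frame(gender = c(1, 2, 1, 2, 1),
                  ht     = c(180, 165, NA, 170, 176))

m.raw    <- mean(toy$ht[toy$gender == 1])                 # NA, because one height is missing
m.male   <- mean(toy$ht[toy$gender == 1], na.rm = TRUE)   # 178
m.female <- mean(toy$ht[toy$gender == 2], na.rm = TRUE)   # 167.5
```

Without na.rm the mean of a vector containing a missing value is itself missing, which is why the option matters here.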