
Training activities day 1

E : Exercise on sales numbers

We will do a few further calculations on the sales numbers from the textbook by Navarro. Results from questions marked with an A need to be entered into Absalon. Find the tests in the folder named Training 1 in the Regression 2016 folder. In this item, the tests are numbered 1-E-3A etc. (1 = training 1, E = item E, 3A = question 3A).

1. Type in the sales.by.month vector from Section 3.7.1 of Navarro.
2. Assume that 1 Australian dollar corresponds to 4.81 Danish Kroner (DKR). Use R to calculate how much money Navarro earned in DKR each month.
3A. What is the total amount of DKR earned? Use the sum()-function.
4A. What is the average amount of DKR earned per month? You may use the mean()-function.
5A. We can calculate the mean in various ways. In the third question of the Day 1 training tests on Absalon, you find several ways to calculate the mean number of sales. Which one is wrong?

F : Exercise on seq()

You might not find the seq()-function very useful at the moment, but I promise you will need it if you continue working with R (and I will probably define an exercise later on where you will need it). This exercise should make you more familiar with vector calculations in R, which is important when we start working on real data.

The function seq() generates sequences of numbers as vectors, the syntax being:

- seq(from,to) generates a sequence from from to to with steps of size 1. Example:
    > seq(1,5)
    [1] 1 2 3 4 5
- seq(from,to,by=) generates a sequence from from to to with steps of the size specified by the argument by. Example (type it yourself):
    seq(11,34,by=3)
- seq(from,to,length=) generates a sequence from from to to whose length is given by the argument length. Example (type it yourself):
    seq(11,33,length=3)

We will now do a few calculations based on the seq()-function. Results from questions marked with an A have to be entered into Absalon.
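The three forms of seq() described above can be tried side by side. The following is only a small sketch with made-up numbers (the variable names a, b, d, e and f are chosen for illustration and are not part of the exercise); it also shows that arithmetic on a vector is applied element by element and that c() glues vectors together.

```r
# Steps of size 1 from 2 to 10
a <- seq(2, 10)
a                     # 2 3 4 5 6 7 8 9 10

# Steps of size 5; 'by' sets the step size
b <- seq(0, 20, by = 5)
b                     # 0 5 10 15 20

# Five equally spaced values; 'length' in the text above is
# matched by R to the full argument name 'length.out'
d <- seq(0, 1, length.out = 5)
d                     # 0.00 0.25 0.50 0.75 1.00

# Arithmetic works on every element of the vector at once
e <- seq(1, 4) * 10
e                     # 10 20 30 40

# c() combines several vectors (or single numbers) into one
f <- c(seq(1, 3), 10)
f                     # 1 2 3 10
```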
If you need some more words on the seq()-function, watch the first 3 minutes of https://www.youtube.com/watch?v=ljdm4oXD_o4 . Note that you can choose a faster playback speed by clicking the gear icon.

1. How can you generate the numbers in the image above using the seq()-function?
2. Create a vector named x containing (1,2,3,...,19,20).
3. Create a vector named y containing (20,19,18,...,2,1).
4A. Create a vector named z with elements (1,2,3,...,19,20,19,...,2,1). Print / view z in R. When defining z you might need the combine function c(). On Absalon, among 5 possible choices, which one does not give you the right sequence?
5A. Generate a vector containing the elements (2, (2^2)/2, (2^3)/3,..., (2^7)/7). First try typing 2^c(1,2). What happens? What is the mean of these 7 numbers?

G : Exercise on rep()

You might not find the rep()-function very useful at the moment either, but you will definitely need it if you continue working with R. The rep()-function creates a vector by repeating specific elements a specified number of times. rep() takes two arguments: the element to repeat, x, and the number of times to repeat the element, times, the syntax being rep(x,times). Examples:

    > rep(x=1,times=5)
    [1] 1 1 1 1 1
    > rep(1,5) # we don't need to specify names of arguments when we put them in the right order
    [1] 1 1 1 1 1
    > rep(c(1,2),5)
    [1] 1 2 1 2 1 2 1 2 1 2
    > rep(c(1,2),c(2,5))
    [1] 1 1 2 2 2 2 2

If you want more words on the rep()-function, watch the video https://www.youtube.com/watch?v=ljdm4oXD_o4 (also given in item F) from 5:30 to 7:15 minutes (skip the part on paste).

1. Define a vector containing elements (4,6,3,4,6,3,...,4,6,3) where there are 5 occurrences of each number.
2. Define a vector containing elements (4,4,...,4,6,...,6,3,...,3) with 5 occurrences of 4, 7 occurrences of 6 and 10 occurrences of 3.
3A. You can specify an alternative argument length to the rep()-function, rep(x,length=), which specifies the length of the resulting vector.
Define a vector containing the elements (4,6,3,4,6,3,...,4,6,3,4) with 5 occurrences of 4 and 4 occurrences each of 6 and 3. Choose the appropriate code on Absalon.

H : Exercise on indexing

This exercise helps you become familiar with the indexing of vectors. This is important when you start working with data. You might need to read Section 3.10 of Navarro one more time.

First, copy the following commands into R to generate two sequences of random integers:

    set.seed(10)
    x <- sample( 0:1000, 50, replace=T )
    y <- sample( 0:1000, 50, replace=T )

The sample()-function is defined with arguments sample(x, size, replace = FALSE), where x contains the elements to sample from, size is the number of elements to sample, and replace specifies whether the sampling should be with replacement (i.e. whether it is allowed to sample the same number several times). The sample()-function used above thus samples 50 random numbers from the vector (0,1,...,1000) with replacement (because of replace=T; if this argument is omitted, sampling is done without replacement). Note that 0:1000 generates the same sequence as seq(0,1000). The set.seed() command ensures that you will generate the exact same random numbers next time you enter these three lines. The number 10 is an arbitrary number I've chosen; you should use the same number.

To see the elements of x and y simply type

    x
    y

IMPORTANT : To be sure that you obtain the correct numbers, you have to run the three commands immediately after one another. As a check, the first five elements of x should be

    > x[1:5]
    [1] 507 307 427 693 85

and the means should equal

    > mean(x)
    [1] 458.6
    > mean(y)
    [1] 431.94

1. Make a plot of y vs x by typing plot( y ~ x ) or plot( x , y ). Are there any systematic tendencies? Is this to be expected?
2A. Pick out the values in y which are > 600 and save them in a vector z. How many such elements are there in y? Sort the numbers in z (simply use the sort()-function).
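The kind of indexing used in this exercise can be sketched on a small made-up vector. The names v, labels and big and all their values are hypothetical and have nothing to do with the x and y generated above; the point is only to show how a comparison produces a logical vector that can then be used inside the square brackets.

```r
# Hypothetical toy data, not the exercise data
v      <- c(12, 700, 45, 650, 3)
labels <- c("a", "b", "c", "d", "e")

# A comparison is evaluated element by element and gives TRUE/FALSE
big <- v > 600
big                   # FALSE TRUE FALSE TRUE FALSE

# A logical vector inside [ ] keeps the elements where it is TRUE
v[big]                # 700 650

# TRUE counts as 1, so sum() counts how many elements qualify
sum(big)              # 2

# which() returns the index positions of the TRUE values
which(big)            # 2 4

# The same logical vector can pick elements from another vector
labels[big]           # "b" "d"

# Index positions can also be given directly as numbers
v[c(1, 3, 5)]         # 12 45 3
```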
If you need some more words on indexing (than given in Navarro), you can watch the two videos below (4:10 and 4:40 minutes, respectively). The first video is very basic; the second is more useful for solving this exercise.

    https://www.youtube.com/watch?v=RNOKBoYalRc
    http://youtu.be/SZII4zuqHN0

3. What are the index positions in y of the values which are > 600? Hint: If you define a vector named numbers containing the numbers (1,2,...,50), you need to find the elements of numbers for which y > 600.
4. What are the values in x which correspond to the values in y which are > 600?
5A. How many values in y are within 200 of the maximum value of the terms in y?
6A. Determine the mean of the values of y at index positions 1,8,15,...,43,50.

NB NB NB : Having completed test 6A on Absalon, you are given a link to a video where I give the solution to all the questions in this exercise.

I : How to load a data set

In the video given at http://youtu.be/armBhT854RY I show you how to read a data set into R. In general I will work with data files in dBase format, as this format is easy to transfer between various statistical packages. I will further demonstrate how to define a new variable and how to calculate the mean of a variable. I also show how to export data to a dBase file. If the video is shown in poor quality, click the gear icon and choose the highest resolution.

The data file used in the video (sundby0.dbf) can be found at http://publicifsv.sund.ku.dk/~sr/regression/datasets/ . Right-click on the file to save it on your computer.

The commands I generate in the video are given below. You can copy them into your own R-script and try to evaluate them.
    # First set your own working directory to tell R where your dbf-file is located
    library( foreign )
    d <- read.dbf( 'sundby0.dbf' )
    head( d )
    d$bmi <- d$wgt / ( d$ht/100 )^2
    hist( d$bmi )
    range( d$bmi, na.rm=T )
    write.dbf( d, 'sundby1.dbf' )

If you obtain the following error message when trying to load the data into R

    > setwd("~/name-and-location-of-your-folder-goes-here")
    > library (foreign)
    > d <- read.dbf('sunby0.dbf')
    > Error in read.dbf("sunby0.dbf") : unable to open DBF file

your working directory does not contain the file. Either you've specified the wrong working directory or the file has been saved under a different name from the one you typed. To ask R which files your working directory contains, you can type dir() in the console. If the data file is not listed, you've saved it in another folder.

J : Exercise on Sundby data

You will be working with the Sundby data set I introduced in the video in item I. If you have not yet loaded the file into R, load the data set as described in item I.

1A : Calculate the mean height for males (gender==1) and females (gender==2), respectively. Tip : The height variable for males only is d$ht[ d$gender==1 ], i.e. in the data set named d, the height variable for those observations where the gender variable equals 1. Similarly for females. Remember the na.rm=T option shown in the video in item I when calculating the mean.
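The tip above can be tried out on a tiny made-up data set before you use the real one. This is only a sketch: the data frame toy and its values are hypothetical and are not taken from the Sundby data; only the column names gender and ht mirror those used in the tip.

```r
# Hypothetical toy data set with one missing height (NA)
toy <- data.frame(gender = c(1, 2, 1, 2, 1),
                  ht     = c(180, 165, NA, 170, 176))

# Heights for the rows where gender equals 1
ht.males <- toy$ht[ toy$gender == 1 ]
ht.males                                  # 180 NA 176

# Without na.rm the missing value makes the mean missing too
mean.with.na <- mean(ht.males)
mean.with.na                              # NA

# na.rm=TRUE removes missing values before the mean is calculated
mean.males <- mean(ht.males, na.rm = TRUE)
mean.males                                # 178

# Same approach for the rows where gender equals 2
mean.females <- mean(toy$ht[ toy$gender == 2 ], na.rm = TRUE)
mean.females                              # 167.5
```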
