APPM 2720 Spring 2016
Lecture 1

Douglas Nychka
National Center for Atmospheric Research
National Science Foundation
Spring 2016

D. Nychka APPM2720 Lecture 1

Goal

This course exposes students to data analysis and discovery using techniques from data science.

Data science is an interdisciplinary field about processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured.

Statistics is just part of this

A classic intro stats course has things such as
- probability: discrete, then maybe continuous, distributions
- the concept of sample versus population
- basic statistics, e.g. mean, standard deviation, and histograms
- testing for the population mean, and confidence intervals
- straight-line fitting

All these elementary topics are designed to reinforce basic principles in statistics. But they rarely show the value for large and complex data problems!

What this course is about

- Confront a dataset based on answering a question.
- Analysis tools and strategies used to reach an answer will be examples of statistical concepts.
- Develop programming skill in R and related programs to look at data.
- Although mathematical formulas will not be used much, there still must be a strong element of logical thinking.

Some data examples
Used Audi A4 prices

[Figure: scatter plot "Used Audi A4 prices" (cars.com). x-axis: Mileage (thousands), 0 to 200. y-axis: Asking Price, 0 to 40000. Legend: -2003, 2004-2007, 2008-2011, 2012-2015.]

How do you quantify the tradeoff between older cars and cheaper prices?

Stock prices

[Figure: scatter plot "Daily percent return, Oct 2012 to Sep 2015". x-axis: Morgan Stanley. y-axis: Goldman Sachs. One point is labeled JAN 15, 2013.]

How well do Goldman Sachs and Morgan Stanley stocks track each other?

Mary Jane Resort

Where is the steepest part of a ski run?

Boulder daily rainfall

[Figure: "Boulder daily rainfall". x-axis: Years, 1900 to 2000. y-axis: (cm), 0 to 15. An annotation reads 23.1 cm.]

What is the probability of rainfall in Boulder exceeding 8 cm (about 3.2 inches) in a day?

Digital images: 100 Weddings

The average wedding image
J. Salavon, Cabinet 15
See Jason Salavon on Wikipedia

An example of R code

    1:10
    ## [1] 1 2 3 4 5 6 7 8 9 10
    mean(1:10)
    ## [1] 5.5
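The same mean() call hints at how the Boulder rainfall question could be approached: the fraction of days above a threshold is the mean of a logical vector. What follows is only a minimal sketch. The ten rainfall values are made up for illustration (they are not the Boulder record), and the names rain_cm, threshold_cm, and p_exceed are hypothetical.

```r
# Made-up daily rainfall amounts in cm (placeholder data, NOT the Boulder record).
rain_cm <- c(0, 0.2, 1.1, 9.4, 0, 3.5, 0, 8.6, 0.1, 0)

# Threshold taken from the question on the rainfall slide.
threshold_cm <- 8

# rain_cm > threshold_cm is a logical vector; mean() counts TRUE as 1 and
# FALSE as 0, so the result is the fraction of days above the threshold.
p_exceed <- mean(rain_cm > threshold_cm)
print(p_exceed)
```

With real data the same two lines would give an empirical estimate of the exceedance probability. How trustworthy that estimate is for rare, extreme days is a separate statistical question.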
Computation

    n <- 1:50
    e <- (1 + 1/n)^n
    print(e)
    ##  [1] 2.000000 2.250000 2.370370 2.441406 2.488320 2.521626 2.546500
    ##  [8] 2.565785 2.581175 2.593742 2.604199 2.613035 2.620601 2.627152
    ## [15] 2.632879 2.637928 2.642414 2.646426 2.650034 2.653298 2.656263
    ## [22] 2.658970 2.661450 2.663731 2.665836 2.667785 2.669594 2.671278
    ## [29] 2.672849 2.674319 2.675696 2.676990 2.678208 2.679355 2.680439
    ## [36] 2.681464 2.682435 2.683357 2.684232 2.685064 2.685856 2.686612
    ## [43] 2.687333 2.688022 2.688681 2.689312 2.689917 2.690497 2.691053
    ## [50] 2.691588

Graphics

    plot(n, e)

Thank you! Questions?
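A follow-up note on the Computation and Graphics slides: the sequence (1 + 1/n)^n approaches Euler's number, which R provides as exp(1). The sketch below compares the two and draws the limit on the plot. The variable name gap is introduced here for illustration and is not part of the lecture.

```r
n <- 1:50
e <- (1 + 1/n)^n

# Distance between each term of the sequence and Euler's number, exp(1).
gap <- exp(1) - e

# Even at n = 50 the term is still about 0.027 below exp(1).
print(gap[50])

# Plot the sequence and overlay the limit as a dashed horizontal line.
plot(n, e)
abline(h = exp(1), lty = 2)
```

The plot suggests that the convergence is slow: each term is larger than the one before it, but the curve flattens well before it reaches the dashed line.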