APPM 2720 Spring 2016
Lecture 1
Douglas Nychka
National Center for Atmospheric Research
National Science Foundation
Spring 2016
Goal
The goal of this course is to expose students to data analysis and discovery using
techniques from data science.
Data science is an interdisciplinary field about processes and systems
for extracting knowledge or insights from data in various forms, either structured or unstructured.
D. Nychka APPM2720 Lecture 1
Statistics is just part of this
A classic intro stats course covers topics such as
- probability: discrete, then maybe continuous, distributions
- the concept of sample versus population
- basic statistics, e.g. mean, standard deviation, and histograms
- testing for the population mean and confidence intervals
- straight line fitting
All these elementary topics are designed to reinforce basic principles in
statistics.
But they rarely show the value of statistics for large and complex data problems!
What this course is about
- Confront a dataset with the aim of answering a question.
- The analysis tools and strategies used to reach an answer will serve as examples
of statistical concepts.
- Develop programming skills in R and related programs for looking at data.
- Although mathematical formulas will not be used much, there must still
be a strong element of logical thinking.
Some data examples
Used Audi A4 prices
[Figure: "Used Audi A4 prices cars.com". Scatter plot of asking price (0 to 40000) against mileage in thousands (0 to 200), with legend groups −2003, 2004−2007, 2008−2011, 2012−2015.]
How do you quantify the tradeoff between older cars and cheaper prices?
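One possible way to put a number on this tradeoff is to fit a straight line of asking price on mileage, using mileage as a stand-in for how used a car is. The slope then estimates the change in asking price per additional thousand miles. The sketch below is only an illustration in R: it uses simulated data, not the cars.com listings in the plot, and the variable names, sample size, and simulation settings are all assumptions made up for this example.

```r
# Simulated stand-in for the used-car data (NOT the real cars.com listings).
set.seed(1)
mileage <- runif(200, min = 0, max = 200)                  # thousands of miles
price   <- 35000 - 150 * mileage + rnorm(200, sd = 3000)   # made-up asking prices

# Fit a straight line: price = intercept + slope * mileage.
fit <- lm(price ~ mileage)
slope <- unname(coef(fit)["mileage"])

# The slope is the estimated change in price per extra thousand miles.
print(slope)
```

A negative slope means higher mileage goes with a lower asking price. A straight line is only a first approximation, since real prices need not fall at a constant rate.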
Stock prices
[Figure: Daily percent return, Oct 2012 − Sep 2015. Scatter plot of Goldman Sachs (vertical axis) against Morgan Stanley (horizontal axis), with one point labeled JAN 15, 2013.]
How well do Goldman Sachs and Morgan Stanley stocks track each
other?
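A common first summary of how closely two return series track each other is their correlation coefficient. The sketch below shows the calculation in R on simulated daily returns, not the actual Goldman Sachs and Morgan Stanley data. The names market, gs, and ms and all simulation settings are hypothetical.

```r
# Simulated daily percent returns (NOT real stock data).
set.seed(2)
market <- rnorm(750, mean = 0, sd = 1)   # shared movement affecting both stocks
gs <- market + rnorm(750, sd = 0.7)      # hypothetical "Goldman Sachs" returns
ms <- market + rnorm(750, sd = 0.7)      # hypothetical "Morgan Stanley" returns

# Correlation near 1: the series move together closely.
# Correlation near 0: little linear relationship.
r <- cor(gs, ms)
print(r)
```

The correlation measures only the linear association between the two series, so it is worth reading alongside the scatter plot rather than on its own.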
Mary Jane Resort
Where is the steepest part of a ski run?
Boulder daily rainfall
[Figure: Boulder daily rainfall (cm) plotted against year, with the year axis labeled 1900 to 2000 and the rainfall axis labeled 0 to 15; one value is annotated as 23.1 cm.]
What is the probability of rainfall in Boulder exceeding 8 cm (about
3.1 inches) in a day?
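One simple, if crude, estimate is the fraction of days in the record with rainfall above the threshold. The sketch below shows that calculation in R on simulated rainfall, not the Boulder record. The names n_days, wet, rain, threshold, and p_hat, and the distributions used to generate the data, are assumptions for illustration only.

```r
# Simulated daily rainfall in cm (NOT the Boulder record).
set.seed(3)
n_days <- 40000
wet  <- rbinom(n_days, size = 1, prob = 0.25)   # 1 if it rains that day, else 0
rain <- wet * rexp(n_days, rate = 1 / 1.5)      # made-up amounts on wet days

# Empirical estimate: the fraction of days with rainfall above the threshold.
threshold <- 8
p_hat <- mean(rain > threshold)
print(p_hat)

# Number of days the estimate actually rests on.
print(sum(rain > threshold))
```

Because large rainfall days are rare, the estimate depends on only a handful of observations, which is part of what makes this question harder than it first looks.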
Digital images: 100 Weddings
The average image
[Image: "wedding", J. Salavon, Cabinet 15]
See Jason Salavon on Wikipedia
An example of R code:
1:10
##  [1]  1  2  3  4  5  6  7  8  9 10
mean(1:10)
## [1] 5.5
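To connect this with the usual formula, the mean is the sum of the values divided by how many there are. The short check below uses only the base R functions sum, length, and mean. The names x, by_hand, and built_in are introduced just for this illustration.

```r
x <- 1:10                        # the integers 1 through 10

# The mean computed from its definition: sum divided by count.
by_hand <- sum(x) / length(x)

# The built-in function gives the same value.
built_in <- mean(x)

print(by_hand)
print(built_in)
```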
Computation
n <- 1:50
e <- (1 + 1/n)^n
print(e)
##  [1] 2.000000 2.250000 2.370370 2.441406 2.488320 2.521626 2.546500
##  [8] 2.565785 2.581175 2.593742 2.604199 2.613035 2.620601 2.627152
## [15] 2.632879 2.637928 2.642414 2.646426 2.650034 2.653298 2.656263
## [22] 2.658970 2.661450 2.663731 2.665836 2.667785 2.669594 2.671278
## [29] 2.672849 2.674319 2.675696 2.676990 2.678208 2.679355 2.680439
## [36] 2.681464 2.682435 2.683357 2.684232 2.685064 2.685856 2.686612
## [43] 2.687333 2.688022 2.688681 2.689312 2.689917 2.690497 2.691053
## [50] 2.691588
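The printed values rise steadily toward Euler's number, which R provides as exp(1) (about 2.718282). As a sketch of how one might check this, the lines below compare the sequence with exp(1). The name err is introduced only for this illustration, and the variable e holds the sequence as on the slide, not the constant.

```r
n <- 1:50
e <- (1 + 1/n)^n           # same sequence as on the slide

# Gap between Euler's number, exp(1), and each term of the sequence.
err <- exp(1) - e

# The gap at n = 1, 10, and 50: it shrinks as n grows, but slowly.
print(err[c(1, 10, 50)])
```

Every term stays below exp(1), so even at n = 50 the approximation agrees with it only to about one decimal place.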
Graphics
plot(n, e)
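As an optional variation, the same plot can carry axis labels and a reference line at the limiting value. This is a sketch using only base R graphics (plot and abline). The axis labels, the ylim range, and the dashed line style are choices made for this example, not part of the slide.

```r
n <- 1:50
e <- (1 + 1/n)^n

# Same scatter plot as on the slide, with axis labels and a fixed vertical range.
plot(n, e, xlab = "n", ylab = "(1 + 1/n)^n", ylim = c(2, exp(1)))

# Dashed horizontal line at Euler's number, exp(1), for reference.
abline(h = exp(1), lty = 2)
```

The reference line makes it easier to see how slowly the points approach the limit.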
Thank you!
Questions?