Introduction & Python 101
Outline
• What is data science?
• Overview of course
• Examples
• Crash course in Python
What is Data Science?
[Diagram: data science at the overlap of mathematics & statistics, computer science, and domain expertise]
What is Data Science?
• Rapidly growing field
• aka
• machine learning (ML)
• data mining
• artificial intelligence
• predictive analytics
• ...
• By end of course: put ML “expert” on your resume..
What is Data Science?
• Machine Learning => Prediction
• Known patterns
• Data Mining => Discovery
• Unknown patterns
What is Data Science?
• Data is everywhere!
• Netflix recommendations
• OK-Cupid matches
• User website clicks
• Self-driving cars
• ...
What is Data Science?
Data scientist:
• “Someone who knows more statistics than a computer scientist and more computer science than a statistician”
• Wide range: from purely academic to applied (focus of this course)
What is Data Science?
Data scientist:
• “Someone who extracts insights from messy data”
Course Overview
• Wide breadth of data science topics
• Focus on application, not theory
• Module on data visualization
• Use real-world tools on real-world data
• Python
• Knime
• Will not implement algorithms “from scratch”
Course Overview
Topics:
• Python 101 + viz
• ML tools
• k-nearest neighbors
• naive Bayes
• regression
• support vector machines
• artificial neural networks
• decision trees
• clustering
• anomaly detection
• genetic algorithms
• map-reduce
• natural language processing
• deep learning
Example
• Start w/ “simple” example
• Does not require domain expertise
• Does not require fancy schmancy algorithms
• Does not require specialized data preprocessing
• Question: given a set of data scientists, find interesting interconnections..
Example
• Find connections / commonalities between data scientists
• Data? List of users
• Each entry is dictionary
• two keys: id, name
Example
• Also given list of friendships (based on id)
• E.g., “Hero” (id 0) is friends with “Dunn” (id 1)
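A minimal sketch of how these two inputs might look in Python. Only “Hero” (id 0), “Dunn” (id 1), and their friendship come from the slide; the other names and pairs are made-up placeholders.

```python
# Users: a list of dictionaries, each with the two keys "id" and "name".
# "Hero" and "Dunn" are from the slide; "Sue" and "Chi" are placeholders.
users = [
    {"id": 0, "name": "Hero"},
    {"id": 1, "name": "Dunn"},
    {"id": 2, "name": "Sue"},
    {"id": 3, "name": "Chi"},
]

# Friendships: a list of (id, id) tuples. (0, 1) is from the slide;
# the remaining pairs are assumed for illustration.
friendships = [(0, 1), (0, 2), (1, 2), (2, 3)]

# Look up the names behind the first friendship.
i, j = friendships[0]
first_pair = (users[i]["name"], users[j]["name"])
print(first_pair)
```

Because each id equals the user's position in the list, users[i] is the user with id i.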
Example
• In graph form: [figure: users as nodes, friendships as edges]
Example
• May want to ask some questions:
• Average number of friends per user?
• Who has the most friends?
• What’s the minimum number of friends?
• ...
Example
• Easy to do in Python...
• 1) Add dictionary entry called “friends” for each user
• for each user in users list
• add a dictionary entry keyed on “friends”,
• value is empty list
Example
• 2) Go through friendship list of tuples
• add entries for each user
• users[i]["friends"] starts as an empty list; append each friend to it
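Steps 1 and 2 could be sketched as follows. The sample data is illustrative (only “Hero”, “Dunn”, and the pair (0, 1) appear in the slides), and two choices here are assumptions: friendships are treated as mutual, and each "friends" list stores friend ids rather than whole user dictionaries.

```python
users = [
    {"id": 0, "name": "Hero"},
    {"id": 1, "name": "Dunn"},
    {"id": 2, "name": "Sue"},   # placeholder user
    {"id": 3, "name": "Chi"},   # placeholder user
]
friendships = [(0, 1), (0, 2), (1, 2), (2, 3)]  # only (0, 1) is from the slides

# Step 1: give every user an empty "friends" list.
for user in users:
    user["friends"] = []

# Step 2: walk the list of (i, j) tuples and record the friendship on
# both sides. This relies on each id matching its index in users.
for i, j in friendships:
    users[i]["friends"].append(j)
    users[j]["friends"].append(i)

print(users[0]["friends"])
print(users[2]["friends"])
```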
Example
• 3) Average number of friends? How to compute?
• Iterate through users list
• Use len( ) to determine length of each user’s “friends” list
• Sum and divide by len( users )
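A short sketch of step 3, assuming the "friends" lists have already been filled in. The data below is illustrative, not from the slides.

```python
# Users with "friends" lists already populated (illustrative data).
users = [
    {"id": 0, "name": "Hero", "friends": [1, 2]},
    {"id": 1, "name": "Dunn", "friends": [0, 2]},
    {"id": 2, "name": "Sue", "friends": [0, 1, 3]},
    {"id": 3, "name": "Chi", "friends": [2]},
]

# Iterate through the users list and add up the lengths of the
# individual "friends" lists.
total_friends = 0
for user in users:
    total_friends += len(user["friends"])

# Divide by the number of users. float() guards against integer
# division, which is what "/" does with two ints in Python 2.7.
avg_friends = total_friends / float(len(users))
print(avg_friends)
```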
Example
• Book has several examples
• Don’t worry about details..
• Just examples of how powerful Python is...
• and how “simple” data science can be
• E.g., may just want to count connections
• May not need fancy ML algorithms..
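As one possible illustration of "just counting connections", the sketch below answers two of the earlier questions (who has the most friends, and what the minimum is) with nothing more than len, sorted, and min. The data and variable names are assumptions made for this example.

```python
# Users with "friends" lists already populated (illustrative data).
users = [
    {"id": 0, "name": "Hero", "friends": [1, 2]},
    {"id": 1, "name": "Dunn", "friends": [0, 2]},
    {"id": 2, "name": "Sue", "friends": [0, 1, 3]},
    {"id": 3, "name": "Chi", "friends": [2]},
]

# Build a (name, number of friends) pair for every user.
friend_counts = []
for user in users:
    friend_counts.append((user["name"], len(user["friends"])))

# Sort the pairs from most friends to fewest.
by_count = sorted(friend_counts, key=lambda pair: pair[1], reverse=True)

most_connected = by_count[0]                                # most friends
fewest_friends = min(count for _, count in friend_counts)   # minimum count
print(by_count)
```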
Example
• Countless other (more complicated) examples (stuff I’ve done...)
• Musical instrument classification
• Avalanches in seismic data
• ...
• This lecture => focused on “simple” example
• Doesn’t require much domain expertise
• Doesn’t require fancy algorithms
Crash Course in Python
• We will use Python for data viz and ML / DM
• We will use Python 2.7 and the Anaconda distribution
• Has all the necessary tools (e.g., numpy, scipy, matplotlib)
• Easy to install..
• If issues, use a VM
• Just a refresher today
Python 101 - Syntax
• Python uses indentation instead of curly braces
• Readable code..
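A small sketch of what indentation-as-block-structure looks like in practice. The function name describe is made up for this illustration.

```python
def describe(n):
    # Everything indented under "def" is the function body.
    if n % 2 == 0:
        # Indented one level further: runs only when the test is true.
        label = "even"
    else:
        label = "odd"
    # Back at the function-body level, so this runs in both cases.
    return label

print(describe(4))
print(describe(7))
```

Where C or Java would use { and } to delimit the if and else blocks, Python uses the change in indentation alone.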
Python 101 - import
• Some modules not loaded (by default)
• Need to use import keyword (similar to #include)
• Can also use alias for simplicity
import matplotlib.pyplot as plt
plt.plot([0,1,2,3],[1,2,4,8])
plt.show()
Python 101 - Arithmetic
+   => addition
-   => subtraction
*   => multiplication
/   => division (watch out for integer division!)
//  => explicit integer division
**  => exponentiation
%   => modulo
Python does NOT have ++ or --
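The operators can be tried out with a quick sketch like the one below (variable names are illustrative). In Python 2.7, 7 / 2 with two integers gives 3, so the example converts one operand with float() to get a true quotient; written this way it behaves the same under Python 3.

```python
a, b = 7, 2

total = a + b                  # addition: 9
difference = a - b             # subtraction: 5
product = a * b                # multiplication: 14
true_quotient = a / float(b)   # floating-point division: 3.5
floor_quotient = a // b        # explicit integer division: 3
power = a ** b                 # exponentiation: 49
remainder = a % b              # modulo: 1

# No ++ or -- operators: use augmented assignment instead.
counter = 0
counter += 1   # counter is now 1
counter -= 1   # counter is back to 0
print(true_quotient)
print(floor_quotient)
```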
Python 101 - Functions
• Takes zero or more input parameters
• Can return zero or more values
def fx(x):
    return x**2
Python 101 - Functions
• Functions can have default arguments
def fx(x=0):
    return x**2
Python 101 - Functions
• Functions can have named arguments
def fx(x=0, y=0):
    return x**y
fx(2,3)         # => 8
fx(y=3, x=2)    # => 8 (named arguments can be given in any order)
Python 101 - Strings
• Define strings w/ single or double quotes
s = "Hello World"
s2 = 'Hello World'
• Use triple quotes for multiple line strings
s3 = """ this is
a long
string """
Python 101 - Lists
• Basically an array
• Very powerful
• Can have mixed types
my_list = [0, "hi", 3.14]
Python 101 - Lists
• Built in functions:
• len( .. ) => length of list
• sum( .. ) => sum elements
my_list = [0, 1, 2]
len(my_list)    # => 3
sum(my_list)    # => 3
Python 101 - Lists
• Indexing => 0 indexed
• Can also use negative index!!
my_list = [0, 1, 2]
print my_list[0]     # => 0 (first element)
print my_list[-1]    # => 2 (last element)
Python 101 - Lists
• range( x ) function: produce “list” of numbers 0 to x-1
• Slicing (use colon with index)
my_list = range(10)
my_list[:3]     # 0,1,2
my_list[3:]     # 3,4,5,...,9
my_list[1:5]    # 1,2,3,4
Python 101 - Lists
• in keyword checks for membership
my_list = range(10)
0 in my_list     # True
10 in my_list    # False
Python 101 - Lists
• append adds element to end of list
my_list = [1,2,3]
my_list.append(4)
print my_list    # => [1, 2, 3, 4]
Python 101 - Tuple
• Tuple is an immutable list
• Can’t be modified
my_tuple = (1,2)
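To make "can’t be modified" concrete, here is a brief sketch contrasting a list with a tuple. The flag name modified is just for illustration.

```python
my_list = [1, 2]
my_tuple = (1, 2)

# Lists are mutable: item assignment works.
my_list[0] = 99

# Tuples are immutable: item assignment raises a TypeError.
try:
    my_tuple[0] = 99
    modified = True
except TypeError:
    modified = False

print(my_list)
print(my_tuple)
print(modified)
```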
Python 101 - Tuple
• Function can return multiple values (via tuple)
def fx(x, y):
    return (x**2, y**2)
t = fx(2,4)      # t => tuple (4, 16)
x,y = fx(2,4)    # x => 4, y => 16
Python 101 - Dictionary
• Similar to C++ map
• Associative data structure of key:value pairs
my_dict = {}
my_dict["pi"] = 3.14
my_dict["e"] = 2.71
another_dict = {"A": 90, "B": 80}
Python 101 - Dictionary
• Use in keyword to check for key
my_dict = {}
my_dict["pi"] = 3.14
my_dict["e"] = 2.71
"pi" in my_dict    # => True
Python 101 - Dictionary
• .keys( ) => returns list of keys
• .values( ) => returns list of values
• .items( ) => returns list of (key, value) tuples
my_dict = {}
my_dict["pi"] = 3.14
my_dict["e"] = 2.71
print my_dict.keys()
Python 101 – Control Flow
• If statements
• NOTE: colon at the end of each if / elif / else line
• NOTE: elif instead of “else if”
if x < y:
    print "x < y"
elif x > y:
    print "x > y"
else:
    print "x == y"
Python 101 – Control Flow
• while loop
• NOTE: colon
• NOTE: white space
x = 0
while x < 10:
    print x
    x += 1
Python 101 – Control Flow
• for loop
• NOTE: colon
• NOTE: white space
for x in range(10):
    print x
Python 101 – Control Flow
• for loop
• NOTE: colon
• NOTE: white space
my_list = [1,2,3]
for x in my_list:
    print x
Python 101 – Control Flow
• for loop
• NOTE: colon
• NOTE: white space
my_dict = {"A":90, "B":80}
for x in my_dict.keys():
    print x,my_dict[x]
for x,y in my_dict.items():
    print x,y
Python 101 – Logic
• True
• False
• logical AND (&&) => and
• logical OR (||) => or
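A brief sketch of the logical keywords in use. The variable names are illustrative, and the not keyword (Python's counterpart to !) is included here even though it is not listed on the slide.

```python
x, y = 3, 7

both = x < 5 and y < 5            # False: y < 5 does not hold
either = x < 5 or y < 5           # True: x < 5 holds
neither = not (x < 5 or y < 5)    # False: negation of "either"

# and / or combine conditions in an if statement, where C or Java
# would use && and ||.
if x < y and y < 10:
    in_order = True
else:
    in_order = False

print(both)
print(either)
print(in_order)
```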