Introduction & Python 101

Outline
• What is data science?
• Overview of course
• Examples
• Crash course in Python

What is Data Science?
[Figure: data science shown in relation to mathematics & statistics, computer science, and domain expertise]

What is Data Science?
• Rapidly growing field
• aka
  • machine learning (ML)
  • data mining
  • artificial intelligence
  • predictive analytics
  • ...
• By end of course: put ML “expert” on your resume..

What is Data Science?
• Machine Learning => Prediction
  • Known patterns
• Data Mining => Discovery
  • Unknown patterns

What is Data Science?
• Data is everywhere!
  • Netflix recommendations
  • OK-Cupid matches
  • User website clicks
  • Self-driving cars
  • ...

What is Data Science?
Data scientist:
• “Someone who knows more statistics than a computer scientist and more computer science than a statistician”
• Wide range, from purely academic to applied (the applied end is the focus of this course)

What is Data Science?
Data scientist:
• “Someone who extracts insights from messy data”

Outline
• What is data science?
• Overview of course
• Examples
• Crash course in Python

Course Overview
• Wide breadth of data science topics
• Focus on application, not theory
• Module on data visualization
• Use real-world tools on real-world data
  • Python
  • KNIME
• Will not implement algorithms “from scratch”

Course Overview
Topics:
• Python 101 + viz
• ML tools
• k-nearest neighbors
• naive Bayes
• regression
• support vector machines
• artificial neural networks
• decision trees
• clustering
• anomaly detection
• genetic algorithms
• map-reduce
• natural language processing
• deep learning

Outline
• What is data science?
• Overview of course
• Example
• Crash course in Python

Example
• Start w/ “simple” example
  • Does not require domain expertise
  • Does not require fancy schmancy algorithms
  • Does not require specialized data preprocessing
• Question: given set of data scientists,
  • find interesting interconnections..

Example
• Find connections / commonalities between data scientists
• Data? List of users
  • Each entry is a dictionary
  • two keys: id, name

Example
• Also given list of friendships (based on id)
• E.g., “Hero” (id 0) is friends with “Dunn” (id 1)

Example
• In graph form:
[Figure: graph of the users and their friendships]

Example
• May want to ask some questions:
  • Average number of friends per user?
  • Who has the most friends?
  • What’s the minimum number of friends?
  • ...

Example
• Easy to do in Python...
• 1) Add dictionary entry called “friends” for each user
  • for each user in users list
  • add a dictionary entry keyed on “friends”,
  • value is empty list

Example
• 2) Go through friendship list of tuples
  • add entries for each user
  • users[i]["friends"] starts as an empty list (append the friend)

Example
• 3) Average number of friends? How to compute?
  • Iterate through users list
  • Use len( ) to determine length of each user’s “friends” list
  • Sum and divide by len( users )

Example
• Book has several examples
  • Don’t worry about details..
  • Just examples of how powerful Python is...
  • and how “simple” data science can be
• E.g., may just want to count connections
  • May not need fancy ML algorithms..

Example
• Countless other (more complicated) examples
  • Musical instrument classification
  • Avalanches in seismic data
  • ... stuff I’ve done...
• This lecture => focused on “simple” example
  • Doesn’t require much domain expertise
  • Doesn’t require fancy algorithms

Outline
• What is data science?
• Overview of course
• Example
• Crash course in Python

Crash Course in Python
• We will use Python for data viz and ML / DM
• We will use Python 2.7 and Anaconda framework
  • Has all the necessary tools (e.g., numpy, scipy, matplotlib)
  • Easy to install..
  • If issues, use a VM
• Just a refresher today

Python 101 - Syntax
• Python uses indentation instead of curly braces
• Readable code..
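
To see indentation-delimited blocks in practice, the three steps of the friendship example can be sketched roughly as below. This is only an illustration under stated assumptions: just “Hero” (id 0), “Dunn” (id 1), and their friendship come from the slides; the third user and the second friendship are made up, and the sketch assumes each user's id equals its position in the users list. It is written to run under Python 2.7 as well as Python 3.

```python
# Illustrative data: only "Hero" (id 0) and "Dunn" (id 1) come from the slides;
# the third user and the (0, 2) friendship are hypothetical.
users = [
    {"id": 0, "name": "Hero"},
    {"id": 1, "name": "Dunn"},
    {"id": 2, "name": "Sue"},   # hypothetical user
]
friendships = [(0, 1), (0, 2)]  # (0, 1) is from the slides; (0, 2) is made up

# Step 1: give every user an empty "friends" list.
for user in users:
    user["friends"] = []

# Step 2: record each friendship in both directions.
# Assumes each user's id matches its index in the users list.
for i, j in friendships:
    users[i]["friends"].append(users[j]["name"])
    users[j]["friends"].append(users[i]["name"])

# Step 3: average number of friends per user.
total_friends = sum(len(user["friends"]) for user in users)
avg_friends = float(total_friends) / len(users)  # float() avoids Python 2 integer division

# The other questions from the slides: most friends, minimum number of friends.
most = max(users, key=lambda user: len(user["friends"]))
fewest_count = min(len(user["friends"]) for user in users)

print(avg_friends)
print(most["name"])
print(fewest_count)
```

Storing friend names (rather than the friend dictionaries themselves) is a choice made here to keep the printed structure simple; storing ids would work equally well.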

Python 101 - import
• Some modules not loaded (by default)
• Need to use import keyword (similar to #include)
• Can also use alias for simplicity

    import matplotlib.pyplot as plt
    plt.plot([0,1,2,3],[1,2,4,8])
    plt.show()

Python 101 - Arithmetic

    +    =>  addition
    -    =>  subtraction
    *    =>  multiplication
    /    =>  division (watch out for integer division!)
    //   =>  explicit integer division
    **   =>  exponentiation
    %    =>  modulo

Python does NOT have ++ or --

Python 101 - Functions
• Takes zero or more input parameters
• Can return zero or more values

    def fx( x ):
        return x**2

Python 101 - Functions
• Functions can have default arguments

    def fx(x = 0):
        return x**2

Python 101 - Functions
• Functions can have named arguments

    def fx(x=0, y=0):
        return x**y

    fx(2,3)
    fx(y=3, x=2)

Python 101 - Strings
• Define strings w/ single or double quotes

    s = "Hello World"
    s2 = 'Hello World'

• Use triple quotes for multiple line strings

    s3 = """ this is
    a long string """

Python 101 - Lists
• Basically an array
• Very powerful
• Can have mixed types

    my_list = [0, "hi", 3.14]

Python 101 - Lists
• Built in functions:
  • len( .. )  =>  length of list
  • sum( .. )  =>  sum elements

    my_list = [0, 1, 2]
    len(my_list)
    sum(my_list)

Python 101 - Lists
• Indexing => 0 indexed
• Can also use negative index!!

    my_list = [0, 1, 2]
    print my_list[0]
    print my_list[-1]

Python 101 - Lists
• range( x ) function: produce “list” of numbers 0 to x-1
• Slicing (use colon with index)

    my_list = range(10)
    my_list[:3]     #0,1,2
    my_list[3:]     #3,4,5,...,9
    my_list[1:5]    #1,2,3,4

Python 101 - Lists
• in keyword checks for membership

    my_list = range(10)
    0 in my_list     # True
    10 in my_list    # False

Python 101 - Lists
• append adds element to end of list

    my_list = [1,2,3]
    my_list.append(4)
    print my_list

Python 101 - Tuple
• Tuple is an immutable list
• Can’t be modified

    my_tuple = (1,2)

Python 101 - Tuple
• Function can return multiple values (via tuple)

    def fx(x, y):
        return (x**2,y**2)

    t = fx(2,4)      #t => tuple (4, 16)
    x,y = fx(2,4)    #x => 4, y => 16

Python 101 - Dictionary
• Similar to C++ map
• Associative data structure of key:value pairs

    my_dict = {}
    my_dict["pi"] = 3.14
    my_dict["e"] = 2.71
    another_dict = {"A": 90, "B": 80}

Python 101 - Dictionary
• Use in keyword to check for key

    my_dict["pi"] = 3.14
    my_dict["e"] = 2.71
    "pi" in my_dict    # => True

Python 101 - Dictionary
• .keys( )    =>  returns list of keys
• .values( )  =>  returns list of values
• .items( )   =>  returns list of (key, value) tuples

    my_dict["pi"] = 3.14
    my_dict["e"] = 2.71
    print my_dict.keys()

Python 101 – Control Flow
• If statements
• NOTE: colon at the end of each statement
• NOTE: elif instead of “else if”

    if x < y:
        print "x < y"
    elif x > y:
        print "x > y"
    else:
        print "x == y"

Python 101 – Control Flow
• while loop
• NOTE: colon
• NOTE: white space

    x = 0
    while x < 10:
        print x
        x += 1

Python 101 – Control Flow
• for loop
• NOTE: colon
• NOTE: white space

    for x in range(10):
        print x

Python 101 – Control Flow
• for loop
• NOTE: colon
• NOTE: white space

    my_list = [1,2,3]
    for x in my_list:
        print x

Python 101 – Control Flow
• for loop
• NOTE: colon
• NOTE: white space

    my_dict = {"A":90, "B":80}
    for x in my_dict.keys():
        print x,my_dict[x]
    for x,y in my_dict.items():
        print x,y

Python 101 – Logic
• True
• False
• logical AND (&&)  =>  and
• logical OR (||)  =>  or
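
A small sketch of how the logic keywords combine with the list, dictionary, and arithmetic material from the earlier slides. The variable names are made up for illustration, and `not` (Python's counterpart to `!`) is included even though the slide lists only `and` and `or`. The code is written to behave the same under Python 2.7 and Python 3.

```python
grades = {"A": 90, "B": 80}   # same shape as the dictionary used in the slides
scores = list(range(10))      # list() makes this a real list on Python 3 as well

# `and` / `or` / `not` take the place of &&, ||, ! in C-like languages.
has_both = "A" in grades and "B" in grades
has_c_or_high_a = "C" in grades or grades["A"] > 85
missing_c = not ("C" in grades)

# Conditions combine with loops and slicing.
small_evens = []
for x in scores[:5]:          # the slice covers 0, 1, 2, 3, 4
    if x % 2 == 0 and x > 0:
        small_evens.append(x)

# The integer-division warning from the Arithmetic slide.
floor_result = 7 // 2         # explicit integer division
true_result = 7 / 2.0         # a float operand gives true division in Python 2 too

print(has_both)
print(small_evens)
print(floor_result)
print(true_result)
```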