PYTHON AND SQL
Introduction to these complementary tools
Gus Cavanaugh

Agenda
■ Brief overview of Data Analytics Office Hours Meetup
■ Quick Intro to SQL and Python
■ When to use SQL, Python, both, or neither
■ Switch over to Google Hangout for discussion

Data Analytics Office Hours
■ About me: Consultant trying to use data to solve business problems
■ Meetups have been super helpful for learning technical subjects. But most are in-depth presentations on a topic, so I often get lost quickly.
■ When I gave meetup talks, the best questions were often asked by individuals after the talk was over. Worse, they were usually foundational questions, which meant the attendee had missed out on a lot of value
■ I want this meetup to be the place where those types of questions are asked.
■ Presentations will be short or non-existent – just enough to cover some basic ideas. After that, it’s up to us.
■ To put it crudely, I want this to be the forum for “stupid” questions. If one person asks one question they would have been afraid to raise their hand and ask at a large meetup, this will have been a success

My ask of you
■ Introduce yourself
■ Ask Questions…
  – Through the webinar
  – Through the meetup: http://www.meetup.com/DataAnalytics-Office-Hours/
  – Via Email: [email protected]
  – On Twitter: @GusCavanaugh

On to the material!

Python and SQL
■ Python: a general purpose programming language with popular libraries for data science and analytics. A few non-exhaustive examples of popular libs:
  – Pandas: data analysis
  – Matplotlib/Seaborn/Bokeh: data visualization
  – PySpark: manipulating data on clusters of computers
  – Django/Flask: building web applications
■ SQL (pronounced “ESS-Queue-L” or “SEE-KWELL”): a structured query language for manipulating data in relational databases
  – No libraries!
    (at least nothing comparable to what you will find with a general purpose language)
■ Both Python and SQL are code
■ For data analytics, you will benefit from knowing both Python and SQL

Use SQL to gather your data if it is in a relational database; use Python for everything else

When to use SQL or Python
■ Let’s assume your typical data analysis project has three phases: gather, analyze, and visualize data
  – Gather
    ■ Data exists in a relational database (e.g., Oracle), a web API (e.g., Twitter), and some spreadsheets your co-worker Sloppy Steve keeps
    ■ You’ll likely want to create one repository for all of that data
    ■ A good idea would be to use SQL to gather the records from Oracle and Python to get data via Twitter’s API
    ■ It’s probably a good idea to use Python to read in the Excel data as well
    ■ Let’s assume your data fits in memory – at this point you can manipulate the data in Python
    ■ If it doesn’t fit in memory, you can persist the data in a relational database

When to use SQL or Python
■ Let’s assume your typical data analysis project has three phases: gather, analyze, and visualize data
  – Analyze
    ■ Filter, aggregate, count, sum, average – both SQL and Python do this well. If your data is in a relational DB, SQL can be a great choice at this step
    ■ Join tabular data – both SQL and Python (via Pandas) can do this, but it is an operation in SQL’s wheelhouse. Again, if your data is no longer in a relational DB, SQL won’t be an option
    ■ Statistics and Machine Learning – while you can do some statistics with SQL, you’ll want to use Python for these operations.
    ■ Text processing or anything else – Python all the way

When to use SQL or Python
■ Let’s assume your typical data analysis project has three phases: gather, analyze, and visualize data
  – Visualize
    ■ For simple graphics, Excel can be a fine, albeit boring, choice, should you have it installed
      – It does have the ability to connect to many relational databases
    ■ But let’s be real: you’ll want to go with Python (or JavaScript) for a visualization library
    ■ SQL doesn’t offer anything to help you graph data

Python or SQL?
■ For gathering data, you need both
■ If none of your data is in a relational database, then you’ll be fine with just Python
■ Here’s the thing though: almost every business will touch a relational database in some way.
■ Tableau, MicroStrategy, Cognos, and Excel can be massively enhanced if you know just a little bit of SQL. More often than not the business people and the DBAs (database administrators) are not on the same page
■ In terms of getting a job, start with SQL!
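
To make the division of labor from the slides concrete, here is a minimal sketch rather than a definitive recipe. It uses Python's built-in sqlite3 module as a stand-in for a real relational database such as Oracle. The tables (customers, orders), their columns, and all the sample values are hypothetical and invented for illustration. SQL handles the join, filter, and aggregate step; Python's standard statistics module handles the statistics.

```python
import sqlite3
import statistics

# Hypothetical sample data standing in for a real relational database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'East'), (2, 'West'), (3, 'East');
    INSERT INTO orders VALUES
        (1, 1, 120.0), (2, 1, 80.0), (3, 2, 200.0), (4, 3, 40.0), (5, 2, 60.0);
""")

# SQL's wheelhouse: join two tables, filter rows, and aggregate per group,
# all inside the database before any data reaches Python.
totals_by_region = conn.execute("""
    SELECT c.region, COUNT(*) AS n_orders, SUM(o.amount) AS total
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    WHERE o.amount >= 50
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()

# Python's wheelhouse: pull the raw values into memory and compute statistics.
amounts = [row[0] for row in conn.execute("SELECT amount FROM orders ORDER BY id")]
median_amount = statistics.median(amounts)
stdev_amount = statistics.stdev(amounts)  # sample standard deviation
conn.close()

print(totals_by_region)
print(median_amount, round(stdev_amount, 2))
```

If Pandas is installed, `pandas.read_sql_query` can load the result of a query like the one above straight into a DataFrame, which is one common way to hand data from SQL over to Python for the analyze and visualize phases.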