PYTHON AND SQL
Introduction to these complementary tools
Gus Cavanaugh
Agenda
■ Brief overview of Data Analytics Office Hours Meetup
■ Quick Intro to SQL and Python
■ When to use SQL, Python, both, or neither
■ Switch over to Google Hangout for discussion
Data Analytics Office Hours
■ About me: Consultant trying to use data to solve business problems
■ Meetups have been super helpful for learning technical subjects. But most are in-depth
presentations on a topic. As such, I often quickly get lost.
■ When I gave meetup talks, the best questions were often asked by individuals after the
talk was over. Worse, they were usually foundational questions, which meant the attendee
had missed out on a lot of the talk's value.
■ I want this meetup to be the place where those types of questions are asked.
■ Presentations will be short or non-existent – just enough to cover some basic ideas. After
that, it’s up to us.
■ To put it crudely, I want this to be the forum for “stupid” questions. If one person asks
one question they would be afraid to raise their hand and ask at a large meetup, this will
have been a success
My ask of you
■ Introduce yourself
■ Ask Questions…
– Through the webinar
– Through the meetup: http://www.meetup.com/DataAnalytics-Office-Hours/
– Via Email: [email protected]
– On Twitter: @GusCavanaugh
On to the material!
Python and SQL
■ Python: a general purpose programming language with popular libraries for data
science and analytics. A few non-exhaustive examples of popular libs:
– Pandas: data analysis
– Matplotlib/Seaborn/Bokeh: data visualization
– PySpark: manipulating data on clusters of computers
– Django/Flask: building web applications
■ SQL (pronounced “ESS-Queue-L” or “SEE-KWELL”): Structured Query Language, a language
for manipulating data in relational databases
– No libraries! (at least nothing comparable to what you will find with a general
purpose language)
■ Both Python and SQL are code
■ For data analytics, you will benefit from knowing both Python and SQL
Use SQL to gather your data if it is in a
relational database; use Python for
everything else
When to use SQL or Python
■ Let’s assume your typical data analysis project has three phases: gather, analyze,
and visualize data
– Gather
    ■ Data exists in a relational database (e.g., Oracle), a web API (e.g., Twitter), and
      some spreadsheets your co-worker Sloppy Steve keeps
    ■ You’ll likely want to create one repository for all of that data
    ■ A good idea would be to use SQL to gather the records from Oracle and Python to
      get data via Twitter’s API
    ■ It’s probably a good idea to use Python to read in the Excel data as well
    ■ Let’s assume your data fits in memory – at this point you can manipulate the data
      in Python
    ■ If it doesn’t fit in memory, you can persist the data in a relational database
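A minimal sketch of the gather step, using only Python's standard library. It rests on several assumptions that are not in the slides: `sqlite3` stands in for Oracle, a CSV string stands in for Steve's spreadsheet, and the `orders` / `all_orders` tables and their columns are hypothetical. The web API step is left out because it needs credentials and a network connection.

```python
import csv
import io
import sqlite3

# Stand-in for the relational source. sqlite3 is used only because it ships
# with Python; table and column names are hypothetical.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("acme", 120.0), ("globex", 75.5), ("acme", 30.0)],
)

# SQL gathers the records that live in the database.
db_rows = source.execute("SELECT customer, amount FROM orders").fetchall()

# Python reads the spreadsheet-style data (a CSV string stands in for a file).
sheet = io.StringIO("customer,amount\ninitech,42.0\nacme,8.0\n")
sheet_rows = [(r["customer"], float(r["amount"])) for r in csv.DictReader(sheet)]

# One in-memory repository for everything gathered so far.
gathered = db_rows + sheet_rows

# If the data outgrows memory, persist it in a relational database instead.
repo = sqlite3.connect(":memory:")  # a file path would be used in practice
repo.execute("CREATE TABLE all_orders (customer TEXT, amount REAL)")
repo.executemany("INSERT INTO all_orders VALUES (?, ?)", gathered)
repo.commit()
row_count = repo.execute("SELECT COUNT(*) FROM all_orders").fetchone()[0]
```

Once everything sits in `gathered` (or in `all_orders`), the analysis can start from a single place regardless of where each record came from.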
When to use SQL or Python
■ Let’s assume your typical data analysis project has three phases: gather, analyze,
and visualize data
– Analyze
    ■ Filter, aggregate, count, sum, average – both SQL and Python do this well. If your
      data is in a relational DB, SQL can be a great choice at this step
    ■ Join tabular data – both SQL and Python (via Pandas) can do this, but it is an
      operation in SQL’s wheelhouse. Again, if your data is no longer in a relational DB,
      SQL won’t be an option
    ■ Statistics and Machine Learning – while you can do some statistics with SQL, you’ll
      want to use Python for these operations
    ■ Text processing or anything else – Python all the way
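The division of labor above might look something like this sketch: SQL handles the join, filter, and aggregation, and Python picks up the statistics. As assumptions for illustration only, `sqlite3` stands in for whatever relational database holds the data, and the `customers` / `orders` tables and their values are made up.

```python
import sqlite3
import statistics

# Hypothetical tables in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT)")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "east"), (2, "west"), (3, "east")],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 100.0), (1, 50.0), (2, 80.0), (3, 10.0), (3, 20.0)],
)

# SQL's wheelhouse: join two tables, filter rows, then count/sum/average.
totals = conn.execute(
    """
    SELECT c.region, COUNT(*) AS n, SUM(o.amount) AS total, AVG(o.amount) AS mean
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    WHERE o.amount >= 20
    GROUP BY c.region
    ORDER BY c.region
    """
).fetchall()

# Python's turn: pull the raw values and compute statistics on them.
amounts = [row[0] for row in conn.execute("SELECT amount FROM orders")]
median_amount = statistics.median(amounts)
spread = statistics.stdev(amounts)  # sample standard deviation
```

Here `totals` holds one row per region, while `median_amount` and `spread` come from Python. For anything beyond basic descriptive statistics, a dedicated Python library would be the more likely choice than the standard `statistics` module.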
When to use SQL or Python
■ Let’s assume your typical data analysis project has three phases: gather, analyze,
and visualize data
– Visualize
    ■ For simple graphics, Excel can be a fine, albeit boring, choice, should you have it
      installed
        – It does have the ability to connect to many relational databases
    ■ But let’s be real: you’ll want to go with Python (or JavaScript) for a visualization
      library
    ■ SQL doesn’t offer anything to help you graph data
Python or SQL?
■ For gathering data, you need both
■ If none of your data is in a relational database, then you’ll be fine with just Python
■ Here’s the thing though: almost every business will touch a relational database in
some way.
■ Tableau, MicroStrategy, Cognos, and Excel can be massively enhanced if you know
just a little bit of SQL. More often than not, the business people and the DBAs
(database administrators) are not on the same page
■ In terms of getting a job, start with SQL!