Hands-on Data Science and OSS
Driving Business Value with Open Source and Data Science
Kevin Crocker, @_K_C_Pivotal
#datascience, #oscon
Data Science Education Lead
Pivotal Software Inc. @pivotal
VM info
 Everything is 'oscon2014'
 User:password -> oscon2014:oscon2014
 PostgreSQL 9.2.8 dbname -> oscon2014
 Root password -> oscon2014
 Installed software: PostgreSQL 9.2.8, R, MADlib, PL/Pythonu, PL/pgSQL, Anaconda (/home/oscon2014/anaconda), pgAdmin3, RStudio, pyMADlib, and more to come in v1.1
Objective of Data Science
DRIVE AUTOMATED, LOW-LATENCY ACTIONS IN RESPONSE TO EVENTS OF INTEREST
What Matters: Apps. Data. Analytics.
Apps power businesses, and those apps generate data.
Analytic insights from that data drive new app functionality, which in turn drives new data.
The faster you can move around that cycle, the faster you learn, innovate, and pull away from the competition.
What Matters: OSS at the core
The same apps, data, and analytics cycle runs with open source software at its core.
End Game: Drive Business Value with OSS
Find interesting problems that can't easily be solved with current technology.
Use (find) the right tool for the job:
- If the right tools don't exist, create them
- Prefer OSS if it fits the need
- Drive business value through distributed, MPP analytics
- Operationalization (O16n) of your analytics
Create interesting solutions that drive business value.
PIVOTAL DATA SCIENCE TOOLKIT
1. Find Data
   Platforms: Pivotal Greenplum DB, Pivotal HD, Hadoop (other), SAS HPA, AWS
2. Write Code
   Editing Tools: Vi/Vim, Emacs, Smultron, TextWrangler, Eclipse, Notepad++, IPython, Sublime
   Languages: SQL, Bash scripting, C, C++, C#, Java, Python, R
3. Run Code
   Interfaces: pgAdminIII, psql, psycopg2, Terminal, Cygwin, Putty, Winscp
4. Write Code for Big Data
   In-Database: SQL, PL/Python, PL/Java, PL/R, PL/pgSQL
   Hadoop: HAWQ, Pig, Hive, Java
5. Implement Algorithms
   Libraries: MADlib
   Java: Mahout
   R: (too many to list!)
   Text: OpenNLP, NLTK, GPText
   C++: OpenCV
   Python: NumPy, SciPy, scikit-learn, Pandas
6. Show Results
   Visualization: python-matplotlib, python-networkx, D3.js, Tableau
   Programs: Alpine Miner, RStudio, MATLAB, SAS, Stata, GraphViz, Gephi, R (ggplot2, lattice, shiny), Excel
7. Collaborate
   Sharing Tools: Chorus, Confluence, Socialcast, GitHub, Google Drive & Hangouts
A large and varied tool box!
Toolkit?
This image was created by Swami Chandrasekaran, Enterprise Architect, IBM.
He has a great article about what it takes to be a Data Scientist: Road Map to Data Scientist
http://nirvacana.com/thoughts/becoming-a-data-scientist/
We need the right technology for every step.
Open Source At Pivotal
 Pivotal has a lot of open source projects and people involved in them (people counts in parentheses):
 PostgreSQL, Apache Hadoop (4)
 MADlib (16), PivotalR (2), pyMADlib (4), Pandas via SQL (3)
 Spring (56), Groovy (3), Grails (3)
 Apache Tomcat (2) and HTTP Server (1)
 Redis (1)
 RabbitMQ (4)
 Cloud Foundry (90)
 Open Chorus
 We use a combination of our commercial software and OSS to drive business value through Data Science.
Motivation
 Our story starts with SQL – so naturally we try to use SQL for everything! Everything?
 SQL is great for many things, but it's not nearly enough:
– Straightforward way to query data
– Not necessarily designed for data science
 Data Scientists know other languages: R, Python, ...
Our challenge
 MADlib
– Open source
– Extremely powerful/scalable
– Growing algorithm breadth
– SQL
 R / Python
– Open source
– Memory limited
– High algorithm breadth
– Language/interface purpose-designed for data science
 Want to leverage both the performance benefits of MADlib and the
usability of languages like R and Python
How Pivotal Data Scientists Select Which Tool to Use
 Is the algorithm of choice available in MADlib or Chorus?
– Yes: Prototype in R/Python or directly in MADlib/Chorus, then build the final set of models in MADlib, Chorus, or PivotalR/PyMADlib (optimized for algorithm performance, scalability, and code overhead).
– No: Do opportunities for explicit parallelization exist?
   – Yes: Leverage Procedural Language extensions in Pivotal (i.e. PL/R, PL/Python).
   – No: Connect to Pivotal via ODBC.
Pivotal, MADlib, R, and Python
 Pivotal & MADlib & R Interoperability
–PivotalR
–PL/R
 Pivotal & MADlib & Python Interoperability
–pyMADlib
–PL/Python
MADlib
 MAD stands for: Magnetic, Agile, Deep
 lib stands for a library of:
– advanced (mathematical, statistical, machine learning)
– parallel & scalable
– in-database functions
 Mission: to foster widespread development of scalable analytic skills, by harnessing efforts from commercial practice, academic research, and open-source development
MADlib: A Community Project
• Open Source: BSD License
• Developed as a partnership with multiple universities:
– University of California-Berkeley
– University of Wisconsin-Madison
– University of Florida
• Compatible with Postgres, Greenplum Database, and Hadoop via HAWQ
• Designed for Data Scientists to provide scalable, robust analytics capabilities for their business problems
• Homepage: http://madlib.net
• Documentation: http://doc.madlib.net
• Source: https://github.com/madlib
• Forum: http://groups.google.com/group/madlib-user-forum
MADlib: Architecture (layers, top to bottom)
• Machine Learning Algorithms: Generalized Linear Models, Linear Systems, Matrix Factorization
• Core Methods & Support Modules: Random Sampling, Sparse Vectors, Probability Functions, Array Operations, Linear Algebra
• C++ Database Abstraction Layer: Data Type Mapping, Exception Handling, Logging and Reporting, Memory Management, Boost Support
• Database Platform Layer: User Defined Functions, User Defined Types, User Defined Operators, User Defined Aggregates, OLAP Window Functions, OLAP Grouping Sets
MADlib: Diverse User Experience
SQL:
psql> SELECT madlib.linregr_train('abalone',
                                  'abalone_linregr',
                                  'rings',
                                  'array[1,diameter,height]');
psql> SELECT coef, r2 FROM abalone_linregr;
-[ RECORD 1 ]----------------------------------------------
coef | {2.39392531944631,11.7085575219689,19.8117069108094}
r2   | 0.350379630701758

Python (pyMADlib):
from pymadlib.pymadlib import *
conn = DBConnect()
mdl = LinearRegression(conn)
mdl.train(input_table, indepvars, depvar)
cursor = mdl.predict(input_table, depvar)
scatterPlot(actual, predicted, dataset)

Also accessible from R (PivotalR) and from Open Chorus.
MADlib In-Database Functions
Descriptive Statistics
• Sketch-based Estimators: CountMin (Cormode-Muthukrishnan), FM (Flajolet-Martin), MFV (Most Frequent Values)
• Correlation
• Summary

Predictive Modeling Library
Generalized Linear Models:
• Linear Regression
• Logistic Regression
• Multinomial Logistic Regression
• Cox Proportional Hazards Regression
• Elastic Net Regularization
• Sandwich Estimators (Huber-White, clustered, marginal effects)
Matrix Factorization:
• Singular Value Decomposition (SVD)
• Low-Rank
Machine Learning Algorithms:
• Principal Component Analysis (PCA)
• Association Rules (Affinity Analysis, Market Basket)
• Topic Modeling (Parallel LDA)
• Decision Trees
• Ensemble Learners (Random Forests)
• Support Vector Machines
• Conditional Random Field (CRF)
• Clustering (K-means)
• Cross Validation
Linear Systems:
• Sparse and Dense Solvers

Support Modules
• Array Operations
• Sparse Vectors
• Random Sampling
• Probability Functions
Calling MADlib Functions: Fast Training, Scoring
 MADlib allows users to easily create models without moving data out of the system:
– Model generation
– Model validation
– Scoring (evaluation of) new data
 All the data can be used in one model
 Built-in functionality to create multiple smaller models (e.g. regression/classification grouped by feature)
 Open source lets you tweak and extend methods, or build your own

MADlib model function:
SELECT madlib.linregr_train( 'houses',                    -- table containing training data
                             'houses_linregr',            -- table in which to save results
                             'price',                     -- column containing dependent variable
                             'ARRAY[1, tax, bath, size]', -- features included in the model
                             'bedroom');                  -- create multiple output models (one for each value of bedroom)
Calling MADlib Functions: Fast Training, Scoring
Train a single model, then score new data with it:
SELECT madlib.linregr_train( 'houses',
                             'houses_linregr',
                             'price',
                             'ARRAY[1, tax, bath, size]');

-- MADlib model scoring function
SELECT houses.*,
       madlib.linregr_predict(ARRAY[1, tax, bath, size],
                              m.coef) AS predict
FROM houses,            -- table with data to be scored
     houses_linregr m;  -- table containing model
K-Means Clustering
Clustering refers to the problem of partitioning a set of
objects according to some problem-dependent
measure of similarity. In the k-means variant, given n
points x1,…,xn∈ℝd, the goal is to position k centroids
c1,…,ck∈ℝd so that the sum of distances between
each point and its closest centroid is minimized. Each
centroid represents a cluster that consists of all points
to which this centroid is closest.
So, we are trying to find the centroids which minimize
the total distance between all the points and the
centroids.
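In symbols, a standard way to write this objective (using the squared Euclidean distance, which also matches MADlib's usual default distance function):

\min_{c_1,\dots,c_k \in \mathbb{R}^d} \; \sum_{i=1}^{n} \min_{1 \le j \le k} \lVert x_i - c_j \rVert^2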
K-means Clustering
Example Use Cases:
 Which blogs are spam blogs?
 Given a user's preferences, which other blog might she/he enjoy?
 What are our customers saying about us?

What are our customers saying about us?
 Discern trends and categories in on-line conversations:
- Search for relevant blogs
- 'Fingerprinting' based on word frequencies
- Similarity measure
- Identify 'clusters' of documents
What are our customers saying about us?
Method
• Construct document histograms
• Transform histograms into document “fingerprints”
• Use clustering techniques to discover similar
documents.
What are our customers saying about us?
Constructing document histograms
 Parsing & extracting HTML files
 Using natural language processing for tokenization and stemming
 Cleansing inconsistencies
 Transforming unstructured data into structured data
What are our customers saying about us?
"Fingerprinting"
- Term frequency of words within a document vs. the frequency that those words occur in all documents
- Term frequency-inverse document frequency (tf-idf weight)
- Easily calculated based on formulas over the document histograms (a sketch follows)
- The result is a vector in n-dimensional Euclidean space
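A minimal SQL sketch of the tf-idf calculation, assuming a hypothetical table term_counts(doc_id, term, cnt) built from the document histograms (the table and column names are illustrative, not from the talk):

WITH doc_len AS (                      -- total terms per document
    SELECT doc_id, sum(cnt) AS len
    FROM term_counts
    GROUP BY doc_id
), df AS (                             -- number of documents containing each term
    SELECT term, count(DISTINCT doc_id) AS n_docs_with_term
    FROM term_counts
    GROUP BY term
), n AS (                              -- total number of documents
    SELECT count(DISTINCT doc_id) AS n_docs FROM term_counts
)
SELECT t.doc_id,
       t.term,
       (t.cnt::float8 / d.len)                               -- term frequency
       * ln(n.n_docs::float8 / df.n_docs_with_term) AS tfidf -- inverse document frequency
FROM term_counts t
JOIN doc_len d USING (doc_id)
JOIN df USING (term)
CROSS JOIN n;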
K-Means Clustering – Training Function
The k-means algorithm can be invoked in four ways, depending on the
source of the initial set of centroids:
1. Use the random centroid seeding method.
2. Use the kmeans++ centroid seeding method.
3. Supply an initial centroid set in a relation identified by the rel_initial_centroids
argument.
4. Provide an initial centroid set as an array expression in the initial_centroids
argument.
Random Centroid seeding method
kmeans_random( rel_source,
expr_point,
k,
fn_dist,
agg_centroid,
max_num_iterations,
min_frac_reassigned
)
Kmeans++ centroid seeding method
kmeanspp( rel_source,
expr_point,
k,
fn_dist,
agg_centroid,
max_num_iterations,
min_frac_reassigned
)
Initial Centroid set in a relation
kmeans( rel_source,
expr_point,
rel_initial_centroids, -- this is the relation
expr_centroid,
fn_dist,
agg_centroid,
max_num_iterations,
min_frac_reassigned
)
Initial centroid as an array
kmeans( rel_source,
expr_point,
initial_centroids, -- this is the array
fn_dist,
agg_centroid,
max_num_iterations,
min_frac_reassigned
)
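For instance, a hedged invocation of the kmeans++ seeding variant, assuming a hypothetical table doc_fingerprints(doc_id, features) whose features column holds the tf-idf vectors (the distance function and centroid aggregate shown are MADlib's usual choices):

SELECT * FROM madlib.kmeanspp(
    'doc_fingerprints',           -- rel_source
    'features',                   -- expr_point (a float8[] column)
    10,                           -- k: number of centroids
    'madlib.squared_dist_norm2',  -- fn_dist: squared Euclidean distance
    'madlib.avg',                 -- agg_centroid
    20,                           -- max_num_iterations
    0.001                         -- min_frac_reassigned
);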
K-Means Clustering – Cluster Assignment
After training, the cluster assignment for each data point can be computed with the help of the following function:
closest_column( m, x )
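A hedged sketch continuing the hypothetical example above: persist the k-means output, then label each point with the id of its closest centroid (madlib.closest_column returns a composite value with a column_id field):

CREATE TABLE km_model AS
SELECT * FROM madlib.kmeanspp(
    'doc_fingerprints', 'features', 10,
    'madlib.squared_dist_norm2', 'madlib.avg', 20, 0.001);

SELECT f.doc_id,
       (madlib.closest_column(m.centroids, f.features)).column_id AS cluster_id
FROM doc_fingerprints f, km_model m;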
Assessing the quality of the clustering
A popular method to assess the quality of the clustering is the silhouette
coefficient, a simplified version of which is provided as part of the k-means
module. Note that for large data sets, this computation is expensive.
The silhouette function has the following syntax:
simple_silhouette( rel_source,
expr_point,
centroids,
fn_dist
)
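Continuing the same hypothetical example, the coefficient could be computed as:

SELECT * FROM madlib.simple_silhouette(
    'doc_fingerprints',                 -- rel_source
    'features',                         -- expr_point
    (SELECT centroids FROM km_model),   -- centroids
    'madlib.squared_dist_norm2'         -- fn_dist
);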
What are our customers saying about us?
Example clusters of terms discovered:
 innovation
 leader
 design
• bug
• installation
• speed
• graphics
• improvement
• download
Pivotal, MADlib, R, and Python
 Pivotal & MADlib & R Interoperability
–PivotalR
–PL/R
 Pivotal & MADlib & Python Interoperability
–pyMADlib
–PL/Python
Pivotal & R Interoperability
 In a traditional analytics workflow using R:
–Datasets are transferred from a data source
–Modeled or visualized
–Model scoring results are pushed back to the data source
 Such an approach works well when:
–The amount of data can be loaded into memory, and
–The transfer of large amounts of data is inexpensive and/or fast
 PivotalR explores the situation involving large data sets where
these two assumptions are violated and you have an R
background
Enter PivotalR
 Challenge
Want to harness the familiarity of R’s interface and the performance &
scalability benefits of in-DB analytics
 Simple solution:
Translate R code into SQL
PivotalR Design Overview
1. R -> SQL: PivotalR translates R code into SQL (no data on the R side)
2. SQL to execute is shipped to the database via RPostgreSQL
3. Computation results come back to R
Data lives in the Database/Hadoop w/ MADlib and never has to move to R.
PivotalR Design Overview
 Call MADlib’s in-database machine learning functions
directly from R
 Syntax is analogous to native R functions – for example,
madlib.lm() mimics the syntax of the native lm() function
 Data does not need to leave the database
 All heavy lifting, including model estimation & computation,
is done in the database
PivotalR Design Overview
 Manipulate database tables directly from R without needing
to be familiar with SQL
 Perform the equivalent of SQL’s ‘select’ statements
(including joins) in a syntax that is similar to R’s
data.frame operations
 For example: R's 'merge' corresponds to SQL's 'join'
PivotalR: Current Functionality
MADlib Functionality:
• Linear Regression
• Logistic Regression
• Elastic Net
• ARIMA
• Marginal Effects
• Cross Validation
• Bagging
• summary on model objects
• Automated Indicator Variable Coding: as.factor
• predict
And more (SQL wrapper):
• Arithmetic: + - * / %% %/% ^
• Subsetting and assignment: $ [ [[ $<- [<- [[<-
• Comparison: == != > < >= <=
• Logical: & | !
• merge, by, sort, preview, content, is.na
• db.data.frame, as.db.data.frame
• Aggregates: c mean sum sd var min max length colMeans colSums
• dim, names
• Connection management: db.connect, db.disconnect, db.list, db.objects, db.existsObject, delete
http://github.com/gopivotal/PivotalR/
PivotalR Example
 Load the PivotalR package
– > library('PivotalR')
 Get help for a function
– > help(db.connect)
 Connect to a database
– > db.connect(host = "dca.abc.com", user = "student01", dbname = "studentdb", password = "studentpw", port = 5432, madlib = "madlib", conn.pkg = "RPostgreSQL", default.schemas = NULL)
 List connections
– > db.list()

PivotalR Example
 Connect to a table via the db.data.frame function (note that the data remains in the database and is not loaded into memory)
– > y <- db.data.frame("test.abalone", conn.id = 1, key = character(0), verbose = TRUE, is.temp = FALSE)
 Fit a linear regression model (one model for each gender) and display it
– > fit <- madlib.lm(rings ~ . - id | sex, data = y)
– > fit # view the result
 Apply the model to data in another table (i.e. x) and compute the mean squared error
– > lookat(mean((x$rings - predict(fit, x))^2))
PivotalR
 PivotalR is an R package you can download from CRAN.
- http://cran.r-project.org/web/packages/PivotalR/index.html
- Using RStudio, you can install it with: install.packages("PivotalR")
 GitHub has the latest, greatest code and features but is less stable.
- https://github.com/gopivotal/PivotalR
 R front end to PostgreSQL and PostgreSQL-based databases.
 R wrapper around MADlib, the open source library for in-database scalable analytics.
 Mimics regular R syntax for manipulating R's "data.frame".
 Provides R functionality for Big Data stored in-database or in Apache Hadoop.
 Demo code: https://github.com/gopivotal/PivotalR/wiki/Example
 Training Video: https://docs.google.com/file/d/0B9bfZ-YiuzxQc1RWTEJJZ2V1TWc/edit
Pivotal, MADlib, R, and Python
 Pivotal & MADlib & R Interoperability
–PivotalR
–PL/R
 Pivotal & MADlib & Python Interoperability
–pyMADlib
–PL/Python
PL/R on Pivotal
 Procedural Language (PL/X)
– X includes R, Python, pgSQL, Java, Perl, C, etc.
– needs to be installed on each database
 PL/R enables you to write PostgreSQL (and Greenplum DB) functions in the R language
 R is installed on each segment of the Pivotal cluster
 Parsimonious: R piggy-backs on Pivotal's parallel architecture
 Minimizes data movement
PL/R on Pivotal
 Allows most of R's capabilities. Basic guide: "PostgreSQL Functions by Example" http://www.joeconway.com/presentations/function_basics.pdf
 In PostgreSQL and GPDB/PHD, check which PL languages are installed in the database:

select * from pg_language;
  lanname  | lanispl | lanpltrusted | lanplcallfoid | lanvalidator | lanacl
-----------+---------+--------------+---------------+--------------+--------
 internal  | f       | f            |             0 |         2246 |
 c         | f       | f            |             0 |         2247 |
 sql       | f       | t            |             0 |         2248 |
 plpgsql   | t       | t            |         10885 |        10886 |
 plpythonu | t       | f            |         16386 |            0 |
 plr       | t       | f            |         18975 |            0 |
(6 rows)

 PL/R is an "untrusted" language: only database superusers have the ability to create UDFs with PL/R (see the "lanpltrusted" column in the pg_language table)
PL/R Example
 Consider the census dataset below (each row represents an individual):
– h_state = integer encoding which state they live in
– earns = their income
– hours = how many hours per week they work
– ... and other features
 Suppose we want to build a model of income for each state separately: SQL goes in, one model per state comes out
PL/R Example
 Prepare table for PL/R by converting it into array form
-- Create array version of table
DROP TABLE IF EXISTS use_r.census1_array_state;
CREATE TABLE use_r.census1_array_state AS(
SELECT
h_state::text h_state,
array_agg(h_serialno::float8) h_serialno, array_agg(earns::float8) earns,
array_agg(hours::float8) hours, array_agg((earns/hours)::float8) wage,
array_agg(hsdeg::float8) hsdeg, array_agg(somecol::float8) somecol,
array_agg(associate::float8) associate, array_agg(bachelor::float8) bachelor,
array_agg(masters::float8) masters, array_agg(professional::float8) professional,
array_agg(doctorate::float8) doctorate, array_agg(female::float8) female,
array_agg(rooms::float8) rooms, array_agg(bedrms::float8) bedrms,
array_agg(notcitizen::float8) notcitizen, array_agg(rentshouse::float8) rentshouse,
array_agg(married::float8) married
FROM use_r.census1
GROUP BY h_state
) DISTRIBUTED BY (h_state);
PL/R Example
Each state's data (TN, CA, NY, PA, TX, CT, NJ, IL, MA, WA, ...) is processed in parallel with SQL & R, producing one model per state (TN Model, CA Model, NY Model, ...).
PL/R Example
 Run linear regression to predict income in each state
– Define the output data type
– Create the PL/R function: a SQL wrapper around a function body written in R (a hedged sketch follows)
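The slide's code is not reproduced in this transcript; the following is a minimal sketch of the pattern under the assumptions above (the type name, function name, and choice of three census1_array_state columns are illustrative):

-- Output type: one row per regression coefficient.
CREATE TYPE lm_state_type AS (
    variable text,
    coef     float8,
    stderr   float8,
    tstat    float8
);

-- PL/R function: the body between $$ ... $$ is ordinary R.
CREATE OR REPLACE FUNCTION lm_by_state(earns float8[],
                                       hours float8[],
                                       bachelor float8[])
RETURNS SETOF lm_state_type
AS $$
    df  <- data.frame(earns = earns, hours = hours, bachelor = bachelor)
    fit <- lm(earns ~ hours + bachelor, data = df)
    s   <- summary(fit)$coefficients
    data.frame(variable = rownames(s),
               coef     = s[, 1],
               stderr   = s[, 2],
               tstat    = s[, 3])
$$ LANGUAGE 'plr';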
PL/R Example
 Execute PL/R function
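Again as a hedged sketch, executing it over the array table built earlier fits one model per state, in parallel:

SELECT h_state,
       (lm_by_state(earns, hours, bachelor)).*
FROM use_r.census1_array_state;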
PL/R
 PL/R is not installed in the VM: you have to download the source and compile it
 Instructions can be found here: http://www.joeconway.com/plr/doc/plr-install.html
 CHALLENGE: Download PL/R, compile it, and install it in PostgreSQL
Pivotal, MADlib, R, and Python
 Pivotal & MADlib & R Interoperability
–PivotalR
–PL/R
 Pivotal & MADlib & Python Interoperability
–pyMADlib
–PL/Python
Pivotal & Python Interoperability
 In a traditional analytics workflow using Python:
–Datasets are transferred from a data source
–Modeled or visualized
–Model scoring results are pushed back to the data source
 Such an approach works well when:
–The amount of data can be loaded into memory, and
–The transfer of large amounts of data is inexpensive and/or fast
 pyMADlib explores the situation involving large data sets where
these two assumptions are violated and you have a Python
background
Enter pyMADlib
 Challenge
Want to harness the familiarity of Python’s interface and the
performance & scalability benefits of in-DB analytics
 Simple solution:
Translate Python code into SQL
pyMADlib Design Overview
1. Python -> SQL: pyMADlib translates Python calls into SQL (no data on the Python side)
2. SQL to execute is shipped to the database over ODBC/JDBC
3. Computation results and model output come back to Python
Data lives in the Database/Hadoop w/ MADlib.
 All data stays in the DB, and all model estimation and heavy lifting are done in the DB by MADlib
 Only strings of SQL and model output are transferred across ODBC/JDBC
 Best of both worlds: the number-crunching power of MADlib along with the rich set of visualizations of Matplotlib, NetworkX, and all your other favorite Python libraries. Let MADlib do all the heavy lifting on your Greenplum/PostgreSQL database, while you program in your favorite language: Python.
Hands-on Exploration
PyMADlib Tutorial – IPython Notebook Viewer Link:
http://nbviewer.ipython.org/5275846
Where do I get it?
$ pip install pymadlib
Pivotal, MADlib, R, and Python
 Pivotal & MADlib & R Interoperability
–PivotalR
–PL/R
 Pivotal & MADlib & Python Interoperability
–pyMADlib
–PL/Python
PL/Python on Pivotal
 Syntax is like a normal Python function, with the function definition line replaced by a SQL wrapper
 Alternatively: like a SQL User Defined Function with Python inside
 Name in SQL is plpythonu
– 'u' means untrusted, so you need to be a superuser to create functions

CREATE FUNCTION pymax (a integer, b integer)
RETURNS integer
AS $$
    # normal Python inside the SQL wrapper
    if a > b:
        return a
    return b
$$ LANGUAGE plpythonu;
Returning Results
 Postgres primitive types (int, bigint, text, float8, double precision, date, NULL, etc.)
 Composite types can be returned by creating a composite type in the database:

CREATE TYPE named_value AS (
    name text,
    value integer
);

 Then you can return a list, tuple or dict (not sets) which references the same structure as the type:

CREATE FUNCTION make_pair (name text, value integer)
RETURNS named_value
AS $$
    return [ name, value ]
    # or alternatively, as tuple: return ( name, value )
    # or as dict: return { "name": name, "value": value }
    # or as an object with attributes .name and .value
$$ LANGUAGE plpythonu;

 For functions which return multiple rows, prefix "setof" before the return type
Returning more results
You can return multiple results by wrapping them in a sequence (tuple, list or set), an iterator, or a generator:

Sequence:
CREATE FUNCTION make_pair (name text)
RETURNS SETOF named_value
AS $$
    return ([ name, 1 ], [ name, 2 ], [ name, 3 ])
$$ LANGUAGE plpythonu;

Generator:
CREATE FUNCTION make_pair (name text)
RETURNS SETOF named_value AS $$
    for i in range(3):
        yield (name, i)
$$ LANGUAGE plpythonu;
Accessing Packages
 In an MPP environment: to be available, packages must be installed on every individual segment node.
– Can use a "parallel ssh" tool such as gpssh to conda/pip install them
 Then just import as usual inside the function:

CREATE FUNCTION make_pair (name text)
RETURNS SETOF named_value
AS $$
    import numpy as np
    return ((name, i) for i in np.arange(3))
$$ LANGUAGE plpythonu;
Benefits of PL/Python
 Easy to bring your code to the data
 When SQL falls short, leverage your Python (or R/Java/C) experience quickly
 Apply Python across terabytes of data with minimal
overhead or additional requirements
 Results are already in the database system, ready for
further analysis or storage
Spring
What it is: an application framework, introduced as open source in 2003
Intention: build enterprise-class Java applications more easily
Outcomes:
1. Streamlined architecture, speeding application development by 2x and accelerating time to value
2. Portable, since Spring applications are identical for every platform and run across multiple app servers
Spring Ecosystem
http://spring.io
• XD: Stream, Taps, Jobs (http://projects.spring.io/spring-xd/)
• BOOT: Bootable, Minimal, Ops-Ready
• GRAILS: Full-stack, Web
• INTEGRATION: Channels, Adapters, Filters, Transformers
• BATCH: Jobs, Steps, Readers, Writers
• BIG DATA: Ingestion, Export, Orchestration, Hadoop
• WEB: Controllers, REST, WebSocket
• DATA: Relational, Non-Relational (http://projects.spring.io/spring-data/)
• CORE FRAMEWORK, plus SECURITY, GROOVY, REACTOR
Spring XD - Tackling Big Data Complexity
 One stop shop for a Big Data programming model:
– Data Ingestion
– Real-time Analytics
– Workflow Orchestration
– Data Export
 XD = 'eXtreme Data'
 Built on existing Spring assets: Spring Integration, Batch, Data
Sources (Files, Social, Mobile, Sensor, ...) flow through Ingest, Analytics, Compute, and Workflow to Export sinks (Redis, Gemfire, Jobs, RDBMS, HDFS, OLAP, ...).
Groovy and Grails
 Dynamic Language for the JVM
 Inspired by Smalltalk, Python, and Ruby
 Integrated with the Java language & platform at every level
"Cloud"
 Means many things to many people
 Distributed applications accessible over a network
 Typically, but not necessarily, the Internet
 An application and/or its platform
 Resources on demand
 Inherently virtualized
 Can run in-house (private cloud) as well
 Hardware and/or software sold as a commodity
Pivotal Speakers at OSCON 2014
10:40am Tuesday, Global Scaling at the New York Times using RabbitMQ, F150
Alvaro Videla (RabbitMQ), Michael Laing (New York Times)
Cloud
11:30am Tuesday, The Full Stack Java Developer, D136
Joshua Long (Pivotal), Phil Webb (Pivotal)
Java & JVM | JavaScript - HTML5 - Web
1:40pm Tuesday, A Recovering Java Developer Learns to Go, E142
Matt Stine (Pivotal)
Emerging Languages | Java & JVM
Pivotal Speakers at OSCON 2014
2:30pm Tuesday, Unicorns, Dragons, Open Source Business Models And Other Mythical
Creatures, PORTLAND BALLROOM, Main Stage
Andrew Clay Shafer (Pivotal)
11:30am Wednesday, Building a Recommendation Engine with Spring and Hadoop, D136
Michael Minella (Pivotal)
Java & JVM
1:40pm Wednesday, Apache Spark: A Killer or Savior of Apache Hadoop?, E143
Roman Shaposhnik (Pivotal)
Sponsored Sessions
Pivotal Speakers at OSCON 2014
10:00am Thursday, Developing Micro-services with Java and Spring, D139/140
Phil Webb (Pivotal)
Java & JVM | Tools & Techniques
11:00am Thursday, Apache HTTP Server; SSL from End-to-End, D136
William A Rowe Jr (Pivotal)
Security
Data Science At Pivotal
 Drive business value by operationalizing Data Science models using a combination of our
commercial software (based on open source) and open source software.
 Open Source is at the core of what we do
Thank You!
Kevin Crocker
[email protected]
@_K_C_Pivotal
Data Science Education Lead
BUILT FOR THE SPEED OF BUSINESS