Download File

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Open Database Connectivity wikipedia , lookup

Concurrency control wikipedia , lookup

Microsoft Jet Database Engine wikipedia , lookup

Database wikipedia , lookup

PL/SQL wikipedia , lookup

Relational model wikipedia , lookup

Clusterpoint wikipedia , lookup

Database model wikipedia , lookup

Oracle Database wikipedia , lookup

Transcript
StubHub & Oracle
Advanced Analytics
Brian Motzer, Senior Database Administrator
StubHub
Brian Motzer, [email protected]
 Senior Database Administrator, StubHub
 21 years of Oracle DBA experience specializing in Performance Tuning and
Architecture ( DigitalRiver.com, BestBuy HQ )
2
StubHub, an ebay company
The world's largest fan-to-fan ticket marketplace
StubHub in numbers
 38 Million Registered Users
 8.9 Million Monthly Unique
Visitors
 216 Million Monthly Page Views
 2.5 Million Tickets Sold in 2012
 $2.57 Billion Ticketing Sales in
RIP 2012-2015
2012
$3k-335K
3
Before Oracle Analytics we had this…
4
We needed this…POC proved this.
R
5
Attempted fraud reduction of 95%!!
6
What do Kim K. and servers have in common?
7
15M txns sampled !
=
Data Warehouse Specs
Hardware and OS
 Sun SPARC T5-2
 1000G of RAM
 256 processors
Oracle Software and Tools








8
Oracle Database Enterprise Edition11gR203
Partitioning
Oracle Auto DOP
Oracle Active Data Guard 11gR203
Flash Back Database
ADG Snapshot
Oracle R Enterprise
Informatica
DEFAULT = 2xcpu*threads
Data Warehouse stats/size
 Informatica is ETL being used.
 Total DW Database Size 20TB with 8 years of history
 700 ETL jobs as part of the daily batch load, completes in 4-5 Hours ( was 15-18 per auto DOP )
 160 MM Records / 40GB of raw data from 25+ data sources loaded daily
 300 plus targets (Dimensions + Facts + Aggregates + Snapshots)
 225 BO Users and 300+ AD-Hoc Users
9
Oracle “R” statisitics
 Model is executed ~2x/second.
 1.25 seconds per execution ( needs enhancing on the clob sys.rq$script read )
 Model is about 500K in size with ~3000 lines of code.
 Stored procedure has 1500 lines of code
 75 external connections started/waiting
 Initial extproc spawn is 15-20 seconds, prespawn to fix this ( Mark Hornick )?
10
Stubhub R Architecture
JAVA
ECOM
11
public
oracle
Model A
procedure
( A)
90%
(prod)
synonym
30 tables
GTT
population
oracle
Model B
procedure
( B)
10%
(beta)
Birth of an R model
ecommerce
RAC cluster
model
imported into
ecom
Informatica Daily ETL
DW
primary
Active Data Guard
DW
standby
model
exported
Oracle R
database
model trained
12
DB link
Homegrown scripts
 drop_r_objects.sh
– unix command line interface for dropping models.
– Makes dropping of “live” models/functions impossible as it checks to see what’s “live” and will bail out.
 export_import_data_model.sh
– command line export/import of models/functions to/from a database.
– This allows any dba to step in and export a model/function from the data warehouse, migrate it the ecommerce
database and import it. Checks for issues on import/export.
13
Homegrown scripts
14
Calling Model from SP
SELECT * FROM
TABLE (rqEval ( CURSOR (
SELECT * FROM GTT_SCORE_MODEL_ATTRIB ),
q '[SELECT 1 in_TRANSACTION_ID, 1 score,
cast(' ab ' as varchar2(50)) MODEL_VERSION
FROM sys.dual]','FARE_RFUN_ModelScoring_V3'))
15
Issues along the way….
16
Issues encountered
 Unable to create synonyms on models, ended up going with stored procedure.
 Use GTT to pass 55+ variables into the model ( working on fix to not use rqEval )
 The function call seems to be a select of sys.rq$script ( clob field ) for each and every call which is
very expensive, need to make that a call to a binary object to speed it up.
 LD_LIBRARY_PATH, the oracle listener and RAC. If this isn’t correct then R cant find the CRAN
packages that are installed and fails. The srvctl LD_LIBRARY_PATH setting is typically different than
unix level environment.
–
–
–
17
srvctl config fix
$ORACLE_HOME/hs/admin/extproc.ora hardcode of LD_LIBRARY_PATH
Bounce listener from the cmd line
Issues encountered
 For hundreds of development databases/machines it can be overwhelming to ensure that all the
CRAN packages are installed at the OS level/Oracle home. Can they all be stored inside of the
database instead of at the OS level?
 Sun Solaris Studio had to be installed on development database machines to allow compilation of
outside CRAN packages.
 Packages ( cran etc. ) aren’t always available on your platform ( Solaris for example ). We’ve had
Oracle support compile and distribute these to back to us.
 Need more documentation on debugging/tracing model calls. Are there any examples out there, was
talked about by support but never provided. We can trace Oracle sql calls but the model execution
processing is a block box.
18
Issues encountered ( cont’d )
 So far no GUI tools for manipulating/modify models/functions. Difficult for right-mouse-clicky DBA’s to
drop/create models easily. Anything Coming for SQL developer/TOAD?
 ensure cursor_sharing is set to EXACT for R to run, otherwise you'll see errors in ore.connect such as
"ORA-00904 invalid identifier“. What about db’s that need cursor_sharing = FORCE for lack of bind
variables?
19
Why Oracle Advanced Analytics?
 Highly Scalable and Performant
– Fully utilize the existing data warehouse infrastructure, dedicated Primary database for ETL (real + batch ) load
and offload Operation/Ad-hoc reports , ORE workload to Standby database - ADG to fully meet our daily DW
operation SLAs, data scientists data analyze as well as Stubhub business growth needs.
 High Availability, Enterprise & DR Ready
– With Oracle Active Data Guard , the current DW and ORE ready for the planned and unplanned downtime and
disaster recovery
 Enabling Big Data Analysis
– Due to resource ( CPU + Memory + Network ) its now possible for Stubhub to use Oracle R to perform data
analysis against large data volumes.
 Security Control , Access Control
– Take full advantage of Oracle database advance security future for big data analysts
20
Why Oracle Advanced Analytics?
 Availability of tight integration of open source R with Oracle database
 DW capacity allows us to build several such concurrent models simultaneously allowing
us to fine tune our models to precision
 ORE allows open source R packages to be readily deployed and leveraged on
database data
 ORE allows SQL access to the models
21
22