* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download File
Survey
Document related concepts
Transcript
StubHub & Oracle Advanced Analytics Brian Motzer, Senior Database Administrator StubHub Brian Motzer, [email protected] Senior Database Administrator, StubHub 21 years of Oracle DBA experience specializing in Performance Tuning and Architecture ( DigitalRiver.com, BestBuy HQ ) 2 StubHub, an ebay company The world's largest fan-to-fan ticket marketplace StubHub in numbers 38 Million Registered Users 8.9 Million Monthly Unique Visitors 216 Million Monthly Page Views 2.5 Million Tickets Sold in 2012 $2.57 Billion Ticketing Sales in RIP 2012-2015 2012 $3k-335K 3 Before Oracle Analytics we had this… 4 We needed this…POC proved this. R 5 Attempted fraud reduction of 95%!! 6 What do Kim K. and servers have in common? 7 15M txns sampled ! = Data Warehouse Specs Hardware and OS Sun SPARC T5-2 1000G of RAM 256 processors Oracle Software and Tools 8 Oracle Database Enterprise Edition11gR203 Partitioning Oracle Auto DOP Oracle Active Data Guard 11gR203 Flash Back Database ADG Snapshot Oracle R Enterprise Informatica DEFAULT = 2xcpu*threads Data Warehouse stats/size Informatica is ETL being used. Total DW Database Size 20TB with 8 years of history 700 ETL jobs as part of the daily batch load, completes in 4-5 Hours ( was 15-18 per auto DOP ) 160 MM Records / 40GB of raw data from 25+ data sources loaded daily 300 plus targets (Dimensions + Facts + Aggregates + Snapshots) 225 BO Users and 300+ AD-Hoc Users 9 Oracle “R” statisitics Model is executed ~2x/second. 1.25 seconds per execution ( needs enhancing on the clob sys.rq$script read ) Model is about 500K in size with ~3000 lines of code. Stored procedure has 1500 lines of code 75 external connections started/waiting Initial extproc spawn is 15-20 seconds, prespawn to fix this ( Mark Hornick )? 10 Stubhub R Architecture JAVA ECOM 11 public oracle Model A procedure ( A) 90% (prod) synonym 30 tables GTT population oracle Model B procedure ( B) 10% (beta) Birth of an R model ecommerce RAC cluster model imported into ecom Informatica Daily ETL DW primary Active Data Guard DW standby model exported Oracle R database model trained 12 DB link Homegrown scripts drop_r_objects.sh – unix command line interface for dropping models. – Makes dropping of “live” models/functions impossible as it checks to see what’s “live” and will bail out. export_import_data_model.sh – command line export/import of models/functions to/from a database. – This allows any dba to step in and export a model/function from the data warehouse, migrate it the ecommerce database and import it. Checks for issues on import/export. 13 Homegrown scripts 14 Calling Model from SP SELECT * FROM TABLE (rqEval ( CURSOR ( SELECT * FROM GTT_SCORE_MODEL_ATTRIB ), q '[SELECT 1 in_TRANSACTION_ID, 1 score, cast(' ab ' as varchar2(50)) MODEL_VERSION FROM sys.dual]','FARE_RFUN_ModelScoring_V3')) 15 Issues along the way…. 16 Issues encountered Unable to create synonyms on models, ended up going with stored procedure. Use GTT to pass 55+ variables into the model ( working on fix to not use rqEval ) The function call seems to be a select of sys.rq$script ( clob field ) for each and every call which is very expensive, need to make that a call to a binary object to speed it up. LD_LIBRARY_PATH, the oracle listener and RAC. If this isn’t correct then R cant find the CRAN packages that are installed and fails. The srvctl LD_LIBRARY_PATH setting is typically different than unix level environment. – – – 17 srvctl config fix $ORACLE_HOME/hs/admin/extproc.ora hardcode of LD_LIBRARY_PATH Bounce listener from the cmd line Issues encountered For hundreds of development databases/machines it can be overwhelming to ensure that all the CRAN packages are installed at the OS level/Oracle home. Can they all be stored inside of the database instead of at the OS level? Sun Solaris Studio had to be installed on development database machines to allow compilation of outside CRAN packages. Packages ( cran etc. ) aren’t always available on your platform ( Solaris for example ). We’ve had Oracle support compile and distribute these to back to us. Need more documentation on debugging/tracing model calls. Are there any examples out there, was talked about by support but never provided. We can trace Oracle sql calls but the model execution processing is a block box. 18 Issues encountered ( cont’d ) So far no GUI tools for manipulating/modify models/functions. Difficult for right-mouse-clicky DBA’s to drop/create models easily. Anything Coming for SQL developer/TOAD? ensure cursor_sharing is set to EXACT for R to run, otherwise you'll see errors in ore.connect such as "ORA-00904 invalid identifier“. What about db’s that need cursor_sharing = FORCE for lack of bind variables? 19 Why Oracle Advanced Analytics? Highly Scalable and Performant – Fully utilize the existing data warehouse infrastructure, dedicated Primary database for ETL (real + batch ) load and offload Operation/Ad-hoc reports , ORE workload to Standby database - ADG to fully meet our daily DW operation SLAs, data scientists data analyze as well as Stubhub business growth needs. High Availability, Enterprise & DR Ready – With Oracle Active Data Guard , the current DW and ORE ready for the planned and unplanned downtime and disaster recovery Enabling Big Data Analysis – Due to resource ( CPU + Memory + Network ) its now possible for Stubhub to use Oracle R to perform data analysis against large data volumes. Security Control , Access Control – Take full advantage of Oracle database advance security future for big data analysts 20 Why Oracle Advanced Analytics? Availability of tight integration of open source R with Oracle database DW capacity allows us to build several such concurrent models simultaneously allowing us to fine tune our models to precision ORE allows open source R packages to be readily deployed and leveraged on database data ORE allows SQL access to the models 21 22