Oracle Data Warehouse: Opening New Horizons for the Data Warehouse with Big Data
Alfred Schlaucher, Detlef Schroeder

Topics
• Big Data: buzzword, or a new dimension and new possibilities?
• Oracle's technology for storing unstructured and semi-structured mass data
• The Cloudera framework
• "Connectors" into the new world: Oracle Loader for Hadoop and HDFS
• Big Data Appliance
• Discovering new analysis horizons with Oracle R Enterprise
• Big Data analysis with Endeca

Hive
• Hive is an abstraction on top of MapReduce
• Allows users to query data in the Hadoop cluster without knowing Java or MapReduce
• Uses the HiveQL language, which is very similar to SQL
• The Hive interpreter runs on a client machine
  • Turns HiveQL queries into MapReduce jobs
  • Submits those jobs to the cluster
• Note: this does not turn the cluster into a relational database server!
  • It is still simply running MapReduce jobs
  • Those jobs are created by the Hive interpreter

Hive (cont'd)
• Sample Hive query:

  SELECT stock.product, SUM(orders.purchases)
  FROM stock
  INNER JOIN orders ON (stock.id = orders.stock_id)
  WHERE orders.quarter = 'Q1'
  GROUP BY stock.product;

Pig
• Pig is an alternative abstraction on top of MapReduce
• Uses a dataflow scripting language called Pig Latin
• The Pig interpreter runs on the client machine
  • Takes the Pig Latin script and turns it into a series of MapReduce jobs
  • Submits those jobs to the cluster
• As with Hive, nothing 'magical' happens on the cluster
  • It is still simply running MapReduce jobs

Pig (cont'd)
• Sample Pig script:

  stock  = LOAD '/user/fred/stock' AS (id, item);
  orders = LOAD '/user/fred/orders' AS (id, cost);
  grpd   = GROUP orders BY id;
  totals = FOREACH grpd GENERATE group, SUM(orders.cost) AS t;
  result = JOIN stock BY id, totals BY group;
  DUMP result;

Flume and Sqoop
• Flume provides a method to import data into HDFS as it is generated
  • Rather than batch-processing the data later
  • For example, log files from a web server
• Sqoop provides a method to import data from tables in a relational database into HDFS or Hive
  • Does this very efficiently via a map-only MapReduce job
  • Can also 'go the other way': populate database tables from files in HDFS
  • (A sketch of a programmatic Sqoop import appears at the end of this transcript)

Oozie
• Oozie allows developers to create a workflow of MapReduce jobs
  • Including dependencies between jobs
• The Oozie server submits the jobs to the cluster in the correct sequence
• (A sketch of submitting an Oozie workflow appears at the end of this transcript)

HBase
• HBase is 'the Hadoop database', a 'NoSQL' datastore
• Can store massive amounts of data
  • Gigabytes, terabytes, and even petabytes of data in a table
• Scales to provide very high write throughput
  • Hundreds of thousands of inserts per second
• Copes well with sparse data
  • Tables can have many thousands of columns
  • Even if most columns are empty for any given row
• Has a very constrained access model
  • Insert a row, retrieve a row, do a full or partial table scan
  • Only one column (the 'row key') is indexed
• (A sketch of this access model appears at the end of this transcript)

HBase vs. Traditional RDBMSs

                                  RDBMS                           HBase
  Data layout                     Row-oriented                    Column-oriented
  Transactions                    Yes                             Single row only
  Query language                  SQL                             get/put/scan
  Security                        Authentication/Authorization    TBD
  Indexes                         On arbitrary columns            Row key only
  Max data size                   TBs                             PB+
  Read/write throughput limits    1000s of queries/second         Millions of queries/second

Contact and More Information
• Oracle Data Warehouse Community: become a member
• Many free seminars and events
• Download server: www.ORACLEdwh.de
• Next German-language Oracle DWH conference: 19 + 20 March 2013, Kassel
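Sqoop (example sketch)
A minimal Java sketch of launching a Sqoop 1 import programmatically via Sqoop.runTool, which accepts the same arguments as the sqoop command line. The JDBC URL, username, table, and HDFS path below are hypothetical placeholders, not part of the original slides.

  import org.apache.sqoop.Sqoop;

  public class SqoopImportSketch {
      public static void main(String[] args) {
          // Arguments mirror the sqoop command line; all values are placeholders
          String[] importArgs = {
              "import",
              "--connect", "jdbc:mysql://dbhost/sales",  // source database (hypothetical)
              "--username", "fred",
              "--table", "orders",                       // table to import
              "--target-dir", "/user/fred/orders",       // HDFS destination directory
              "--num-mappers", "4"                       // parallelism of the map-only job
          };
          // runTool parses the arguments and runs the map-only import job
          int exitCode = Sqoop.runTool(importArgs);
          System.exit(exitCode);
      }
  }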
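Oozie (example sketch)
A minimal Java sketch of submitting a workflow through the Oozie client API (org.apache.oozie.client.OozieClient). It assumes a workflow definition (workflow.xml) describing the MapReduce jobs and their dependencies is already deployed in HDFS; the server URL, paths, and property names are hypothetical placeholders.

  import java.util.Properties;
  import org.apache.oozie.client.OozieClient;
  import org.apache.oozie.client.WorkflowJob;

  public class OozieSubmitSketch {
      public static void main(String[] args) throws Exception {
          // Connect to the Oozie server (URL is a placeholder)
          OozieClient oc = new OozieClient("http://oozie-host:11000/oozie");

          // Point the job at the workflow app deployed in HDFS
          Properties conf = oc.createConfiguration();
          conf.setProperty(OozieClient.APP_PATH, "hdfs://namenode:8020/user/fred/my-wf-app");
          conf.setProperty("inputDir", "/user/fred/input");    // parameters the workflow references
          conf.setProperty("outputDir", "/user/fred/output");

          // Submit and start the workflow; the Oozie server runs the jobs in order
          String jobId = oc.run(conf);

          // Poll until the workflow leaves the RUNNING state
          while (oc.getJobInfo(jobId).getStatus() == WorkflowJob.Status.RUNNING) {
              Thread.sleep(10 * 1000);
          }
          System.out.println("Workflow " + jobId + " finished: " + oc.getJobInfo(jobId).getStatus());
      }
  }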
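HBase (example sketch)
A minimal Java sketch of HBase's constrained access model: put a row, get it back by row key, and scan a row-key range. Method names follow the newer Connection/Table client API (HBase 1.x/2.x era, later than this talk); the table, column family, and row keys are hypothetical.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.*;
  import org.apache.hadoop.hbase.util.Bytes;

  public class HBaseAccessSketch {
      public static void main(String[] args) throws Exception {
          Configuration conf = HBaseConfiguration.create();
          try (Connection conn = ConnectionFactory.createConnection(conf);
               Table table = conn.getTable(TableName.valueOf("orders"))) {  // placeholder table

              // Insert a row: the row key is the only indexed value
              Put put = new Put(Bytes.toBytes("order-0001"));
              put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("product"), Bytes.toBytes("widget"));
              table.put(put);

              // Retrieve a single row by its row key
              Get get = new Get(Bytes.toBytes("order-0001"));
              Result row = table.get(get);
              System.out.println(Bytes.toString(
                  row.getValue(Bytes.toBytes("d"), Bytes.toBytes("product"))));

              // Partial table scan over a row-key range
              Scan scan = new Scan().withStartRow(Bytes.toBytes("order-0001"))
                                    .withStopRow(Bytes.toBytes("order-9999"));
              try (ResultScanner scanner = table.getScanner(scan)) {
                  for (Result r : scanner) {
                      System.out.println(Bytes.toString(r.getRow()));
                  }
              }
          }
      }
  }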