WHAT THE MARKET-LEADING DBMS VENDORS DON’T WANT YOU TO KNOW
Disruption is gathering steam

Curt Monash
  Analyst since 1981
    Covered DBMS since the pre-relational days
    Also analytics, search, etc.
  Own firm since 1987
  Publicly available research
    Blogs, including DBMS2 (www.dbms2.com -- the source for most of this talk)
    Feed at www.monash.com/blogs.html
    White papers and more at www.monash.com

Database diversity
  Mike Stonebraker, PhD
    “One size doesn’t fit all”
  Curt Monash, PhD
    “Horses for courses”
    “Database diversity”
  Mike and Curt
    The world needs 9 to 11 different kinds of data management software

The case for grand integrated DBMS
  Theoretical relational model has great advantages
  Actual relational DBMS are versatile and modular
  Software developers have economies of scale
  Vendor consolidation theoretically saves effort and money
  So does database consolidation

The case for database diversity
  Different kinds of data require fundamentally different kinds of data management software
  Putting all that together in one system is extremely hard
  Nobody has ever done it well

Application and use cases
  High-end e-commerce
  100-terabyte analytics
  High-volume call center
  Media-heavy web startup
  Simple departmental application
  General enterprise or SaaS app
  End-user or ISV

Data management distinctions
  Fundamental
    Data manipulation language
    Data access method
  Practical
    Type of data
    Type of hardware
    Administrative burden
    Performance stresses and metrics
  Very practical
    $

Major components of DBMS cost
  License and maintenance
  Hardware, power, facilities
    Mainly for VLDB analytics
  Installation and ongoing administration
    Especially maintenance
    Time-to-benefit is a factor too
  Programming
    Sometimes a differentiator

11 kinds of data management software
  1. High-end OLTP/general-purpose DBMS
  2. Mid-range OLTP/general-purpose DBMS
  3. Row-based analytic RDBMS
  4. Column- or array-based analytic RDBMS
  5. Text search engines
  6. XML and OO DBMS (but these may merge with search)
  7. RDF and other graphical DBMS (but these may merge with relational)
  8. Event/stream processing engines (aka CEP)
  9. Embedded DBMS for devices
  10. Sub-DBMS file managers (e.g. MapReduce/Hadoop)
  11. Science DBMS

High-end OLTP/general-purpose DBMS
  Oracle, DB2, MS SQL Server, et al.
  Amazing throughput and scale-up
  Bullet-proofing
  24/7
  Security certifications
  Datatype extensibility
  Expensive, expensive, expensive

Mid-range OLTP/general-purpose DBMS
  Some are comparable to (or better than) the systems that ran the world in the 1990s
  Three main groups
    Crippled high-end (“Express” editions)
    ISV/VAR-focused (Progress, several nonrelational)
    Open source-based (Postgres, MySQL)
      What does the Postgres family still lack?
  Generally inexpensive

Row-based analytic RDBMS
  Data warehouses should be in separate instances
    But that’s not enough
  Sequential vs. random reads
  MPP vs. SMP
  Teradata, Netezza, DATAllegro

Column- or array-based analytic RDBMS
  Retrieving whole rows carries penalties
  Columnar is better
    I/O
    Optimization
    But not in all use cases
  MOLAP may be superseded

Text search engines
  “85% of all information is in text” …
  … and 16.9% of all statistics are made up out of thin air
  There really are a lot of words out there
  And search interfaces are hugely important
  Text search has its own data access methods
    May play more nicely with columnar than row-based RDBMS
  Watch integrations with other analytic datatypes
    Attivio (relational, a little XML)
    Mark Logic (a lot of XML)

XML and OO DBMS
  Reasons for logical XML structures
  Native XML data access methods
  Schema flexibility
  Dressed-up text
  XML is the transport format, and it’s too complex to unpack
  The data came from neither an RDBMS nor a text store in the first place
  Like text and object
  So far mainly in niches

RDF and other graphical DBMS
  “Semantic web” is overhyped …
  … but the world DOES need ontology management systems
  Much depends on path length
    Analytic RDBMS may do the job

Event/stream processing engines
  Design point = super-low latency …
  … but there are other applications
  Data is “executed against” queries rather than vice versa
  Could be the future of BI …
  … and of social networking

Embedded DBMS for devices
  Products
    Sybase SQL Anywhere
    solidDB -- focused on caching post-acquisition?
    Cloudscape -- vaporized?
    McObject -- tiny startup
  Features
    Load-and-forget
    Zero-DBA
    Small-footprint
    Sometimes -- subsettable library

Matching analytic DBMS to use cases
  100 TB data mart
  50 TB enterprise data warehouse
  5 GB to 5 TB OLTP offload

Matching OLTP/general DBMS to use cases
  Market leader
  Mid-range
  High-end e-commerce
  High-volume call center
  Web startup
  It depends on how locked-in you are
  Simple departmental application
  General enterprise or SaaS app

Clayton Christensen’s “disruption” narrative
  Market leaders have many advantages, including top technology.
  Followers come up with good technology too.
  The leaders stay ahead by making their products ever better and more complex.
  The followers sell into new or non-mainstream markets, at prices the leaders can’t match.
  So they dominate new markets.
  Old markets turn into low-margin commodity-fests.
  Unless they diversify, old leaders are doomed.

That’s what’s happening here
  Much DBMS complexity is without benefit
  Other complexity only benefits a few high-end customers
  Data warehouse specialists exploit radically superior technology (e.g., MPP)
  Open source vendors have radically different price points and business models
  Open source adoption has been strongest in non-traditional markets

And the big vendors know it
  Oracle is diversifying furiously
  Oracle has announced a clear focus on top-end customers
  IBM is obviously focused on the high end too
  Oracle and (to some extent) IBM are buying alternative DBMS technologies
  Microsoft and IBM aren’t dependent on the DBMS business anyway
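
Illustration for the columnar slide
  The claim that retrieving whole rows carries penalties, and that columnar is better for I/O, can be made concrete with a toy model. The Python sketch below is not taken from the talk and does not describe any vendor's engine. It assumes a hypothetical table of 1,000 rows with 8 numeric columns, stores it once row by row and once column by column, and counts the bytes touched when summing a single column. It counts in-memory bytes only; real systems read disk pages, compress data, and cache, so treat the ratio as illustrative.

```python
from array import array

# Hypothetical table shape, chosen only for illustration.
NUM_COLUMNS = 8
NUM_ROWS = 1000


def build_rows():
    """Build a small table as a list of rows of floats."""
    return [
        [float(r * NUM_COLUMNS + c) for c in range(NUM_COLUMNS)]
        for r in range(NUM_ROWS)
    ]


def to_row_store(rows):
    """Row layout: all values of row 0, then all values of row 1, and so on."""
    store = array("d")
    for row in rows:
        store.extend(row)
    return store


def to_column_store(rows):
    """Column layout: one contiguous array per column."""
    return [array("d", (row[c] for row in rows)) for c in range(NUM_COLUMNS)]


def sum_column_row_store(store, col):
    """Sum one column by fetching every whole row.

    Returns (total, bytes_touched).
    """
    total = 0.0
    bytes_touched = 0
    for start in range(0, len(store), NUM_COLUMNS):
        row = store[start:start + NUM_COLUMNS]  # the whole row is fetched
        bytes_touched += len(row) * store.itemsize
        total += row[col]
    return total, bytes_touched


def sum_column_column_store(columns, col):
    """Sum one column by reading only that column's array.

    Returns (total, bytes_touched).
    """
    column = columns[col]
    return sum(column), len(column) * column.itemsize


if __name__ == "__main__":
    rows = build_rows()
    row_total, row_bytes = sum_column_row_store(to_row_store(rows), 3)
    col_total, col_bytes = sum_column_column_store(to_column_store(rows), 3)
    print("row store bytes touched:   ", row_bytes)
    print("column store bytes touched:", col_bytes)
    print("same answer:", row_total == col_total)
```

  In this model the row layout touches as many times more bytes as the table has columns, which is why the advantage shrinks for queries that need most columns of each row ("But not in all use cases").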