Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Moving Data • The need to move data around is not new, and will be with us while ever there are different DBMS – Internally, between systems: • Disparate information systems to one EIS • Transactional to Warehouse • Upgrading legacy systems – Externally, between organisations – Delivering data to clients with browsers School of Computing and Management Sciences © Sheffield Hallam University Processing Single database Create Table B AS Select * from Table A .dmp .xml .edi .csv .txt Common RDBMS Flat files of either variable or fixed length Import/Export W3C Standard Electronic Data compliant Interchange XML file conversion EDI message conversion Other data stores Line-by-line processing Degree of task complexity Moving Data • Between applications using same DBMS: – internal DBMS processes can include: • Import/Export - caution, this means different things to different manufacturers! • Bulk/Batch Copy/Move • Backup and Restore • Create …..as select * from….. – OS level is possible for some (cruder?) DBMS School of Computing and Management Sciences © Sheffield Hallam University Moving Data • Between different environments: – Enable the internal database functionality of target by making two systems logically the same: • Linking DBMS by internal drivers • Linking DBMS through external drivers (eg ODBC) – Dump from source into readable file and then import to target School of Computing and Management Sciences © Sheffield Hallam University External File formats • Variable length files – Need to identify where fields and records terminate – Probably the most common is the CSV • Comma Separated Variable • most dbms can at least dump to csv file – EG: Arnold,Band Music,Dallas Wind Symphony,Junkin,2 Arnold,Film Music,Moscow SO,Stromberg,1 Arnold,Symphonies 5 and 6,Ireland NSO,Penny,1 Arnold,Symphonies 7 and 8,Ireland NSO,Penny,1 © Sheffield Hallam University School of Computing and Management Sciences External File formats • Fixed length files – Each data field occurs at a set location in a record, or at a precise offset from position 1 • Watch out for record terminators! School of Computing and Management Sciences © Sheffield Hallam University Moving Data - special cases • Electronic Data Interchange (EDI) – facilitates the exchange of business documents between organisations. Steps are: • • • • • Collect data locally translate into predefined format transmit decode the data insert into relevant target database School of Computing and Management Sciences © Sheffield Hallam University Moving Data - EDI cont... • The case for EDI: – – – – – Decreased Operational Costs Improved Accuracy Greater Transaction Processing Efficiency Increased Productivity Improved payment cycles ????? School of Computing and Management Sciences © Sheffield Hallam University Moving Data - EDI cont... • The case against EDI: – – – – – Highly formalised High cost of entry Often industry specific For historic reasons, often utilises a VAN EDI implementation can often fall short of expectations because 20%-40% of trading partners can fail to come on board (Chute, 1996) – Kalakota and Whinston, 1996 estimate less than 5% of US businesses utilise EDI • But what about the latest e-trends, don’t they cry out for EDI? School of Computing and Management Sciences © Sheffield Hallam University Moving Data - XML • Extensible Markup Language (XML) – “A Web-based markup language that could give EDI the kick-start it needs…” EDINews – XML is a language which allows users to create their own markup language which can describe one particular content – It can run on any platform – Closely related to HTML, via SGML – Non proprietary - W3C School of Computing and Management Sciences © Sheffield Hallam University XML reminder • "XML is a human-readable, machine-understandable, general syntax for describing hierarchical data, applicable to a wide range of applications, databases, e-commerce, Java, web development, searching, and so on. " – Oracle OTN website, my italics • Building blocks are: Tags, Elements and attributes • Separates Data from validity checking and from presentation – by using DTD or Schemas to validate, and stylesheets to present School of Computing and Management Sciences © Sheffield Hallam University XML: why do it? • Because the rest of the world is doing it! – many tools now available – Rdbms vendors support its use • Because it has become the de facto answer to the need to exchange data between disparate systems • Because it is self-describing and thus can be universally used • XML is an Open standard • It is platform, database and application independent • Browser support means cheap delivery School of Computing and Management Sciences © Sheffield Hallam University XML: why not to do? • Performance • Massive increase in file sizes (over binary, or even over CSVs) • Increased skills level required to query compared with SQL (arguable) • New skillset required School of Computing and Management Sciences © Sheffield Hallam University Issues with XML in RDBMS • Paradigm conflict: – Hierarchical V Relational • XML is not designed to be queriable. – Xpath not as powerful as SQL? – Whole new syntax to learn • XML is not normalised • ISO SQL does not lend itself to generating XML output School of Computing and Management Sciences –http://www.w3.org/TR/xpath © Sheffield Hallam University How to store XML in RDBMS • Assuming XML inbound, do you: – break it up and store it in relational tables – store the document as a CLOB (Character Large Object datatype) – shred and store in a structured way, applying DTD or Schema rules? • Depends how it will be used: – part of an existing rdbmd-based system which allows users to do ad hoc SQL queries – just to forward on, as XML, to other users but seldom queried – Held for use by XPath conversant users – heavily CRUDed – As permanence for Web-based Distributed Authoring and Versioning (DAV). Extensions to HTTP which allows users to collaboratively edit and manage files on remote web servers. School of Computing and Management Sciences © Sheffield Hallam University XMLType in XMLdb drop table xml_chum ; create table xml_chum (somexmldata xmltype) ; Insert into xml_chum Values ( XMLTYPE.createXML('<CHUM><RegistrationID>1</RegistrationI D><ChumID>st2</ChumID></CHUM>') ); Insert into xml_chum Values ( XMLTYPE.createXML('<CHUM><RegistrationID>2</RegistrationI D><ChumID>st4</ChumID></CHUM>') ); select XMLTYPE.extractValue(somexmldata,'/CHUM/ChumID') from XML_CHUM where existsNode(somexmldata,'/CHUM[RegistrationID="2"]') = 1 ; School of Computing and Management Sciences © Sheffield Hallam University Outputing XML with XMLdb SET ECHO off set pagesize 0 set long 10000 SET VERIFY off SET FEEDBACK off spool F:\OScripts\xml\CD.xml select '<?xml version="1.0"?>' from dual ; select '<ROWSET>' from dual ; SELECT xmlelement("CD", xmlforest(a.composer, b.title, b.cdid)) FROM composers a, cds b where a.composerid=b.composerid and a.composerid < 10 ; select '</ROWSET>' from dual ; spool off ; Moving Data - XML • But the W3C Recommendations for XML include: – – – – It shall be straightforward to use XML over the Internet. XML shall support a wide variety of applications. XML shall be compatible with SGML. It shall be easy to write programs that process XML documents. – The number of optional features in XML is to be kept to an absolute minimum, ideally zero. – XML documents should be human-legible and reasonably clear. – Terseness in XML markup is of minimal importance. School of Computing and Management Sciences © Sheffield Hallam University Moving Data - summary • Laudable XML aims, but how data-centric are they? • Is anything more portable than a CSV (with header)? • However you wrap it up, a computer-based data handling system needs to convert information into ons-and-offs! School of Computing and Management Sciences © Sheffield Hallam University