„Andmebaaside integreerimine ja optimeerimine“

1. The need for data integration.
Integration means combining data from different sources into a single database. Business-critical systems run on different platforms in different geographical areas and time zones. Single business transactions and processes often span these physical and technological boundaries. The CIO of such an organization is faced with an important decision: how to ensure that this distributed and fragmented information is available to all decision makers – how they want it, when they want it, where they want it. In order to make smart decisions, executives and managers need up-to-date information. In a multi-platform organization it is extremely complex and expensive to achieve seamless data integration. This complexity is further compounded if the organization requires real-time data integration. The requirement to have instant access to information that spans the physical boundaries of organizations and countries is currently a major focus of CIOs of corporations in the “global village”. The information required to make smart decisions is often locked up in specific systems and needs to be acquired from the source systems, adapted to meet the needs of the decision makers, and then applied to the relevant target system(s).

2. Off-line and on-line integration. EII (enterprise information integration) and EAI (enterprise application integration).
Enterprise Application Integration Solutions
EAI solutions are mostly used for integration of information on an application (transactional) level. These solutions are normally deployed in conjunction with other components to enable the exchange of information between the different systems. The four components required for an EAI solution are described below. Each of the individual components is critical to the successful implementation of an EAI project.
Interface – The Proxy
Interface technologies can include adapters, connectors, agents and APIs. This interface is responsible for a view into and out of the source and target systems in a cross-system exchange. The sheer number and variety of platforms and systems, combined with the variety of EAI integrators, has created an environment where literally thousands of proprietary interfaces exist.
Process – The Broker
The process or message broker controls and orchestrates the vast number of messages that are being transmitted at any given time. Various business rules and workflow processes are incorporated into the EAI implementation via this broker. The broker is programmed with a fixed instruction set to control the delivery steps within the context of an existing business process.
Transport – The Medium
The message transport medium is the way in which the different systems communicate with each other. For systems communication the medium is usually concrete and can be fairly complex, depending on the environment. Typical EAI message transport mediums include SNA, HTTP, TCP/IP and other proprietary protocols.
Message – The Content
The actual content of the message comes in two parts: the message wrapper and the message data. The message wrapper is what specifies a multi-part EDI exchange, an XML-formatted exchange or an object-based exchange. A native database exchange via ODBC may also be considered a type of wrapper.
In most EAI projects different vendors provide the individual components, and this can make these projects fairly complex to implement successfully.
EII
EII works by providing individuals or applications a single, virtual view across multiple data sources. A view represents a business entity – a customer, a sales pipeline, or the performance of a manufacturer's production floor – annotated with a metadata-based description. Applications access a view as if its data were physically located in a single database, even though the individual data may reside in different source systems. When an application accesses a view, the EII platform transparently handles connectivity with back-end databases and applications, along with related functions such as security, data integrity, and query optimization.
Enterprise Information Integration (EII) allows applications to use information from a variety of sources. The EII server evaluates requests for information, queries the individual data sources, and delivers customized output to the requesting application. EII can work in conjunction with existing integration technologies like ETL (extract, transform, load) and EAI (enterprise application integration), but some people blur the distinctions among these technologies. Unlike ETL, EII leaves source data in place and retrieves what it needs, on demand. Compared with EAI, EII is clearly for combining information assets, not for scheduling data flows between applications.
EII solves a set of business problems that share several common characteristics. IT managers should consider EII if their application needs to accomplish three or more of the tasks from the following list:
• Access data distributed across multiple sources (relational databases, enterprise applications, data warehouses, documents, XML)
• Combine data in different formats (relational databases, flat files, Word or Excel documents, XML)
• Integrate corporate data with information from outside the firewall
• Merge static data with messages, web services, or other data streams
• Perform queries that include archived data with live information
• Mix information from a data warehouse with current information
• Analyze information on-the-fly to drive an application
• Report to a variety of formats using widely distributed information
• Implement a path to a service-oriented architecture with minimal impact on the existing IT infrastructure
http://www.ipedo.com/html/eii_zone_techoverview.html
http://www.datawarehouse.com/article/?articleID=4942
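To make the idea of a virtual view concrete, here is a minimal SQL sketch. It assumes the remote CRM and ERP sources have already been made visible to the EII/federation layer as locally queryable tables; the names crm_customers and erp_orders and their columns are invented for the example, not taken from the cited articles.

-- Hypothetical federated tables: crm_customers (CRM system) and erp_orders (ERP system).
-- The view presents one business entity as if all of its data lived in a single database.
CREATE VIEW customer_overview AS
SELECT c.customer_id,
       c.customer_name,
       COUNT( o.order_id )   AS order_count,
       SUM( o.order_amount ) AS total_order_amount
FROM crm_customers c
LEFT OUTER JOIN erp_orders o ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.customer_name;

An application simply queries customer_overview; the EII platform is responsible for pushing the underlying work down to the individual source systems.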
3. Data warehouse and data warehousing. Classic architecture and explanation of its components.
The data warehouse architecture is based on a relational database management system server that functions as the central repository for informational data. Operational data and processing are completely separated from data warehouse processing. This central information repository is surrounded by a number of key components designed to make the entire environment functional, manageable and accessible, both by the operational systems that source data into the warehouse and by end-user query and analysis tools. Typically, the source data for the warehouse comes from the operational applications. As the data enters the warehouse, it is cleaned up and transformed into an integrated structure and format. The transformation process may involve conversion, summarization, filtering and condensation of data. Because the data contains a historical component, the warehouse must be capable of holding and managing large volumes of data as well as different data structures for the same database over time.

Components of data warehousing:
Data Warehouse Database
The central data warehouse database is the cornerstone of the data warehousing environment. This database is almost always implemented on relational database management system (RDBMS) technology. However, this kind of implementation is often constrained by the fact that traditional RDBMS products are optimized for transactional database processing. Certain data warehouse attributes, such as very large database size, ad hoc query processing and the need for flexible user view creation including aggregates, multi-table joins and drilldowns, have become drivers for different technological approaches to the data warehouse database.
Sourcing, Acquisition, Cleanup and Transformation Tools
A significant portion of the implementation effort is spent extracting data from operational systems and putting it in a format suitable for informational applications that run off the data warehouse. The data sourcing, cleanup, transformation and migration tools perform all of the conversions, summarizations, key changes, structural changes and condensations needed to transform disparate data into information that can be used by the decision support tool. They produce the programs and control statements, including the COBOL programs, MVS job-control language (JCL), UNIX scripts, and SQL data definition language (DDL) needed to move data into the data warehouse from multiple operational systems. These tools also maintain the meta data. The functionality includes:
• Removing unwanted data from operational databases
• Converting to common data names and definitions
• Establishing defaults for missing data
• Accommodating source data definition changes
Meta data
Meta data is data about data that describes the data warehouse. It is used for building, maintaining, managing and using the data warehouse. Meta data can be classified into:
• Technical meta data, which contains information about warehouse data for use by warehouse designers and administrators when carrying out warehouse development and management tasks.
• Business meta data, which contains information that gives users an easy-to-understand perspective of the information stored in the data warehouse.
Access Tools
The principal purpose of data warehousing is to provide information to business users for strategic decision-making. These users interact with the data warehouse using front-end tools. Many of these tools require an information specialist, although many end users develop expertise in the tools. Tools fall into four main categories: query and reporting tools, application development tools, online analytical processing tools, and data mining tools.
Data Marts
The concept of a data mart is causing a lot of excitement and attracts much attention in the data warehouse industry. Mostly, data marts are presented as an alternative to a data warehouse that takes significantly less time and money to build. However, the term data mart means different things to different people. A rigorous definition of this term is a data store that is subsidiary to a data warehouse of integrated data. The data mart is directed at a partition of data (often called a subject area) that is created for the use of a dedicated group of users. A data mart might, in fact, be a set of denormalized, summarized, or aggregated data. Sometimes, such a set could be placed on the data warehouse rather than in a physically separate store of data. In most instances, however, the data mart is a physically separate store of data, usually resident on a separate database server, often on a local area network serving a dedicated user group. Sometimes the data mart simply comprises relational OLAP technology which creates a highly denormalized dimensional model (e.g., star schema) implemented on a relational database. The resulting hypercubes of data are used for analysis by groups of users with a common interest in a limited portion of the database. These types of data marts, called dependent data marts because their data is sourced from the data warehouse, have a high value because no matter how they are deployed and how many different enabling technologies are used, different users are all accessing the information views derived from the single integrated version of the data.
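As an illustration of the kind of denormalized star schema such a dependent data mart might use, here is a minimal sketch; the table and column names are invented for the example and are not taken from the source article.

-- Hypothetical star schema for a sales data mart: one fact table surrounded by dimension tables.
CREATE TABLE dim_date (
    date_key       INTEGER NOT NULL,
    calendar_date  DATE    NOT NULL,
    calendar_year  INTEGER NOT NULL,
    calendar_month INTEGER NOT NULL,
    PRIMARY KEY ( date_key )
)
CREATE TABLE dim_product (
    product_key   INTEGER    NOT NULL,
    product_name  CHAR( 40 ) NOT NULL,
    product_group CHAR( 20 ) NOT NULL,
    PRIMARY KEY ( product_key )
)
CREATE TABLE fact_sales (
    date_key     INTEGER        NOT NULL,
    product_key  INTEGER        NOT NULL,
    quantity     INTEGER        NOT NULL,
    sales_amount NUMERIC(12,2)  NOT NULL,
    PRIMARY KEY ( date_key, product_key ),
    FOREIGN KEY ( date_key ) REFERENCES dim_date,
    FOREIGN KEY ( product_key ) REFERENCES dim_product
)

A typical data mart query then aggregates the fact table over one or more dimensions, for example total sales per product group and month.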
Data Warehouse Administration and Management
Data warehouses tend to be as much as four times as large as the related operational databases, reaching terabytes in size depending on how much history needs to be saved. They are not synchronized in real time with the associated operational data but are updated as often as once a day if the application requires it. In addition, almost all data warehouse products include gateways to transparently access multiple enterprise data sources without having to rewrite applications to interpret and utilize the data. Furthermore, in a heterogeneous data warehouse environment the various databases reside on disparate systems, thus requiring internetworking tools. The need to manage this environment is obvious. Managing data warehouses includes security and priority management; monitoring updates from the multiple sources; data quality checks; managing and updating meta data; auditing and reporting data warehouse usage and status; purging data; replicating, subsetting and distributing data; backup and recovery; and data warehouse storage management.
Information Delivery System
The information delivery component is used to enable the process of subscribing for data warehouse information and having it delivered to one or more destinations according to some user-specified scheduling algorithm. In other words, the information delivery system distributes warehouse-stored data and other information objects to other data warehouses and to end-user products such as spreadsheets and local databases. Delivery of information may be based on time of day or on the completion of an external event. The rationale for the delivery system component is that once the data warehouse is installed and operational, its users do not have to be aware of its location and maintenance. All they need is the report or an analytical view of data at a specific point in time. With the proliferation of the Internet and the World Wide Web, such a delivery system may leverage the convenience of the Internet by delivering warehouse-enabled information to thousands of end users via the ubiquitous worldwide network.
http://www.tdan.com/i003fe11.htm

4. Metadata: why it is needed, how it is provided, and how it is used.
Metadata describes the data about data elements and attributes (such as name, size, or data type), data about records or data structure (such as length, fields, columns), and data about data (where located, how associated, ownership). It may include descriptive information about content, quality, conditions, or characteristics of the data. For web services, the metadata is wrapped around the basic functionality so that the basic service can be used by reference.
http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.help.iq/html/iqintro/meta.htm

5. ETL (extraction, transformation, loading): the functions of its components and the problems associated with them.
ETL utilities are most typically associated with data warehousing, where some measure of data latency can be tolerated. These tools are normally used to process batches of data in bulk mode and perform data cleansing and transformation during this process. Meta data management is a feature that is required by most users of ETL tools, and data is usually staged in a staging area where it is cleansed, verified and transformed prior to being loaded into the target system. During this process the meta data repository is generated and maintained. ETL tools are mostly used in scenarios where bi-directional replication is not required. These tools are almost always pure data movement tools and are not well suited to the requirements associated with true real-time data integration. Data movement occurs between the different source systems and the operational data store or data warehouse. ETL technology normally exists as a tool or utility and does not always provide the total data integration solution required in the complex environment of real-time, cross-platform data integration.
http://www.itcollege.ee/~gseier/dmdirect_article.cfm.html

6. Problems related to data quality.
Essential problems and causes of insufficient data quality are:
• Incorrect values, missing values and references, duplicates (these do not interfere with the operational systems).
• Inconsistent data between different systems.
• Incorrect data gathering and data operations.
• Insufficient plausibility checks in the operative systems (e.g. during data input).
• Changes in the operative systems that are not documented or forwarded to other systems.
• Insufficient modelling and redundant data.
• System problems (technical).

7. Policies for ensuring data quality.
Possibilities for ensuring data quality include:
• Quality checks before the final loading into the data warehouse database.
• Data cleansing in the ETL process.
• Loading and tagging problematic data (e.g. referential integrity or value domain violations).
• Automatic correction and data cleansing (e.g. format errors).
• Manual correction and data cleansing (e.g. data interpretation; frequently the problems are already known by the domain expert).
• Feedback to the data suppliers about test results (for possible data correction and further data delivery).
• Error location and co-ordination with data suppliers.
• Organisational approaches.
Interestingly, not one enterprise listed the integration of the quality specification and quality measurement in the meta data management. If possible, data quality shortcomings should be reported to the data suppliers and improvement should start at their causes (proactively). Continuous contact between the central data warehouse and the source systems is therefore useful. Typical objectives of a data quality program include:
• Eliminate redundant data
• Validate data at input
• Eliminate false null values
• Ensure data values fall within defined domains
• Resolve conflicts in data
• Ensure proper definition and use of data values
• Establish and apply standards
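To make the idea of "quality checks before loading" concrete, here is a minimal SQL sketch of checks that could be run against a staging table before it is loaded into the warehouse. The staging table stg_customer, the reference table ref_country and their columns are invented for the example.

-- Duplicate keys in the extract:
SELECT customer_id, COUNT(*) AS duplicate_count
FROM stg_customer
GROUP BY customer_id
HAVING COUNT(*) > 1;

-- Missing mandatory values:
SELECT COUNT(*) AS missing_names
FROM stg_customer
WHERE customer_name IS NULL;

-- Values outside the allowed domain (here: country codes must exist in a reference table):
SELECT s.customer_id, s.country_code
FROM stg_customer s
WHERE NOT EXISTS ( SELECT 1 FROM ref_country r WHERE r.country_code = s.country_code );

Rows failing such checks can be tagged and loaded into an error table for automatic or manual correction, as suggested in the list above.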
8. Architectural and operational characteristics of OLTP (transaction-processing) and DSS (decision support) databases (based on Sybase IQ).
OLTP database systems are best for administering changing data and are well suited to real-time changes and a large number of concurrent users. Although individual queries are small, many of them are executed concurrently. Examples: a bank, ticket sales. Important properties are the ability to lock data while changes are being made (concurrency) and the ability to verify that a change has completed successfully as a whole (atomicity). OLTP systems require a very well designed data structure and optimal indexing, and they allow backups to be taken while the system is running.
DSS systems are ideal for querying data that does not change (statistics, archives, data warehouses). In DSS databases the tables are usually heavily indexed, and the data is pre-processed and organized to suit different types of queries. Because users do not modify the data, concurrency and atomicity are not an issue. The data is typically refreshed at longer intervals.

9. Administering Sybase IQ tables.
Sybase IQ applications use SQL to access Sybase IQ databases. The SQL interface is fully compatible with the Sybase Adaptive Server Anywhere and Sybase Adaptive Server Enterprise SQL syntax. It is also ANSI SQL-92 compliant, and supports both ODBC 3.0 and JDBC call-level SQL interfaces. Java stored procedures can be employed to place business logic in the database.
Creating tables in Sybase Central
To create a table using Sybase Central, see "Managing tables" in Introduction to Sybase IQ.
SQL statement for creating tables
The SQL statement for creating tables is CREATE TABLE. For example:
CREATE TABLE skill (
    skill_id INTEGER NOT NULL,
    skill_name CHAR( 20 ) NOT NULL,
    skill_type CHAR( 20 ) NOT NULL
)
Indexes and IQ UNIQUE
If you estimate IQ UNIQUE incorrectly, there is no penalty for loads; the Optimizer simply uses the next larger index. For queries, if you estimate IQ UNIQUE incorrectly and you have an HG, LF, or storage-optimized default index, the Optimizer ignores the IQ UNIQUE value and uses the actual number of values in the index. If you do not have one of these indexes and your estimate is wrong by a significant amount (for example, if you specify IQ UNIQUE 1000000 when the actual number of unique values is 12 million), query performance may suffer. To change the value of IQ UNIQUE for an existing index, run the sp_iqrebuildindex procedure.
Using the ALTER TABLE statement
The following command adds a column to the skill table to allow space for an optional description of the skill:
ALTER TABLE skill ADD skill_description CHAR( 254 )
The following statement changes the name of the entire table:
ALTER TABLE skill RENAME qualification
These examples show how to change the structure of the database. The ALTER TABLE statement can change many characteristics of a table – foreign keys can be added or deleted, and so on. However, you cannot use MODIFY to change table or column constraints. Instead, you must DELETE the old constraint and ADD the new one. In all these cases, once you make the change, stored procedures, views, and any other items referring to this column will no longer work.
Altering tables in a join index
You cannot ADD, DROP or MODIFY a base table column that participates in a join condition of a join index. To alter joined columns, you must first drop the join index, alter the table, and then recreate the join index.
Dropping tables
The following DROP TABLE statement deletes all the records in the skill table and then removes the definition of the skill table from the database:
DROP TABLE skill
Like the CREATE statement, the DROP statement automatically executes a COMMIT before and after dropping the table.
This makes permanent all changes to the database since the last COMMIT or ROLLBACK. The DROP statement also drops all indexes on the table, except if any column in the table participates in a join index.
Dropping a table in Sybase Central
Connect to the database. Click the Tables folder for that database. Right-click the table you wish to delete, and select Delete from the pop-up menu.
Creating a primary key
The following statement creates the same skill table as before, except that a primary key is added:
CREATE TABLE skill (
    skill_id INTEGER NOT NULL,
    skill_name CHAR( 20 ) NOT NULL,
    skill_type CHAR( 20 ) NOT NULL,
    PRIMARY KEY( skill_id )
)
The primary key values must be unique for each row in the table, which, in this case, means that you cannot have more than one row with a given skill_id. Each row in a table is uniquely identified by its primary key. Columns in the primary key are not allowed to contain NULL; you must specify NOT NULL on every column in the primary key.
Creating a primary key in Sybase Central
Connect to the database. Click the Tables folder for that database. Right-click the table you wish to modify, and select Properties from the pop-up menu to display its property sheet. Click the Columns tab, select the column name, and either click Add to Key or Remove from Key. Column values must be unique.
Creating foreign keys
You can create a table named emp_skill, which holds a description of each employee's skill level for each skill in which they are qualified, as follows:
CREATE TABLE emp_skill(
    emp_id INTEGER NOT NULL,
    skill_id INTEGER NOT NULL,
    "skill level" INTEGER NOT NULL,
    PRIMARY KEY( emp_id, skill_id ),
    FOREIGN KEY REFERENCES employee,
    FOREIGN KEY REFERENCES skill
)
The emp_skill table definition has a primary key that consists of two columns: the emp_id column and the skill_id column. An employee may have more than one skill, and so appear in several rows, and several employees may possess a given skill, so that the skill_id may appear several times. The emp_skill table also has two foreign keys. The foreign key entries indicate that the emp_id column must contain a valid employee number from the employee table, and that the skill_id column must contain a valid entry from the skill table. A table can have only one primary key defined, but it may have as many foreign keys as necessary.
Creating a foreign key in Sybase Central
Connect to the database. Click the Tables folder for that database. Click the table holding the primary key, and drag it to the foreign key table. When the primary key table is dropped on the foreign key table, the Foreign Key Wizard is displayed, which leads you through the process of creating the foreign key.
All the information about tables in a database is held in the system tables. The information is distributed among several tables. You can use Sybase Central or DBISQL to browse the information in these tables. Type the following command in the DBISQL command window to see all the columns in the SYS.SYSTABLE table:
SELECT * FROM SYS.SYSTABLE
Displaying system tables in Sybase Central
Connect to the database. Right-click the database, and select Filter Objects from the pop-up menu. Select SYS and click OK. When you view the database tables or views with Show System Objects checked, the system tables or views are also shown.
http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.help.iq/html/iqintro/meta.htm
http://216.239.59.104/search?q=cache:Z1XeE5DiaWMJ:www.sybase.cz/prod/dbase/pdf/IQ_DBAssoc.pdf+sybase+IQ+indexing+howto&hl=en

10. Principles by which Sybase IQ reduces the volume of stored data.
Sybase IQ employs patented bit-wise indexing and compression technology to index all the fields in a data warehouse database. This approach aids query performance in addition to reducing disk storage. Bit-wise indexing reduces disk space requirements. Unlike traditional RDBMSs, where indexes occupy additional disk space over and above that required by the actual data, Sybase IQ bit-wise indexing does not require any additional disk space. In Sybase IQ, the index is the database, and although disk space requirements for this index will vary by application, the amount of disk space required by a Sybase IQ database (including indexes) is nearly always smaller than the size of the input data. Sybase IQ indexes offer these benefits over traditional indexing techniques:
• Index sizes remain small. The entire database can be fully indexed and made available for ad hoc queries in the same space that would be needed to store the raw data. Most traditional databases need three times as much space.
• Queries are resolved by efficiently combining and manipulating indexes on only the relevant columns. This avoids time-consuming table scans.
• I/O is minimized, eliminating potential bottlenecks.
• Because the indexes are compact, more data can be kept in memory for subsequent queries, thereby speeding throughput on iterative analysis.
• Tuning is data-dependent, allowing data to be optimized once for any number of ad hoc queries.

11. The main Sybase IQ index types, principles for using them appropriately, and creating indexes.
For any column that has no index defined, or whenever it is the most effective, query results are produced using the default index. This structure is fastest for projections, but generally is slower than any of the three column index types you define for anything other than a projection. Performance is still faster than in most RDBMSs, since only one column of data is fetched, while other RDBMSs need to fetch all columns, which results in more disk I/O operations. If a column is used only in projections, even if some of the queries return a small number of rows, Low_Fast and High_Non_Group indexes are redundant, because the default structure is equally fast for projecting a small number of rows.
The Low_Fast (LF) index type
This index is ideal for columns that have a very low number of unique values (under 1,000), such as sex, Yes/No, True/False, number of dependents, wage class, and so on. LF is the fastest index in Adaptive Server IQ. When you test for equality, just one lookup quickly gives the result set. To test for inequality, you may need to examine a few more lookups. Calculations such as SUM, AVG, and COUNT are also very fast with this index. As the number of unique values in a column increases, performance starts to degrade and memory and disk requirements start to increase for insertions and some queries. When doing equality tests, though, it is still the fastest index, even for columns with many unique values.
Recommended use
Use an LF index when:
• A column has fewer than 1,000 unique values.
• A column has fewer than 1,000 unique values and is used in a join predicate.
Never use an LF index for a column with 10,000 or more unique values.
If the table has fewer than 25,000 rows, use an HG index instead, as fewer disk I/O operations are required for the same operation.
Advantages and disadvantages of Low_Fast
The High_Group (HG) index type
The High_Group index is commonly used for join columns with integer data types. It is also more commonly used than High_Non_Group because it handles GROUP BY efficiently. Use an HG index when:
• The column will be used in a join predicate.
• A column has more than 1,000 unique values.
Foreign key columns require their own, individual HG index. However, if a join index exists, the same column cannot have both an explicitly created HG index and a foreign key constraint.
The High_Non_Group (HNG) index type
Add an HNG index when you need to do range searches. An HNG index requires approximately three times less disk space than an HG index. On that basis alone, if you do not need to do group operations, use an HNG index instead of an HG index. Conversely, if you know you are going to run queries that an HG index handles more efficiently, or if the column is part of a join and/or you want to enforce uniqueness, use an HG index. Using the HNG index in place of an HG index may seriously degrade performance of complex ad hoc queries joining four or more tables. If query performance is important for such queries in your application, choose both HG and HNG. Use an HNG index when:
• The number of unique values is high (greater than 1,000).
• You do not need to do GROUP BY on the column.
The Compare (CMP) index type
A Compare (CMP) index is an index on the relationship between two columns. You may create Compare indexes on any two distinct columns with identical data types, precision, and scale. The CMP index stores the binary comparison (<, >, or =) of its two columns. The CMP index can be created on columns that are NULL, NOT NULL, or a mixture. The CMP index cannot be unique. Note that numeric and decimal data types are considered identical; you may create CMP indexes on them when precision and scale are identical. For CHAR, VARCHAR, BINARY, and VARBINARY columns, precision means having the same column width. The following restrictions apply to CMP:
• You can drop CMP indexes.
• CMP indexes cannot be unique.
• CMP indexes are not replicated in underlying join indexes.
• A partial-width insert into a table is disallowed when not all columns of a CMP index are part of the insert.
• An exception is raised if you attempt to alter or delete a column that is defined in a CMP index.
• Users cannot ALTER TABLE MODIFY an existing column that is defined in a CMP index.
• CMP indexes do not support the BIT, FLOAT, DOUBLE, and REAL data types.
The Containment (WD) index type
This index allows you to store words from a column string of CHAR and VARCHAR data. Use a WD index for the fastest access to columns that contain a list of keywords (for example, in a bibliographic record or a web page). The following restrictions apply to WD:
• You cannot specify the UNIQUE attribute.
• The WD index is used only with the CONTAINS or LIKE predicate.
• The column-name must identify a CHAR or VARCHAR column in a base table.
• The minimum permitted column width is 3 bytes and the maximum permitted column width is 32767 bytes.
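A short sketch of how a WD index might be created and queried follows. The table and column names are invented, and the statements follow the general Sybase IQ patterns CREATE <type> INDEX ... ON table ( column ) and WHERE CONTAINS ( column, 'word' ); treat the exact syntax as an assumption to verify against the IQ reference manual.

-- Hypothetical table of web pages with a keyword column.
CREATE TABLE web_page (
    page_id  INTEGER         NOT NULL,
    url      VARCHAR( 255 )  NOT NULL,
    keywords VARCHAR( 2000 ) NOT NULL
)
CREATE WD INDEX web_page_keywords_wd ON web_page ( keywords );
-- The WD index is used only with CONTAINS or LIKE predicates, for example:
SELECT page_id, url
FROM web_page
WHERE CONTAINS ( keywords, 'warehouse' );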
Three index types are used to process queries involving date, time, or datetime quantities:
• A DATE index is used on columns of data type DATE to process certain queries involving date quantities.
• The TIME index is used on columns of data type TIME to process certain queries involving time quantities.
• The DTTM index is used on columns of data type DATETIME or TIMESTAMP to process certain queries involving datetime quantities.
Recommended use
Use a DATE, TIME, or DTTM index in the following two cases, when the DATE, TIME, DATETIME, or TIMESTAMP column is used in queries containing date and time functions and operations.
For a simple equality predicate (no DATEPART) with a DATE, TIME, DATETIME, or TIMESTAMP column, LF and HG indexes have the best performance. If an LF or HG index is not available, then the DATE, TIME, or DTTM index is used to get the result. If a DATE, TIME, DATETIME, or TIMESTAMP column is used in the GROUP BY clause or in the WHERE/HAVING clauses for equalities (including join conditions) or IN predicates, the column needs an LF or HG index, as only these indexes can do fast equality. Also see the section "Additional indexes" for index recommendations for DATE, TIME, DATETIME, and TIMESTAMP columns.
For a query with an equality predicate (= or !=), if one side of the comparison is a DATEPART expression or some other date and time function (e.g. YEAR, QUARTER, DAY, MINUTE), and the other side of the comparison is a constant expression (including a constant value or host variable), then the DATE, TIME, or DTTM index is used (if the index is available) to get the result set. For example, the DATE, TIME, or DTTM index is used in the following queries:
SELECT * FROM tab WHERE DATEPART(YEAR, col1) = 2002;
SELECT * FROM tab WHERE DATEPART(HOUR, col2) = 20;
SELECT * FROM tab WHERE MINUTE (col3) != 30;
SELECT * FROM tab WHERE DATEPART(MONTH, col2) = @tmon;
where @tmon is an INTEGER host variable.
The DATE, TIME, and DTTM indexes do not support some date parts (Calyearofweek, Dayofyear, Millisecond) or DATEPART range predicates. The DATE, TIME, and DTTM indexes have performance consistent with the HNG index. Compared to HNG, the DATE, TIME, and DTTM indexes are generally faster (up to twice as fast) in the supported cases. In the special cases discussed in the "Recommended use" section, the performance of the DATE, TIME, and DTTM indexes is even better. Therefore, an HNG index is not necessary in addition to a DATE, TIME, or DTTM index on a column of DATE, TIME, DATETIME, or TIMESTAMP data type.
Additional indexes
The recommendation is to always have a DATE, TIME, or DTTM index on a column of DATE, TIME, DATETIME, or TIMESTAMP data type, if the column is referenced in the WHERE clause, in ON conditions, or in the GROUP BY clause. In addition, an HG or LF index may also be appropriate for a DATE, TIME, DATETIME, or TIMESTAMP column, especially if you are evaluating equality predicates against the column. An LF index is also recommended if you frequently use the column in the GROUP BY clause and there are fewer than 1,000 distinct values (i.e. less than three years of dates).
Optimizing performance for ad hoc joins
When you create an additional column index, the CREATE INDEX command creates the new index as part of the individual table and as part of any join indexes that include the column. CMP and multicolumn HG indexes are the only exceptions to this rule. If the existing column indexes in the individual table already contain data, the CREATE INDEX statement also inserts data into the new index from an existing index. This ensures data integrity among all the column indexes for columns within an individual table. Data is also inserted and synchronized automatically when you add an index to previously loaded tables that are part of a join index.
http://sybooks.sybase.com:80/onlinebooks/group-iq/iqg1250e/iqapg/@Generic__BookView;pt=462?DwebQuery=Indexing
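The section above describes when to use each index type but does not show the creation syntax. Here is a hedged sketch, following the Sybase IQ CREATE <type> INDEX pattern and the IQ UNIQUE column option discussed under question 9; the table, column, and index names are invented, so verify the exact syntax against the IQ reference manual.

-- Hypothetical customer table; IQ UNIQUE gives the load process an estimate of distinct values.
CREATE TABLE customer (
    customer_id   INTEGER        NOT NULL,
    gender        CHAR( 1 )      NOT NULL IQ UNIQUE( 2 ),
    city          CHAR( 30 )     NOT NULL,
    birth_date    DATE           NOT NULL,
    yearly_income NUMERIC(12,2)  NOT NULL,
    PRIMARY KEY ( customer_id )
)
-- LF: very low number of unique values; fast equality tests and aggregates.
CREATE LF INDEX customer_gender_lf ON customer ( gender );
-- HG: higher number of unique values; join columns and GROUP BY.
CREATE HG INDEX customer_city_hg ON customer ( city );
-- HNG: range searches when GROUP BY is not needed.
CREATE HNG INDEX customer_income_hng ON customer ( yearly_income );
-- DATE: date functions and DATEPART predicates on a DATE column.
CREATE DATE INDEX customer_birth_dt ON customer ( birth_date );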
Answers should be thorough; short answers will not be accepted. There is no point in simply rewriting the course materials or copy-pasting from fellow students.
Reading: www.datawarehouse.com/article/?articleID= ……… (replace the dots with the article number)
On relational databases: 4436
Data quality: 2912 4681 7148
Metadata: 4978 4979
Data integration: 4942 4489 4897 1014857
www.dmreview.com/article_sub.cfm?articleId=8263 may also be of interest.
Data warehousing: 4916 4864 4814 4850 4780 4662 4473
www.dmreview.com/editorial/dmreview/print_action.cfm?EdID=6248 gives some material on handling unstructured data, as does EdID=5891 at the same address.
For a first introduction to metadata, www.ukoln.ac.uk/metadata/publications/nutshell and www.ukoln.ac.uk/metadata/review.html are suitable.

„Andmebaaside integreerimine ja optimeerimine“ – homework assignments.
Date parser. In real data a date may be represented in many different forms. For example, in Estonian it is customary to write a date as 11. jaan(uar(il)) (20)04(.)a(.) or 11.01.04, or 11.01.2004. In other languages and cultures dates are represented differently – Americans generally use mm/dd/(yy)yy, the British dd/mm/yyyy, and so on. The symbols separating the date components may also vary. According to the European Union standard, a date must be presented in the form yyyy-mm-dd (this should also be the standard for Estonian data processing). Task: design and write a program (parser) that converts a date given in the input in an arbitrary format into the standard form. The program should be able to learn – once a format has occurred in the input and been resolved, everything needed for the conversion is stored. If the input format is not recognizable, the program should interactively ask for instructions. The idea should be written up in Estonian; the program may be implemented in any programming language. The task is very large; this does not affect the description of the idea, and the implementation may be limited to 3–4 particularly characteristic examples.
Numbers in words (in Estonian). In reporting, numbers are often required not only as digits but also in a sufficiently forgery-proof written-out form. To make the digits harder to falsify, the unused positions to their left are filled with symbols from which digits are hard to derive, e.g. an asterisk (*). In the written-out Estonian form, attention must be paid to how hundreds are expressed – for example, instead of the usual „sada kaks“ (102), the form „ükssada kaks“ must be used. Pay attention also to the so-called „-teist“ numbers (11–19)! The program must be written in SQL (Oracle PL/SQL is not acceptable), preferably as a function (procedure) stored in the database.
Economical storage of slowly changing data while preserving the history of changes. This is a typical situation in data warehousing: in the data arriving from the source systems only a few fields change, so it is not worthwhile to store complete snapshots of the data. The basic conditions are that the normal operation of the source systems must not be disturbed and that they provide no additional information about what has changed. To determine what has changed, and how, compared with the previous extract, some mechanism for detecting and storing changes must be created. As a model you can take one of the examples given in the article "Improved ETL Change Detection through Operational Meta Data" by M. Jennings (link: www.datawarehouse.com/article/?articleID=2982), or the methodology I have discussed in the lectures; a rough sketch of one possible change-detection query is given at the end of this document.
Submitting the homework assignments is a precondition for taking the exam, and they should be submitted at least one week before the exam (the exam takes place on 13 and 14 January 2005).
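For the slowly-changing-data assignment, here is a minimal sketch of snapshot comparison in plain SQL. The tables prev_snapshot and curr_extract and their columns are invented; this is only one possible approach, not the methodology from the lectures or from the Jennings article.

-- New rows: keys that appear only in the current extract.
SELECT c.customer_id
FROM curr_extract c
WHERE NOT EXISTS ( SELECT 1 FROM prev_snapshot p WHERE p.customer_id = c.customer_id );

-- Deleted rows: keys that appear only in the previous snapshot.
SELECT p.customer_id
FROM prev_snapshot p
WHERE NOT EXISTS ( SELECT 1 FROM curr_extract c WHERE c.customer_id = p.customer_id );

-- Changed rows: same key, but at least one tracked column differs.
SELECT c.customer_id
FROM curr_extract c JOIN prev_snapshot p ON p.customer_id = c.customer_id
WHERE c.customer_name <> p.customer_name
   OR c.address       <> p.address;

-- Only the changed fields (plus the date of the change) need to be written to a history table,
-- which keeps the stored volume small while still preserving the full history of changes.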