„Database Integration and Optimization“
1. The need for data integration.
Integration – combining data from different sources into a single database.
Business critical systems are running on different platforms in different geographical areas and time zones.
Single business transactions and processes oftentimes span these physical and technological boundaries. The
CIO of such an organization is faced with an important decision: how to ensure that this distributed and
fragmented information is available to all decision makers – how they want it, when they want it, where they
want it.
In order to make smart decisions, executives and managers need up-to-date information. In the multi-platform
organization it is extremely complex and expensive to achieve seamless data integration. This complexity is
further compounded if the organization requires real-time data integration.
The requirement to have instant access to information that spans the physical boundaries of organizations and
countries is currently a major focus of CIOs of corporations in the “global village”.
The information required to make smart decisions is oftentimes locked up in specific systems and needs to be
acquired from source systems, adapted to meet the needs of the decision makers and then applied to the
relevant target system(s).
2. Off-line and on-line integration. EII (enterprise information
integration) and EAI (enterprise application integration).
Enterprise Application Integration Solutions
EAI solutions are mostly used for integration of information on an application
(transactional) level. These solutions are normally deployed in conjunction with
other components to enable the exchange of information between the different
systems.
The four components required for an EAI solution are described
below. Each of the individual components is critical to the successful
implementation of an EAI project.
Interface – The Proxy
Interface technologies can include adapters, connectors, agents and
APIs. This interface is responsible for providing a view into and out of the source
and target systems in a cross-system exchange. The sheer number
and variety of platforms and systems combined with the variety of EAI
integrators has created an environment where literally thousands of
proprietary interfaces exist.
Process – The Broker
The process or message broker simply controls and orchestrates the vast
number of messages that are being transmitted at any given time.
Various business rules and workflow processes are incorporated into
the EAI implementation via this broker. The broker is programmed with
a fixed instruction set to control the delivery steps within the context of an
existing business process.
Transport – The Medium
The message transport medium is the way in which the different systems
communicate with each other. For systems communication the medium
is usually concrete and can be fairly complex, depending on the
environment. Typical EAI message transport mediums include SNA,
HTTP, TCP/IP and other proprietary protocols.
Message – The Content
The actual content of the message comes in two parts: the message
wrapper and the message data. The message wrapper specifies whether the
exchange is a multi-part EDI exchange, an XML-formatted exchange or an
object-based exchange. A native database exchange via ODBC may also be
considered a type of wrapper.
In most EAI projects different vendors provide individual components and
this can make these projects fairly complex to implement successfully.
EII
EII works by providing individuals or applications a single, virtual view across multiple data sources. A view
represents a business entity - a customer, a sales pipeline, or the performance of a manufacturer's production
floor - annotated with a metadata-based description. Applications access a view as if its data were physically
located in a single database, even though individual data may reside in different source systems. When an
application accesses a view, the EII platform transparently handles connectivity with back-end databases and
applications, along with related functions, such as security, data integrity, and query optimization.
Enterprise Information Integration (EII) allows applications to use information from a variety of sources. The
EII server evaluates requests for information, queries the individual data sources, and delivers customized
output to the requesting application.
EII can work in conjunction with existing integration technologies like ETL (extract, transform, load) and EAI
(enterprise application integration), but some people blur the distinctions among these technologies. Unlike
ETL, EII leaves source data in place and retrieves what it needs, on demand. Compared with EAI, EII is clearly
for combining information assets, not for scheduling data flows between applications.
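To make the “single virtual view” idea above concrete, here is a minimal sketch in plain SQL. The table names (customers_erp, customers_crm, customer_master) are hypothetical; in a real EII deployment the underlying tables would typically be remote or proxy objects exposed by the EII platform rather than local tables.
CREATE VIEW customer_360 AS
SELECT c.customer_id, c.customer_name, e.credit_limit, r.last_contact_date
FROM customers_erp e                 -- data surfaced from the ERP system
JOIN customers_crm r                 -- data surfaced from the CRM system
  ON r.customer_id = e.customer_id
JOIN customer_master c               -- reference data with agreed keys and names
  ON c.customer_id = e.customer_id;

-- Applications simply query the view; the fact that the data lives in
-- several back-end systems is hidden behind it:
SELECT customer_name, credit_limit
FROM customer_360
WHERE last_contact_date < '2004-01-01';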
EII solves a set of business problems that share several common characteristics. IT managers should consider EII if their application needs to accomplish three or more of the tasks from the following list:
• Access data distributed across multiple sources (relational databases, enterprise applications, data warehouses, documents, XML)
• Combine data in different formats (relational databases, flat files, Word or Excel documents, XML)
• Integrate corporate data with information from outside the firewall
• Merge static data with messages, web services, or other data streams
• Perform queries that include archived data with live information
• Mix information from a data warehouse with current information
• Analyze information on-the-fly to drive an application
• Report to a variety of formats using widely distributed information
• Implement a path to a service-oriented architecture with minimal impact on existing IT infrastructure
http://www.ipedo.com/html/eii_zone_techoverview.html
http://www.datawarehouse.com/article/?articleID=4942
3. Data warehouse and data warehousing. Classical architecture and
explanation of the components.
The data warehouse architecture is based on a relational database management system server that functions
as the central repository for informational data. Operational data and processing is completely separated from
data warehouse processing. This central information repository is surrounded by a number of key components
designed to make the entire environment functional, manageable and accessible by both the operational
systems that source data into the warehouse and by end-user query and analysis tools.
Typically, the source data for the warehouse is coming from the operational applications. As the data enters the
warehouse, it is cleaned up and transformed into an integrated structure and format. The transformation
process may involve conversion, summarization, filtering and condensation of data. Because the data contains
a historical component, the warehouse must be capable of holding and managing large volumes of data as well
as different data structures for the same database over time.
Components of data warehousing:
Data Warehouse Database
The central data warehouse database is the cornerstone of the data warehousing environment. This database is
almost always implemented on the relational database management system (RDBMS) technology. However,
this kind of implementation is often constrained by the fact that traditional RDBMS products are optimized for
transactional database processing. Certain data warehouse attributes, such as very large database size, ad hoc
query processing and the need for flexible user view creation including aggregates, multi-table joins and drilldowns, have become drivers for different technological approaches to the data warehouse database.
Sourcing, Acquisition, Cleanup and Transformation Tools
A significant portion of the implementation effort is spent extracting data from operational systems and putting
it in a format suitable for informational applications that run off the data warehouse.
The data sourcing, cleanup, transformation and migration tools perform all of the conversions, summarizations,
key changes, structural changes and condensations needed to transform disparate data into information that
can be used by the decision support tool. They produce the programs and control statements, including the
COBOL programs, MVS job-control language (JCL), UNIX scripts, and SQL data definition language (DDL)
needed to move data into the data warehouse for multiple operational systems. These tools also maintain the
meta data. The functionality includes:
• Removing unwanted data from operational databases
• Converting to common data names and definitions
• Establishing defaults for missing data
• Accommodating source data definition changes
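As a minimal sketch of the kind of work these tools generate, the following SQL performs a filtered, renamed and defaulted load from a hypothetical operational table ops_customer into a warehouse table dw_customer (all names and rules here are illustrative assumptions, not output of any particular tool):
INSERT INTO dw_customer (customer_key, customer_name, country_code, segment)
SELECT o.cust_id,                          -- key change: operational id reused as warehouse key
       UPPER(o.cust_nm),                   -- convert to a common name format
       COALESCE(o.country, 'XX'),          -- establish a default for missing data
       CASE WHEN o.type IN ('R', 'RETAIL') THEN 'RETAIL'
            ELSE 'CORPORATE' END           -- map source codes to common definitions
FROM ops_customer o
WHERE o.status <> 'DELETED';               -- remove unwanted operational records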
Meta data
Meta data is data about data that describes the data warehouse. It is used for building, maintaining, managing
and using the data warehouse. Meta data can be classified into:
Technical meta data, which contains information about warehouse data for use by warehouse designers and
administrators when carrying out warehouse development and management tasks.
Business meta data, which contains information that gives users an easy-to-understand perspective of the
information stored in the data warehouse.
Access Tools
The principal purpose of data warehousing is to provide information to business users for strategic decision-making. These users interact with the data warehouse using front-end tools. Many of these tools require an
information specialist, although many end users develop expertise in the tools. Tools fall into four main
categories: query and reporting tools, application development tools, online analytical processing tools, and
data mining tools.
Data Marts
The concept of a data mart is causing a lot of excitement and attracts much attention in the data warehouse
industry. Mostly, data marts are presented as an alternative to a data warehouse that takes significantly less
time and money to build. However, the term data mart means different things to different people. A rigorous
definition of this term is a data store that is subsidiary to a data warehouse of integrated data. The data mart is
directed at a partition of data (often called a subject area) that is created for the use of a dedicated group of
users. A data mart might, in fact, be a set of denormalized, summarized, or aggregated data. Sometimes, such
a set could be placed on the data warehouse rather than a physically separate store of data. In most instances,
however, the data mart is a physically separate store of data and is resident on separate database server, often
a local area network serving a dedicated user group. Sometimes the data mart simply comprises relational
OLAP technology which creates highly denormalized dimensional model (e.g., star schema) implemented on a
relational database. The resulting hypercubes of data are used for analysis by groups of users with a common
interest in a limited portion of the database.
These types of data marts, called dependent data marts because their data is sourced from the data
warehouse, have a high value because no matter how they are deployed and how many different enabling
technologies are used, different users are all accessing the information views derived from the single integrated
version of the data.
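A denormalized dimensional model of the kind mentioned above can be sketched as a small star schema; the fact and dimension tables below are purely illustrative:
CREATE TABLE dim_date (
    date_key      INTEGER  NOT NULL,
    calendar_date DATE     NOT NULL,
    month_name    CHAR(10) NOT NULL,
    year_no       INTEGER  NOT NULL,
    PRIMARY KEY (date_key)
);

CREATE TABLE dim_product (
    product_key   INTEGER  NOT NULL,
    product_name  CHAR(40) NOT NULL,
    category      CHAR(20) NOT NULL,
    PRIMARY KEY (product_key)
);

CREATE TABLE fact_sales (
    date_key      INTEGER       NOT NULL,
    product_key   INTEGER       NOT NULL,
    units_sold    INTEGER       NOT NULL,
    revenue       NUMERIC(12,2) NOT NULL,
    PRIMARY KEY (date_key, product_key),
    FOREIGN KEY (date_key) REFERENCES dim_date,
    FOREIGN KEY (product_key) REFERENCES dim_product
);

-- A user group with an interest in this subject area can then aggregate directly:
SELECT d.year_no, p.category, SUM(f.revenue) AS total_revenue
FROM fact_sales f
JOIN dim_date d ON d.date_key = f.date_key
JOIN dim_product p ON p.product_key = f.product_key
GROUP BY d.year_no, p.category;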
Data Warehouse Administration and Management
Data warehouses tend to be as much as 4 times as large as related operational databases, reaching terabytes
in size depending on how much history needs to be saved. They are not synchronized in real time to the
associated operational data but are updated as often as once a day if the application requires it.
In addition, almost all data warehouse products include gateways to transparently access multiple enterprise
data sources without having to rewrite applications to interpret and utilize the data. Furthermore, in a
heterogeneous data warehouse environment, the various databases reside on disparate systems, thus requiring
internetworking tools. The need to manage this environment is obvious.
Managing data warehouses includes security and priority management; monitoring updates from the multiple
sources; data quality checks; managing and updating meta data; auditing and reporting data warehouse usage
and status; purging data; replicating, subsetting and distributing data; backup and recovery and data
warehouse storage management.
Information Delivery System
The information delivery component is used to enable the process of subscribing for data warehouse
information and having it delivered to one or more destinations according to some user-specified scheduling
algorithm. In other words, the information delivery system distributes warehouse-stored data and other
information objects to other data warehouses and end-user products such as spreadsheets and local databases.
Delivery of information may be based on time of day or on the completion of an external event. The rationale
for the delivery systems component is based on the fact that once the data warehouse is installed and
operational, its users don’t have to be aware of its location and maintenance. All they need is the report or an
analytical view of data at a specific point in time. With the proliferation of the Internet and the World Wide Web
such a delivery system may leverage the convenience of the Internet by delivering warehouse-enabled
information to thousands of end-users via the ubiquitous world wide network.
http://www.tdan.com/i003fe11.htm
4. Metadata – its necessity, provision and use.
Metadata describes the data about data elements and attributes (such as name, size, or data type), data about
records or data structure (such as length, fields, columns), and data about data (where located, how
associated, ownership). It may include descriptive information about content, quality, conditions, or
characteristics of the data.
For web services, the metadata is wrapped around the basic functionality so that the basic service can be used
by reference.
http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.help.iq/html/iqintro/meta.htm
5. ETL (extraction, transformation, loading), the functions of its
components and related problems.
ETL utilities are most typically associated with data warehousing where some measure of data latency can be
tolerated. These tools are normally used to process batches of data in bulk mode and perform data cleansing
and transformation during this process. Meta data management is a feature that is required by most users of
ETL tools and data is usually staged in a staging area where it is cleansed, verified and transformed prior to
being loaded into the target system. During this process the meta data repository is generated and maintained.
ETL tools are mostly used in scenarios where bi-directional replication is not required. These tools are almost
always pure data movement tools and not well suited for the requirements associated with true real-time data
integration. Data movement occurs between the different source systems and the
operational data store or data warehouse. ETL technology normally exists as a tool or utility and does not
always provide the total data integration solution required in the complex environment of real-time,
cross-platform data integration.
http://www.itcollege.ee/~gseier/dmdirect_article.cfm.html
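As a rough sketch of the batch staging pattern described above, the following SQL stages raw rows, cleanses them in the staging area, and only then loads the target. The tables stg_orders, dw_orders, src_orders_extract and the cleansing rules are assumptions for illustration only:
-- 1. Bulk-load the raw extract into a staging table (some latency is tolerated).
INSERT INTO stg_orders (order_id, order_date, amount, customer_id)
SELECT order_id, order_date, amount, customer_id
FROM src_orders_extract;

-- 2. Cleanse and verify in the staging area.
DELETE FROM stg_orders
WHERE order_id IS NULL OR amount < 0;        -- reject obviously bad rows

-- 3. Transform and load into the warehouse target.
INSERT INTO dw_orders (order_key, order_date, amount_eur, customer_key)
SELECT order_id,
       order_date,
       amount / 15.6466,                     -- example transformation: EEK to EUR at the fixed rate
       customer_id
FROM stg_orders;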
6. Problems related to data quality.
Essential problems and causes of insufficient data quality are listed as:
• Incorrect values, missing values and references, duplicates (these do not interfere with operative systems).
• Inconsistent data between different systems.
• Incorrect data gathering and data operations.
• Insufficient plausibility checks in operative systems (e.g. during data input).
• Changes in the operative systems that are not documented or forwarded to other systems.
• Insufficient modelling and redundant data.
• System problems (technical)
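The first few problems on the list above can be detected with simple SQL probes; the customer and orders tables and their columns here are illustrative assumptions:
-- Duplicates: the same business key appearing more than once
SELECT cust_id, COUNT(*) AS copies
FROM customer
GROUP BY cust_id
HAVING COUNT(*) > 1;

-- Missing values
SELECT COUNT(*) AS missing_names
FROM customer
WHERE cust_name IS NULL OR cust_name = '';

-- Missing references (orphaned foreign keys)
SELECT o.order_id
FROM orders o
WHERE NOT EXISTS (SELECT 1 FROM customer c WHERE c.cust_id = o.cust_id);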
7. Policies for ensuring data quality.
Possibilities for ensuring data quality are mentioned as:
• Quality checks before the final loading into the Data Warehouse Data Base.
• Data Cleansing in the ETL process.
• Loading and tagging problematic data (e.g. referential integrity or value domains)
• Automatic correction and data cleansing (e.g. format errors).
• Manual correction and data cleansing (e.g. data interpretation; frequently the problems are already known to the domain expert).
• Feedback to data suppliers about test results (for possible data correction and further data delivery).
• Error location and co-ordination with data suppliers.
• Organisational approaches.
Interestingly, not one enterprise listed the integration of quality specification and quality measurement
into the meta data management. If possible, data quality deficiencies should be reported to the data
suppliers and improvement should start at their causes (proactively). Continuous contact between the
central data warehouse and the source systems is therefore useful.
Typical objectives of a data quality program include:
• Eliminate redundant data
• Validate data at input
• Eliminate false null values
• Ensure data values fall within defined domains
• Resolve conflicts in data
• Ensure proper definition and use of data values
• Establish and apply standards.
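Several of the objectives above (“validate data at input”, “ensure data values fall within defined domains”, “eliminate false null values”) can be pushed into the schema itself. A minimal sketch using standard SQL constraints; the table and the domain values are assumptions:
CREATE TABLE customer (
    cust_id    INTEGER  NOT NULL,                                     -- eliminate false NULL values
    cust_name  CHAR(60) NOT NULL,
    gender     CHAR(1)  NOT NULL CHECK (gender IN ('M', 'F')),        -- value must fall within a defined domain
    country    CHAR(2)  NOT NULL DEFAULT 'EE',                        -- agreed default for missing input
    PRIMARY KEY (cust_id)                                             -- prevent duplicate rows on the business key
);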
8. Architectural and operational characteristics of OLTP (transaction processing) and DSS
(decision support system) databases (based on Sybase IQ).
OLTP database systems are best suited for administering changing data and are ideal for real-time
updates and a large number of users. Although individual queries are small, many of them are run
concurrently. Examples: a bank, ticket sales. Important properties are the ability to lock data while
changes are being made (concurrency) and the ability to verify that the change process has completed
successfully (atomicity). OLTP systems require a very well designed data structure and optimal
indexing, and they allow backups to be taken while the system is running.
DSS systems are ideal for querying non-changing data (statistics, archives, data warehousing). In DSS
databases the tables are usually heavily indexed, and the data is preprocessed and organized to suit
different query types. Because users do not modify the data, concurrency and atomicity are not an
issue. The data is usually refreshed at longer intervals.
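The atomicity requirement mentioned above can be illustrated with an ordinary SQL transaction; the account table is an assumption:
-- Either both updates become permanent, or neither does.
BEGIN TRANSACTION;

UPDATE account SET balance = balance - 100 WHERE account_no = 'A-1';
UPDATE account SET balance = balance + 100 WHERE account_no = 'B-2';

-- If anything failed above, a ROLLBACK would undo both changes together.
COMMIT;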
9. Administering Sybase IQ tables.
Sybase IQ applications use SQL to access Sybase IQ databases. The SQL interface is fully compatible with the
Sybase Adaptive Server Anywhere and Sybase Adaptive Server Enterprise SQL syntax. It is also ANSI SQL-92
compliant, and supports both ODBC 3.0 and JDBC call-level SQL interfaces. Java stored procedures can be
employed to place business logic in the database.
Creating tables in Sybase Central
To create a table using Sybase Central, see "Managing tables" in Introduction to Sybase IQ.
The SQL statement for creating tables is CREATE TABLE:
CREATE TABLE skill (
skill_id INTEGER NOT NULL,
skill_name CHAR( 20 ) NOT NULL,
skill_type CHAR( 20 ) NOT NULL
)
Indexes and IQ UNIQUE
If you estimate IQ UNIQUE incorrectly, there is no penalty for loads; the Optimizer simply uses the next larger
index. For queries, if you estimate IQ UNIQUE incorrectly and you have an HG, LF, or storage-optimized default
index, the Optimizer ignores the IQ UNIQUE value and uses the actual number of values in the index. If you do
not have one of these indexes and your estimate is wrong by a significant amount (for example, if you specify
IQ UNIQUE 1000000 when the actual number of unique values is 12 million), query performance may suffer.
To change the value of IQ UNIQUE for an existing index, run the sp_iqrebuildindex procedure.
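As a hedged sketch (the exact option syntax should be checked against your IQ version), IQ UNIQUE is given per column in CREATE TABLE as an estimate of the number of distinct values; the table and estimates below are assumptions:
CREATE TABLE employee_iq (
    emp_id    INTEGER  NOT NULL,
    emp_state CHAR(2)  NOT NULL IQ UNIQUE(50),       -- roughly 50 distinct values expected
    emp_lname CHAR(40) NOT NULL IQ UNIQUE(1000000)   -- estimate for a high-cardinality column
)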
Using the ALTER TABLE statement
The following command adds a column to the skill table to allow space for an optional description of the skill:
ALTER TABLE skill
ADD skill_description CHAR( 254 )
The following statement changes the name of the entire table:
ALTER TABLE skill
RENAME qualification
These examples show how to change the structure of the database. The ALTER TABLE statement can change
many characteristics of a table—foreign keys can be added or deleted, and so on. However, you cannot use
MODIFY to change table or column constraints. Instead, you must DELETE the old constraint and ADD the new
one. In all these cases, once you make the change, stored procedures, views, and any other item referring to
this column will no longer work.
Altering tables in a join index
You cannot ADD, DROP or MODIFY a base table column that participates in a join condition of a join index. To
alter joined columns, you must first drop the join index, alter the table, and then recreate the join index.
Dropping tables
The following DROP TABLE statement deletes all the records in the skill table and then removes the definition of
the skill table from the database:
DROP TABLE skill
Like the CREATE statement, the DROP statement automatically executes a COMMIT before and after dropping
the table. This makes permanent all changes to the database since the last COMMIT or ROLLBACK.
The DROP statement also drops all indexes on the table, except if any column in the table participates in a join
index.
Dropping a table in Sybase Central
Connect to the database.
Click the Tables folder for that database.
Right-click the table you wish to delete, and select Delete from the pop-up menu.
Creating a primary key
The following statement creates the same skill table as before, except that a primary key is added:
CREATE TABLE skill (
skill_id INTEGER NOT NULL,
skill_name CHAR( 20 ) NOT NULL,
skill_type CHAR( 20 ) NOT NULL,
primary key( skill_id )
)
The primary key values must be unique for each row in the table which, in this case, means that you cannot
have more than one row with a given skill_id. Each row in a table is uniquely identified by its primary key.
Columns in the primary key are not allowed to contain NULL. You must specify NOT NULL on the column in the
primary key.
Creating a primary key in Sybase Central
Connect to the database.
Click the Tables folder for that database.
Right-click the table you wish to modify, and select Properties from the pop-up menu to display its property
sheet.
Click the Columns tab, select the column name, and either click Add to Key or Remove from Key. Column
values must be unique.
Creating foreign keys
You can create a table named emp_skill, which holds a description of each employee's skill level for each skill in
which they are qualified, as follows:
CREATE TABLE emp_skill(
emp_id INTEGER NOT NULL,
skill_id INTEGER NOT NULL,
"skill level" INTEGER NOT NULL,
PRIMARY KEY( emp_id, skill_id ),
FOREIGN KEY REFERENCES employee,
FOREIGN KEY REFERENCES skill
)
The emp_skill table definition has a primary key that consists of two columns: the emp_id column and the
skill_id column. An employee may have more than one skill, and so appear in several rows, and several
employees may possess a given skill, so that the skill_id may appear several times.
The emp_skill table also has two foreign keys. The foreign key entries indicate that the emp_id column must
contain a valid employee number from the employee table (where it is the primary key), and that the
skill_id column must contain a valid skill identifier from the skill table (where it is the primary key).
A table can only have one primary key defined, but it may have as many foreign keys as necessary.
Creating a foreign key in Sybase Central
Connect to the database.
Click the Tables folder for that database.
Click the table holding the primary key, and drag it to the foreign key table.
When the primary key table is dropped on the foreign key table, the Foreign Key Wizard is displayed, which
leads you through the process of creating the foreign key.
All the information about tables in a database is held in the system tables. The information is distributed among
several tables. You can use Sybase Central or DBISQL to browse the information in these tables. Type the
following command in the DBISQL command window to see all the columns in the SYS.SYSTABLE table:
SELECT * FROM SYS.SYSTABLE
Displaying system tables in Sybase Central
Connect to the database.
Right-click the database, and select Filter Objects from the pop-up menu.
Select SYS and OK.
When you view the database tables or views with Show System Objects checked, the system tables or views
are also shown.
http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.help.iq/html/iqintro/meta.htm
http://216.239.59.104/search?q=cache:Z1XeE5DiaWMJ:www.sybase.cz/prod/dbase/pdf/IQ_DBAssoc.pdf+syba
se+IQ+indexing+howto&hl=en
10. Principles for reducing the volume of data stored in Sybase IQ.
Sybase IQ employs patented bit-wise indexing and compression technology to index all the
fields in a data warehouse database. This approach aids query performance in addition to reducing disk storage.
Bit-wise indexing reduces disk space requirements. Unlike traditional RDBMSs, where indexes occupy additional
disk space over and above that required by the actual data, Sybase IQ bit-wise indexing does not require any
additional disk space. In Sybase IQ, the index is the database, and although disk space requirements for this
index will vary by application, the amount of disk space required by a Sybase IQ database (including indexes) is
nearly always smaller than the size of the input data.
Sybase IQ indexes offer these benefits over traditional indexing techniques:
• Index sizes remain small. The entire database can be fully indexed and made available for ad hoc queries in the same space that would be needed to store the raw data. Most traditional databases need three times as much space.
• Queries are resolved by efficiently combining and manipulating indexes on only the relevant columns. This avoids time-consuming table scans.
• I/O is minimized, eliminating potential bottlenecks.
• Because indexes are compact, more data can be kept in memory for subsequent queries, thereby speeding throughput on iterative analysis.
• Tuning is data-dependent, allowing data to be optimized once for any number of ad hoc queries.
11. The main Sybase IQ index types, principles for their effective use, and
creating indexes.
For any column that has no index defined, or whenever it is the most effective, query results are produced
using the default index. This structure is fastest for projections, but generally is slower than any of the three
column index types you define for anything other than a projection. Performance is still faster than most
RDBMSs since one column of data is fetched, while other RDBMSs need to fetch all columns which results in
more disk I/O operations.
If a column is used only in projections, even if some of the queries return a small number of rows, Low_Fast
and High_Non_Group indexes are redundant because the default structure is just as fast for projecting a
small number of rows.
The Low_Fast (LF) index type
This index is ideal for columns that have a very low number of unique values (under 1,000) such as sex,
Yes/No, True/False, number of dependents, wage class, and so on. LF is the fastest index in Adaptive Server
IQ.
When you test for equality, just one lookup quickly gives the result set. To test for inequality, you may need to
examine a few more lookups. Calculations such as SUM, AVG, and COUNT are also very fast with this index.
As the number of unique values in a column increases, performance starts to degrade and memory and disk
requirements start to increase for insertions and some queries. When doing equality tests, though, it is still the
fastest index, even for columns with many unique values.
Recommended use
Use an LF index when:
A column has fewer than 1,000 unique values.
A column has fewer than 1,000 unique values and is used in a join predicate.
Never use an LF index for a column with 10,000 or more unique values. If the table has fewer than 25,000
rows, use an HG index, as fewer disk I/O operations are required for the same operation.
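A hedged example of creating an LF index on a low-cardinality column; the table and column names are assumptions, and the CREATE ... INDEX form follows the usual Sybase IQ pattern but should be verified against your version:
CREATE LF INDEX emp_gender_lf ON employee ( gender );

-- Equality tests and simple aggregates on such a column are then very fast:
SELECT gender, COUNT(*)
FROM employee
GROUP BY gender;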
Advantages and disadvantages of Low_Fast
The High_Group (HG) index type
The High_Group index is commonly used for join columns with integer data types. It is also more commonly
used than High_Non_Group because it handles GROUP BY efficiently.
Use an HG index when:
The column will be used in a join predicate
A column has more than 1000 unique values
Foreign key columns require their own, individual HG index. However, if a join index exists, the same column
cannot have both an explicitly created HG index and a foreign key constraint.
The High_Non_Group (HNG) index type
Add an HNG index when you need to do range searches.
An HNG index requires approximately one-third of the disk space that an HG index requires. On that basis
alone, if you do not need to do group operations, use an HNG index instead of an HG index.
Conversely, if you know you are going to do queries that an HG index handles more efficiently, or if the column
is part of a join and/or you want to enforce uniqueness, use an HG index.
Using the HNG index in place of an HG index may seriously degrade performance of complex ad hoc queries
joining four or more tables. If query performance is important for such queries in your application, choose both
HG and HNG.
Use an HNG index when:
The number of unique values is high (greater than 1000)
You don't need to do GROUP BY on the column
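A sketch contrasting the two high-cardinality index types on a hypothetical sales table, using the same hedged CREATE ... INDEX pattern as above:
-- HG: a join/GROUP BY column with many distinct values
CREATE HG INDEX sales_cust_hg ON sales ( cust_id );

-- HNG: range searches where GROUP BY is not needed
CREATE HNG INDEX sales_amount_hng ON sales ( amount );

-- Typical queries each index is intended to serve:
SELECT cust_id, SUM(amount) FROM sales GROUP BY cust_id;      -- well served by HG
SELECT COUNT(*) FROM sales WHERE amount BETWEEN 100 AND 500;  -- well served by HNG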
The Compare (CMP) index type
A Compare (CMP) index is an index on the relationship between two columns. You may create Compare indexes
on any two distinct columns with identical data types, precision, and scale. The CMP index stores the binary
comparison (<, >, or =) of its two columns.
The CMP index can be created on columns that are NULL, NOT NULL, or a mixture. The CMP index cannot be
unique. Note that numeric and decimal data types are considered identical. You may create CMP indexes on
them when precision and scale are identical. For CHAR, VARCHAR, BINARY, and VARBINARY columns, precision
means having the same column width.
The following restrictions apply to CMP:
You can drop CMP indexes.
CMP indexes cannot be unique.
CMP indexes are not replicated in underlying join indexes.
A partial width insert into a table is disallowed when not all columns of a CMP index are part of the insert.
An exception is raised if you attempt to alter or delete a column that is defined in a CMP index.
Users cannot ALTER TABLE MODIFY an existing column that is defined in a CMP index.
CMP indexes do not support the BIT, FLOAT, DOUBLE, and REAL data types.
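A minimal, assumed example of a CMP index on two columns that share the same data type, precision, and scale:
-- Both columns are NUMERIC(12,2), as CMP requires identical type, precision and scale.
CREATE CMP INDEX ord_price_cmp ON orders ( list_price, sale_price );

-- The stored <, >, = relationship helps queries such as:
SELECT COUNT(*) FROM orders WHERE sale_price < list_price;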
The Containment (WD) index type
This index allows you to store words from a column string of CHAR and VARCHAR data.
Use a WD index for the fastest access to columns that contain a list of keywords (for example, in a bibliographic
record or a Web page).
The following restrictions apply to WD:
You cannot specify the UNIQUE attribute.
The WD index is used only with the CONTAINS or LIKE predicate.
The column-name must identify a CHAR or VARCHAR column in a base table.
The minimum permitted column width is 3 bytes and the maximum permitted column width is 32767 bytes.
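An assumed example of a WD index on a keyword column, queried with LIKE (per the restriction list above, CONTAINS is the other predicate that can use it):
CREATE WD INDEX doc_keywords_wd ON document ( keywords );

-- The WD index is used only with CONTAINS or LIKE predicates, for example:
SELECT doc_id
FROM document
WHERE keywords LIKE '%warehouse%';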
Three index types are used to process queries involving date, time, or datetime quantities:
A DATE index is used on columns of data type DATE to process certain queries involving date quantities.
The TIME index is used on columns of data type TIME to process certain queries involving time quantities.
The DTTM index is used on columns of data type DATETIME or TIMESTAMP to process certain queries involving
datetime quantities.
Recommended use
Use a DATE, TIME, or DTTM index in the following two cases, when the DATE, TIME, DATETIME, or TIMESTAMP
column is used in queries containing date and time functions and operations:
For a simple equality predicate (no DATEPART) with a DATE, TIME, DATETIME, or TIMESTAMP column, LF and
HG indexes have the best performance. If an LF or HG index is not available, then the DATE, TIME, or DTTM
index is used to get the result.
If a DATE, TIME, DATETIME, or TIMESTAMP column is used in the GROUP BY clause or in the WHERE/HAVING
clauses for equalities (including join conditions) or IN predicates, the column needs an LF or HG index, as only
these indexes can do fast equality. Also see the section " Additional indexes" for index recommendations for
DATE, TIME, DATETIME, and TIMESTAMP columns.
For a query with an equality predicate (= or !=), if one side of the comparison is a DATEPART expression or
some other date and time function (e.g., YEAR, QUARTER, DAY, MINUTE), and the other side of the comparison
is a constant expression (including a constant value or host variable), then the DATE, TIME, or DTTM index is
used (if the index is available) to get the result set.
For example, the DATE, TIME, or DTTM index is used in the following queries:
SELECT * FROM tab WHERE DATEPART(YEAR, col1) = 2002;
SELECT * FROM tab WHERE DATEPART(HOUR, col2) = 20;
SELECT * FROM tab WHERE MINUTE (col3) != 30;
SELECT * FROM tab WHERE DATEPART(MONTH, col2) = @tmon;
where @tmon is an INTEGER host variable.
The DATE, TIME, and DTTM indexes do not support some date parts (Calyearofweek, Dayofyear, Millisecond),
nor DATEPART range predicates.
The DATE, TIME, and DTTM indexes have performance consistent with the HNG index. Compared to HNG,
they are generally faster (up to twice as fast) in the supported cases. In
the special cases discussed in the "Recommended use" section, the performance of the DATE, TIME, and DTTM
indexes is even better. Therefore, an HNG index is not necessary in addition to a DATE, TIME, or DTTM index on
a column of DATE, TIME, DATETIME, or TIMESTAMP data type.
Additional indexes
The recommendation is to always have a DATE, TIME, or DTTM index on a column of DATE, TIME, DATETIME,
or TIMESTAMP data type, if the column is referenced in the WHERE clause, in ON conditions, or in the GROUP
BY clause. In addition, the HG or LF index may also be appropriate for a DATE, TIME, DATETIME, or TIMESTAMP
column, especially if you are evaluating equality predicates against the column. An LF index is also
recommended if you frequently use the column in the GROUP BY clause and there are fewer than 1000 distinct
values (i.e., less than three years of dates).
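Following the recommendation above, a hedged sketch adding both a DATE index and an LF index on the date column of a hypothetical sales table:
CREATE DATE INDEX sales_date_dt ON sales ( sale_date );
CREATE LF   INDEX sales_date_lf ON sales ( sale_date );   -- for equality and GROUP BY

-- The DATE index serves DATEPART-style predicates:
SELECT COUNT(*) FROM sales WHERE DATEPART(YEAR, sale_date) = 2004;

-- The LF index serves plain equality and grouping:
SELECT sale_date, SUM(amount) FROM sales GROUP BY sale_date;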
Optimizing performance for ad hoc joins
When you create an additional column index, the CREATE INDEX command creates the new index as part of the
individual table and as part of any join indexes that include the column. CMP and multicolumn HG indexes are
the only exceptions to this rule. If the existing column indexes in the individual table already contain data, the
CREATE INDEX statement also inserts data into the new index from an existing index. This ensures data
integrity among all the column indexes for columns within an individual table. Data is also inserted and
synchronized automatically when you add an index to previously loaded tables that are part of a join index.
http://sybooks.sybase.com:80/onlinebooks/groupiq/iqg1250e/iqapg/@Generic__BookView;pt=462?DwebQuery=Indexing
Answers should be thorough; short answers will not be accepted. There is no point in rewriting the materials or
copy-pasting from fellow students.
For reading:
www.datawarehouse.com/article/?articleID= ……… (replace the dots with the article number)
On relational databases: 4436
Data quality: 2912 4681 7148
Metadata: 4978 4979
Data integration: 4942 4489 4897 1014857
www.dmreview.com/article_sub.cfm?articleId=8263 may also be of interest.
Data warehousing: 4916 4864 4814 4850 4780 4662 4473
www.dmreview.com/editorial/dmreview/print_action.cfm?EdID=6248 offers some material on handling
unstructured data, as does EdID=5891 at the same address.
For a first introduction to metadata, see
www.ukoln.ac.uk/metadata/publications/nutshell
www.ukoln.ac.uk/metadata/review.html
„Database Integration and Optimization“
Homework assignments.
Date parser. In source data a date may appear in many different forms. For example, in Estonian it is customary
to write a date as 11. jaan(uar(il)) (20)04(.)a(.) or 11.01.04, or 11.01.2004. In other languages and cultures
date formats differ – Americans generally use mm/dd/(yy)yy, the British dd/mm/yyyy, and so on. The symbols
separating the date components may also vary.
According to the European Union standard, a date must be written as yyyy-mm-dd (this should also be the
standard for Estonian data processing).
Task: design and write a program (parser) that converts a date given in the input in an arbitrary format into the
standard form. The program should be able to learn – once a format has appeared in the input and been
resolved, everything needed for the conversion is stored. If the input format is not recognizable, the program
should interactively ask for instructions.
Describe the ideas in Estonian; the program may be implemented in any programming language.
The task is very extensive; this does not affect the description of the idea, but the implementation may be
limited to 3–4 particularly illustrative examples.
Numbers in words (in Estonian). In addition to the numbers themselves, reports often require a sufficiently
forgery-proof verbal representation of them. To make the numbers harder to forge, the unused positions to
their left are filled with symbols from which digits are hard to derive, e.g. an asterisk (*). In the verbal form,
attention must be paid in Estonian to the representation of hundreds – e.g. instead of the usual „sada kaks“
(102), the form „ükssada kaks“ must be used. Pay attention also to the so-called „-teist“ numbers (11–19)!
The program must be written in SQL (Oracle PL/SQL is not acceptable), preferably as a function (procedure)
stored in the database.
Economical storage of slowly changing data while preserving the history of changes. This is a typical situation
in data warehousing, where only a few fields change in the data coming from the source systems, so it is not
practical to store complete snapshots of the data. The main constraint is that the normal operation of the
source systems must not be disturbed, and they provide no additional information about changes in the data.
To determine what has changed, and how, compared with the previous data extraction, a mechanism for
detecting and storing changes must be created. As a model you could take one of the examples given in the
article „Improved ETL Change Detection through Operational Meta Data“ by M. Jennings (link:
www.datawarehouse.com/article/?articleID=2982), or the methodology I have discussed in the lectures.
Submitting the homework is a prerequisite for passing the exam, and it should be submitted at least one week
before the exam (the exam takes place on 13 and 14 January 2005).