Four Creeks Unincorporated Area Council
System Architecture
Four Creeks UAC Atlas Program
ATLAS SYSTEM ARCHITECTURE
GLOSSARY
A data warehouse (DW) is a database used for reporting. Data is uploaded from the operational
systems and may pass through an operational data store for additional operations
before it is used in the DW for reporting.
A data warehouse maintains its functions in three layers: staging, integration, and access. The staging
layer stores raw data for use by developers (analysis and support). The integration layer
integrates the data and provides a level of abstraction from users. The access layer makes the data
available to users.
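As a minimal illustrative sketch (the Atlas document does not prescribe any particular implementation), the three layers can be pictured as tables that data moves through on its way to users; the database, table and column names below are hypothetical.

    import sqlite3

    # Hypothetical sketch of the three warehouse layers as table prefixes:
    # staging_ holds raw extracts, integration_ holds cleaned and typed data,
    # and access_ exposes a reporting-friendly view to users.
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE staging_orders (order_id TEXT, amount TEXT, order_date TEXT);
        CREATE TABLE integration_orders (order_id INTEGER PRIMARY KEY,
                                         amount REAL, order_date TEXT);
        CREATE VIEW access_monthly_sales AS
            SELECT substr(order_date, 1, 7) AS month, SUM(amount) AS total
            FROM integration_orders GROUP BY month;
    """)

    # Raw data lands in staging exactly as received from the operational system.
    con.execute("INSERT INTO staging_orders VALUES ('1001', ' 19.99 ', '2011-11-07')")

    # The integration layer cleans and types the data, keeping a level of
    # abstraction between the raw feeds and the users.
    con.execute("""
        INSERT INTO integration_orders
        SELECT CAST(order_id AS INTEGER), CAST(TRIM(amount) AS REAL), order_date
        FROM staging_orders
    """)

    # Users query only the access layer.
    print(con.execute("SELECT * FROM access_monthly_sales").fetchall())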
This definition of the data warehouse focuses on data storage. Data from the main sources is cleaned,
transformed, catalogued and made available for use by managers and other business professionals for
data mining, online analytical processing, market research and decision support (Marakas & O'Brien
2009). However, the means to retrieve and analyze data, to extract, transform and load data, and to
manage the data dictionary are also considered essential components of a data warehousing system.
Many references to data warehousing use this broader context. Thus, an expanded definition for data
warehousing includes business intelligence tools, tools to extract, transform and load data into the
repository, and tools to manage and retrieve metadata.
A data mart (DM) is the access layer of the data warehouse (DW) environment that is used to get data
out to the users. The DM is a subset of the DW, usually oriented to a specific business line or team.
In practice, the data mart and data warehouse each tend to imply the presence of the other in some
form. However, most writers using the term seem to agree that the design of a data mart tends to start
from an analysis of user needs and that a data warehouse tends to start from an analysis of what data
already exists and how it can be collected in such a way that the data can later be used. A data
warehouse is a central aggregation of data (which can be distributed physically); a data mart is a data
repository that may or may not derive from a data warehouse and that emphasizes ease of access and
usability for a particular designed purpose. In general, a data warehouse tends to be a strategic but
somewhat unfinished concept; a data mart tends to be tactical and aimed at meeting an immediate
need.
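As a hedged sketch of that distinction, a data mart can be built as a small, purpose-shaped extract of warehouse data for one business line. The sketch assumes a warehouse database ("warehouse.db") that already contains a hypothetical sales_fact table; the file names, table layouts and the 'West' region filter are invented for illustration.

    import sqlite3

    # Hypothetical sketch: a data mart carved out of the warehouse for one team.
    dw = sqlite3.connect("warehouse.db")
    mart = sqlite3.connect("sales_mart.db")

    # The warehouse holds the broad, centrally aggregated data ...
    rows = dw.execute("""
        SELECT order_date, region, SUM(amount) AS total
        FROM sales_fact
        WHERE region = 'West'
        GROUP BY order_date, region
    """).fetchall()

    # ... while the mart keeps only the subset the sales team needs,
    # pre-shaped for ease of access and an immediate, specific purpose.
    mart.execute("CREATE TABLE IF NOT EXISTS west_daily_sales "
                 "(order_date TEXT, region TEXT, total REAL)")
    mart.executemany("INSERT INTO west_daily_sales VALUES (?, ?, ?)", rows)
    mart.commit()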
GLOSSARY
An operational data store (or "ODS") is a database designed to integrate data from multiple
sources for additional operations on the data. The data is then passed back to operational systems
for further operations and to the data warehouse for reporting.
Because the data originates from multiple sources, the integration often involves cleaning,
resolving redundancy and checking against business rules for integrity. An ODS is usually
designed to contain low-level or atomic (indivisible) data (such as transactions and prices) with
limited history that is captured "real time" or "near real time" as opposed to the much greater
volumes of data stored in the data warehouse, generally on a less frequent basis.
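A minimal sketch of that integration step, with invented source names, fields and a single business rule, might look like the following; it is illustrative only and not part of the Atlas design.

    # Hypothetical ODS-style integration: records from two operational sources
    # are cleaned, de-duplicated by key, and checked against a business rule
    # before being passed back to operational systems or on to the warehouse.
    billing = [{"id": "A1", "price": " 10.00 "}, {"id": "A2", "price": "-5.00"}]
    orders  = [{"id": "a1", "price": "10.00"},  {"id": "A3", "price": "7.50"}]

    def clean(record):
        # Cleaning: normalise the key and convert the price to a number.
        return {"id": record["id"].upper(), "price": float(record["price"].strip())}

    ods = {}
    for record in map(clean, billing + orders):
        if record["price"] < 0:
            # Business-rule check: negative prices are rejected for review.
            continue
        # Redundancy resolution: the same key from two sources keeps one row.
        ods.setdefault(record["id"], record)

    print(list(ods.values()))  # atomic, current rows ready for downstream use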
Data mining (the analysis step of the Knowledge Discovery in Databases, or KDD, process),
a relatively young and interdisciplinary field of computer science, is the process of
discovering new patterns in large data sets using methods from statistics and artificial
intelligence as well as database management. In contrast to, for example, machine learning, the
emphasis lies on the discovery of previously unknown patterns as opposed to generalizing known
patterns to new data.
The term is a buzzword and is frequently misused to mean any form of large-scale data or
information processing (collection, extraction, warehousing, analysis and statistics); it is also
generalized to any kind of computer decision support system, including artificial intelligence,
machine learning and business intelligence. In the proper use of the word, the key term is
discovery, commonly defined as "detecting something new". Often the more general terms
"(large-scale) data analysis" or "analytics" are more appropriate.
The actual data mining task is the automatic or semi-automatic analysis of large quantities of
data in order to extract previously unknown interesting patterns such as groups of data records
(cluster analysis), unusual records (anomaly detection) and dependencies (association rule
mining). These patterns can then be seen as a kind of summary of the input data, and used in
further analysis or for example in machine learning and predictive analytics. For example, the
data mining step might identify multiple groups in the data, which can then be used to obtain
more accurate prediction results by a decision support system. Neither data collection, data
preparation, nor result interpretation and reporting is part of the data mining step, but they do belong
to the overall data mining process as additional steps.
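As an illustrative sketch of those tasks (not part of the Atlas document), the snippet below runs cluster analysis and anomaly detection on invented data; it assumes the NumPy and scikit-learn libraries are available, and every name in it is hypothetical.

    import numpy as np
    from sklearn.cluster import KMeans             # cluster analysis
    from sklearn.ensemble import IsolationForest   # anomaly detection

    # Invented data: two obvious groups plus one unusual record.
    rng = np.random.default_rng(0)
    data = np.vstack([
        rng.normal(loc=[0, 0], scale=0.3, size=(50, 2)),
        rng.normal(loc=[5, 5], scale=0.3, size=(50, 2)),
        [[10.0, -10.0]],                           # the anomaly
    ])

    # Cluster analysis: discover groups that were never labelled in advance.
    clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(data)

    # Anomaly detection: flag records that do not fit the discovered structure.
    flags = IsolationForest(random_state=0).fit_predict(data)   # -1 marks anomalies

    print("cluster sizes:", np.bincount(clusters))
    print("records flagged as unusual:", int((flags == -1).sum()))

The discovered clusters and flagged records are only a summary of the input; preparing the data beforehand and interpreting the results afterwards belong to the wider process described above.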
The related terms data dredging, data fishing and data snooping refer to the use of data mining
methods to sample parts of a larger population data set that are (or may be) too small for reliable
statistical inferences to be made about the validity of any patterns discovered. These methods
can, however, be used in creating new hypotheses to test against the larger data populations.
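A small sketch of that caution, using invented random data, shows how a "pattern" found by searching a tiny sample is best treated as a hypothesis and re-checked against the full population (NumPy assumed available):

    import numpy as np

    # All data here is pure noise, so any pattern found in a small sample is
    # spurious; re-testing it on the full population shows the effect vanish.
    rng = np.random.default_rng(1)
    population = rng.normal(size=(100_000, 50))   # 50 candidate features
    target = rng.normal(size=100_000)             # an unrelated target

    sample_idx = rng.choice(100_000, size=20, replace=False)
    sample_corr = [np.corrcoef(population[sample_idx, j], target[sample_idx])[0, 1]
                   for j in range(50)]
    best = int(np.argmax(np.abs(sample_corr)))    # the "discovered" pattern

    full_corr = np.corrcoef(population[:, best], target)[0, 1]
    print(f"feature {best}: r={sample_corr[best]:+.2f} in the 20-row sample, "
          f"r={full_corr:+.2f} in the full population")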
UAC Atlas Program – 7 November 2011