• Data Profiling
https://store.theartofservice.com/the-data-profiling-toolkit.html
Business intelligence - Amount and quality of available data
Before implementation it is a
good idea to do data profiling
Business intelligence - Amount and quality of available data
Data profiling: check for
inappropriate and null/empty values
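The check named on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `profile_column`, the sample `ages` column, and the 0 to 120 validity rule are all assumptions made for the example, not part of the original material.

```python
from typing import Any, Callable, Dict, Iterable


def profile_column(values: Iterable[Any],
                   is_valid: Callable[[Any], bool]) -> Dict[str, int]:
    """Count null/empty entries and entries that fail a validity rule."""
    counts = {"total": 0, "null_or_empty": 0, "inappropriate": 0}
    for value in values:
        counts["total"] += 1
        # None and blank strings are both treated as "null/empty".
        if value is None or (isinstance(value, str) and value.strip() == ""):
            counts["null_or_empty"] += 1
        elif not is_valid(value):
            counts["inappropriate"] += 1
    return counts


# Hypothetical column: ages that should be integers between 0 and 120.
ages = [34, None, "", -5, 210, 41, "  "]
report = profile_column(ages, lambda v: isinstance(v, int) and 0 <= v <= 120)
print(report)  # {'total': 7, 'null_or_empty': 3, 'inappropriate': 2}
```

Null/empty entries are counted separately from rule failures so that a missing value is not also reported as an inappropriate one.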
Data quality - Overview
Data profiling - initially assessing the data to
understand its quality challenges
Extract, transform, load - Challenges
The range of data values or data quality in an
operational system may exceed the
expectations of designers at the time
validation and transformation rules are
specified. Data profiling of a source during
data analysis can identify the data conditions
that the transformation rule specifications
must handle. This leads to amending the
validation rules that are explicitly and
implicitly implemented in the ETL process.
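As a rough sketch of that idea, the snippet below profiles a source column and compares what it finds with the range a validation rule was written for. The rule layout (`min`, `max`, `nullable`), the function names, and the `order_totals` sample are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional


def profile_range(values: List[Optional[float]]) -> Dict[str, float]:
    """Collect the observed minimum, maximum and null count of a column."""
    present = [v for v in values if v is not None]
    return {
        "min": min(present),
        "max": max(present),
        "nulls": len(values) - len(present),
    }


def rule_violations(profile: Dict[str, float], rule: Dict) -> List[str]:
    """List the data conditions that the current rule does not anticipate."""
    issues = []
    if profile["min"] < rule["min"]:
        issues.append(f"observed min {profile['min']} is below expected {rule['min']}")
    if profile["max"] > rule["max"]:
        issues.append(f"observed max {profile['max']} is above expected {rule['max']}")
    if profile["nulls"] > 0 and not rule["nullable"]:
        issues.append(f"{profile['nulls']} null value(s) in a non-nullable column")
    return issues


# Assumed rule written at design time, and a sample of the real source data.
expected_rule = {"min": 0, "max": 1000, "nullable": False}
order_totals = [12, 250, None, 4800, -30]

profile = profile_range(order_totals)
issues = rule_violations(profile, expected_rule)
for issue in issues:
    print(issue)
```

Each reported issue is a candidate for an amended validation or transformation rule, for example widening the accepted range or deciding how nulls are handled.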
Extract, transform, load - Virtual ETL
By using a persistent metadata repository,
ETL tools can transition from one-time
projects to persistent middleware,
performing data harmonization and data
profiling consistently and in near-real time.
Extract, transform, load - Tools
Many ETL vendors now have data profiling, data
quality, and metadata capabilities
Data profiling
Data profiling is the process of examining
the data available in an existing data
source (e.g. a database or a file) and
collecting statistics and information about
that data. The purpose of these statistics
may be to:
Data profiling - Introduction
Thus the purpose of data profiling is
both to validate metadata when it is
available and to discover metadata
when it is not
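Both purposes can be illustrated with a small sketch: infer a type for each column from its raw text values (discovering metadata), then compare the result with a declared schema where one exists (validating metadata). The `infer_type` helper, the sample columns, and the `declared` schema are assumptions made for this example.

```python
from typing import Dict, List


def infer_type(values: List[str]) -> str:
    """Return the narrowest of 'int', 'float', 'str' that fits all non-empty values.

    A column with no non-empty values is reported as 'int' by this simple sketch.
    """
    non_empty = [v for v in values if v.strip() != ""]
    for name, cast in (("int", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue
    return "str"


# Hypothetical raw columns as they might arrive from a file.
columns: Dict[str, List[str]] = {
    "id": ["1", "2", "3"],
    "price": ["9.99", "12", ""],
    "zip": ["02139", "1O234", "94110"],   # second value contains the letter O
    "country": ["US", "DE", "US"],        # no metadata available for this column
}
declared = {"id": "int", "price": "float", "zip": "int"}

# Discover metadata for every column.
discovered = {name: infer_type(vals) for name, vals in columns.items()}

# Validate declared metadata where it is available.
mismatches = {
    name: (declared[name], discovered[name])
    for name in declared
    if declared[name] != discovered[name]
}
print(discovered)
print(mismatches)  # {'zip': ('int', 'str')}
```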
Data profiling - How to do Data Profiling
Normally purpose-built tools are
used for data profiling to ease the
process
Data profiling - When to Conduct Data Profiling
An additional time to conduct data
profiling is during the data warehouse
development process after data has
been loaded into staging, the data
marts, etc
Data profiling - Benefits of Data Profiling
Although data profiling is effective,
remember to find a suitable
balance and do not slip into “analysis
paralysis”.
Surveillance - Data mining and profiling
Data profiling can be an extremely powerful
tool for psychological and social network
analysis
Prototype - Data prototyping
To achieve this, a data architect uses a
graphical interface to interactively develop
and execute transformation and cleansing
rules using raw data. The resultant data is
then evaluated and the rules refined. Beyond
the obvious visual checking of the data
on-screen by the data architect, the usual
evaluation and validation approaches are to
use data profiling software and then to insert
the resultant data into a test version of the
target application and trial its use.
Angoss - Software
* KnowledgeSEEKER is a data mining
product. Its features include data profiling,
data visualization and decision tree
analysis. [http://www.comsol.ch/content.php?si=317id=132anzeige=Angoss%20Products COMSOL ONLINE - ANGOSS Knowledge Engineering]
It was first released in 1990.
Angoss - Software
* KnowledgeSTUDIO is a data mining
and predictive analytics suite for the
model development and deployment
cycle. Its features include data
profiling, data visualization, decision
tree analysis, predictive modeling,
implementation, scoring, validation,
monitoring and scorecard
development.
Integration competency center - Central services ICC
It also offers more support for
development projects, providing
management, development resources,
data profiling, data quality, and unit
testing
IBM Infosphere - IBM InfoSphere software
* IBM InfoSphere Information Analyzer
[http://www01.ibm.com/software/data/infosphere/information-analyzer/ IBM - Data Profiling, Data Rules and Quality Monitoring - InfoSphere Information Analyzer - Software]
to profile and track data quality
Talend - Data management
* Talend Open Studio for Data Quality:
an open source data profiling tool that
examines the content, structure and
quality of complex data structures
Oracle Warehouse Builder - Features
Further, it offers capabilities for
relational, dimensional and metadata
data modeling, data profiling, data
cleansing and data auditing
Oracle Warehouse Builder - History
The 10gR1 release was essentially a
certification of the 10g database, and
the 10gR2 release (code named Paris)
was a huge release incorporating a
wide spectrum of functionality from
dimensional modelling to data profiling
and quality
Data quality assurance
Data quality assurance is the process
of profiling the data to
discover inconsistencies and other
anomalies in the data, as well as
performing data cleansing activities
(e.g. removing outliers, missing data
interpolation) to improve the data
quality.
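The two cleansing activities mentioned here can be sketched with the Python standard library. The choices below are assumptions for illustration only: outliers are detected with the common 1.5 x IQR rule and blanked out rather than dropped, and gaps are filled by linear interpolation between the nearest known neighbours. The sample `readings` are invented.

```python
import statistics
from typing import List, Optional


def remove_outliers(values: List[Optional[float]], k: float = 1.5) -> List[Optional[float]]:
    """Replace values outside [Q1 - k*IQR, Q3 + k*IQR] with None."""
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v if v is not None and low <= v <= high else None for v in values]


def interpolate(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior gaps linearly; leading and trailing gaps stay None."""
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    for a, b in zip(known, known[1:]):
        step = (out[b] - out[a]) / (b - a)
        for i in range(a + 1, b):
            out[i] = out[a] + step * (i - a)
    return out


# Invented sensor readings with one missing value and one obvious outlier.
readings = [10.0, 11.0, None, 13.0, 500.0, 15.0, 16.0]
cleaned = interpolate(remove_outliers(readings))
print(cleaned)  # [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0]
```

Blanking outliers instead of deleting them keeps every reading at its original position, so the interpolation step can fill the resulting gap in the same pass as the genuinely missing value.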
Jumper 2.0 - Features
* User published data
profiling
Information Server - Architecture overview
* Understand: data profiling and
metadata creation to understand the
content, quality, and structure of
information as it resides in source systems
Information Server - History
The core technologies of an information
server are not new. Data integration
technologies like extract, transform, and
load (ETL), data cleansing and matching
(both relational and probabilistic
approaches), data profiling, and data
federation or replication have been
around for many years. Reputable
vendors and several discrete but interrelated
markets focus on solutions for
these differing styles of data integration
(ETL, data quality, data replication, data
federation, etc.).
For More Information, Visit:
• https://store.theartofservice.com/the-data-profiling-toolkit.html
The Art of Service
https://store.theartofservice.com