Data Warehousing And Data Mining Applications For Atmospheric Studies
DATA WAREHOUSING AND DATA MINING APPLICATIONS FOR ATMOSPHERIC STUDIES
¹VENKATA SHESHANNA KONGARA, ²D. PUNYASESUDU
¹Research Scholar, Department of Computer Science & Technology, Sri Krishnadevaraya University, Anantapur, Andhra Pradesh, India
²Professor, Department of Physics, Rayalaseema University, Kurnool, Andhra Pradesh, India
Abstract— Meteorology is an important area of practice and research in the atmospheric sciences that focuses on weather conditions. In the current global scientific environment, atmospheric data and the information derived from them are among the most valuable assets for scientists and researchers. These data are used to identify similarities and hidden patterns for analytical reporting and atmospheric forecasting. Information systems offer exceptional opportunities to analyze these datasets, extract useful information and identify emerging directions in the atmospheric disciplines. Data warehousing and data mining are among the most prominent emerging technologies. They allow information to be accessed easily and efficiently while data-driven analytics are built and deployed, which provides better knowledge to support sound decision making. However, little work has been done to relate meteorology to data warehousing and data mining, either in general or in practice. This paper therefore presents the importance of various data warehousing and data mining applications in this field, including a star schema based data model, an efficient architectural framework and a methodology to process atmospheric data. It also discusses approaches to analyze, integrate and manage large volumes of atmospheric data with query and information analysis techniques for effective scientific decision support and predictive analysis. The proposal will provide reliable solutions and improve productivity and decision-making efficiency in the meteorological domain.
Keywords—Atmospheric data, Data warehousing, Data mining, Decision support, Hidden patterns, Information systems, Predictive analysis
I. INTRODUCTION
Meteorology is an essential area of practice and research on the atmosphere, from the Earth's surface up to the higher levels extending into space. The approach to studying it has changed as researchers look for facts and trends that improve the scope of forecasting and help evaluate the effects of dynamically changing atmospheric conditions. Several statistical and scientific methods exist to process meteorological datasets and measure their correlations. However, researchers face technical challenges in storing, retrieving, managing and exploring these structured and unstructured data, which are very large in size. As a result, data maintenance and conversion are time consuming, expensive and complex, and accurate results are harder to obtain. Information systems offer a number of exceptional opportunities to analyze these datasets, extract useful information and identify emerging directions in the atmospheric disciplines. Data warehousing and data mining are among the most prominent emerging technologies with powerful data management concepts. They allow information to be accessed easily and efficiently while data-driven analytics are built and deployed, which provides better knowledge to support sound decision making.

The data warehouse concept was presented by W. H. Inmon [1] in the book “Building the Data Warehouse” (1996). He gave the first definition of a data warehouse as “a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process”. Data warehouses were initially used in commercial business to support managers' decisions. In recent years they have been used increasingly in wider fields, including many scientific ones. A data warehouse is a massive storage area that serves as a centralized repository for all the data collected from the various departments or processing units of a large organization. The data are managed systematically to provide meaningful information and analysis for effective decision support. The terms of the definition mean the following:
Subject-oriented: all relevant datasets are organized into specific subject areas with summarized information.
Integrated: the data come from distributed, heterogeneous sources and must be integrated and made consistent with globally accepted standards and measurements.
Time-variant: the stored data need not be current. Instead, they represent a long-term time window, such as 5 to 10 years.
Non-volatile: stored data are never overwritten or deleted. Once committed, the data are static and read-only, and they are maintained for future analysis.
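The time-variant and non-volatile properties listed above can be made concrete with a small sketch. It uses a hypothetical `Observation` record and `AppendOnlyStore` class; neither comes from the paper. Every record carries its own date, and frozen dataclass instances cannot be modified after creation. The store offers only append and read operations, so history accumulates instead of being overwritten. Here non-volatility is enforced by the API by convention, not by the storage engine.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)  # frozen: a committed record cannot be modified in place
class Observation:
    station: str
    valid_date: date      # time-variant: every record carries its own date
    temperature_c: float


class AppendOnlyStore:
    """Hypothetical non-volatile store: it exposes append and read, never update or delete."""

    def __init__(self) -> None:
        self._rows: List[Observation] = []

    def append(self, obs: Observation) -> None:
        self._rows.append(obs)

    def history(self, station: str, start: date, end: date) -> List[Observation]:
        """Return one station's records inside an inclusive date window."""
        return [r for r in self._rows
                if r.station == station and start <= r.valid_date <= end]


store = AppendOnlyStore()
store.append(Observation("STN01", date(2005, 1, 1), 12.5))
store.append(Observation("STN01", date(2012, 1, 1), 13.1))
store.append(Observation("STN02", date(2012, 1, 1), 9.8))

# A ten-year analysis window over the accumulated history.
window = store.history("STN01", date(2003, 1, 1), date(2013, 1, 1))
print([r.valid_date.year for r in window])  # [2005, 2012]
```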
Proceedings of 5th IACEECE-2013, 22nd September 2013, Hyderabad, India. ISBN: 978-93-82702-30-6
To understand data warehouses, it is important to learn some essential theories, concepts, domains, techniques and models for data management and analysis. Many data warehousing approaches and research and development activities have been initiated in various fields over the last few decades. Today, many innovations in hardware and software create new data challenges, both from innovative technology trends and from the demand for real-time insight and analysis. Inmon [1], the father of data warehousing, therefore revisited the existing data warehousing framework and its functionality and upgraded it to DW 2.0 [2]. Architecture, technology and information systems have progressed considerably, and these advances have been pushed into the next generation of data warehousing. This generation includes many features and functions that were previously not recognized as part of a data warehouse. With its many integrated features that support present technology trends, it will be strategic for most organizations. In addition, Dan Linstedt [3] developed a new data model, a patent-pending technique called the Data Vault™, described as the next evolution in data modeling for enterprise data warehousing. The Data Vault is a detail-oriented, historically tracking and uniquely linked set of normalized tables that support one or more functional areas of business. It is a hybrid approach that combines the best of third normal form (3NF) and the star schema. The design is flexible, scalable, consistent and adaptable to the needs of the enterprise, and the model is architected specifically to meet the needs of today's enterprise data warehouses.
The Data Vault model satisfies every characteristic in Bill Inmon's definition [1] except subject orientation. In the Data Vault model, subject-oriented structures are replaced by functionally based structures. The business keys are organized horizontally, which provides visibility across lines of business. Bill Inmon [1] has also acknowledged that the Data Vault is the optimal choice for modeling the enterprise data warehouse in the DW 2.0 [2], [3] proposal.
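The paragraphs above describe the Data Vault only in general terms. In Linstedt's descriptions, the model is built from three table types:
• hubs, which hold business keys;
• links, which hold relationships between hubs;
• satellites, which hold descriptive attributes together with their load history.
The sketch below illustrates that layout with Python's built-in `sqlite3` module. All table and column names are hypothetical and chosen for this example. Production Data Vault designs usually add hash keys and further audit columns, which are omitted here.

```python
import sqlite3

# Hypothetical Data Vault tables for weather stations and instruments.
# Hubs hold business keys, links hold relationships between hubs, and
# satellites hold descriptive attributes with their load history.
DDL = """
CREATE TABLE hub_station (
    station_key   TEXT PRIMARY KEY,      -- business key
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE hub_instrument (
    instrument_key TEXT PRIMARY KEY,
    load_date      TEXT NOT NULL,
    record_source  TEXT NOT NULL
);
CREATE TABLE link_station_instrument (
    station_key    TEXT NOT NULL REFERENCES hub_station(station_key),
    instrument_key TEXT NOT NULL REFERENCES hub_instrument(instrument_key),
    load_date      TEXT NOT NULL,
    PRIMARY KEY (station_key, instrument_key)
);
CREATE TABLE sat_station (
    station_key  TEXT NOT NULL REFERENCES hub_station(station_key),
    load_date    TEXT NOT NULL,          -- a change adds a row, nothing is overwritten
    station_name TEXT,
    elevation_m  REAL,
    PRIMARY KEY (station_key, load_date)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
conn.execute("INSERT INTO hub_station VALUES ('STN01', '2013-01-01', 'radiosonde_feed')")
conn.execute("INSERT INTO hub_instrument VALUES ('SONDE-A', '2013-01-01', 'radiosonde_feed')")
conn.execute("INSERT INTO link_station_instrument VALUES ('STN01', 'SONDE-A', '2013-01-01')")
# The station elevation was corrected later; both versions are kept.
conn.executemany("INSERT INTO sat_station VALUES (?, ?, ?, ?)", [
    ("STN01", "2013-01-01", "Station One", 120.0),
    ("STN01", "2013-06-01", "Station One", 123.0),
])

current = conn.execute(
    "SELECT elevation_m FROM sat_station WHERE station_key = ? "
    "ORDER BY load_date DESC LIMIT 1", ("STN01",)).fetchone()[0]
versions = conn.execute(
    "SELECT COUNT(*) FROM sat_station WHERE station_key = ?", ("STN01",)).fetchone()[0]
print(current, versions)  # 123.0 2
```

The satellite's composite key (business key plus load date) is what provides historical tracking: the latest row gives the current view, and earlier rows remain available for analysis.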
The rest of the paper is organized as follows. Section II reviews related work on atmospheric studies. Section III describes data warehousing applications for atmospheric data, and Section IV presents data mining applications for atmospheric data. Section V presents the architectural framework and methodology for atmospheric data, and the final section concludes the paper.
II. RELATED WORK
Jayanthi T et al [4] stressed that, when developing large scientific and technical databases (i.e. data warehouses), the creation and reconciliation of data objects and the establishment of metadata standards are very important. Further work is also needed on metadata semantics, discipline-specific data dictionaries, information models for organizing metadata, and data models that describe dataset structure. They discussed the main data centres available at IGCAR, the design and development of a scientific and technical data warehouse, its importance, and visualization techniques leading to knowledge discovery.
Ramon Lawrence [5] proposed an architecture for archiving and analyzing real-time scientific data that isolates researchers from the complexities of data storage and retrieval. Data access transparency is achieved by storing metadata about the raw data in a database and retrieving data subsets of interest with SQL queries on that metadata.
Wang Zhijuan et al [6] presented a UML-based data warehouse design method that spans the three design phases (conceptual, logical and physical). Their method comprises a set of metamodels used at each phase, as well as a set of transformations that can be semi-automated.
Ruilian Hou [7] introduced the development and concepts of data warehouses and databases, examined the relationship between them, and studied the differences between their technologies. He also discussed how database and data warehouse technologies can be combined and applied.
Gerasimos Marketos et al [8] discussed the architecture of a seismic data management and mining system (SDMMS) for quick and easy data collection, processing and visualization. Among other components, the SDMMS architecture includes a seismological database for efficient and effective querying and a seismological data warehouse for OLAP analysis and data mining.
Xiaoguang Tan [9] described the data warehouse as a new kind of artificial intelligence (AI) system that combines database and meteorological graphics technology. It helps forecasters accumulate, manage and use their knowledge in operational forecasting, and it represents a new generation of decision support system (DSS). A data warehouse will not become the whole of the forecaster's workbench, because the operational forecasting mission is very complex. It is, however, a system that helps forecasters accumulate, manage and use their knowledge.
Aditya Kumar Gupta et al [10] proposed a multidimensional data warehouse for agriculture that provides solutions for farmers and answers their ad-hoc queries. This multidimensional schema builds on the star and snowflake schemas that are commonly used to design data warehouses. Normalization is also applied when storing the data in the star schema, and duplicate values are removed, so that space and time complexity can be minimized.
Keshav Dev Gupta et al [11] argued that special attention should be paid to the dimensional modeling phase. This is needed to reflect the user's requirements accurately in an error-free, easy-to-understand and easily extendable data warehouse schema. They also presented a set of user modeling requirements.
Vuda Sreenivasarao et al [12] gave an overview of scientific data warehouse and OLAP technologies, with an emphasis on their data warehousing requirements. The methods discussed include:
• the efficient computation of data cubes by integrating MOLAP and ROLAP techniques;
• the integration of data cube methods with dimension relevance analysis and data dispersion analysis for concept description;
• data cube based multi-level association, classification, prediction and clustering techniques.
Folorunsho Olaiya et al [13] investigated the use of data mining techniques for forecasting maximum temperature, rainfall, evaporation and wind speed. They applied artificial neural network and decision tree algorithms to meteorological data. A data model for the meteorological data was developed and used to train the classifier algorithms. The performance of these algorithms was compared using standard performance metrics, and the algorithm with the best results was used to generate classification rules for the mean weather variables. A predictive neural network model was also developed for the weather prediction program, and its results were compared with actual weather data for the predicted periods.
Meghali A. Kalyankar et al [14] discussed a data mining process for studying weather data with techniques such as clustering. With this technique, weather data can be acquired and hidden patterns found within a large dataset. The retrieved information can then be turned into usable knowledge for classifying and predicting climatic conditions.
Gaurav J. Sawale et al [15] discussed how to use data mining techniques to analyze meteorological data such as weather data. A variety of data mining tools and techniques are available in industry, but they have been used only to a very limited extent for meteorological data. They also presented a neural network based algorithm for predicting atmospheric conditions at a future time and a given location.
III. DATA WAREHOUSING APPLICATIONS FOR ATMOSPHERIC DATA
3.1. Advances in Relational databases and ER Modeling:
Entity Relationship (ER) modeling is one of the best data modeling techniques. It represents the relationships between atmospheric entities and objects under any type of processing rules, so that data management is organized efficiently, as shown in Figure 1. However, ER designs have limitations in data integrity constraints and only moderate schema design approaches for large historical data, so practice has moved toward dimensional modeling. There is still room to connect relational databases directly with currently available business intelligence applications for querying and expanding knowledge for effective decision support.
Figure 1: Relational data model used for atmospheric data in business intelligence reporting and data mining.
3.2. Dimensional Modeling with Star schema
Dimensional modeling (DM) differs from entity-relationship (ER) modeling. It is a set of methods and concepts used in data warehouse design: measurable numbers are stored in a fact table, and each is qualified by context held in dimension tables. The fact table sits at the centre of the schema, surrounded by multiple dimension tables like a star, as shown in Figure 2, which allows rapid query processing. According to data warehousing expert Ralph Kimball [16], dimensional modeling does not necessarily involve a relational database. The same modeling approach, at the logical level, can be used for any physical form, such as a multidimensional database or even flat files.
Figure 2: Dimensional modeling with star schema.
Ms. Alpa R. Patel et al [17] discussed that the conceptual Entity-Relationship (ER) model is used extensively for database design in relational database environments, which emphasize day-to-day operations. Multidimensional (MD) data modeling, on the other hand, is crucial in data warehouse design, which targets managerial decision support. It supports decision making by allowing users to:
• drill down for more detailed information;
• roll up to view summarized information;
• slice and dice a dimension to select a specific item of interest;
• pivot to re-orient the view of MD data.
They also explored how the multidimensional model, instead of the ER model, can be used as the yardstick for data warehouse design.
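The star layout described in Section 3.2 can be made concrete with a small sketch. It builds one fact table and two dimension tables in an in-memory SQLite database, using Python's built-in `sqlite3` module, and runs a star join that rolls rainfall up from month to year for each region. The schema is hypothetical and is not the model in Figure 2. All table names, columns and values are invented for illustration.

```python
import sqlite3

# Hypothetical star schema: a central fact table of observations whose
# foreign keys point to the surrounding station and date dimensions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_station (station_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE fact_observation (
    station_id    INTEGER REFERENCES dim_station(station_id),
    date_id       INTEGER REFERENCES dim_date(date_id),
    temperature_c REAL,              -- measures
    rainfall_mm   REAL
);
""")
conn.executemany("INSERT INTO dim_station VALUES (?, ?, ?)",
                 [(1, "STN01", "North"), (2, "STN02", "South")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(1, 2012, 6), (2, 2012, 7), (3, 2013, 6)])
conn.executemany("INSERT INTO fact_observation VALUES (?, ?, ?, ?)", [
    (1, 1, 24.0, 10.0), (1, 2, 26.0, 30.0), (1, 3, 25.0, 20.0),
    (2, 1, 30.0, 5.0), (2, 2, 31.0, 0.0), (2, 3, 29.0, 15.0),
])

# Star join: the fact table is joined to each dimension it references and
# the measure is aggregated by dimension attributes (a roll-up to year).
rows = conn.execute("""
    SELECT s.region, d.year, SUM(f.rainfall_mm) AS total_rain_mm
    FROM fact_observation AS f
    JOIN dim_station AS s ON s.station_id = f.station_id
    JOIN dim_date    AS d ON d.date_id    = f.date_id
    GROUP BY s.region, d.year
    ORDER BY s.region, d.year
""").fetchall()
print(rows)
# [('North', 2012, 40.0), ('North', 2013, 20.0), ('South', 2012, 5.0), ('South', 2013, 15.0)]
```

Adding a `WHERE d.year = 2012` filter to the same join would give a slice. Grouping by `d.month` as well would drill back down to monthly detail.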
3.3. Data Extraction, Transformation and Loading (ETL)
Extraction, Transformation and Loading (ETL) systems are the backbone of data warehouse development. The ETL prototype controls data availability in three phases, as shown in Figure 3:
• Extraction: the source datasets, such as text files, spreadsheets and legacy systems holding day-to-day operational atmospheric data, are read.
• Transformation: a standard structure with consistent data is maintained.
• Loading: the semi-transformed and fully transformed dimension data and their metadata are loaded.
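The three ETL phases described above can be sketched as three small functions. This is a minimal illustration under stated assumptions: the CSV layout, the `-9999` missing-value sentinel and the km/h to m/s conversion are hypothetical choices, not taken from the paper. A real pipeline would also log rejected rows and record load metadata.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract: a station text file with inconsistent station
# codes and a -9999 sentinel for missing values.
RAW = """station,date,temp_c,wind_kmh
 stn01 ,2013-09-01,24.5,18
STN01,2013-09-02,-9999,22
stn02,2013-09-01,29.1,-9999
"""
MISSING = "-9999"


def extract(text):
    """Extraction phase: read rows from the source file."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transformation phase: standardize keys, units and missing values."""
    clean = []
    for r in rows:
        temp = None if r["temp_c"] == MISSING else float(r["temp_c"])
        # Convert km/h to m/s (1 m/s = 3.6 km/h).
        wind = None if r["wind_kmh"] == MISSING else float(r["wind_kmh"]) / 3.6
        clean.append((r["station"].strip().upper(), r["date"], temp, wind))
    return clean


def load(conn, rows):
    """Loading phase: insert the transformed rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS obs "
                 "(station TEXT, obs_date TEXT, temp_c REAL, wind_ms REAL)")
    conn.executemany("INSERT INTO obs VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW)))
print(conn.execute("SELECT station, temp_c FROM obs ORDER BY rowid").fetchall())
# [('STN01', 24.5), ('STN01', None), ('STN02', 29.1)]
```

Keeping each phase a separate function mirrors the three-phase prototype. The transformation rules (the mapping between source and destination) can then be reviewed and tested on their own.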
Figure 3: Extraction, Transformation and Loading system prototype.
Shaker H. Ali El-Sappagh et al [18] described extraction-transformation-loading (ETL) tools as pieces of software responsible for extracting data from several sources, cleansing, customizing, reformatting and integrating the data, and inserting them into a data warehouse. Building the ETL process is potentially one of the biggest tasks in building a warehouse. It is complex and time consuming, and it consumes most of a data warehouse project's implementation effort, cost and resources. Building a data warehouse requires close attention to three main areas: the source area, the destination area and the mapping area (the ETL processes). The source area has standard models such as the entity relationship diagram, and the destination area has standard models such as the star schema, but the mapping area has no standard model. They therefore discussed a model for the conceptual design of ETL processes and proposed a method that enhances existing approaches to support the missing mappings of the ETL system.
3.4. Data Organizing with Data Marts
A data warehouse holds a large volume of enterprise data. A data mart, in contrast, is a subject-oriented functional storage area that holds the data of a specific group or department, taken from the data warehouse as shown in Figure 4. Usually the data are extracted from the data warehouse and organized in data marts, with de-normalization and indexes to support end users' analysis.
Figure 4: Data Mart with subject area
Paulraj M et al [19] described how data warehousing collects data at different levels (i.e., departmental, operational, functional) and stores them as a collective data repository with better storage efficiency. Various data warehousing models concentrate on storing the data more efficiently and quickly. In addition, accessing data in the warehouse requires a better understanding of the structure in which the data layers are stored in the repository. However, the data warehouse model does not easily capture users' functional requirements, so an efficient decision support system is needed to extract the data that users demand. They built a Functional Layer Interfaced Data Mart Architecture (FLIDMA) to provide a better decision support system for large corporate and enterprise data applications.
3.5. Online analytical processing (OLAP) and Multidimensional Analysis
Online analytical processing and multidimensional analysis are the key data processing and presentation techniques in the business intelligence application layer. They cover a broad category of applications and methods for gathering, sorting, analyzing and presenting data to help end users make better business decisions, including drill-down, roll-up, drill-through, and slice and dice analysis. As shown in Figure 5, business intelligence applications include business value drivers, decision support systems, query and reporting, online analytical processing, statistical analysis and forecasting.
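The OLAP operations listed in Section 3.5 can be illustrated without an OLAP server by treating a cube as a dictionary that maps dimension tuples to a measure. The sketch below is hypothetical: the cube, its dimensions and the helper names `roll_up`, `slice_cube` and `dice` are invented here. It shows roll-up, drill-down (returning to a finer grouping of the stored base cells), slice and dice. Drill-through and pivoting are not shown.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple

Key = Tuple[Hashable, ...]

# Hypothetical base cells of a rainfall cube: (station, year, month) -> mm.
cells: Dict[Key, float] = {
    ("STN01", 2012, 6): 10.0, ("STN01", 2012, 7): 30.0, ("STN01", 2013, 6): 20.0,
    ("STN02", 2012, 6): 5.0, ("STN02", 2012, 7): 0.0, ("STN02", 2013, 6): 15.0,
}


def roll_up(cube: Dict[Key, float], keep: Iterable[int]) -> Dict[Key, float]:
    """Sum away every dimension whose position is not listed in `keep`."""
    positions = tuple(keep)
    out: Dict[Key, float] = defaultdict(float)
    for key, value in cube.items():
        out[tuple(key[i] for i in positions)] += value
    return dict(out)


def slice_cube(cube: Dict[Key, float], position: int, member: Hashable) -> Dict[Key, float]:
    """Fix one dimension to a single member and drop that dimension from the key."""
    return {key[:position] + key[position + 1:]: value
            for key, value in cube.items() if key[position] == member}


def dice(cube: Dict[Key, float], criteria: Dict[int, Set[Hashable]]) -> Dict[Key, float]:
    """Keep cells whose member lies in the allowed set for every constrained dimension."""
    return {key: value for key, value in cube.items()
            if all(key[pos] in allowed for pos, allowed in criteria.items())}


by_year = roll_up(cells, keep=(1,))                  # roll-up: station and month summed away
by_station_year = roll_up(cells, keep=(0, 1))        # drill-down from by_year: one level finer
june = slice_cube(cells, position=2, member=6)       # slice: month = 6
stn01_2012 = dice(cells, {0: {"STN01"}, 1: {2012}})  # dice: a sub-cube
print(by_year)  # {(2012,): 45.0, (2013,): 35.0}
```

Because the base cells are always retained, drill-down is simply a roll-up that keeps more dimensions. OLAP engines usually precompute some of these aggregates instead of recomputing them on each query.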
Figure 5: Data warehousing and business intelligence OLAP cube.
IV. DATA MINING APPLICATIONS FOR ATMOSPHERIC DATA
Data mining is the process of identifying knowledge in various atmospheric datasets. It finds unknown patterns or rules for information analysis and predicts future trends and behaviors for effective decision support. In the current global scientific environment, data mining has become a powerful information technology tool for finding similarities and hidden patterns for atmospheric forecasting. J. Han and M. Kamber [20] stated that data mining is a multidisciplinary field whose techniques draw on database technology, machine learning, statistics, pattern recognition, information retrieval, neural networks, knowledge-based systems, artificial intelligence, high-performance computing and data visualization for knowledge discovery.
Various data mining techniques are widely used in atmospheric studies, such as association rules, classification, regression, clustering, outlier analysis and neural network based applications.
4.1. Association Rules
Association rule mining finds relationships between data items in large datasets, with the goal of finding interesting patterns in various fields. The two measures of interestingness used most often in association rule mining are support and confidence. Consider a rule X ⇒ Y, where X is the antecedent and Y is the consequent. Its support is the probability that a transaction in the data being mined contains both X and Y, i.e. the percentage of transactions that satisfy the rule. Its confidence is the conditional probability that a transaction containing X also contains Y. Association rules are considered interesting if they satisfy both a minimum support threshold and a minimum confidence threshold. The Apriori algorithm is the most commonly used method for finding frequent patterns. Many types of association rules exist, including quantitative, multidimensional, multilevel and Boolean association rules.
Nilam K. Nakod et al [21] presented an overall survey of mining multidimensional as well as conditional hybrid dimensional association rules. Their comparative study found that the Boolean matrix based approach is best suited for mining both kinds of rules. A Boolean matrix based approach has been used to find the frequent itemsets, where the items forming a rule come from different dimensions.
4.2. Classification and Decision Trees based analysis
Classification is a supervised machine learning technique used to build a model. Once built, the model is applied to unseen data to predict class labels. Building accurate and efficient classifiers for large databases is one of the central tasks of data mining and machine learning research. Many classification techniques are available, including decision trees, naive Bayesian methods, neural networks, logistic regression, support vector machines (SVM) and k-nearest neighbours (KNN). The decision tree technique is widely used for classification because it is easy to use in practice.
Sharma, N et al [22] noted that many meteorological studies require tropospheric temperature measurements at high temporal, spatial and vertical resolution. Radiosonde and Global Positioning System radio occultation (GPSRO) observations have very high vertical resolution but poor spatial and temporal coverage. Sounders on geostationary satellites can provide high temporal and spatial resolution, but their vertical resolution is poor. They therefore proposed an artificial neural network (ANN) approach that increases the vertical resolution of tropospheric temperature profiles obtained from geostationary satellite observations. With this approach, high-resolution temperature profiles become available in all four dimensions.
4.3. Cluster based analysis
Clustering is an unsupervised machine learning technique in which class labels are not known in advance. It divides the data into clusters so that objects within a cluster are similar to each other and dissimilar to objects in other clusters. Many clustering techniques are available, including partitioning, hierarchical and density-based clustering. Among these, the k-means algorithm is the most widely used in applications.
A. Santhi Latha et al [23] discussed how cluster analysis can help in mining spatial data. Cluster analysis divides data into meaningful or useful groups (clusters). If meaningful clusters are the goal, the resulting clusters should capture the “natural” structure of the data.
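Section 4.1 defines support and confidence in words. The sketch below computes both over a Boolean transaction-by-item matrix. It follows the spirit of the Boolean matrix approach surveyed in [21] but does not reproduce it. The weather "transactions" are hypothetical. Frequent itemsets are found by brute-force enumeration rather than by the Apriori algorithm, which is only practical for a handful of items.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

# Hypothetical daily records, each discretized into a set of condition items.
transactions: List[Set[str]] = [
    {"high_humidity", "low_pressure", "rain"},
    {"high_humidity", "low_pressure", "rain"},
    {"high_humidity", "rain"},
    {"low_pressure", "strong_wind"},
    {"high_humidity", "low_pressure"},
]
items: List[str] = sorted(set().union(*transactions))

# Boolean matrix: one row per transaction, one column per item.
matrix: List[List[bool]] = [[item in t for item in items] for t in transactions]


def support(itemset: FrozenSet[str]) -> float:
    """Fraction of rows in which every column of the itemset is True."""
    cols = [items.index(i) for i in itemset]
    return sum(all(row[c] for c in cols) for row in matrix) / len(matrix)


def confidence(x: FrozenSet[str], y: FrozenSet[str]) -> float:
    """Confidence of X => Y: support of X and Y together divided by support of X."""
    sx = support(x)
    return support(x | y) / sx if sx else 0.0


def frequent_itemsets(min_support: float, max_size: int = 3) -> Dict[FrozenSet[str], float]:
    """Brute-force enumeration; Apriori would prune candidates with infrequent subsets."""
    found = {}
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            s = support(frozenset(combo))
            if s >= min_support:
                found[frozenset(combo)] = s
    return found


x, y = frozenset({"high_humidity", "low_pressure"}), frozenset({"rain"})
print(support(x | y))               # 0.4
print(round(confidence(x, y), 3))   # 0.667
print(len(frequent_itemsets(0.4)))  # 7
```

Here the rule {high_humidity, low_pressure} ⇒ {rain} holds in 40% of all records and in two thirds of the records that contain its antecedent.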
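For Section 4.2, a decision-tree classifier can be sketched with the ID3 criterion, which chooses the attribute with the highest information gain at each node. ID3 is one specific decision-tree algorithm, chosen here for brevity; the paper does not name one. The training records below are hypothetical. The sketch handles only categorical attributes and performs no pruning.

```python
import math
from collections import Counter
from typing import Dict, List, Union

Record = Dict[str, str]
Tree = Union[str, dict]   # a leaf label, or {attribute: {value: subtree}}

# Hypothetical training records: discretized conditions and whether it rained.
records: List[Record] = [
    {"humidity": "high", "pressure": "low", "rain": "yes"},
    {"humidity": "high", "pressure": "low", "rain": "yes"},
    {"humidity": "high", "pressure": "high", "rain": "no"},
    {"humidity": "low", "pressure": "low", "rain": "no"},
    {"humidity": "low", "pressure": "high", "rain": "no"},
    {"humidity": "high", "pressure": "low", "rain": "yes"},
]


def entropy(rows: List[Record], target: str = "rain") -> float:
    """Shannon entropy (bits) of the class label distribution."""
    n = len(rows)
    return -sum((c / n) * math.log2(c / n) for c in Counter(r[target] for r in rows).values())


def information_gain(rows: List[Record], attribute: str, target: str = "rain") -> float:
    """Entropy reduction from splitting `rows` on `attribute` (the ID3 criterion)."""
    n = len(rows)
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r for r in rows if r[attribute] == value]
        remainder += len(subset) / n * entropy(subset, target)
    return entropy(rows, target) - remainder


def build_tree(rows: List[Record], attributes: List[str], target: str = "rain") -> Tree:
    """Recursively grow an ID3-style tree as nested dicts whose leaves are class labels."""
    labels = Counter(r[target] for r in rows)
    if len(labels) == 1 or not attributes:
        return labels.most_common(1)[0][0]   # pure node, or no attribute left
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    return {best: {value: build_tree([r for r in rows if r[best] == value], remaining, target)
                   for value in sorted({r[best] for r in rows})}}


def classify(tree: Tree, record: Record) -> str:
    """Follow the branches that match the record's attribute values down to a leaf."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))
        tree = tree[attribute][record[attribute]]
    return tree


tree = build_tree(records, ["humidity", "pressure"])
print(classify(tree, {"humidity": "high", "pressure": "low"}))  # yes
```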
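For Section 4.3, the k-means algorithm mentioned there can be sketched as Lloyd's iteration, which repeats two steps until the centroids no longer change:
• assign each point to its nearest centroid;
• move each centroid to the mean of its assigned points.
The readings below are hypothetical. The initial centroids are drawn at random from the data with a fixed seed so that runs are repeatable.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def squared_distance(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100, seed: int = 0):
    """Lloyd's algorithm: assign each point to its nearest centroid, then move each
    centroid to the mean of its points, until the centroids stop changing."""
    rng = random.Random(seed)
    centroids: List[Point] = rng.sample(list(points), k)   # initial centroids: k data points
    clusters: List[List[Point]] = []
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        updated = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:   # converged: the centroids no longer move
            break
        centroids = updated
    return centroids, clusters


# Hypothetical (temperature in C, relative humidity in %) readings from two regimes.
readings = [(30.0, 40.0), (31.0, 42.0), (29.5, 38.0),
            (18.0, 85.0), (17.5, 90.0), (19.0, 88.0)]
centroids, clusters = kmeans(readings, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```

Because squared Euclidean distance is used, features measured on larger scales (here, relative humidity) dominate the assignment. In practice the attributes are usually standardized before clustering.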
V. ARCHITECTURAL FRAMEWORK AND METHODOLOGY FOR ATMOSPHERIC DATA
Architecture
The data warehouse architecture typically consists of several components that consolidate raw data from scientific operational and legacy systems. The back end handles data collection and pre-processing and constitutes the data warehouse itself. The front end supports a variety of data presentation and data mining analytical tools.
5.1. Architectural framework
As noted in the Introduction, data warehousing and data mining applications are well suited to atmospheric studies. However, the architectural framework and methodology need to be improved to keep pace with fast-growing technology trends and data processing needs. The goal of the framework is to simplify the design, implementation and management of data for efficient solutions. In the survey conducted for this paper, different authors discussed the approaches used for framework design:
• M. Laxmaiah et al [24] discussed a conceptual metadata framework for spatial data warehouses.
• Ginjala Srikanth Reddy et al [25] discussed the importance of data warehousing and data mining concepts. They suggested an ad-hoc, association rule based data mining framework that is tightly integrated with data warehousing technology, and stated that it has several advantages over the use of separate data mining tools.
• Nenad Jukic et al [26] addressed the failure of data warehousing projects due to inadequate requirements collection and definition processes. They described a framework, consisting of a series of steps that facilitate the collection and definition of requirements in data warehousing projects, which helps develop a business-driven, actionable set of requirements.
A comprehensive survey of frameworks identified a set of limitations in practices and implementation factors in data warehousing and data mining application design. Based on this study, a new prototype framework is proposed, as shown in Figure 6. It gives the developers and other resources involved in the project a reconciled overall view. It also helps reduce costs and makes it easier to control changes across the whole implementation.
Figure 7: Data warehousing and mining architecture.
5.2. Atmospheric dimensional model
Dimensional modeling plays an important role in data warehouse design. As an implementation case study, 15 years of meteorological radiosonde data were collected from the British Atmospheric Data Centre (BADC) for water vapour studies and used to build a dimensional model. The radiosonde data contain station details, air pressure, air temperature, dew point temperature, wind speed, wind direction and various other parameters. These were processed with calculated measures and loaded into the star schema based dimensional model shown in Figure 8.
Figure 8: Star Schema based dimensional model for atmospheric
data.
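The calculated measures mentioned in Section 5.2 are not specified in the paper. As a hedged illustration, the sketch below derives three common humidity measures from radiosonde temperature, dew point and pressure fields before they are loaded into a fact table:
• relative humidity;
• water vapour mixing ratio;
• dew point depression.
The Magnus-type formula and its coefficients are an assumption made for this example. It is applied over water even below 0 °C. The sounding values are hypothetical.

```python
import math

# Magnus-type approximation for saturation vapour pressure over water (hPa),
# with the Alduchov and Eskridge (1996) coefficients. This formula is an
# assumption for illustration; the paper does not specify its measures.
A_HPA, B, C_DEG = 6.1094, 17.625, 243.04
EPSILON = 0.622  # ratio of the molar masses of water vapour and dry air


def vapour_pressure_hpa(t_c: float) -> float:
    """Saturation vapour pressure at t_c; evaluated at the dew point it gives the actual vapour pressure."""
    return A_HPA * math.exp(B * t_c / (C_DEG + t_c))


def relative_humidity_pct(t_c: float, td_c: float) -> float:
    """Actual vapour pressure as a percentage of the saturation value."""
    return 100.0 * vapour_pressure_hpa(td_c) / vapour_pressure_hpa(t_c)


def mixing_ratio_g_per_kg(p_hpa: float, td_c: float) -> float:
    """Mass of water vapour per kilogram of dry air."""
    e = vapour_pressure_hpa(td_c)
    return 1000.0 * EPSILON * e / (p_hpa - e)


def dew_point_depression(t_c: float, td_c: float) -> float:
    """T - Td; zero means the air is saturated."""
    return t_c - td_c


# Hypothetical sounding levels: (pressure hPa, temperature C, dew point C).
levels = [(1000.0, 25.0, 20.0), (850.0, 15.0, 10.0), (500.0, -10.0, -25.0)]
fact_rows = [
    (p, t, td,
     round(relative_humidity_pct(t, td), 1),
     round(mixing_ratio_g_per_kg(p, td), 2),
     dew_point_depression(t, td))
    for p, t, td in levels
]
for row in fact_rows:
    print(row)
```

Computing such derived measures during transformation means analysts query them directly from the fact table, instead of re-deriving them in every report.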
5.3. Atmospheric data warehouse building methodology
Data warehouse development is complex, and delivering the end result is expensive and time consuming. Several methodologies are currently available in the data warehousing market. Saroop, S et al [27] conducted a comparative study of the approaches used in data warehouse design, in which different authors proposed different techniques at different levels. However, data processing approaches differ between data warehouse and data mining frameworks. This paper therefore proposes the integrated IDCARD methodology for end-to-end data warehousing and data mining application development, as shown in Figure 6.
Figure 6: Framework prototype for data warehousing and data mining applications development
The comprehensive IDCARD methodology covers the entire process of meeting the end users' data warehousing and data mining requirements. It consists of six phases: Initiate (I), Design (D), Construct (C), Arrange (A), Review (R) and Decision (D). The phases are executed sequentially, step by step, following a set of standards and guidelines. The proposal establishes a link between the methodology and the requirements domain to improve the effectiveness of project implementation for effective decision support.
Initiate (I) phase:
The main purpose of Initiate phase is to identify the
scope, goals, objectives and technical requirements of
the DW and DM applications.
project with high level and low-level design
specifications.
Following are the key activities in this phase and the
process flow is shown in figure 10.
 Logical design/physical design for the system
 Data model design for entire system
 ETL Mappings design with data assessment
 OLAP design with presentation engine
 Design the DW/DM Architecture
 Data visualizations design
 DWH and DM process design
 Integration testing design
 Meta data repository design for DWH and DM
tools usage
Figure 10: Design process flow diagram.
Construct (C) Phase:
The main purpose of construct phase is to construct
entire system using the integration services, complete
the technical documentation, and to execute the test
cases.
Figure 9: Initiate process flow diagram.
Following are the key activities in this phase and the
process flow is shown in figure 9.
 Describe the project scope with understanding
the requirements domain
 Initiate the meetings and Requirements
gathering
 Consolidates the gaps if any from the
requirements and technical implications
 Initiate the architecture and infrastructure
plans and software selection
 Outline the project risks for ETL/ Reports and
data mining process
 distinguish the quality assurance and test
planning
Design (D) Phase:
The main purpose of design phase includes logical
design and building architectural components of a
Figure 11: Construct process flow diagram
Following are the key activities in this phase and the
process flow is shown in figure 11.
 Construct the Integration systems.
 Develop the ETL programs
 Build the DW/Data Mart
 BI/DM applications or Reports
Proceedings of 5th IACEECE-2013, 22nd September 2013, Hyderabad, India. ISBN: 978-93-82702-30-6
81
Data Warehousing And Data Mining Applications For Atmospheric Studies
 Develop ETL/BI/DM unit tests
 Build the Metadata repository
 Construct the User Acceptance Test (UAT)
cases
Arrange (A) and Review (R) Phase:
The purpose of the arrange and review phase is to ensure
that the integrated system meets the requirements
documented in the initiate phase. The system is then
reviewed against quality standards through the quality
assurance plan and user acceptance testing, and is
finally deployed into production.
• Data mining model prototype/algorithm evaluation
• Information analysis
• Knowledge deployment
• Decision implementation
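Model prototype/algorithm evaluation is commonly done by scoring candidate models on held-out data. The sketch below shows that evaluation harness with a deliberately simple stand-in model: a hypothetical "rain if humidity is at or above a threshold" rule. It is not the paper's method. The conclusion reports neural networks as the most reliable choice in the surveyed studies, and any such model could replace `make_threshold_model`. The record layout `(relative_humidity_pct, rained)` is an assumption.

```python
import random


def make_threshold_model(threshold):
    """Stand-in prototype: predict rain when humidity >= threshold."""
    def model(humidity):
        return humidity >= threshold
    return model


def holdout_split(records, test_fraction=0.3, seed=42):
    """Shuffle a copy of the records and split them into (train, test)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def evaluate(model, records):
    """Return (accuracy, confusion counts) of model on labelled records."""
    if not records:
        raise ValueError("cannot evaluate on an empty record set")
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for humidity, rained in records:
        predicted = model(humidity)
        if predicted and rained:
            counts["tp"] += 1
        elif predicted and not rained:
            counts["fp"] += 1
        elif not predicted and not rained:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    accuracy = (counts["tp"] + counts["tn"]) / len(records)
    return accuracy, counts


def select_threshold(train, candidates):
    """Keep the candidate threshold with the best training accuracy."""
    return max(candidates,
               key=lambda t: evaluate(make_threshold_model(t), train)[0])


if __name__ == "__main__":
    # Synthetic, perfectly separable data for demonstration only.
    data = [(h, h >= 75) for h in range(50, 100, 5)]
    train, test = holdout_split(data)
    best = select_threshold(train, [60, 75, 90])
    print(best, evaluate(make_threshold_model(best), test))
```

Candidates are selected on the training split and reported on the test split. This separation keeps the reported accuracy from being inflated by the selection step itself.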
CONCLUSION
Data warehousing and data mining are emerging technologies
that allow information to be accessed easily and efficiently
when building and deploying data-driven analytics,
providing better knowledge to support sound decision
making. This paper discussed various data warehousing
and data mining techniques. In addition, a comprehensive
survey of data warehousing and data mining frameworks
was conducted, which identified a set of limitations in
practice and in the implementation factors of data
warehousing and data mining application design.
Building on these findings, a new prototype framework
was proposed for designing data warehousing and data
mining applications for large volumes of atmospheric
data. A second survey, on the use of data mining
techniques in atmospheric studies, found neural networks
to be the most reliable technique for classifying
atmospheric data and for predictive analysis.
Figure 12: Arrange (A) and Review (R) process flow diagram.
The key activities in this phase are listed below, and the
process flow is shown in Figure 12.
• Arrange and review the system integration tests
• User acceptance testing and review
• Performance tuning, capacity planning, and review
• Production rollout and data review for data mining
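Capacity planning usually begins with a rough estimate of the fact table's size. The function below sketches that arithmetic. The station count, reading frequency, row width, and index allowance in the example are hypothetical figures, not measurements from this project.

```python
def estimate_fact_table(stations, readings_per_day, years, bytes_per_row,
                        index_overhead=0.3):
    """Return (row_count, size_gb) for a fact table of periodic readings.

    Assumptions: 365-day years (leap days ignored), a fixed average row
    width in bytes, and index_overhead as a fractional allowance for
    indexes (0.3 means +30%). 1 GB is taken as 10**9 bytes.
    """
    rows = stations * readings_per_day * 365 * years
    size_gb = rows * bytes_per_row * (1 + index_overhead) / 1e9
    return rows, size_gb


if __name__ == "__main__":
    # Hypothetical: 500 stations reporting hourly for 10 years, 100-byte rows.
    rows, size_gb = estimate_fact_table(stations=500, readings_per_day=24,
                                        years=10, bytes_per_row=100)
    print(f"{rows:,} rows, about {size_gb:.1f} GB")
```

For these assumed inputs, the script prints `43,800,000 rows, about 5.7 GB`. That figure gives a starting point for storage, partitioning, and indexing decisions during performance tuning.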
REFERENCES
[1] W. H. Inmon, "Building the Data Warehouse", John Wiley and Sons, New York, 1996.
[2] W. H. Inmon, "DW 2.0: The Architecture for the Next Generation of Data Warehousing", Morgan Kaufmann, 2008.
[3] Dan Linstedt, "Data Vault Series 1 – Data Vault Overview",
Data Vault Series, The Data Administration Newsletter,
Retrieved 12 September 2011.
[4] Jayanthi T, Ananthanarayanan S. and Rajeswari S, "Scientific Data Warehouse and Visualization Techniques", Computer Division, IGCAR, Kalpakkam.
[5] Ramon Lawrence, "An Architecture for Real-Time Warehousing of Scientific Data", Department of Computer Science, University, USA, http://www.unidata.ucar.edu/projects/idd/overview/idd.html
[6] Wang Zhejiang, Wei Hongchang and Wu Xuefang, "A Data Warehouse Design Method", 2012 International Conference on Computer Science & Service System (CSSS), Page(s): 2063-2066.
[7] Ruilian Hou, "Analysis and research on the difference between data warehouse and database", International Conference on Computer Science and Network Technology (ICCSNT), 2011, Volume: 4, Page(s): 2636-2639.
[8] Gerasimos Marketos, Yannis Theodoridis and Ioannis S. Kalogeras, "Seismological Data Warehousing and Mining", University of Piraeus, Greece.
[9] Xiaoguang Tan, "Data Warehousing and its potential using in weather forecast", Institute of Urban Meteorology, CMA, Beijing, China.
[10] Aditya Kumar Gupta, Bireshwar Dass Mazumdar, "Multidimensional Schema for Agriculture Data Warehouse", (IJRET), 2013, Volume: 2, Page(s): 245-253.
[11] Keshav Dev Gupta, Jyoti Gupta and Prakati Prasoon, "Novel Architecture with Dimensional Approach of Data Warehouse", (IJARCSSE), 2013, Volume: 3, Page(s): 301-303.
Decision (D) Phase:
The purpose of the decision phase is to carry out the
data assessment for data mining and to identify
practically suitable algorithms for efficient model
training and result interpretation.
Figure 13: Decision implementation process flow diagram.
The key activities in this phase are listed below, and the
process flow is shown in Figure 13.
• Data reconciliation for data mining
• Data sampling analysis
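Data reconciliation and data sampling analysis can each be sketched as a small helper. The first helper compares per-key row counts and measure totals between a warehouse extract and the dataset prepared for mining. The second draws a stratified sample so that every stratum, such as every station or region, stays represented. Field names like `station`, `region`, and `rainfall_mm` are hypothetical. Stratified sampling is one reasonable choice here, not a method this framework prescribes.

```python
import random
from collections import defaultdict


def reconcile(source_rows, target_rows, key="station", measure="rainfall_mm"):
    """Compare per-key row counts and measure totals of two datasets.

    Returns {key_value: (source_count, target_count, source_total, target_total)}
    for every key whose counts differ or whose totals differ by more than 1e-6.
    An empty dict means the two datasets reconcile.
    """
    def summarise(rows):
        summary = defaultdict(lambda: [0, 0.0])  # key -> [row count, total]
        for row in rows:
            summary[row[key]][0] += 1
            summary[row[key]][1] += row[measure]
        return summary

    src, tgt = summarise(source_rows), summarise(target_rows)
    mismatches = {}
    for k in src.keys() | tgt.keys():
        s_count, s_total = src.get(k, (0, 0.0))
        t_count, t_total = tgt.get(k, (0, 0.0))
        if s_count != t_count or abs(s_total - t_total) > 1e-6:
            mismatches[k] = (s_count, t_count, s_total, t_total)
    return mismatches


def stratified_sample(rows, stratum, fraction, seed=0):
    """Sample the same fraction of rows from each stratum (at least one each)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[stratum]].append(row)
    rng = random.Random(seed)
    sample = []
    for value in sorted(groups):
        members = groups[value]
        size = min(len(members), max(1, round(len(members) * fraction)))
        sample.extend(rng.sample(members, size))
    return sample
```

Running `reconcile` before sampling ensures that the mining dataset matches the warehouse. A sample drawn from an incomplete extract would carry the gap into training.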
[12] Vuda Sreenivasarao, Venkata Subbareddy Pallamreddy, "Advanced Data Warehousing Techniques for Analysis, Interpretation and Decision Support for Scientific Data", (CCIS 198), 2011, Page(s): 162-174.
[13] Folorunsho Olaiya, Adesesan Barnabas Adeyemo, "Application of Data Mining Techniques in Weather Prediction and Climate Change Studies", (MECS), 2012, Page(s): 51-59.
[14] Meghali A. Kalyankar, Prof. S. J. Alaspurkar, "Data Mining Technique to Analyse the Metrological Data", (IJARCSSE), 2013, Volume: 3, Page(s): 114-118.
[15] Gaurav J. Sawale, Dr. Sunil R. Gupta, "Use of Artificial Neural Network in Data Mining For Weather Forecasting", (IJCSA), 2013, Volume: 6, Page(s): 383-387.
[16] Ralph Kimball, Margy Ross, Warren Thornthwaite, and Joy
Mundy (January 10, 2008). The Data Warehouse Lifecycle
Toolkit: Expert Methods for Designing, Developing, and
Deploying Data Warehouses (Second ed.). Wiley. ISBN
978-0-470-14977-5.
[17] Ms. Alpa R. Patel and Prof F. (Dr.) Jayesh M. Patel, "Data Modeling Techniques for Data Warehouse", International Journal of Multidisciplinary Research, 2012, Vol. 2, Issue 2, Page(s): 240-246.
[18] Shaker H. Ali El-Sappagh, Abdeltawab M. Ahmed Hendawi and Ali Hamed El Bastawissy, "A proposed model for data warehouse ETL processes", Journal of King Saud University - Computer and Information Sciences, 2011, Vol. 23, Page(s): 91-104.
[19] Paulraj M and Sivaprakasam P, "Functional Behavior Pattern for DataMart based on Attribute Relativity", (IJCSI), 2012, Vol. 9, Issue 4, Page(s): 278-283.
[20] J. Han and M. Kamber, "Data Mining: Concepts and Techniques" (The Morgan Kaufmann Series in Data Management Systems), 2nd ed. San Mateo, CA: Morgan Kaufmann, 2006.
[21] Nilam K. Nakod, M. B. Vaidya, "Survey on Multidimensional and Conditional Hybrid Dimensional Association Rule Mining", (IJESE), 2013, Volume: 1, Issue: 4, Page(s): 63-66.
[22] N. Sharma and M. M. Ali, "A Neural Network Approach to Improve the Vertical Resolution of Atmospheric Temperature Profiles From Geostationary Satellites", IEEE Geoscience and Remote Sensing Letters, vol. 10, no. 1, pp. 34-37, Jan. 2013, doi:10.1109/LGRS.2012.2191763.
[23] A. Santhi Latha, J. Swapna Priya, Sk. Abdul Kareem and M. Pavani Devi, "Spatial Data Mining Through Cluster Analysis", (IJECCE), 2012, Volume: 3, Issue: 2, Page(s): 372-375.
[24] M. Laxmaiah and A. Govardhan, "A Conceptual Metadata Framework for Spatial Data Warehouse", International Journal of Data Mining & Knowledge Management Process (IJDKP), 2013, Vol. 3, No. 3, Page(s): 63-73.
[25] Ginjala Srikanth Reddy,Khasim Pasha Sd and Sadalaxmi
Morthala, "Ad-hoc Data-Mining Framework for Data
Warehouse Technology", International Journal of Advanced
Technology & Engineering Research (IJATER), 2012, Volume
2, Issue 4, page(s): 58-67
[26] Nenad Jukic and John Nicholas, "A Framework for Collecting
and Defining Requirements for Data Warehousing Projects",
Journal of Computing and Information Technology - CIT, 2010,
Volume: 18, Issue:4, Page(s):377-384
[27] S. Saroop and M. Kumar, "Comparison of Data Warehouse Design Approaches from User Requirement to Conceptual Model: A Survey", 2011 International Conference on Communication Systems and Network Technologies (CSNT), pp. 308-312, 3-5 June 2011, doi:10.1109/CSNT.2011.161.
