Download Analysis of Telecommunication Database using Snowflake Schema

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Data model wikipedia , lookup

Data center wikipedia , lookup

Versant Object Database wikipedia , lookup

Predictive analytics wikipedia , lookup

Entity–attribute–value model wikipedia , lookup

Clusterpoint wikipedia , lookup

3D optical data storage wikipedia , lookup

Information privacy law wikipedia , lookup

Data analysis wikipedia , lookup

Data vault modeling wikipedia , lookup

Business intelligence wikipedia , lookup

Open data in the United Kingdom wikipedia , lookup

Relational model wikipedia , lookup

Database model wikipedia , lookup

Transcript
Progress In Science and Engineering Research Journal
ISSN 2347-6680 (E)
Analysis of Telecommunication Database using Snowflake Schema
Dr. Sanjay Srivastava1, Kaushal Srivastava2, Avinash Pandey3, Akhil Sharma4
1
Professor, 2,3.4Research Scholar, Computer Science, MGMCoET, Noida, India
are
data about the phone calls that traverse their networks in
operating in highly competitive and challenging environment
the form of call detail records Telecommunication
because telecommunication companies routinely generate
companies
Abstract:
Telecommunication
companies
today
and store enormous amounts of high-quality data, have a
very large customer base, and operate in a rapidly changing
and highly competitive environment. Telecommunication
companies utilize data mining to improve their marketing
efforts,
identify
fraud,
and
better
manage
their
telecommunication networks. However, these companies also
also
information,
maintain
such
as
extensive
billing
customer
information;
Telecommunications companies also generate a store an
extensive amount of data related to the operation of their
networks
1.
Star Schema: It represents the multidimensional
face a number of data mining challenges due to the
model dimension in such a way such that it
enormous size of their data sets, the sequential and temporal
resembles like a star. It consists of one or more
aspects of their data, and the need to predict very rare
fact tables at center which can refer to many
events—such as fraud and real time network failure. Huge
dimension
amount of data regarding 2g and 3g customers and also the
type of customers using their mode leads to very huge
amount of data which need better data modelling technique.
tables
as
per
requirement
and
availability.
2.
Snow Flake schema: It can be treated as star
schema extension where each star point explodes
Index Words: Snowflake Schema, Data Base, 2G, 3G,
into many points. It is normalized into multiple
Optimization Algorithm
lookup tables.
3.
Fact Constellation: It is used for complex
I. INTRODUCTION
The
Telecommunication
industry
generates
application which is combination of star schema.
Fact constellation contains two or more fact tables.
large
amount of data. These data includes call-detail data
Comparison between Star and Snow Flake
which describes calling behavior of that customer and
Schema
that data traverse through the network that is nothing but

network data. And the customers who are subscribed to
companies describe the customer data. So these large

has
increased
query
Snowflake schema due to increased joints, takes
more time

existence. These systems helped to detect the frauds and
network issues Telecommunication companies maintain
schema
performance
amount of data was needed to be handled hence the
concept of Human based Expert systems came into
Snowflake
Star schema has increased efficiency due to no
joins

Star schema has redundancy and hence requires
more spaces
Corresponding Author: Mr. Kaushal Srivastava, Research Scholar,

Computer Science, MGMCoET, Noida, India
Email Id: [email protected]
Snow Flake schema is better for dimension
analysis

© 2014 PISER Journal
Star Schema is better for metrics analysis.
http://.piserjournal.org/
PISER 13, Vol.02, Issue: 03/06 May- June; Bimonthly International Journal
Page(s) 052-058
Progress In Science and Engineering Research Journal
ISSN 2347-6680 (E)
III. Telecommunication Data
Warehouse
A. Association Rule of Data Mining
Association rule mining help in taking decisions
regarding
activities
like
discounts,
rate
plan,
promotional offers and combo item at the point of sale.
These rule help to find minimum support and minimum
confidence for given databases. The data model used
here is snowflake schema .Dimensional modelling is
one of the extensive techniques for designing data
warehouses. It arrange dimension table around fact
tables, which contain basic quantitative measurements.
Snow flake schema mode is represented in fig 4.1
B. Snow flake Schema
Snow flake schema is a type of data modelling
technique. Snow flake schema as mentioned above is
improvement of star schema. Snowflake schema is used
to improve the performance of certain queries. Snow
flake schema is better for dimension analysis as
compared to star schema. The schema is mapped with
one fact table surrounded by joined dimension tables
and these dimension table are further extend to
correlated dimension table , forming snow flake like
structure. This type of schema is used to improve query
performance against low cardinality attributes that are
queried independent .Business applications that are
Fig.1.Snow Flake Schema
based on ROLAP architecture may perform finer when
IV. System Architecture and
Implementation
data warehouse schema is snow flaked.
The snow flake schema used here contains six
The system architecture of data mining system contains
dimension table and one fact table. Billing Fact,
GUI which takes input/request from user in the form of
customer, Sales_Rep, Sales org, Bill, Rate Plan Service
data. User get message of data entered successfully.
Line are used dimension tables and fact table. In case of
User requested query is processed and output is get back
customer dimension table Subs attribute is added and in
to user through GUI. Data ware house is also called
case of billing fact table Bill, customer id, sales rep
enterprise data warehouse which is used for analysis of
number, sales org id, rate plan code rate plan type code
data and for reporting purpose. It is used to store current
are only used.
and historical data in data bases. Data warehouse is the
© 2014 PISER Journal
http://.piserjournal.org/
PISER 13, Vol.02, Issue: 03/06 May- June; Bimonthly International Journal
Page(s) 052-058
Progress In Science and Engineering Research Journal
combination of many databases that is integration of
ISSN 2347-6680 (E)
INNER JOIN Bill_Date D
ON F.Bill_Id = D.Id
databases of different department (data mart).Data
mining is the
process of extracting interesting
information or pattern data from large databases or data
INNER JOIN Rate_Plan_Code R
ON F.Rate_Plan = R.Id
INNER JOIN Area_Code S
warehouses. Data mining engine is the heart of any data
mining technique for extracting any technique.
ON F.Service_Line = S.Id
INNER JOIn Cust_type C
ON F.Customer = C.Id
INNER JoIN Sales_Rep_ID S1
ON F.Sales_Rep =S1.Id
INNER JoIN Sales_Org_ID S2
ON S1.Sales_Rep =S2.Id
WHERE
D.YEAR = 2004 AND
R.Rate_Plan = '3g' AND
C.Customer="normal"
S1.Sales Rep="Sales_Rep_Num"
GROUP BY
C.Customer,
R.Rate_Plan
This query will scan through the table once and compute
this result in O (n) operation which makes the better
performance of the execution of query.
VI. Experimental Analysis
Fig 2. System Architecture for the Data Mining
System
The experiment is performed for two purposes to check
The most notable feature of the system architecture is
the performance and accuracy of result through
the tightly coupled integration with the relational
snowflake schema and to check that which type of
database that powers the data warehouse.
customer used which mode of SIM services i.e. 2g or
3G.All the work is done on oracle12c.
V. Optimization Algorithm
The algorithm involves several different parameters that
are database specific .All the query are made on the
basis of this algorithm. This algorithm tries to retrieve
all the essential information needed to extract interesting
knowledge.
SELECT C.Customer,
S2.Sales_org,
SUM(F.Billing_Fact)
Fig.3. Snow Flake Schema
FROM Fact_Sales F
© 2014 PISER Journal
http://.piserjournal.org/
PISER 13, Vol.02, Issue: 03/06 May- June; Bimonthly International Journal
Page(s) 052-058
Progress In Science and Engineering Research Journal
ISSN 2347-6680 (E)
All the experimental result is purely depend on the data
the table as long as there is match between columns of
inserted in the dimension and fact table of snow flake
the tables .Inner join is performed to join rate plan,
schema. The name of the customer is same in the given
customer, bill with fact table bill_ fact.
output because there may be more than one customers
of same name .But customer Key is different in this
The result of following query is given below:
In the result 570 rows are selected which shows us that
case.
there are 570 3g customers of corporate type along with
1. SELECT c.cust_name,
the bill number and rate plan.
C.customer_id,b.bill,r.rate_plan_code
FROM
2. SELECT c.cust_name,c.subs,
billing_fact F inner JOIN
customer c
on f.customer_id=c.customer_id
C.customer_id,b.bill,r.rate_plan_code
FROM
billing_fact F
inner join bill b on f.bill=b.bill
c
inner join customer
on f.customer_id=c.customer_id
inner join rate_plans r on
inner join bill b
f.rate_plan_code=r.rate_plan_code
f.bill=b.bill
WHERE c.cust_type='corporate' and
inner join rate_plans r
c.subs= '3g';
f.rate_plan_code=r.rate_plan_code
on
on
WHERE c.cust_type='corporate' and
c.subs= '2g';
Similarly, in this query similar analysis is performed but
only difference is that subs are 2g. Basically this query
is fired to check 2g customer of corporate type. The
number of 2g customer therefore is 1780.
This query also display bill number and rate plan code
as done in above query. The result of above query is
given below.
Fig.4. Analysis1
In this query, customer name, customer id, bill number,
rate plan code is displayed to check the number of 2g
and 3g customer on the basis of customer type that is
corporate and subscriber mode is 3g. The query uses
inner join, basically inner join select all the rows from
© 2014 PISER Journal
Fig.5. Analysis2
http://.piserjournal.org/
PISER 13, Vol.02, Issue: 03/06 May- June; Bimonthly International Journal
Page(s) 052-058
Progress In Science and Engineering Research Journal
3. SELECT
c.cust_name,
c.cust_type,
C.customer_id,
ISSN 2347-6680 (E)
c.subs,
b.bill,
billing_fact F
r.rate_plan_code FROM
inner
billing_fact F
inner join customer c
join
customer
c
on
f.customer_id=c.customer_id
on f.customer_id=c.customer_id
inner join bill b on f.bill=b.bill
inner join bill b on f.bill=b.bill
inner
join
rate_plans
r
on
inner
f.rate_plan_code=r.rate_plan_code
WHERE
c.cust_type='normal'
join
rate_plans
r
on
f.rate_plan_code=r.rate_plan_code
and
WHERE
c.subs='3g';
c.cust_type='normal'
and
c.subs='2g'
Similarly, in this query customer type is changed from
Similarly this query is fired to check the number of 2g
corporate to normal. And check the number of normal
normal customer. The number of rows retrieved is 770
people using 3g connection. The number of rows
rows. Similarly in this query also rate plan code and
retrieved is 1880.
bill number are also displayed.
The result of following query is given below:
The result of following query is given below:
Fig.6. Analysis3
4. SELECT
c.cust_name,
c.cust_type,
C.customer_id,
c.subs,
b.bill,
r.rate_plan_code FROM
Fig.7. Analysis4
© 2014 PISER Journal
http://.piserjournal.org/
PISER 13, Vol.02, Issue: 03/06 May- June; Bimonthly International Journal
Page(s) 052-058
Progress In Science and Engineering Research Journal
ISSN 2347-6680 (E)
6. SELECT c.cust_city, c.subs,
5. SELECT c.cust_city, c.subs,
c.cust_name, C.customer_id, b.bill,
c.cust_name, C.customer_id,
r.rate_plan_code FROM billing_fact
b.bill,r.rate_plan_code FROM
billing_fact F
F
inner JOIN Sales_rep s
inner join customer c
ON f.sales_rep_num=s.sales_rep_num
on f.customer_id=c.customer_id
inner join customer c
inner join bill b
on f.customer_id=c.customer_id
on f.bill=b.bill
inner join bill b
inner join rate_plans r
on f.bill=b.bill
on .rate_plan_code=r.rate_plan_code
inner join rate_plans r
WHERE c.cust_city='Agra' and
on f.rate_plan_code=r.rate_plan_code
c.subs='3g';
WHERE c.cust_city='Mujfarnagar' and
This query is performed specially to check number of
c.subs='2g';
3G customer along with bill number, rate plan number
they adopted and customer name. The numbers of rows
Similarly this query is fired to check number of 2G
retrieve are around 212. This counting includes both
customer in Mujfarnagar area. The result which we get
corporate and normal customers.
includes both types of customer 2G and 3G both. So
total 189 rows are selected on this basis. The result of
the query is given in Analysis 6.
VII. CONCLUSION
The paper briefly explains about what actually snow
flake schema is. Snow flake schema is basically
distinction of star schema. Snow flake schema is better
for dimension analysis. This type of data modelling
technique of data warehouse in which fact table is
surrounded by correlated dimension table and that
dimension table further extend to other dimension table,
forming snow flake like structure. Snow flake schema is
used to increase performance of certain type of queries.
After performing above analysis, output obtained is
those 2G customers are more than 3g. This comparison
is also performed on the basis of area. This analysis
proofs that performance of snowflake schema for
Fig.8. Analysis 5
dimension analysis.
© 2014 PISER Journal
http://.piserjournal.org/
PISER 13, Vol.02, Issue: 03/06 May- June; Bimonthly International Journal
Page(s) 052-058
Progress In Science and Engineering Research Journal
ISSN 2347-6680 (E)
[17]. Watson, H. J., Annino, D. A., and Wixom, B. H. Current
REFERENCES
Practices
[1]. Online Diagram Making Tools(https://www.gliffy.com/).
in
Data
Warehousing.
Information
Management. 18 (1), (2001).
[2]. The data warehouse toolkit: the complete guide to dimensional
modeling by Ralph Kimball, Margy Ross. — 2nd ed. (ISBN 0471-20024-7).
[3]. Agrawal R., Imielinski T. and A. Swami. Mining Association
Rules between Sets of Items in Large Databases. Proceeding of
ACM SIGMOD International Conference. (1993), 207-216.
[4].
Svetlozar Nestorov and Ad-Hoc Association-Rule Mining
within the Data Warehouse Proceedings of the 36th Hawaii
International Conference on System Sciences - 2003
[5]. Data Mining: Concepts and Techniques by Jiawei Han,
Micheline Kamber, Jian Pei-2nd ediion(ISBN-9780080475585)
[6]. Agrawal R. and Srikant R. Fast Algorithms for Mining
Association Rules. Proceeding of International Conference On
Very Large Databases VLDB. (1994), 487-499.
[7]. R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and
A.Verkamo. Fast Discovery of Association Rules. Advances in
Knowledge Discovery and Data Mining. AAAI/MIT Press
(1996).
[8]. Berry, M. and Linoff, G. Data Mining Techniques for
Marketing, Sales and Customer Support. Wiley (1997).
[9]. Chaudhri, S. and Dayal, U. An overview of Data Warehousing
and OLAP Technology. ACM SIGMOD Record 26 (1), (1997),
65-74.
[10]. Hilderman, R.J., Carter, C.L., Hamilton, H.J., and Cercone, N.
Mining Association Rules from Market Basket Data Using
Share Measures and Characterized Itemsets. International
Journal of Artificial Intelligence Tools. 7 (2), (1998), 189-220.
[11]. Inmon, W. H. Building the Data Warehouse. Wiley (1996).
[12]. Kimball, R., Reeves, L., Ross, M., and Thornthwhite, W.The
Data Warehouse Lifecycle Toolkit. Wiley (1998).
[13]. Leavitt, N. Data Mining for the Corporate Masses. IEEE
Computer. 35 (5), (2002), 22-24.
[14]. S. Sarawagi, S. Thomas, R. Agrawal. Integrating Mining with
Relational Database Systems: Alternatives and Implications.
Proceedings of ACM SIGMOD Conference (1998), 343-354
[15]. S. Tsur, J. Ullman, S. Abiteboul, C. Clifton, R. Motwani, S.
Fig.9. Analysis 6
Nestorov, A. Rosenthal. Query Flocks: A Generalization of
Association-Rule Mining. Proceedings of ACM SIGMOD
Conference, (1998), 1-12.
[16]. Wang, K., He Y., and Han J. Mining Frequent Itemsets Using
Support Constraints. Proceedings of International Conference
on Very Large Databases VLDB, (2000), 43-52
© 2014 PISER Journal
http://.piserjournal.org/
PISER 13, Vol.02, Issue: 03/06 May- June; Bimonthly International Journal
Page(s) 052-058
Systems
Progress In Science and Engineering Research Journal
© 2014 PISER Journal
ISSN 2347-6680 (E)
http://.piserjournal.org/
PISER 13, Vol.02, Issue: 03/06 May- June; Bimonthly International Journal
Page(s) 052-058