Progress In Science and Engineering Research Journal, ISSN 2347-6680 (E)
© 2014 PISER Journal, http://.piserjournal.org/
PISER 13, Vol.02, Issue: 03/06 May-June; Bimonthly International Journal, Page(s) 052-058

Analysis of Telecommunication Database using Snowflake Schema

Dr. Sanjay Srivastava1, Kaushal Srivastava2, Avinash Pandey3, Akhil Sharma4
1 Professor, 2,3,4 Research Scholar, Computer Science, MGMCoET, Noida, India

Corresponding Author: Mr. Kaushal Srivastava, Research Scholar, Computer Science, MGMCoET, Noida, India. Email Id: [email protected]

Abstract: Telecommunication companies today operate in a highly competitive and challenging environment. They routinely generate and store enormous amounts of high-quality data, have a very large customer base, and work in a rapidly changing market. Telecommunication companies utilize data mining to improve their marketing efforts, identify fraud, and better manage their telecommunication networks. However, these companies also face a number of data mining challenges due to the enormous size of their data sets, the sequential and temporal aspects of their data, and the need to predict very rare events, such as fraud and real-time network failure. The data on 2G and 3G customers, together with the type of customer using each mode, is very large and needs a better data modelling technique.

Index Words: Snowflake Schema, Data Base, 2G, 3G, Optimization Algorithm

I. INTRODUCTION

The telecommunication industry generates a large amount of data. This data includes call-detail data, which describes the calling behaviour of each customer; network data, which is the data that traverses the network; and customer data, which describes the customers subscribed to the company. Such large amounts of data had to be handled, and hence the concept of human-based expert systems came into existence. These systems helped to detect fraud and network issues.

Telecommunication companies maintain data about the phone calls that traverse their networks in the form of call detail records. They also maintain extensive customer information, such as billing information, and they generate and store an extensive amount of data related to the operation of their networks.

1. Star Schema: It represents the multidimensional model in such a way that it resembles a star. It consists of one or more fact tables at the center, which can refer to many dimension tables as per requirement and availability.
2. Snowflake Schema: It can be treated as an extension of the star schema in which each point of the star explodes into many points. It is normalized into multiple lookup tables.
3. Fact Constellation: It is used for complex applications and is a combination of star schemas. A fact constellation contains two or more fact tables.

Comparison between Star and Snowflake Schema

• Snowflake schema has increased query performance for certain queries.
• Snowflake schema, due to the increased number of joins, takes more time.
• Star schema has increased efficiency because it needs fewer joins.
• Star schema has redundancy and hence requires more space.
• Snowflake schema is better for dimension analysis.
• Star schema is better for metrics analysis.

III. Telecommunication Data Warehouse

A. Association Rule of Data Mining

Association rule mining helps in taking decisions regarding activities such as discounts, rate plans, promotional offers and combo items at the point of sale. Rules are retained when they meet a minimum support and a minimum confidence for the given database. The data model used here is the snowflake schema. Dimensional modelling is one of the most extensively used techniques for designing data warehouses. It arranges dimension tables around fact tables, which contain the basic quantitative measurements. The snowflake schema model is represented in fig 4.1.

B. Snowflake Schema

The snowflake schema is a data modelling technique. As mentioned above, it is an improvement of the star schema, and it is used to improve the performance of certain queries. The snowflake schema is better for dimension analysis than the star schema. The schema consists of one fact table surrounded by joined dimension tables, and these dimension tables further extend to correlated dimension tables, forming a snowflake-like structure. This type of schema is used to improve query performance against low-cardinality attributes that are queried independently. Business applications that are based on a ROLAP architecture may perform better when the data warehouse schema is snowflaked.

The snowflake schema used here contains six dimension tables and one fact table. Billing Fact is the fact table, and Customer, Sales_Rep, Sales Org, Bill, Rate Plan and Service Line are the dimension tables. In the customer dimension table, the Subs attribute is added. In the billing fact table, only bill, customer id, sales rep number, sales org id, rate plan code and rate plan type code are used.

Fig. 1. Snowflake Schema

IV. System Architecture and Implementation

The system architecture of the data mining system contains a GUI, which takes input or requests from the user in the form of data. The user gets a message when the data has been entered successfully. A query requested by the user is processed, and the output is returned to the user through the GUI. The data warehouse, also called the enterprise data warehouse, is used for analysis of data and for reporting. It is used to store current and historical data.
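The snowflake structure described in Section III.B, where a dimension table extends to a further correlated dimension table, can be illustrated with a small sketch. It is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module instead of Oracle, models just the chain billing_fact, sales_rep, sales_org, and the column names (rep_name, org_name, amount) and all rows are hypothetical and not taken from the paper's database.

```python
import sqlite3

# In-memory database; the tables and columns below are hypothetical
# stand-ins for part of the paper's schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_org (
    sales_org_id INTEGER PRIMARY KEY,
    org_name     TEXT
);
-- The sales_rep dimension is "snowflaked": it points to sales_org
-- instead of repeating the organisation name on every row.
CREATE TABLE sales_rep (
    sales_rep_num INTEGER PRIMARY KEY,
    rep_name      TEXT,
    sales_org_id  INTEGER REFERENCES sales_org(sales_org_id)
);
CREATE TABLE billing_fact (
    bill          INTEGER,
    sales_rep_num INTEGER REFERENCES sales_rep(sales_rep_num),
    amount        REAL
);
""")

conn.executemany("INSERT INTO sales_org VALUES (?, ?)",
                 [(1, "North"), (2, "South")])
conn.executemany("INSERT INTO sales_rep VALUES (?, ?, ?)",
                 [(10, "Rep A", 1), (11, "Rep B", 1), (12, "Rep C", 2)])
conn.executemany("INSERT INTO billing_fact VALUES (?, ?, ?)",
                 [(101, 10, 100.0), (102, 11, 50.0),
                  (103, 12, 75.0), (104, 10, 25.0)])

# Reaching the outer dimension takes two joins: fact -> rep -> org.
totals = conn.execute("""
    SELECT o.org_name, SUM(f.amount)
    FROM billing_fact f
    INNER JOIN sales_rep s ON f.sales_rep_num = s.sales_rep_num
    INNER JOIN sales_org o ON s.sales_org_id = o.sales_org_id
    GROUP BY o.org_name
    ORDER BY o.org_name
""").fetchall()

print(totals)
```

In a star schema the organisation name would be stored directly in the sales_rep table, which saves the second join at the cost of repeating the name for every representative.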
A data warehouse is the combination of many databases, that is, the integration of the databases of different departments (data marts). Data mining is the process of extracting interesting information or patterns from large databases or data warehouses. The data mining engine is the heart of any data mining system.

Fig. 2. System Architecture for the Data Mining System

The most notable feature of the system architecture is the tightly coupled integration with the relational database that powers the data warehouse.

V. Optimization Algorithm

The algorithm involves several parameters that are database specific. All the queries are made on the basis of this algorithm, which tries to retrieve all the essential information needed to extract interesting knowledge.

    SELECT C.Customer, S2.Sales_org, SUM(F.Billing_Fact)
    FROM Fact_Sales F
    INNER JOIN Bill_Date D ON F.Bill_Id = D.Id
    INNER JOIN Rate_Plan_Code R ON F.Rate_Plan = R.Id
    INNER JOIN Area_Code S ON F.Service_Line = S.Id
    INNER JOIN Cust_type C ON F.Customer = C.Id
    INNER JOIN Sales_Rep_ID S1 ON F.Sales_Rep = S1.Id
    INNER JOIN Sales_Org_ID S2 ON S1.Sales_Rep = S2.Id
    WHERE D.YEAR = 2004
      AND R.Rate_Plan = '3g'
      AND C.Customer = 'normal'
      AND S1.Sales_Rep = 'Sales_Rep_Num'
    GROUP BY C.Customer, S2.Sales_org, R.Rate_Plan

This query is intended to scan the fact table once and compute the result in O(n) operations, where n is the number of fact rows, which improves the performance of query execution.

VI. Experimental Analysis

The experiment is performed for two purposes: to check the performance and accuracy of the results obtained through the snowflake schema, and to check which type of customer uses which mode of SIM service, i.e. 2G or 3G. All the work is done on Oracle 12c.

Fig. 3. Snowflake Schema

All the experimental results depend purely on the data inserted in the dimension and fact tables of the snowflake schema. The same customer name can appear more than once in the output because more than one customer may have the same name; the customer key is different in this case.

1.
    SELECT c.cust_name, c.customer_id, b.bill, r.rate_plan_code
    FROM billing_fact f
    INNER JOIN customer c ON f.customer_id = c.customer_id
    INNER JOIN bill b ON f.bill = b.bill
    INNER JOIN rate_plans r ON f.rate_plan_code = r.rate_plan_code
    WHERE c.cust_type = 'corporate' AND c.subs = '3g';

This query displays the customer name, customer id, bill number and rate plan code for customers whose type is corporate and whose subscriber mode is 3G. The query uses inner joins; an inner join selects all the rows from the joined tables as long as there is a match between the join columns. Inner joins are performed to join rate plan, customer and bill with the fact table billing_fact.

The result of the query is given below. 570 rows are selected, which shows that there are 570 3G customers of corporate type, listed along with the bill number and rate plan.

Fig. 4. Analysis 1

2.
    SELECT c.cust_name, c.subs, c.customer_id, b.bill, r.rate_plan_code
    FROM billing_fact f
    INNER JOIN customer c ON f.customer_id = c.customer_id
    INNER JOIN bill b ON f.bill = b.bill
    INNER JOIN rate_plans r ON f.rate_plan_code = r.rate_plan_code
    WHERE c.cust_type = 'corporate' AND c.subs = '2g';

This query performs a similar analysis; the only difference is that subs is 2G. It is fired to check the 2G customers of corporate type. The number of 2G corporate customers is 1780. This query also displays the bill number and rate plan code, as in the query above. The result of the query is given below.

Fig. 5. Analysis 2

3.
    SELECT c.cust_name, c.cust_type, c.customer_id, c.subs, b.bill, r.rate_plan_code
    FROM billing_fact f
    INNER JOIN customer c ON f.customer_id = c.customer_id
    INNER JOIN bill b ON f.bill = b.bill
    INNER JOIN rate_plans r ON f.rate_plan_code = r.rate_plan_code
    WHERE c.cust_type = 'normal' AND c.subs = '3g';

In this query the customer type is changed from corporate to normal, to check the number of normal customers using a 3G connection. The number of rows retrieved is 1880. The result of the query is given below.

Fig. 6. Analysis 3

4.
    SELECT c.cust_name, c.cust_type, c.customer_id, c.subs, b.bill, r.rate_plan_code
    FROM billing_fact f
    INNER JOIN customer c ON f.customer_id = c.customer_id
    INNER JOIN bill b ON f.bill = b.bill
    INNER JOIN rate_plans r ON f.rate_plan_code = r.rate_plan_code
    WHERE c.cust_type = 'normal' AND c.subs = '2g';

Similarly, this query is fired to check the number of 2G normal customers. The number of rows retrieved is 770. The rate plan code and bill number are displayed in this query as well. The result of the query is given below.

Fig. 7. Analysis 4

5.
    SELECT c.cust_city, c.subs, c.cust_name, c.customer_id, b.bill, r.rate_plan_code
    FROM billing_fact f
    INNER JOIN customer c ON f.customer_id = c.customer_id
    INNER JOIN bill b ON f.bill = b.bill
    INNER JOIN rate_plans r ON f.rate_plan_code = r.rate_plan_code
    WHERE c.cust_city = 'Agra' AND c.subs = '3g';

This query is performed specifically to check the number of 3G customers in Agra, along with the bill number, the rate plan they adopted and the customer name. Around 212 rows are retrieved. This count includes both corporate and normal customers.

Fig. 8. Analysis 5

6.
    SELECT c.cust_city, c.subs, c.cust_name, c.customer_id, b.bill, r.rate_plan_code
    FROM billing_fact f
    INNER JOIN sales_rep s ON f.sales_rep_num = s.sales_rep_num
    INNER JOIN customer c ON f.customer_id = c.customer_id
    INNER JOIN bill b ON f.bill = b.bill
    INNER JOIN rate_plans r ON f.rate_plan_code = r.rate_plan_code
    WHERE c.cust_city = 'Mujfarnagar' AND c.subs = '2g';

Similarly, this query is fired to check the number of 2G customers in the Mujfarnagar area. Because the query does not filter on customer type, the result includes both types of customer, corporate and normal. In total, 189 rows are selected on this basis. The result of the query is given in Analysis 6.

VII. CONCLUSION

The paper briefly explains what the snowflake schema is. The snowflake schema is an extension of the star schema and is better for dimension analysis. It is a data modelling technique for data warehouses in which the fact table is surrounded by correlated dimension tables, and those dimension tables further extend to other dimension tables, forming a snowflake-like structure. The snowflake schema is used to increase the performance of certain types of queries. After performing the above analysis, the output obtained is that 2G customers are more numerous than 3G customers. This comparison is also performed on the basis of area. This analysis demonstrates the performance of the snowflake schema for dimension analysis.
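The comparison of 2G and 3G customers by customer type in Section VI runs one query per segment. As a hedged sketch, the same counts could be obtained in a single pass by grouping on cust_type and subs. The example below reuses the table and column names that appear in the paper's queries, but it runs on Python's built-in sqlite3 module with a handful of made-up rows, so the numbers it produces are illustrative only and do not reproduce the paper's results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    cust_name   TEXT,
    cust_type   TEXT,   -- 'corporate' or 'normal'
    subs        TEXT,   -- '2g' or '3g'
    cust_city   TEXT
);
CREATE TABLE bill (bill INTEGER PRIMARY KEY);
CREATE TABLE rate_plans (rate_plan_code TEXT PRIMARY KEY);
CREATE TABLE billing_fact (
    bill           INTEGER REFERENCES bill(bill),
    customer_id    INTEGER REFERENCES customer(customer_id),
    rate_plan_code TEXT REFERENCES rate_plans(rate_plan_code)
);
""")

# Made-up sample rows. Customers 1 and 5 share a name but have
# different keys, the situation the paper notes for its output.
conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?, ?)", [
    (1, "A", "corporate", "3g", "Agra"),
    (2, "B", "corporate", "2g", "Agra"),
    (3, "C", "normal", "3g", "Mujfarnagar"),
    (4, "D", "normal", "2g", "Mujfarnagar"),
    (5, "A", "normal", "2g", "Agra"),
])
conn.executemany("INSERT INTO bill VALUES (?)",
                 [(n,) for n in range(101, 107)])
conn.executemany("INSERT INTO rate_plans VALUES (?)", [("RP1",), ("RP2",)])
conn.executemany("INSERT INTO billing_fact VALUES (?, ?, ?)", [
    (101, 1, "RP1"), (102, 2, "RP1"), (103, 3, "RP2"),
    (104, 4, "RP2"), (105, 5, "RP1"), (106, 2, "RP2"),
])

# Same joins as the paper's queries, with GROUP BY replacing the
# per-segment WHERE filters.
counts = conn.execute("""
    SELECT c.cust_type, c.subs, COUNT(*)
    FROM billing_fact f
    INNER JOIN customer c ON f.customer_id = c.customer_id
    INNER JOIN bill b ON f.bill = b.bill
    INNER JOIN rate_plans r ON f.rate_plan_code = r.rate_plan_code
    GROUP BY c.cust_type, c.subs
    ORDER BY c.cust_type, c.subs
""").fetchall()

for cust_type, subs, n in counts:
    print(cust_type, subs, n)
```

A query of this shape returns in one result set the four counts that queries 1 to 4 obtain separately. Note that it counts joined fact rows, so a customer with several bills is counted once per bill, as in the paper's row counts.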
Fig. 9. Analysis 6
