International Journal of Innovative and Emerging Research in Engineering, Volume 3, Issue 2, 2016. Available online at www.ijiere.com. e-ISSN: 2394-3343, p-ISSN: 2394-5494

A Survey of Migration from Traditional Relational Databases towards New Big Data Technology

Miss Dipika Thakare, Department of Computer Science & Engineering, North Maharashtra University, India; Shri Sant Gadge Baba College of Engineering & Technology, Bhusawal

ABSTRACT
This paper focuses mainly on how data is managed in conventional relational database systems and in the new big data technology. Enormous amounts of data are created on social media sites, and this data is collected and stored at unprecedented rates. The challenge is not only to accumulate and handle this large volume of data, but also to extract important information from it. There are several approaches to collecting, storing, processing, and analyzing big data, and traditional relational database management systems find it difficult to handle data at this scale. This paper discusses why a transition from relational database management systems to big data technology is necessary.

Keywords - Big data, NoSQL, Relational Databases, Decision making using Big Data, Hadoop

I. INTRODUCTION
"90% of the world's data was generated in the last few years." [2] Owing to the latest technologies, devices, and means of communication such as social networking sites, the amount of data produced by mankind is increasing rapidly every year, every month, and every day. The amount of data produced from the beginning of time until 2003 was 5 billion gigabytes; the same amount was created every two days in 2011 and every ten minutes in 2013, and this rate is still growing. [2] All of this information is useful and can be meaningful when processed, yet much of it goes unused. International Data Corporation (IDC) believes that organizations best able to make real-time business decisions using Big Data solutions will thrive, while those unable to adopt and make use of them will increasingly find themselves at a competitive disadvantage, facing potential failure in the market. [1]

II. RELATIONAL DATABASES
A database management system is a set of interrelated data together with a set of programs to operate on it; the collection of interrelated data itself is called the database. In short, a relational database manages information in rows and columns: a database that holds information in one or more related tables is called a relational database. A row in a table is called a record, and a column is called an attribute or field. There are two approaches to storing data in a warehouse:
Dimensional - In the dimensional approach, transaction data is partitioned into "facts", which are generally numeric transaction measurements, and "dimensions", the reference information that gives context to the facts. [1]
Normalized - Database normalization is a design technique by which relational database tables are structured so that they are easy for both users and systems to work with. Tables are grouped by subject areas that reflect data categories, such as data on customers, products, and so on. The normalized structure divides data into entities, which become several tables in a relational database. [1]
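As a minimal sketch of the normalized approach described above (my own illustration; the tables and column names are hypothetical and do not appear in the paper), the following Python script uses the standard sqlite3 module to split customer and order data into two related tables linked by a foreign key:

```python
import sqlite3

# In-memory database used only for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Normalized design: customer details live in one table,
# orders reference customers by id instead of repeating their data.
cur.execute("""CREATE TABLE customers (
                   id   INTEGER PRIMARY KEY,
                   name TEXT NOT NULL,
                   city TEXT)""")
cur.execute("""CREATE TABLE orders (
                   id          INTEGER PRIMARY KEY,
                   customer_id INTEGER NOT NULL REFERENCES customers(id),
                   product     TEXT NOT NULL,
                   amount      REAL)""")

cur.execute("INSERT INTO customers VALUES (1, 'Asha', 'Bhusawal')")
cur.execute("INSERT INTO orders VALUES (1, 1, 'Notebook', 120.0)")
cur.execute("INSERT INTO orders VALUES (2, 1, 'Pen', 15.0)")

# A join reassembles the related rows when the application needs them.
for row in cur.execute("""SELECT c.name, o.product, o.amount
                          FROM orders o JOIN customers c ON o.customer_id = c.id"""):
    print(row)

conn.close()
```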
a) RDBMS to Big Data Migration Testing
Big data migration generally involves multiple source systems and very large volumes of data. Most organizations prefer open source tools that can be set up quickly and offer multiple customization options. In migration testing, a set of entities is selected for testing, and all application data is migrated in this cycle. A well-planned approach can reduce the consecutive testing effort. Another challenge is defining effective test scenarios for each entity; solid data transformation rules and a proper sampling method must be adopted in testing. [3]

b) Big Data Migration Process
Hadoop as a service is offered by Amazon Web Services (AWS), a cloud computing solution that removes the operational challenge of running Hadoop and makes large- and medium-scale data processing accessible, easy, and fast. The services generally used are the Simple Storage Service (S3) and Elastic MapReduce (EMR), often together with Amazon Redshift, a fast, fully managed data warehouse service. [3] The process of migration to the AWS Hadoop environment has three steps:
o Cloud Service - Virtual or physical machines connect to the source databases and extract the tables using Sqoop, which pushes them to the Simple Storage Service.
o Cloud Storage - The Simple Storage Service acts as the storage center for all the data sent by the virtual machines; it stores the data in flat file format.
o Data Processing - Amazon EMR processes and distributes vast amounts of data using Hadoop. [3]
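The cloud storage and data processing steps can also be scripted. The sketch below is only an illustration and assumes AWS credentials are already configured; it uses the boto3 SDK for Python to upload an exported flat file to S3 and submit a processing step to an already running EMR (Hadoop) cluster. The bucket name, cluster id, and script path are placeholders rather than values taken from the paper:

```python
import boto3

BUCKET = "example-migration-bucket"      # hypothetical S3 bucket
CLUSTER_ID = "j-EXAMPLECLUSTERID"        # hypothetical EMR cluster id

# Cloud Storage step: push a flat file exported by Sqoop into S3.
s3 = boto3.client("s3")
s3.upload_file("customers.csv", BUCKET, "staging/customers.csv")

# Data Processing step: ask the EMR cluster to run a job over the staged
# data (here a generic command-runner step submitting a Spark script).
emr = boto3.client("emr")
emr.add_job_flow_steps(
    JobFlowId=CLUSTER_ID,
    Steps=[{
        "Name": "process-migrated-customers",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit",
                     f"s3://{BUCKET}/scripts/process_customers.py",
                     f"s3://{BUCKET}/staging/customers.csv"],
        },
    }],
)
```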
III. WHAT IS BIG DATA
Big data is a collection of data sets so large that they cannot be processed using traditional computing techniques. Big data is not a single technique; it involves many areas of business, and it is becoming applicable to more and more organizations. [2] Big data refers to data sets whose volume or variety exceeds the ability of commonly used software tools to capture, manage, and process the data within a time tolerable to the business. [1] The difficulties relate to data capture, storage, search, sharing, analytics, and visualization, because the data is very large, arrives very fast, and is very hard to handle: "very large" means petabyte-scale collections such as transaction histories, and "very fast" means the data arrives at very high speed. [1]
What comes under big data?
Search Engine Data - Search engines retrieve lots of data from different databases.
Black Box Data - Black boxes are components of airplanes, jets, helicopters, and so on. They record the voices of the flight crew, the output of microphones and earphones, and performance information about the aircraft.
Transport Data - Transport data includes the model, capacity, distance, and availability of a vehicle.
Power Grid Data - Power grid data holds the information consumed by a particular node with respect to a base station.
Social Media Data - Social media such as Twitter, Facebook, Hike, and WhatsApp hold information and posts made by many people.
Stock Exchange Data - Stock exchange data holds information about the shares of different companies and the 'buy' and 'sell' decisions made by customers. [3]
Big Data therefore covers huge volume, high velocity, and an extensible variety of data.
There are three types of data:
Semi-structured data: XML data.
Structured data: relational data.
Unstructured data: Word, PDF, text, media logs. [3]
Big Data is characterized by the following: [1]
Variety - The increasingly different types of data that no longer fit into neat, easy-to-consume structures.
Velocity - The frequency at which new data is generated, captured, and shared.
Veracity - The uncertainty and messiness of the data.
Volume - The amount of data generated every second, which is larger than what conventional relational database infrastructures can deal with.
The capabilities of Big Data are: [1]
It transforms the way business is done.
It solves today's data problems.
It builds competitive advantage in the marketplace.

IV. BIG DATA OPERATIONAL AND ANALYTICAL TECHNOLOGIES
Many technologies can be applied to handle big data: i) schema-less databases and ii) NoSQL (Not Only SQL). NoSQL technology uses many approaches for storing and managing unstructured data. NoSQL databases separate data storage from data management, whereas relational databases combine the two. One of the key concepts of NoSQL databases is to focus on high-performance, scalable data storage and to provide low-level access to a data management layer, which allows data management tasks to be written in the application layer. NoSQL database systems are therefore also called schema-free databases. The key advantage of a schema-free design is that it enables applications to quickly change the structure of data without table rewrites; data validity and integrity are then enforced at the data management layer. [1] NoSQL also reconsiders the atomicity, consistency, isolation, and durability (ACID) properties. [1] Because of the load placed on databases, especially in distributed systems, NoSQL systems generally do not maintain complete consistency across distributed servers. Traditional relational databases implement strict transactional semantics to protect consistency, but many NoSQL databases have more scalable architectures that relax the consistency requirement. [1] Operational big data includes systems like MongoDB, which provide operational capability for real-time, interactive workloads where data is primarily captured and stored. [2] NoSQL big data systems are designed to take advantage of new cloud computing architectures, allowing very big computations to be run economically and efficiently, which makes big data workloads easier to manage and cheaper and faster to implement. [2] Some NoSQL systems can even surface patterns and trends from real-time data with minimal coding and without the need for data scientists. [2]
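To make the schema-free idea concrete, the following sketch (my own illustration, assuming a locally running MongoDB instance; the collection and field names are hypothetical) uses the pymongo driver to store documents with different structures in the same collection, which a fixed relational table could not do without schema changes:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["survey_demo"]
products = db["products"]

# Two documents with different fields can live in one collection;
# no table rewrite or ALTER TABLE is needed when the structure evolves.
products.insert_one({"name": "Notebook", "price": 120, "pages": 200})
products.insert_one({"name": "Sensor", "price": 950,
                     "specs": {"range_m": 10, "interface": "I2C"}})

# Queries work on whatever fields a document happens to have.
for doc in products.find({"price": {"$gt": 100}}):
    print(doc["name"], doc["price"])
```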
Analytical technologies include systems like MapReduce and Massively Parallel Processing (MPP) database systems. [2] These systems provide analytical capabilities for retrospective and complex analysis that may touch most of the data. MapReduce provides a new method of analyzing data that is complementary to the capabilities provided by SQL, and systems based on MapReduce can be scaled up from single servers to thousands of high- and low-end machines. [2] Data Intensive Computing is a class of parallel computing applications that use a data-parallel approach to process big data; it works on the principle of co-locating the algorithms, programs, and data used to perform the computation. The distributed and parallel systems approach, in which inter-connected individual computers work together as a single integrated computing resource, is used to process and analyze big data. [1] A distributed file system, or network file system, allows client nodes to access files through a computer network, so that a number of users working on multiple machines can share files and storage resources. The client nodes do not access the block storage directly but interact with it through a network protocol, which makes it possible to limit access to the file system depending on access lists or capabilities on both servers and clients, again depending on the protocol. [1] Apache Hadoop, for analytics and stream computing, is the key technology used to handle big data. It is an open source software project that enables distributed processing of large data sets across clusters of commodity servers; it can be scaled up from a single server to thousands of machines and offers a very high degree of fault tolerance. [1]

V. BIG DATA FRAMEWORK
This section presents a simple framework for looking at the key components of a big data system, in order to work through the many architectural decisions encountered when exploring the world of big data. [1] Big data typically brings four new and very different considerations into enterprise architecture:
o Multiple analytics paradigms and computing methods must be supported.
o Storage models are changing - solutions such as HDFS (Hadoop Distributed File System) and unstructured data stores appear alongside relational storage. [1]
o Data sources have a different scale - many companies work in the multi-terabyte range and some in the petabyte range. [1]
o Speed is critical - ETL (extract-transform-load) batches are no longer sufficient, and real-time streaming solutions such as Storm are required. [1]
The main processing components are:
Batch processing: Hadoop acts as a distributed processing engine that can examine very large amounts of data and apply algorithms ranging from the simple to the complex (a conceptual sketch of its MapReduce model follows below). [1]
Interactive analytics: This includes distributed MPP data warehouses with embedded analytics, which let business users perform interactive query and visualization of big data. [1]
Real-time database and analytics: These are generally in-memory, scale-out engines that provide low latency, distributed processing, event-generation capabilities, and cross-data-center access to data. [1]
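As a conceptual sketch of the MapReduce model that Hadoop's batch layer implements (this is plain single-machine Python, not an actual Hadoop job; on Hadoop the same map and reduce functions would run distributed across the cluster), a word count can be expressed as a map phase that emits (word, 1) pairs, a shuffle that groups pairs by key, and a reduce phase that sums each group:

```python
from collections import defaultdict

def map_phase(line):
    # Emit a (key, value) pair for every word in the input line.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Group all values by key, as the Hadoop framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Combine all values for one key into a single result.
    return key, sum(values)

lines = ["big data needs new tools", "relational databases meet big data"]
pairs = [pair for line in lines for pair in map_phase(line)]
results = [reduce_phase(k, v) for k, v in shuffle(pairs).items()]
print(sorted(results))   # e.g. [('big', 2), ('data', 2), ...]
```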
VI. WHY TRANSITION FROM RELATIONAL DATABASES TO BIG DATA
Table 1 below shows the differences between traditional relational database systems and Big Data systems (Hadoop). Owing to the huge amounts of data being generated and analyzed in real time to provide intelligence to decision support systems, there is a clear need to transition to Big Data. [1]

Table 1: Differences between RDBMS and Hadoop

Data layout:
  RDBMS - Row- and column-oriented.
  Hadoop - Column-family-oriented.
Description:
  RDBMS - Data, information, and records are stored in tabular (row and column) form.
  Hadoop - A distributed file system stores huge amounts of file data on a cluster of machines and handles data redundancy. On top of that distributed file system, Hadoop provides an API, MapReduce, for processing the stored data, and column databases such as HBase can be built on top of this basic scheme.
Type of data supported:
  RDBMS - Supports and works only with structured data.
  Hadoop - Supports structured, semi-structured, and unstructured data.
Read/write throughput limits:
  RDBMS - Thousands of queries per second.
  Hadoop - Millions of queries per second.
Streaming data:
  RDBMS - Limited ability to handle streaming data.
  Hadoop - Works well with streaming data.
Maximum data size:
  RDBMS - Terabytes.
  Hadoop - Hundreds of petabytes.

SUGGESTED BIG DATA ADOPTION ROADMAP FOR AN ENTERPRISE
Following is a recommended outline of a Big Data adoption roadmap for an enterprise:
Architecture and Planning for Enterprise needs - create a governance framework, define Business and Technology blueprints, and define access points to Big Data.
Adopt Big Data through Execution and Integration - embed target-state Enterprise capabilities in the Business, integrate Big Data into the existing IMF (Information Management Framework), center on Business value, and operationalize Proofs of Concept.
Constant focus on Business value and Innovation - concentrate on innovative technology solutions, pursue continuous improvement and future readiness, and continue to build toward target-state Enterprise capabilities.

VII. CONCLUSION
During the last 35 years, data management principles such as physical and logical data independence, declarative querying, and cost-based optimization have grown into a multi-billion dollar industry. According to IBM, 80 percent of the planet's data is unstructured, and most businesses do not even attempt to use this data to their advantage. Once the technologies to examine big data reach their peak, it will become easier for companies to analyze enormous datasets, recognize patterns, and plan their moves advantageously based on the consumer requirements recognized through this data.

ACKNOWLEDGEMENT
I would like to thank our honorable Principal, Dr. R. P. Singh, and our Head of Department, Prof. D. D. Patil; my special thanks to my guide, Miss Lavina Panjwani, and sincere thanks to all the respected teaching faculty of the Department of Computer Science & Engineering of Hindi Seva Mandal's Shri Sant Gadge Baba College of Engineering & Technology, Bhusawal. My special thanks also to all the authors of the papers referred to here.

REFERENCES
[1] Sangeeta Bansal, Dr. Ajay Rana, "Transitioning from Relational Databases to Big Data", International Journal of Advanced Research in Computer Science and Software Engineering, Volume 4, Issue 1, January 2014.
[2] www.tutorialspoint.com, Hadoop tutorial.
[3] "From Relational Database Management to Big Data: Solutions for Data Migration Testing".
[4] A Nevins Partners, "Why is BIG Data Important?", White Paper, May 2012.
[5] www.hadoop.apache.org