Data Mining—Why is it Important?
Data mining starts with the client. Clients naturally collect data simply by
doing business, so that is where the entire process begins. But Customer
Relationship Management (CRM) data is only one part of the puzzle. The
other part of the equation is competitive data, industry survey data, blogs,
and social media conversations. By themselves, CRM data and survey
data can provide very good information, but when combined with the other
available data they become far more powerful.
Data Mining is the process of analyzing and exploring that data to discover
patterns and trends.
The term Data Mining is used frequently in the research world, but it is
often misunderstood. People sometimes misuse the term to mean any kind
of data extraction or data processing. However, data mining is much more
than simple data analysis.
According to Doug Alexander at the University of Texas, data mining is,
“the computer-assisted process of digging through and analyzing enormous
sets of data and then extracting the meaning of the data. Data mining tools
predict behaviors and future trends, allowing businesses to make proactive,
knowledge-driven decisions. Data mining tools can answer business
questions that traditionally were too time consuming to resolve. They scour
databases for hidden patterns, finding predictive information that experts
may miss because it lies outside their expectations.”
Data mining consists of five major elements:
1) Extract, transform, and load transaction data onto the data warehouse
system.
2) Store and manage the data in a multidimensional database system.
3) Provide data access to business analysts and information technology
professionals.
4) Analyze the data by application software.
5) Present the data in a useful format, such as a graph or table.
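To make these five elements concrete, here is a minimal sketch in Python, using SQLite as a stand-in warehouse; the table and column names are invented for illustration only.

```python
# Minimal sketch of the five elements, using SQLite and pandas.
# All table and column names here are invented for illustration.
import sqlite3
import pandas as pd

# 1) Extract, transform, and load transaction data into the warehouse.
warehouse = sqlite3.connect("warehouse.db")
transactions = pd.DataFrame({
    "region":  ["North", "North", "West"],
    "month":   ["2024-01", "2024-02", "2024-01"],
    "revenue": [1200.0, 1350.0, 900.0],
})
# 2) Store and manage the data (a single relational table stands in
#    for a multidimensional database here).
transactions.to_sql("sales", warehouse, if_exists="replace", index=False)

# 3) Provide data access to analysts via ordinary queries.
df = pd.read_sql("SELECT region, month, revenue FROM sales", warehouse)

# 4) Analyze the data by application software (a simple aggregation).
trend = df.groupby("month")["revenue"].sum()

# 5) Present the data in a useful format, such as a table.
print(trend.to_string())
```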
This technique is a game changer in the world of statistical analysis and
business. It is important in this realm because it can make predictions that
older analysis techniques were simply not capable of making. This table,
adapted from thearling.com, may help illustrate the evolution and
differences of data analysis through the years:
Table 1. Steps in the Evolution of Data Mining (adapted from thearling.com)

Data Collection (1960s)
  Business question: "What was my total revenue in the last five years?"
  Enabling technologies: computers, tapes, disks
  Product providers: IBM, CDC
  Characteristics: retrospective, static data delivery

Data Access (1980s)
  Business question: "What were unit sales in New England last March?"
  Enabling technologies: relational databases (RDBMS), Structured Query Language (SQL), ODBC
  Product providers: Oracle, Sybase, Informix, IBM, Microsoft
  Characteristics: retrospective, dynamic data delivery at record level

Data Warehousing & Decision Support (1990s)
  Business question: "What were unit sales in New England last March? Drill down to Boston."
  Enabling technologies: on-line analytic processing (OLAP), multidimensional databases, data warehouses
  Product providers: Pilot, Comshare, Arbor, Cognos, Microstrategy
  Characteristics: retrospective, dynamic data delivery at multiple levels

Data Mining (Emerging Today)
  Business question: "What's likely to happen to Boston unit sales next month? Why?"
  Enabling technologies: advanced algorithms, multiprocessor computers, massive databases
  Product providers: Pilot, Lockheed, IBM, SGI, numerous startups (nascent industry)
  Characteristics: prospective, proactive information delivery
Data Mining can be used in many different sectors of business to both
predict and discover trends. It is a proactive solution for businesses looking
to gain a competitive edge. In the past, we were only able to analyze what
a company’s customers or clients HAD DONE, but now, with the help of
Data Mining, we can predict what clientele WILL DO.
With Data Mining, companies can make better and more effective business
decisions – in marketing, advertising, and more – decisions that will help
these companies grow.
For more information about how Data Mining can help discover trends and
patterns in your market, contact the market research specialists at The
Research Group by calling 410-332-0400.
Qualitative market research utilizes the disciplines of psychology and
sociology to garner emotive insights that drive behavior, and importantly
influence decisions. The Research Group’s team of seasoned researchers
will assist you in turning those insights into opportunities.
3 Reasons Why Data Mining is
(almost) Dead
Data Mining (sometimes called data or knowledge discovery) is the
process of analyzing data from different perspectives and
summarizing it into useful information. As the term suggests, the data
is mined or queried for insight. For example, retailers use data mining
techniques to do basket analysis (customers who bought this also
bought that) and to further understand what other factors influence a
purchase.
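As a toy illustration of basket analysis, the following Python sketch counts how often pairs of items appear in the same basket; the baskets themselves are invented for illustration.

```python
# A minimal basket-analysis sketch: count how often pairs of items are
# bought together. The baskets below are invented example data.
from itertools import combinations
from collections import Counter

baskets = [
    {"bread", "milk", "butter"},
    {"bread", "milk"},
    {"milk", "cereal"},
    {"bread", "butter"},
]

pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# "Customers who bought this also bought that": the most frequent pairs.
for pair, count in pair_counts.most_common(3):
    print(pair, count)
```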
Traditionally, data mining has consisted of analysts generating
questions to feed to a database in the hope of finding an answer. This
could be something like asking the data belonging to a clothing
retailer, “Are customers buying Hawaiian shirts in Atlanta?” Sounds
very applicable, especially when it comes to the hype around Big
Data, doesn’t it?
Applicable, yes. Effective? Not so much.
Given today’s explosion of “Big Data,” companies need more
advanced methods for leveraging their data – methods that don’t rely
solely on tribal knowledge, personal experience or best guesses.
What’s needed are new technologies and purpose-built solutions that
reveal answers to questions no one even knew to ask.
That leads me to the three main reasons why traditional data mining
methods are going the way of the dodo:
1. The current volume of data is unprecedented. In fact, 15 of 17 sectors
in the U.S. have more data stored per company than the entire U.S.
Library of Congress. According to IDC, in 2015 an estimated 7.9
zettabytes of data will be produced and replicated – the equivalent of
18 million Libraries of Congress. With these massive data sets, it is
close to impossible to figure out what to query. The number of
queries explodes exponentially with the number of data elements
(see the sketch after this list). Should I query about customers buying
shirts in Atlanta? Or in summer? Or in summer with a Coke? Or with a
hot dog? The list is endless. As one of my customers said, “I do not
know what questions to ask. Therein is the limitation!” The breadth
and depth of this “big” data makes querying seem like trying to strike
oil while digging with a toothpick.
2. Added to volume is velocity of the data. The data is piling up faster
and faster. A company encounters a continuous stream of real-time
data – social media updates, customer feedback, sales figures,
financial data, supply chain data, product quality data, product
monitoring data and on and on and on. There’s simply not enough
time to manually query the data – it’s like a physician trying to
diagnose thousands of patients at the same time. The data must
constantly inform the end-user – i.e., diagnose itself and recommend a
treatment – for it to be of any strategic value.
3. As I’ve already discussed, conventional data mining techniques are
driven by the analyst – or group of people – tasked with coming up
with a hypothesis, which is subjective and vulnerable to personal bias
and human error. Given the amount of information that’s out there,
asking the right question every time is becoming more and more of a
challenge because even the smartest, most experienced analysts
“don’t know what they don’t know.” Querying methods are seriously
biased by what the analyst thinks to ask. Again, going back to the
striking oil analogy, if the analyst thinks there is oil under a certain
rock, that is the only place he will dig. He could be sitting on a gold
mine 50 feet away, but he’d completely miss it.
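To make the query-explosion point from reason 1 concrete, a few lines of Python show how quickly the space of possible queries grows; the attribute counts are arbitrary.

```python
# Why manual querying cannot keep up: with n yes/no attributes
# (city, season, co-purchased item, ...), there are 2**n possible
# customer segments one could think to query.
for n in (10, 20, 30):
    print(f"{n} attributes -> {2**n:,} possible segments")
# 10 attributes -> 1,024 possible segments
# 20 attributes -> 1,048,576 possible segments
# 30 attributes -> 1,073,741,824 possible segments
```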
Data mining is limited to manual endeavors – why limit company
success to antiquated methods that by design fail to leverage the data
for all it’s worth? It’s time to usher in new methods – new technologies
– for transforming the enterprise from reactive – based on
guesstimates, hunches, and flawed insight – to proactive – based on
data-driven, actionable insight.
CMMI
Maturity Level 1, called "Initial", is characterized by "Heroic Efforts". The
CMMI identifies
no Process Areas at this level. You automatically achieve this level if you
can design, develop, integrate, and test. Organizations at Maturity Level 1
are sometimes successful, and sometimes not.
Maturity Level 2, called "Managed", is characterized by "Basic Project
Management". The seven Process Areas at Maturity Level 2 (Requirements
Management, Project Planning, Project Monitoring and Control, Supplier
Agreement Management, Measurement and Analysis, Process and Product
Quality Assurance, and Configuration Management) all deal with
management, rather than technical, issues.
Maturity Level 3, called "Defined", is characterized by "Process
Standardization". This is
where the bulk of the Process Areas reside in the CMMI. We find that these
Process Areas fall into three main categories:
 Technical – The first five Process Areas (Requirements Development,
Technical Solution, Product Integration, Verification, and Validation) deal
with the technical engineering work.
 Process Management – The next three Process Areas (Organizational
Process Focus, Organizational Process Definition, and Organizational
Training) provide the infrastructure for maintaining and improving the
organization's processes.
 Management – The last six Process Areas (Integrated Product
Management, Risk Management, Integrated Teaming, Integrated
Supplier Management, Decision Analysis & Resolution, and
Organizational Environment for Integration) all build more management
discipline on top of the basic management Process Areas established at
Maturity Level 2.
Maturity Level 4, called "Quantitatively Managed", is characterized by
"Quantitative Management". With the disciplined processes established at
Maturity Levels 2 and 3, the organization is now in the position to be able to
gain a statistical, numbers-based understanding of its performance, and
use that understanding to "manage by fact". The two Process Areas at
Maturity Level 4 (Organizational Process Performance and Quantitative
Project Management) apply this capability for statistical management to
understand the quality of both the processes the organization uses and the
products it produces.
Maturity Level 5, called "Optimizing", is characterized by "Continuous
Process Improvement". Built on the disciplined processes of Maturity
Levels 2 and 3, and the quantitative understanding of Maturity Level 4, the
two Process Areas at Maturity Level 5 (Organizational Innovation &
Deployment and Causal Analysis & Resolution) put the organization on the
path of ever-improving performance by understanding and correcting the
root causes of problems, and by fostering an environment of innovation and
creativity.
Why Do People Believe the CMMI Has Little Value?
The CMM and CMMI have received a lot of bad press over the years. Most
of that bad press can be traced to one of two things: misunderstandings
and abuses.
Misunderstandings. Many people who open the CMMI book are
immediately overwhelmed by the volume of information: five Maturity
Levels, two Generic Goals, 12 Generic Practices, 25 Process Areas, 55
Specific Goals, 185 Specific Practices, hundreds of Sub-Practices—nearly
a thousand pages in all! It is hard to blame them for feeling that this model
must be way too restrictive to be applicable to a real-life organization.
Naturally, if your organization is not under a mandate to achieve a
Maturity Level rating, then the Practices, and even the Goals in the CMMI
take on more of a suggestive flavor. Of course, any organization would do
well to take them as exceedingly strong suggestions, given the CMMI’s
solid research basis!
Abuses. As we said at the beginning of this paper, the SEI designed the
CMMI to be a roadmap for process improvement. But what we have seen
in practice is organizations requiring their suppliers to achieve specific
Maturity Level ratings. This in turn causes those suppliers to turn to the
CMMI simply to achieve a rating, even if they have little or no interest in
process improvement.
When the CMMI is used by an organization that has no interest in process
improvement, its use can (and often does) become abuse. Processes are
written solely to satisfy a CMMI Appraiser, but with little or no thought for
how they will affect the organization's work. Paperwork grows seemingly
without bounds, and people feel that they are drowning in "process for
process' sake".
Those five steps seem easy enough. But organizational change actually
involves much more work than the simple mechanics of deciding to make a
change. The key players in the organization must all agree on the need for
change, as well as the strategy to be employed. Garnering the necessary
agreement and establishing momentum are major challenges in and of
themselves. But those are topics for another white paper.
How can CMMI help?
• CMMI provides a way to focus and manage hardware and software
development from product inception through deployment and
maintenance.
– ISO/TL9000 are still required. CMMI interfaces well with them.
CMMI and TL are complementary - both are needed since they
address different aspects.
• ISO/TL9000 is a process compliance standard
• CMMI is a process improvement model
• Behavioral changes are needed at both management and staff levels.
Examples:
– Increased personal accountability
– Tighter links between Product Management, Development, SCN,
etc.
• Initially a lot of investment required – but, if properly managed, we will
be more efficient and productive while turning out products with
consistently higher quality.
CMMI Models within the Framework
• Models:
– Systems Engineering + Software Engineering (SE/SW)
– Systems Engineering + Software Engineering + Integrated Product
and Process Development (IPPD)
– Systems Engineering + Software Engineering + Integrated Product
and Process Development + Supplier Sourcing (SS)
– Software Engineering only
• Representation options:
– Staged
– Continuous
• The CMMI definition of “Systems Engineering”: “The interdisciplinary
approach governing the total technical and managerial effort required to
transform a set of customer needs, expectations and constraints into a
product solution and to support that solution throughout the product’s
life.” This includes both hardware and software.
Maturity Level 1: Initial
• Maturity Level 1 deals with performed processes.
• Processes are unpredictable, poorly controlled, reactive.
• The process performance may not be stable and may not meet specific
objectives such as quality, cost, and schedule, but useful work can be
done.
Maturity Level 2 : Managed at the Project Level
• Maturity Level 2 deals with managed processes.
• A managed process is a performed process that is also:
– planned and executed in accordance with policy
– staffed by skilled people
– given adequate resources
– producing controlled outputs
– involving the relevant stakeholders
– reviewed and evaluated for adherence to requirements
• Processes are planned, documented, performed, monitored, and
controlled at the project level. Often reactive.
• The managed process comes closer to achieving the specific
objectives such as quality, cost, and schedule.
Maturity Level 3 : Defined at the Organization Level
• Maturity Level 3 deals with defined processes.
• A defined process is a managed process that:
– is well defined, understood, deployed, and executed across
the entire organization. Proactive.
– has its processes, standards, procedures, tools, etc. defined
at the organizational (Organization X) level. Project or
local tailoring is allowed; however, it must be based on the
organization’s set of standard processes and defined per
the organization’s tailoring guidelines.
• Major portions of the organization cannot “opt out.”
Behaviors at the Five Levels
CMMI Components
• Within each of the 5 Maturity Levels, there are basic functions that need
to be performed – these are called Process Areas (PAs).
• For Maturity Level 2 there are 7 Process Areas that must be completely
satisfied.
• For Maturity Level 3 there are 11 Process Areas that must be completely
satisfied.
• Given the interactions and overlap, it becomes more efficient to work the
Maturity Level 2 and 3 issues concurrently.
• Within each PA there are Goals to be achieved and within each Goal
there are Practices, work products, etc. to be followed that will support
each of the Goals.
CMMI Process Areas
Example
For the Requirements Management Process Area:
An example Goal (required):
“Manage Requirements”
An example Practice to support the Goal (required):
“Maintain bi-directional traceability of requirements”
Examples (suggested, but not required) of typical Work Products
might be:
– a requirements traceability matrix, or
– a requirements tracking system
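As a hypothetical illustration (not part of the CMMI itself), a traceability matrix can be as simple as a mapping that can be read in both directions; the requirement, design, and test identifiers below are invented.

```python
# A minimal requirements traceability matrix as a plain data structure.
# Requirement IDs, design elements, and test IDs are invented examples.
trace = {
    "REQ-001": {"design": ["DES-010"], "tests": ["TC-101", "TC-102"]},
    "REQ-002": {"design": ["DES-011"], "tests": []},
}

# Forward traceability: requirement -> verifying tests.
for req, links in trace.items():
    if not links["tests"]:
        print(f"{req} has no verifying test case")

# Backward traceability: test -> originating requirement(s).
def requirements_for(test_id):
    return [req for req, links in trace.items() if test_id in links["tests"]]

print(requirements_for("TC-101"))  # ['REQ-001']
```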
Yet another CMMI term: Institutionalization
• This is the most difficult part of CMMI implementation and
the portion where managers play the biggest role and have
the biggest impact
• Building and reinforcing a corporate culture that supports
methods, practices, and procedures so that they are the ongoing
way of doing business.
– Must be able to demonstrate institutionalization of all
CMMI process areas for all organizations, technologies,
etc.
• Required for all Process Areas
Scenario 1
ABC Pvt Ltd is a company with branches in Mumbai, Delhi,
Chennai, and Bangalore. The Sales Manager wants a quarterly
sales report. Each branch has a separate operational system.
Solution 1: ABC Pvt Ltd
 Extract sales information from each branch's database.
 Store the information in a common repository at a single site.
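A minimal sketch of this consolidation in Python with SQLite; the database file names and the sales table layout are invented, and each branch file is assumed to already hold a matching sales table.

```python
# Sketch of Solution 1: pull sales rows from each branch's database and
# load them into one common repository. File and column names are
# invented; each branch file is assumed to contain a table
# sales(branch, quarter, amount).
import sqlite3

branches = ["mumbai.db", "delhi.db", "chennai.db", "bangalore.db"]
repo = sqlite3.connect("head_office.db")
repo.execute("""CREATE TABLE IF NOT EXISTS sales
                (branch TEXT, quarter TEXT, amount REAL)""")

for db_file in branches:
    src = sqlite3.connect(db_file)
    rows = src.execute("SELECT branch, quarter, amount FROM sales")
    repo.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    src.close()
repo.commit()

# The quarterly report the Sales Manager asked for:
for row in repo.execute(
        "SELECT quarter, SUM(amount) FROM sales GROUP BY quarter"):
    print(row)
```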
Scenario 2
One Stop Shopping Super Market has a huge operational
database. Whenever executives want a report, the OLTP system
becomes slow and data entry operators have to wait.
Solution 2
 Extract the data needed for analysis from the operational database.
 Store it in a warehouse.
 Refresh the warehouse at regular intervals so that it contains
up-to-date information for analysis.
 The warehouse will contain data with a historical perspective.
Scenario 3
Cakes & Cookies is a small, new company. The President of the
company wants the company to grow and needs information so
that he can make correct decisions.
Solution 3
 Improve the quality of data before loading it into the
warehouse.
 Perform data cleaning and transformation before
loading the data.
 Use query analysis tools to support ad hoc queries.
What is a Data Warehouse?
Inmon's definition
A data warehouse is a
- subject-oriented,
- integrated,
- time-variant,
- nonvolatile
collection of data in support of management's
decision-making process.
Subject-oriented
 A data warehouse is organized around subjects such as
sales, product, and customer.
 It focuses on modeling and analysis of data for decision
makers.
 It excludes data not useful in the decision support process.
Integration
 A data warehouse is constructed by integrating multiple
heterogeneous sources.
 Data preprocessing is applied to ensure consistency of the
data in terms of:
– encoding structures
– measurement of attributes
– physical attributes of data
– naming conventions
– data type formats
Time-variant
 Provides information from a historical perspective (e.g., the past 5-10 years).
 Every key structure contains an element of time, either
implicitly or explicitly.
Nonvolatile
 Data once recorded cannot be updated.
 A data warehouse requires only two operations in
data accessing:
– initial loading of data
– access of data
Operational v/s Information System

Feature                  | Operational                         | Informational
Characteristics          | Operational processing              | Informational processing
Orientation              | Transaction                         | Analysis
User                     | Clerk, DBA, database professional   | Knowledge workers
Function                 | Day-to-day operation                | Decision support
Data                     | Current                             | Historical
View                     | Detailed, flat relational           | Summarized, multidimensional
DB design                | Application-oriented                | Subject-oriented
Unit of work             | Short, simple transaction           | Complex query
Access                   | Read/write                          | Mostly read
Focus                    | Data in                             | Information out
No. of records accessed  | Tens                                | Millions
Number of users          | Thousands                           | Hundreds
DB size                  | 100 MB to GB                        | 100 GB to TB
Priority                 | High performance, high availability | High flexibility, end-user autonomy
Metric                   | Transaction throughput              | Query throughput
Data Warehouse Architecture
 Data warehouse server
– almost always a relational DBMS, rarely flat files
 OLAP servers
– to support and operate on multidimensional data
structures
 Clients
– query and reporting tools
– analysis tools
– data mining tools
Data Warehouse Schema
 Star schema
 Fact constellation schema
 Snowflake schema
Star Schema
 A single, large, central fact table and one table for each
dimension.
 Every fact points to one tuple in each of the dimensions and
has additional attributes.
 Does not capture hierarchies directly.
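As a concrete sketch, the following Python snippet creates a star schema in SQLite; the table and column names are invented, loosely matching the sales example used later in these notes.

```python
# A star schema sketch in SQLite: one central fact table, one table per
# dimension. Table and column names are illustrative only.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_time    (time_id    INTEGER PRIMARY KEY, month TEXT, year INTEGER);
CREATE TABLE dim_region  (region_id  INTEGER PRIMARY KEY, name TEXT);

-- Every fact row points to one tuple in each dimension and carries
-- the additive measures (units sold, amount).
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    time_id    INTEGER REFERENCES dim_time(time_id),
    region_id  INTEGER REFERENCES dim_region(region_id),
    units_sold INTEGER,
    amount     REAL
);
""")
```

In the snowflake variant described next, a dimension table such as dim_product would itself be normalized further, for example into separate product and product-category tables.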
Snowflake Schema
 A variant of the star schema model.
 A single, large, central fact table and one or more tables
for each dimension.
 Dimension tables are normalized, i.e., dimension table
data is split into additional tables.
Fact Constellation
 Multiple fact tables share dimension tables.
 This schema can be viewed as a collection of stars and is
hence also called a galaxy schema or fact constellation.
 Sophisticated applications require such a schema.
Building a Data Warehouse
 Data selection
 Data preprocessing
– fill in missing values
– remove inconsistency
 Data transformation & integration
 Data loading
Data in the warehouse is stored in the form of fact tables and
dimension tables.
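A compact sketch of the preprocessing and loading steps using pandas; the raw rows and column names are invented for illustration.

```python
# Sketch of the build steps with pandas: select, clean, transform, load.
# The source rows and column names are invented example data.
import sqlite3
import pandas as pd

raw = pd.DataFrame({
    "product": ["Bread", "bread", "Milk", None],
    "units":   [10, 5, None, 3],
    "amount":  [100.0, 50.0, 80.0, 30.0],
})

# Data preprocessing: fill in missing values, remove inconsistency.
clean = raw.dropna(subset=["product"]).copy()   # drop unusable rows
clean["units"] = clean["units"].fillna(0)       # fill missing measures
clean["product"] = clean["product"].str.title() # 'bread' -> 'Bread'

# Data loading into a warehouse fact table.
warehouse = sqlite3.connect("warehouse.db")
clean.to_sql("fact_sales", warehouse, if_exists="append", index=False)
```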
Case Study
 Afco Foods & Beverages is a new company which produces
dairy,bread and meat products with production unit located
at Baroda.
 There products are sold in North,North West and Western
region of India.
 They have sales units at Mumbai, Pune , Ahemdabad ,Delhi
and Baroda.
 The President of the company wants sales information.
Sales Information
Sales Measures & Dimensions
 Measures – units sold, amount.
 Dimensions – Product, Time, Region.
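A small pandas sketch of analyzing these measures by dimension; the rows below are invented for illustration.

```python
# Sketch of the sales model: measures (units sold, amount) analyzed by
# the Product, Time, and Region dimensions. The rows are invented.
import pandas as pd

sales = pd.DataFrame({
    "product": ["Dairy", "Bread", "Dairy", "Meat"],
    "region":  ["North", "West", "North West", "North"],
    "quarter": ["Q1", "Q1", "Q2", "Q2"],
    "units":   [100, 80, 60, 40],
    "amount":  [5000, 3200, 3000, 4000],
})

# A simple multidimensional view: units sold by product and region.
cube = sales.pivot_table(index="product", columns="region",
                         values="units", aggfunc="sum", fill_value=0)
print(cube)
```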
Sales Data Warehouse Model
Online Analytical Processing (OLAP)
 OLAP enables analysts, managers, and executives to gain insight
into data through fast, consistent, interactive access to a wide
variety of possible views of information that has been
transformed from raw data to reflect the real dimensionality of
the enterprise as understood by the user.
OLAP Server
 An OLAP server is a high-capacity, multi-user data
manipulation engine specifically designed to support and
operate on multidimensional data structures.
 Available OLAP servers include:
– MOLAP servers
– ROLAP servers
– HOLAP servers
Data Warehousing includes
 Building the data warehouse
 Online analytical processing (OLAP)
 Presentation
Need for Data Warehousing
 Industry has a huge amount of operational data.
 Knowledge workers want to turn this data into useful
information.
 This information is used by them to support strategic
decision making.
 It is a platform for consolidated historical data for analysis.
 It stores data of good quality so that knowledge workers can
make correct decisions.
 From a business perspective, it
– is the latest marketing weapon;
– helps to keep customers by learning more about their
needs;
– is a valuable tool in today's competitive, fast-evolving world.
Data Warehousing Tools
 Data Warehouse
– SQL Server 2000 DTS
– Oracle 8i Warehouse Builder
 OLAP tools
– SQL Server Analysis Services
– Oracle Express Server
 Reporting tools
– MS Excel Pivot Chart
– VB Applications
• What is Crowdsourcing?
• How Crowdsourcing Works
• Types of Crowdsourcing
• Applications of Crowdsourcing
• Benefits & Problems of Crowdsourcing
• Video
WHAT IS CROWDSOURCING?
• Crowdsourcing is the process of getting work or funding,
usually online, from a crowd of people.
• The word Crowdsourcing is a combination of Crowd &
Outsourcing
• Definitions:
• Crowdsourcing is the act of outsourcing tasks, traditionally
performed by an employee or contractor, to an undefined,
large group of people or community (a "crowd"), through an
open call.
• Crowdsourcing is an online, distributed problem solving and
production model.
• The term crowdsourcing was first used by Jeff Howe in
2006 in an article for Wired magazine.
The Crowdsourcing Process in Eight Steps
1. Company has a problem
2. Company broadcasts the problem online
3. The online "crowd" is asked to give solutions
4. Crowd submits solutions
5. Crowd vets solutions
6. Company rewards winning solvers
7. Company owns winning solutions
8. Company profits
TYPES OF CROWDSOURCING
• Crowd funding
• The wisdom of the crowd
• Crowdsourcing creative work
• Microwork
CROWD FUNDING
• Crowd funding describes the collective effort of individuals
who network and pool their money, usually via the Internet,
to support efforts initiated by other people or organizations.
This includes disaster relief, startup company funding, free
software development, scientific research and many more.
THE WISDOM OF THE CROWD
• The wisdom of the crowd is the process of taking into
account the collective opinion of a group of individuals rather
than a single expert to answer a question.
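A tiny Python sketch of the idea: aggregate many independent guesses rather than one expert's answer; the guesses here are invented.

```python
# Wisdom-of-the-crowd sketch: combine many independent estimates
# instead of relying on a single expert. The guesses are invented.
import statistics

guesses = [950, 1100, 1020, 875, 1300, 990, 1080]
print(statistics.mean(guesses))    # crowd estimate: 1045.0
print(statistics.median(guesses))  # 1020; robust to wild outliers
```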
CROWDSOURCING CREATIVE WORK
• Creative crowdsourcing spans sourcing creative projects
such as graphic design, architecture, apparel design, writing,
illustration etc.
MICROWORK
• Microwork is a series of small tasks which together comprise
a large unified project, and are completed by many people
over the Internet. Microwork is considered the smallest unit
of work in a virtual assembly line. It is often used where
human intelligence is required to complete the task efficiently.
APPLICATIONS OF CROWDSOURCING
• Testing & refining a product
 Netflix
 SellaBand
• Market research
 Threadless
• Knowledge management
 Accenture
 Wikipedia
• Customer service
 My Starbucks Idea
• R&D
 InnoCentive
 P&G Connect & Develop
• Polling and voting
 InTrade
• Building a new city
The History / Genesis of Crowdsourcing
1714 - Marine pocket clock invented
1936 - Toyota holds a logo contest
1955 - Sydney Opera House architecture contest
2001 - Wikipedia launched
2002 - American Idol season 1
2005 - YouTube launched
2006 - "Crowdsourcing" term coined
BENEFITS OF CROWDSOURCING
• Problems can be explored at comparatively little cost.
• Payment is by results.
• The organization can tap a wider range of talent than might
be present in its own organization
• Turn customers into designers
• Turn customers into marketers
PROBLEMS WITH CROWDSOURCING
• Quality
• Intellectual property leakage
• No time constraint
• Not much control over development or ultimate product
• Ill-will with own employees
• Choosing what to crowdsource & what to keep in-house
Benefits of Refactoring
The Summary: Refactoring is a huge aid in untangling production code
without breaking it, and in improving its long-term maintainability.
Refactoring helps you achieve:
1. self-documenting code, for better readability and maintainability,
which is pretty much the only kind of code documentation that ever seems
to stay current (Extract Method and Introduce Local allow you to create
function and variable names that are descriptive enough to rarely need
comments). Until you experience readable, self-describing code, you don't
know what you're missing.
2. fine-grained encapsulation, for easier debugging and code
reuse: Extract Method automatically determines the parameters needed to
create a method from the current selection, and handles them
correctly. You then know exactly what external information the selected
block requires in order to operate. This can be a great aid in untangling
complex code during code reviews or debugging.
3. the generalization of existing code, to make it easier to apply
existing code to a broader range of problems: as you Extract Method,
you can easily replace things like hard-coded constants (perhaps a
connection string, or a table name) with parameters, thus allowing the
application of proven code to new contexts.
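A small before/after illustration of Extract Method, sketched in Python rather than any particular refactoring tool; the function and field names are invented.

```python
# Before: a computation buried inside a longer function, explained
# only by a comment.
def report(orders):
    total = 0
    # compute total for shipped orders
    for o in orders:
        if o["shipped"]:
            total += o["price"] * o["qty"]
    print(f"Shipped revenue: {total}")

# After Extract Method: the block becomes a function whose name
# replaces the comment, and whose parameter list shows exactly what
# external information the block depends on.
def shipped_revenue(orders):
    return sum(o["price"] * o["qty"] for o in orders if o["shipped"])

def report_refactored(orders):
    print(f"Shipped revenue: {shipped_revenue(orders)}")

orders = [{"shipped": True,  "price": 10.0, "qty": 2},
          {"shipped": False, "price": 99.0, "qty": 1}]
report_refactored(orders)  # Shipped revenue: 20.0
```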
Understandability
More straightforward and well organized (factored) code is easier to
understand.
Correctness
It's easier to identify defects by inspection in code that's easier to
understand. Overly complex, poorly structured, Rube Goldberg style code
is much more difficult to inspect for defects. Additionally, well
componentized code with high coherency of components and loose
coupling between components is vastly easier to put under test. Moreover,
smaller, well-formed bits under test makes for less overlap in code
coverage between test cases which makes for faster and more trustworthy
tests (which becomes a self-reinforcing cycle driving toward better and
better tests). As well, more straightforward code tends to be more
predictable and reliable.
Ease of Maintenance and Evolution
Well-factored, high quality, easy to understand common components are
easier to use, extend, and maintain. Many changes to the system are now
easier to make because they have smaller impact and it's more obvious
how to make the appropriate changes.
Refactoring code does have merit on its own just in terms of code quality
and correctness issues, but where refactoring pays off the most is in
maintenance and evolution of the design of the software. Often a good
tactic when adding new features to old, poorly factored code is to refactor
the target code then add the new feature. This often will take less
development effort than trying to add the new feature without refactoring
and it's a handy way to improve the quality of the code base without
undertaking a lot of "pie in the sky" hypothetical advantage refactoring /
redesign work that's hard to justify to management.
Cloud computing
 Definitions of Cloud computing
 Architecture of Cloud computing
 Benefits of Cloud computing
 Opportunities of Cloud Computing
 Cloud computing – Google Apps
 Grid computing vs Cloud computing
Definitions
 Cloud computing is using the internet to access someone else's
software running on someone else's hardware in someone else's
data center. Lewis Cunningham[2]
 A large-scale distributed computing paradigm that is driven by
economies of scale, in which a pool of abstracted, virtualized,
dynamically scalable, managed computing power, storage,
platforms, and services are delivered on demand to external
customers over the Internet. Ian Foster[9]
 A Cloud is a type of parallel and distributed system consisting of a
collection of interconnected and virtualized computers that are
dynamically provisioned and presented as one or more unified
computing resources based on service-level agreements established
through negotiation between the service provider and consumers.
Rajkumar Buyya[10]
Architecture of Cloud computing
Essential Characteristics[7]
 On-demand self-service.
 A consumer can unilaterally provision computing capabilities such
as server time and network storage as needed automatically,
without requiring human interaction with a service provider.
 Broad network access.
 Capabilities are available over the network and accessed through
standard mechanisms that promote use by heterogeneous thin or
thick client platforms (e.g., mobile phones, laptops, and PDAs) as
well as other traditional or cloud-based software services.
 Resource pooling.
 The provider’s computing resources are pooled to serve multiple
consumers using a multi-tenant model, with different physical and
virtual resources dynamically assigned and reassigned according
to consumer demand.
 Rapid elasticity.
 Capabilities can be rapidly and elastically provisioned - in some
cases automatically - to quickly scale out; and rapidly released to
quickly scale in.
 To the consumer, the capabilities available for provisioning often
appear to be unlimited and can be purchased in any quantity at
any time.
 Measured service.
 Cloud systems automatically control and optimize resource usage
by leveraging a metering capability at some level of abstraction
appropriate to the type of service.
 Resource usage can be monitored, controlled, and reported,
providing transparency for both the provider and the consumer
of the service.
Cloud Service Models
SPI Model
 Cloud Software as a Service (SaaS)
 Cloud Platform as a Service (PaaS)
 Cloud Infrastructure as a Service (IaaS)
Infrastructure as a Service (IaaS)
 The capability provided to the consumer is to provision processing,
storage, networks, and other fundamental computing resources.
 Consumer is able to deploy and run arbitrary software, which can
include operating systems and applications.
 The consumer does not manage or control the underlying cloud
infrastructure but has control over operating systems, storage,
deployed applications, and possibly limited control of select
networking components (e.g., host firewalls).
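Platform as a Service (PaaS)
 The capability provided to the consumer is to deploy onto the cloud
infrastructure consumer-created or acquired applications created using
programming languages and tools supported by the provider.
 The consumer does not manage or control the underlying cloud
infrastructure including network, servers, operating systems, or storage,
but has control over the deployed applications and possibly application
hosting environment configurations.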
Software as a Service (SaaS)
 The capability provided to the consumer is to use the provider’s
applications running on a cloud infrastructure.
 The applications are accessible from various client devices through a
thin client interface such as a web browser (e.g., web-based email).
 The consumer does not manage or control the underlying cloud
infrastructure including network, servers, operating systems, storage,
or even individual application capabilities, with the possible exception
of limited user-specific application configuration settings.
Cloud Deployment Models
 Public Cloud.
 Private Cloud.
 Community Cloud.
 Hybrid Cloud.
Public Cloud
 The cloud infrastructure is made available to the general public or a
large industry group and is owned by an organization selling cloud
services.
Private Cloud
 The cloud infrastructure is operated solely for a single organization. It
may be managed by the organization or a third party, and may exist
on-premises or off-premises.
Community Cloud
 The cloud infrastructure is shared by several organizations and
supports a specific community that has shared concerns (e.g.,
mission, security requirements, policy, or compliance considerations).
It may be managed by the organizations or a third party and may
exist on-premises or off-premises.
Hybrid Cloud
 The cloud infrastructure is a composition of two or more clouds
(private, community, or public) that remain unique entities but are
bound together by standardized or proprietary technology that
enables data and application portability (e.g., cloud bursting for
load-balancing between clouds).
Benefits of Cloud Computing
Business Benefits
 Almost zero upfront infrastructure investment
 Just-in-time infrastructure
 More efficient resource utilization
 Usage-based costing
 Reduced time to market
Technical Benefits
 Automation – "scriptable infrastructure"
 Auto-scaling
 Proactive scaling
 More efficient development lifecycle
 Improved testability
 Disaster recovery and business continuity
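As a toy illustration of the auto-scaling idea (not any particular provider's API), a scaling decision can be as simple as a threshold rule; the thresholds below are invented.

```python
# A toy autoscaling rule of the kind cloud platforms automate: adjust
# the number of servers based on observed load. Thresholds are invented.
def desired_servers(current, cpu_percent):
    if cpu_percent > 80:              # scale out under high load
        return current + 1
    if cpu_percent < 20 and current > 1:
        return current - 1            # scale in and pay for less
    return current

print(desired_servers(4, 92))  # 5
print(desired_servers(4, 10))  # 3
```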
Opportunities of Cloud Computing
 End consumers.
 Business customers.
 Developers and Independent Software Vendors (ISVs).
Google App Engine
 Google App Engine enables you to build web applications on the
same scalable systems that power Google applications. App Engine
applications are easy to build, easy to maintain, and easy to scale as
your traffic and data storage needs grow.
 Cost?
– Pay only for what you actually use, and only once you exceed the
free quota of 500 MB of storage and around 5 million page views
per month.
How to Create Applications for Cloud Computing
 Build an App Engine application using standard Java web
technologies, such as servlets and JSP.
 Create an App Engine Java project with Eclipse, using the Google
Plugin for Eclipse for App Engine development.
 Use the App Engine datastore with the Java Data Objects (JDO)
standard interface.
 Upload your app to App Engine.
Grid computing vs Cloud computing
Cloud
 Increase computing.
 Increase storage.
 Consumption basis (pay per use).
 IBM, Google, Microsoft.
 Billed by hour, storage, views, etc.
Grid
 Increase computing.
 Increase storage.
 Project-oriented.
 Academia or government labs.
 Billed in number of service units.
[Layered architecture, from the grid vs. cloud comparison:]
– Collective: interactions across collections of resources, directory services.
– Platform: a collection of specialized tools, middleware, and services on top of
the unified resources to provide a development and/or deployment platform.
– Unified Resources: resources that have been abstracted/encapsulated.
– Resource: discovery, negotiation, monitoring, accounting, and payment of
sharing operations on individual resources.
– Connectivity: communication and authentication protocols.
Application
 Grid computing emerged in eScience to solve scientific problems
requiring HPC.
 Cloud computing is rather oriented towards applications that run
permanently and have varying demand for physical resources while
running, e.g., the well-known CRM SaaS Salesforce.com.