Data Mining—Why is it Important?
Data mining starts with the client. Clients naturally collect data simply by
doing business, so that is where the entire process begins. But Customer
Relationship Management (CRM) data is only one part of the puzzle. The
other part of the equation is competitive data, industry survey data, blogs,
and social media conversations. By themselves, CRM data and survey
data can provide very good information, but when combined with the other
available data they become far more powerful.
Data Mining is the process of analyzing and exploring that data to discover
patterns and trends.
The term Data Mining is used frequently in the research world, but it is
often misunderstood. People sometimes misuse the term to mean any kind
of data extraction or data processing. However, data mining is much more
than simple data analysis.
According to Doug Alexander at the University of Texas, data mining is,
“the computer-assisted process of digging through and analyzing enormous
sets of data and then extracting the meaning of the data. Data mining tools
predict behaviors and future trends, allowing businesses to make proactive,
knowledge-driven decisions. Data mining tools can answer business
questions that traditionally were too time consuming to resolve. They scour
databases for hidden patterns, finding predictive information that experts
may miss because it lies outside their expectations.”
Data mining consists of five major elements:
1) Extract, transform, and load transaction data onto the data warehouse
system.
2) Store and manage the data in a multidimensional database system.
3) Provide data access to business analysts and information technology
professionals.
4) Analyze the data by application software.
5) Present the data in a useful format, such as a graph or table.
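To make these five elements concrete, here is a minimal sketch in Python, using SQLite as a stand-in warehouse; the table and column names are invented for illustration only.

```python
# Minimal sketch of the five elements, using SQLite and pandas.
# All table and column names here are invented for illustration.
import sqlite3
import pandas as pd

# 1) Extract, transform, and load transaction data into the warehouse.
warehouse = sqlite3.connect("warehouse.db")
transactions = pd.DataFrame({
    "region":  ["North", "North", "West"],
    "month":   ["2024-01", "2024-02", "2024-01"],
    "revenue": [1200.0, 1350.0, 900.0],
})
# 2) Store and manage the data (a single relational table stands in
#    for a multidimensional database here).
transactions.to_sql("sales", warehouse, if_exists="replace", index=False)

# 3) Provide data access to analysts via ordinary queries.
df = pd.read_sql("SELECT region, month, revenue FROM sales", warehouse)

# 4) Analyze the data by application software (a simple aggregation).
trend = df.groupby("month")["revenue"].sum()

# 5) Present the data in a useful format, such as a table.
print(trend.to_string())
```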
This technique is a game changer in the world of statistical analysis and
business. It is important in this realm because it can make predictions that
older analysis techniques were simply not capable of making. This table,
adapted from thearling.com, may help illustrate the evolution and
differences of data analysis through the years:
Table 1. Steps in the Evolution of Data Mining (adapted from thearling.com)

Data Collection (1960s)
  Business question: "What was my total revenue in the last five years?"
  Enabling technologies: computers, tapes, disks
  Product providers: IBM, CDC
  Characteristics: retrospective, static data delivery

Data Access (1980s)
  Business question: "What were unit sales in New England last March?"
  Enabling technologies: relational databases (RDBMS), Structured Query Language (SQL), ODBC
  Product providers: Oracle, Sybase, Informix, IBM, Microsoft
  Characteristics: retrospective, dynamic data delivery at record level

Data Warehousing & Decision Support (1990s)
  Business question: "What were unit sales in New England last March? Drill down to Boston."
  Enabling technologies: on-line analytic processing (OLAP), multidimensional databases, data warehouses
  Product providers: Pilot, Comshare, Arbor, Cognos, Microstrategy
  Characteristics: retrospective, dynamic data delivery at multiple levels

Data Mining (Emerging Today)
  Business question: "What's likely to happen to Boston unit sales next month? Why?"
  Enabling technologies: advanced algorithms, multiprocessor computers, massive databases
  Product providers: Pilot, Lockheed, IBM, SGI, numerous startups (nascent industry)
  Characteristics: prospective, proactive information delivery
Data Mining can be used in many different sectors of business to both
predict and discover trends. It is a proactive solution for businesses looking
to gain a competitive edge. In the past, we were only able to analyze what
a company’s customers or clients HAD DONE, but now, with the help of
Data Mining, we can predict what clientele WILL DO.
With Data Mining, companies can make better and more effective business
decisions – in marketing, advertising, and more – decisions that will help
these companies grow.
For more information about how Data Mining can help discover trends and
patterns in your market, contact the market research specialists at The
Research Group by calling 410-332-0400.
Qualitative market research utilizes the disciplines of psychology and
sociology to garner emotive insights that drive behavior, and importantly
influence decisions. The Research Group’s team of seasoned researchers
will assist you in turning those insights into opportunities.
3 Reasons Why Data Mining is
(almost) Dead
Data Mining (sometimes called data or knowledge discovery) is the
process of analyzing data from different perspectives and
summarizing it into useful information. As the term suggests, the data
is mined or queried for insight. For example, retailers use data mining
techniques to do basket analysis (customers who bought this also
bought that) and to further understand what other factors influence a
purchase.
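As a toy illustration of basket analysis, the following Python sketch counts how often pairs of items appear in the same basket; the baskets themselves are invented for illustration.

```python
# A minimal basket-analysis sketch: count how often pairs of items are
# bought together. The baskets below are invented example data.
from itertools import combinations
from collections import Counter

baskets = [
    {"bread", "milk", "butter"},
    {"bread", "milk"},
    {"milk", "cereal"},
    {"bread", "butter"},
]

pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# "Customers who bought this also bought that": the most frequent pairs.
for pair, count in pair_counts.most_common(3):
    print(pair, count)
```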
Traditionally, data mining has consisted of analysts generating
questions to feed to a database in the hope of finding an answer. This
could be something like asking the data belonging to a clothing
retailer, “Are customers buying Hawaiian shirts in Atlanta?” Sounds
very applicable, especially when it comes to the hype around Big
Data, doesn’t it?
Applicable, yes. Effective? Not so much.
Given today’s explosion of “Big Data,” companies need more
advanced methods for leveraging their data – methods that don’t rely
solely on tribal knowledge, personal experience or best guesses.
What’s needed are new technologies and purpose-built solutions that
reveal answers to questions no one even knew to ask.
That leads me to the three main reasons why traditional data mining
methods are going the way of the dodo:
1. The current volume of data is unprecedented. In fact, 15 of 17 sectors
in the U.S. have more data stored per company than the entire U.S.
Library of Congress. According to IDC, in 2015 an estimated 7.9
zettabytes of data will be produced and replicated – the equivalent of
18 million Libraries of Congress. With these massive data sets, it is
close to impossible to figure out what to query. The number of
queries explodes exponentially with the number of data elements
(see the sketch after this list). Should I query about customers buying
shirts in Atlanta? Or in summer? Or in summer with a Coke? Or with a
hot dog? The list is endless. As one of my customers said, “I do not
know what questions to ask. Therein is the limitation!” The breadth
and depth of this “big” data makes querying seem like trying to strike
oil while digging with a toothpick.
2. Added to volume is velocity of the data. The data is piling up faster
and faster. A company encounters a continuous stream of real-time
data – social media updates, customer feedback, sales figures,
financial data, supply chain data, product quality data, product
monitoring data and on and on and on. There’s simply not enough
time to manually query the data – it’s like a physician trying to
diagnose thousands of patients at the same time. The data must
constantly inform the end-user – i.e., diagnose itself and recommend a
treatment – for it to be of any strategic value.
3. As I’ve already discussed, conventional data mining techniques are
driven by the analyst – or group of people – tasked with coming up
with a hypothesis, which is subjective and vulnerable to personal bias
and human error. Given the amount of information that’s out there,
asking the right question every time is becoming more and more of a
challenge because even the smartest, most experienced analysts
“don’t know what they don’t know.” Querying methods are seriously
biased by what the analyst thinks to ask. Again, going back to the
striking oil analogy, if the analyst thinks there is oil under a certain
rock, that is the only place he will dig. He could be sitting on a gold
mine 50 feet away, but he’d completely miss it.
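To make the query-explosion point from reason 1 concrete, a few lines of Python show how quickly the space of possible queries grows; the attribute counts are arbitrary.

```python
# Why manual querying cannot keep up: with n yes/no attributes
# (city, season, co-purchased item, ...), there are 2**n possible
# customer segments one could think to query.
for n in (10, 20, 30):
    print(f"{n} attributes -> {2**n:,} possible segments")
# 10 attributes -> 1,024 possible segments
# 20 attributes -> 1,048,576 possible segments
# 30 attributes -> 1,073,741,824 possible segments
```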
Data mining is limited to manual endeavors – why limit company
success to antiquated methods that by design fail to leverage the data
for all it’s worth? It’s time to usher in new methods – new technologies
– for transforming the enterprise from reactive – based on
guesstimates, hunches, and flawed insight – to proactive – based on
data-driven, actionable insight.
CMMI
Maturity Level 1, called "Initial", is characterized by "Heroic Efforts". The
CMMI identifies
no Process Areas at this level. You automatically achieve this level if you
can design, develop, integrate, and test. Organizations at Maturity Level 1
are sometimes successful, and sometimes not.
Maturity Level 2, called "Managed", is characterized by "Basic Project
Management". The seven Process Areas at Maturity Level 2 (Requirements
Management, Project Planning, Project Monitoring and Control, Supplier
Agreement Management, Measurement and Analysis, Process and Product
Quality Assurance, and Configuration Management) all deal with
management, rather than technical, issues.
Maturity Level 3, called "Defined", is characterized by "Process
Standardization". This is
where the bulk of the Process Areas reside in the CMMI. We find that these
Process Areas fall into three main categories:
 Technical – The first five Process Areas (Requirements Development,
Technical Solution, Product Integration, Verification, and Validation) deal
with the technical engineering work.
 Process Management – The next three Process Areas (Organizational
Process Focus, Organizational Process Definition, and Organizational
Training) provide the infrastructure for maintaining and improving the
organization's processes.
 Management – The last six Process Areas (Integrated Product
Management, Risk Management, Integrated Teaming, Integrated
Supplier Management, Decision Analysis & Resolution, and
Organizational Environment for Integration) all build more management
discipline on top of the basic management Process Areas established at
Maturity Level 2.
Maturity Level 4, called "Quantitatively Managed", is characterized by
"Quantitative Management". With the disciplined processes established at
Maturity Levels 2 and 3, the organization is now in the position to be able to
gain a statistical, numbers-based understanding of its performance, and
use that understanding to "manage by fact". The two Process Areas at
Maturity Level 4 (Organizational Process Performance and Quantitative
Project Management) apply this capability for statistical management to
understand the quality of both the processes the organization uses and the
products it produces.
Maturity Level 5, called "Optimizing", is characterized by "Continuous
Process Improvement". Built on the disciplined processes of Maturity
Levels 2 and 3, and the quantitative understanding of Maturity Level 4, the
two Process Areas at Maturity Level 5 (Organizational Innovation &
Deployment and Causal Analysis & Resolution) put the organization on the
path of ever-improving performance by understanding and correcting the
root causes of problems, and by fostering an environment of innovation and
creativity.
Why Do People Believe the CMMI Has Little Value?
The CMM and CMMI have received a lot of bad press over the years. Most
of that bad press can be traced to one of two things: misunderstandings
and abuses.
Misunderstandings. Many people who open the CMMI book are
immediately overwhelmed by the volume of information: five Maturity
Levels, two Generic Goals, 12 Generic Practices, 25 Process Areas, 55
Specific Goals, 185 Specific Practices, hundreds of Sub-Practices—nearly
a thousand pages in all! It is hard to blame them for feeling that this model
must be way too restrictive to be applicable to a real-life organization.
Naturally, if your organization is not under a mandate to achieve a
Maturity Level rating, then the Practices, and even the Goals in the CMMI
take on more of a suggestive flavor. Of course, any organization would do
well to take them as exceedingly strong suggestions, given the CMMI’s
solid research basis!
Abuses. As we said at the beginning of this paper, the SEI designed the
CMMI to be a roadmap for process improvement. But what we have seen
in practice is organizations requiring their suppliers to achieve specific
Maturity Level ratings. This in turn causes those suppliers to turn to the
CMMI simply to achieve a rating, even if they have little or no interest in
process improvement.
When the CMMI is used by an organization that has no interest in process
improvement, its use can (and often does) become abuse. Processes are
written solely to satisfy a CMMI Appraiser, but with little or no thought for
how they will affect the organization's work. Paperwork grows seemingly
without bounds, and people feel that they are drowning in "process for
process' sake".
Those five steps seem easy enough. But organizational change actually
involves much more work than the simple mechanics of deciding to make a
change. The key players in the organization must all agree on the need for
change, as well as the strategy to be employed. Garnering the necessary
agreement and establishing momentum are major challenges in and of
themselves. But those are topics for another white paper.
How can CMMI help?
• CMMI provides a way to focus and manage hardware and software
development from product inception through deployment and
maintenance.
– ISO/TL9000 are still required. CMMI interfaces well with them.
CMMI and TL are complementary - both are needed since they
address different aspects.
• ISO/TL9000 is a process compliance standard
• CMMI is a process improvement model
• Behavioral changes are needed at both management and staff levels.
Examples:
– Increased personal accountability
– Tighter links between Product Management, Development, SCN,
etc.
• Initially a lot of investment required – but, if properly managed, we will
be more efficient and productive while turning out products with
consistently higher quality.
CMMI Models within the Framework
• Models:
– Systems Engineering + Software Engineering (SE/SW)
– Systems Engineering + Software Engineering + Integrated Product
and Process Development (IPPD)
– Systems Engineering + Software Engineering + Integrated Product
and Process Development + Supplier Sourcing (SS)
– Software Engineering only
• Representation options:
– Staged
– Continuous
• The CMMI definition of “Systems Engineering”: “The interdisciplinary
approach governing the total technical and managerial effort required to
transform a set of customer needs, expectations and constraints into a
product solution and to support that solution throughout the product’s
life.” This includes both hardware and software.
Maturity Level 1: Initial
• Maturity Level 1 deals with performed processes.
• Processes are unpredictable, poorly controlled, reactive.
• The process performance may not be stable and may not meet specific
objectives such as quality, cost, and schedule, but useful work can be
done.
Maturity Level 2 : Managed at the Project Level
• Maturity Level 2 deals with managed processes.
• A managed process is a performed process that is also:
– planned and executed in accordance with policy
– staffed by skilled people
– given adequate resources
– producing controlled outputs
– involving the relevant stakeholders
– reviewed and evaluated for adherence to requirements
• Processes are planned, documented, performed, monitored, and
controlled at the project level. Often reactive.
• The managed process comes closer to achieving the specific
objectives such as quality, cost, and schedule.
Maturity Level 3 : Defined at the Organization Level
• Maturity Level 3 deals with defined processes.
• A defined process is a managed process that:
– is well defined, understood, deployed, and executed across
the entire organization. Proactive.
– has its processes, standards, procedures, tools, etc. defined
at the organizational (Organization X) level. Project or
local tailoring is allowed; however, it must be based on the
organization’s set of standard processes and defined per
the organization’s tailoring guidelines.
• Major portions of the organization cannot “opt out.”
Behaviors at the Five Levels
CMMI Components
• Within each of the 5 Maturity Levels, there are basic functions that need
to be performed – these are called Process Areas (PAs).
• For Maturity Level 2 there are 7 Process Areas that must be completely
satisfied.
• For Maturity Level 3 there are 11 Process Areas that must be completely
satisfied.
• Given the interactions and overlap, it becomes more efficient to work the
Maturity Level 2 and 3 issues concurrently.
• Within each PA there are Goals to be achieved and within each Goal
there are Practices, work products, etc. to be followed that will support
each of the Goals.
CMMI Process Areas
Example
For the Requirements Management Process Area:
An example Goal (required):
“Manage Requirements”
An example Practice to support the Goal (required):
“Maintain bi-directional traceability of requirements”
Examples (suggested, but not required) of typical Work Products
might be:
– a requirements traceability matrix, or
– a requirements tracking system
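As a hypothetical illustration (not part of the CMMI itself), a traceability matrix can be as simple as a mapping that can be read in both directions; the requirement, design, and test identifiers below are invented.

```python
# A minimal requirements traceability matrix as a plain data structure.
# Requirement IDs, design elements, and test IDs are invented examples.
trace = {
    "REQ-001": {"design": ["DES-010"], "tests": ["TC-101", "TC-102"]},
    "REQ-002": {"design": ["DES-011"], "tests": []},
}

# Forward traceability: requirement -> verifying tests.
for req, links in trace.items():
    if not links["tests"]:
        print(f"{req} has no verifying test case")

# Backward traceability: test -> originating requirement(s).
def requirements_for(test_id):
    return [req for req, links in trace.items() if test_id in links["tests"]]

print(requirements_for("TC-101"))  # ['REQ-001']
```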
Yet another CMMI term: Institutionalization
• This is the most difficult part of CMMI implementation and
the portion where managers play the biggest role and have
the biggest impact
• Building and reinforcing a corporate culture that supports
methods, practices, and procedures so that they are the ongoing
way of doing business.
– Must be able to demonstrate institutionalization of all
CMMI process areas for all organizations, technologies,
etc.
• Required for all Process Areas
Scenario 1
ABC Pvt Ltd is a company with branches in Mumbai, Delhi,
Chennai, and Bangalore. The Sales Manager wants a quarterly
sales report. Each branch has a separate operational system.
Solution 1: ABC Pvt Ltd
 Extract sales information from each branch's database.
 Store the information in a common repository at a single site.
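A minimal sketch of this consolidation in Python with SQLite; the database file names and the sales table layout are invented, and each branch file is assumed to already hold a matching sales table.

```python
# Sketch of Solution 1: pull sales rows from each branch's database and
# load them into one common repository. File and column names are
# invented; each branch file is assumed to contain a table
# sales(branch, quarter, amount).
import sqlite3

branches = ["mumbai.db", "delhi.db", "chennai.db", "bangalore.db"]
repo = sqlite3.connect("head_office.db")
repo.execute("""CREATE TABLE IF NOT EXISTS sales
                (branch TEXT, quarter TEXT, amount REAL)""")

for db_file in branches:
    src = sqlite3.connect(db_file)
    rows = src.execute("SELECT branch, quarter, amount FROM sales")
    repo.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    src.close()
repo.commit()

# The quarterly report the Sales Manager asked for:
for row in repo.execute(
        "SELECT quarter, SUM(amount) FROM sales GROUP BY quarter"):
    print(row)
```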
Scenario 2
One Stop Shopping Super Market has a huge operational
database. Whenever executives want a report, the OLTP system
becomes slow and data entry operators have to wait.
Solution 2
 Extract the data needed for analysis from the operational database.
 Store it in a warehouse.
 Refresh the warehouse at regular intervals so that it contains
up-to-date information for analysis.
 The warehouse will contain data with a historical perspective.
Scenario 3
Cakes & Cookies is a small, new company. The President of the
company wants the company to grow and needs information so
that he can make correct decisions.
Solution 3
 Improve the quality of data before loading it into the
warehouse.
 Perform data cleaning and transformation before
loading the data.
 Use query analysis tools to support ad hoc queries.
What is a Data Warehouse?
Inmon's definition
A data warehouse is a
- subject-oriented,
- integrated,
- time-variant,
- nonvolatile
collection of data in support of management's
decision-making process.
Subject-oriented
 A data warehouse is organized around subjects such as
sales, product, and customer.
 It focuses on modeling and analysis of data for decision
makers.
 It excludes data not useful in the decision support process.
Integration
 A data warehouse is constructed by integrating multiple
heterogeneous sources.
 Data preprocessing is applied to ensure consistency of the
data in terms of:
– encoding structures
– measurement of attributes
– physical attributes of data
– naming conventions
– data type formats
Time-variant
 Provides information from a historical perspective (e.g., the past 5-10 years).
 Every key structure contains an element of time, either
implicitly or explicitly.
Nonvolatile
 Data once recorded cannot be updated.
 A data warehouse requires only two operations in
data accessing:
– initial loading of data
– access of data
Operational v/s Information System

Feature                  | Operational                         | Informational
Characteristics          | Operational processing              | Informational processing
Orientation              | Transaction                         | Analysis
User                     | Clerk, DBA, database professional   | Knowledge workers
Function                 | Day-to-day operation                | Decision support
Data                     | Current                             | Historical
View                     | Detailed, flat relational           | Summarized, multidimensional
DB design                | Application-oriented                | Subject-oriented
Unit of work             | Short, simple transaction           | Complex query
Access                   | Read/write                          | Mostly read
Focus                    | Data in                             | Information out
No. of records accessed  | Tens                                | Millions
Number of users          | Thousands                           | Hundreds
DB size                  | 100 MB to GB                        | 100 GB to TB
Priority                 | High performance, high availability | High flexibility, end-user autonomy
Metric                   | Transaction throughput              | Query throughput
Data Warehouse Architecture
 Data warehouse server
– almost always a relational DBMS, rarely flat files
 OLAP servers
– to support and operate on multidimensional data
structures
 Clients
– query and reporting tools
– analysis tools
– data mining tools
Data Warehouse Schema
 Star schema
 Fact constellation schema
 Snowflake schema
Star Schema
 A single, large, central fact table and one table for each
dimension.
 Every fact points to one tuple in each of the dimensions and
has additional attributes.
 Does not capture hierarchies directly.
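As a concrete sketch, the following Python snippet creates a star schema in SQLite; the table and column names are invented, loosely matching the sales example used later in these notes.

```python
# A star schema sketch in SQLite: one central fact table, one table per
# dimension. Table and column names are illustrative only.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_time    (time_id    INTEGER PRIMARY KEY, month TEXT, year INTEGER);
CREATE TABLE dim_region  (region_id  INTEGER PRIMARY KEY, name TEXT);

-- Every fact row points to one tuple in each dimension and carries
-- the additive measures (units sold, amount).
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    time_id    INTEGER REFERENCES dim_time(time_id),
    region_id  INTEGER REFERENCES dim_region(region_id),
    units_sold INTEGER,
    amount     REAL
);
""")
```

In the snowflake variant described next, a dimension table such as dim_product would itself be normalized further, for example into separate product and product-category tables.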
Snowflake Schema
 A variant of the star schema model.
 A single, large, central fact table and one or more tables
for each dimension.
 Dimension tables are normalized, i.e., dimension table
data is split into additional tables.
Fact Constellation
 Multiple fact tables share dimension tables.
 This schema can be viewed as a collection of stars and is
hence also called a galaxy schema or fact constellation.
 Sophisticated applications require such a schema.
Building a Data Warehouse
 Data selection
 Data preprocessing
– fill in missing values
– remove inconsistency
 Data transformation & integration
 Data loading
Data in the warehouse is stored in the form of fact tables and
dimension tables.
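A compact sketch of the preprocessing and loading steps using pandas; the raw rows and column names are invented for illustration.

```python
# Sketch of the build steps with pandas: select, clean, transform, load.
# The source rows and column names are invented example data.
import sqlite3
import pandas as pd

raw = pd.DataFrame({
    "product": ["Bread", "bread", "Milk", None],
    "units":   [10, 5, None, 3],
    "amount":  [100.0, 50.0, 80.0, 30.0],
})

# Data preprocessing: fill in missing values, remove inconsistency.
clean = raw.dropna(subset=["product"]).copy()   # drop unusable rows
clean["units"] = clean["units"].fillna(0)       # fill missing measures
clean["product"] = clean["product"].str.title() # 'bread' -> 'Bread'

# Data loading into a warehouse fact table.
warehouse = sqlite3.connect("warehouse.db")
clean.to_sql("fact_sales", warehouse, if_exists="append", index=False)
```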
Case Study
 Afco Foods & Beverages is a new company which produces
dairy,bread and meat products with production unit located
at Baroda.
 There products are sold in North,North West and Western
region of India.
 They have sales units at Mumbai, Pune , Ahemdabad ,Delhi
and Baroda.
 The President of the company wants sales information.
Sales Information
Sales Measures & Dimensions
 Measures – units sold, amount.
 Dimensions – Product, Time, Region.
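A small pandas sketch of analyzing these measures by dimension; the rows below are invented for illustration.

```python
# Sketch of the sales model: measures (units sold, amount) analyzed by
# the Product, Time, and Region dimensions. The rows are invented.
import pandas as pd

sales = pd.DataFrame({
    "product": ["Dairy", "Bread", "Dairy", "Meat"],
    "region":  ["North", "West", "North West", "North"],
    "quarter": ["Q1", "Q1", "Q2", "Q2"],
    "units":   [100, 80, 60, 40],
    "amount":  [5000, 3200, 3000, 4000],
})

# A simple multidimensional view: units sold by product and region.
cube = sales.pivot_table(index="product", columns="region",
                         values="units", aggfunc="sum", fill_value=0)
print(cube)
```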
Sales Data Warehouse Model
Online Analytical Processing (OLAP)
 OLAP enables analysts, managers, and executives to gain insight
into data through fast, consistent, interactive access to a wide
variety of possible views of information that has been
transformed from raw data to reflect the real dimensionality of
the enterprise as understood by the user.
OLAP Server
 An OLAP server is a high-capacity, multi-user data
manipulation engine specifically designed to support and
operate on multidimensional data structures.
 Available OLAP servers include:
– MOLAP servers
– ROLAP servers
– HOLAP servers
Data Warehousing includes
 Building the data warehouse
 Online analytical processing (OLAP)
 Presentation
Need for Data Warehousing
 Industry has a huge amount of operational data.
 Knowledge workers want to turn this data into useful
information.
 This information is used by them to support strategic
decision making.
 It is a platform for consolidated historical data for analysis.
 It stores data of good quality so that knowledge workers can
make correct decisions.
 From a business perspective, it
– is the latest marketing weapon;
– helps to keep customers by learning more about their
needs;
– is a valuable tool in today's competitive, fast-evolving world.
Data Warehousing Tools
 Data Warehouse
– SQL Server 2000 DTS
– Oracle 8i Warehouse Builder
 OLAP tools
– SQL Server Analysis Services
– Oracle Express Server
 Reporting tools
– MS Excel Pivot Chart
– VB Applications
• What is Crowdsourcing?
• How Crowdsourcing Works
• Types of Crowdsourcing
• Applications of Crowdsourcing
• Benefits & Problems of Crowdsourcing
• Video
WHAT IS CROWDSOURCING?
• Crowdsourcing is the process of getting work or funding,
usually online, from a crowd of people.
• The word Crowdsourcing is a combination of Crowd &
Outsourcing
• Definitions:
• Crowdsourcing is the act of outsourcing tasks, traditionally
performed by an employee or contractor, to an undefined,
large group of people or community (a "crowd"), through an
open call.
• Crowdsourcing is an online, distributed problem solving and
production model.
• The term crowdsourcing was first used by Jeff Howe in
2006 in an article for Wired magazine.
The Crowdsourcing Process in Eight Steps
1. Company has a problem
2. Company broadcasts the problem online
3. The online "crowd" is asked to give solutions
4. Crowd submits solutions
5. Crowd vets solutions
6. Company rewards winning solvers
7. Company owns winning solutions
8. Company profits
TYPES OF CROWDSOURCING
• Crowd funding
• The wisdom of the crowd
• Crowdsourcing creative work
• Microwork
CROWD FUNDING
• Crowd funding describes the collective effort of individuals
who network and pool their money, usually via the Internet,
to support efforts initiated by other people or organizations.
This includes disaster relief, startup company funding, free
software development, scientific research and many more.
THE WISDOM OF THE CROWD
• The wisdom of the crowd is the process of taking into
account the collective opinion of a group of individuals rather
than a single expert to answer a question.
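A tiny Python sketch of the idea: aggregate many independent guesses rather than one expert's answer; the guesses here are invented.

```python
# Wisdom-of-the-crowd sketch: combine many independent estimates
# instead of relying on a single expert. The guesses are invented.
import statistics

guesses = [950, 1100, 1020, 875, 1300, 990, 1080]
print(statistics.mean(guesses))    # crowd estimate: 1045.0
print(statistics.median(guesses))  # 1020; robust to wild outliers
```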
CROWDSOURCING CREATIVE WORK
• Creative crowdsourcing spans sourcing creative projects
such as graphic design, architecture, apparel design, writing,
illustration etc.
MICROWORK
• Microwork is a series of small tasks which together comprise
a large unified project, and are completed by many people
over the Internet. Microwork is considered the smallest unit
of work in a virtual assembly line. It is often used where
human intelligence is required to complete the task efficiently.
APPLICATIONS OF CROWDSOURCING
• Testing & refining a product
 Netflix
 SellaBand
• Market research
 Threadless
• Knowledge management
 Accenture
 Wikipedia
• Customer service
 My Starbucks Idea
• R&D
 InnoCentive
 P&G Connect & Develop
• Polling and voting
 InTrade
• Building a new city
The History / Genesis of Crowdsourcing
1714 - Marine pocket clock invented
1936 - Toyota holds a logo contest
1955 - Sydney Opera House architecture contest
2001 - Wikipedia launched
2002 - American Idol season 1
2005 - YouTube launched
2006 - "Crowdsourcing" term coined
BENEFITS OF CROWDSOURCING
• Problems can be explored at comparatively little cost.
• Payment is by results.
• The organization can tap a wider range of talent than might
be present in its own organization
• Turn customers into designers
• Turn customers into marketers
PROBLEMS WITH CROWDSOURCING
• Quality
• Intellectual property leakage
• No time constraint
• Not much control over development or ultimate product
• Ill-will with own employees
• Choosing what to crowdsource & what to keep in-house
Benefits of Refactoring
The Summary: Refactoring is a huge aid in untangling production code
without breaking it, and in improving its long-term maintainability.
Refactoring helps you achieve:
1. self-documenting code, for better readability and maintainability,
which is pretty much the only kind of code documentation that ever seems
to stay current (Extract Method and Introduce Local allow you to create
function and variable names that are descriptive enough to rarely need
comments). Until you experience readable, self-describing code, you don't
know what you're missing.
2. fine-grained encapsulation, for easier debugging and code
reuse: Extract Method automatically determines the parameters needed to
create a method from the current selection, and handles them
correctly. You then know exactly what external information the selected
block requires in order to operate. This can be a great aid in untangling
complex code during code reviews or debugging.
3. the generalization of existing code, to make it easier to apply
existing code to a broader range of problems: as you Extract Method,
you can easily replace things like hard-coded constants (perhaps a
connection string, or a table name) with parameters, thus allowing the
application of proven code to new contexts.
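A small before/after illustration of Extract Method, sketched in Python rather than any particular refactoring tool; the function and field names are invented.

```python
# Before: a computation buried inside a longer function, explained
# only by a comment.
def report(orders):
    total = 0
    # compute total for shipped orders
    for o in orders:
        if o["shipped"]:
            total += o["price"] * o["qty"]
    print(f"Shipped revenue: {total}")

# After Extract Method: the block becomes a function whose name
# replaces the comment, and whose parameter list shows exactly what
# external information the block depends on.
def shipped_revenue(orders):
    return sum(o["price"] * o["qty"] for o in orders if o["shipped"])

def report_refactored(orders):
    print(f"Shipped revenue: {shipped_revenue(orders)}")

orders = [{"shipped": True,  "price": 10.0, "qty": 2},
          {"shipped": False, "price": 99.0, "qty": 1}]
report_refactored(orders)  # Shipped revenue: 20.0
```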
Understandability
More straightforward and well organized (factored) code is easier to
understand.
Correctness
It's easier to identify defects by inspection in code that's easier to
understand. Overly complex, poorly structured, Rube Goldberg style code
is much more difficult to inspect for defects. Additionally, well
componentized code with high coherency of components and loose
coupling between components is vastly easier to put under test. Moreover,
smaller, well-formed bits under test makes for less overlap in code
coverage between test cases which makes for faster and more trustworthy
tests (which becomes a self-reinforcing cycle driving toward better and
better tests). As well, more straightforward code tends to be more
predictable and reliable.
Ease of Maintenance and Evolution
Well-factored, high quality, easy to understand common components are
easier to use, extend, and maintain. Many changes to the system are now
easier to make because they have smaller impact and it's more obvious
how to make the appropriate changes.
Refactoring code does have merit on its own just in terms of code quality
and correctness issues, but where refactoring pays off the most is in
maintenance and evolution of the design of the software. Often a good
tactic when adding new features to old, poorly factored code is to refactor
the target code then add the new feature. This often will take less
development effort than trying to add the new feature without refactoring
and it's a handy way to improve the quality of the code base without
undertaking a lot of "pie in the sky" hypothetical advantage refactoring /
redesign work that's hard to justify to management.
Cloud computing
 Definitions of Cloud computing
 Architecture of Cloud computing
 Benefits of Cloud computing
 Opportunities of Cloud Computing
 Cloud computing – Google Apps
 Grid computing vs Cloud computing
Definitions
 Cloud computing is using the internet to access someone else's
software running on someone else's hardware in someone else's
data center. Lewis Cunningham[2]
 A large-scale distributed computing paradigm that is driven by
economies of scale, in which a pool of abstracted, virtualized,
dynamically scalable, managed computing power, storage,
platforms, and services are delivered on demand to external
customers over the Internet. Ian Foster[9]
 A Cloud is a type of parallel and distributed system consisting of a
collection of interconnected and virtualized computers that are
dynamically provisioned and presented as one or more unified
computing resources based on service-level agreements established
through negotiation between the service provider and consumers.
Rajkumar Buyya[10]
Architecture of Cloud computing
Essential Characteristics[7]
 On-demand self-service.
 A consumer can unilaterally provision computing capabilities such
as server time and network storage as needed automatically,
without requiring human interaction with a service provider.
 Broad network access.
 Capabilities are available over the network and accessed through
standard mechanisms that promote use by heterogeneous thin or
thick client platforms (e.g., mobile phones, laptops, and PDAs) as
well as other traditional or cloud-based software services.
 Resource pooling.
 The provider’s computing resources are pooled to serve multiple
consumers using a multi-tenant model, with different physical and
virtual resources dynamically assigned and reassigned according
to consumer demand.
 Rapid elasticity.
 Capabilities can be rapidly and elastically provisioned - in some
cases automatically - to quickly scale out; and rapidly released to
quickly scale in.
 To the consumer, the capabilities available for provisioning often
appear to be unlimited and can be purchased in any quantity at
any time.
 Measured service.
 Cloud systems automatically control and optimize resource usage
by leveraging a metering capability at some level of abstraction
appropriate to the type of service.
 Resource usage can be monitored, controlled, and reported,
providing transparency for both the provider and the consumer
of the service.
Cloud Service Models
SPI Model
 Cloud Software as a Service (SaaS)
 Cloud Platform as a Service (PaaS)
 Cloud Infrastructure as a Service (IaaS)
Infrastructure as a Service (IaaS)
 The capability provided to the consumer is to provision processing,
storage, networks, and other fundamental computing resources.
 Consumer is able to deploy and run arbitrary software, which can
include operating systems and applications.
 The consumer does not manage or control the underlying cloud
infrastructure but has control over operating systems, storage,
deployed applications, and possibly limited control of select
networking components (e.g., host firewalls).
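Platform as a Service (PaaS)
 The capability provided to the consumer is to deploy onto the cloud
infrastructure consumer-created or acquired applications created using
programming languages and tools supported by the provider.
 The consumer does not manage or control the underlying cloud
infrastructure including network, servers, operating systems, or storage,
but has control over the deployed applications and possibly application
hosting environment configurations.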
Software as a Service (SaaS)
 The capability provided to the consumer is to use the provider’s
applications running on a cloud infrastructure.
 The applications are accessible from various client devices through a
thin client interface such as a web browser (e.g., web-based email).
 The consumer does not manage or control the underlying cloud
infrastructure including network, servers, operating systems, storage,
or even individual application capabilities, with the possible exception
of limited user-specific application configuration settings.
Cloud Deployment Models
 Public Cloud.
 Private Cloud.
 Community Cloud.
 Hybrid Cloud.
Public Cloud
 The cloud infrastructure is made available to the general public or a
large industry group and is owned by an organization selling cloud
services.
Private Cloud
 The cloud infrastructure is operated solely for a single organization. It
may be managed by the organization or a third party, and may exist
on-premises or off-premises.
Community Cloud
 The cloud infrastructure is shared by several organizations and
supports a specific community that has shared concerns (e.g.,
mission, security requirements, policy, or compliance considerations).
It may be managed by the organizations or a third party and may
exist on-premises or off-premises.
Hybrid Cloud
 The cloud infrastructure is a composition of two or more clouds
(private, community, or public) that remain unique entities but are
bound together by standardized or proprietary technology that
enables data and application portability (e.g., cloud bursting for
load-balancing between clouds).
Benefits of Cloud Computing
Business Benefits
 Almost zero upfront infrastructure investment
 Just-in-time infrastructure
 More efficient resource utilization
 Usage-based costing
 Reduced time to market
Technical Benefits
 Automation – "scriptable infrastructure"
 Auto-scaling
 Proactive scaling
 More efficient development lifecycle
 Improved testability
 Disaster recovery and business continuity
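As a toy illustration of the auto-scaling idea (not any particular provider's API), a scaling decision can be as simple as a threshold rule; the thresholds below are invented.

```python
# A toy autoscaling rule of the kind cloud platforms automate: adjust
# the number of servers based on observed load. Thresholds are invented.
def desired_servers(current, cpu_percent):
    if cpu_percent > 80:              # scale out under high load
        return current + 1
    if cpu_percent < 20 and current > 1:
        return current - 1            # scale in and pay for less
    return current

print(desired_servers(4, 92))  # 5
print(desired_servers(4, 10))  # 3
```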
Opportunities of Cloud Computing
 End consumers.
 Business customers.
 Developers and Independent Software Vendors (ISVs).
Google App Engine
 Google App Engine enables you to build web applications on the
same scalable systems that power Google applications. App Engine
applications are easy to build, easy to maintain, and easy to scale as
your traffic and data storage needs grow.
 Cost?
– Pay only for what you actually use, and only once you exceed the
free quota of 500 MB of storage and around 5 million page views
per month.
How to Create Applications for Cloud Computing
 Build an App Engine application using standard Java web
technologies, such as servlets and JSP.
 Create an App Engine Java project with Eclipse, using the Google
Plugin for Eclipse for App Engine development.
 Use the App Engine datastore with the Java Data Objects (JDO)
standard interface.
 Upload your app to App Engine.
Grid computing vs Cloud computing
Cloud
 Increase computing.
 Increase storage.
 Consumption basis (pay per use).
 IBM, Google, Microsoft.
 Billed by hour, storage, views, etc.
Grid
 Increase computing.
 Increase storage.
 Project-oriented.
 Academia or government labs.
 Billed in number of service units.
[Layered architecture, from the grid vs. cloud comparison:]
– Collective: interactions across collections of resources, directory services.
– Platform: a collection of specialized tools, middleware, and services on top of
the unified resources to provide a development and/or deployment platform.
– Unified Resources: resources that have been abstracted/encapsulated.
– Resource: discovery, negotiation, monitoring, accounting, and payment of
sharing operations on individual resources.
– Connectivity: communication and authentication protocols.
Application
 Grid computing emerged in eScience to solve scientific problems
requiring HPC.
 Cloud computing is rather oriented towards applications that run
permanently and have varying demand for physical resources while
running, e.g., the well-known CRM SaaS Salesforce.com.