DM – SS solutions


Q1. Explain four characteristics of Data Warehousing with respect to (a) Subject Oriented, (b)
Integrated, (c) Time variant and (d) Non Volatile (15M)
Data Warehousing is a program dedicated to the delivery of information, which advances
decision making, improves business practices and enables knowledge workers.
It usually contains historical data derived from transaction data, but it can include data from
other sources. It separates analysis workload from transaction workload and enables an
organization to consolidate data from several sources.
It plays a functional role in any organization in the form of an analytical tool.
More generally, data warehousing is a collection of decision support technologies, aimed at
enabling the knowledge worker, such as executive, manager, and analyst, to arrive at better
and faster decisions.
Data warehouses provide access to data for complex analysis, knowledge discovery, and
decision making.
In addition to a relational database, a data warehouse environment includes an extraction,
transportation, transformation, and loading (ETL) solution, an online analytical processing
(OLAP) engine, client analysis tools, and other applications that manage the process of
gathering data and delivering it to business users.
The fundamental characteristics of a data warehouse are:
Subject Oriented : Data organized by subject
Integrated : Consistency of defining parameters
Time variant : Timeliness of data and access terms
Non Volatile : Stable data storage medium
Subject Oriented
Data warehouses are designed to help you analyze data. For example, to learn more about
your company's sales data, you can build a warehouse that concentrates on sales. Using this
warehouse, you can answer questions like "Who was our best customer for this item last
year?" This ability to define a data warehouse by subject matter, sales in this case, makes
the data warehouse subject oriented.
A data warehouse is organized around high-level business groupings called subjects. They
do not have the same atomic entity focus as OLTP systems.
Integrated
The data in the warehouse must be integrated and consistent. That is, if two different
source systems store conflicting data about entities, or attributes of an entity, the
differences need to be resolved during the process of transforming the source data and
loading it into the data warehouse.
Integration is closely related to subject orientation. Data warehouses must put data from
disparate sources into a consistent format. They must resolve such problems as naming
conflicts and inconsistencies among units of measure. When they achieve this, they are said
to be integrated.
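The integration step described above can be sketched as follows. This is a minimal illustration, assuming two hypothetical source systems that name the customer key differently and record weight in different units; the field names (`cust_id`, `customerNumber`, `weight_lb`, `weight_kg`) are invented for the example and do not come from any real system.

```python
# Hypothetical source records: same entity, different field names and units.
LB_TO_KG = 0.45359237  # one pound expressed in kilograms

def from_system_a(rec):
    """System A (assumed) uses 'cust_id' and stores weight in pounds."""
    return {"customer_id": rec["cust_id"],
            "weight_kg": round(rec["weight_lb"] * LB_TO_KG, 3)}

def from_system_b(rec):
    """System B (assumed) uses 'customerNumber' and already stores kilograms."""
    return {"customer_id": rec["customerNumber"],
            "weight_kg": round(rec["weight_kg"], 3)}

# After transformation every row has the same names and the same unit.
warehouse_rows = [
    from_system_a({"cust_id": 7, "weight_lb": 10.0}),
    from_system_b({"customerNumber": 8, "weight_kg": 4.5}),
]
```

Once both sources are mapped to one naming scheme and one unit, the rows can be loaded into the same warehouse table.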
Time Variant
In order to discover trends in business, analysts need large amounts of data. This is very
much in contrast to online transaction processing (OLTP) systems, where performance
requirements demand that historical data be moved to an archive. A data warehouse's
focus on change over time is what is meant by the term time variant.
Typically, data flows from one or more online transaction processing (OLTP) databases into
a data warehouse on a monthly, weekly, or daily basis. The data is normally processed in
a staging file before being added to the data warehouse. Data warehouses commonly range
in size from tens of gigabytes to a few terabytes. Usually, the vast majority of the data is
stored in a few very large fact tables.
Non Volatile
Nonvolatile means that, once entered into the warehouse, data should not change. This is
logical because the purpose of a warehouse is to enable you to analyze what has occurred.
The contents of OLTP systems are, by their nature, continuously changing. Inserts, deletes,
and updates form the basis of a large volume of business transactions that result in a very
volatile set of data. By contrast, data warehouses are static. The data in the warehouse is
read-only; updates or refreshes of the data occur on a periodic incremental or full-refresh basis.
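The time-variant and non-volatile properties can be illustrated together with a small sketch. It assumes a hypothetical `sales_fact` table and uses Python's built-in `sqlite3` module as a stand-in for a warehouse; the table name, columns and dates are made up for the example.

```python
import sqlite3

def load_batch(conn, load_date, rows):
    """Append one periodic batch; existing rows are never updated or deleted."""
    conn.executemany(
        "INSERT INTO sales_fact (load_date, item, amount) VALUES (?, ?, ?)",
        [(load_date, item, amount) for item, amount in rows],
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (load_date TEXT, item TEXT, amount REAL)")

# Two periodic (monthly) loads; each one only adds rows.
load_batch(conn, "2024-01-31", [("pen", 10.0), ("ink", 4.0)])
load_batch(conn, "2024-02-29", [("pen", 12.0)])

# Because history is kept, change over time can be queried directly.
trend = conn.execute(
    "SELECT load_date, SUM(amount) FROM sales_fact "
    "GROUP BY load_date ORDER BY load_date"
).fetchall()
```

The loader only inserts, which mirrors the read-only nature of warehouse data between refreshes.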
Q2. Describe in detail the three major database models with suitable example. What is
ODBMS? How it is similar to ORDBMS? (15M)
A database model is a type of data model that determines the logical structure of a database
and fundamentally determines in which manner data can be stored, organized, and
manipulated. The most popular example of a database model is the relational model, which
uses a table-based format.
Flat model
The flat (or table) model consists of a single, two-dimensional array of data elements, where all
members of a given column are assumed to be similar values, and all members of a row are
assumed to be related to one another. For instance, columns for name and password that
might be used as a part of a system security database. Each row would have the specific
password associated with an individual user. Columns of the table often have a type associated
with them, defining them as character data, date or time information, integers, or floating point
numbers. This tabular format is a precursor to the relational model.
Hierarchical model
In a hierarchical model, data is organized into a tree-like structure, implying a single parent for
each record. A sort field keeps sibling records in a particular order. Hierarchical structures were
widely used in the early mainframe database management systems, such as the Information
Management System (IMS) by IBM, and now describe the structure of XML documents. This
structure allows a one-to-many relationship between two types of data. This structure is very
efficient for describing many relationships in the real world: recipes, tables of contents, ordering of
paragraphs/verses, any nested and sorted information.
This hierarchy is used as the physical order of records in storage. Record access is done by
navigating through the data structure using pointers combined with sequential accessing.
Because of this, the hierarchical structure is inefficient for certain database operations when a
full path (as opposed to upward link and sort field) is not also included for each record. Such
limitations have been compensated for in later IMS versions by additional logical hierarchies
imposed on the base physical hierarchy.
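A hierarchical structure of this kind can be sketched in a few lines. This is only an illustration, assuming a hypothetical `Record` class; it shows the single parent per record, the sort field that orders siblings, and navigation along upward links to recover a full path.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Record:
    name: str
    sort_key: int
    parent: Optional["Record"] = field(default=None, repr=False)  # single parent
    children: list = field(default_factory=list, repr=False)

    def add_child(self, name, sort_key):
        child = Record(name, sort_key, parent=self)
        self.children.append(child)
        # The sort field keeps sibling records in a particular order.
        self.children.sort(key=lambda r: r.sort_key)
        return child

    def path(self):
        """Follow upward links to build the full path from the root."""
        node, parts = self, []
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))

book = Record("book", 0)
ch2 = book.add_child("chapter2", 2)
ch1 = book.add_child("chapter1", 1)
sec = ch1.add_child("section1", 1)
```

Here `sec.path()` has to walk up the tree record by record, which is the kind of navigational access the text describes.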
Relational model
The relational model was introduced by E.F. Codd in 1970[1] as a way to make database
management systems more independent of any particular application. It is a mathematical
model defined in terms of predicate logic and set theory, and systems implementing it have
been used by mainframe, midrange and microcomputer systems.
The products that are generally referred to as relational databases in fact implement a model
that is only an approximation to the mathematical model defined by Codd. Three key terms are
used extensively in relational database models: relations, attributes, and domains. A relation is
a table with columns and rows. The named columns of the relation are called attributes, and
the domain is the set of values the attributes are allowed to take.
The basic data structure of the relational model is the table, where information about a
particular entity (say, an employee) is represented in rows (also called tuples) and columns.
Thus, the "relation" in "relational database" refers to the various tables in the database; a
relation is a set of tuples. The columns enumerate the various attributes of the entity (the
employee's name, address or phone number, for example), and a row is an actual instance of
the entity (a specific employee) that is represented by the relation. As a result, each tuple of
the employee table represents various attributes of a single employee.
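The ideas of relation, attribute and tuple can be made concrete with a short sketch using Python's built-in `sqlite3` module. The `employee` and `department` tables and their contents are invented for illustration; the queries show a projection (keeping only some attributes) and a join (combining tuples from two relations).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        phone   TEXT,
        dept_id INTEGER REFERENCES department(dept_id)
    );
    INSERT INTO department VALUES (1, 'Sales'), (2, 'IT');
    INSERT INTO employee VALUES (10, 'Asha', '555-0101', 1),
                                (11, 'Ravi', '555-0102', 2);
""")

# Projection: keep only the 'name' attribute of each employee tuple.
names = conn.execute("SELECT name FROM employee ORDER BY emp_id").fetchall()

# Join: combine tuples from two relations on the shared dept_id attribute.
joined = conn.execute(
    "SELECT e.name, d.dept_name FROM employee e "
    "JOIN department d ON e.dept_id = d.dept_id ORDER BY e.emp_id"
).fetchall()
```

Each row of `employee` is one tuple (a specific employee), and each column is one attribute.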
· Ease of use: The revision of any information as tables consisting of rows and columns is quite
· Flexibility: Different tables from which information has to be linked and extracted can be
easily manipulated by operators such as project and join to give information in the form in
· Security: Security control and authorization can also be implemented more easily by moving
sensitive attributes in a given table into a separate relation with its own authorization controls.
If authorization requirement permits, a particular attribute could be joined back with others to
· Data Independence: Data independence is achieved more easily with normalization structure
used in a relational database than in the more complicated tree or network structure.
· Hardware overheads: relational database systems hide the implementation complexities and
the physical data storage details from the user. For doing this, the relational database system
· Ease of design can lead to bad design: the relational database is easy to design and use. The
user need not know the complexities of the data storage. This ease of design and use can
lead to the development and implementation of a very poorly designed database
management system.
Object-oriented database models
In the 1990s, the object-oriented programming paradigm was applied to database
technology, creating a new database model known as object databases. This aims to avoid the
object-relational impedance mismatch - the overhead of converting information between its
representation in the database (for example as rows in tables) and its representation in the
application program (typically as objects). Even further, the type system used in a particular
application can be defined directly in the database, allowing the database to enforce the same
data integrity invariants. Object databases also introduce the key ideas of object programming,
such as encapsulation and polymorphism, into the world of databases.
A variety of ways have been tried for storing objects in a database. Some
products have approached the problem from the application programming end, by making the
objects manipulated by the program persistent. This typically requires the addition of some
kind of query language, since conventional programming languages do not have the ability to
find objects based on their information content. Others have attacked the problem from
the database end, by defining an object-oriented data model for the database, and defining a
database programming language that allows full programming capabilities as well as traditional
query facilities.
Network model
The network model expands upon the hierarchical structure, allowing many-to-many
relationships in a tree-like structure that allows multiple parents. It was the most popular
before being replaced by the relational model, and is defined by the CODASYL specification.
The network model organizes data using two fundamental concepts, called records and sets.
Records contain fields (which may be organized hierarchically, as in the programming language
COBOL). Sets (not to be confused with mathematical sets) define one-to-many relationships
between records: one owner, many members. A record may be an owner in any number of
sets, and a member in any number of sets.
A set consists of circular linked lists where one record type, the set owner or parent, appears
once in each circle, and a second record type, the subordinate or child, may appear multiple
times in each circle. In this way a hierarchy may be established between any two record types,
e.g., type A is the owner of B. At the same time another set may be defined where B is the
owner of A. Thus all the sets comprise a general directed graph (ownership defines a direction),
or network construct. Access to records is either sequential (usually in each record type) or by
navigation in the circular linked lists.
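The record-and-set idea can be sketched as a toy in-memory structure. This is not the CODASYL interface, only an illustration under the assumption of a hypothetical `NetworkDB` class; plain Python lists stand in for the circular linked lists, and the record names are made up.

```python
from collections import defaultdict

class NetworkDB:
    """Toy store of named sets; each set links one owner to many members."""

    def __init__(self):
        # set name -> owner record -> list of member records
        self.sets = defaultdict(lambda: defaultdict(list))

    def connect(self, set_name, owner, member):
        self.sets[set_name][owner].append(member)

    def members(self, set_name, owner):
        return list(self.sets[set_name][owner])

    def owners_of(self, member):
        """A record may be a member of any number of sets (multiple parents)."""
        return sorted((s, o) for s, owners in self.sets.items()
                      for o, ms in owners.items() if member in ms)

db = NetworkDB()
db.connect("supplies", "SupplierA", "Part1")
db.connect("supplies", "SupplierA", "Part2")
db.connect("used_in", "ProjectX", "Part1")
```

`Part1` has two owners through two different sets, which a strict hierarchy with a single parent could not express.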
Most modern programming languages are object oriented, while most
mainstream databases are relational. So the programmer has to sit on two chairs and work with
two data models, relational and object. This significantly complicates application design,
because the system architect has to work with different notions representing the same
An object database (also object-oriented database management system) is a database
management system in which information is represented in the form of objects as used
in object-oriented programming. Object databases are different from relational
databases which are table-oriented.
ODBMS supports the modeling and creation of data as objects. This includes some kind of
support for classes of objects and the inheritance of class properties and methods by
subclasses and their objects.
Features of ODBMS
Object is the basic notion in object oriented system. It is basic unit of storing data in the
database. Each object has unique OID which is automatically generated by the system.
Direct representation of references between objects.
Support of inheritance and polymorphism.
Tight integration with at least one object oriented programming language.
Support of traditional DBMS features: ACID transactions, backups, import-export utilities,
schema evolution, etc.
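Some of these features can be illustrated with a toy sketch. It assumes a hypothetical in-memory `ObjectStore` (not a real ODBMS product) and shows a system-generated OID, a direct reference between objects, and inheritance with polymorphism.

```python
import itertools

class ObjectStore:
    """Toy in-memory store that assigns a unique OID to each stored object."""

    def __init__(self):
        self._next_oid = itertools.count(1)
        self._objects = {}

    def persist(self, obj):
        obj.oid = next(self._next_oid)   # OID is generated by the system
        self._objects[obj.oid] = obj
        return obj.oid

    def get(self, oid):
        return self._objects[oid]

class Person:
    def __init__(self, name):
        self.name = name

    def describe(self):
        return f"Person {self.name}"

class Employee(Person):                  # inheritance
    def __init__(self, name, manager=None):
        super().__init__(name)
        self.manager = manager           # direct reference to another object

    def describe(self):                  # polymorphism: overrides Person.describe
        return f"Employee {self.name}"

store = ObjectStore()
boss = Employee("Mira")
worker = Employee("Jon", manager=boss)
boss_oid = store.persist(boss)
worker_oid = store.persist(worker)
```

The reference from `worker` to `boss` is stored as an object reference, not as a foreign key value that has to be joined.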
An object-relational database (ORD), or object-relational database management
system (ORDBMS), is a database management system (DBMS) similar to a relational
database, but with an object-oriented database model: objects, classes and inheritance are
directly supported in database schemas and in the query language. In addition, just as with
proper relational systems, it supports extension of the data model with custom datatypes and methods.
An object-relational database can be said to provide a middle ground between relational
databases and object-oriented databases (OODBMS). In object-relational databases, the
approach is essentially that of relational databases: the data resides in the database and is
manipulated collectively with queries in a query language.
Q3. How does partitioning (both vertical and horizontal) provide data granularity? What is
the advantage of creating granular database for a typical retail enterprise or an airline
company? (15M)
A partition is a division of a logical database or its constituent elements into distinct
independent parts. Database partitioning is normally done for manageability, performance or
availability reasons.
The partitioning can be done by either building separate smaller databases (each with its
own tables, indices, and transaction logs), or by splitting selected elements, for example just
one table.
Horizontal partitioning
It involves putting different rows into different tables.
E.g. customers with ZIP codes less than 50000 are stored in CustomersEast, while customers
with ZIP codes greater than or equal to 50000 are stored in CustomersWest.
The two partition tables are then CustomersEast and CustomersWest, while a view with a
union might be created over both of them to provide a complete view of all customers.
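The CustomersEast/CustomersWest arrangement can be sketched with Python's built-in `sqlite3` module. The column names, the sample customers and the `ZIP_BOUNDARY` value are assumptions chosen for illustration.

```python
import sqlite3

ZIP_BOUNDARY = 50000  # illustrative split point between the two partitions

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CustomersEast (id INTEGER, name TEXT, zip INTEGER);
    CREATE TABLE CustomersWest (id INTEGER, name TEXT, zip INTEGER);
    -- The view with a union gives a complete picture of all customers.
    CREATE VIEW Customers AS
        SELECT * FROM CustomersEast
        UNION ALL
        SELECT * FROM CustomersWest;
""")

def insert_customer(cust_id, name, zip_code):
    """Route each row to a partition table based on its ZIP code."""
    table = "CustomersEast" if zip_code < ZIP_BOUNDARY else "CustomersWest"
    conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?)",
                 (cust_id, name, zip_code))

insert_customer(1, "Ann", 10001)
insert_customer(2, "Bob", 94105)

east = conn.execute("SELECT name FROM CustomersEast").fetchall()
west = conn.execute("SELECT name FROM CustomersWest").fetchall()
everyone = conn.execute("SELECT name FROM Customers ORDER BY id").fetchall()
```

Queries that only concern one region touch one smaller table, while the view still serves queries over all customers.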
Vertical partitioning
It involves creating tables with fewer columns and using additional tables to store the
remaining columns. Normalization also involves this splitting of columns across tables, but
vertical partitioning goes beyond that and partitions columns even when already normalized.
Different physical storage might be used to realize vertical partitioning as well; storing
infrequently used or very wide columns on a different device, for example, is a method of
vertical partitioning. Done explicitly or implicitly, this type of partitioning is called "row
splitting" (the row is split by its columns).
A common form of vertical partitioning is to split dynamic data (slow to find) from static
data (fast to find) in a table where the dynamic data is not used as often as the static.
Creating a view across the two newly created tables restores the original table with a
performance penalty; however, performance will increase when accessing the static data,
e.g. for statistical analysis.
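A static/dynamic split of this kind can be sketched with `sqlite3`. The `product_static` and `product_dynamic` tables and their columns are hypothetical; the view joins the two halves back into the original row shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Static columns live in one table ...
    CREATE TABLE product_static (
        id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    -- ... and dynamic columns in another, sharing the same key.
    CREATE TABLE product_dynamic (
        id INTEGER PRIMARY KEY REFERENCES product_static(id),
        stock INTEGER, last_price REAL);
    -- The view restores the original table at the cost of a join.
    CREATE VIEW product AS
        SELECT s.id, s.name, s.category, d.stock, d.last_price
        FROM product_static s JOIN product_dynamic d ON s.id = d.id;
    INSERT INTO product_static  VALUES (1, 'Pen', 'Stationery');
    INSERT INTO product_dynamic VALUES (1, 40, 1.25);
""")

# Reads of static data touch only the narrow static table.
static_only = conn.execute(
    "SELECT name, category FROM product_static").fetchall()
full_row = conn.execute("SELECT * FROM product").fetchall()
```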
Data Granularity
The granularity of data refers to the fineness with which data fields are sub-divided. For
example, a postal address can be recorded, with low granularity, as a single field:
address = 200 2nd Ave. South #358, St. Petersburg, FL 33701-4313 USA
or with high granularity, as multiple fields:
street address = 200 2nd Ave. South #358
city = St. Petersburg
postal code = FL 33701-4313
country = USA
Higher granularity has overheads for data input and storage. It does however offer benefits in
flexibility of data processing in treating each data field in isolation if required. A performance
problem caused by excessive granularity may not reveal itself until scalability becomes an issue.
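The benefit of treating each field in isolation can be shown with the same address. This is a small sketch; the dictionary keys and the helper `customers_in_city` are names invented for the example.

```python
# Low granularity: one free-text field.
coarse = {"address": "200 2nd Ave. South #358, St. Petersburg, FL 33701-4313 USA"}

# High granularity: the same address as separate fields.
fine = {
    "street_address": "200 2nd Ave. South #358",
    "city": "St. Petersburg",
    "postal_code": "FL 33701-4313",
    "country": "USA",
}

def customers_in_city(records, city):
    """With separate fields, filtering by city needs no string parsing."""
    return [r for r in records if r["city"] == city]

matches = customers_in_city([fine], "St. Petersburg")
```

Answering the same question from `coarse` would require parsing the free-text string, which is error-prone.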
As the above example shows, partitioning a database can make its data more finely granular for
ease of processing.
When the address is broken down into different columns, it is vertical partitioning.
Again, based on specific criteria, the data in a single table can be broken down into multiple
partition tables. This is horizontal partitioning.
Advantages of using a granular database in a retail/airline application
1. Having a granular database assists in fine-grained parallelism. This means individual tasks are
relatively small in terms of code size and execution time.
2. The finer the granularity, the greater the potential for parallelism and hence speed-up, but
the greater the overheads of synchronization and communication.
3. Since an airline or retail application would be accessed by many users across different
locations, the best parallel performance is attained by the best balance between load and
communication overhead. If the granularity is too fine, the performance can suffer from the
increased communication overhead. On the other side, if the granularity is too coarse, the
performance can suffer from load imbalance.
Q4. What are virtual machines? Explain briefly its working mechanism with suitable example.
List two advantages of using virtual machines in our day to day business / work environment.
A virtual machine is a tightly isolated software container that can run its own operating
systems and applications as if it were a physical computer. A virtual machine behaves
exactly like a physical computer and contains its own virtual (i.e., software-based) CPU,
RAM, hard disk and network interface card (NIC).
An operating system can’t tell the difference between a virtual machine and a physical
machine, nor can applications or other computers on a network. Even the virtual
machine thinks it is a “real” computer. Nevertheless, a virtual machine is composed
entirely of software and contains no hardware components whatsoever. As a result,
virtual machines offer a number of distinct advantages over physical hardware.
Working Mechanism
Compatibility - Just like a physical computer, a virtual machine hosts its own guest
operating system and applications, and has all the components found in a physical
computer (motherboard, VGA card, network card controller, etc). As a result, virtual
machines are completely compatible with all standard x86 operating systems,
applications and device drivers, so you can use a virtual machine to run all the same
software that you would run on a physical x86 computer.
Isolation - While virtual machines can share the physical resources of a single computer,
they remain completely isolated from each other as if they were separate physical
machines. If, for example, there are four virtual machines on a single physical server and
one of the virtual machines crashes, the other three virtual machines remain available.
Isolation is an important reason why the availability and security of applications running
in a virtual environment is far superior to applications running in a traditional, nonvirtualized system.
Encapsulation - A virtual machine is essentially a software container that bundles or
“encapsulates” a complete set of virtual hardware resources, as well as an operating
system and all its applications, inside a software package. Encapsulation makes virtual
machines incredibly portable and easy to manage. For example, you can move and copy
a virtual machine from one location to another just like any other software file, or save a
virtual machine on any standard data storage medium, from a pocket-sized USB flash
memory card to an enterprise storage area network (SAN).
Hardware Independence - Virtual machines are completely independent from their
underlying physical hardware. For example, you can configure a virtual machine with
virtual components (eg, CPU, network card, SCSI controller) that are completely
different from the physical components that are present on the underlying hardware.
Virtual machines on the same physical server can even run different kinds of operating
systems (Windows, Linux, etc).
When coupled with the properties of encapsulation and compatibility, hardware
independence gives you the freedom to move a virtual machine from one type of x86
computer to another without making any changes to the device drivers, operating
system, or applications. Hardware independence also means that you can run a
heterogeneous mixture of operating systems and applications on a single physical computer.
Q5. How does a web application work? Explain with suitable diagram how a single tier and
multi tier application work? What is two phase commit? How does rollback, roll forward and
commit work?
How a web application works
· Client/server is a technology that separates computers and application software into two categories,
clients and servers, to better employ available computing resources and share data processing loads.
This model was developed at Xerox PARC during the 1970s.
· The model assigns one of two roles to the computers in a network: client or server.
· A server is a computer system that selectively shares its resources. It might provide
high-volume storage capacity, heavy data crunching, and/or high-resolution graphics.
· A client is a computer or computer program that initiates contact with a server in order to
make use of a resource. Data, CPUs, printers, and data storage devices are some examples of
resources. A client computer provides the user interaction facility (interface) and some or all
application processing.
· Typically, several client computers are connected through a network (or networks) to a server,
which could be a large PC, minicomputer, or mainframe computer.
The following are examples of client/server architectures.
Two tier architectures
· Two-tier architecture is where a client talks directly to a server, with no intervening server. It
is typically used in small environments (fewer than 50 users).
· In two-tier client/server architectures, the user interface is placed in the user's desktop
environment and the database management system services are usually on a server that is a
more powerful machine providing services to many clients.
· Information processing is split between the user system interface environment and the
database management server environment.
Three tier architectures
· Three-tier architecture was introduced to overcome the drawbacks of the two-tier
architecture. In the three-tier architecture, middleware is used between the user system
interface client environment and the database management server environment.
· The middleware is implemented in a variety of ways, such as transaction processing
monitors, message servers or application servers. The middleware performs the functions of
queuing, application execution and database staging. In addition, the middleware adds
scheduling and prioritization for work in progress.
· The three-tier client/server architecture is used to improve performance for large numbers of
users and also improves flexibility when compared to the two-tier approach.
· The drawback of three-tier architectures is that the development environment is more
difficult to use than that of two-tier applications.
· The widespread use of the term 3-tier architecture also denotes the following architectures:
o Application sharing between a client, middleware and enterprise server
o Application sharing between a client, application server and enterprise database
Three tier with message server.
In this architecture, messages are processed and prioritized asynchronously. Messages have
headers that include priority information, address and identification number. The message
server links to the relational DBMS and other data sources. Messaging systems are an alternative
for wireless infrastructures.
Three tier with an application server
This architecture allows the main body of an application to run on a shared host rather than in
the user system interface client environment. The application server shares business logic,
computations and a data retrieval engine. In this architecture applications are more scalable
and installation costs are less on a single server than maintaining each on a desktop client.
3-tier architecture provides
· A greater degree of flexibility
· Increased security, as security can be defined for each service, and at each level
· Increased performance, as tasks are shared between servers
Two Phase Commit
A commit operation is, by definition, an all-or-nothing affair. If a series of operations bound as a
transaction cannot be completed, the rollback must restore the system (or cooperating
systems) to the pre-transaction state.
In order to ensure that a transaction can be rolled back, a software system typically logs each
operation, including the commit operation itself. A transaction/recovery manager uses the log
records to undo (and possibly redo) a partially completed transaction.
When a transaction involves multiple distributed resources, for example, a database server on
each of two different network hosts, the commit process is somewhat complex because the
transaction includes operations that span two distinct software systems, each with its own
resource manager, log records, and so on. (In this case, the distributed resources are the
database servers.)
Two-phase commit is a transaction protocol designed for the complications that arise with
distributed resource managers. With a two-phase commit protocol, the distributed transaction
manager employs a coordinator to manage the individual resource managers.
The commit process proceeds as follows:
Phase 1
Each participating resource manager coordinates local operations and forces all log records out:
If successful, respond "OK"
If unsuccessful, either allow a time-out or respond "OOPS"
Phase 2
If all participants respond "OK":
Coordinator instructs participating resource managers to "COMMIT"
Participants complete the operation by writing the log record for the commit
Otherwise (any participant responds "OOPS" or times out):
Coordinator instructs participating resource managers to "ROLLBACK"
Participants complete their respective local undos
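The two phases can be sketched as a small simulation. This is a simplified illustration, not a production protocol: the `Participant` class and its `can_commit` flag are hypothetical stand-ins for real resource managers, everything runs in one process, and time-outs and coordinator failures are not modeled.

```python
class Participant:
    """Toy resource manager; `can_commit` stands in for local success."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.log = []
        self.state = "ACTIVE"

    def prepare(self):
        # Phase 1: force log records out, then vote.
        self.log.append("PREPARE")
        return "OK" if self.can_commit else "OOPS"

    def commit(self):
        self.log.append("COMMIT")
        self.state = "COMMITTED"

    def rollback(self):
        self.log.append("ROLLBACK")
        self.state = "ROLLED_BACK"


def two_phase_commit(participants):
    """Coordinator: collect votes, then broadcast one global decision."""
    votes = [p.prepare() for p in participants]           # Phase 1
    decision = "COMMIT" if all(v == "OK" for v in votes) else "ROLLBACK"
    for p in participants:                                # Phase 2
        if decision == "COMMIT":
            p.commit()
        else:
            p.rollback()
    return decision
```

For example, `two_phase_commit([Participant("db1"), Participant("db2", can_commit=False)])` returns `"ROLLBACK"`, and both participants undo their local work even though only one of them failed.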
Roll Back
DB server reads back over the transaction log entries for the transaction that needs to be rolled
back and generates compensating operations (operations that reverse the effect of each logged
change) which it then logs and executes.
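Rollback from a log can be sketched as follows. This is a minimal illustration, assuming the database is a plain dictionary, the log is a list of dictionaries, and a before-image of `None` means the key did not exist; all names are invented for the example.

```python
def apply_update(db, log, txn, key, value):
    """Log the before-image of the change, then apply it."""
    log.append({"txn": txn, "kind": "update",
                "key": key, "before": db.get(key)})
    db[key] = value

def rollback(db, log, txn):
    """Read back over the log and reverse each change made by `txn`."""
    updates = [e for e in log if e["txn"] == txn and e["kind"] == "update"]
    for entry in reversed(updates):
        # The compensating operation is itself logged, then executed.
        log.append({"txn": txn, "kind": "compensate",
                    "key": entry["key"], "before": db.get(entry["key"])})
        if entry["before"] is None:
            db.pop(entry["key"], None)     # key did not exist before
        else:
            db[entry["key"]] = entry["before"]

db = {"balance": 100}
log = []
apply_update(db, log, "T1", "balance", 70)
apply_update(db, log, "T1", "bonus", 5)
rollback(db, log, "T1")
```

After the rollback the dictionary is back in its pre-transaction state, and the log holds both the original updates and their compensations.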
Roll Forward
It is also possible to keep a separate journal of all modifications to a database (sometimes
called after images). This is not required for rollback of failed transactions but it is useful for
updating the database in the event of a database failure, so some transaction-processing
systems provide it. If the database fails entirely, it must be restored from the most recent
backup. The backup will not reflect transactions committed since the backup was made. However,
once the database is restored, the journal of after images can be applied to the database
(rollforward) to bring the database up to date. Any transactions in progress at the time of the
failure can then be rolled back. The result is a database in a consistent, known state that
includes the results of all transactions committed up to the moment of failure.
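Roll forward can be sketched in the same toy style. The journal format below is an assumption made for illustration. Instead of applying every after-image and then rolling back the transactions still in progress, this sketch simply skips uncommitted transactions, which yields the same final state in this simplified model.

```python
import copy

def roll_forward(backup, journal):
    """Restore from backup, then reapply after-images of committed work."""
    db = copy.deepcopy(backup)
    committed = {e["txn"] for e in journal if e["kind"] == "commit"}
    for e in journal:
        if e["kind"] == "after_image" and e["txn"] in committed:
            db[e["key"]] = e["value"]
    return db

backup = {"a": 1}
journal = [
    {"txn": "T1", "kind": "after_image", "key": "a", "value": 2},
    {"txn": "T1", "kind": "commit"},
    # T2 was still in progress when the database failed.
    {"txn": "T2", "kind": "after_image", "key": "a", "value": 9},
]
recovered = roll_forward(backup, journal)
```

The recovered state includes the committed change from T1 and ignores the change from the in-progress T2.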
Commit
Commit is the exact opposite of rollback. In this case the record is saved so that the
changed data is available to all other users, and the transaction's log entries are no longer needed for rollback.
Q6. Explain each of following with suitable example. (a) Multi-Processing (b)Multi Tasking (c)
Multi Threading (d)Multi programming (15M)
Multi Processing
Multiprocessing is the coordinated processing of programs by more than one computer
processor. Multiprocessing is a general term that can mean the dynamic assignment of a
program to one of two or more computers working in tandem or can involve multiple
computers working on the same program at the same time (in parallel).
With the advent of parallel processing, multiprocessing is divided into symmetric
multiprocessing (SMP) and massively parallel processing (MPP).
In symmetric (or "tightly coupled") multiprocessing, the processors share memory and the
I/O bus or data path. A single copy of the operating system is in charge of all the processors.
SMP, also known as a "shared everything" system, does not usually exceed 16 processors.
In massively parallel (or "loosely coupled") processing, up to 200 or more processors can
work on the same application. Each processor has its own operating system and memory,
but an "interconnect" arrangement of data paths allows messages to be sent between
processors. Typically, the setup for MPP is more complicated, requiring thought about how
to partition a common database among processors and how to assign work among the
processors. An MPP system is also known as a "shared nothing" system.
Example: processing two MS Word documents at the same time
Multi Tasking
Multitasking (sometimes incorrectly called multiprocessing) refers to an Operating System's
ability to handle multiple concurrent processes that are launched by different running
In a single processor computer the CPU can execute one task at a time, but the Operating
System manages which task should access the CPU. In a synergy between hardware and
operating system, the CPU is allocated to different processes/applications several times per
second in a process called 'time-slicing'. This enables many programs to run on your computer at
once with apparently instant user responsiveness, even though the single-core CPU can do
only one thing at a time.
This technology has been around since the 1960s and is not to be confused with multithreading.
Example : Watching a movie while downloading a song
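Time-slicing can be illustrated with a small round-robin simulation. This is a sketch of the scheduling idea only, not of how a real operating system scheduler is implemented; the task names and time units are arbitrary.

```python
from collections import deque

def round_robin(tasks, time_slice):
    """tasks: dict of name -> remaining time units.

    Returns the order in which the single CPU runs the tasks,
    as a list of (name, units_run) pairs.
    """
    queue = deque(tasks.items())
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        run = min(time_slice, remaining)
        timeline.append((name, run))
        if remaining > run:
            # Not finished: go to the back of the queue for another slice.
            queue.append((name, remaining - run))
    return timeline

timeline = round_robin({"movie": 3, "download": 5}, time_slice=2)
```

The CPU alternates between the two tasks in short slices, so both appear to make progress at once.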
Multi Threading
- Multi-threading is the program's ability to break itself down to multiple concurrent threads
that can be executed separately by the computer.
- A multiprocessor computer can run two or more of the threads at a time, which means that
the program "runs faster" on a multiprocessor machine than a single-processor machine.
- On a single processor machine, while a multi-threaded program will run no faster, a
multi-threaded application can appear to be more responsive to user interaction, because the
operating system can give the illusion that multiple activities within the same program are
running at the same time.
- "Traditional" single-thread applications cannot make use of two processors; therefore they
don't run faster on multiprocessor machines.
- Example: In a typical chat application we have two threads running, one listening for
incoming messages and one pushing the typed messages over the network.
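The chat example can be sketched with Python's `threading` and `queue` modules. No real network is involved: the two queues are stand-ins for the network socket and the keyboard, and a `None` value is used as an assumed shutdown signal.

```python
import queue
import threading

incoming = queue.Queue()   # stand-in for messages arriving from the network
outgoing = queue.Queue()   # stand-in for messages typed by the user
received, sent = [], []

def listener():
    """Thread 1: handle incoming messages until a None sentinel arrives."""
    while (msg := incoming.get()) is not None:
        received.append(msg)

def sender():
    """Thread 2: push typed messages until a None sentinel arrives."""
    while (msg := outgoing.get()) is not None:
        sent.append(msg)

threads = [threading.Thread(target=listener),
           threading.Thread(target=sender)]
for t in threads:
    t.start()

incoming.put("hi")
outgoing.put("hello")
incoming.put(None)   # tell both threads to stop
outgoing.put(None)
for t in threads:
    t.join()
```

Each thread blocks on its own queue, so a slow sender does not stop the program from receiving, and vice versa.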
Multi Programming
Early computers ran one process at a time. While the process waited for servicing by
another device, the CPU was idle. In an I/O intensive process, the CPU could be idle as much
as 80% of the time.
Advancements in operating systems led to computers that load several independent
processes into memory and switch the CPU from one job to another when the first becomes
blocked while waiting for servicing by another device.
This idea of multiprogramming reduces the idle time of the CPU. Multiprogramming
accelerates the throughput of the system by efficiently using the CPU time.
Programs in a multiprogrammed environment appear to run at the same time. Processes
running in a multiprogrammed environment are called concurrent processes. In actuality,
the CPU processes one instruction at a time, but can execute instructions from any active
process.
Example: Simultaneously chatting on Yahoo as well as MSN messenger in a single core
desktop machine
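A rough way to see why multiprogramming reduces idle time is the common textbook approximation: CPU utilization = 1 - p^n, where p is the fraction of time one process spends waiting for I/O and n is the number of processes in memory. This assumes the processes wait independently, which is a simplification, and the function name below is an illustrative choice.

```python
def cpu_utilization(io_wait_fraction, n_processes):
    """Approximate CPU utilization under multiprogramming.

    Assumes each process independently waits for I/O a fraction
    io_wait_fraction of the time, so the CPU is idle only when all
    n_processes are waiting at once.
    """
    return 1 - io_wait_fraction ** n_processes

# With 80% I/O wait, as in the I/O-intensive case described above:
for n in (1, 2, 4, 8):
    print(n, round(cpu_utilization(0.8, n), 4))
```

With one process the CPU is busy only 20% of the time; loading more processes raises utilization because it becomes less likely that all of them are blocked together.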
Q7. What is an operating system? Name major components of operating system and explain
briefly roles of any two.
 An operating system is the most important software that runs on a computer. It manages the
computer's memory, processes, and all of its software and hardware. It also allows you to
communicate with the computer without knowing how to speak the computer's "language."
 The operating system (OS) is the first thing loaded onto the computer
 Not all computers have operating systems. The computer that controls the microwave oven in your
kitchen, for example, doesn't need an operating system. It has one set of tasks to perform, very
straightforward input to expect (a numbered keypad and a few pre-set buttons) and simple, never-changing hardware to control.
 For other devices, an operating system creates the ability to:
o serve a variety of purposes
o interact with users in more complicated ways
o keep up with needs that change over time
 The most commonly available families of operating systems are:
o Windows family of operating systems, developed by Microsoft
o Macintosh operating systems, developed by Apple
o UNIX family of operating systems
Operating System Functions
 It manages the hardware and software resources of the system. In a desktop computer, these
resources include such things as the processor, memory, disk space and more. (On a cell phone, they
include the keypad, the screen, the address book, the phone dialer, the battery and the network
connection.)
 It provides a stable, consistent way for applications to deal with the hardware without having to know
all the details of the hardware.
Types Of Operating system
Real-time operating system
 Real-time operating systems (RTOS) are used to control machinery, scientific instruments and
industrial systems.
 An RTOS typically has very little user-interface capability, and no end-user utilities, since the system will be a "sealed
box" when delivered for use.
 A very important part of an RTOS is managing the resources of the computer so that a particular
operation executes in precisely the same amount of time, every time it occurs.
 In a complex machine, having a part move more quickly just because system resources are available
may be just as catastrophic as having it not move at all because the system is busy.
Single-user, single task
 This operating system is designed to manage the computer so that one user can effectively do one
thing at a time.
 The Palm OS for Palm handheld computers is a good example of a modern single-user, single-task
operating system.
Single-user, multi-tasking
 This is the type of operating system most people use on their desktop and laptop computers today.
 Microsoft's Windows and Apple's MacOS platforms are both examples of operating systems that will
let a single user have several programs in operation at the same time. For example, it's entirely
possible for a Windows user to be writing a note in a word processor while downloading a file from
the Internet while printing the text of an e-mail message.
Multi-user
 A multi-user operating system allows many different users to take advantage of the computer's
resources simultaneously.
 The operating system must make sure that the requirements of the various users are balanced, and
that each of the programs they are using has sufficient and separate resources so that a problem with
one user doesn't affect the entire community of users.
 Unix, VMS and mainframe operating systems, such as MVS, are examples of multi-user operating
systems.
Components of Operating System
Process Management
 The operating system manages many kinds of activities, ranging from user programs to system
programs such as the printer spooler, name servers and file servers. Each of these activities is encapsulated in a
process.
 A process includes the complete execution context (code, data, PC, registers, OS resources in use etc.).
It is important to note that a process is not a program. A process is only ONE instance of a program in
execution. Many processes can be running the same program.
 The five major activities of an operating system in regard to process management are:
o Creation and deletion of user and system processes.
o Suspension and resumption of processes.
o A mechanism for process synchronization.
o A mechanism for process communication.
o A mechanism for deadlock handling.
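To illustrate what a synchronization mechanism is for, here is a small sketch using threads and a lock from Python's standard library (threads rather than full processes, to keep the example short). The shared `state` dictionary and the `add` function are illustrative assumptions.

```python
import threading

state = {"counter": 0}   # data shared by all workers
lock = threading.Lock()  # the synchronization mechanism

def add(times):
    for _ in range(times):
        # Only one worker at a time may update the shared counter.
        with lock:
            state["counter"] += 1

workers = [threading.Thread(target=add, args=(1000,)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(state["counter"])
```

Because every update happens while holding the lock, no increment is lost and the final count equals the total number of updates requested.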
Main-Memory Management
 Primary memory, or main memory, is a large array of words or bytes. Each word or byte has its own
address. Main memory provides storage that can be accessed directly by the CPU. That is to say, for a
program to be executed, it must be in main memory.
 The major activities of an operating system in regard to memory management are:
o Keep track of which parts of memory are currently being used and by whom.
o Decide which processes are loaded into memory when memory space becomes
available.
o Allocate and de-allocate memory space as needed.
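The activities above can be sketched as a toy allocator, assuming a first-fit policy over one contiguous memory region. The `MemoryManager` class and its method names are hypothetical; real operating systems use paging and far more sophisticated bookkeeping.

```python
class MemoryManager:
    """Toy first-fit allocator over a memory of `size` units."""

    def __init__(self, size):
        self.size = size
        self.blocks = []  # (start, length, owner) tuples, sorted by start

    def allocate(self, owner, length):
        """Return the start address of the new block, or None if no gap fits."""
        cursor = 0
        for i, (start, block_len, _) in enumerate(self.blocks):
            if start - cursor >= length:  # gap before this block is big enough
                self.blocks.insert(i, (cursor, length, owner))
                return cursor
            cursor = start + block_len
        if self.size - cursor >= length:  # room after the last block
            self.blocks.append((cursor, length, owner))
            return cursor
        return None

    def free(self, owner):
        """De-allocate every block held by `owner`."""
        self.blocks = [b for b in self.blocks if b[2] != owner]

    def usage(self):
        """Report which parts of memory are in use and by whom."""
        return list(self.blocks)

mm = MemoryManager(100)
mm.allocate("A", 40)         # placed at address 0
mm.allocate("B", 30)         # placed at address 40
print(mm.allocate("C", 40))  # no gap of 40 units is left
mm.free("A")
print(mm.allocate("C", 40))  # now fits in the space A released
```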
File Management
 A file is a collection of related information defined by its creator. Computers can store files on the disk
(secondary storage), which provides long-term storage.
 Some examples of storage media are magnetic tape, magnetic disk and optical disk. Each of these
media has its own properties like speed, capacity, data transfer rate and access methods.
 File systems are normally organized into directories to ease their use. These directories may contain files
and other directories.
 The five major activities of an operating system in regard to file management are:
o The creation and deletion of files.
o The creation and deletion of directories.
o The support of primitives for manipulating files and directories.
o The mapping of files onto secondary storage.
o The backup of files on stable storage media.
I/O System Management
 The I/O subsystem hides the peculiarities of specific hardware devices from the user. Only the device
driver knows the peculiarities of the specific device to which it is assigned.
Secondary-Storage Management
 Generally speaking, systems have several levels of storage, including primary storage, secondary
storage and cache storage.
 Instructions and data must be placed in primary storage or cache to be referenced by a running
program. Because main memory is too small to accommodate all data and programs, and its data are
lost when power is lost, the computer system must provide secondary storage to back up main
memory.
 Secondary storage consists of tapes, disks, and other media designed to hold information that will
eventually be accessed in primary storage. Storage (primary, secondary, cache) is ordinarily divided into bytes
or words consisting of a fixed number of bytes. Each location in storage has an address; the set of all
addresses available to a program is called an address space.
 The three major activities of an operating system in regard to secondary storage management are:
o Managing the free space available on the secondary-storage device.
o Allocation of storage space when new files have to be written.
o Scheduling the requests for disk access.
Networking
 A distributed system is a collection of processors that do not share memory, peripheral devices, or a
clock. The processors communicate with one another through communication lines called a network.
The communication-network design must consider routing and connection strategies, and the
problems of contention and security.
Protection System
 If a computer system has multiple users and allows the concurrent execution of multiple processes,
then the various processes must be protected from one another's activities.
 Protection refers to a mechanism for controlling the access of programs, processes, or users to the
resources defined by a computer system.
Command Interpreter System
 A command interpreter is the interface between the operating system and the user. The user gives
commands, which are executed by the operating system (usually by turning them into system calls).
 The main function of a command interpreter is to get and execute the next user-specified command.
The command interpreter is usually not part of the kernel, since multiple command interpreters may be
supported by an operating system, and they do not really need to run in kernel mode.
 There are two main advantages to separating the command interpreter from the kernel.
o If we want to change the way the command interpreter looks, i.e. change its
interface, we can do so if the command interpreter is
separate from the kernel. We cannot change the code of the kernel, so otherwise we could not modify the
interface.
o If the command interpreter is a part of the kernel, it is possible for a malicious process to
gain access to parts of the kernel that it should not have. To avoid this ugly
scenario, it is advantageous to have the command interpreter separate from the kernel.
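The "get and execute the next command" idea can be sketched as an ordinary user-level function, assuming a tiny made-up command set (`pwd`, `echo`, `pid`). The `interpret` function is hypothetical; the `os` calls it uses are standard-library wrappers that end up asking the operating system for the information.

```python
import os

def interpret(line):
    """Execute one command line and return its output as text."""
    parts = line.split()
    if not parts:
        return ""
    cmd, args = parts[0], parts[1:]
    if cmd == "pwd":
        return os.getcwd()     # ask the OS for the current directory
    if cmd == "echo":
        return " ".join(args)  # handled entirely by the interpreter
    if cmd == "pid":
        return str(os.getpid())  # ask the OS for this process's id
    return f"unknown command: {cmd}"

# A real shell would loop over input(); here we feed it fixed lines.
for line in ["echo hello world", "pwd", "frobnicate"]:
    print(interpret(line))
```

Because this interpreter is just a normal program, it can be replaced or restyled without touching the kernel, which is the first advantage listed above.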
Q8. Write short note on OLAP and OLTP (10M)
OLTP (online transaction processing)
Is a class of program that facilitates and manages transaction-oriented applications, typically for
data entry and retrieval transactions in a number of industries, including banking, airlines,
mail order, supermarkets, and manufacturers. Probably the most widely installed OLTP product
is IBM's CICS (Customer Information Control System).
Today's online transaction processing increasingly requires support for transactions that span a
network and may include more than one company. For this reason, new OLTP software
uses client/server processing and brokering software that allows transactions to run on
different computer platforms in a network.
OLAP (online analytical processing)
Is computer processing that enables a user to easily and selectively extract and view data from
different points of view. For example, a user can request that data be analyzed to display a
spreadsheet showing all of a company's beach ball products sold in Florida in the month of July,
compare revenue figures with those for the same products in September, and then see a
comparison of other product sales in Florida in the same time period. To facilitate this kind of
analysis, OLAP data is stored in a multidimensional database. Whereas a relational
database can be thought of as two-dimensional, a multidimensional database considers each
data attribute (such as product, geographic sales region, and time period) as a separate
"dimension." OLAP software can locate the intersection of dimensions (all products sold in the
Eastern region above a certain price during a certain time period) and display
them. Attributes such as time periods can be broken down into sub attributes.
OLAP can be used for data mining or the discovery of previously undiscerned relationships
between data items. An OLAP database does not need to be as large as a data warehouse, since
not all transactional data is needed for trend analysis. Using Open Database Connectivity
(ODBC), data can be imported from existing relational databases to create a multidimensional
database for OLAP.
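A multidimensional view can be sketched with a plain dictionary keyed by (product, region, month), assuming a handful of made-up sales figures. The data, the `cube` name and the `total` helper are illustrative assumptions; a real OLAP engine stores and indexes its cubes very differently.

```python
from collections import defaultdict

# Made-up facts: (product, region, month, revenue)
sales = [
    ("beach ball", "Florida", "July", 120.0),
    ("beach ball", "Florida", "September", 45.0),
    ("sunscreen", "Florida", "July", 80.0),
    ("beach ball", "Texas", "July", 60.0),
]

# Each dimension combination is one cell of the cube.
cube = defaultdict(float)
for product, region, month, revenue in sales:
    cube[(product, region, month)] += revenue

def total(product=None, region=None, month=None):
    """Sum revenue over every cell matching the given dimension values.

    A dimension left as None is not restricted.
    """
    return sum(
        value
        for (p, r, m), value in cube.items()
        if product in (None, p) and region in (None, r) and month in (None, m)
    )

print(total("beach ball", "Florida", "July"))       # one cell
print(total("beach ball", "Florida", "September"))  # same product, another month
print(total(region="Florida", month="July"))        # all products in that slice
```

Fixing some dimensions and leaving others open is what lets the same data be viewed "from different points of view".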
OLTP and OLAP compared
Definition
- OLTP: OLTP stands for Online Transaction Processing and is a data modeling approach typically
used to facilitate and manage usual business applications. Most of the applications you see and
use are OLTP based.
- OLAP: OLAP stands for Online Analytical Processing and is an approach to answering
multidimensional queries. OLAP was conceived for Management Information Systems and Decision
Support Systems.
Typical queries
- OLTP: What is the salary of Mr. John? What is the address and email id of the person who is the
head of maths
- OLAP: How is the profit changing over the years across different regions? Is it financially
viable to continue the production unit at location X?
Data
- OLTP: An OLTP system deals with operational data. Operational data are those data involved in
the operation of a particular
- OLAP: OLAP deals with historical or archival data. Historical data are those data that are
archived over a long period of time.
Refresh
- OLTP: OLTP requires instant update. When you withdraw cash from an ATM, your balance must be
updated immediately.
- OLAP: OLAP does not require instant refresh. Nobody needs instant information to make a
strategic business decision.
Data Model & Schema
- OLTP: OLTP fits traditional entity-relationship or object-oriented models well. We usually
refer to information as attributes related to entities, objects or classes, like product price,
invoice amount or client name. Mapping can be with a simple, one argument
- OLAP: An OLAP solution uses a hybrid approach based on conventional relational technology.
This model employs a so-called star schema instead of traditional
Updates and consistency
- OLTP: The emphasis of OLTP is on update. Transaction-level isolation ensures that the database
is always in a consistent state. This can imply some overhead to coordinate concurrent updates,
but it is necessary even in small applications.
- OLAP: OLAP can be updated by periodic (daily) processes that work in standalone mode, thus
consistency can be assured through update
Example
- OLTP: In a banking system, you withdraw an amount through an ATM. The account number, ATM PIN,
amount withdrawn, balance in the account etc. are operational data.
- OLAP: If we collect the last 10 years of data about flight reservations, the data can give us
much meaningful information, such as trends in reservations, the peak time of travel, and what
kinds of people are traveling in the various classes.
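The contrast between the two workloads can be sketched with Python's built-in `sqlite3` module, assuming a made-up two-table bank schema. The table names, the `withdraw` function and the figures are illustrative; a production OLAP system would normally query a separate warehouse rather than the live transaction tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")
conn.execute("CREATE TABLE withdrawal (account_id INTEGER, year INTEGER, amount REAL)")
conn.execute("INSERT INTO account VALUES (1, 500.0)")
conn.commit()

def withdraw(account_id, year, amount):
    """OLTP-style work: a short transaction that updates the balance at once."""
    with conn:  # commits on success, rolls back if an exception is raised
        (balance,) = conn.execute(
            "SELECT balance FROM account WHERE id = ?", (account_id,)
        ).fetchone()
        if balance < amount:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?",
            (amount, account_id),
        )
        conn.execute(
            "INSERT INTO withdrawal VALUES (?, ?, ?)", (account_id, year, amount)
        )

withdraw(1, 2023, 100.0)
withdraw(1, 2024, 150.0)
withdraw(1, 2024, 50.0)

# OLTP-style question: one current value for one entity.
current_balance = conn.execute(
    "SELECT balance FROM account WHERE id = 1"
).fetchone()[0]

# OLAP-style question: a summary across accumulated history.
per_year = conn.execute(
    "SELECT year, SUM(amount) FROM withdrawal GROUP BY year ORDER BY year"
).fetchall()

print(current_balance, per_year)
```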
Q9. Write short note on Centralized processing and Decentralized processing (10M)
Centralized processing
Centralized processing environments maintain all data & perform all data processing
at a central location.
Mainframe & large server computing applications are examples of centralized
processing.
Decentralized (Distributed) Processing
Decentralized processing occurs when computing power, applications, & "work" are
spread out (or distributed) over many locations (i.e., via a LAN or WAN).
Decentralized processing environments often use distributed processing techniques,
where each remote computer performs a portion of the processing, thus reducing
the processing burden on a central computer.
Distributed systems are workstations placed in geographically remote locations &
linked to a centralized computer.
- Advantages of Centralized Processing
o Data is secured better, once received.
o Processing is consistent.
- Disadvantages of Centralized Processing
o High cost of transmitting large numbers of detailed transactions
o Increased processing power & data storage needs at a central location
o There is a reduction in local accountability.
o Input/output bottlenecks can occur at high traffic times.
o Lack of ability to respond in a timely manner to information requests from remote
locations.
Decentralized (Distributed) Processing
DDBMS has many advantages. Data is located near the greatest demand site, access is faster, processing
is faster due to several sites spreading out the work load, new sites can be added quickly and easily,
communication is improved, operating costs are reduced, it is user friendly, there is less danger of a
single-point failure, and it has process independence.
Reasons why businesses and organizations move to distributed databases include organizational
and economic reasons, reliable and flexible interconnection of existing databases, and future
incremental growth.
Data can physically reside nearest to where it is most often accessed, thus providing users with local
control of data that they interact with. This results in local autonomy of the data allowing users to
enforce locally the policies regarding access to their data.
One reason to consider a parallel architecture is to improve the reliability and availability of the data in
a scalable system. In a distributed system, with some careful design, it is possible to access some, or
possibly all, of the data in a failure mode if there is sufficient data replication.
DDBMS also has a few disadvantages.
Management and control are complex, and there is less security because data is at so many different sites.
Distributed databases provide more flexible access, which increases the chance of security violations
since the database can be accessed from every site within the network.
The ability to ensure the integrity of the database in the presence of unpredictable failures of both
hardware and software components is also an important feature of any distributed database
management system. The integrity of a database is concerned with its consistency, correctness,
validity, and accuracy. The integrity controls must be built into the structure of software, databases, and
involved personnel.
If there are multiple copies of the same data, then this duplicated data introduces additional complexity
in ensuring that all copies are updated for each update. The notions of concurrency control and
recoverability consume much of the research effort in the area of distributed database theory.
Increased reliability and performance are the goal, not the status quo.
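The replication trade-off can be sketched with a toy "write to all copies, read from any copy" store. This is a deliberately simplified assumption: the `ReplicatedStore` class and the site names are hypothetical, and the sketch ignores concurrency control and recovery, which real distributed databases must handle.

```python
class ReplicatedStore:
    """Toy store that keeps a full copy of the data at every site."""

    def __init__(self, sites):
        self.replicas = {site: {} for site in sites}
        self.down = set()  # sites currently unreachable

    def write(self, key, value):
        # Every copy must be updated, so a single failed site blocks writes.
        if self.down:
            raise RuntimeError("cannot update all copies: " + ", ".join(sorted(self.down)))
        for data in self.replicas.values():
            data[key] = value

    def read(self, key):
        # Any reachable copy will do, so reads survive site failures.
        for site, data in self.replicas.items():
            if site not in self.down:
                return data[key]
        raise RuntimeError("no site available")

store = ReplicatedStore(["site_a", "site_b"])
store.write("balance", 100)
store.down.add("site_a")      # simulate a failure at one site
print(store.read("balance"))  # still answered by the other copy
```

Reads remain available when one site fails, but keeping every copy current is exactly the extra complexity described above.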