CUSTOMER_CODE
SMUDE
DIVISION_CODE
SMUDE
EVENT_CODE
JULY2016
ASSESSMENT_CODE MC0088_JULY2016
QUESTION_TYPE
DESCRIPTIVE_QUESTION
QUESTION_ID
5227
QUESTION_TEXT
Distinguish between the features of OLTP and OLAP.
SCHEME OF
EVALUATION
Users and system orientation: An OLTP system is customer oriented and
is used for transaction and query processing by clerks, clients and
information technology professionals. An OLAP system is market oriented
and is used for data analysis by knowledge workers, including managers,
executives and analysts. (2 marks)
Data contents: An OLTP system manages current data that typically are
too detailed to be easily used for decision making. An OLAP system
manages large amounts of historical data, provides facilities for
summarization and aggregation, and stores and manages information at
different levels of granularity. These features make the data easier to use
in informed decision making. (2 marks)
Database design: An OLTP system usually adopts an entity-relationship
data model and an application-oriented database design. An OLAP
system typically adopts either a star or snowflake model and a
subject-oriented database design. (2 marks)
View: An OLTP system focuses mainly on the current data within an
enterprise or department, without referring to historical data or data
in different organizations. In contrast, an OLAP system often spans multiple
versions of a database schema, due to the evolutionary process of an
organization. OLAP systems also deal with information that originates
from different organizations, integrating information from many data
stores. Because of their huge volume, OLAP data are stored on multiple
storage media. (2 marks)
Access patterns: The access patterns of an OLTP system consist mainly
of short, atomic transactions. Such a system requires concurrency
control and recovery mechanisms. However, accesses to OLAP systems
are mostly read-only operations, although many could be complex
queries. (2 marks)
(Total 10 marks)
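The contrast in database design and access patterns can be illustrated with a small sketch. It assumes Python's built-in sqlite3 module and a made-up star-style schema (the fact_sales and dim_item tables and their columns are hypothetical, not part of the scheme). For brevity, one in-memory database plays both roles, which a real OLTP/OLAP deployment would normally keep separate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical star-style schema: one dimension table, one fact table.
    CREATE TABLE dim_item (item_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE fact_sales (
        sale_id   INTEGER PRIMARY KEY,
        item_id   INTEGER REFERENCES dim_item(item_id),
        sale_year INTEGER,
        amount    REAL
    );
    INSERT INTO dim_item VALUES (1, 'books'), (2, 'music');
    INSERT INTO fact_sales (item_id, sale_year, amount) VALUES
        (1, 2015, 10.0), (1, 2016, 20.0), (2, 2016, 5.0);
""")

# OLTP-style access: a short, atomic transaction that records one sale.
with conn:  # commits on success, rolls back if an exception is raised
    conn.execute(
        "INSERT INTO fact_sales (item_id, sale_year, amount) VALUES (?, ?, ?)",
        (2, 2016, 7.5),
    )

# OLAP-style access: a read-only query that summarizes historical data
# at a coarser level of granularity (category and year).
olap_rows = conn.execute("""
    SELECT d.category, f.sale_year, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_item AS d ON d.item_id = f.item_id
    GROUP BY d.category, f.sale_year
    ORDER BY d.category, f.sale_year
""").fetchall()

print(olap_rows)
# [('books', 2015, 10.0), ('books', 2016, 20.0), ('music', 2016, 12.5)]
```

The insert touches a single current record and must be atomic, while the aggregate only reads and combines many historical rows.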
QUESTION_TYPE
DESCRIPTIVE_QUESTION
QUESTION_ID
5229
QUESTION_TEXT
What is Divisive clustering? Write algorithmic steps for the divisive
clustering.
SCHEME OF
EVALUATION
This variant of hierarchical clustering is called top-down clustering or
divisive clustering. We start at the top with all documents in one cluster.
The cluster is split using a flat clustering algorithm. This procedure is
applied recursively until each document is in its own singleton cluster. (2
marks)
Top-down clustering is conceptually more complex than bottom-up
clustering since we need a second, flat clustering algorithm as a
subroutine. It has the advantage of being more efficient if we do not
generate a complete hierarchy all the way down to individual document
leaves. For a fixed number of top levels, using an efficient flat algorithm
like k-means, top-down algorithms are linear in the number of
documents and clusters. (3 marks)
Algorithm: Divisive clustering starts by placing all objects into a single
group. Before we start the procedure, we need to decide on a threshold
distance. Once this is done, the procedure is: (1 mark)
1. The distance between all pairs of objects within the same group is
determined, and the pair with the largest distance is selected. (1 mark)
2. This maximum distance is compared to the threshold distance. (2 marks)
a. If it is larger than the threshold, this group is divided in two. This is
done by placing the selected pair into different groups and using them as
seed points. All other objects in this group are examined, and are placed
into the new group with the closest seed point. The procedure then
returns to step 1. (1 mark)
b. If the distance between the selected objects is less than the threshold,
the divisive clustering stops.
(Total 10 marks)
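The threshold-based procedure above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation. It assumes the objects are numeric tuples compared by Euclidean distance, and that a maximum distance exactly equal to the threshold also stops the procedure (the scheme does not cover that case). The function name divisive_clustering and the sample points are hypothetical.

```python
from itertools import combinations
from math import dist


def divisive_clustering(points, threshold):
    """Split points into groups until no within-group distance exceeds threshold."""
    # Groups hold indices into `points`; start with every object in one group.
    groups = [list(range(len(points)))]
    while True:
        # Step 1: distances between all pairs that share a group.
        candidates = [
            (dist(points[i], points[j]), gi, i, j)
            for gi, group in enumerate(groups)
            for i, j in combinations(group, 2)
        ]
        if not candidates:  # only singleton groups remain
            break
        d, gi, seed_a, seed_b = max(candidates, key=lambda c: c[0])
        # Step 2b: stop once the largest distance no longer exceeds the
        # threshold (treating equality as "stop" is an assumption).
        if d <= threshold:
            break
        # Step 2a: the selected pair become seed points of two new groups,
        # and every other member joins the group of its closest seed.
        left, right = [seed_a], [seed_b]
        for k in groups[gi]:
            if k in (seed_a, seed_b):
                continue
            if dist(points[k], points[seed_a]) <= dist(points[k], points[seed_b]):
                left.append(k)
            else:
                right.append(k)
        groups[gi:gi + 1] = [left, right]
    return [[points[k] for k in group] for group in groups]


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
clusters = divisive_clustering(points, threshold=5.0)
print(clusters)
# [[(0, 0), (0, 1)], [(10, 1), (10, 0)]]
```

Recomputing every pairwise distance on each pass keeps the sketch close to the wording of step 1. It is quadratic per pass, so it suits small examples only.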
QUESTION_TYPE
DESCRIPTIVE_QUESTION
QUESTION_ID
5231
QUESTION_TEXT
Explain the data warehouse models from the architecture point of view.
SCHEME OF
EVALUATION
Enterprise warehouse: An enterprise warehouse collects all of the
information about subjects spanning the entire organization. It provides
corporate-wide data integration, usually from one or more operational
systems or external information providers, and is cross-functional in
scope. It typically contains detailed data as well as summarized data, and
can range in size from a few gigabytes to hundreds of gigabytes,
terabytes or beyond. An enterprise data warehouse may be implemented
on traditional mainframes, UNIX superservers, or parallel architecture
platforms. It requires extensive business modeling and may take years to
design and build.
(3.5 marks)
Data mart: A data mart contains a subset of corporate-wide data that
is of value to a specific group of users. The scope is confined to specific
selected subjects. For example, a marketing data mart may confine its
subjects to customer, item, and sales. The data contained in data marts
tend to be summarized. Depending on the sources of data, data marts can
be categorized as independent or dependent. Independent data marts are
sourced from data captured from one or more operational systems or
external information providers, or from data generated locally within a
particular department or geographic area. Dependent data marts are
sourced directly from enterprise data warehouses.
(3.5 marks)
Virtual warehouse: A virtual warehouse is a set of views over
operational databases. For efficient query processing, only some of the
possible summary views may be materialized. A virtual warehouse is
easy to build but requires excess capacity on operational database
servers.
(3 marks)
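The idea of a virtual warehouse as views over operational data can be sketched with Python's built-in sqlite3 module. The orders table and the view and snapshot names are hypothetical. SQLite has no materialized views, so a snapshot table stands in for a materialized summary view here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical operational table.
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL);
    INSERT INTO orders (region, amount) VALUES
        ('north', 100.0), ('north', 50.0), ('south', 30.0);

    -- Virtual warehouse: a view evaluated against the operational table
    -- every time it is queried.
    CREATE VIEW v_sales_by_region AS
        SELECT region, SUM(amount) AS total FROM orders GROUP BY region;

    -- Stand-in for a materialized summary view: a stored snapshot.
    CREATE TABLE m_sales_by_region AS SELECT * FROM v_sales_by_region;
""")

# A new operational record arrives after the snapshot was taken.
conn.execute("INSERT INTO orders (region, amount) VALUES ('south', 20.0)")

view_rows = conn.execute(
    "SELECT region, total FROM v_sales_by_region ORDER BY region"
).fetchall()
snapshot_rows = conn.execute(
    "SELECT region, total FROM m_sales_by_region ORDER BY region"
).fetchall()

print(view_rows)      # [('north', 150.0), ('south', 50.0)]
print(snapshot_rows)  # [('north', 150.0), ('south', 30.0)]
```

The plain view always reflects current operational data but recomputes the summary on the operational server at query time, which is why a virtual warehouse needs excess capacity there. The snapshot answers quickly but is stale until refreshed.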
QUESTION_TYPE
DESCRIPTIVE_QUESTION
QUESTION_ID
117786
QUESTION_TEXT
Define Data Mining and DBMS. Differentiate between them.
SCHEME OF
EVALUATION
Data Mining, or knowledge discovery in databases as it is also known, is the
non-trivial extraction of implicit, previously unknown and potentially useful
information from the data.
Data mining is the search for the relationships and global patterns that exist
in large databases but are hidden among vast amounts of data, such as the
relationship between patient data and their medical diagnosis.
Data Mining is the process of discovering meaningful new correlations,
patterns and trends by sifting through large amounts of data stored in
repositories, using pattern recognition techniques.
(Any 1 definition 1 mark)
A DBMS is a "Database Management System". This is the software that
manages data on physical storage devices.
(1 mark)
Differences: (8 marks)
QUESTION_TYPE
DESCRIPTIVE_QUESTION
QUESTION_ID
117789
QUESTION_TEXT
What is data cleaning? Explain missing values method for data
cleaning.
SCHEME OF
EVALUATION
Data cleaning routines attempt to fill in missing values, smooth out
noise while identifying outliers, and correct inconsistencies in the
data. (2 marks)
Missing value methods are:
1. Ignore the tuple
2. Fill in the missing value manually
3. Use a global constant to fill in the missing value (4 marks)
4. Use the attribute mean to fill in the missing value
5. Use the attribute mean for all samples belonging to the same
class as the given tuple
6. Use the most probable value to fill in the missing value (4 marks)
QUESTION_TYPE
DESCRIPTIVE_QUESTION
QUESTION_ID
117792
QUESTION_TEXT
Explain the constraint based Association Mining.
SCHEME OF
EVALUATION
Knowledge type constraints
Data constraints
Dimensional/level constraints
Interestingness constraints
Rule constraints
5×2=10 marks