Mining real world data Part-1
Objectives

The student will look at RDBMS concepts and also get
a brief overview of SQL.

The student shall be introduced to the basics and
challenges of mining web data.

The student shall be introduced to the basics and
challenges of mining multimedia data.
Implementations
1.
Identify and outline the differences between a DBMS
and OLAP. Explain why using OLAP for every kind of
application is not advisable. In what kinds of situations
can a DBMS prove to be the more effective choice? Apart
from SQL there are a few other query languages, such as
DMQL. Identify one such language and compare
it with SQL from your viewpoint.
Hints:

DBMS and OLAP differences were discussed earlier
in the Data warehousing unit. Please review them.

DMQL stands for Data Mining Query Language.

Explain your viewpoint by considering the
ease of understanding and the types of operators
available in each language.
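As a starting point for the comparison, the sketch below shows the kinds of operators SQL offers (selection, grouping, aggregation). It uses Python's built-in sqlite3 module only as a convenient way to run SQL. The sales table and its rows are hypothetical and invented purely for illustration.

```python
import sqlite3

# Hypothetical table and data, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("north", "pen", 10.0),
        ("north", "book", 25.0),
        ("south", "pen", 7.5),
        ("south", "book", 30.0),
        ("south", "pen", 2.5),
    ],
)

# Selection (WHERE), grouping (GROUP BY) and aggregation (SUM) are the
# kinds of operators to look at when comparing SQL with another language.
query = """
    SELECT region, SUM(amount) AS total
    FROM sales
    WHERE product = 'pen'
    GROUP BY region
    ORDER BY region
"""
totals = conn.execute(query).fetchall()
conn.close()
print(totals)
```

When comparing with a data mining query language, it may help to ask which of these operators have a counterpart there and which mining-specific constructs have no counterpart in SQL.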
2.
Social networks and online services such as Orkut, Facebook and Netflix
hold a lot of potentially useful data with many patterns. However,
this data is not directly ready for data mining. Describe
the challenges and the preprocessing steps that
must be taken to get this data ready. Also
identify two pattern mining tasks in particular that
you would like to conduct on these datasets.
Hints:

Data available in social networks is generally
very noisy, so this should be one of the primary
challenges.

Tasks can be identified based on community detection,
suggesting recommendations online, etc.
3.
Climate data prediction is a recent and active
area of research. Please survey online
the challenges associated with mining such
spatio-temporal data. What kinds of patterns can be
predicted? Also explain how informative such patterns
can be in real life.
Hints:

Go through the information available online
here (http://gopher.cs.umn.edu/).

Also read about approaches for dealing with
spatio-temporal data.
Resources:
http://www.galeas.de/webmining.html
Image similarity in mining
http://crl.research.compaq.com/vision/multimedia/similarity/default.htm
http://multimedia.software.informer.com/downloadmultimedia-miner-tool/
Glossary

RDBMS: Relational Database Management System

SQL: Structured Query Language

WWW: World Wide Web
Mining real world data Part-1
Objectives

Basic concepts and challenges faced when dealing
with spatial data are clearly explained.

The student will grasp the problems of handling data
streams and how data mining techniques should be applied to them.

The student will get an overview of mining
biological data and its main advantages.
Implementations
1.
Identify the different types of spatial data.
Consider any one available spatial dataset and
identify the challenges that arise when mining patterns
from that dataset.
2.
In a network stream scenario where a server is
accessed by many people, devise an algorithm
that clusters the users accessing the
server based on the time and frequency of access and
recommends advertisements to the users based on the
type of content they are accessing.
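One possible way to approach this exercise is sketched below. It is not the only answer and makes several assumptions: the log format (user, hour of day, content type), the two features per user (mean access hour and number of requests), the choice of k-means with two clusters, and the rule of advertising the most accessed content type per cluster are all illustrative choices. The hour of day is treated as a plain number, which ignores that 23:00 and 00:00 are close, and the features are not rescaled.

```python
from collections import defaultdict

def user_features(log):
    """Turn (user, hour_of_day, content_type) records into per-user features."""
    hours = defaultdict(list)
    content = defaultdict(lambda: defaultdict(int))
    for user, hour, ctype in log:
        hours[user].append(hour)
        content[user][ctype] += 1
    # Feature vector: (mean access hour, number of requests).
    feats = {u: (sum(h) / len(h), float(len(h))) for u, h in hours.items()}
    return feats, content

def kmeans(points, start_centers, iterations=10):
    """Plain k-means (Lloyd's algorithm) on 2-D points with given start centers."""
    centers = list(start_centers)
    assign = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        assign = [
            min(range(len(centers)),
                key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2)
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(len(centers)):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = (sum(m[0] for m in members) / len(members),
                              sum(m[1] for m in members) / len(members))
    return assign

def top_content_per_cluster(users, assign, content):
    """Pick the most accessed content type in each cluster as the ad category."""
    totals = defaultdict(lambda: defaultdict(int))
    for user, cluster in zip(users, assign):
        for ctype, n in content[user].items():
            totals[cluster][ctype] += n
    return {c: max(counts, key=counts.get) for c, counts in totals.items()}

# Hypothetical access log, invented for illustration.
log = [
    ("u1", 9, "news"), ("u1", 10, "news"), ("u1", 11, "sports"),
    ("u2", 10, "news"), ("u2", 9, "news"),
    ("u3", 22, "video"), ("u3", 23, "video"), ("u3", 21, "video"), ("u3", 22, "music"),
    ("u4", 23, "video"), ("u4", 22, "music"), ("u4", 21, "video"),
]

feats, content = user_features(log)
users = sorted(feats)
points = [feats[u] for u in users]
assign = kmeans(points, [points[0], points[-1]])
ads = top_content_per_cluster(users, assign, content)
print(dict(zip(users, assign)), ads)
```

For a real stream the log would not fit in memory, so the features would need to be maintained incrementally (for example over a sliding window) and the clusters refreshed periodically.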
3.
Consider a biological dataset. What kinds of data
mining techniques can be applied? How might the data
mining techniques vary based on the dataset considered?
4.
Share your thoughts on "Can we apply the same
algorithms used for spatial data mining, biological data
mining and network data mining?" Support your
statements.
Resources
International Cartographic Association ICA
http://icaci.org/
GIS Lounge
http://www.gislounge.com/
The greatest page of bioinformatics links in the world.. ever!
http://evol.nott.ac.uk/cmelun/links.html
Glossary

Data Stream: In telecommunications and computing, a
sequence of digitally encoded coherent signals (packets of
data) used to transmit or receive information
that is in transmission.

Spatial Data: Data pertaining to geographical
sciences.

Geographic Information: Information
created by manipulating spatial data in a computerized
system.

Bio-informatics: The application of statistics and
computer science to the field of molecular biology.