Mining real world data Part-1

Objectives
The student will review RDBMS concepts and get a brief overview of SQL.
The student will be introduced to the basics and challenges of mining web data.
The student will be introduced to the basics and challenges of mining multimedia data.

Implementations
1. Identify and outline the differences between DBMS and OLAP. Explain why using OLAP for every kind of application is not advisable. In what kinds of situations can a DBMS prove to be a much more effective choice? Apart from SQL there are a few other query languages, such as DMQL. Identify one such language and compare the two languages from your viewpoint.
Hints: The differences between DBMS and OLAP were discussed earlier in the Data warehousing unit; please go through them once. DMQL stands for Data Mining Query Language. Explain your viewpoint by considering the ease of understanding and the types of operators available in each language.
2. Social networks such as Orkut, Facebook and Netflix hold a lot of potentially useful data with many patterns. However, this data is not directly ready for data mining. Describe the challenges involved and the preprocessing steps that must be taken to get this data ready. Also identify two pattern mining tasks in particular that you would like to conduct on these datasets.
Hints: Data available in social networks is generally very noisy, so this should be one of the primary challenges. Tasks can be identified based on community detection, online recommendations, etc.
3. Climate data prediction is a recent and active area of research. Please survey online the challenges associated with mining such spatio-temporal data. What kinds of patterns can be predicted? Also explain how informative such patterns can be in real life.
Hints: Go through the information available online here (http://gopher.cs.umn.edu/). Also read about approaches for dealing with spatio-temporal data.
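As a starting point for Implementation 1, the following is a minimal sketch of the two query styles usually contrasted in a DBMS versus OLAP discussion: a keyed lookup of a single record (typical of day-to-day transactional use) and a grouped aggregation over many records (typical of analytical use). It uses Python's built-in sqlite3 module only for convenience; the `sales` table, its columns, and the function names are hypothetical and are not part of the assignment.

```python
import sqlite3


def build_demo_db():
    """Create a small in-memory database with a hypothetical sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales ("
        "id INTEGER PRIMARY KEY, region TEXT, product TEXT, amount REAL)"
    )
    rows = [
        (1, "north", "book", 12.0),
        (2, "north", "pen", 3.0),
        (3, "south", "book", 15.0),
        (4, "south", "book", 10.0),
    ]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def lookup_sale(conn, sale_id):
    """Transactional-style query: fetch one record by its key."""
    cur = conn.execute(
        "SELECT region, product, amount FROM sales WHERE id = ?", (sale_id,)
    )
    return cur.fetchone()


def totals_by_region(conn):
    """Analytical-style query: summarise many records along one dimension."""
    cur = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(lookup_sale(conn, 3))
    print(totals_by_region(conn))
```

Both queries run on the same engine here; the sketch only illustrates the difference in access pattern, not the storage or indexing differences between the two kinds of system.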
Resources:
http://www.galeas.de/webmining.html
Image similarity in mining: http://crl.research.compaq.com/vision/multimedia/similarity/default.htm
http://multimedia.software.informer.com/downloadmultimedia-miner-tool/

Glossary
RDBMS: Relational database management system
SQL: Structured Query Language
WWW: World Wide Web

Mining real world data Part-1

Objectives
The basic concepts and challenges of dealing with spatial data are explained.
The student will grasp the problems of handling data streams and how data mining techniques should be applied to them.
The student will get an overview of mining biological data and its main advantages.

Implementations
1. Identify the different types of spatial data. Choose any one available spatial dataset and identify the challenges that arise when mining patterns from it.
2. In a network stream scenario where a server is accessed by many people, devise an algorithm that clusters the users accessing the server based on time and frequency of access, and recommends advertisements to them based on the type of content they are accessing.
3. Consider a biological dataset. What kinds of data mining techniques can be applied? How can the techniques vary depending on the dataset considered?
4. Share your thoughts on "Can we apply the same algorithms used for spatial data mining, biological data mining and network data mining?" Support your statements.

Resources
International Cartographic Association (ICA): http://icaci.org/
GIS Lounge: http://www.gislounge.com/
The greatest page of bioinformatics links in the world.. ever! http://evol.nott.ac.uk/cmelun/links.html

Glossary
Data Stream: In telecommunications and computing, a sequence of digitally encoded coherent signals (data packets) used to transmit or receive information that is in the process of being transmitted.
Spatial Data: Data pertaining to geographical sciences.
Geographic Information: Information created by manipulating spatial data in a computerized system.
Bio-informatics: The application of statistics and computer science to the field of molecular biology.
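For the network stream exercise (clustering users by time and frequency of access, then recommending advertisements by content type), one possible starting point is sketched below. It is not a reference solution: it assumes each stream event is a hypothetical `(user, hour, category)` tuple, and it replaces a real clustering algorithm with a simple rule-based grouping on two features (majority time of day, and light or heavy usage) so that every event is processed once as it arrives. The threshold and all names are illustrative assumptions.

```python
from collections import Counter, defaultdict


class UserProfile:
    """Running summary of one user's accesses, updated one event at a time."""

    def __init__(self):
        self.hits = 0                # total number of accesses seen
        self.day_hits = 0            # accesses between 06:00 and 17:59
        self.categories = Counter()  # content categories the user accessed

    def update(self, hour, category):
        self.hits += 1
        if 6 <= hour < 18:
            self.day_hits += 1
        self.categories[category] += 1


def consume(stream):
    """Single pass over the event stream; only per-user summaries are kept."""
    profiles = defaultdict(UserProfile)
    for user, hour, category in stream:
        profiles[user].update(hour, category)
    return profiles


def cluster_key(profile, heavy_threshold=3):
    """Assign a user to a group by time of day and access frequency.

    heavy_threshold is an arbitrary, assumed cut-off for this sketch.
    """
    daypart = "day" if profile.day_hits * 2 >= profile.hits else "night"
    load = "heavy" if profile.hits >= heavy_threshold else "light"
    return (daypart, load)


def cluster_users(profiles):
    """Group user names by their cluster key."""
    clusters = defaultdict(list)
    for user, profile in profiles.items():
        clusters[cluster_key(profile)].append(user)
    return {key: sorted(users) for key, users in clusters.items()}


def recommend(profile):
    """Pick the advertisement category the user has accessed most often."""
    return profile.categories.most_common(1)[0][0]
```

A short usage example with made-up events:

```python
events = [
    ("alice", 9, "sports"), ("alice", 10, "sports"), ("alice", 14, "news"),
    ("bob", 23, "music"),
]
profiles = consume(events)
print(cluster_users(profiles))
print(recommend(profiles["alice"]))
```

A fuller answer would likely swap the rule-based grouping for a proper stream clustering method and would need to bound memory when the number of users is large.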