Metadata Harvesting in Detail

P2P Framework for Community Based Creation, Semantic Annotation, Sharing and Quality Assessment of Courseware for Higher Technical Education

Dr. B.D. Chaudhury & Dr. Hemant Darbari
CSED, MNNIT-Allahabad & Applied Artificial Intelligence Group, CDAC-Pune
C-DAC/AAIG/Pune & MNNIT, Allahabad
www.cdac.in
© C-DAC & MNNIT 2010

Our Software Architecture

Layer 1: Distributed and Federated Database
It contains:
• Meta-data base
• Ontology base
• Knowledge Resource base
• Access log
• Base for user profiles
It also contains:
• Publication base
• Subscription base
• Base for event brokering

Layer 2: Publish/Subscribe, Overlay Layer
It has three sub-layers:
• Sub-layer 1: Overlay sub-layer
• Sub-layer 2: Community Management sub-layer
• Sub-layer 3: Publish/Subscribe sub-layer

Layer 3: Service Layer
Provides services for:
• Distributed Ontology Creation
• Metadata Harvesting
• Inference Engine
• Multilingual Subscription/Publication Support

Modules Involved for ACM Paper Simulation
• Metadata Extraction
• Metadata Harvesting
• Ontology Creation
• Knowledge Resource Creation & Semantic Net
• Inference Engine
• Multilingual Search Support

ACM Paper on Data Mining: Example for Process Simulation
Excerpts from the ACM paper "Differentiating data- and text-mining terminology"

When a new discipline emerges it usually takes some time and lots of academic discussion before concepts and terms get standardised. Such a new discipline is text mining. In a groundbreaking paper, Untangling text data mining, Hearst [1999] tackled the problem of clarifying text-mining concepts and terminology. This essay aims to build on Hearst's ideas by pointing out some inconsistencies and suggesting an improved and extended categorisation of data- and text-mining techniques. The essay is a conceptual study. A short overview of the problems regarding text-mining concepts is given. This is followed by a summary and critical discussion of Hearst's attempt to clarify the terminology. The essence of text mining is found to be the discovery or creation of new knowledge from a collection of documents. The parameters of non-novel, semi-novel and novel investigation are used to differentiate between full-text information retrieval, standard text mining and intelligent text mining. The same parameters are also used to differentiate between related processes for numerical data and text metadata. These distinctions may be used as a road map in the evolving fields of data/information retrieval, knowledge discovery and the creation of new knowledge.

Authors
• Jan H. Kroeze, Department of Informatics, School of IT, University of Pretoria, Pretoria, 0002
• Machdel C. Matthee, Department of Informatics, School of IT, University of Pretoria, Pretoria, 0002
• Theo J. D. Bothma, Department of Information Science, School of IT, University of Pretoria, Pretoria, 0002
Sponsors
• Microsoft
• ACM: Assoc. for Computing Machinery
Publisher
• South African Institute for Computer Scientists and Information Technologists, Republic of South Africa

ACM Paper Example Simulation through Overall Architecture
[Figure: overall architecture. Labels, in reading order: Publication Base, Subscription Base, Event Brokering Base; Publish (Layer 2); Metadata Extraction, Metadata Harvesting, Ontology Creation (Layer 3); Knowledge Resource, Ontology Base, Knowledge & Semantic Net, User Access History, User Profile; Inference Engine; POS Tagging, Phrase Marking, Ontological Analysis, Parsing, Semantic Analysis, Search & Retrieval, Searchable Tokens (Layer 3); Publish/Subscribe (Layer 2) across Node 1, Node 2, ..., Node n; GUI (Layer 4) for Publication, Subscription, Search & Notification; Notify (Layer 2); Moderator Validation, Verification & Updation; Distributed & Federated DB (Layer 1).]

Metadata Extraction
Automatic | Semi-Automatic Metadata Extraction from ACM Paper

Index:                               1
File Digest (Hash ID):               A1B2C3…..DD4
Title:                               Differentiating data- and text-mining terminology
Author:                              1. Jan H. Kroeze
                                     2. Machdel C. Matthee
                                     3. Theo J. D. Bothma
Keywords:                            IR, algorithms, database queries, documentation, full-text retrieval, information retrieval, knowledge creation, knowledge discovery, knowledge management, languages, measurement, metadata, text data mining, text mining, text-mining, theory
Department:                          Department of Informatics, School of IT, University of Pretoria, Pretoria, 0002
Publisher:                           South African Institute for Computer Scientists and Information Technologists, Republic of South Africa
Sponsor:                             Microsoft; Assoc. for Computing Machinery (ACM)
No. of downloads in last 12 months:  336
Citation count:                      0

Metadata Harvesting & Knowledge Resources Extraction from ACM Paper
Terms harvested from the abstract above:
• IR
  – full-text retrieval
  – information retrieval
  – database queries
• knowledge management
  – knowledge creation
  – knowledge discovery
• text mining
  – metadata
  – text data mining

Knowledge Net and Semantic Net of Extracted Data from ACM Paper
[Figure: knowledge net and semantic net. Labels: Data Processing, Information Retrieval, Semantic Search, Knowledge Acquisition, Knowledge Creation, Information Creation, Information Dissemination, Knowledge Management, Text Data Mining, Text Mining, Meta Data Extraction, Text Processing.]

Ontology of Concepts Creation on ACM Paper on Data Mining
Domain Ontology Creation on ACM Paper on Data Mining
[Figure: domain ontology. Concepts: Knowledge Discovery, Knowledge Creation, Instance Processing Task, Feature Value Computation Task, Data Processing Task, Feature Processing, Text Mining, Data Mining, Clustering Task, Descriptive Modeling Task, Subgroup Discovery Task, Probability Estimation Task, Association Discovery Task, Instance Normalization Task, Pattern Discovery Task, SVM Model, Support Vector Machine, Information Retrieval. Relations: is-a, has, has model parameter, produces.]

Details of User Access Pattern & History
An Input for Behavior Mining for Dynamic Community Creation and Quality Assessment of Courseware on ACM Paper

User Profile & Access History on ACM Paper
• User 1 (Professor: John Smith), Comment 1: A conceptual essay on Text Mining, a must read for beginners.
• User 2 (Researcher: Mary Susan), Comment 2: The paper discusses Hearst's attempt to clarify concepts in Text Mining like text metadata, standard text mining etc.

User    User Role     Area-of-interest    Comments
1       Professor     Data Mining         Comments1
2       Researcher    Data Mining         Comments2
. . .
n

NLP Process Description for Multilingual Semantic Search and Retrieval on ACM Paper

NLP based Multilingual Semantic Search using Inference Engine
The ACM paper on Data Mining taken as example can be searched with an English or Hindi query, using three types of queries:
• Content level search
• Meta data search
• Ontology search

Examples of Queries on ACM Paper on Data Mining
• Query on Metadata: Jan H Kroeze's paper on Data Mining.
• Query on Ontology: Paper in NLP related to Information Retrieval.
• Query on Content: Information Retrieval paper with Information Dissemination.

Query on Metadata
Query: Jan H Kroeze's paper on Data Mining
[Figure: query flow. Labels: POS Tagging, Phrase Marking, Ontology Analysis, Anaphora Resolution, Parsing, Semantic Analysis; Behavioral Mining; Decision Support System (DSS); Inference Engine; Metadata Query; Query Attribute Checking; Multi-lingual Semantic Search; Notified; Distributed & Federated DB holding Metadata index, Semantic Knowledge Net, Ontology and User Profile/Access History.]
Query attributes checked: Author = Jan H Kroeze; Title = Differentiating data- and text-mining terminology.
Matching entry in the metadata index: record 1, as listed under Metadata Extraction above.

Query on Ontology
Query: Paper related to IR in NLP
[Figure: query flow. Labels: POS Tagging, Phrase Marking, Anaphora Resolution, Ontology Analysis, Parsing, Semantic Analysis; Behavioral Mining; Decision Support System (DSS); Inference Engine; Ontology Query; Query Attribute Checking; Notified; Distributed & Federated DB holding Metadata Index, Semantic Knowledge Net, Ontology and User Profile/Access History.]
Ontology entry retrieved:
Domain name:        ACM Paper on Data Mining
Ontology concepts:  Knowledge Management, Knowledge Discovery, Knowledge Creation, Text Mining, Text Data Mining, Metadata, Information Retrieval, Full Text Retrieval
Ontology tree:      [Figure: tree over the concepts listed above]

Query on Content
Query: Paper related to Information Retrieval in NLP
[Figure: query flow. Labels: POS Tagging, Phrase Marking, Anaphora Resolution, Ontology Analysis, Parsing, Semantic Analysis; Behavioral Mining; Decision Support System (DSS); Inference Engine; Ontology Query; Query Attribute Checking; Notified; Knowledge Net with synonym links among Data Processing, IR, Information Retrieval and Semantic Search; Distributed & Federated DB holding Metadata, Semantic Knowledge Net, Ontology and User Profile/Access History.]

ID    Concept                  Synonym                Semantic index
1     Information Retrieval    Semantic Search        12.0
2     Information Retrieval    IR                     12.1
3     Text Mining              Metadata Extraction    16.0
4     Text Mining              Text Processing        16.1

Distributed Ontology Creation Details
• Domain-specific ontology creation by ontology experts in the P2P community in a distributed fashion.
• The domain of content is Computer Science oriented research papers and research notes of different file types.

Distributed Ontology Creation
• The above points can be enacted at large scale without third-party web servers that require periodic maintenance, because our P2P network is autonomous and maintenance free.
• This distributed and collaborative manner of creating ontologies serves search enhancement, knowledge enhancement and quality assessment.

Why Distributed Ontology Creation
• The ontology is created from the published content or resources as well as from user profiles and usage patterns (a component of behavior mining).
• The metadata or references are fetched using portal-level, community-level and user-level information. The user-oriented (personalization) ontology is created as shown in the figure.

An Example of Distributed Ontology

Domain ontology example in engineering domain

Metadata Harvesting in Detail
In interest-based communities, some experts may come forward to work collaboratively on knowledge resource generation and sharing, that is, metadata harvesting.

Users may take the role of annotators, enriching the knowledge resources with metadata to support advanced search functionality and quality assessment. Some members of the community can work as ontology experts to generate domain-specific ontology in a distributed fashion.
Metadata Harvesting in Detail: Metadata harvesting process

Advantage of metadata harvesting
• A P2P network is more suitable for large-scale enactment of the above activities without the need for third-party Web servers, which often require considerable management and maintenance effort, whereas P2P networks operate in an autonomous and spontaneous way with minimal management overhead.
• The "harvesting process" relies on the metadata produced by humans or by fully or semi-automatic processes supported by software.
• For example, Web editing software and selected document software automatically produce metadata for "format," "date of creation," and "revision date" at the time a resource is created or updated, without human intervention.
• Software can also support a semi-automatic approach to metadata creation by presenting a person with a "template" that guides the manual input of "keywords" and "description" metadata, and additional metadata.
• The software automatically converts the metadata to META tags (or another tagged form depending on the document format) and places them in the resource header.
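The last two steps above (template input, then conversion to META tags placed in the resource header) can be illustrated with a minimal Python sketch. It assumes an HTML resource with a literal `<head>` tag; the function names `to_meta_tags` and `place_in_header` and the sample field values are hypothetical and do not come from the C-DAC/MNNIT system.

```python
from html import escape


def to_meta_tags(metadata: dict) -> str:
    """Render each metadata field as one HTML META tag (one tag per line)."""
    lines = []
    for name, value in metadata.items():
        # Escape quotes and angle brackets so values cannot break the markup.
        lines.append(
            f'<meta name="{escape(name, quote=True)}" '
            f'content="{escape(value, quote=True)}">'
        )
    return "\n".join(lines)


def place_in_header(html: str, metadata: dict) -> str:
    """Insert the rendered tags directly after the opening <head> tag."""
    marker = "<head>"
    pos = html.find(marker)
    if pos == -1:
        raise ValueError("document has no <head> element")
    insert_at = pos + len(marker)
    return html[:insert_at] + "\n" + to_meta_tags(metadata) + html[insert_at:]


# Automatically produced fields (assumed values) merged with template input.
automatic = {"format": "text/html", "date of creation": "2010-01-01"}
manual = {"keywords": "text mining, information retrieval"}
page = place_in_header(
    "<html><head><title>Paper</title></head></html>", {**automatic, **manual}
)
```

A harvester can later read these tags back from the header without parsing the body of the resource, which is what makes the tagged form useful for harvesting.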
Metadata Harvesting in Detail: Different sources of meta-data harvesting

Metadata Harvesting in Detail: Metadata Harvesting process

Theoretical and Architecture Details of Multi-lingual Semantic Search with Inference Engine (i-Quester)

Overall System Features of i-Quester: Multilingual Semantic Search with Inference Engine
• Contains semantic search for the most relevant retrieval of data from the distributed information of peers.
• Semantic search is empowered with a strong domain ontology and metadata lineage, so that even in a pragmatic context the query relates to all inter-linked information, like a network of language connotations, called the semantic-net.
• The inputs for semantic search are inter-dependent with the inference engine and the domain ontology.

Overall System Features of i-Quester: The Multilingual Semantic Search
• Behavioral patterns of users are auto-analyzed to form a semantic index, so that the most relevant results are found even in the most distant and remotely referenced information in the distributed peer architecture of nodes and super nodes.
• Multi-lingual query handling and retrieval.
• Text, audio, image and video formats are supported for semantic search.
• Strong Decision Support System (DSS) features and components of AI models are integrated in the Inference Engine to aid i-Quester.

Architecture of i-Quester on Distributed Courseware

i-Quester Components
i-Quester has the following components:
• Inference Engine
• Multi-lingual Semantic Search

i-Quester Layers Overview

Inference Engine Nuances

Inference Engine
• An inference engine is a computer program that tries to derive answers from a knowledge base.
• It is the "brain" that expert systems use to reason about the information in the knowledge base for the ultimate purpose of formulating new conclusions.
• The inference engine is based on domain-specific ontology and the semantic web.
• The inference engine is based on the behavioral analysis of users.
• It consists of semantic-pragmatic connotations, such as usage context, on inferences.
• User profile collection and comparison is one of the intrinsic features of inference.
• It provides support for multi-lingual semantic search.

Inference Engine
• Consists of three components: Anaphora Resolution, Semantic Annotation and AI based Decision Support System.
• Anaphora resolution: it connects references across the metadata, the content-level knowledge-net and semantic-net, and the ontologies, so that the three levels of search (metadata search, ontology search and content search) work in an inter-connected fashion.
[Figure labels, in order: Input data, Anaphora Resolution, Semantic annotation, AI based DSS, Inference]
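The three components named above can be pictured as a chain of stages applied to the input data. The Python sketch below is only an illustration under strong assumptions: each stage is a toy lookup, the stage functions and the `infer` driver are hypothetical names, and the alias and synonym tables merely reuse concept names from the ACM paper example. The real anaphora resolution and DSS logic are not described in the slides.

```python
from typing import Dict, List

# Toy lookup tables (assumed); entries reuse concepts from the ACM example.
ALIASES: Dict[str, str] = {"IR": "Information Retrieval"}
SYNONYMS: Dict[str, List[str]] = {
    "Information Retrieval": ["Semantic Search", "IR"],
    "Text Mining": ["Metadata Extraction", "Text Processing"],
}


def anaphora_resolution(record: dict) -> dict:
    """Replace each mention with the concept it refers to (alias lookup)."""
    concepts = [ALIASES.get(m, m) for m in record["mentions"]]
    return {**record, "concepts": concepts}


def semantic_annotation(record: dict) -> dict:
    """Attach the known synonyms to every resolved concept."""
    annotations = {c: SYNONYMS.get(c, []) for c in record["concepts"]}
    return {**record, "annotations": annotations}


def decision_support(record: dict) -> dict:
    """Stand-in for the AI based DSS: collect the terms worth searching."""
    terms = set(record["concepts"])
    for synonyms in record["annotations"].values():
        terms.update(synonyms)
    return {**record, "search_terms": sorted(terms)}


STAGES = [anaphora_resolution, semantic_annotation, decision_support]


def infer(input_data: dict) -> dict:
    """Run the input through the three stages in order."""
    record = input_data
    for stage in STAGES:
        record = stage(record)
    return record


result = infer({"mentions": ["IR", "Text Mining"]})
```

Keeping each stage as a function from record to record makes it easy to swap the toy lookups for real components later without changing the driver.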
Inference Engine
Semantic annotation: semantic annotations are fetched from two layers:
• The knowledge-net and semantic-net, comprising content-level information with its synonyms, hyponyms, hypernyms etc.
• Semantic annotations drawn from references related to the connection between metadata and the ontologies created out of the contents.

Inference Engine Details
AI (Artificial Intelligence) based DSS (Decision Support Systems):
• The behavior pattern of users is derived from usage patterns, profile analysis and the information drawn from the other two layers of the inference engine, i.e. Semantic Annotation and Anaphora Resolution. Based on this, two significant things are achieved:
  - Dynamic community creation
  - Quality assessment of contents

Inference Engine Details
• The inference engine takes input from behavior mining.
• The inference engine as described above uses behavior mining, analyzing user data patterns and usage.
• The inputs of behavior mining are used in semantic search as well, for locating content from the user's search query. The interlinking of semantic search and the inference engine is shown in the following diagram.

Inference Engine Details
The following is an example of the inference engine working over LOM (Learning Object Metadata):
• It draws information from user queries and matches it with the Learning Object Metadata (LOM) ontology, using the semantic-net and knowledge-net (ontology and concept mapping), and finally creates metadata as the output of the inference engine. This can be searched by the user as well as used to refine user queries.
Example:
• What is the most searched article for NLP with reference to search?
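A question like the example above could, in principle, be answered by counting entries in the access log. The following Python sketch assumes a simplified log of `(user, topic, article)` tuples; that layout, the function name `most_searched` and the sample entries are hypothetical and only illustrate the idea.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

LogEntry = Tuple[str, str, str]  # (user, topic, article title), assumed layout


def most_searched(log: Iterable[LogEntry], topic: str) -> Optional[str]:
    """Return the article searched most often under the given topic."""
    counts = Counter(article for _user, t, article in log if t == topic)
    if not counts:
        return None  # nobody has searched this topic yet
    return counts.most_common(1)[0][0]


# Hypothetical access history for illustration only.
access_log = [
    ("user1", "NLP", "Differentiating data- and text-mining terminology"),
    ("user2", "NLP", "Differentiating data- and text-mining terminology"),
    ("user2", "NLP", "Untangling text data mining"),
    ("user3", "Databases", "Some other article"),
]
answer = most_searched(access_log, "NLP")
```

The resulting answer is itself a piece of derived metadata, so it could be stored and searched like any other metadata record.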
Inference Engine Architecture

Inference Engine Process Flow
LOM: Learning Object Metadata

Behavior Mining with Dynamic Community Creation and Content Quality Assessment Nuances

The Overall Architecture of Dynamic Community Evolution for P2P with Behavior Mining

Behavior Mining: Behavior Mining Architecture

User Access History: Architecture of User Access History
[Figure labels: User 1 ... User n, Application, Logging, Log files, Processing data, User Identification, Pattern Discovery, Pattern Analysis, Rules, Patterns & Statistics]

Access Pattern History & User Profile
• Access Pattern History mainly focuses on the 'demand side' of semantic search, i.e. interpreting user queries and studying users' information needs.
• Maintaining logs of user behavior: browsing patterns and transaction data.
• Assigning each search query to one or more predefined categories based on its topic, to provide better search results in terms of efficiency and accuracy.
• Developing tools like Pattern Discovery and Pattern Analysis.
• Maintaining information about users' educational background and their interest areas.
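Assigning a search query to one or more predefined categories, as listed above, can be approximated by keyword overlap. The Python sketch below is an assumption-laden illustration: the category names are borrowed from the ACM paper example, while the keyword sets and the function name `categorize` are invented for the sketch and are not the system's actual classifier.

```python
from typing import Dict, List, Set

# Predefined categories with hand-picked trigger words (assumed values).
CATEGORIES: Dict[str, Set[str]] = {
    "Information Retrieval": {"retrieval", "search", "query", "ir"},
    "Text Mining": {"text", "mining", "metadata"},
}


def categorize(query: str, categories: Dict[str, Set[str]] = CATEGORIES) -> List[str]:
    """Return every category whose trigger words overlap the query tokens."""
    tokens = set(query.lower().split())
    return [name for name, keywords in categories.items() if tokens & keywords]


labels = categorize("full text retrieval")
```

A query may fall into several categories at once, which matches the "one or more predefined categories" wording; a query with no overlap simply gets an empty list.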
Behavior Mining Attributes and Determining Factors

Behavior Mining on Courseware Quality Assessment

Courseware Quality Assessment with Behavior Mining

Behavior Mining on LOM

Multi-lingual Semantic Search

Features of Semantic Search
• Semantic search uses an NLP search engine that searches over the domain ontology and the semantic annotations created by the inference engine, for apt information linking and retrieval in a distributed network, traversing through sub/super layers.
• Cross-lingual support with a sense translator for multi-lingual queries.
• Along with text, audio, image and video retrieval are also facilitated.
• Relevancy ranking and linking of retrieved information through semantic-pragmatic annotations.

Multi-lingual Semantic Search Architecture

Multilingual Search with Ontology: Ontology based Multi-lingual Search Process

Multi-lingual Audio-video Search

Linguistic Process in NLP with Video Search

Multilingual Search Support
• The system takes a formal query as input. This query could be generated from a keyword query, a natural language query, a form-based interface where the user can explicitly select ontology classes and enter property values, or more sophisticated search interfaces.
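One way to picture the formal query described above is as a small data structure that different front ends fill in. The Python sketch below is hypothetical throughout: the `FormalQuery` fields, the `KNOWN_CLASSES` table and both builder functions are assumptions made for illustration, since the slides do not define the query format.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class FormalQuery:
    """Assumed shape of a formal query; not taken from the original system."""
    ontology_classes: List[str] = field(default_factory=list)
    property_values: Dict[str, str] = field(default_factory=dict)
    free_terms: List[str] = field(default_factory=list)


# Phrases that map onto ontology classes (toy table for the sketch).
KNOWN_CLASSES = {
    "text mining": "Text Mining",
    "information retrieval": "Information Retrieval",
}


def from_keywords(text: str) -> FormalQuery:
    """Build a formal query from a plain keyword query."""
    lowered = text.lower()
    classes = [label for phrase, label in KNOWN_CLASSES.items() if phrase in lowered]
    remaining = lowered
    for phrase in KNOWN_CLASSES:
        remaining = remaining.replace(phrase, " ")  # keep only unmatched words
    return FormalQuery(ontology_classes=classes, free_terms=remaining.split())


def from_form(classes: Iterable[str], properties: Dict[str, str]) -> FormalQuery:
    """Build a formal query from explicit form input."""
    return FormalQuery(ontology_classes=list(classes), property_values=dict(properties))


keyword_query = from_keywords("Kroeze text mining paper")
form_query = from_form(["Text Mining"], {"author": "Jan H. Kroeze"})
```

Because both front ends produce the same structure, the search back end needs to understand only one input format regardless of how the user expressed the query.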
Multilingual Search Support
The NLP-oriented semantic search with multi-lingual support will have the following features on the query side:
• Ontology cross-linking in English and Hindi, so that results can be fetched whether the query is given in Hindi or English.
• Content-level matching using NLP: the knowledge-net/semantic-net will contain words with all their possible meanings and links.
• Metadata will also have cross-linkages for multi-lingual queries.

Multilingual Search Support
The semantic search at the multi-lingual level comprises semantic-level query analysis and retrieval, as described in the following diagrams:
• It consists of the stages of semantic search with all the NLP layers involved, i.e. POS tagging, phrase marking, ontology analysis, parsing and semantic analysis.
• It is also connected to inputs from the ontology, metadata and inference engine.
• The output comprises searchable tokens and annotations to be fetched from the annotated extracted information (metadata, ontology, and content-level knowledge-net and semantic-net).

Multilingual Search Support: NLP Process Involved
• POS Tagging
• Phrase Marking
• Ontology Analysis
• Parsing
• Semantic Analysis

Linguistic Process in NLP Search

Thank You!
