D.J Hani Mary Shenih* et al. /International Journal of Pharmacy & Technology
ISSN: 0975-766X
CODEN: IJPTFI
Research Article
Available Online through
www.ijptonline.com
SURVEY OF WEB CONTENT MINING AND RELATION EXTRACTION TECHNIQUES
D.J Hani Mary Sheniha (a), A. Ezil Sam Leni (b)
(a) Research Scholar, Department of Computer Science and Engineering, Sathyabama University, Chennai, TamilNadu 600119, India.
(b) Professor & Head, Department of Computer Science and Engineering, SRR Engineering College, Padur, Chennai 603103, India.
Email:[email protected]
Received on: 20.10.2016
Accepted on: 25.11.2016
Abstract
The World Wide Web contains many types of semantic relations that hold between diverse entities. Over the years the
World Wide Web has become overloaded with information, and it has become hard to retrieve information according to
need. This problem is addressed using web mining techniques. The web contains structured, unstructured, semi-structured
and multimedia data, and all relation extraction techniques require knowledge of the semantic relations involved.
Traditional relation extraction methods require predefined relations and relation-specific human-tagged examples. The
minimally supervised novel relation extraction method uses multiple source relations to learn a relational classifier for
a target relation. Handling entities that are not related by multiple semantic relations is a challenge. This paper analyzes
various techniques available in web content mining and supervised semantic relation extraction. Finally, it notes that
techniques should be proposed to handle entities not related by multiple semantic relations.
Keywords: Semantic relation extraction; Web mining; Supervised novel extraction.
1. Introduction
Web mining integrates information gathered by traditional data mining technologies with information gathered over the
World Wide Web by various methods and techniques. Web mining is an important application of data mining. It is used to
extract knowledge from web data such as web documents, the hyperlinks between documents, and the usage logs of
websites. Web mining can be viewed from two different perspectives: the process centric view and the data centric
view 1,2.
IJPT| Dec-2016 | Vol. 8 | Issue No.4 | 22996-23009
The process centric view describes web mining as a sequence of tasks, and the data centric view defines web mining by
the type of web data used during the mining process. Based on the type of data mined, web mining can be divided into
three broad categories: Web structure mining, Web content mining and Web usage mining. Each category has its own
distinct functionality. Web content mining is a technique for extracting useful data, information and knowledge from the
content of web pages. It can also be used to scan and mine the text, graphs and pictures of a web page to determine its
relevance to a search query. There are two views of web content mining: the database view and the information retrieval
view 3,4. The database view aims at better querying and information management on the web. In this view, mining tries
to infer the structure of the web site in order to convert the web site into a database. The information retrieval view
deals with semi-structured and unstructured data. For semi-structured data, the hyperlink structure between documents
and the HTML structure within the web documents are used for representation.
Web structure mining generates a structural summary of web pages and websites. It mainly focuses on the inter-document
structure of the hyperlinks in web pages and websites. The relationships and similarities among various websites and web
pages are derived using web structure mining. It is closely related to web content mining, because web sites and web
documents contain links and both use the primary (real) data on the web. HITS and PageRank are two important
techniques used in Web structure mining. Web usage mining collects Web access information for Web pages.
Information about the paths leading to accessed web pages is collected automatically into access logs by the web server.
The resulting data describes the usage pattern of web pages, including the IP address, the time and date of access, and
the pages referenced.
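As a rough illustration of the access-log fields mentioned above (IP address, time and date of access, page reference), the following sketch parses a single log line. The Common Log Format, the regular expression, the function name parse_access_log_line and the sample line are all assumptions made for illustration; the survey does not specify a log format.

```python
import re
from datetime import datetime

# Assumed Common Log Format: host ident user [time] "request" status size
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

def parse_access_log_line(line):
    """Return IP address, access time, page and status of one log line, or None."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    # Example timestamp layout: 20/Oct/2016:13:55:36 +0530
    timestamp = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return {
        "ip": match.group("ip"),
        "time": timestamp,
        "page": match.group("path"),
        "status": int(match.group("status")),
    }

# Hypothetical sample line, not taken from any real server.
sample = '192.0.2.1 - - [20/Oct/2016:13:55:36 +0530] "GET /index.html HTTP/1.1" 200 2326'
record = parse_access_log_line(sample)
print(record["ip"], record["time"].isoformat(), record["page"])
```

Records of this kind are the raw material of web usage mining; grouping them by IP address and ordering them by time would give one simple approximation of the access paths described above.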
2. Web Content Mining5,6
A mechanism has been proposed for detecting adult accounts on Twitter using an iterative social based classifier. Adult
Twitter accounts are frequently connected with ordinary accounts and post many ordinary entities, which fills the graph
with noisy links. A novel graph based classification technique called the Iterative Social based Classifier (ISC) is used
to address the problem posed by these noisy links. Evaluation on large scale real world Twitter data showed that ISC can
attain acceptable performance in adult account detection by labeling a small number of popular Twitter accounts. This
method can be implemented on Twitter by changing keywords and can be applied to other social networks.
A mechanism has also been proposed 7 for measuring semantic similarity between words using four word co-occurrence
measures. Page counts and snippets are retrieved from a web search engine. The semantic relations between the words are
extracted with a lexical pattern extraction algorithm. A Support Vector Machine (SVM) is used to find the optimal
combination of clusters of lexical patterns and page-count-based co-occurrence measures. The precision, recall and
F-score values show that the proposed method gives the best results compared with other techniques for measuring
semantic similarity between words.
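The page-count-based co-occurrence measures can be sketched as follows. This is a minimal sketch, assuming the commonly cited page-count forms of the four measures named in Table 1 (WebJaccard, WebOverlap, WebDice, WebPMI); the cut-off c, the index size n and the page counts in the example are hypothetical values, and h_p, h_q, h_pq stand for the page counts of word P, word Q and the conjunctive query "P AND Q".

```python
import math

def web_jaccard(h_p, h_q, h_pq, c=5):
    """Jaccard coefficient on page counts; 0 when the joint count is below the cut-off."""
    if h_pq <= c:
        return 0.0
    return h_pq / (h_p + h_q - h_pq)

def web_overlap(h_p, h_q, h_pq, c=5):
    """Overlap (Simpson) coefficient on page counts."""
    if h_pq <= c:
        return 0.0
    return h_pq / min(h_p, h_q)

def web_dice(h_p, h_q, h_pq, c=5):
    """Dice coefficient on page counts."""
    if h_pq <= c:
        return 0.0
    return 2 * h_pq / (h_p + h_q)

def web_pmi(h_p, h_q, h_pq, n, c=5):
    """Pointwise mutual information, with n the assumed number of indexed pages."""
    if h_pq <= c:
        return 0.0
    return math.log2((h_pq / n) / ((h_p / n) * (h_q / n)))

# Hypothetical page counts for two words and their conjunction.
h_p, h_q, h_pq, n = 1000, 500, 100, 10**6
print(web_jaccard(h_p, h_q, h_pq))
print(web_overlap(h_p, h_q, h_pq))
print(web_dice(h_p, h_q, h_pq))
print(web_pmi(h_p, h_q, h_pq, n))
```

The cut-off guards against word pairs that co-occur on only a handful of pages by chance. In the method described above, scores of this kind are combined with lexical pattern clusters as inputs to the SVM rather than used on their own.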
A social network extraction system called POLYPHONET has been proposed 7. This system uses a number of advanced
techniques to extract relations between persons, detect groups of persons, and obtain keywords for a person. The related
methods are reduced to simple pseudocode using Google so that integrated systems can be built.
Several new algorithms are developed for social network mining, namely to make extraction scalable, to obtain and use
person-to-word relations, and to classify relations into categories, and every module is implemented in POLYPHONET.
A limitation is that relation extraction and entity identification must be repeated to obtain an accurate social network.
2.1 Web Content Mining Techniques
Web content mining can be applied to four types of data available in a web page: unstructured, structured, semi-structured
and multimedia data. There are many web content mining techniques, and Figure 1 illustrates them. Web content mining
extracts useful information from text, image, audio, video, metadata and hyperlinks.
2.1.1 Unstructured Mining
Unstructured data is information that has no predefined data model. It is typically text heavy, but may also contain data
such as dates, numbers and facts. Because there is no predefined structure, it is harder for traditional programs to
interpret than data stored in databases. Techniques used for unstructured mining include topic tracking, information
extraction, summarization, clustering, categorization and information visualization. There are many tools for
implementing unstructured mining techniques.
2.1.2 Structured Mining
Structured data is information stored in a database, which can be searched easily by straightforward search operations or
with the help of search engine algorithms. Techniques used are web crawlers 8, wrapper generation and page content
mining. Figure 1 illustrates the various web content mining techniques for unstructured, structured, semi-structured and
multimedia web content.
Figure 1.
2.1.3 Semi-structured and Multimedia Mining
Semi-structured data is a form of structured data that does not conform to the formal structure of the data models
associated with relational databases or other forms of data tables. It nonetheless contains tags or other markers to
separate semantic elements and enforce hierarchies of records and fields within the data. Techniques for semi-structured
data are top-down extraction, web data extraction languages and the Object Exchange Model (OEM). OEM is used for
storing relevant information extracted from semi-structured data. The extracted information is embedded into a group of
useful information. In OEM there is no need to describe the structure of an object in advance. Multimedia data consists
of a combination of media types such as audio, text, animation and video. Techniques used in mining multimedia data
are SKICAT, colour histogram matching, shot boundary detection and Multimedia Miner.
2.1.4 Comparative Analysis of Web Content Mining Techniques
Table 1 compares the performance of web content mining techniques for adult account detection, semantic similarity
between words and open information extraction. Performance is reported as the precision and recall of each technique.
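Table 1. Comparison of Web Content Mining Techniques.
Method                              Technique                                           Precision  Recall
Adult account detection             URL                                                 0.92       0.17
Adult account detection             Text                                                0.62       0.41
Adult account detection             Gnet Mine                                           0.39       0.61
Adult account detection             ICA                                                 0.62       0.58
Adult account detection             Trust Rank                                          0.37       0.23
Adult account detection             ISC                                                 0.78       0.89
Semantic similarity between words   WebJaccard                                          0.59       0.71
Semantic similarity between words   Web Overlap                                         0.59       0.68
Semantic similarity between words   Web Dice                                            0.58       0.71
Semantic similarity between words   WebPMI                                              0.26       0.42
Semantic similarity between words   Sahami                                              0.63       0.66
Semantic similarity between words   Chen                                                0.47       0.62
Semantic similarity between words   No Clust                                            0.79       0.80
Semantic similarity between words   Four Word Co-occurrence                             0.85       0.87
POLYPHONET                          Evaluation of questionnaire using Co-author class   78.5       53.6
POLYPHONET                          Evaluation of questionnaire using Lab class         55.6       20.0
POLYPHONET                          Evaluation of questionnaire using Project class     28.3       39.9
POLYPHONET                          Evaluation of questionnaire using Conf class        20.3       41.3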
2.2 Web Content Mining Tools
Many effective commercial and open source web content mining tools are available. These tools help users download
essential information by collecting accurate and appropriate content. A brief overview of some commercial and open
source web content mining tools follows.
1) Automation Anywhere 9 is intelligent automation software that automatically performs business and IT
processes, including web data extraction and screen scraping.
2) Mozenda (More-Zenful-Data) is web content mining software in which users set up agents to extract, store and
publish data.
3) Screen Scraper is freely downloadable software that lets users scrape and format unstructured and structured
data from websites.
4) Web Info Extractor is a web content mining tool for extracting content and monitoring content updates.
5) Web Content Extractor is a powerful and simple tool for data retrieval and web scraping.
2.2.1 Comparison of Web Content Mining Tools
Many commercial and open source tools are available, and it is difficult to compare their effectiveness and efficiency.
Table 2 compares 5 web content mining tools based on their performance on unstructured and structured web data, data
recording and the user friendliness of the software.
Table 2. Web Content Mining Tools Comparison.
Tools: Web content extractor, Automation Anywhere, Web info extractor, Mozenda, Screen scraper
Tasks: Structured data extraction, Unstructured data extraction, User friendly, Data recording
Values as listed: Yes, Yes, No, No, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, No, No, No, No
3. Relation Extraction
Relation extraction is the task of detecting and classifying semantic relationships mentioned between a set of entities,
typically in text or XML documents. It is closely related to information extraction. The main difference between the two
is that information extraction additionally removes repeated relations. Relation extraction extracts relevant relations and
facts, whereas information retrieval selects relevant documents. Parse trees can be used to represent semantic relation
extraction 10.
3.1 Relation Extraction Techniques
Various relation extraction techniques are available. They are generally classified as Supervised Relation Extraction,
Unsupervised Relation Extraction, Distant Supervision, Semi Supervised Relation Extraction and Semantic Relation
Extraction.
3.1.1 Supervised Relation Extraction
Supervised relation extraction treats the extraction of information as a classification task. A supervised system for
relation extraction has three steps:
1. Represent the labelled examples as data.
2. Train a classification model as the relation detector/classifier.
3. Apply the model as the relation extractor on unseen relation mentions.
The classification task can be performed with various classifiers. Rich structural representations such as parse trees can
also be given as input to the classifiers. Based on the nature of the input given to classifier training, supervised
approaches are divided into two types: feature based methods and kernel based methods. In feature based methods, a set
of semantic and syntactic features is extracted from the text. The extracted features then serve as cues for deciding
whether the entities in a sentence are related or not. The syntactic features are the two entities themselves, the types of
the two entities, the number of words between the entities, the sequence of words between the entities, and the path in
the parse tree containing the two entities. The semantic cues include the path linking the two entities in the dependency
parse. The extracted syntactic and semantic features are given to the classifier as a feature vector for training and
classification. Kernel based methods build on string kernels, which were introduced in the context of text classification.
Given two strings, a string kernel computes their similarity from the number of subsequences common to both. The more
common subsequences two strings share, the more similar they are considered. Each string can be mapped to a higher
dimensional space in which each dimension corresponds to the presence or absence of a particular subsequence. For
relation extraction, the objects compared are the word sequences around the entities or the parse trees of the sentences
containing the entities.
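The two input styles can be sketched as follows. This is a minimal sketch under simplifying assumptions: extract_features covers only the surface features listed above and omits the parse-tree and dependency paths, which would need an external parser, and subsequence_kernel is a presence/absence variant over word subsequences of bounded length, not a gap-weighted string kernel. All function names, parameters and example sentences are hypothetical.

```python
from itertools import combinations

def extract_features(tokens, e1_span, e2_span, e1_type, e2_type):
    """Build a small feature dict for one candidate entity pair.

    Spans are (start, end) token indices with end exclusive; entity 1 precedes entity 2.
    """
    between = tokens[e1_span[1]:e2_span[0]]
    return {
        "e1": " ".join(tokens[e1_span[0]:e1_span[1]]),
        "e2": " ".join(tokens[e2_span[0]:e2_span[1]]),
        "e1_type": e1_type,
        "e2_type": e2_type,
        "num_words_between": len(between),
        "words_between": " ".join(between),
    }

def subsequences(tokens, max_len=2):
    """All distinct (possibly non-contiguous) word subsequences up to max_len words."""
    subs = set()
    for n in range(1, max_len + 1):
        for idx in combinations(range(len(tokens)), n):
            subs.add(tuple(tokens[i] for i in idx))
    return subs

def subsequence_kernel(tokens_a, tokens_b, max_len=2):
    """Similarity as the number of distinct subsequences shared by both sequences."""
    return len(subsequences(tokens_a, max_len) & subsequences(tokens_b, max_len))

tokens = "Steve Jobs founded Apple in Cupertino".split()
features = extract_features(tokens, (0, 2), (3, 4), "PERSON", "ORGANIZATION")
print(features)

similarity = subsequence_kernel(["was", "born", "in"], ["born", "in"])
print(similarity)
```

A feature dict like this would be vectorized before being passed to a classifier, whereas the kernel value can be supplied directly to a kernel method such as an SVM without an explicit feature vector.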
3.1.2 Unsupervised Relation Extraction
Unsupervised relation extraction techniques gather pairs of co-occurring entities as relation instances, extract features
for the instances, and apply unsupervised clustering techniques to find the major relations of a corpus. This approach
depends on tagging a predefined set of argument types, such as person, organization and location, in advance. Several
generative models, largely similar to LDA, have also been proposed for relation extraction.
3.1.3 Distant Supervision
Distant Supervision (DS) is an important approach to training relation extractors without using labeled data 11. Training
examples are generated automatically by labelling relation mentions in the source corpus according to whether the
argument pair is listed in the target relational tables of a knowledge base (KB). This method significantly reduces the
human effort needed for relation extraction.
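The automatic labelling step can be sketched as follows. This is a minimal sketch, assuming that entity mentions have already been identified in each sentence and that the knowledge base is a simple mapping from argument pairs to relation names; the function name distant_label, the NO_RELATION label and the example data are hypothetical.

```python
def distant_label(mentions, kb):
    """Label each (sentence, entity1, entity2) mention by looking the pair up in the KB.

    mentions: list of (sentence, entity1, entity2) tuples.
    kb: dict mapping (entity1, entity2) to a relation name.
    """
    examples = []
    for sentence, e1, e2 in mentions:
        relation = kb.get((e1, e2), "NO_RELATION")
        examples.append((sentence, e1, e2, relation))
    return examples

# Hypothetical knowledge base and corpus.
kb = {("Barack Obama", "Honolulu"): "born_in"}
mentions = [
    ("Barack Obama was born in Honolulu.", "Barack Obama", "Honolulu"),
    ("Barack Obama visited Honolulu last year.", "Barack Obama", "Honolulu"),
    ("Angela Merkel visited Paris.", "Angela Merkel", "Paris"),
]
for example in distant_label(mentions, kb):
    print(example)
```

The second sentence receives the born_in label although it does not express that relation. Noisy labels of this kind are the price paid for avoiding manual annotation.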
3.1.4 Semi Supervised Relation Extraction
Semi supervised approaches are techniques based on bootstrapping, which result in the discovery of a large number of
relations and patterns. In relation extraction tasks a large amount of unlabelled data is available, while labeled data is
scarce because it is expensive to produce, so bootstrapping is advantageous for creating a large quantity of labeled data.
Yarowsky (1995) and Blum & Mitchell (1998) proposed algorithms that underlie semi supervised relation extraction.
The idea behind both algorithms is that the output of weak learners is used as training data for the next iteration. The
co-training method proposed by Blum and Mitchell in 1998 is an example of the weakly supervised paradigm; it learns
from a large volume of unlabelled data and a small set of labeled data using separate, multiple views of the data.
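The bootstrapping loop can be sketched as follows. This is a minimal sketch of pattern-based bootstrapping from seed pairs, not the Yarowsky or co-training algorithm: a pattern is simply the text found between the two entities of a known pair, entities are approximated by capitalized words, and no pattern confidence scoring is performed. The function name bootstrap, the ENTITY expression and the toy corpus are hypothetical.

```python
import re

# Crude stand-in for entity recognition: one or more capitalized words.
ENTITY = r"([A-Z][a-z]+(?: [A-Z][a-z]+)*)"

def bootstrap(corpus, seeds, iterations=2):
    """Alternate between learning patterns from known pairs and finding new pairs."""
    pairs = set(seeds)
    patterns = set()
    for _ in range(iterations):
        # Step 1: the text between the entities of a known pair becomes a pattern.
        for sentence in corpus:
            for e1, e2 in pairs:
                found = re.search(re.escape(e1) + r"(.+?)" + re.escape(e2), sentence)
                if found:
                    patterns.add(found.group(1))
        # Step 2: every pattern is applied to the corpus to extract new pairs.
        for sentence in corpus:
            for middle in patterns:
                for found in re.finditer(ENTITY + re.escape(middle) + ENTITY, sentence):
                    pairs.add((found.group(1), found.group(2)))
    return pairs, patterns

corpus = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "Berlin, capital city of Germany, hosts many museums.",
    "Rome, capital city of Italy, is ancient.",
]
pairs, patterns = bootstrap(corpus, {("Paris", "France")})
print(sorted(pairs))
print(sorted(patterns))
```

Starting from a single seed, the first iteration learns one pattern and finds a second pair; that pair yields a second pattern in the next iteration, which in turn finds a third pair. Without some form of pattern scoring, an overly general pattern would let errors accumulate across iterations.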
3.1.5 Semantic Relation Extraction
A semantic relation is the underlying relation between two concepts expressed by words or phrases. It is mainly used in
text summarization, question answering, text to image generation, textual entailment, etc. Semantic analytics is the use
of ontologies to analyze content in web resources. This field of research combines text analytics and semantic web
technologies like RDF.
3.1.6 Data Sets for Relation Extraction
Entity relation extraction needs to be evaluated with the help of data sets. The evaluation depends on the kind of data set
used and the relation extraction method applied. Evaluation differs for supervised and unsupervised relation extraction
methods. Wikipedia has become a popular data source for semantic relation extraction. Wikipedia contains many
hyperlinked entities, and most of its pages can be used for relation extraction. Wikipedia is also a rich source of data
from which relations can be extracted for hyperlinked documents 12. The Message Understanding Conference (MUC) 13
was a program started by DARPA to facilitate research on various information extraction techniques. It has two primary
evaluation tasks: Named Entity Recognition (NER) and coreference resolution. The most commonly used data sets are
MUC, ACE, MEDLINE and YAGO.
3.1.7 Evaluation of Supervised Relation Extraction
Supervised relation extraction is evaluated as a classification problem. Three metrics are used to evaluate supervised
methods: Precision, Recall and F-measure 14. These metrics are defined as follows.
1) Precision
In information retrieval, precision (also called positive predictive value) is the fraction of retrieved instances that are
relevant. It can be computed at a given cut-off rank, considering only the topmost results returned by the system; such a
measure is called precision at n. The precision metric for supervised relation extraction is defined as follows.
\[ \text{Precision } (P) = \frac{\text{Number of correctly extracted entity relations}}{\text{Total number of extracted entity relations}} \]
2) Recall
Recall is the fraction of the relevant instances that are retrieved, that is, the ratio of the number of relevant records
retrieved to the total number of relevant records in the database. The recall metric for supervised relation extraction is
given as follows.
\[ \text{Recall } (R) = \frac{\text{Number of correctly extracted entity relations}}{\text{Actual number of entity relations}} \]
3) F-measure
Precision and recall are combined to provide a single measurement for a system. This measure is called the F-measure
and is the harmonic mean of precision and recall. It is computed as follows.
\[ \text{F-Measure } (F_1) = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \]
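The three metrics can be computed from a set of extracted relations and a set of gold-standard relations, as in the following sketch. Representing relations as (entity1, relation, entity2) tuples, the function name precision_recall_f1 and the example data are assumptions made for illustration.

```python
def precision_recall_f1(extracted, gold):
    """Compute precision, recall and F1 for extracted relations against a gold standard."""
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical gold standard of six relations and four extracted relations, three correct.
gold = {
    ("Paris", "capital_of", "France"),
    ("Berlin", "capital_of", "Germany"),
    ("Rome", "capital_of", "Italy"),
    ("Madrid", "capital_of", "Spain"),
    ("Vienna", "capital_of", "Austria"),
    ("Lisbon", "capital_of", "Portugal"),
}
extracted = {
    ("Paris", "capital_of", "France"),
    ("Berlin", "capital_of", "Germany"),
    ("Rome", "capital_of", "Italy"),
    ("Milan", "capital_of", "Italy"),
}
precision, recall, f1 = precision_recall_f1(extracted, gold)
print(precision, recall, round(f1, 2))
```

When results are macro-averaged over several relation types, the F-measure is averaged per relation type, so the reported average F-measure need not equal the value obtained by inserting the averaged precision and recall into the formula.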
In 15 the evaluation is done by randomly allocating 100 instances of each relation type to three groups. The first group of
60 instances is used for training when the relation R is a source relation, the second group of 10 instances is used for
training when R is the target relation, and the last group of 30 instances is used for testing. For each target relation type
there are 1,140 training instances for the source relations and 10 training instances for the target relation.
Using entropy-based pattern selection for relation independence, the 1,000 top-ranked patterns are selected as relation
independent patterns and the rest are considered relation specific patterns. Precision, recall and F-measure are calculated
for the target relation, giving a precision of 86.47, a recall of 51.78 and an F-measure of 62.77. These are macro averages
computed over 20 relation types.
3.1.8 Comparative Analysis of Relation Extraction Techniques
Relation extraction from the web can be done using various methods and techniques. The relation extraction techniques
compared are minimally supervised relation extraction, the motif based relation extraction technique and relation
extraction from text 16.
Extracted relations are evaluated using the supervised relation extraction evaluation metrics Precision and Recall.
Various techniques are used within each relation extraction method. For relation extraction from text, logic based,
shortest path kernel and sub tree kernel techniques are used to extract the relations, and they are evaluated using
precision and recall.
The motif based relation extraction method was used to extract relations from Wikipedia hyperlinks using data sets such
as data mining, computer network, data structure, Euclidean geometry, classical mechanics, microbiology and wine, as
given in Table 3; precision and recall values are computed separately for each data set. Table 3 compares the various
relation extraction techniques 16, 17.
Table 3. Relation Extraction Techniques Comparison.
Method                                           Technique                                     Precision  Recall
Minimally Supervised novel Relation Extraction   Minimally supervised Relation Extraction      86.47      51.78
Relation Extraction from Text                    Logic based                                   68.2       42.3
Relation Extraction from Text                    Shortest path kernel                          65.5       53.8
Relation Extraction from Text                    Sub Tree Kernel                               67.1       35
Motif based Relation Extraction                  Wikipedia using Data mining Dataset           0.893      0.323
Motif based Relation Extraction                  Wikipedia using Computer Network Dataset      0.826      0.395
Motif based Relation Extraction                  Wikipedia using Data structure Dataset        0.884      0.498
Motif based Relation Extraction                  Wikipedia using Euclidean geometry Dataset    0.898      0.501
Motif based Relation Extraction                  Wikipedia using Classical mechanics Dataset   0.864      0.459
Motif based Relation Extraction                  Wikipedia using Microbiology Dataset          0.801      0.352
Motif based Relation Extraction                  Wikipedia using Wine Dataset                  0.826      0.472
3.2 Relation Extraction Tools
Relation extraction tools are used for extracting relations, and many open source and commercial tools are available.
Some of these tools are DIPRE, Snowball, KnowItAll and TextRunner 18. They are described as follows.
1. TextRunner is used for extracting a broader set of facts, reflecting orders of magnitude more relations.
2. Dual Iterative Pattern Relation Expansion (DIPRE) 19 is used for extracting structured relations from huge
collections of HTML documents.
3. Snowball is used for generating patterns and extracting tuples from text documents.
4. KnowItAll automates the tedious process of extracting a huge volume of facts from the web.
3.2.1 Comparison of Relation Extraction Tools
Various relation extraction tools are available for extracting relations from large volumes of text or from the web.
Each relation extraction tool has its own specific functionality. The tools DIPRE, Snowball, KnowItAll and TextRunner
are compared on features such as initial seed, predefined relation, external NLP tools, relation types, language
dependence and classifier. Relation types can be binary or unary. KnowItAll supports both binary and unary relation
types; all the others support only binary relation types. Table 4 compares the relation extraction tools.
Table 4. Relation Extraction Tools Comparison.
Feature              DIPRE                    Snowball                            KnowItAll                Text Runner
Initial Seed         Yes                      Yes                                 Yes                      No
Predefine Relation   Yes                      Yes                                 Yes                      No
External NLP Tools   No                       Yes: NER                            Yes: NP chunker          Yes: dependency parser, NP chunker
Relation Types       Binary                   Binary                              Unary/Binary             Binary
Language dependent   No                       Yes                                 Yes                      Yes
Classifier           Exact Pattern Matching   Matching with similarity function   Naive Bayes Classifier   Self-supervised binary classifier
4. Conclusion
After comparing and analyzing the existing techniques available in supervised relation extraction, we found that the cost
of training is high and that handling entities that are not related to each other, as well as entities with multiple semantic
relations, remains a great challenge. An existing relation extraction system can be adapted by training it to extract
specific new relation types. Such a system can be used in domain based semantic relationship extraction and social
network extraction. This paper provides an overview of web content mining techniques and of relation extraction
techniques for the supervised relation extraction method.
References
1. Johnson F, Gupta SK. Web Content Mining Techniques: A Survey. International Journal of Computer Applications.
2012, 47(11), pp. 44-49.
2. Srivatsa T, Desikan P, Kumar V. Web Mining – Concepts, Applications & Research Directions. Foundations and
Advances in Data Mining. Springer: Berlin Heidelberg. 2005, pp. 275-307.
3. Kosala R, Blockeel H. Web Mining Research: A Survey. ACM SIGKDD Explorations Newsletter. 2000, 2(1),
pp. 1-15.
4. Bach N, Badaskar S. A Review of Relation Extraction.
http://www.cs.cmu.edu/~nbach/papers/A-survey-on-RelationExtraction.pdf. Date accessed: 01/10/2015.
5. Cheng H, Xing X, Liu X, Lv Q. ISC: An Iterative Social based Classifier for Adult Account Detection on Twitter.
IEEE Transactions on Knowledge and Data Engineering. 2014 Jan, 6(1), pp. 1-14.
6. Bollegala D, Matsuo Y, Ishizuka M. A Web Search Engine-Based Approach to Measure Semantic Similarity
between Words. IEEE Transactions on Knowledge and Data Engineering. 2011 Jul, 23(7), pp. 977-990.
7. Mary A V A, Samuel S J, Rajam D J. Automated trinity based web data extraction for simultaneous comparison.
Contemporary Engineering Sciences. 2015 May, 8(11), pp. 491-497.
8. Herrouz A, Khentout C, Djoudi M. Overview of Web Content Mining Tools. International Journal of Engineering
and Science (IJES). 2013 Jun, 2(6), pp. 106-110.
9. Qian L, Zhou G, Kong F, Zhu Q, Qian P. Exploiting Constituent Dependencies for Tree Kernel-Based Semantic
Relation Extraction. Proceedings of the 22nd Int'l Conf. Computational Linguistics (COLING '08). 2008 Aug,
pp. 697-704.
10. Mintz M, Bills S, Snow R, Jurafsky D. Distant supervision for relation extraction without labeled data. ACL '09
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint
Conference on Natural Language Processing of the AFNLP. 2009 Aug, 2, pp. 1003-1011.
11. Wei B, Liu J, Ma J, Zheng Q, Zhang W, Feng B. Motif based Hyponym Relation Extraction from Wikipedia
Hyperlinks. IEEE Transactions on Knowledge and Data Engineering. 2014 Oct, 26(10), pp. 2507-2519.
12. Grishman R, Sundheim B. Message Understanding Conference-6: A brief history. Proceedings of the 16th
Conference on Computational Linguistics. 1996, 1, pp. 466-471.
13. Midhunchakkaravarthy J, Selva Brunda S. An Enhanced Web Mining Approach for Product Usability Evaluation in
Feature Fatigue Analysis using LDA Model and Association Rule Mining with Fruit Fly Algorithm. Indian Journal
of Science and Technology. 2016 Feb, 9(8), pp. 1-10.
14. Bollegala D, Matsuo Y, Ishizuka M. Minimally Supervised Novel Relation Extraction Using a Latent Relational
Mapping. IEEE Transactions on Knowledge and Data Engineering. 2013 Feb, 25(2), pp. 419-432.
15. Horvath T, Paass G, Reichartz F, Wrobel S. Logic based Approach to Relation Extraction from Text. Inductive
Logic Programming. Springer-Verlag: Berlin Heidelberg. 2010, pp. 34-48.
16. Etzioni O, Cafarella M, Downey D, Popescu A M, Shaked T, Soderland S, Weld D S, Yates A. Unsupervised
Named-Entity Extraction from the Web: An Experimental Study. Artificial Intelligence. 2005 Jun, 165(1),
pp. 91-134.
17. A-survey-on-Relation-Extraction.
https://www.researchgate.net/publication/249890666_A_SURVEY_ON_RELATION_EXTRACTION.
Date accessed: 30/01/2015.
18. Agichtein E, Gravano L. Snowball: Extracting Relations from Large Plain Text Collections. DL '00 Proceedings of
the Fifth ACM Conference on Digital Libraries. 2000, pp. 85-94.
19. Matsuo Y, Mori J, Hamasaki M, Ishida K, Nishimura T, Takeda H, Hasida K, Ishizuka M. Polyphonet: An
Advanced Social Network Extraction System. Web Semantics: Science, Services and Agents on the World Wide
Web. 2007 Dec, 5(4), pp. 262-278.