Aggregated Probabilistic Fuzzy Relational
Sentence Level Expectation Maximization
Clustering Algorithm For Efficient
Text Categorization
Lakshmi Kartheek V1, Dr.Chandra Sekhar V2
II M.Tech, Department of CSE, S.R.K.R Engineering College, Bhimavaram, AP, India 1.
Associate Professor, Department of CSE, S.R.K.R Engineering College, Bhimavaram, AP, India 2
Abstract- Text clustering has found an important application in organizing data and extracting useful information from an available corpus. The proposed EM algorithm extracts potentially noisy articles from the data set using the descending porthole technique. Many existing clustering techniques have difficulties in handling extreme outliers, but fuzzy clustering algorithms tend to give them very small membership degrees in the surrounding clusters. In this paper a generalized Expectation-Maximization Probabilistic Fuzzy C-Means (FCM) algorithm is proposed and applied to clustering fuzzy sets. This technique leads to a fuzzy partition of the fuzzy rules, one for each cluster, which corresponds to a new set of fuzzy sub-systems. When applied to the clustering of a flat fuzzy system, the result is a set of decomposed sub-systems that are conveniently linked into a Parallel Collaborative Structure. The algorithm is particularly used for finding maximum likelihood estimates of parameters.
Keywords: Fuzzy Clustering, probabilistic Fuzzy c-means, Fuzzy Sets, Parallel Collaborative Structures.
INTRODUCTION
Cluster analysis is primarily a tool for discovering previously hidden structure in the set of unordered objects, where we
assume that a natural grouping exists in the data. Cluster analysis is a technique for classifying data [1], i.e., to divide a
given set of objects into a set of classes or clusters based on similarity. The goal is to divide the data set in such a way
that cases assigned to the same cluster should be as similar as possible whereas two objects from different clusters
should be as dissimilar as possible [2]. It is an approach to unsupervised learning and one of the major techniques in pattern recognition. Conventional (hard) clustering methods restrict each point of the data set to exactly one cluster [1]; they yield exhaustive partitions of the example set into non-empty and pairwise disjoint subsets. Fuzzy cluster analysis [3], in contrast, allows gradual membership of data points to clusters, with degrees in [0, 1]. This gives the flexibility to express that a data point belongs to more than one cluster at the same time, and these membership degrees offer a much finer level of detail in the data model. One of the most popular object-data clustering algorithms is the FCM algorithm, originally proposed by Dunn. Clustering is the process of grouping or aggregating data items. Sentence clustering is used in a variety of applications such as classification and categorization of documents, automatic summary generation, and organizing documents [4].
In text processing, sentence clustering plays a vital role and is used in many text mining activities. The size of the clusters may vary from one cluster to another. Traditional clustering algorithms have problems in clustering the input data set [3], such as instability of clusters, complexity, and sensitivity. To overcome these drawbacks, the Hierarchical Fuzzy Relational Eigenvector Centrality-based Clustering Algorithm (HFRECCA) [5] was later proposed as an extension of FRECCA for the clustering of sentences. Text documents have a hierarchical structure, and many terms in a document relate to more than one theme, so HFRECCA is a useful algorithm for natural language documents. In this algorithm a single object may belong to more than one cluster [6],[7]. FCM was introduced by Dunn in 1973 and later extended by Bezdek (1981); it can be applied when the objects of interest are represented as points in a multi-dimensional space. FCM [8] relates the concept of object similarity to spatial closeness and finds cluster centres as prototypes. Several applications of FCM to real clustering problems have demonstrated the good characteristics of this algorithm with respect to stability and partition quality [8]. Further, its convergence has been formally demonstrated (Bezdek, 1987; Hathaway et al., 1988). From this method a large variety of clustering techniques has been derived with more complex prototypes, which are mainly of interest in data analysis applications [9]. However, the generalization of these techniques to clustering imprecise or uncertain data or objects has not yet been fully explored. Moreover, in real-world applications, transaction data are usually composed of quantitative values.
Designing a sophisticated data-mining algorithm to deal with different types of data is a challenge in this research topic. Recently, fuzzy set theory has been used more and more frequently in intelligent systems because of its simplicity and similarity to human reasoning. The theory has been successfully applied to many fields such as manufacturing, engineering, diagnosis, economics, and others (Höppner, 1999).
In this context, a generalization of the previous methods for clustering fuzzy data (or fuzzy numbers) [9] would be a meritorious line of research. In this work, a new fuzzy relational clustering algorithm, based on the fuzzy c-means [8] algorithm, is proposed to cluster fuzzy data, which is used in the antecedent and consequent parts of the fuzzy rules. This clustering process divides the fuzzy rules of a fuzzy system into a set of classes or clusters of fuzzy rules based on similarity. With this new strategy, a flat fuzzy system f(x) can be organized into a hierarchical structure of fuzzy systems (Salgado, 2005a and 2007b). Hierarchical fuzzy modelling is a promising method to identify fuzzy models of target systems with many input variables and/or with complex interrelations. Partitioning a fuzzy system reduces its complexity, which simplifies the identification problem, improves computation times, and saves resources such as memory space. Moreover, with the organization of the fuzzy system into a new hierarchical structure, the model readability and transparency can be improved. In this context, we propose a new technique, the Probabilistic Fuzzy Clustering of Fuzzy Rules (FCFR), based on cluster methodology, to decompose a flat fuzzy system f(x) into a set of n fuzzy sub-systems f1(x), f2(x), ..., fn(x), organized in a collaborative structure. Each of these clusters may contain information related to particular aspects of the system f(x). The proposed algorithm allows grouping a set of rules into c subgroups (clusters) of similar rules. It is a generalization of the Probabilistic Clustering Algorithm (FCM) [8], here applied to rules instead of points. With this algorithm, the system obtained from the data is transformed into a new system, organized into several subsystems, in PCS structures (Salgado, 2005b and 2007a).
The paper is organized as follows: first, a brief introduction to fuzzy systems is presented, and the concept of relevance of a set of rules and of a fuzzy system is reviewed. The PCS structure is described in section 3. In section 4 the FCFR strategy is proposed. An example is presented in section 5. Finally, the main conclusions are outlined in section 6. The development of information technology in the last two decades paved the way for a world full of data, but much of this data is potentially not useful. To make it useful, we need to extract the large amount of information or knowledge underlying the data. Data mining is the process of extracting valuable information from huge amounts of data, and clustering techniques can help in this data discovery and analysis. Clustering sentences is mainly useful in the Information Retrieval (IR) process. Clustering text [7] at the sentence level and at the document level differ in many ways. Document clustering partitions the documents into several parts and clusters those parts based on the overall theme; it does not give much importance to the semantics of each sentence in the document. As a result, content overlap or bad coverage of the theme may occur in multi-document summarization. In hard clustering methods, each data element belongs to exactly one cluster.
A. Cluster Analysis
There are several algorithms available for clustering [10]. Each algorithm will cluster or group similar data objects in a
useful way. This task involves dividing the data into various groups called clusters. The application of clustering
includes Bioinformatics, Business modeling, image processing etc. In general, the text mining [10] process focuses on
the statistical study of terms or phrases which helps us to understand the significance of a word within a document.
Even if two words do not have similar meanings, clustering still takes place. Clustering can be considered the most important unsupervised [11] learning framework: a cluster is a group of data items that are “similar” to each other and “dissimilar” to the objects belonging to other clusters. Sentence clustering is used in a variety of text mining applications. The output of clustering should be related to the query specified by the user.
B. Similarity Measure
Similarity between sentences is measured in terms of some distance function, such as the Euclidean distance or the Manhattan distance [1]. The choice of measure depends on the requirements of the application; it influences the cluster sizes and determines the success of a clustering algorithm in the specific application domain [4,7]. Current sentence clustering methods usually represent sentences as a term-document matrix and run a clustering algorithm on it. Although these clustering methods [12] can group the documents satisfactorily, it is still hard for people to capture the meanings of the documents, since there is no satisfactory interpretation for each document cluster. Clustering takes place based on these similarity or dissimilarity values [1].
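As a small illustration of such distance-based similarity (the term-frequency vectors, their values, and the 1/(1+d) conversion to a similarity are assumptions made here for illustration only), the following sketch computes the Euclidean and Manhattan distances between two sentence vectors:

import numpy as np

# Hypothetical term-frequency vectors for two sentences over a shared vocabulary.
sentence_a = np.array([1.0, 0.0, 2.0, 1.0])
sentence_b = np.array([0.0, 1.0, 1.0, 1.0])

# Euclidean (L2) distance.
euclidean = np.linalg.norm(sentence_a - sentence_b)

# Manhattan (L1) distance.
manhattan = np.abs(sentence_a - sentence_b).sum()

# A distance can be turned into a similarity in (0, 1], e.g. with 1 / (1 + d).
similarity = 1.0 / (1.0 + euclidean)

print(f"Euclidean = {euclidean:.3f}, Manhattan = {manhattan:.3f}, similarity = {similarity:.3f}")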
BACKGROUND AND RELATED WORK
A. Efficient Conceptual Rule Mining on Text Clusters
Text mining is a modern computational approach that attempts to determine new, formerly unidentified information by applying various techniques from the natural language processing and data mining domains. Clustering is one of the conventional data mining techniques [13] used as an unsupervised learning paradigm, in which clustering techniques attempt to find groupings of text documents over different data items. Clusters have high intra-cluster similarity and low inter-cluster similarity. Most of the current document clustering methods and algorithms are based on the Vector Space Model (VSM), a widely used data representation for text classification and clustering. Weighting these feature words accurately affects the result of the clustering algorithm and improves its efficiency. In this paper, we present a conceptual rule mining approach that is generated from the sentence meaning and the related sentences in the document. Weights are assigned to the sentences that have a higher contribution to the topic of the entire document.
B. Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy
This proposal is a new approach for measuring the semantic similarity between words and concepts. The characteristics of polysemy and synonymy that exist in the words of natural language have always been a challenge in the fields of Natural Language Processing (NLP) and Information Retrieval (IR) [13]. There are certain advantages in the work of semantic association discovery when a taxonomy structure is combined with corpus statistics. The proposed approach outperforms other computational models: it gives the highest correlation value against a benchmark derived from human similarity judgments.
C. Some Experiments on Clustering Similar Sentences of Texts
Identifying similar text passages or sentences plays an important role in many text mining applications. This proposal reports some experiments [14] on clustering similar sentences of texts in documents. The proposed framework is based on an incremental and unsupervised clustering method combined with statistical similarity metrics to measure the semantic distance between sentences [15]. It aims at identifying sets of highly semantically related sentences from a collection of documents. Sentence splitting is performed by a textual-segmentation tool called SENTER, which is based on a list of abbreviations and some sentence delimiters.
D. A Clustering Algorithm for Text Summarization using Expectation Maximization
Nowadays, a large amount of data is available in the form of text. It is very difficult for human beings to manually find useful and significant data [16]. This problem can be addressed with the help of text summarization algorithms. Text summarization is the process of condensing an input text file into a shorter version while preserving its overall content and meaning. This paper is about text summarization using natural language processing. The proposal describes a system consisting of two steps. In the first step, the phases of natural language processing are implemented: splitting, tokenization, part-of-speech tagging, and parsing [17]. In the second step, the Expectation Maximization (EM) Clustering Algorithm [18] is implemented to find the similarity between sentences. Based on the sentence similarity values, the text can easily be summarized.
E. Fuzzy Relational Clustering: FRECCA
A fuzzy relational clustering approach is used to produce clusters of sentences, where each cluster corresponds to some content. The output of clustering indicates the strength of association among the data elements. Andrew Skabar and Khaled Abdalgader [1] proposed a novel fuzzy relational clustering algorithm called FRECCA (Fuzzy Relational Eigenvector Centrality-based Clustering Algorithm). The algorithm involves the following steps, and a minimal sketch of the loop follows the list.
Initialization: cluster membership values are initialized randomly and normalized, and the mixing coefficients are initialized.
Expectation: calculate the Page Rank value for each object in each cluster.
Maximization: update the mixing coefficients based on the membership values calculated in the Expectation step.
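The sketch below assumes a precomputed sentence-similarity matrix; the damping factor, convergence threshold, and the way the PageRank-style E-step is weighted by memberships are illustrative choices, not the authors' exact implementation:

import numpy as np

def frecca_like(similarity, n_clusters, damping=0.85, tol=1e-4, max_iter=100, seed=0):
    """Sketch of a FRECCA-style fuzzy relational EM loop over an n x n sentence-similarity matrix."""
    rng = np.random.default_rng(seed)
    n = similarity.shape[0]

    # Initialization: random, normalized cluster memberships and uniform mixing coefficients.
    membership = rng.random((n_clusters, n))
    membership /= membership.sum(axis=0)
    mixing = np.full(n_clusters, 1.0 / n_clusters)

    for _ in range(max_iter):
        old = membership.copy()
        # Expectation: a PageRank-style centrality of each sentence within each cluster,
        # with edge weights scaled by the current memberships of the neighbours.
        for c in range(n_clusters):
            w = similarity * membership[c]          # cluster-weighted edges
            col_sums = w.sum(axis=0)
            col_sums[col_sums == 0] = 1.0
            p = np.full(n, 1.0 / n)
            for _ in range(50):                     # power iteration
                p = (1 - damping) / n + damping * (w / col_sums) @ p
            membership[c] = mixing[c] * p
        membership /= membership.sum(axis=0)        # normalize memberships over clusters

        # Maximization: re-estimate mixing coefficients from the memberships.
        mixing = membership.sum(axis=1) / n

        if np.abs(membership - old).max() < tol:
            break
    return membership, mixing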
F. HFRECCA
The general idea of hierarchical fuzzy clustering is the partitioning of the data items into a collection of clusters, with the data points assigned membership values for each of the clusters. Many existing clustering techniques have difficulties in handling extreme outliers, but fuzzy clustering algorithms tend to give them very small membership degrees in the surrounding clusters. This algorithm is an extension of the fuzzy relational clustering algorithm. An expectation-maximization (EM) algorithm is an iterative process in which the model depends on some unobserved latent/hidden variables; it is particularly used for finding maximum likelihood estimates of parameters. Each EM iteration alternates between an expectation (E) step, which creates a function to compute the cluster membership probabilities, and a maximization (M) step, in which these probabilities are used to re-estimate the parameters. These parameter estimates are then used to determine the distribution of the latent variables in the next E step.
PROPOSED WORK
In this work, we analyze how one can take advantage of the efficiency and stability of clusters when the data to be clustered are available in the form of similarity relationships between pairs of objects. More precisely, we propose a new hierarchical fuzzy relational clustering algorithm, based on the existing fuzzy c-means (FCM) algorithm, which does not require any restriction on the relation matrix. This HFRECCA algorithm is applied to the clustering of text data present in the form of XML (hierarchical) files. HFRECCA gives as output clusters grouped from the text data present in the given documents. In this HFRECCA algorithm, the Page Rank algorithm is used as the similarity measure.
A. Page Rank
We describe the application of the algorithm to data sets and show that our algorithm performs better than other fuzzy clustering algorithms. In the proposed algorithm, we use Page Rank together with the Gaussian mixture model approach. Page Rank is used as a graph centrality measure: the Page Rank algorithm determines the importance of a particular node within a graph, and the importance of a node is used as a measure of centrality. The algorithm assigns a numerical score (from 0 to 1) to every node in the graph, known as the Page Rank score. Each sentence is represented by a node of a graph, and the edges are weighted with values representing the similarity between sentences. Page Rank can be used within the Expectation-Maximization algorithm to optimize the parameter values and to form the clusters. A graph representation of the data objects is used along with the Page Rank algorithm. It operates within an Expectation-Maximization framework, which is a general-purpose method for learning knowledge from incomplete data. Each sentence in a document is represented by a node in the directed graph, and the edge weights indicate the object similarity.
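A power-iteration sketch of weighted Page Rank over such a sentence-similarity graph might look as follows (the damping factor of 0.85 and the example similarity values are assumptions, not values fixed in the paper):

import numpy as np

def pagerank(weights, damping=0.85, max_iter=100, tol=1e-6):
    """Weighted Page Rank scores for a graph given as an n x n matrix of non-negative edge weights."""
    n = weights.shape[0]
    col_sums = weights.sum(axis=0)
    col_sums[col_sums == 0] = 1.0               # avoid division by zero for isolated nodes
    transition = weights / col_sums             # column-stochastic transition matrix
    scores = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new = (1 - damping) / n + damping * transition @ scores
        if np.abs(new - scores).sum() < tol:
            return new
        scores = new
    return scores

# Example: four sentences with pairwise similarity weights (hypothetical values).
sim = np.array([[0.0, 0.8, 0.1, 0.0],
                [0.8, 0.0, 0.5, 0.2],
                [0.1, 0.5, 0.0, 0.7],
                [0.0, 0.2, 0.7, 0.0]])
print(pagerank(sim))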
B. EM Algorithm
It is an unsupervised method that does not need any training phase; it tries to find the parameters of the probability distribution that have the maximum likelihood. Its main role is parameter estimation. It is an iterative method mainly used to find the maximum likelihood parameters of the model. The E-step involves the computation of the cluster membership probabilities. The probabilities calculated in the E-step are used to re-estimate the parameters in the M-step.
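The following minimal example illustrates this E/M alternation with a Gaussian mixture model; scikit-learn and the synthetic 2-D feature vectors are used only for illustration, since the paper does not prescribe a particular library or feature representation:

import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical 2-D feature vectors for sentences (e.g. reduced term weights).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.4, (30, 2)), rng.normal(3, 0.4, (30, 2))])

# GaussianMixture runs EM internally: the E-step computes cluster membership
# probabilities, the M-step re-estimates means, covariances and mixing weights.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

memberships = gmm.predict_proba(X)   # soft cluster membership probabilities
print(memberships[:3].round(3))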
C. APFRSEC
The main idea of the Aggregated Probabilistic Fuzzy Relational Sentence-level Expectation Maximization Clustering (APFRSEC) algorithm is that a clustering algorithm is used to implement the separation of information among the various subsystems, which are organized into a Parallel Collaborative Structure (PCS). Each of these subsystems may contain information related to particular aspects of the system or may merely collaborate in the performance of f(x). A PCS structure with n fuzzy sub-models is depicted in Fig. 1. Each fuzzy sub-model i has two outputs: an output variable yi and the corresponding fuzzy system relevance Ri(x).
This fuzzy system architecture describes the strength of the collaboration among the different fuzzy models. Therefore, the output of the SLIM model is the aggregation of the individual contributions of each fuzzy subsystem:
f(X) = \int_{i=1}^{n} f_i(X) \, R_i(X)    (1)
Fig. 1: APFRSEC clustering process (text data in XML files → similarity measure (Page Rank) → APFRSEC → clusters with some hierarchical order of sentences).
Fig. 2: Structure of the Probabilistic Hierarchical Collaborative Fuzzy System (input x feeds sub-models f1(x), f2(x), ..., fn(x), whose outputs Y1/R1(X), Y2/R2(X), ..., Yn/Rn(X) are combined by the aggregation operator ∫).
where Ri(x) represents the relevance function of the ith fuzzy subsystem covering the point x of the Universe of Discourse, and ∫ is an aggregation operator. The relevance Ri(x) reveals the effective contribution (or the belief in its contribution) of the respective fuzzy subsystem. This variable should be considered in the aggregation of all collaborative systems. With the same meaning as for its constituent sub-systems, the relevance of the aggregated system is given by:
R(x) = \bigcup_{i=1}^{n} R_i(x)    (2)
Naturally, if the ith fuzzy subsystem appropriately covers the region of point x, its relevance value is high (very close to one); otherwise the relevance value is low (near zero or zero).
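As a small numerical illustration of equations (1) and (2) (the sub-system outputs, the choice of a relevance-weighted sum for the aggregation operator ∫, and the maximum t-conorm for the union are assumptions made here for the sketch):

import numpy as np

# Hypothetical outputs f_i(x) and relevances R_i(x) of three fuzzy sub-systems at a point x.
f_i = np.array([2.1, 3.4, 0.7])     # sub-system outputs f_1(x), f_2(x), f_3(x)
R_i = np.array([0.9, 0.3, 0.05])    # relevances R_1(x), R_2(x), R_3(x)

# Equation (1): aggregated output, realised here as a relevance-weighted sum
# normalised by the total relevance.
f_x = (f_i * R_i).sum() / R_i.sum()

# Equation (2): relevance of the aggregated system, with the union taken as the maximum t-conorm.
R_x = R_i.max()

print(f"f(x) = {f_x:.3f}, R(x) = {R_x:.2f}")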
D. The FCM Algorithm
Clustering is well established as a way to separate a set X = {x1, x2, x3, ..., xnp} into c subsets that represent (sub)structures of X. A partition can be described by a c × np partition matrix U. Each element uik, i = 1, ..., c, k = 1, ..., np, of the partition matrix represents the membership of xk ∈ X in the ith cluster. We distinguish a particular set of partition matrices:
M_{fcm} = \{\, U \in [0,1]^{c \times n_p} \mid \sum_{i=1}^{c} u_{ik} = 1,\ k = 1, \dots, n_p \,\}    (3)
FCM is defined as the following problem: Given the data set X, any norm ||.|| on Rp and a fuzziness parameter
m∈ (1,∞) , minimize the objective function
J(U, V) = \sum_{k=1}^{n_p} \sum_{i=1}^{c} u_{ik}^{m} \, d_{ik}, \quad 1 < m < \infty    (4)
where d_ik = ||x_k − v_i||^2, U ∈ M_fcm, and V = {v_1, v_2, ..., v_c} ⊂ R^p is the set of prototype points (cluster centers). It can be shown that the following algorithm may lead the pair (U*, V*) to a minimum using alternating optimization (Hathaway et al., 1988), summarized as follows:
E. Probabilistic Fuzzy C-Means Algorithm:
Step 1 – For a set of points X = {x1, x2, ..., xnp}, with xi ∈ R^p, choose c, 2 ≤ c < np, and initialize U(0) ∈ M_fcm.
Step 2 – On the rth iteration, with r = 0, 1, 2, ..., compute the c mean vectors:
v_i^{(r)} = \frac{\sum_{k=1}^{n_p} (u_{ik}^{(r)})^m \, x_k}{\sum_{k=1}^{n_p} (u_{ik}^{(r)})^m}, \quad i = 1, 2, \dots, c    (5)
Step 3 – Compute the new partition matrix U(r+1) using the expression:
u_{ik}^{(r+1)} = \frac{1}{\sum_{j=1}^{c} \left( d_{ik} / d_{jk} \right)^{1/(m-1)}}, \quad 1 \le i \le c,\ 1 \le k \le n_p    (6)
Step 4 – Compare U(r) with U(r+1): if ||U(r+1) − U(r)|| < ε then the process ends; otherwise let r = r + 1 and go to Step 2. Here ε is a small positive real constant. Equation (6) defines the probabilistic (FCM) membership function for cluster i in the universe of discourse of all data vectors X.
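A compact NumPy sketch of Steps 1-4 is shown below; the initialization scheme, fuzziness m = 2, tolerance ε, and the synthetic data are illustrative assumptions:

import numpy as np

def fcm(X, c, m=2.0, eps=1e-5, max_iter=100, seed=0):
    """Probabilistic fuzzy c-means: returns the partition matrix U (c x np)
    and the prototypes V (c x p), following Steps 1-4 above."""
    rng = np.random.default_rng(seed)
    n_p = X.shape[0]

    # Step 1: initialise U(0) in M_fcm (columns sum to one).
    U = rng.random((c, n_p))
    U /= U.sum(axis=0)

    for _ in range(max_iter):
        # Step 2: compute the c mean vectors (equation (5)).
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)

        # Step 3: new partition matrix from squared distances d_ik (equation (6)).
        d = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2) + 1e-12
        U_new = 1.0 / ((d[:, None, :] / d[None, :, :]) ** (1.0 / (m - 1))).sum(axis=1)

        # Step 4: stop when the partition matrix no longer changes appreciably.
        if np.abs(U_new - U).max() < eps:
            return U_new, V
        U = U_new
    return U, V

# Usage on a small synthetic data set with two well-separated groups.
X = np.vstack([np.random.default_rng(1).normal(0, 0.3, (20, 2)),
               np.random.default_rng(2).normal(3, 0.3, (20, 2))])
U, V = fcm(X, c=2)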
F. Our APFRSEC Algorithm
Our Aggregated Probabilistic Fuzzy Relational Sentence-level Expectation Maximization Clustering (APFRSEC) algorithm is used in this work to implement the separation of information among the various subsystems, which are organized into a Parallel Collaborative Structure (PCS). Each of these subsystems may contain information related to particular aspects of the system or may merely collaborate in the performance of f(x).
Initialization:
  Number of original word patterns: m
  Number of sentences formed: s
  Number of classes: p
  Initial number of clusters: k = 0
Input:
  Xi = <xi1, xi2, xi3, ..., xip>, 1 <= i <= s
Output:
  Clusters G1, G2, ..., Gk
Procedure for the APFRSEC Algorithm:
  For each word pattern in a sentence Si, 1 <= i <= m:
    the probability relevance Ri is found using (2);
    the probability relevances of the words in the sentence are then aggregated using (1).
  The sentence with the maximum probability of relevance to a cluster G gets priority.
  Return the created k clusters;
end Procedure.
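A hedged Python rendering of this procedure follows; the per-word relevance values, the mean-based aggregation for (1), and the assignment by maximum relevance are illustrative assumptions, since the paper specifies the procedure only at the level of the pseudocode above:

import numpy as np

def apfrsec_assign(word_relevance, sentences):
    """Sketch of the APFRSEC assignment step.

    word_relevance: dict mapping a word to an array of per-cluster relevances R_i
                    (e.g. derived from fuzzy memberships as in equation (2)).
    sentences:      list of sentences, each given as a list of words.
    Returns the index of the most relevant cluster for each sentence.
    """
    assignments = []
    for sentence in sentences:
        # Aggregate the probability relevance of each word in the sentence (equation (1)),
        # realised here as the mean relevance per cluster over the sentence's words.
        per_word = [word_relevance[w] for w in sentence if w in word_relevance]
        if not per_word:
            assignments.append(-1)    # no known words: leave unassigned
            continue
        aggregated = np.mean(per_word, axis=0)
        # The sentence is given priority in the cluster with maximum aggregated relevance.
        assignments.append(int(np.argmax(aggregated)))
    return assignments

# Hypothetical relevances of four words over k = 2 clusters.
word_relevance = {"fuzzy": np.array([0.9, 0.1]),
                  "cluster": np.array([0.8, 0.2]),
                  "market": np.array([0.1, 0.9]),
                  "price": np.array([0.2, 0.8])}
sentences = [["fuzzy", "cluster"], ["market", "price"], ["cluster", "price"]]
print(apfrsec_assign(word_relevance, sentences))   # -> [0, 1, 0]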
EXPERIMENTAL RESULTS
Table 1: Clustering Performance Evaluation

Technique      Purity   Entropy   V-measure   Rand    F-measure
APFRSEC        0.810    0.328     0.649       0.865   0.606
FRECCA         0.800    0.324     0.646       0.862   0.601
ARCA           0.622    0.451     0.524       0.815   0.462
Spec. Clus.    0.690    0.475     0.508       0.800   0.444
k-medoids      0.720    0.457     0.546       0.779   0.459
Fig 3: Purity and Entropy comparison of APFRSEC, FRECCA, ARCA, Spectral Clustering, and k-medoids.
Fig 4: Performance comparison (V-measure, Rand, F-measure) of APFRSEC, FRECCA, ARCA, Spectral Clustering, and k-medoids.
In Table 1, the comparison is carried out for 6 clusters. We compare the performance of the APFRSEC algorithm with the ARCA, Spectral Clustering, and k-medoids algorithms on the quotations data set, evaluating with external measures. For each algorithm, the affinity matrix was used and pairwise similarities were also calculated. It can be observed that the FRECCA algorithm is able to identify and avoid overlapping clusters. Our Aggregated Probabilistic Fuzzy Relational Sentence-level Expectation Maximization Clustering algorithm forms the sentence-level clusters using fuzzy c-means and finds the probabilities of each valuable word in a given sentence. Our method finds the relevance of each word from the formula stated above and computes the aggregate of the probabilities of the collection of words in a given sentence. We then obtain the maximum likelihood estimates for belonging to a cluster accurately.
CONCLUSION
In this work the Expectation Maximization Probabilistic Fuzzy C-Means (FCM) algorithm is proposed and applied to clustering fuzzy sets. It gives more accurate results when compared to the existing system. Thus we found maximum likelihood estimates of the parameters.
ACKNOWLEDGMENT
Our sincere thanks to the experts who have contributed towards the development of this paper.
REFERENCES
[1] Jung-Yi Jiang, Ren-Jia Liou, and Shie-Jue Lee, "A Fuzzy Self-Constructing Feature Clustering Algorithm for Text Classification," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 3, March 2011.
[2] N. Slonim and N. Tishby, "The power of word clusters for text classification," 23rd European Colloquium on Information Retrieval Research (ECIR), 2001.
[3] Neha Mehta, Mamta Kathuria, and Mahesh Singh, "Comparison of conventional and fuzzy clustering techniques: a survey," IJARCCE, vol. 2, issue 1, April 2014.
[4] G. Thilagavathi, J. Anitha, and K. Nethra, "Sentence similarity based document clustering using fuzzy algorithm," IJAFRC, vol. 1, issue 3, March 2014.
[5] K. Jeyalakshmi, R. Deepa, and M. Manjula, "An efficient clustering of sentence-level text using a novel hierarchical fuzzy relational clustering algorithm," IJARCCE, vol. 3, issue 2, February 2014.
[6] M. S. Yang, "A survey of fuzzy clustering," vol. 18, no. 11, October 1993.
[7] F. Pereira, N. Tishby, and L. Lee, "Distributional clustering of English words," 31st Annual Meeting of the ACL, 1993, pp. 183-190.
[8] R. J. Hathaway and J. C. Bezdek, "Recent convergence results for the fuzzy c-means clustering algorithms," Journal of Classification, vol. 5, pp. 237-247, October 1988.
[9] S. M. Jagatheesan and V. Thiagarasu, "Development of fuzzy based categorical text clustering algorithm for information retrieval," vol. 2, issue 1, January 2014.
[10] K. Nalini and L. Jaba Sheela, "Survey on text classification," IJIRAE, vol. 1, issue 6, July 2014.
[11] E. Roventa and T. Spircu, "Averaging procedures in defuzzification processes," Fuzzy Sets and Systems, vol. 136, pp. 375-385, 2003.
[12] S. J. Lee and C. S. Ouyang, "A neuro-fuzzy system modeling with self-constructing rule generation and hybrid SVD-based learning," IEEE Transactions on Fuzzy Systems, vol. 11, no. 3, pp. 341-353, June 2003.
[13] G. Salton and M. J. McGill, Introduction to Modern Information Retrieval, McGraw-Hill Book Company, 1983.
[14] F. Sebastiani, "Machine learning in automated text categorization," ACM Computing Surveys, vol. 34, no. 1, pp. 1-47, March 2002.
[15] L. X. Wang, A Course in Fuzzy Systems and Control, Prentice-Hall International, Inc., 1997.
[16] Y. Yang and J. O. Pedersen, "A comparative study on feature selection in text categorization," Proc. ICML, 1997.
[17] R. Bekkerman, R. El-Yaniv, N. Tishby, and Y. Winter, "Distributional word clusters versus words for text categorization," Journal of Machine Learning Research, vol. 3, pp. 1183-1208, 2003.
[18] L. D. Baker and A. McCallum, "Distributional clustering of words for text classification," Proc. ACM SIGIR, 1998, pp. 96-103.
BIOGRAPHY
V. L. Kartheek is currently pursuing his M.Tech in Computer Science and Technology at SRKR Engineering College, Bhimavaram, W.G.Dt., Andhra Pradesh, India, which is affiliated to Andhra University, Visakhapatnam. His main research interests are information retrieval, artificial intelligence, machine learning, data mining, and mobile computing.
Dr. Chandra Sekhar Vasamsetty received his Ph.D. in Computer Science and Systems Engineering from Andhra University in 2014. He is working as an Associate Professor in the Department of CSE, SRKR Engineering College, Bhimavaram, W.G.Dt., A.P, India. He has 19 years of teaching experience. His main research interests include bioinformatics, software engineering, computer networks, machine learning, data mining, and information retrieval.