Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Mixture model wikipedia , lookup
Human genetic clustering wikipedia , lookup
Nonlinear dimensionality reduction wikipedia , lookup
K-nearest neighbors algorithm wikipedia , lookup
Expectation–maximization algorithm wikipedia , lookup
K-means clustering wikipedia , lookup
Aggregated Probabilistic Fuzzy Relational Sentence Level Expectation Maximization Clustering Algorithm For Efficient Text Categorization Lakshmi Kartheek V1, Dr.Chandra Sekhar V2 II M.Tech, Department of CSE, S.R.K.R Engineering College, Bhimavaram, AP, India 1. Associate Professor, Department of CSE, S.R.K.R Engineering College, Bhimavaram, AP, India 2 Abstract- Text clustering has found an important application to organize the data and to extract useful information from the available corpus. We proposed EM algorithm extracts the potentially noisy article from the data set using the descending porthole technique. Many existing clustering techniques have difficulties in handling extreme outliers but fuzzy clustering algorithms tend to give them very small membership degree in surrounding clusters. In this paper a generalized Expectation–maximization Probabilistic fuzzy c-means (FCM) algorithm is proposed and applied to clustering fuzzy sets. This technique leads to a fuzzy partition of the fuzzy rules, one for each cluster, which corresponds to a new set of fuzzy sub-systems. When applied to the clustering of a flat fuzzy system results a set of decomposed sub-systems that will be conveniently linked into a Parallel Collaborative Structures. This algorithm is particularly used in finding maximum likelihood estimates of parameters. Keywords: Fuzzy Clustering, probabilistic Fuzzy c-means, Fuzzy Sets, Parallel Collaborative Structures. INTRODUCTION Cluster analysis is primarily a tool for discovering previously hidden structure in the set of unordered objects, where we assume that a natural grouping exists in the data. Cluster analysis is a technique for classifying data [1], i.e., to divide a given set of objects into a set of classes or clusters based on similarity. The goal is to divide the data set in such a way that cases assigned to the same cluster should be as similar as possible whereas two objects from different clusters should be as dissimilar as possible[2]. It is an approach towards unsupervised learning as well as one of the major techniques in pattern recognition. The conventional (hard) clustering methods restrict each point of the data set to exactly one cluster [1]. These methods yield exhaustive partitions of the example set into non-empty and pair wise disjoint subsets. Fuzzy cluster analysis, [3] therefore allows gradual memberships of data points to clusters in [0, 1]. This gives the flexibility to express that data points belong to more than one cluster at the same time. Furthermore, these membership degrees offer a much finer degree of detail of the data model. One of the most popular object data clustering algorithms is the FCM algorithm, proposed by Dunn. Clustering is the process of grouping or aggregating of data items. Sentence clustering mainly used in variety of Applications such as classify and categorization of documents, automatic summary generation, organizing the documents, etc [4]. In text processing, sentence clustering plays a vital role this is used in text mining activities. Size of the clusters may change from one cluster to another. The traditional clustering algorithms have some problems in clustering the input dataset [3]. The problems such as, instability of clusters, complexity and sensitivity. To overcome the drawbacks of these clustering algorithms, Later an algorithm called Hierarchical Fuzzy Relational Eigenvector Centrality-based Clustering Algorithm (HFRECCA) [5] is extension of FRECCA which is used for the clustering of sentences. Contents present in text documents contain hierarchical structure and there are many terms present in the documents which are related to more than one theme hence HFRECCA will be useful algorithm for natural language documents. In this algorithm single object may belong to more than one cluster [6],[7]. In 1973 and Later it is extended by Bezdek (1981), which can be applied if the objects of interest are represented as points in a multi-dimensional space. FCM [8] relates the concept of object similarity to spatial closeness and finds cluster centres as prototypes. Several examples of application of FCM to real clustering problems have proved the good characteristics of this algorithm with respect to stability and partition quality [8]. Further, its convergence has been formally demonstrated (Bezdek, 1987; Hathaway et. al. , 1988). From this method a large variety of clustering techniques was derived with more complex prototypes, 1 which are mainly interesting in data analysis applications [9]. However, the generalization of these techniques to clustering imprecisely or uncertainly data or objects is not yet explored. Moreover, in the real-world applications, transaction data are usually composed of quantitative values. Designing a sophisticated data-mining algorithm to deal with different types of data turns a challenge in this research topic. Recently, fuzzy set theory is more and more frequently used in intelligent systems, because of its simplicity and similarity to human reasoning. The theory has been successfully applied to many fields such as manufacturing, engineering, diagnosis, economics, and others (Höppner, 1999). In this context, a generalization of the previously methods in order to be used in clustering of fuzzy data (or fuzzy numbers) [9] would be a meritorious research. In this work, a new fuzzy relational clustering algorithm, based on the fuzzy c-means [8] algorithm is proposed to clusters fuzzy data, which is used in the antecedent and the consequents parts of the fuzzy rules. This clustering process divides the fuzzy rules of a Fuzzy System into a set of classes or clusters of fuzzy rules based on similarity. From this new strategy, a flat fuzzy system f(x) can be organized into a hierarchical structure of fuzzy systems (Salgado, 2005a and 2007b). Hierarchical fuzzy modelling is a promising method to identify fuzzy models of target systems with many input variables or/and with different complexity interrelation. Partitioning a fuzzy system reduces its complexity, which simplifies the identification problem, improves the computation times and saves resources, such as memory space. Moreover, with the organization of the fuzzy system into a new hierarchical structure, the model readability and transparency can be improved. In this context, we propose a new technique, the Probabilistic Fuzzy Clustering of Fuzzy Rules (FCFR), based on cluster methodology, to decompose a flat fuzzy system f(x) into a set of n fuzzy sub-systems f1(x), f2(x), ..., fn(x), organized in a collaborative structure. Each of these clusters may contain information related with particular aspects of the system f(x). The proposed algorithm allows grouping a set of rules into c subgroups (clusters) of similar rules. It is a generalization of the Probabilistic Clustering Algorithm (FCM)[8] , here applied to rules instead of points. With this algorithm, the system obtained from The data is transformed into a new system, organized into several subsystems, in PCS structures (Salgado, 2005b and 2007a). The paper is organized as follows: firstly, a brief introduction to fuzzy systems is presented. The concept of relevance of a set of rules and of fuzzy system is reviewed. The PCS structure is described in section 3. In section 4 the FCFR strategy is proposed. An example is presented in section 5. Finally, the main conclusions are outlined in section 6. The development of information technology in the last two decades paved the way for a world full of data. But much of these data are of potentially not useful. In order to make it useful one, we need to extract the large amount of information or knowledge underlying the data. Data mining is a process of extraction the valuable information inside the huge amount of data. Clustering techniques can help in this data discovery and data analysis. Clustering the sentences is mainly useful in Information Retrieval (IR) Process. Clustering text [7] at the sentence level and document level has many differences. Document clustering partitions the documents into several parts and cluster those parts based on the overall theme. It doesn’t give much importance to the semantics of each sentence in the document. So there may be content overlap or bad coverage of theme will happen in the case of multi document summarization. Each data element in hard clustering method belongs to exactly one cluster. A. Cluster Analysis There are several algorithms available for clustering [10]. Each algorithm will cluster or group similar data objects in a useful way. This task involves dividing the data into various groups called clusters. The application of clustering includes Bioinformatics, Business modeling, image processing etc. In general, the text mining [10] process focuses on the statistical study of terms or phrases which helps us to understand the significance of a word within a document. Even if the two words didn’t have similar meanings, clustering will takes place. Clustering can be considered the most important unsupervised[11] learning framework, a cluster is declared as a group of data items, which are “similar” between them and are “dissimilar” to the objects belonging to other clusters. Sentence Clustering mainly used in variety of text mining applications. Output of clustering should be related to the query, which is specified by the user. B. Similarity Measure Similarity between the sentences is measured in terms of some distance function; such functions are Euclidean distance or Manhattan distance [1]. The choice of the measure is based on our requirement that induces the cluster size and formulates the success of a clustering algorithm on the specific application domain [4,7]. Current sentence clustering methods usually represent sentences as a term document matrix and perform clustering algorithm on it. Although these clustering methods [12] can group the documents satisfactorily, it is still hard for people to capture the meanings of the documents since there is no satisfactory interpretation for each document cluster. Based on the similarity or dissimilarity values clustering will takes place [1]. 2 BACK GROUND AND RELATED WORK A. Efficient Conceptual Rule Mining on Text Clusters Text mining is a modern and computational approach, which attempts to determine new, formerly unidentified information by applying various techniques from normal language processing and data mining domains. Clustering is one of the conventional data mining techniques[13] that is used us an unsubstantiated learning pattern, where clustering techniques attempt to groupings of the text documents with different data items. Clusters have high intra-cluster similarity and low inter-cluster similarity. Most of the current document clustering methods and algorithms are fully based on the Vector Space Model (VSM), which is a widely used as a data representation for text classification and clustering. Weighting these feature words accurately affects the result of the clustering algorithm and improves its efficiency. In this paper, we are going to present a Conceptual rule mining which is generated for the sentence meaning and related sentences in the document. Weights are appropriated for the sentences having higher contribution to the topic of the entire document. B. Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy C. This proposal is a new approach for measuring the semantic similarity between the words and concepts. The characteristics of polysemy and synonymy that exist in words of natural language have always been a challenge in the fields of Natural Language Processing (NLP) and Information Retrieval (IR) techniques [13]. There are certain advantages in the work of semantic association discovery by combining a taxonomy structure with corpus statistics. The proposed approach outperforms other computational models. It gives the highest correlation value, with a benchmark resulting from human similarity judgments. D. Some Experiments on Clustering Similar Sentences of Texts Identifying similar text passages or sentences plays an important role in many of the text mining applications. The proposal is based on some experiments [14] on clustering similar sentences of texts in the documents. The proposed framework is based on an incremental and unsupervised clustering method which is combined with statistical similarity metrics to measure the semantic distance between sentences [15]. It aims at identifying sets of highly semanticallyrelated sentences from a collection of documents. The Sentence Splitting is performed by a textual-segmentation tool called SENTER, which is based on a list of abbreviations and some sentence delimiters. E. A Clustering Algorithm for Text Summarization using Expectation Maximization Nowadays, large amount of data is available in the form of texts. It is very difficult for human beings to manually find out useful and significant data[16]. This problem can be solved with the help of text summarization algorithms. Text Summarization is the process of condensing the input text file into shorter version by preserving its overall content and meaning. This paper is about called text summarization using natural language processing. The proposal describes a system, which consists of two steps. In first step, we are implementing the phases of natural language processing that are splitting, tokenization, part of speech tagging, and parsing [17]. In second step, we are implementing Expectation Maximization (EM) Clustering Algorithm[18] to find out sentence similarity between the sentences. Based on the value of sentences similarity, we can easily summarize text. E. Fuzzy Relational Clustering FRECCA A fuzzy relational clustering approach is used to produce clusters with sentences, where each of them corresponds to some content. The output of clustering indicates the strength of the association among the data elements. Andrew Skabar and Khaled Abdalgader [1] proposed a novel fuzzy relational clustering algorithm called FRECCA (Fuzzy Relational Eigen Vector Centrality based Clustering Algorithm). The algorithm involves the following steps. Initialization: cluster membership values are initialized randomly, and normalized. Mixing coefficients are initialized. Expectation: Calculates the Page Rank value for each object in each cluster. Maximization: Updating the mixing coefficients based on membership values calculated in the Expectation Step. F. HFRECCA The general idea of the Hierarchical fuzzy clustering is the partitioning of the data items into a collection of clusters. The data points are assigned membership values for each of the clusters. Many existing clustering techniques have 3 difficulties in handling extreme outliers but fuzzy clustering algorithms tend to give them very small membership degree in surrounding clusters. This algorithm is an extension of fuzzy relational clustering algorithm. An expectation– maximization (EM) algorithm is an iterative process, in which the model mainly depends on some unobserved latent/hidden variables. This algorithm is particularly used in finding maximum likelihood estimates of parameters. The EM algorithm’s iteration alternates between performing an expectation (E) Step; this creates a function to compute the cluster membership probabilities and maximization (M) step, in which these probabilities are then used to reestimate the parameters. These parameter estimates are then used to find out the distribution of the latent variables in the next E step. PROPOSED WORK In this wok, we analyze how one can take advantage of the efficiency and stability of clusters, when the data to be clustered are available in the form of similarity relationships between pairs of objects. More precisely, we propose a new Hierarchical fuzzy relational clustering algorithm, based on the existing fuzzy C-means (FCM) algorithm, which does not require any restriction on the relation matrix. This HRECCA algorithm is applied for the clustering of the text data which is present in the form of xml (hierarchical) files. HFRECCA will give the output as clusters which are grouped from text data which is present in a given documents. In this HFRECCA algorithm, Page Rank algorithm is used as similarity measure. A.Page Rank We describe the application of the algorithm to data sets, and show that our algorithm performs better than other fuzzy clustering algorithms. In the proposed algorithm, we describe the use of Page Rank and use the Gaussian mixture model approach. Page Rank is used as a graph centrality measure. Page Rank algorithm is used to determine the importance of a particular node within a graph. Importance of node is used as a measure of centrality. This algorithm assigns numerical score (from 0 to 1) to every node in graph. This score is known as Page Rank Score. Sentence is represented by node on a graph and edges are weighted with value representing similarity between sentences. Page Rank can be used within the Expectation- Maximization algorithm to optimize the parameter values and to formulate the clusters. A graph representation of data objects is used in along with the Page Rank algorithm. It operates within an Expectation-Maximization; it is a framework which is a general purpose method for learning knowledge from the incomplete data. Each sentence in a document is represented by a node in the directed graph and the objects with weights indicate the object similarity. B,EM algorithm It is an unsupervised method, which does not need any training phase; it tries to find the parameters of the probability distribution that has the maximum likelihood of its parameters. Its main role is to parameter estimation. It is an iterative method, which is mainly used to finding the maximum likelihood parameters of the model. The E-step involves the computation of cluster membership probabilities. The probabilities calculated from E-step are reestimated with the parameters in M-step. C.APFRSEC The Main idea of aggregated probabilistic Fuzzy Relational sentence level Expectation maximization clustering Algorithm a clustering algorithm is used in this work to implement the separation of information among the various subsystems, which are organized into a Parallel Collaborative Structure, PCS. Each of these subsystems may contain information related with particular aspects of the system or merely collaborates to the performance of f(x). A PCS structure with n sub models fuzzy systems is depicted in Fig. 1. Each fuzzy system model i has two outputs: an output variable yi and the correspondent fuzzy system relevance Ri (x) This fuzzy system architecture describes the strength of mind collaboration among the different fuzzy models Therefore, the output of the SLIM model is the integral of the individual contributions of each fuzzy subsystem: 𝑛 f(X) = ∫𝑖=1 𝑓𝑖 (𝑋). 𝑅𝑖 (𝑋) ----------------(1) 4 Text data (XML Files) (XMLFiles) Similarity Measure(Page Rank)) APFRSEC Y1/R1(X) f1(x) X Y2/R2(X) f2(x) Clusters with Some hierarchical order with sentences ∫ Y3/R3(X) fn(x) Fig. 2. Structure of Probabilistic Hierarchical Collaborative Fuzzy System Fig1: APFRSEC Clustering Process Where Ri(x) represents the relevance function of the ith fuzzy subsystem covering the point x of the Universe of Discourse, and the ∫ is an aggregation operator. The relevance Ri(x) reveals the effective contribution (or belief of its contribution) to the respective fuzzy system. This variable should be considered in the aggregation of all collaborative systems. With the same meaning of its congener sub-systems, the relevance of an aggregated system is given by: 𝑅 𝑖 (𝑥) = ⋃𝑛𝑖=1 𝑅𝑖(𝑥) ------------------(2) Naturally, if the ith fuzzy subsystem covers appropriately the region of point x, its relevance value is high (very close to one), otherwise the relevance value is low (near zero or zero). D.THE FCM ALGORITHM Clustering is well established as a way to separate a set X={x1, x2, x3,…xnp} into c subsets that represent (sub) structures of X. A partition can be described by a c × n partition matrix U. Each element uik , i =1,..,c , k =1…, n of the partition matrix represents the membership of xk ∈ X in the ith cluster. We distinguish a particular set of partition matrices. Mfcm={U€[0,1]cn ∑𝑐𝑖=1 𝑢𝑖𝑘 = 1, 𝑘 = 1, . . 𝑛𝑝, 𝑖 = 1, . . 𝑐} --------------------(3) FCM is defined as the following problem: Given the data set X, any norm ||.|| on Rp and a fuzziness parameter m∈ (1,∞) , minimize the objective function 𝐽(𝑈, 𝑉) = ∑𝑐𝑘=1 ∑𝑐𝑖=1 ⋃𝑚 𝑖𝑘 𝑑𝑖𝑘, ,1 < 𝑚 ≤ ∞ ----------------------(4) 5 Where d ik= ||Xk-Vi||2 ; U€ Mfcm and V={v1,v2,….vc}⊂ Rp is a set of prototype points (cluster centers). It can be Shown that the following algorithm may lead the pair (U*,V*) to a minimum, using alternating optimization (Hathaway et. al, 1988), which result is resumed as follows: E. Probabilistic Fuzzy C-Means Algorithm: Step 1– For a set of points X={x1, x2,..., xnp}, with xi∈ Rn , keep c, 2 ≤ c < np, and initialize U(0)∈ Mfcm. Step 2– On the rth iteration, with r = 0, 1, 2, ..., compute the c mean vectors. 𝑛𝑝 𝑉𝑖 (𝑟) = 𝑚 ∑𝑘=1(𝑢𝑖𝑘 (𝑟)) .𝑋𝑘 𝑚 𝑛𝑝 ∑𝑘=1(𝑢𝑖𝑘 (𝑟)) , 𝑖 = 1,2,3, … . . , 𝑐 -------------------------(5) Step 3– Compute the new partition matrix U(r+1) using the expression: 1 𝑢𝑖𝑘 (𝑟 + 1) = 1 𝑚−1 --------------------------(6) 𝑑 ∑𝑐𝑗=1( 𝑖𝑘 ) 𝑑𝑗𝑘 For 1<=i<=c, 1<=k<=np, where ηk ∈ R. Step 4– Compare U(r) with U(r+1): If ||U(r+1)-U(r)||< ε Then the process ends. Otherwise let r = r + 1 and go to step 2. ε is a small real positive constant. The equation Defines the probabilistic (FCM) membership function for cluster i in the universe of discourse of all data vectors X. F.Our APFRSEC ALGORITHM Our aggregated probabilistic Fuzzy Relational sentence level Expectation maximization clustering Algorithm a clustering algorithm is used in this work to implement the separation of information among the various subsystems, which are organized into a Parallel Collaborative Structure, PCS. Each of these subsystems may contain information related with particular aspects of the system or merely collaborates to the performance of f(x). Initialization: # of original word patterns : m # of sentences formed : s # of classes : p Initial #of clusters : k=0 Input: Xi = <xi1,xi2,xi3,……xip>,1<=i<=s> Output: Clusters G1, G2; . . .;Gk Procedure for APFRSEC Algorithm: For Each word pattern in a Sentence Si,1<=i<=m Probability Relevance Ri is finding out by (2); Then aggregating the probability relevance of each word in a given sentence By (1). The Sentence which gets maximum Probability of Relevance to a Cluster G gets the Priority. Return with the Created K clusters; end Procedure. 6 EXPERIMENTAL RESULT Table 1: Clustering Performance Evaluation Techniques Purity Entropy V-means Rand F-means APFRSEC 0.810 0.328 0.649 0.865 0.606 FRECCA 0.800 0.324 0.646 0.862 0.601 ARCA 0.622 0.451 0.524 0.815 0.462 Spec.Clus 0.690 0.475 0.508 0.800 0.444 k-medoids 0.720 0.457 0.546 0.779 0.459 0.9 0.8 Entropy 0.7 Purity 0.6 0.5 APFRSEC FRECCA ARCA Spec. Clus K-Medoids Fig 3 : Purity and Entropy Comparison 1 0.8 0.6 V-means Rand 0.4 F-Means 0.2 0 APFRSEC FRECCA ARCA Spec. Clus K-Medoids Fig 4: Performance Comparison 7 In table1, the comparison is performed out for 6 numbers of clusters. We compare the performance of APFRSEC algorithm with ARCA, Spectral Clustering, and Medoids algorithms to the quotations data set and valuating using the external measures. In each algorithm, the affinity matrix was used and pair wise similarities also calculated for each of the method. It is to be observed that FRECCA algorithm is able to identify and avoid overlapping clusters. In Our Aggregated probabilistic Fuzzy sentence level Expectation maximization clustering Algorithm, It can forms the Sentence Level Clusters Using Fuzzy C-Means And find out the Probabilities of each valuable Word In a given sentence Our Method Find out The Relevance of each Word from Above Stated formula and find out the Aggregate of Probabilities of collection of words in a given Sentense.Then We Get the maximum likelihood estimates to belong to a Cluser Accurately. CONCLUSION In this Work the Expectation maximization Probabilistic fuzzy c-means (FCM) algorithm is proposed and applied to clustering fuzzy sets. It gives the More Accurate Results When compared to Existing Sytem. Thus we found maximum likelihood estimates of parameters. ACKNOWLEDGMENT Our Sincere thanks to the experts who have contributed towards Development of this paper [1] REFERRENCES Jung-Yi Jiang, Ren-Jia Liou, and Shie-Jue Lee, Member, A Fuzzy Self-Constructing Feature Clustering Algorithm for Text Classification, March 2011, IEEE, VOL. 23, NO. 3. [2] N. Slonim and N. Tishby. “The power of word clusters for text classification,” 23rd European Colloquium on Information Retrieval Research (ECIR), 2001. [3] Neha Mehta, Mamta Kathuria, Mahesh Singh, “Comparison of conventional A survey”, April 2014 IJARCCE,Vol. 2,Issue 1 [4] G.Thilagavathi, J.Anitha,K.Nethra,” Sentence Similarity based Document Clustering using Fuzzy algorithm” March 2014, IJAFRC, Vol1, Issue 3. [5] K.Jeyalakshmi1, R.Deepa2, M.Manjula,” An Efficient Clustering Sentence-Level Text Using A Novel Hierarchical Fuzzy Relational Cluster- Ing Algorithm”, February2014, IJARCCE,Vol.3 Issue.2. [6] M.S.Yang,” A Survey of Fuzzy clustering”, October 1993, Vol 18, No 11. and fuzzy Clustering Techniques [7 ] F.Pereira, N. Tishby, and L.Lee.“Distribution Al clustering of English words,” 31st Annual Meeting of ACL,1993, pages 183– 190 [8] Hathaway RJ, Bezdek JC Recent convergence results for the fuzzy C-means clustering algorithms, oct 1988. J Class 5:237-247. [9] S.M. Jagatheesan1, V. Thiagarasu2,” Development of Fuzzy based categorical Text Clustering Algorithm for Information Retrieval”, January 2014, vol 2, issue 1. [10] K. Nalini Dr. L. Jaba Sheela,” Survey on Text Classification”, IJIRAE, July 2014, Vol 1, Issue6 [11] Roventa, E., Spircu, T.” Averaging Procedures in Defuzzification Processes, Fuzzy Sets and Systems “, 2003,136, pp. 375-385. [12] S.J.Lee and C. S. Ouyang. “A neuro-fuzzys system modeling with self-constructing rule generation and hybrid svd-based learning,”, IEEE Transaction son Fuzzy Systems, June 2003,11(3):341– 353. [13] G. Salton and M. J. McGill. Introduction to Modern Retrieva,. McGraw-Hill Book Company,1983. 8 [14] F. Sebastiani. “Machine learning in automated Ed text categorization,” ACM Computing Surveys,34(1):1 47,March 2002. [15] L. X.Wang. A Couse in Fuzzy Systems and Control. Prentic-Hall International, Inc., 1997. [16] Y. Yang and J. O. Pedersen. “A comparative study on feature selection in text categorization. [17] R. Bekkerman, R. El-Yaniv, N. Tishby, and Y. Winter, “Distributional Word Clusters Versus Words for Text Categorization, ” J. Machine Learning Research, 2003,vol. 3, pp. 1183-1208 [18] L.D. Baker and A. McCallum, “Distribution Al Clustering of Words for Text Classification,” Proc. ACM SIGIR, pp, 1998, Pages 96-10. BIOGRAPHY V.L.Kartheek is currently pursuing his M.Tech in Computer Science and Technology from SRKR Engineering College, Bhimavaram, W.G.Dt., Andhrapradesh, India which is Affiliated to Andhra University, Visakhapatnam. His main research interests are Information retrieval, Artificial Intelligence, Machine learning, and data mining. Mobile computing. Dr. Chandra Sekhar Vasamsetty received Ph.D., in Computer Science and Systems Engineering from Andhra University in 2014. He is working as Associate Professor in the department of CSE, SRKR Engineering College, Bhimavaram, W.G.Dt., A.P, India. He has 19 years of experience in teaching. His main research interests include Bioinformatics, Software Engineering, Computer Networks, Machine Learning, Data Mining & Information retrieval. 9