Automated subject classification of textual web documents
... Another difference within the text categorization approach is in the document pre-processing and indexing part, where documents are represented as vectors of term weights. Computing the term weights can be ...
Semantics-Based Spam Detection by Observance of Outgoing
... Abstract: Existing spam detection systems are mostly keyword-based and find spam in an outgoing message by keyword matching. The quality of the results provided by traditional keyword-based spam detection is not optimal for finding the spam information present in the message. T ...
Distractor Quality Analyze In Multiple Choice Questions
... may be too cumbersome and inconvenient to use. Therefore, we should use a mathematical model. The simplest and most natural way is to define a Boolean model. In constructing the Boolean model, we introduce the following interpretation of the logical variables of Boolean functions. Let’s denote attribut ...
On Word Frequency Information and Negative Evidence in Naive
... Why is the performance of the multinomial Naive Bayes classifier improved when the word frequency information is eliminated in the documents? In [6] and [7] the distribution of terms in documents was studied. It was found that terms often exhibit burstiness: the probability that a term appears a sec ...
Intelligent Search on the Internet
... present in the query are lost, therefore relevant information is not retrieved. - polysemy occurs when a term has several different meanings; it causes irrelevant documents to appear in the result lists. In order to solve such problems, documents are represented through underlying concepts. The conce ...
Semantic Outlier Detection for Affective Common-Sense Reasoning and Concept-Level Sentiment Analysis Erik Cambria
... Sentic computing (Cambria and Hussain 2015) tackles these crucial issues by exploiting affective common-sense reasoning, i.e., the intrinsically-human capacity to interpret the cognitive and affective information associated with natural language and, hence, to infer new knowledge and make decisions, ...
Matching Ottoman Words: An image retrieval approach to historical
... Chan et al. [4] presented a segmentation-based approach that utilizes gHMMs with a bi-gram letter transition model. Their lexicon-free system performs text queries on off-line printed and handwritten Arabic documents. Saykol et al. [20] used the idea of compression for content-based retrieval of Otto ...
Word Sense Disambiguation for Arabic Text Categorization
... other representations. The main difficulty in this approach is that it is not capable of determining the correct senses. For a word that has multiple synonyms, they choose the first concept to determine the nearest concept. The work in [14] is a comparative study with the other usual modes of repre ...
Keyword Extraction from a Single Document
... used relatively impartially with each frequent term, while a term such as “imitation” or “digital computer” shows co-occurrence especially with particular terms. These biases are derived from either semantic, lexical, or other relations of two terms. Thus, a term with co-occurrence biases may have an ...
Ontology construction for information classification
... only pose possible setbacks due to the quality of the dictionary, it will also prove incapable of adapting to the incessantly changing environment ...
Cross-Language Information Retrieval
... Phrasal Translation and Query Expansion Techniques for Cross-language Information Retrieval, Lisa Ballesteros and W. Bruce Croft, Research and Development in Information Retrieval, 1995. Resolving Ambiguity for Cross-Language Retrieval, Lisa Ballesteros and W. Bruce Croft, Research and Development in ...
Magnifico: A Platform For Expert Mining Using Metadata
... Afterwards we measure the importance of each word for a given sub-discipline using the term frequency of that word occurring in the specific sub-discipline. For every publication, after collecting all the words appearing in the title and publisher name, stop words are removed from the word collectio ...
Pyndri: a Python Interface to the Indri Search Engine
... There is still, however, a lack of an integrated Python library dedicated to Information Retrieval (IR) research. Researchers often implement their own procedures to parse common file formats and to perform tokenization and token normalization, which together make up the overall task of corpus indexing. Uysal and Gun ...
INF5820 Distributional Semantics
... Words are in paradigmatic relation if the same neighbors typically occur near them (humans often ‘eat’ both ‘bread’ and ‘butter’). It is also called second order co-occurrence. The words in such a relation may well never actually co-occur with each other. ...
N045038690
... with the characteristics of exploitation and exploration, GAs can efficiently deal with large search spaces, and hence are less prone to getting stuck in a local optimum solution when compared to other algorithms. This derives from the GAs' ability to handle multiple concurrent solutions (individuals) ...
Figure 4: The Binomial distribution
... obtained per the frequency of term appearance in the corpus by providing a systematic way to detect which entity classes are most similar to each other and, therefore, which entity classes are the best candidates for establishing the similarity between two terms with respect to the domain ontology. ...
NLDB10-OntoGain - Intelligent Systems Laboratory
... Aims at organizing concepts into a hierarchical structure where each concept is related to its respective broader and narrower terms. Two methods in OntoGain: agglomerative clustering and Formal Concept Analysis (FCA) ...
Conceptual grouping in word co-occurrence networks
... One way to quantify the ideas on conceptual grouping presented above is to build a custom semantic network for a user query. What we do is build a new small semantic network with all concepts that are linked to the user query (e.g. 'bomb', see Figure 1, which shows only some of the links around 'bom ...
CL35491494
... deals with languages. Language refers to a body of words and the systems for their use common to a people who are of the same community or nation, the same geographical area, or the same cultural tradition. It is the primary means of communication used by particular groups of human beings [1]. It is ...
Discriminative Improvements to Distributional Sentence Similarity
... al., 2003; Arora et al., 2012); the difference from SVD is the addition of a non-negativity constraint in the latent representation based on a non-orthogonal basis. While W may simply contain counts of distributional features, prior work has demonstrated the utility of reweighting these counts (Turney ...
in the document - XP
... The main idea behind tf-idf is that a term occurring infrequently should be given a higher weight than a term that occurs frequently. Important definitions in the tf-idf context: t = number of distinct terms in the document collection. tf_ij = number of occurrences of term t_j in document D_i. This is ...
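The weighting idea in this snippet can be illustrated with a minimal Python sketch. The snippet is truncated before it gives a formula, so the inverse document frequency used here, log(N / df), is an assumption (one common variant), and the function name tf_idf and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting (not taken from the snippet): tf_ij * log(N / df_j).
    """
    n_docs = len(docs)
    # df[t]: number of documents that contain term t at least once
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # tf_ij: occurrences of term t_j in document D_i
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

# Hypothetical toy collection of three tokenized documents
docs = [
    ["spam", "filter", "spam"],
    ["filter", "index"],
    ["index", "query", "filter"],
]
w = tf_idf(docs)
```

Under this variant a term that appears in every document ("filter" above) gets weight zero, while a term confined to one document ("spam", "query") gets the largest weight per occurrence.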
No Slide Title
... – Start with some user-supplied relevance information about a “training set” of documents – The training set is used to compute term weights by estimating P(t in document | document is relevant) and P(t in document | document is irrelevant) ...
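One way to turn the two estimated probabilities into a term weight is sketched below in Python. The snippet does not say how the estimates are combined, so the log odds ratio and the add-0.5 smoothing are assumptions (a common choice in probabilistic retrieval), and relevance_weight and the training sets are hypothetical.

```python
import math

def relevance_weight(term, relevant_docs, irrelevant_docs):
    """Log odds ratio of a term given judged relevant/irrelevant documents.

    Each document is a set of terms. The 0.5 / 1.0 smoothing constants are
    an assumption; they keep the estimates away from 0 and 1.
    """
    r = sum(term in d for d in relevant_docs)    # relevant docs containing term
    s = sum(term in d for d in irrelevant_docs)  # irrelevant docs containing term
    p = (r + 0.5) / (len(relevant_docs) + 1.0)    # ~ P(t in doc | relevant)
    q = (s + 0.5) / (len(irrelevant_docs) + 1.0)  # ~ P(t in doc | irrelevant)
    return math.log(p * (1 - q) / (q * (1 - p)))

# Hypothetical training set with user-supplied relevance judgments
relevant = [{"python", "index"}, {"python", "query"}]
irrelevant = [{"snake", "python"}, {"snake", "zoo"}]

w_index = relevance_weight("index", relevant, irrelevant)
w_snake = relevance_weight("snake", relevant, irrelevant)
```

A term seen mostly in relevant documents gets a positive weight and a term seen mostly in irrelevant ones gets a negative weight.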
2006 Paula Matuszek
... – hyperbolic viewer based on document similarity; browse a field of scientific documents – “map” based techniques showing peaks, valleys, outliers – Faceted search results showing document counts for different categorizations, with browsing ...