NAÏVE BAYES CATEGORIZATION USING TEXT SPECIFIC FEATURES
Abstract:
• We present a Bayesian classification approach for automatic text categorization
using class-specific features.
• Unlike conventional approaches to text categorization, the proposed method
selects a specific feature subset for each class.
• To apply these class-dependent features for classification, we follow
Baggenstoss’s PDF Projection Theorem to reconstruct PDFs in the raw data
space from the class-specific PDFs in low-dimensional feature space, and
build a Bayes classification rule.
• One notable strength of our approach is that most feature selection
criteria, such as Information Gain (IG) and Maximum Discrimination (MD),
can be easily incorporated into it.
• We evaluate our method’s classification performance on several real-world
benchmark data sets, comparing it with state-of-the-art feature selection
approaches.
• The superior results demonstrate the effectiveness of the proposed approach
and further indicate its wide potential applications in text categorization.
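To make the class-specific idea concrete, per-class feature selection by Information Gain can be sketched as follows. This is a generic illustration, not the paper's implementation: the toy corpus, labels, and function names are hypothetical, and terms are scored one-vs-rest for each class so every class receives its own top-k feature subset.

```python
import math

def information_gain(docs, labels, term, target_class):
    """IG of binary term presence for a one-vs-rest split on target_class."""
    n = len(docs)

    def entropy(pos, total):
        # Binary entropy of a pos/total split; 0 for empty or pure sets.
        if total == 0 or pos in (0, total):
            return 0.0
        p = pos / total
        return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

    base = entropy(sum(1 for y in labels if y == target_class), n)
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without = [y for d, y in zip(docs, labels) if term not in d]
    cond = (len(with_term) / n) * entropy(
               sum(1 for y in with_term if y == target_class), len(with_term)) \
         + (len(without) / n) * entropy(
               sum(1 for y in without if y == target_class), len(without))
    return base - cond

def class_specific_features(docs, labels, k=2):
    """Select the top-k terms by IG separately for each class."""
    vocab = {t for d in docs for t in d}
    return {c: sorted(vocab,
                      key=lambda t: information_gain(docs, labels, t, c),
                      reverse=True)[:k]
            for c in set(labels)}

# Hypothetical toy corpus: each document is a set of terms.
docs = [{"ball", "goal"}, {"ball", "team"}, {"stock", "market"}, {"market", "trade"}]
labels = ["sport", "sport", "finance", "finance"]
print(class_specific_features(docs, labels))
```

Because each class ranks terms independently, the selected subsets can differ per class, which is the departure from conventional single-subset feature selection that the abstract describes.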
Existing System:
• The wide availability of web documents in electronic form requires an
automatic technique to label documents with a predefined set of topics,
a task known as automatic Text Categorization (TC).
• Over the past decades, a large number of advanced machine learning
algorithms have been proposed to address this challenging task.
• By formulating the TC task as a classification problem, many existing
learning approaches can be applied.
Disadvantages:
• Assumption of class-conditional independence, so accuracy is lower.
• In practice, dependencies exist among variables.
• Dependencies among these variables cannot be modelled.
Proposed System:
• The Naive Bayesian classifier is based on Bayes’ theorem with
independence assumptions between predictors.
• A Naive Bayesian model is easy to build, with no complicated iterative
parameter estimation, which makes it particularly useful for very large
datasets.
• Despite its simplicity, the Naive Bayesian classifier often does surprisingly
well and is widely used because it often outperforms more sophisticated
classification methods.
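A minimal multinomial Naive Bayes classifier with Laplace smoothing can be sketched as follows. This is a generic Python illustration of the standard algorithm, not the paper's class-specific variant; the class name and toy corpus are hypothetical.

```python
import math
from collections import Counter

class NaiveBayes:
    """Minimal multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.classes = set(labels)
        self.vocab = {t for d in docs for t in d}
        # Log class priors from label frequencies.
        self.prior = {c: math.log(sum(1 for y in labels if y == c) / len(labels))
                      for c in self.classes}
        counts = {c: Counter() for c in self.classes}
        for d, y in zip(docs, labels):
            counts[y].update(d)
        # Smoothed log-likelihoods: (count + 1) / (class total + |V|).
        self.loglik = {}
        for c in self.classes:
            total = sum(counts[c].values()) + len(self.vocab)
            self.loglik[c] = {t: math.log((counts[c][t] + 1) / total)
                              for t in self.vocab}
        return self

    def predict(self, doc):
        # Independence assumption: the score is a sum of per-term log-likelihoods.
        def score(c):
            return self.prior[c] + sum(self.loglik[c][t] for t in doc
                                       if t in self.vocab)
        return max(self.classes, key=score)

clf = NaiveBayes().fit(
    [["ball", "goal", "team"], ["ball", "match"],
     ["stock", "market"], ["trade", "market"]],
    ["sport", "sport", "finance", "finance"])
print(clf.predict(["ball", "team"]))  # prints "sport"
```

Training is a single counting pass with no iterative parameter estimation, which is exactly why the model scales to very large datasets as noted above.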
Advantages:
• Easy to implement.
• Requires only a small amount of training data to estimate the parameters.
• Good results are obtained in most cases.
Software Requirements:
 Operating System : Windows 7 (32-bit)
 Coding Language : C#.NET 4.0
 Database : SQL Server 2008
Hardware requirements:
 System : Pentium IV 2.4 GHz
 Hard Disk : 40 GB
 Floppy Drive : 1.44 MB
 Monitor : 15" VGA Colour
 Mouse : Logitech
 RAM : 512 MB
References:
• G. Forman, “An extensive empirical study of feature selection metrics for
text classification,” The Journal of Machine Learning Research, vol. 3, pp.
1289–1305, 2003.
• H. Liu and L. Yu, “Toward integrating feature selection algorithms for
classification and clustering,” IEEE Transactions on Knowledge and Data
Engineering, vol. 17, no. 4, pp. 491–502, 2005.
• P. M. Baggenstoss, “Class-specific feature sets in classification,” IEEE
Transactions on Signal Processing, vol. 47, no. 12, pp. 3428–3432, 1999.
• P. M. Baggenstoss, “The PDF projection theorem and the class-specific
method,” IEEE Transactions on Signal Processing, vol. 51, no. 3, pp.
672–685, 2003.
• A. McCallum, K. Nigam et al., “A comparison of event models for naive
Bayes text classification,” in AAAI-98 Workshop on Learning for Text
Categorization, vol. 752, 1998, pp. 41–48.
• V. Kecman, Learning and Soft Computing: Support Vector Machines, Neural
Networks, and Fuzzy Logic Models. MIT Press, 2001.
• L. Wang and X. Fu, Data Mining with Computational Intelligence. Springer
Science & Business Media, 2006.
• D. D. Lewis, “Naive (Bayes) at forty: The independence assumption in
information retrieval,” in Machine Learning: ECML-98, 1998, pp. 4–15.