Centroid Based Classification Essay

Class normalization in centroid-based text categorization

Authors: Verayuth Lertnattee, Faculty of Pharmacy, Silpakorn University, Sanamchandra Campus, Muang, Nakorn Pathom 73000, Thailand
Thanaruk Theeramunkong, Information Technology Program, Sirindhorn International Institute of Technology, 131 Moo 5 Tiwanont Rd, Bangkadi, Muang, Pathum Thani 12000, Thailand
Published in: Information Sciences: an International Journal
Volume 176, Issue 12, June 2006
Pages 1712-1738
Elsevier Science Inc., New York, NY, USA
doi: 10.1016/j.ins.2005.05.010
2006 Article

Keywords: centroid-based classifier, normalization, term weighting, text categorization


Centroid-Based Document Classification: Analysis & Experimental Results

Eui-Hong (Sam) Han and George Karypis
4th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD), pp. 424 - 431, 2000
In recent years we have seen tremendous growth in the volume of text documents available on the Internet, digital libraries, news sources, and company-wide intranets. Automatic text categorization, the task of assigning text documents to pre-specified classes (topics or themes), is an important task that can help in both organizing and finding information in these huge resources. Text categorization presents unique challenges due to the large number of attributes present in the data set, the large number of training samples, and attribute dependencies. In this paper we focus on a simple linear-time centroid-based document classification algorithm that, despite its simplicity and robust performance, has not been extensively studied and analyzed. Our extensive experiments show that this centroid-based classifier consistently and substantially outperforms other algorithms, such as Naive Bayesian, k-nearest-neighbors, and C4.5, on a wide range of datasets. Our analysis shows that the similarity measure used by the centroid-based scheme allows it to classify a new document based on how closely its behavior matches the behavior of the documents belonging to different classes, as measured by the average similarity between the documents. This matching allows it to dynamically adjust for classes with different densities. Furthermore, our analysis shows that the similarity measure of the centroid-based scheme accounts for dependencies between the terms in the different classes. We believe that this feature is the reason why it consistently outperforms other classifiers that cannot take these dependencies into account.
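The abstract describes the classifier only in general terms, so the following is a minimal sketch of one common reading of the idea, not the authors' implementation. It assumes that each document is a unit-length term-frequency vector (the paper's exact term weighting is not given here), that a class centroid is the average of its training vectors, and that a new document is assigned to the class whose centroid has the highest cosine similarity. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def unit_vector(tokens):
    """Term-frequency vector scaled to unit length (assumed weighting)."""
    counts = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def train_centroids(labeled_docs):
    """One pass over (tokens, label) pairs: average the vectors per class."""
    sums = defaultdict(lambda: defaultdict(float))
    sizes = Counter()
    for tokens, label in labeled_docs:
        sizes[label] += 1
        for term, weight in unit_vector(tokens).items():
            sums[label][term] += weight
    return {
        label: {term: w / sizes[label] for term, w in vec.items()}
        for label, vec in sums.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(tokens, centroids):
    """Return the label of the centroid most similar to the document."""
    doc = unit_vector(tokens)
    return max(centroids, key=lambda label: cosine(doc, centroids[label]))


# Hypothetical toy corpus, for illustration only.
training = [
    ("ball goal team".split(), "sports"),
    ("team match goal".split(), "sports"),
    ("stock market price".split(), "finance"),
    ("market price bank".split(), "finance"),
]
centroids = train_centroids(training)
print(classify("goal team win".split(), centroids))
```

Training touches each document once and classification compares a document against one vector per class, which is consistent with the linear-time behavior the abstract mentions.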
Research topics: Classification | Data mining | Text mining