Improving css-KNN classification performance by shifts in training data

This paper presents a new approach to improving the performance of a css-k-NN classifier for the categorization of text documents. The css-k-NN classifier (i.e., a threshold-based variation of a standard k-NN classifier we proposed in [1]) is a lazy-learning, instance-based classifier. It has no parameters associated with features and/or classes of objects that would be optimized during off-line learning. In this paper we propose a training data preprocessing phase that tries to alleviate this lack of learning. The idea is to compute modifications of the training data such that class-representative instances are optimized before the actual k-NN algorithm is employed. Empirical text classification experiments on mid-size Wikipedia data sets show that carefully cross-validated settings of this preprocessing yield significant improvements in k-NN performance compared to classification without this step. The proposed approach can be useful for improving the effectiveness of other classifiers, and it can also find applications in the domains of recommendation systems and keyword-based search.
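As a rough illustration of what a threshold-based k-NN decision can look like, the following minimal sketch scores each class by the summed similarity of the k nearest training documents and assigns every class whose score reaches a per-class threshold. It is not the authors' css-k-NN implementation: the cosine similarity, the summed-similarity scoring, the threshold values, the toy documents, and all names (`cosine`, `knn_threshold_classify`, `thresholds`) are assumptions made for illustration only, and the training data preprocessing step described above is not covered.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_threshold_classify(query, train, k, thresholds):
    """Return every label whose summed neighbor similarity reaches its threshold.

    train is a list of (vector, labels) pairs; thresholds maps label -> float.
    Labels without a threshold are never assigned (hypothetical convention).
    """
    # Rank all training documents by similarity to the query and keep the top k.
    neighbors = sorted(
        ((cosine(query, vec), labels) for vec, labels in train),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]

    # Accumulate a score per class from the k nearest neighbors.
    scores = defaultdict(float)
    for sim, labels in neighbors:
        for label in labels:
            scores[label] += sim

    return {
        label
        for label, score in scores.items()
        if score >= thresholds.get(label, float("inf"))
    }


# Toy term-weight vectors (made up for this example).
train = [
    ({"goal": 1.0, "match": 1.0}, {"sport"}),
    ({"match": 1.0, "team": 1.0}, {"sport"}),
    ({"vote": 1.0, "party": 1.0}, {"politics"}),
    ({"team": 1.0, "vote": 1.0}, {"sport", "politics"}),
]
query = {"match": 1.0, "team": 1.0, "goal": 1.0}

predicted = knn_threshold_classify(
    query, train, k=3, thresholds={"sport": 1.0, "politics": 1.0}
)
print(predicted)
```

Because such a classifier stores the training documents and defers all computation to query time, the only tunable quantities in this sketch are `k` and the thresholds, which is the kind of "lack of learning" the abstract refers to.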

Improving css-KNN classification performance by shifts in training data / Draszawka, Karol; Szymański, Julian; Guerra, Francesco. - 9398:(2015), pp. 51-63. (Paper presented at the 1st COST Action IC1302 International KEYSTONE Conference on Semantic Keyword-Based Search on Structured Data Sources, IKC 2015, held in Coimbra on 8-9 September 2015) [10.1007/978-3-319-27932-9_5].

Draszawka, Karol; Szymański, Julian; Guerra, Francesco

2015

Publication year: 2015
Conference title: 1st COST Action IC1302 International KEYSTONE Conference on Semantic Keyword-Based Search on Structured Data Sources, IKC 2015
Conference location: Coimbra
Conference date: 8-9 September 2015
DOI: https://dx.doi.org/10.1007/978-3-319-27932-9_5
WoS code: WOS:000410802200005
Scopus code: 2-s2.0-84955264674
Series: Lecture Notes in Computer Science
Volume number: 9398
First page: 51
Last page: 63
All authors: Draszawka, Karol; Szymański, Julian; Guerra, Francesco
Type: Paper in conference proceedings


Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1084116
