Using word senses in place of surface word forms has been shown to improve performance on many computational tasks, including intelligent web search. In this paper we propose a novel approach to the automatic discovery of word senses from raw text, a task referred to as Word Sense Induction (WSI). Almost all WSI approaches proposed in the literature deal with monolingual data, and only very few incorporate bilingual data. The WSI method we propose is innovative in that it uses multilingual data to perform WSI of words in a given language. The experiments show a clear overall improvement in performance: the single-language setting is outperformed by the multi-language settings on almost all the considered target words. The performance gain, in terms of F-measure, averages 5% and in some cases reaches 40%.
Word Sense Induction with Multilingual Features Representation / Lorenzo, Albano; Beneventano, Domenico; Bergamaschi, Sonia. - Print. - 2:(2014), pp. 343-349. (Paper presented at the International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2014 IEEE/WIC/ACM, held in Warsaw, Poland, 11–14 August 2014) [10.1109/WI-IAT.2014.117].
Word Sense Induction with Multilingual Features Representation
BENEVENTANO, Domenico;BERGAMASCHI, Sonia
2014
The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while the publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.