Intensive use of lexicon and corpus for WSD

  1. Martí Antonín, María Antonia
  2. Vázquez Pérez, Sonia
  3. Montoyo Guijarro, Andrés
  4. Nica, Iulia
Journal: Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2004

Issue: 33

Pages: 147-154

Type: Article

Abstract

The paper addresses the issue of how to use linguistic information in Word Sense Disambiguation (WSD). We introduce a knowledge-driven, unsupervised WSD method that requires only a large POS-tagged corpus and very little grammatical knowledge. The WSD process takes into account the syntactic patterns in which the ambiguous occurrence appears, relying on the hypothesis of "almost one sense per syntactic pattern". Integrating these patterns into the process allows us to obtain, from corpora, paradigmatic and syntagmatic information related to the ambiguous occurrence. We also use variants of EuroWordNet (EWN) information for word senses and different WSD algorithms. We report the results obtained when applying the method to the Spanish lexical sample task in Senseval-2. The methodology is easily portable to other languages.
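
As a rough illustration of how a syntactic pattern can yield paradigmatic information from a POS-tagged corpus, the following Python sketch may help. It is not the authors' implementation: the toy English corpus, the simplified tagset (DET, N, PREP, V), the single "N PREP N" pattern, and the function names `noun_prep_noun` and `paradigmatic_substitutes` are all assumptions made for this example. The sketch only collects the other nouns that fill the slot of the ambiguous word in the same pattern; mapping those nouns to EWN senses is left out.

```python
from collections import Counter

# Toy POS-tagged corpus (invented for illustration): lists of (word, tag) pairs.
CORPUS = [
    [("a", "DET"), ("glass", "N"), ("of", "PREP"), ("wine", "N")],
    [("a", "DET"), ("bottle", "N"), ("of", "PREP"), ("wine", "N")],
    [("a", "DET"), ("cup", "N"), ("of", "PREP"), ("wine", "N")],
    [("a", "DET"), ("bottle", "N"), ("of", "PREP"), ("wine", "N"), ("broke", "V")],
    [("a", "DET"), ("sheet", "N"), ("of", "PREP"), ("glass", "N")],
]


def noun_prep_noun(sentence, i):
    """Return (preposition, second noun) if an 'N PREP N' sequence starts at i, else None."""
    window = sentence[i:i + 3]
    if [tag for _, tag in window] == ["N", "PREP", "N"]:
        return (window[1][0], window[2][0])
    return None


def paradigmatic_substitutes(corpus, pattern, target):
    """Count the nouns other than `target` that occupy the first slot of `pattern`."""
    counts = Counter()
    for sentence in corpus:
        for i, (word, _) in enumerate(sentence):
            if word != target and noun_prep_noun(sentence, i) == pattern:
                counts[word] += 1
    return counts


# Ambiguous occurrence: "glass" in the first sentence, at index 1.
pattern = noun_prep_noun(CORPUS[0], 1)
substitutes = paradigmatic_substitutes(CORPUS, pattern, "glass")
print(pattern, dict(substitutes))
```

In this toy setting the nouns sharing the slot with "glass" ("bottle", "cup") point towards its container reading, while the occurrence in "sheet of glass" sits in a different pattern and contributes nothing. A real system would need many more patterns and a much larger corpus.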