User:Alvations/multilingual and crosslingual WSD

Multilingual and Crosslingual Word Sense Disambiguation (WSD) evaluation tasks focus on WSD across two or more languages simultaneously. While the Multilingual WSD evaluation task^[1] uses a fixed sense inventory (i.e. BabelNet), the sense inventory for the Crosslingual WSD evaluation task^[2] is built from parallel corpora, e.g. the Europarl corpus.

Multilingual WSD

The Multilingual WSD task was introduced for the upcoming SemEval-2013 workshop. It evaluates Word Sense Disambiguation systems in a multilingual scenario, using BabelNet as a fixed sense inventory. This sets it apart from similar tasks such as Crosslingual WSD and the multilingual lexical substitution task, where no fixed sense inventory is specified. Prior to the development of BabelNet, a bilingual lexical-sample WSD evaluation task was carried out in SemEval-2007 on Chinese-English bitexts^[3].

The Multilingual WSD task follows the all-words version of classic WSD: participating systems will be expected to link all occurrences of noun phrases within arbitrary texts in different languages to their corresponding Babel synsets^[4].

The Multilingual WSD task is evaluated with the standard precision, recall and F1 measures, as in classic WSD.
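The following is a minimal sketch of how precision, recall and F1 are commonly computed in WSD evaluation, not the official task scorer. It assumes that precision is measured over the instances a system attempts and recall over all gold instances. The function name `score_wsd`, the instance identifiers and the placeholder sense labels are hypothetical; only `bn:00008364n` is taken from the example in this article.

```python
def score_wsd(gold, predicted):
    """Return (precision, recall, f1) for a WSD run.

    gold:      dict mapping instance id -> set of acceptable sense ids
    predicted: dict mapping instance id -> one sense id; instances the
               system abstains on are simply absent
    """
    # Only answers for instances that exist in the gold standard count.
    attempted = [i for i in predicted if i in gold]
    correct = sum(1 for i in attempted if predicted[i] in gold[i])

    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data: four gold instances, three attempted, two correct.
gold = {
    "t1": {"bn:00008364n"},
    "t2": {"sense-a"},
    "t3": {"sense-b"},
    "t4": {"sense-c"},
}
predicted = {"t1": "bn:00008364n", "t2": "sense-x", "t3": "sense-b"}

p, r, f1 = score_wsd(gold, predicted)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")
```

Because the system abstains on `t4`, precision (2 of 3 attempted) is higher than recall (2 of 4 gold instances), which is the usual reason for reporting both alongside F1.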

BabelNet

BabelNet is a very large multilingual semantic network with millions of concepts obtained from:

an integration of WordNet and Wikipedia based on an automatic mapping algorithm and
translations of the concepts (i.e. English Wikipedia pages and WordNet synsets) based on Wikipedia cross-language links and the output of a machine translation system^[5]

An example of a sense label in BabelNet is as follows:

Target polysemous English word: bank 

Occurs in the phrase/sentence: "the bank of Scotland"


Princeton WordNet(3.0)^[6] synset (not necessarily used in the task): 
{08420278-n} | depository financial institution


BabelNet(1.0) synset: 
{bn:00008364n} depository_financial_institution 

ES:banco, CA:banc, IT:banca, DE:bank, FR:banque

Crosslingual WSD

The Crosslingual WSD task was introduced in the SemEval-2010 evaluation workshop and re-proposed for the upcoming SemEval-2013 workshop. To ease the integration of WSD systems into other Natural Language Processing (NLP) applications, such as Machine Translation and multilingual Information Retrieval, the task was introduced as a language-independent and knowledge-lean approach to WSD.

The task is an unsupervised Word Sense Disambiguation task for English nouns by means of parallel corpora. It follows the lexical-sample variant of the classic WSD task and is restricted to 20 polysemous nouns.

The evaluation criterion uses weighted versions of the precision and recall metrics, inspired by the English lexical substitution task in SemEval-2010^[7].
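The exact weighting is not spelled out here, so the sketch below only illustrates one frequency-weighted scheme in the style of lexical-substitution scoring; the official scorer may differ. It assumes that each gold item lists translations with the number of annotators who chose them, and that a system's credit for an item is the average gold frequency of its guesses divided by the total gold frequency for that item. The function name, item identifiers and frequencies are made up for illustration.

```python
def weighted_scores(gold, answers):
    """Return (precision, recall) with frequency-weighted credit.

    gold:    dict mapping item id -> {translation: annotator frequency}
    answers: dict mapping item id -> list of guessed translations
    """
    total_credit = 0.0
    attempted = 0
    for item, guesses in answers.items():
        if item not in gold or not guesses:
            continue  # unknown item or empty answer: not attempted
        attempted += 1
        freqs = gold[item]
        # Average gold frequency of the guesses, so that listing many
        # guesses dilutes the credit instead of inflating it.
        credit = sum(freqs.get(g, 0) for g in guesses) / len(guesses)
        # Normalise by the total number of annotator votes for the item.
        total_credit += credit / sum(freqs.values())

    precision = total_credit / attempted if attempted else 0.0
    recall = total_credit / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical gold data for two occurrences of "bank" (French translations).
gold = {
    "bank.1": {"banque": 3, "établissement de crédit": 1},
    "bank.2": {"rive": 2, "bord": 2},
}
answers = {"bank.1": ["banque"]}  # the system abstains on bank.2

precision, recall = weighted_scores(gold, answers)
print(precision, recall)
```

Under these assumptions, a guess that most annotators agreed on earns more credit than a rare one, and an abstention lowers recall without affecting precision.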

Europarl Sense Inventory

Participating systems in this evaluation task will use the Europarl corpus to build up their sense inventory, and will then perform WSD on polysemous English words based on that inventory. For evaluation, a sense inventory for all target nouns was built manually from all translations retrieved from the Europarl corpus. The translations of each polysemous English word are grouped into clusters for that word.

An example of a sense label in the Europarl sense inventory is as follows:

Target polysemous English word: bank 

Occurs in the phrase/sentence: "the bank of Scotland"


Princeton WordNet(3.0)^[8] synset (not necessarily used in the task): 
{08420278-n} depository financial institution


Europarl sense inventory synset {Dutch, French, German, Italian, Spanish}: 
{bank/kredietinstelling, banque/établissement de crédit, Bank/Kreditinstitut, banca, banco}
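As a rough illustration of how such a translation-based sense inventory could be represented and queried, the sketch below stores the cluster from the example above and looks up which clusters contain a given translation. The nested-dictionary layout, the cluster label `financial_institution`, the language codes and the function name are assumptions made for this sketch, not part of the task data format.

```python
# Target word -> cluster label -> language code -> set of translations.
# The translations come from the example above; the cluster label and
# the language codes are hypothetical names chosen for readability.
SENSE_INVENTORY = {
    "bank": {
        "financial_institution": {
            "nl": {"bank", "kredietinstelling"},
            "fr": {"banque", "établissement de crédit"},
            "de": {"Bank", "Kreditinstitut"},
            "it": {"banca"},
            "es": {"banco"},
        },
    },
}


def find_clusters(target, language, translation):
    """Return the labels of all clusters of `target` that list
    `translation` as a translation in `language`."""
    clusters = SENSE_INVENTORY.get(target, {})
    return [
        label
        for label, by_language in clusters.items()
        if translation in by_language.get(language, set())
    ]


print(find_clusters("bank", "fr", "banque"))
print(find_clusters("bank", "de", "Ufer"))  # not in this toy inventory
```

A list is returned rather than a single label because nothing in this layout prevents the same translation from appearing in more than one cluster.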

References

[1] http://www.cs.york.ac.uk/semeval-2013/task12/index.php?id=task-description
[2] http://www.cs.york.ac.uk/semeval-2013/task10/
[3] Peng Jin, Yunfang Wu and Shiwen Yu. SemEval-2007 task 05: multilingual Chinese-English lexical sample. Proceedings of the 4th International Workshop on Semantic Evaluations, pp. 19-23, June 23-24, 2007, Prague, Czech Republic.
[4] http://www.cs.york.ac.uk/semeval-2013/task12/index.php?id=task-description
[5] Roberto Navigli and Simone Paolo Ponzetto. BabelNet: Building a very large multilingual semantic network. Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 216-225, July 11-16, 2010, Uppsala, Sweden.
[6] Note: the latest version of Princeton WordNet (3.1) uses the synset ID {08437235-n} instead of {08420278-n}.
[7] Ravi Sinha, Diana McCarthy and Rada Mihalcea. SemEval-2010 task 2: cross-lingual lexical substitution. Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions, June 4, 2009, Boulder, Colorado.
[8] Note: the latest version of Princeton WordNet (3.1) uses the synset ID {08437235-n} instead of {08420278-n}.

