Wiktionary and NLP: Improving synonymy networks - Franck Sajous

Wiktionary Synonymy networks Improving Wiktionary’s network Conclusion and future work

Wiktionary and NLP: Improving synonymy networks ACL-IJCNLP Singapore, 7 Aug 2009

Emmanuel Navarro, Franck Sajous, Bruno Gaume, Laurent Prévot, ShuKai Hsieh, Tzu-Yi Kuo, Pierre Magistry, Chu-Ren Huang

Franck Sajous

IRIT, CNRS & Univ. of Toulouse CLLE-ERSS, CNRS & Univ. of Toulouse CLLE-ERSS & IRIT, CNRS & Univ. of Toulouse LPL, CNRS & Univ. of Provence English Department, NTNU, Taiwan Graduate Institute of Linguistics, NTU, Taiwan TIGP, CLCLP, Academia Sinica, GIL, NTU, Taiwan Dept. of Chinese and Bilingual Studies, Hong Kong Poly U., Hong Kong.

ACL-IJCNLP - 7 Aug 2009


Goals

Give a method for improving synonymy networks;
apply it to Wiktionary;
in the meantime, investigate the possibilities of:
  using Wiktionary as a resource for NLP;
  using NLP for improving Wiktionary.


Summary

1 Wiktionary

2 Synonymy networks
  Wiktionary graph
  Gold standards
  Comparison

3 Improving Wiktionary’s network
  Exploiting its Small World structure
  Using translation links


Wiktionary as a lexical resource

Lexical resources
  NLP requires lexical resources
  English: Princeton WordNet
  Some other languages (e.g. French): unsatisfactory and/or non-free resources
  Some others: purely under-resourced

Wiktionary
  multilingual
  freely available
  → a perfect candidate?


Wiktionary: (very) short description

Collaborative editing
  Not led by experts
  OK, we know, but it’s worth taking a chance while remaining aware of it

Article content
  etymology
  parts of speech
  definitions, examples
  translations
  synonyms/antonyms
  hypernyms/hyponyms

That is the ‘regular’ case, but... content and layout are heterogeneous across languages and even within a given language


Wiktionary graph Gold standards Comparison


Extracting Wiktionary’s graph of synonymy

Modeling
  one POS of an English entry → one vertex
  synonymy links among same POS
  word senses flattened
  edges symmetrized
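The modeling choices on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input format (a list of `(word, pos, senses)` tuples), the function name `build_synonymy_graph` and the toy entries are all hypothetical and do not come from the authors' extraction code or from real Wiktionary data.

```python
from collections import defaultdict


def build_synonymy_graph(entries):
    """Build an undirected synonymy graph.

    entries: iterable of (word, pos, senses), where senses is a list of
    synonym lists, one list per word sense (hypothetical input format).
    """
    graph = defaultdict(set)
    for word, pos, senses in entries:
        source = (word, pos)           # one vertex per entry and POS
        graph[source]                  # create the vertex even if it has no synonym
        for synonyms in senses:        # word senses are flattened: all lists are merged
            for synonym in synonyms:
                target = (synonym, pos)    # links stay within the same POS
                if target == source:
                    continue
                graph[source].add(target)
                graph[target].add(source)  # symmetrize the edge
    return dict(graph)


# Invented toy entries, for illustration only.
toy_entries = [
    ("buskin", "N", [["half-boot"], ["boot"]]),
    ("mukluk", "N", [["boot"]]),
    ("kick", "V", [["boot"]]),
]
graph = build_synonymy_graph(toy_entries)
```

In this sketch, `("boot", "N")` ends up linked to both `buskin` and `mukluk` even though neither entry says which sense of `boot` is meant, which is the situation that motivates flattening.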


Word senses flattened... Why?

No underlying linguistic theory, but:
  constraints stem from the resource
  word senses may appear in the definitions and not in the semantic relations
  word senses of the target vertices?

buskin (N)
  1 A half-boot
  2 A type of boot worn by the ancient Athenian tragic actors

mukluk (N)
  1 A soft boot made of reindeer skin or sealskin and worn by Inuit.

kick (N)
  1 A hit or strike with the leg or foot
  2 The action of swinging a foot or leg
  3 Something that tickles the fancy
  4 (Internet) The removal of a person from an online activity
  5 (figuratively) Any bucking motion of an object that lacks legs or feet

Another reason: one of our gold standards (Dicosyn) has its word senses flattened


Extracting WordNet’s synonymy network

WordNet
  synonymy between word senses
  relations already symmetric
  same POS in a given synset

Modeling
  vertices: words
  edges between all words in a given synset
  + using hyponymy with leaf synsets containing single words
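As a rough sketch of the word-level modeling described here, the snippet below links every pair of words that share a synset. The `(pos, words)` input format, the function name `synsets_to_graph` and the toy synsets are assumptions made for illustration; the snippet does not read the real WordNet database and it leaves out the hyponymy-based step mentioned on the slide.

```python
from itertools import combinations


def synsets_to_graph(synsets):
    """synsets: iterable of (pos, [words]) pairs (hypothetical toy format)."""
    graph = {}
    for pos, words in synsets:
        # One vertex per (word, POS); duplicates inside a synset are dropped.
        vertices = [(w, pos) for w in dict.fromkeys(words)]
        for v in vertices:
            graph.setdefault(v, set())
        # An edge between every pair of words of the synset, in both directions.
        for u, v in combinations(vertices, 2):
            graph[u].add(v)
            graph[v].add(u)
    return graph


# Illustrative synsets only, not an extract of WordNet.
toy_synsets = [
    ("V", ["act", "play", "represent"]),
    ("V", ["play", "toy"]),
    ("N", ["play", "drama"]),
]
wn_graph = synsets_to_graph(toy_synsets)
```

Because the vertices are words rather than word senses, `("play", "V")` collects the neighbours of both verbal synsets it belongs to, while the nominal `("play", "N")` stays a separate vertex.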


Extracting Dicosyn’s synonymy network

Dicosyn
  compilation of synonymy relations extracted from 7 dictionaries (Bailly, Benac, Du Chazaud, Guizot, Lafaye, Larousse and Robert);
  produced at ATILF, corrected at the CRISCO lab: http://elsap1.unicaen.fr/dicosyn.html
  word senses are flattened;
  network already built;
  it just needs to be symmetrized.


Small Worlds (SW)

Lexical resources are (often) SW:
  globally sparse in edges, locally dense (high clustering);
  average path length is short;
  heavy-tailed degree distribution.

Example (figure): to throw (WordNet)

Studying the graphs’ properties shows that Wiktionary, WordNet and Dicosyn are SW
→ we can take advantage of SW characteristics!
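The three small-world indicators listed on this slide can be measured on any undirected graph stored as a dictionary of neighbour sets. The sketch below uses only the standard library; the function names and the four-vertex toy graph are assumptions for illustration, and the slides do not say which tools or exact definitions the authors used.

```python
from collections import deque


def clustering_coefficient(graph, v):
    """Fraction of pairs of neighbours of v that are themselves linked."""
    neighbours = list(graph[v])
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbours[j] in graph[neighbours[i]]
    )
    return 2.0 * links / (k * (k - 1))


def average_clustering(graph):
    """Mean of the local clustering coefficients (local density)."""
    return sum(clustering_coefficient(graph, v) for v in graph) / len(graph)


def average_path_length(graph):
    """Mean shortest-path length over all connected ordered pairs (BFS)."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in graph[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def degree_histogram(graph):
    """Map each degree to the number of vertices having that degree."""
    hist = {}
    for v in graph:
        d = len(graph[v])
        hist[d] = hist.get(d, 0) + 1
    return hist


# Toy graph: a triangle a-b-c with a pendant vertex d attached to c.
toy = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
```

On a real synonymy network one would compare the clustering and path length with those of a random graph of the same size and density, and inspect the degree histogram on a log-log scale to look for a heavy tail.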


Wiktionary FR/Dicosyn

Lexical coverage (words)

      Wikt.    DicoSyn    Shared    P      R
N     18017    29372      10393     58%    35%
A     5411     9452       3076      57%    33%
V     3897     9147       2966      76%    32%

Synonymy network (relations)

      Wikt.    DicoSyn    Shared    P      R
N     3510     44501      3510      69%    8%
A     1300     17404      1677      78%    7%
V     899      23968      1267      71%    4%

(N: nouns, A: adjectives, V: verbs; P: precision, R: recall)

Example (figure): nouns synonymy network

Wiktionary EN/WordNet

Lexical coverage (words)

      Wikt.    WordNet    Shared    P      R
N     22075    117798     14120     64%    12%
A     8437     21479      5874      70%    27%
V     6368     11529      5157      81%    45%

Synonymy network (relations)

      Wikt.    WordNet    Shared    P      R
N     6453     18440      2763      43%    15%
A     3139     12792      1314      42%    10%
V     2667     18725      993       37%    5%

(N: nouns, A: adjectives, V: verbs; P: precision, R: recall)

Example (figure): nouns synonymy network
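The slides do not spell out how P and R are computed. A minimal sketch, assuming precision is the shared count divided by the Wiktionary count and recall is the shared count divided by the gold-standard count, reproduces the English figures reported on this slide. The function names and the toy edge sets are hypothetical.

```python
def precision_recall(candidate, gold):
    """candidate, gold: sets of items (words, or undirected edges)."""
    shared = candidate & gold
    precision = len(shared) / len(candidate) if candidate else 0.0
    recall = len(shared) / len(gold) if gold else 0.0
    return precision, recall


def precision_recall_from_counts(n_candidate, n_gold, n_shared):
    """Same measures, computed directly from the three counts."""
    return n_shared / n_candidate, n_shared / n_gold


# English nouns, lexical coverage: Wikt. 22075, WordNet 117798, shared 14120.
p, r = precision_recall_from_counts(22075, 117798, 14120)
print(f"P = {p:.0%}, R = {r:.0%}")  # prints "P = 64%, R = 12%"

# Undirected edges as frozensets, so that (a, b) and (b, a) are the same edge.
# Invented toy edges, for illustration only.
toy_candidate = {frozenset(e) for e in [("a", "b"), ("b", "c"), ("c", "d")]}
toy_gold = {frozenset(e) for e in [("b", "a"), ("b", "c"), ("a", "c"), ("d", "e")]}
toy_p, toy_r = precision_recall(toy_candidate, toy_gold)
```

Storing each edge as a `frozenset` keeps the comparison independent of edge direction, which matches the symmetrized networks used throughout.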


Comments. . . Gold standards, precision&recall a rough comparison, but: how meaningful are these measures (in our case)? ↔ how “perfect” are our gold standards? examples: ‘to horn’, in WN, absent from Wikt.en.N in our experiment, appeared in Wikt. in 2009 ‘to poo’ (childish), ‘to prefetch’, ‘to google’, (technical neologisms) are in Wikt and not in WN → noise? Wikt really misses some WN’s relations: ‘to act↔to play’

Franck Sajous

ACL-IJCNLP - 7 Aug 2009

Wiktionary and NLP: Improving synonymy networks

14/23

Wiktionary Synonymy networks Improving Wiktionary’s network Conclusion and future work

Wiktionary graph Gold standards Comparison

Comments. . . Gold standards, precision&recall a rough comparison, but: how meaningful are these measures (in our case)? ↔ how “perfect” are our gold standards? examples: ‘to horn’, in WN, absent from Wikt.en.N in our experiment, appeared in Wikt. in 2009 ‘to poo’ (childish), ‘to prefetch’, ‘to google’, (technical neologisms) are in Wikt and not in WN → noise? Wikt really misses some WN’s relations: ‘to act↔to play’ but are relations from Wikt and absent from WN nessecarily errors: ‘to reduce↔to decrease’, ‘to cook↔to microwave’ (all words appear in WN) → noise?

Franck Sajous

ACL-IJCNLP - 7 Aug 2009

Wiktionary and NLP: Improving synonymy networks

14/23

Wiktionary Synonymy networks Improving Wiktionary’s network Conclusion and future work

Wiktionary graph Gold standards Comparison

Comments. . . Gold standards, precision&recall a rough comparison, but: how meaningful are these measures (in our case)? ↔ how “perfect” are our gold standards? examples: ‘to horn’, in WN, absent from Wikt.en.N in our experiment, appeared in Wikt. in 2009 ‘to poo’ (childish), ‘to prefetch’, ‘to google’, (technical neologisms) are in Wikt and not in WN → noise? Wikt really misses some WN’s relations: ‘to act↔to play’ but are relations from Wikt and absent from WN nessecarily errors: ‘to reduce↔to decrease’, ‘to cook↔to microwave’ (all words appear in WN) → noise?

→ (we assume that) with time, recall will grow Franck Sajous

ACL-IJCNLP - 7 Aug 2009

Wiktionary and NLP: Improving synonymy networks

14/23

Wiktionary Synonymy networks Improving Wiktionary’s network Conclusion and future work

Wiktionary graph Gold standards Comparison

Comments...

Gold standards, precision & recall: a rough comparison, but:
  how meaningful are these measures (in our case)? ↔ how “perfect” are our gold standards?
  examples:
    ‘to horn’, in WN, absent from Wikt.en.N in our experiment, appeared in Wikt. in 2009
    ‘to poo’ (childish), ‘to prefetch’, ‘to google’ (technical neologisms) are in Wikt. and not in WN → noise?
    Wikt. really does lack some of WN’s relations: ‘to act’↔‘to play’
    but are relations present in Wikt. and absent from WN necessarily errors? ‘to reduce’↔‘to decrease’, ‘to cook’↔‘to microwave’ (all words appear in WN) → noise?

→ (we assume that) with time, recall will grow
→ is it possible to (automatically) measure precision?

1 Wiktionary
2 Synonymy networks: Wiktionary graph, Gold standards, Comparison
3 Improving Wiktionary’s network: Exploiting its Small World structure, Using translation links

Neighbours method

Intuition of transitivity: “Neighbours of my neighbours should be in my neighbourhood”
→ a neighbour of my neighbours that is not in my neighbourhood should be a good neighbour candidate.
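A minimal sketch of this intuition, under our own assumptions: the graph is an undirected edge list, and candidates are scored by their number of common neighbours. That scoring is a hypothetical choice for illustration; the slides do not say how candidates are ordered. The function name `neighbour_candidates` and the toy words are also ours.

```python
from collections import defaultdict


def neighbour_candidates(edges):
    """Map each unlinked pair (u, w) at distance 2 to its number of common neighbours."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    candidates = defaultdict(int)
    for u in adj:
        for v in adj[u]:          # v is a neighbour of u
            for w in adj[v]:      # w is a neighbour of a neighbour of u
                # Keep w only if it is not u, not already linked to u,
                # and count each unordered pair once (u < w).
                if w != u and w not in adj[u] and u < w:
                    candidates[(u, w)] += 1
    return dict(candidates)


# Toy synonymy graph (invented for illustration).
toy_edges = [("big", "large"), ("large", "huge"), ("big", "great"), ("great", "huge")]
print(neighbour_candidates(toy_edges))
```

On this toy graph, ‘big’ and ‘huge’ are not linked but share two neighbours, so the pair is proposed as a candidate.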

Prox method

Prox (Gaume et al.)
A stochastic method designed to study Hierarchical SW Metrics: for any 2 vertices (u, v), computes “the probability that a randomly wandering particle starting from u stands in v after k steps.”

E.g. from Word1 (probability of standing on each vertex):

Vertex     1      2      3      4      5      6      7      8      9
Initial    1.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00
Step 1     0.25   0.25   0.25   0.25   0.00   0.00   0.00   0.00   0.00
Step 2     0.22   0.22   0.15   0.17   0.11   0.04   0.04   0.01   0.00
Step 3     0.16   0.19   0.17   0.15   0.11   0.08   0.06   0.00   0.02
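The quoted definition can be illustrated with a plain uniform random walk. This is a sketch of the general idea only, not the Prox implementation. The graph below is a hypothetical toy graph, not the one on the slide, and it includes a self-loop on every vertex, which is our reading of why the start vertex keeps probability 0.25 at step 1.

```python
def walk_distribution(adj, start, steps):
    """Probability of standing on each vertex after `steps` uniform random-walk steps."""
    dist = {v: 0.0 for v in adj}
    dist[start] = 1.0
    for _ in range(steps):
        nxt = {v: 0.0 for v in adj}
        for u, p in dist.items():
            if p:
                share = p / len(adj[u])  # the particle picks one neighbour uniformly
                for v in adj[u]:
                    nxt[v] += share
        dist = nxt
    return dist


# Hypothetical undirected graph with self-loops (assumption: every vertex neighbours itself).
toy_graph = {
    1: [1, 2, 3, 4],
    2: [1, 2],
    3: [1, 3],
    4: [1, 4, 5],
    5: [4, 5],
}

for k in range(4):
    dist = walk_distribution(toy_graph, start=1, steps=k)
    print(k, {v: round(p, 2) for v, p in dist.items()})
```

After one step the mass is spread evenly over vertex 1 and its three neighbours; after further steps it reaches vertices that are not directly linked to the start, which is what makes the scores usable for ranking candidate links.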

Results (fr.V)

[Figure: three panels plotting P, R and F for prox3, neigh and random; x-axis ticks from 2000 to 14000]

Comments
  Prox method provides (ordered) relevant links, e.g. ‘to absolve’↔‘to forgive’, absent from WN
  false positives may be interesting to consider:
    ‘to uncover’↔‘to peel’ (hypernymy)
    ‘to skin’↔‘to peel’ (‘inter-domain synonymy’)
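One plausible way to produce P, R and F curves of this kind is to add ranked candidate links one at a time and re-score the enriched network against the gold standard after each addition. The sketch below makes that assumption explicit; the slides do not give the exact evaluation protocol, and the function name `prf_curve` and the toy edges are hypothetical.

```python
def prf_curve(initial_edges, ranked_candidates, gold_edges):
    """Return (edges added, P, R, F) after each ranked candidate is added."""

    def norm(edge):
        return frozenset(edge)  # undirected: (a, b) and (b, a) are the same relation

    current = {norm(e) for e in initial_edges}
    gold = {norm(e) for e in gold_edges}
    curve = []
    for added, candidate in enumerate(ranked_candidates, start=1):
        current.add(norm(candidate))
        shared = len(current & gold)
        p = shared / len(current)
        r = shared / len(gold)
        f = 2 * p * r / (p + r) if p + r else 0.0  # harmonic mean of P and R
        curve.append((added, p, r, f))
    return curve


# Toy data (invented): one initial link, two candidates, three gold links.
curve = prf_curve(
    initial_edges=[("a", "b")],
    ranked_candidates=[("c", "b"), ("a", "d")],
    gold_edges=[("a", "b"), ("b", "c"), ("c", "d")],
)
for added, p, r, f in curve:
    print(added, round(p, 2), round(r, 2), round(f, 2))
```

In this toy run the first candidate is in the gold standard, so recall rises with no loss of precision; the second is not, so precision drops while recall stays flat.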

Translation links method

Intuition
2 words sharing many translations in different languages are likely to be synonymous

Method
let T_w be the set of a word w’s translations
for every pair of words (w, w'):
\[ \mathrm{Jaccard}(w, w') = \frac{|T_w \cap T_{w'}|}{|T_w \cup T_{w'}|} \]
incrementally add relations, according to the Jaccard rank, up to a given threshold
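The method above can be sketched as follows, under stated assumptions: translations are stored as sets of (language, word) pairs, pairs with no shared translation are ignored, and the caller decides where to cut the ranked list, since the slides do not specify the threshold. The names `jaccard` and `rank_by_translations` and the toy entries are ours.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two sets: |a ∩ b| / |a ∪ b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_by_translations(translations, existing=frozenset()):
    """Rank unlinked word pairs by the Jaccard index of their translation sets."""
    scored = []
    for w1, w2 in combinations(sorted(translations), 2):
        if frozenset((w1, w2)) in existing:
            continue  # already a synonymy relation, nothing to add
        score = jaccard(translations[w1], translations[w2])
        if score > 0:
            scored.append((score, w1, w2))
    # Highest score first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    return scored


# Toy translation sets (invented for illustration).
toy_translations = {
    "car": {("fr", "voiture"), ("de", "Auto"), ("es", "coche")},
    "automobile": {("fr", "voiture"), ("fr", "automobile"), ("de", "Auto")},
    "tree": {("fr", "arbre"), ("de", "Baum")},
}
print(rank_by_translations(toy_translations))
```

Relations would then be added from the top of this list until the chosen cut-off is reached.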

Results

[Figure 2 (French verbs): three panels plotting P, R and F for Jaccard and random; x-axis ticks from 2000 to 12000]

Comments
  adding the first 1000 edges (+55%) → loss of only 2% precision
  added links are not the same as with the Prox method

Idea
  use the translations method to densify the graph, then use the clusters’ structure (Prox)

Conclusion

Hypotheses are confirmed
  many missing links should be added among members of the same cluster
  words sharing many translations are likely to be synonymous

Our methods work... but there is room for improvement
→ combining both methods should give better results

Direct application
  support for collaborative editing → module to be included in Wiktionary’s framework?
  a list of synonyms, ordered by relevance, may be provided to the contributor

Future work

Diachronic study
  study how wiktionaries evolve → foresee contributors’ NLP needs, e.g. when to apply the methods presented here

Invariants and variability
  study of the (in)variability of semantic pairings (Wiktionary as a multilingual synonymy network), e.g. house/family, child/fruit, feel/know

Thank you! Questions?
