Wiktionary Synonymy networks Improving Wiktionary’s network Conclusion and future work
Wiktionary and NLP: Improving synonymy networks ACL-IJCNLP Singapore, 7 Aug 2009
Emmanuel Navarro, Franck Sajous, Bruno Gaume, Laurent Prévot, ShuKai Hsieh, Tzu-Yi Kuo, Pierre Magistry, Chu-Ren Huang
Franck Sajous
IRIT, CNRS & Univ. of Toulouse; CLLE-ERSS, CNRS & Univ. of Toulouse; CLLE-ERSS & IRIT, CNRS & Univ. of Toulouse; LPL, CNRS & Univ. of Provence; English Department, NTNU, Taiwan; Graduate Institute of Linguistics, NTU, Taiwan; TIGP, CLCLP, Academia Sinica, GIL, NTU, Taiwan; Dept. of Chinese and Bilingual Studies, Hong Kong Poly U., Hong Kong.
ACL-IJCNLP - 7 Aug 2009
Wiktionary and NLP: Improving synonymy networks
1/23
Goals
Give a method for improving synonymy networks.
Apply it to Wiktionary.
In the meantime, investigate the possibilities of:
  using Wiktionary as a resource for NLP;
  using NLP to improve Wiktionary.
Summary
1 Wiktionary
2 Synonymy networks: Wiktionary graph, Gold standards, Comparison
3 Improving Wiktionary’s network: Exploiting its Small World structure, Using translation links
Wiktionary as a lexical resource
Lexical resources
NLP requires lexical resources.
English: Princeton WordNet.
Some other languages (e.g. French): unsatisfactory and/or non-free resources.
Some others: simply under-resourced.
Wiktionary
multilingual
freely available
→ a perfect candidate?
Wiktionary: (very) short description
Collaborative editing
Not led by experts
OK, we know, but it is worth taking a chance while remaining aware of it.
Article content
etymology
parts of speech
definitions, examples
translations
synonyms/antonyms
hypernyms/hyponyms
That is the ‘regular’ case, but content and layout are heterogeneous across languages and even within a given language.
Extracting Wiktionary’s graph of synonymy
Modeling
one part of speech of an English entry → one vertex
synonymy links only among vertices with the same POS
word senses flattened
edges symmetrized
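The modeling choices above can be illustrated with a minimal Python sketch. It is not the authors' extraction code: the input format (a list of `(word, pos, senses)` tuples), the function name `build_synonymy_graph` and the toy entries are assumptions made for illustration only.

```python
def build_synonymy_graph(entries):
    """Build an undirected synonymy graph.

    entries: iterable of (word, pos, senses), where senses is a list of
    synonym lists, one list per word sense (hypothetical input format).
    Returns a dict mapping each (word, pos) vertex to its set of neighbours.
    """
    graph = {}
    for word, pos, senses in entries:
        source = (word, pos)            # one vertex per entry and POS
        graph.setdefault(source, set())
        for synonyms in senses:         # word senses are flattened
            for synonym in synonyms:
                target = (synonym, pos)  # links stay within the same POS
                if target == source:
                    continue
                graph.setdefault(target, set())
                graph[source].add(target)
                graph[target].add(source)  # edges are symmetrized
    return graph


# Toy entries, invented for the example.
toy_entries = [
    ("kick", "N", [["boot"], ["thrill"]]),
    ("boot", "N", [[]]),
    ("kick", "V", [["boot"]]),
]
graph = build_synonymy_graph(toy_entries)
```

In this sketch the noun and the verb `kick` are distinct vertices, and `boot` (N) is linked back to `kick` (N) even though its own entry lists no synonyms.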
Word senses flattened... Why?
No underlying linguistic theory, but constraints stem from the resource:
word senses may appear in the definitions and not in the semantic relations;
word senses of the target vertices?
buskin (N)
1 A half-boot
2 A type of boot worn by the ancient Athenian tragic actors
mukluk (N)
1 A soft boot made of reindeer skin or sealskin and worn by Inuit.
kick (N)
1 A hit or strike with the leg or foot
2 The action of swinging a foot or leg
3 Something that tickles the fancy
4 (Internet) The removal of a person from an online activity
5 (figuratively) Any bucking motion of an object that lacks legs or feet
Another reason: one of our gold standards (Dicosyn) has its word senses flattened.
Extracting WordNet’s synonymy network
WordNet
synonymy between word senses
relations already symmetric
same POS in a given synset
Modeling
vertices: words
edges between all words in a given synset
+ using hyponymy with leaf synsets containing a single word
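As a rough illustration of the modeling step, the sketch below turns a list of synsets into a word graph by linking every pair of words that share a synset. The toy synsets and the function name `synsets_to_graph` are assumptions; the hyponymy extension for single-word synsets is not covered here.

```python
from itertools import combinations


def synsets_to_graph(synsets):
    """Vertices are words; every pair of words in a synset gets an edge.

    synsets: iterable of word lists, each list standing for one synset
    (all words of a synset are assumed to share the same POS).
    """
    graph = {}
    for words in synsets:
        unique_words = sorted(set(words))
        for word in unique_words:
            graph.setdefault(word, set())
        # A synset of n words contributes a clique of n * (n - 1) / 2 edges.
        for a, b in combinations(unique_words, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


# Toy synsets, invented for the example (not taken from WordNet).
toy_synsets = [
    ["throw", "cast", "hurl"],
    ["throw", "toss"],
    ["microwave"],
]
wn_graph = synsets_to_graph(toy_synsets)
```

Because word senses are merged at the word level, `throw` ends up connected to the members of both of its synsets, while `cast` and `toss` remain unconnected.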
Extracting the Dicosyn synonymy network
Dicosyn
compilation of synonymy relations extracted from 7 dictionaries (Bailly, Benac, Du Chazaud, Guizot, Lafaye, Larousse and Robert);
produced at ATILF, corrected at the CRISCO lab: http://elsap1.unicaen.fr/dicosyn.html
word senses are flattened;
the network is already built;
it only needs to be symmetrized.
Small Worlds (SW)
Lexical resources are (often) SW:
globally sparse in edges, locally dense (high clustering);
short average path length;
heavy-tailed degree distribution.
E.g.: to throw (WordNet)
Studying the graphs’ properties shows that Wiktionary, WordNet and Dicosyn are SW
→ we can take advantage of SW characteristics!
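The three small-world indicators listed above can be measured on any adjacency-set graph. The following is a minimal pure-Python sketch on a toy graph; the function names and the toy data are assumptions, and it is not the measurement code used for the results reported here.

```python
from collections import deque


def local_clustering(graph, vertex):
    """Fraction of pairs of neighbours of `vertex` that are themselves linked."""
    neighbours = list(graph[vertex])
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbours[j] in graph[neighbours[i]]
    )
    return 2.0 * links / (k * (k - 1))


def average_clustering(graph):
    return sum(local_clustering(graph, v) for v in graph) / len(graph)


def average_path_length(graph):
    """Mean shortest-path length over all ordered pairs of connected vertices."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:                      # breadth-first search from source
            u = queue.popleft()
            for w in graph[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def degree_histogram(graph):
    """Map each degree to the number of vertices having that degree."""
    histogram = {}
    for vertex in graph:
        degree = len(graph[vertex])
        histogram[degree] = histogram.get(degree, 0) + 1
    return histogram


# Toy graph: a triangle a-b-c with a pendant vertex d attached to c.
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
clustering = average_clustering(toy_graph)
path_length = average_path_length(toy_graph)
degrees = degree_histogram(toy_graph)
```

On a real lexical network one would compare these values with those of a random graph of the same size and density; a much higher clustering with a comparably short path length is the usual small-world signature.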
Wiktionary FR/Dicosyn
Lexical coverage/Synonymy network

Words
     Wikt.   DicoSyn   Shared   P     R
N    18017   29372     10393    58%   35%
A    5411    9452      3076     57%   33%
V    3897    9147      2966     76%   32%

Relations
     Wikt.   DicoSyn   Shared   P     R
N    3510    44501     3510     69%   8%
A    1300    17404     1677     78%   7%
V    899     23968     1267     71%   4%

Example: nouns synonymy network
Wiktionary EN/WordNet
Lexical coverage/Synonymy network

Words
     Wikt.   WordNet   Shared   P     R
N    22075   117798    14120    64%   12%
A    8437    21479     5874     70%   27%
V    6368    11529     5157     81%   45%

Relations
     Wikt.   WordNet   Shared   P     R
N    6453    18440     2763     43%   15%
A    3139    12792     1314     42%   10%
V    2667    18725     993      37%   5%

Example: nouns synonymy network
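The slides do not spell out how P and R are computed. A reading that agrees with the English figures is P = shared / Wiktionary and R = shared / gold standard. The sketch below works under that assumption; the helper names and the toy edge lists are hypothetical.

```python
def edge_set(pairs):
    """Represent undirected edges as frozensets so that (a, b) == (b, a)."""
    return {frozenset(pair) for pair in pairs}


def precision_recall(candidate, gold):
    """Precision and recall of a candidate set against a gold-standard set."""
    shared = candidate & gold
    precision = len(shared) / len(candidate) if candidate else 0.0
    recall = len(shared) / len(gold) if gold else 0.0
    return precision, recall


def precision_recall_from_counts(n_candidate, n_gold, n_shared):
    """Same measures when only the three counts are available."""
    return n_shared / n_candidate, n_shared / n_gold


# Toy edges, invented for the example.
wikt_edges = edge_set([("reduce", "decrease"), ("act", "perform"), ("cook", "microwave")])
gold_edges = edge_set([("perform", "act"), ("act", "play")])
toy_p, toy_r = precision_recall(wikt_edges, gold_edges)

# Word counts reported for English nouns: Wikt. 22075, WordNet 117798, shared 14120.
noun_p, noun_r = precision_recall_from_counts(22075, 117798, 14120)
noun_p_percent = round(100 * noun_p)
noun_r_percent = round(100 * noun_r)
```

With the English noun counts, this reading gives the 64% precision and 12% recall reported in the comparison.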
Comments. . . Gold standards, precision&recall a rough comparison, but: how meaningful are these measures (in our case)? ↔ how “perfect” are our gold standards? examples: ‘to horn’, in WN, absent from Wikt.en.N in our experiment, appeared in Wikt. in 2009 ‘to poo’ (childish), ‘to prefetch’, ‘to google’, (technical neologisms) are in Wikt and not in WN → noise? Wikt really misses some WN’s relations: ‘to act↔to play’ but are relations from Wikt and absent from WN nessecarily errors: ‘to reduce↔to decrease’, ‘to cook↔to microwave’ (all words appear in WN) → noise?
→ (we assume that) with time, recall will grow Franck Sajous
ACL-IJCNLP - 7 Aug 2009
Wiktionary and NLP: Improving synonymy networks
14/23
Wiktionary Synonymy networks Improving Wiktionary’s network Conclusion and future work
Wiktionary graph Gold standards Comparison
Comments...
Gold standards, precision & recall: a rough comparison, but:
how meaningful are these measures (in our case)? ↔ how "perfect" are our gold standards?
examples:
    'to horn', in WN, absent from Wikt.en.N in our experiment, appeared in Wikt. in 2009
    'to poo' (childish), 'to prefetch', 'to google' (technical neologisms) are in Wikt and not in WN → noise?
    Wikt really misses some of WN's relations: 'to act' ↔ 'to play'
    but are relations present in Wikt and absent from WN necessarily errors? 'to reduce' ↔ 'to decrease', 'to cook' ↔ 'to microwave' (all words appear in WN) → noise?
→ (we assume that) with time, recall will grow
→ is it possible to (automatically) measure precision?
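To make the measures discussed here concrete, the following is a minimal sketch of edge-level precision, recall and F-score of a synonymy network against a gold standard. It assumes both resources are given as lists of undirected word pairs; the function names and the toy data (including the shared pair 'to play' ↔ 'to perform') are invented for illustration and are not taken from the experiments.

```python
def edge_set(pairs):
    """Normalise undirected edges so that (a, b) and (b, a) count once."""
    return {frozenset(p) for p in pairs if p[0] != p[1]}


def precision_recall_f(candidate_pairs, gold_pairs):
    """Edge-level precision, recall and F-score against a gold standard."""
    cand, gold = edge_set(candidate_pairs), edge_set(gold_pairs)
    common = cand & gold
    p = len(common) / len(cand) if cand else 0.0
    r = len(common) / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Hypothetical toy data: two relations only in the evaluated network,
# one only in the gold standard, one shared.
toy_wikt = [("reduce", "decrease"), ("cook", "microwave"), ("play", "perform")]
toy_gold = [("act", "play"), ("perform", "play")]

p, r, f = precision_recall_f(toy_wikt, toy_gold)
print(f"P={p:.2f} R={r:.2f} F={f:.2f}")
```

In this sketch, every relation missing from the gold standard lowers precision whether or not it is a genuine error, which is the concern raised in the comments above.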
3 Improving Wiktionary's network
Exploiting its Small World structure
Neighbours method
Intuition of transitivity: "Neighbours of my neighbours should be in my neighbourhood"
→ a neighbour of my neighbours that is not in my neighbourhood should be a good neighbour candidate.
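The intuition above can be sketched as follows. This is only an illustration: the slide does not say how candidates are ordered, so ranking them by the number of neighbours they share with the target word is an assumption, and the graph representation (`dict` of word to set of neighbours), the function name and the toy graph are all hypothetical.

```python
from collections import Counter


def neighbour_candidates(graph, word):
    """Return words at distance 2 from `word`, most shared neighbours first.

    graph: dict mapping each word to the set of its (undirected) neighbours.
    Ranking by number of shared neighbours is an assumption of this sketch.
    """
    own = graph.get(word, set())
    counts = Counter()
    for neighbour in own:
        for second in graph.get(neighbour, set()):
            # Keep only words that are not already linked to `word`.
            if second != word and second not in own:
                counts[second] += 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy synonymy graph.
toy_graph = {
    "forgive": {"pardon", "excuse"},
    "pardon": {"forgive", "absolve", "excuse"},
    "excuse": {"forgive", "pardon", "absolve", "justify"},
    "absolve": {"pardon", "excuse"},
    "justify": {"excuse"},
}

print(neighbour_candidates(toy_graph, "forgive"))
```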
Prox method
Prox (Gaume et al.): a stochastic method designed to study Hierarchical Small Worlds (SW).
Metrics: for any 2 vertices (u, v), it computes "the probability that a randomly wandering particle starting from u stands in v after k steps."
E.g. starting from Word1 (probability of standing on each vertex):

Vertex   Initial   Step 1   Step 2   Step 3
1        1.00      0.25     0.22     0.16
2        0.00      0.25     0.22     0.19
3        0.00      0.25     0.15     0.17
4        0.00      0.25     0.17     0.15
5        0.00      0.00     0.11     0.11
6        0.00      0.00     0.04     0.08
7        0.00      0.00     0.04     0.06
8        0.00      0.00     0.01     0.00
9        0.00      0.00     0.00     0.02
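The random walk described above can be sketched in a few lines. This is not the authors' implementation: it assumes an unweighted, undirected graph in which every vertex also gets a self-loop (consistent with vertex 1 keeping probability 0.25 after one step in the example), the default of 3 steps merely echoes the "prox3" label used in the results, and the toy edge list is hypothetical rather than the graph drawn on the slide.

```python
def build_neighbours(edges):
    """Undirected adjacency sets, with a self-loop added on every vertex."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    for u in neighbours:
        neighbours[u].add(u)  # assumption: the walker may stay in place
    return neighbours


def prox(neighbours, start, steps):
    """Probability of standing on each vertex after `steps` uniform steps."""
    dist = {u: 0.0 for u in neighbours}
    dist[start] = 1.0
    for _ in range(steps):
        nxt = {u: 0.0 for u in neighbours}
        for u, p in dist.items():
            share = p / len(neighbours[u])
            for v in neighbours[u]:
                nxt[v] += share
        dist = nxt
    return dist


def prox_candidates(neighbours, start, steps=3):
    """Vertices not yet linked to `start`, most probable first."""
    dist = prox(neighbours, start, steps)
    pairs = [(v, p) for v, p in dist.items()
             if v not in neighbours[start] and p > 0]
    return sorted(pairs, key=lambda vp: (-vp[1], vp[0]))


# Hypothetical toy graph (not the one shown on the slide).
TOY_EDGES = [(1, 2), (1, 3), (1, 4), (2, 5), (3, 4), (4, 5), (5, 6)]
toy_neighbours = build_neighbours(TOY_EDGES)

print(prox(toy_neighbours, 1, 1))
print(prox_candidates(toy_neighbours, 1))
```

Ranking the non-neighbours of a word by this probability gives an ordered list of candidate links, which is how the method is used in the results that follow.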
Results (fr.V)
[Figure: three panels (P, R, F) comparing the prox3, neigh and random curves; x-axis ticks from 2000 to 14000.]
Comments
Prox method provides (ordered) relevant links, e.g. 'to absolve' ↔ 'to forgive', absent from WN
false positives may be interesting to consider:
    'to uncover' ↔ 'to peel' (hypernymy)
    'to skin' ↔ 'to peel' ('inter-domain synonymy')
Using translation links
Translation links method
Intuition: 2 words sharing many translations in different languages are likely to be synonymous.
Method:
    let $T_w$ be the set of a word $w$'s translations
    for every pair of words $(w, w')$:
        $\mathrm{Jaccard}(w, w') = \dfrac{|T_w \cap T_{w'}|}{|T_w \cup T_{w'}|}$
    incrementally add relations, according to the Jaccard rank, up to a given threshold
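The method above can be sketched as follows. Several details are assumptions made for illustration only: translations are represented as sets of (language, word) pairs, the "threshold" is interpreted as a cap on the number of added edges (it could equally be a minimum Jaccard score), and the function names and toy data are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_by_shared_translations(translations):
    """Score every word pair; return (score, w1, w2), best score first."""
    scored = []
    for w1, w2 in combinations(sorted(translations), 2):
        score = jaccard(translations[w1], translations[w2])
        if score > 0:
            scored.append((score, w1, w2))
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    return scored


def add_relations(existing_edges, ranked, max_new_edges):
    """Add the best-ranked pairs that are not already edges, up to a cap."""
    edges = {frozenset(e) for e in existing_edges}
    added = []
    for score, w1, w2 in ranked:
        if len(added) >= max_new_edges:
            break
        edge = frozenset((w1, w2))
        if edge not in edges:
            edges.add(edge)
            added.append((w1, w2, score))
    return added


# Hypothetical toy translation sets: (language code, translation).
TOY_TRANSLATIONS = {
    "forgive": {("fr", "pardonner"), ("es", "perdonar"), ("de", "vergeben")},
    "pardon": {("fr", "pardonner"), ("es", "perdonar"), ("de", "begnadigen")},
    "peel": {("fr", "peler"), ("es", "pelar")},
}

ranked = rank_by_shared_translations(TOY_TRANSLATIONS)
print(add_relations([], ranked, max_new_edges=10))
```

Scoring every pair is quadratic in the vocabulary size; for a full dictionary one would probably only compare words that share at least one translation.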
Results
[Figure 2 (French Verb): three panels (P, R, F) comparing the Jaccard and random curves; x-axis ticks from 2000 to 12000.]
Comments
adding the first 1000 edges (+55%) → loss of only 2% precision
added links are not the same as with the Prox method
Idea
use the translations method to densify the graph, then use the clusters' structure (Prox)
Conclusion
Hypotheses are confirmed:
    many missing links should be added among members of the same cluster
    words sharing many translations are likely to be synonymous
Our methods work... but there is room for improvement
    → combining both methods should give better results
Direct application: support for collaborative editing
    → module to be included in Wiktionary's framework?
    a list of synonyms, ordered by relevancy, may be provided to the contributor
Future work
Diachronic study
    study how wiktionaries evolve → foresee contributors' NLP needs, e.g. when to apply the methods presented here
Invariants and variability
    study of the (in)variability of semantic pairings (Wiktionary as a multilingual synonymy network), e.g. house/family, child/fruit, feel/know
Thank you! Questions?