Human Language Technology: Application to Information Access

Lesson 10 Deep learning for NLP: Multilingual Word Sequence Modeling December 15, 2016

EPFL Doctoral Course EE-724, Nikolaos Pappas, Idiap Research Institute, Martigny

Outline of the talk
1. Recap: Word Representation Learning
2. Multilingual Word Representations
  • Alignment models
  • Evaluation tasks
3. Multilingual Word Sequence Modeling
  • Essentials: RNN, LSTM, GRU
  • Machine Translation
  • Document Classification
4. Summary

* Figure from Lebret's thesis, EPFL, 2016


Disclaimer
• Research highlights rather than an in-depth analysis
  • By no means exhaustive (progress is too fast!)
  • Tried to keep the most representative works
• Focus on feature learning and two major NLP tasks
• Not enough time to cover other exciting tasks:
  • Question answering
  • Relation classification
  • Paraphrase detection
  • Summarization

Recap: Learning word representations from text
• Why should we care about them?
  • tackles the curse of dimensionality
  • captures semantic and analogy relations between words
  • captures general knowledge in an unsupervised way
    king - man + woman ≈ queen
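As a concrete illustration (not part of the original slides), the analogy test can be run with the gensim word2vec API; the vector file name below is only a placeholder for whatever pre-trained embeddings are available:

    from gensim.models import KeyedVectors

    # Load pre-trained word2vec vectors; the path is a placeholder.
    vectors = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)

    # king - man + woman: "queen" should appear among the nearest neighbours.
    print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))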


Recap: Learning word representations from text
• How can we benefit from them?
  • study linguistic properties of words
  • inject general knowledge into downstream tasks
  • transfer knowledge across languages or modalities
  • compose representations of word sequences


Recap: Learning word representations from text
• Which method to use for learning them?
  • neural versus count-based methods
    ➡ neural ones implicitly do SVD over a PMI matrix
    ➡ similar to count-based when using the same tricks
  • neural methods appear to have the edge (word2vec)
    ➡ efficient and scalable objective + toolkit
    ➡ intuitive formulation (= predict words in context)


Recap: Continuous Bag-of-Words (CBOW)


Recap: Continuous Bag-of-Words (CBOW)
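Since the slide figures are not reproduced here, a brief reminder of the CBOW formulation (my transcription, not the slide's): the context vectors are averaged and the centre word is predicted from that average,

    h_t = \frac{1}{2c} \sum_{-c \le j \le c,\; j \ne 0} v_{w_{t+j}}, \qquad
    p(w_t \mid w_{t-c}, \dots, w_{t+c}) = \mathrm{softmax}(U h_t)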


Recap: Learning word representations from text
• What else can we do with word embeddings?
  • dependency-based embeddings: Levy and Goldberg, 2014
  • retrofitted-to-lexicons embeddings: Faruqui et al., 2014
  • sense-aware embeddings: Li and Jurafsky, 2015
  • visually-grounded embeddings: Lazaridou et al., 2015
  • multilingual embeddings: Gouws et al., 2015


Outline of the talk
1. Recap: Word Representation Learning
2. Multilingual Word Representations
  • Alignment models
  • Evaluation tasks
3. Multilingual Word Sequence Modeling
  • Essentials: RNN, LSTM, GRU
  • Machine Translation
  • Document Classification
4. Summary

* Figure from Gouws et al., 2015.


Learning cross-lingual word representations
• Monolingual embeddings capture semantic, syntactic and analogy relations between words
• Goal: capture these relationships across two or more languages

* Figure from Gouws et al., 2015.


Supervision of cross-lingual alignment methods (from high to low annotation cost)
• Parallel sentences for MT: Guo et al., 2015 (sentence-by-sentence and word alignments)
• Parallel sentences: Gouws et al., 2015 (sentence-by-sentence alignments)
• Parallel documents: Søgaard et al., 2015 (documents with topic or label alignments)
• Bilingual dictionary: Ammar et al., 2016 (word-by-word translations)
• No parallel data: Faruqui and Dyer, 2014 (really!)

Cross-lingual alignment with no parallel data

Cross-lingual alignment with parallel sentences

Cross-lingual alignment with parallel sentences

(Gouws et al., 2015)

Cross-lingual alignment with parallel sentences for MT


Unified framework for analysis of cross-lingual methods
• Minimize a monolingual objective
• Constrain / regularize it with a bilingual objective (one common formulation is given below)
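One common way to write this framework down (notation mine, in the spirit of Gouws et al., 2015): minimize the sum of the two monolingual objectives plus a bilingual term that ties the two embedding spaces together,

    J(\theta) = \underbrace{\mathcal{L}_e(\theta_e) + \mathcal{L}_f(\theta_f)}_{\text{monolingual objectives}}
              + \lambda\, \underbrace{\Omega(\theta_e, \theta_f)}_{\text{bilingual constraint / regularizer}}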


Evaluation: Cross-lingual document classification and translation

(Gouws et al., 2015)

Bonus: Multilingual visual sentiment concept matching
• concept = adjective-noun phrase (ANP)

(Pappas et al., 2016)

Multilingual visual sentiment concept ontology

(Jou et al., 2015)

Word embedding model

(Pappas et al., 2016)


Multilingual visual sentiment concept retrieval

(Pappas et al., 2016)

Multilingual visual sentiment concept clustering

(Pappas et al., 2016)


Discovering interesting clusters: Multilingual

(Pappas et al., 2016)

Discovering interesting clusters: Western vs. Eastern

(Pappas et al., 2016)

Discovering interesting clusters: Monolingual

(Pappas et al., 2016)

Evaluation: Multilingual visual sentiment concept analysis
• Aligned embeddings are better than translation for concept retrieval, clustering and sentiment prediction

Conclusion
• Aligned embeddings are cheaper than translation and usually work better on several multilingual or cross-lingual NLP tasks without parallel data:
  • document classification: Gouws et al., 2015
  • named entity recognition: Al-Rfou et al., 2014
  • dependency parsing: Guo et al., 2015
  • concept retrieval and clustering: Pappas et al., 2016

Outline of the talk
1. Recap: Word Representation Learning
2. Multilingual Word Representations
  • Alignment models
  • Evaluation tasks
3. Multilingual Word Sequence Modeling
  • Essentials: RNN, LSTM, GRU
  • Machine Translation
  • Document Classification
4. Summary

* Figure from Colah’s blog, 2015.

Language Modeling
• Computes the probability of a sequence of words, or simply the “likelihood of a text”: P(w1, w2, …, wt)
• N-gram models with the Markov assumption (a toy example follows below):
  P(w_1, …, w_t) ≈ ∏_i P(w_i | w_{i-n+1}, …, w_{i-1})
• Where is it useful?
  • speech recognition
  • machine translation
  • POS tagging and parsing
• What are its limitations?
  • unrealistic independence assumption
  • huge memory needs
  • need for back-off models
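To make the Markov assumption concrete, here is a toy bigram model with maximum-likelihood counts (illustrative only; a real language model needs smoothing and back-off):

    from collections import Counter

    corpus = "the cat sat on the mat . the dog sat on the rug .".split()

    # Count unigrams and bigrams over the toy corpus.
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))

    def p(word, prev):
        # Maximum-likelihood estimate of P(word | prev).
        return bigrams[(prev, word)] / unigrams[prev]

    # P("the cat sat") under the bigram Markov assumption (ignoring the sentence start).
    print(p("cat", "the") * p("sat", "cat"))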

Recurrent Neural Network (RNN)
• Neural language model (the basic recurrence is given below)
• What are its main limitations?
  • vanishing gradient problem (the error does not propagate far)
  • fails to capture long-term dependencies
  • tricks: gradient clipping, identity initialization + ReLUs
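For reference, the vanilla RNN language-model recurrence behind the slide's figure (notation mine):

    h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t + b_h), \qquad
    \hat{y}_t = \mathrm{softmax}(W_{hy} h_t + b_y)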

Long Short-Term Memory (LSTM)
• Long short-term memory networks are able to learn long-term dependencies: Hochreiter and Schmidhuber, 1997
• Simple RNN (for comparison):

* Figure from Colah’s blog, 2015.

Long Short-Term Memory (LSTM)
• Long short-term memory networks are able to learn long-term dependencies: Hochreiter and Schmidhuber, 1997
  • ability to remove or add information to the cell state, regulated by “gates” (see the equations below)

* Figure from Colah’s blog, 2015.
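The gating equations summarized by the figure, in the standard formulation (transcription mine):

    f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)                 % forget gate
    i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)                 % input gate
    \tilde{C}_t = \tanh(W_C [h_{t-1}, x_t] + b_C)          % candidate cell state
    C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t        % cell state update
    o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)                 % output gate
    h_t = o_t \odot \tanh(C_t)                             % hidden state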

Gated Recurrent Unit (GRU)
• Gated RNN by Chung et al., 2014: combines the forget and input gates into a single “update gate”
  • keep memories to capture long-term dependencies
  • allow error messages to flow at different strengths

* Figure from Colah’s blog, 2015.
z_t: update gate / r_t: reset gate / h_t: regular RNN update (full equations below)
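For completeness, the corresponding GRU equations (Chung et al., 2014; transcription mine):

    z_t = \sigma(W_z x_t + U_z h_{t-1})                    % update gate
    r_t = \sigma(W_r x_t + U_r h_{t-1})                    % reset gate
    \tilde{h}_t = \tanh(W x_t + U (r_t \odot h_{t-1}))     % candidate state
    h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t  % hidden state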

Deep Bidirectional Models
• Shown here for an RNN, but it applies to LSTMs and GRUs too

(Irsoy and Cardie, 2014)

Convolutional Neural Network (CNN)
• Typically good for images
• Convolutional filter(s) applied over every window of k words (a minimal sketch follows below)
• Similar to Recursive NNs, but without restricting composition to grammatical phrases as in Socher et al., 2011
  • no need for a parser (!)
  • less linguistically motivated?

(Collobert et al., 2011), (Kim, 2014)
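A minimal sketch of such a text CNN in Keras (layer names follow recent Keras versions; vocabulary size, sequence length and filter width are arbitrary placeholders):

    from keras.models import Sequential
    from keras.layers import Embedding, Conv1D, GlobalMaxPooling1D, Dense

    vocab_size, seq_len, embed_dim = 20000, 100, 128  # placeholder sizes

    model = Sequential([
        Embedding(vocab_size, embed_dim, input_length=seq_len),  # word embeddings
        Conv1D(filters=100, kernel_size=3, activation="relu"),   # filter over every window of k=3 words
        GlobalMaxPooling1D(),                                    # max-over-time pooling as in Kim (2014)
        Dense(1, activation="sigmoid"),                          # e.g. binary sentiment output
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.summary()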

Hierarchical Models
• Word-level and sentence-level modeling with any type of NN layers

(Tang et al., 2015)

Attention Mechanism for Machine Translation
• Chooses “where to look”: learns to assign a relevance score to each input position, given the encoder hidden state at that position and the previous decoder state
  • learns a soft bilingual alignment model (equations below)

(Bahdanau et al., 2015)
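In Bahdanau et al.'s notation (transcribed here for reference), the relevance scores, the soft alignment weights and the resulting context vector are:

    e_{ij} = a(s_{i-1}, h_j), \qquad
    \alpha_{ij} = \frac{\exp(e_{ij})}{\sum_k \exp(e_{ik})}, \qquad
    c_i = \sum_j \alpha_{ij} h_j

where h_j is the encoder state at source position j, s_{i-1} the previous decoder state, and c_i the context used to emit target word i.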

Attention Mechanism for Document Classification
• Operates on the input word sequence (or on intermediate hidden states: Pappas and Popescu-Belis, 2016)
• Learns to focus on the parts of the input that are relevant to the target labels
  • learns a soft extractive summarization model

(Pappas and Popescu-Belis, 2014)

Outline of the talk
1. Recap: Word Representation Learning
2. Multilingual Word Representations
  • Alignment models
  • Evaluation tasks
3. Multilingual Word Sequence Modeling
  • Essentials: RNN, LSTM, GRU
  • Machine Translation
  • Document Classification
4. Summary

* Figure from Colah’s blog, 2015.

RNN encoder-decoder for Machine Translation
• GRU as the hidden layer
• Maximize the log-likelihood of the target sequence given the source sequence (objective below)
• Evaluated on WMT 2014 (EN→FR)

(Cho et al., 2014)
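The training objective referred to above, reconstructed here since the slide's formula image is not preserved: maximize the average conditional log-likelihood over the N training pairs,

    \max_{\theta} \; \frac{1}{N} \sum_{n=1}^{N} \log p_{\theta}(\mathbf{y}_n \mid \mathbf{x}_n)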

Sequence to sequence learning for Machine Translation
• LSTM hidden layers instead of GRU
• 4 layers deep instead of a shallow encoder-decoder

(Sutskever et al., 2014)

Sequence to sequence learning for Machine Translation
• WMT 2014 (EN→FR)
• PCA projection of the hidden state of the last encoder layer

(Sutskever et al., 2014)

Jointly learning to align and translate for Machine Translation
• Limitation: can we compress all the needed information into the last encoder state?
• Idea: use all the hidden states of the encoder
  • their number is proportional to the length of the sentence!
  • compute a weighted average of all the hidden states (see the sketch below)

(Bahdanau et al., 2015)
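A tiny numpy sketch of that weighted average, with random values standing in for real encoder states and alignment scores (purely illustrative):

    import numpy as np

    T, d = 6, 4                      # source length and hidden size (placeholders)
    H = np.random.randn(T, d)        # encoder hidden states, one row per source word
    scores = np.random.randn(T)      # relevance scores from the alignment model

    alpha = np.exp(scores) / np.exp(scores).sum()  # softmax over source positions
    context = alpha @ H                            # weighted average of the hidden states
    print(alpha, context)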

Jointly learning to align and translate for Machine Translation
• WMT 2014 (EN→FR)

(Bahdanau et al., 2015)

Effective approaches to attention-based NMT
• Global and local attention
• Input-feeding approach
• Stacked LSTM instead of a single layer

(Luong et al., 2015)

Multi-source NMT
• Train a p(e | f, g) model directly on trilingual data
• Use it to decode e given any (f, g) pair
• Take the local-attention NMT model and concatenate the contexts from multiple sources

(Zoph and Knight, 2016)

Multi-source NMT
• Multi-source training improves over the individual French-English and German-English pairs
  • Best variant: basic concatenation with attention

(Zoph and Knight, 2016)


Multi-target NMT
• Multi-task learning framework for translation into multiple target languages
  • optimization of a one-to-many model

(Dong et al., 2015)

Multi-target NMT
• Improves over NMT and Moses baselines on the WMT 2013 test set
  • but also on larger datasets
• Faster and better convergence for multiple-language translation

(Dong et al., 2015)

Multi-way, Multilingual NMT
• Encoder-decoder model with multiple encoders and decoders shared across language pairs
  • shares knowledge across languages
  • a universal space for all languages
  • good for low-resource languages
• Attention is pair-specific, hence expensive: O(L^2)
  • instead, share the attention across all pairs!

Figure: the n-th encoder and m-th decoder at timestep t; φ makes the encoder and decoder states compatible with the attention mechanism; f_adp makes the context vector compatible with the decoder. All these transformations are there to support different types of encoders/decoders for different languages.

(Firat et al., 2016)

Multi-way, Multilingual NMT
• Consistent improvements for low-resource languages
  • the less training data, the bigger the improvement
• In the large-scale setting it only improves translation into English
  • hypothesis: EN always appears as source or target language in all pairs → better decoder?

(Firat et al., 2016)


Google’s Neural Machine Translation System “Monster”
• An encoder, a decoder and an attention network
  • plus 8 layers deep with residual connections
  • plus refinement with Reinforcement Learning
  • plus sub-word units… plus…

(Wu et al., 2016)

Google’s Neural Machine Translation System “Monster”
• EN→FR training takes 6 days on 96 GPUs (!), plus 3 more days for refinement…

(Wu et al., 2016)

Future of NMT and other possibilities
• Multi-task learning: training multiple language pairs jointly, and jointly with other tasks → image captioning, speech recognition!
• Larger context: modeling sequences longer than sentences, as in document classification, will be key
  • understanding long-term dependencies
  • leveraging the structural information of the input
  • being able to reason over it to solve any task → effective attention / memory?

(Luong, Cho, Manning tutorial, 2016)

Outline of the talk
1. Recap: Word Representation Learning
2. Multilingual Word Representations
  • Alignment models
  • Evaluation tasks
3. Multilingual Word Sequence Modeling
  • Essentials: RNN, LSTM, GRU
  • Machine Translation
  • Document Classification
4. Summary

* Figure from Colah’s blog, 2015.

Paragraph vectors for Document Classification
• Learning vectors of paragraphs, inspired by word2vec
  • trained without supervision on a large corpus
  • preferably from a domain similar to the target one
• Two methods: with or without word ordering (see the sketch below)

(Le et al., 2014)
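A minimal sketch with the gensim Doc2Vec implementation of paragraph vectors (parameter names follow recent gensim versions; the toy corpus is a placeholder):

    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    # Each paragraph gets a tag, as in Le and Mikolov (2014).
    docs = [TaggedDocument(words=text.split(), tags=[i]) for i, text in enumerate([
        "the movie was great and moving",
        "the plot was dull and predictable",
    ])]

    # dm=1 is the order-aware PV-DM variant; dm=0 would give PV-DBOW (no word ordering).
    model = Doc2Vec(docs, vector_size=50, min_count=1, epochs=40, dm=1)

    # The inferred paragraph vector can then feed a classifier (e.g. logistic regression).
    print(model.infer_vector("a great and moving film".split())[:5])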

Paragraph vectors for Document Classification
• Learned paragraph vectors + logistic regression
• Outperformed previous methods on sentence-level and document-level sentiment classification

(Le et al., 2014)

Convolutional neural network for Document Classification
• Used multiple filter widths
• Dropout regularization (randomly dropping a portion of the hidden units during back-propagation)

(Kim, 2014)

Convolutional neural network for Document Classification
• Not all baseline methods used dropout, though

(Kim, 2014)

Modeling and Summarizing Documents with a Convolutional Network
• Similar to Kim, 2014, with two differences:
  • k-max pooling instead of max pooling
  • two layers of convolutions

(Denil et al., 2014)


Gated recurrent neural network for Document Classification

(Tang et al., 2015)


Standard Pipeline for Document Classification
• Feature engineering: BOW, n-grams, topic models, etc.
• Feature learning: auto-encoders, convolutional, recurrent, recursive NNs

(Pappas and Popescu-Belis, 2014)

Multiple-instance Learning for Document Classification

(Pappas and Popescu-Belis, 2014)

How to combine vectors? Structural assumptions

(Pappas and Popescu-Belis, 2014)

Joint learning of an instance relevance mechanism and a classifier

(Pappas and Popescu-Belis, 2014)

Joint differentiable objective for solving with SGD

(Pappas and Popescu-Belis, 2014)

Observations on aspect rating prediction
• The proposed mechanism is superior to the alternatives
  • all text regions are useful, but to different extents
• Benefits regardless of the input features used
• Reaches state-of-the-art performance without using:
  • structured output learning
  • segmented text

(Pappas and Popescu-Belis, 2014)

Comparison with neural network models
• This mechanism can be used as a parametric pooling function in NNs
  • operating on intermediate hidden states
• Works better than Dense and GRU neural methods + average pooling
• Outperforms RCNN and uses far fewer parameters

(Pappas and Popescu-Belis, 2016)

Hierarchical attention networks for Document Classification
• Hierarchical structure very similar to Tang et al., 2015, except for the average pooling
  • attention mechanism at the word and sentence levels

(Yang et al., 2016)

Hierarchical attention networks for Document Classification

(Yang et al., 2016)

Reflections on Multilingual Document Classification
• What are the present limitations?
  • Current evaluation datasets contain a small number of target classes and examples
    • RCV1/RCV2 → 6,000 documents, 2 languages, 4 labels
    • TED corpus → 12,078 documents, 12 languages, 15 labels
  • They require the labels to be common across languages
  • The data are not enough to train state-of-the-art neural architectures
• Observation: several domains currently support multiple languages, yet only monolingual classification is possible

New dataset: Deutsche Welle corpus (600k docs, 8 langs)


Conclusion
• Multilingual word embeddings are useful for tasks where parallel data is scarce
• Word sequence modeling is advancing quickly with the establishment of neural methods
  • Machine Translation
  • Document Classification
• Multilingual Neural Machine Translation
  • is useful for low-resource languages
  • transfers knowledge in the large-scale setting
• Multilingual Document Classification
  • several large resources are available, but with disjoint labels
  • could possibly benefit from NMT lessons

References (1/3)
• Le, Quoc V., and Tomas Mikolov. "Distributed Representations of Sentences and Documents." In ICML, vol. 14, pp. 1188-1196, 2014.
• Kim, Yoon. "Convolutional neural networks for sentence classification." arXiv preprint arXiv:1408.5882, 2014.
• Denil, Misha, Alban Demiraj, Nal Kalchbrenner, Phil Blunsom, and Nando de Freitas. "Modelling, visualising and summarising documents with a single convolutional neural network." arXiv preprint arXiv:1406.3830, 2014.
• Yang, Zichao, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. "Hierarchical attention networks for document classification." In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016.
• Tang, Duyu, Bing Qin, and Ting Liu. "Document modeling with gated recurrent neural network for sentiment classification." In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1422-1432, 2015.
• Firat, Orhan, Kyunghyun Cho, Baskaran Sankaran, Fatos T. Yarman Vural, and Yoshua Bengio. "Multi-way, multilingual neural machine translation." Computer Speech & Language, 2016.
• Pappas, Nikolaos, Miriam Redi, Mercan Topkara, Brendan Jou, Hongyi Liu, Tao Chen, and Shih-Fu Chang. "Multilingual visual sentiment concept matching." In International Conference on Multimedia Retrieval, 2016.
• Pappas, Nikolaos, and Andrei Popescu-Belis. "Explaining the stars: Weighted multiple-instance learning for aspect-based sentiment analysis." In Conference on Empirical Methods in Natural Language Processing, 2014.
• Pappas, Nikolaos, and Andrei Popescu-Belis. "Explicit Document Modeling through Weighted Multiple-Instance Learning." Under review.
• Goldberg, Yoav. "A primer on neural network models for natural language processing." arXiv preprint arXiv:1510.00726, 2015.
• Goodfellow, Ian, Aaron Courville, and Yoshua Bengio. "Deep Learning." Book in preparation for MIT Press, 2015.

References (2/3)
• Wu, Yonghui, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, et al. "Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation." arXiv preprint arXiv:1609.08144, 2016.
• Zoph, Barret, and Kevin Knight. "Multi-Source Neural Translation." arXiv preprint arXiv:1601.00710, 2016.
• Dong, Daxiang, Hua Wu, Wei He, Dianhai Yu, and Haifeng Wang. "Multi-task learning for multiple language translation." In Proceedings of the 53rd Annual Meeting of the ACL and the 7th International Joint Conference on Natural Language Processing, pp. 1723-1732, 2015.
• Luong, Minh-Thang, Hieu Pham, and Christopher D. Manning. "Effective approaches to attention-based neural machine translation." arXiv preprint arXiv:1508.04025, 2015.
• Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. "Neural machine translation by jointly learning to align and translate." arXiv preprint arXiv:1409.0473, 2014.
• Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. "Sequence to sequence learning with neural networks." In Advances in Neural Information Processing Systems, pp. 3104-3112, 2014.
• Cho, Kyunghyun, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." arXiv preprint arXiv:1406.1078, 2014.
• Irsoy, Ozan, and Claire Cardie. "Deep recursive neural networks for compositionality in language." In Advances in Neural Information Processing Systems, pp. 2096-2104, 2014.
• Chung, Junyoung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. "Empirical evaluation of gated recurrent neural networks on sequence modeling." arXiv preprint arXiv:1412.3555, 2014.
• Hochreiter, Sepp, and Jürgen Schmidhuber. "Long short-term memory." Neural Computation 9, no. 8 (1997): 1735-1780.
• Levy, Omer, and Yoav Goldberg. "Dependency-Based Word Embeddings." In ACL (2), pp. 302-308, 2014.

References (3/3)
• Pappas, Nikolaos, Miriam Redi, Mercan Topkara, Brendan Jou, Hongyi Liu, Tao Chen, and Shih-Fu Chang. "Multicultural Visual Concept Retrieval and Clustering." Under review, 2016.
• Klementiev, Alexandre, Ivan Titov, and Binod Bhattarai. "Inducing crosslingual distributed representations of words." 2012.
• Gouws, Stephan, Yoshua Bengio, and Greg Corrado. "BilBOWA: Fast bilingual distributed representations without word alignments." 2014.
• Hermann, Karl Moritz, and Phil Blunsom. "Multilingual distributed representations without word alignment." arXiv preprint arXiv:1312.6173, 2013.
• Faruqui, Manaal, and Chris Dyer. "Improving vector space word representations using multilingual correlation." Association for Computational Linguistics, 2014.
• Søgaard, Anders, Željko Agić, Héctor Martínez Alonso, Barbara Plank, Bernd Bohnet, and Anders Johannsen. "Inverted indexing for cross-lingual NLP." In The 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference of the Asian Federation of Natural Language Processing (ACL-IJCNLP 2015), 2015.
• Ammar, Waleed, George Mulcaire, Yulia Tsvetkov, Guillaume Lample, Chris Dyer, and Noah A. Smith. "Massively multilingual word embeddings." arXiv preprint arXiv:1602.01925, 2016.
• Guo, Jiang, Wanxiang Che, David Yarowsky, Haifeng Wang, and Ting Liu. "Cross-lingual dependency parsing based on distributed representations." In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, vol. 1, pp. 1234-1244, 2015.
• Lazaridou, Angeliki, Nghia The Pham, and Marco Baroni. "Combining language and vision with a multimodal skip-gram model." arXiv preprint arXiv:1501.02598, 2015.
• Li, Jiwei, and Dan Jurafsky. "Do multi-sense embeddings improve natural language understanding?" arXiv preprint arXiv:1506.01070, 2015.
• Faruqui, Manaal, Jesse Dodge, Sujay K. Jauhar, Chris Dyer, Eduard Hovy, and Noah A. Smith. "Retrofitting word vectors to semantic lexicons." 2014.

Resources (1/2)
➡ Online courses
• Coursera course on “Neural networks for machine learning” by Geoffrey Hinton: https://www.coursera.org/learn/neural-networks
• Coursera course on “Machine learning” by Andrew Ng: https://www.coursera.org/learn/machine-learning
• Stanford CS224d “Deep learning for NLP” by Richard Socher: http://cs224d.stanford.edu/
➡ Conference tutorials
• Richard Socher and Christopher Manning, “Deep learning for NLP”, NAACL 2013 tutorial: http://nlp.stanford.edu/courses/NAACL2013/
• David Jurgens and Mohammad Taher Pilehvar, “Semantic Similarity Frontiers: From Concepts to Documents”, EMNLP 2015 tutorial: http://www.emnlp2015.org/tutorials.html#t1
• Mitesh M. Khapra, Sarath Chandar, “Multilingual and Multimodal Language Processing”, NAACL 2016 tutorial: http://naacl.org/naacl-hlt-2016/t2.html

Resources (2/2)
➡ Deep learning toolkits
• Theano: http://deeplearning.net/software/theano
• Torch: http://www.torch.ch/
• TensorFlow: http://www.tensorflow.org/
• Keras: http://keras.io/
➡ Pre-trained word vectors and code
• word2vec toolkit and vectors: https://code.google.com/p/word2vec/
• GloVe code and vectors: http://nlp.stanford.edu/projects/glove/
• Hellinger PCA: https://github.com/rlebret/hpca
• Online word vector evaluation: http://wordvectors.org/