Machine Translation – UdS Summer Semester 2018
Neural MT Lab Session 1
Raphael Rubino

Introduction

Recent advances in Neural Network (NN) approaches applied to Computational Linguistics and Natural Language Processing (NLP) reflect a growing trend in applied machine learning where researchers and industry practitioners let the data "speak" without linguistic assumptions or preconceptions. NNs are at the core of many state-of-the-art NLP applications such as Machine Translation (MT) [Sennrich and Haddow, 2016], and several architectures have been proposed by researchers in the field. The objective of this first lab session on Neural MT (NMT) is twofold: 1) setting up the environment to train and evaluate NMT models based on state-of-the-art approaches implemented in popular toolkits, with controlled data and approaches, and 2) understanding the impact of NN hyper-parameters on modelling the translation process, and optimizing them to obtain the best result (i.e. BLEU score) on a held-out test set.

Data

Training and Validation

The provided datasets are composed of three subsets: training, validation and test. The training and validation sets are provided today [1], while the test set will be released later and will be used for a final evaluation of the trained models. All the provided datasets were built using publicly available resources from the OPUS parallel corpus collection [2] and contain a mix of the EU Bookshop, JRC-Acquis, News Commentary and OpenSubtitles corpora. The data is pre-processed using the tokenizer and lowercasing tools available in the Moses toolkit [3], with the former script applied with the parameters -a and -no-escape. The resulting sentence pairs were then filtered, keeping only those containing words within the top thousand most frequent words, and shuffled before being split into the three subsets. The resulting source and target top-thousand word lists composing our vocabularies are provided along with the training and validation sets. Statistics about these two subsets are presented in Table 1, and a sketch of the pre-processing pipeline is given after the table. It is mandatory to use only these two sets for training and validation; no additional corpora are allowed!

Test

The test set is built the same way as the training and validation sets, but it will be released at a later stage and will be used for the final evaluation of the models. It is important to keep a held-out set, such as this test set, in order to avoid over-fitting the training and validation sets. The evaluation results will be reported in terms of BLEU scores calculated with the multi-bleu.perl script distributed with the Moses toolkit. The BLEU scores will be computed on lowercased and tokenized translations of the test set.

Corpus       Segments   German Words   German 1-grams   Spanish Words   Spanish 1-grams
Training     4,000      18,904         880              20,341          860
Validation   200        947            337              1,032           318

Table 1: Training and validation sets statistics
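As a concrete illustration, here is a minimal sketch of the pre-processing pipeline described above, assuming the Moses repository is cloned into mosesdecoder/ and the raw German file is named corpus.de (the file names are illustrative; the Spanish side is processed analogously with -l es):

    MOSES=mosesdecoder/scripts
    # Tokenize with aggressive hyphen splitting (-a) and without HTML escaping (-no-escape)
    perl $MOSES/tokenizer/tokenizer.perl -l de -a -no-escape < corpus.de > corpus.tok.de
    # Lowercase the tokenized text
    perl $MOSES/tokenizer/lowercase.perl < corpus.tok.de > corpus.lc.de
    # Build a top-1,000 word list of the kind used to filter the sentence pairs
    tr ' ' '\n' < corpus.lc.de | sort | uniq -c | sort -rn | head -n 1000 | awk '{print $2}' > vocab.de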

Tools

NMT Implementations

Two NMT toolkits are suggested for this lab session: OpenNMT [4] and Marian [5]. While the latter toolkit is written in C++, the former comes in multiple versions depending on its backend: Lua, PyTorch or TensorFlow. You are free to pick whichever implementation you want among these two toolkits. For this first lab session, we will use a Recurrent NN (RNN) with gated units for both the encoder and the decoder of our NMT system. You are free to experiment with hyper-parameters, including the variants of RNN (mono- or bi-directional, single- or multi-layered) and the variants of gated hidden units. It is recommended to first train a model for a few epochs (a few iterations over the training data) using the default parameters of the chosen toolkit and to evaluate the best model on the validation set using BLEU. This step will define a baseline system, i.e. a system that you should outperform during your experiments with hyper-parameter tweaking. A sketch of such a baseline run is given below.
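As an illustration only, a baseline run with the PyTorch version of OpenNMT might look like the sketch below. The flag names follow the 2018-era OpenNMT-py scripts and the file names are assumptions; consult the documentation of the toolkit and version you actually pick:

    # Build the toolkit's binary data files from the tokenized, lowercased sets
    python preprocess.py -train_src train.lc.de -train_tgt train.lc.es \
        -valid_src valid.lc.de -valid_tgt valid.lc.es -save_data data/lab1
    # Train a small bi-directional RNN encoder-decoder; the sizes mirror Figure 1
    python train.py -data data/lab1 -save_model lab1-model -encoder_type brnn \
        -layers 2 -word_vec_size 32 -rnn_size 64 -dropout 0.1
    # Translate the validation set with a saved checkpoint; the checkpoint
    # file name depends on the toolkit version, so set it accordingly
    CKPT=lab1-model_e13.pt   # illustrative name
    python translate.py -model $CKPT -src valid.lc.de -output pred.es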


[Plot for Figure 1: OpenNMT LSTM BiRNN, Embedding 32, Hidden 64, Layers 2, Dropout 0.1; y-axis: BLEU (0 to 25), x-axis: epoch (1 to 30)]

Figure 1: BLEU scores obtained for each epoch during training of a Bi-RNN LSTM implemented in OpenNMT

NLP Tools

Additional NLP tools may be used to pre-process the data prior to training, validation and testing in order to improve the performance of the NMT system, but they are not mandatory. Such tools could be, for instance, part-of-speech taggers, lemmatizers, syntactic parsers, etc. Most of these tools are available in the Stanford CoreNLP toolkit [6] and are usable out-of-the-box without additional training or customization. Furthermore, empirical studies have shown that NMT performance can be improved by modelling sequences of subword units instead of whole words [Sennrich et al., 2016]. One popular approach is based on the byte-pair encoding (BPE) algorithm and is straightforward to train and apply on your datasets [7], as sketched below.
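The subword-nmt scripts can be applied to the provided data along the following lines; the merge count of 1,000 operations and the file names are illustrative choices, not prescribed values:

    # Learn 1,000 BPE merge operations on the German training side
    python subword-nmt/learn_bpe.py -s 1000 < train.lc.de > bpe.codes.de
    # Segment the training and validation files with the learned codes
    python subword-nmt/apply_bpe.py -c bpe.codes.de < train.lc.de > train.bpe.de
    python subword-nmt/apply_bpe.py -c bpe.codes.de < valid.lc.de > valid.bpe.de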

Report

Hyper-parameter Tuning

For each set of hyper-parameters used to train the NMT system, such as the encoder type, the embedding size, the dropout value, etc., you must report the results obtained for each epoch during training. Some NMT implementations have a built-in BLEU calculation, but it is mandatory to translate the validation set with the resulting model from each epoch and to evaluate it with the multi-bleu.perl script, as sketched below. The BLEU scores can then be plotted as presented in Figure 1.
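For instance, assuming one checkpoint per epoch named model_e<N>.pt (the naming is an assumption; adapt it to your toolkit and version), the per-epoch evaluation could look like:

    # Translate the validation set with each epoch's model and score it with multi-bleu
    for epoch in $(seq 1 30); do
        python translate.py -model model_e${epoch}.pt -src valid.lc.de -output pred.e${epoch}.es
        perl mosesdecoder/scripts/generic/multi-bleu.perl valid.lc.es < pred.e${epoch}.es
    done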

Analysis

Tweaking the hyper-parameters of an NMT system impacts its performance in terms of BLEU scores. Some hyper-parameters reduce or increase the quantity of memory the network has access to (for instance the embedding dimensionality) and thus have an interpretable influence on the NMT performance. However, other hyper-parameters might be more difficult to tune because of their interactions with each other. It is therefore crucial to analyse the impact of modifying hyper-parameters both individually and in combination. This analysis should be conducted on the BLEU scores obtained after evaluation, but also on the translated validation set itself: does it contain out-of-vocabulary words, truncated sentences, repetitions, etc.? The analysis will focus on language-independent, surface features requiring no linguistic knowledge of the source or the target languages; a few starting points are sketched after the Deliverables paragraph below.

Deliverables

For this first lab session on NMT, one deliverable is a report containing the details of the implementation used, the additional tools involved if any, the BLEU results plotted in one or several figures, and the analysis of the results according to the selected hyper-parameters. The report should also contain your name and student ID. A second deliverable is the best model according to the BLEU score obtained on the validation set. All deliverables have to be put in an archive (zip, tar.gz, etc.) and sent to [email protected] with the subject: COLI NMT Lab Session 1.
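As a starting point for the surface-level checks mentioned in the Analysis paragraph above, a few language-independent statistics can be computed directly on the translated validation set. The file names pred.es and valid.lc.es follow the earlier sketches, and <unk> is assumed to be the toolkit's unknown-word marker:

    # Count unknown-word tokens produced by the model
    grep -o '<unk>' pred.es | wc -l
    # Compare line and word counts with the reference; a much lower word
    # count at an equal line count hints at truncated output
    wc -lw pred.es valid.lc.es
    # List lines containing an immediately repeated token
    awk '{for (i = 2; i <= NF; i++) if ($i == $(i-1)) {print NR": "$0; break}}' pred.es | head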


Footnotes

[1] http://www.coli.uni-saarland.de/~rubino/nmt/uds_summer_2018.nmt_lab_1.train_validate.tgz
[2] http://opus.nlpl.eu/
[3] https://github.com/moses-smt/mosesdecoder.git
[4] http://opennmt.net/
[5] https://marian-nmt.github.io/
[6] https://stanfordnlp.github.io/CoreNLP/
[7] https://github.com/rsennrich/subword-nmt.git

References

[Sennrich and Haddow, 2016] Sennrich, R. and Haddow, B. (2016). Linguistic input features improve neural machine translation. In Proceedings of the First Conference on Machine Translation, pages 83–91, Berlin, Germany. Association for Computational Linguistics.

[Sennrich et al., 2016] Sennrich, R., Haddow, B., and Birch, A. (2016). Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1715–1725, Berlin, Germany. Association for Computational Linguistics.
