
Bootstrap Methods for Multi-Task Dependency Parsing in

Low-resource Conditions

by

KyungTae Lim

Abstract

Dependency parsing is an essential component of several NLP applications owing to its ability to capture complex relational information in a sentence. Due to the wider availability of dependency treebanks, most dependency parsing systems are built using supervised learning techniques. These systems require a significant amount of annotated data and are thus targeted toward specific languages for which this type of data is available. Unfortunately, producing sufficient annotated data for low-resource languages is time- and resource-consuming. To address this issue, the present study investigates three bootstrapping methods, namely, (1) multilingual transfer learning, (2) deep contextualized embedding, and (3) co-training. Multilingual transfer learning is a typical supervised learning approach that can transfer dependency knowledge using multilingual training data based on multilingual lexical representations. Deep contextualized embedding maximizes the use of lexical features during supervised learning based on enhanced sub-word representations and language models (LMs). Lastly, co-training is a semi-supervised learning method that improves parsing accuracy using unlabeled data. Our approaches have the advantage of requiring only a small bilingual dictionary or easily obtainable unlabeled resources (e.g., Wikipedia) to improve parsing accuracy in low-resource conditions. We evaluated our parser on the 57 official languages of the CoNLL shared tasks as well as on Komi, a language for which we developed training and evaluation corpora for low-resource scenarios. The evaluation results demonstrated the outstanding performance of our approaches on both low- and high-resource dependency parsing in the 2017 and 2018 CoNLL shared tasks. We also conducted a survey of both model transfer learning and semi-supervised methods for low-resource dependency parsing, in which the effect of each method under different conditions was extensively investigated.
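Parsing accuracy is reported throughout the thesis as unlabeled and labeled attachment scores (UAS and LAS, defined in Section 2.2.5). As a minimal illustration of these two metrics, the following Python sketch scores a single sentence; it is not the official CoNLL shared-task evaluation script, which additionally aligns system and gold tokenizations.

# UAS: fraction of tokens whose predicted head is correct.
# LAS: fraction of tokens whose predicted head and dependency label are both correct.
def attachment_scores(gold, predicted):
    """gold and predicted are lists of (head_index, label) pairs, one pair per token."""
    assert len(gold) == len(predicted) and gold
    uas = sum(gh == ph for (gh, _), (ph, _) in zip(gold, predicted)) / len(gold)
    las = sum((gh, gl) == (ph, pl) for (gh, gl), (ph, pl) in zip(gold, predicted)) / len(gold)
    return uas, las

# Example: one wrong head out of three tokens gives UAS = LAS = 2/3.
gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (1, "obj")]
print(attachment_scores(gold, pred))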

Méthodes d'amorçage pour l'analyse en dépendances multi-tâches dans des conditions de faibles ressources

par

KyungTae Lim

Résumé

Note : le résumé étendu en français se trouve en annexe, à la section B.1. L'analyse en dépendances est une composante essentielle de nombreuses applications de TAL (Traitement Automatique des Langues), dans la mesure où il s'agit de fournir une analyse des relations entre les principaux éléments de la phrase. La plupart des systèmes d'analyse en dépendances sont issus de techniques d'apprentissage supervisées, à partir de grands corpus annotés. Ce type d'analyse est dès lors limité à quelques langues seulement, qui disposent des ressources adéquates. Pour les langues peu dotées, la production de données annotées est une tâche impossible le plus souvent, faute de moyens et d'annotateurs disponibles. Afin de résoudre ce problème, la thèse examine trois méthodes d'amorçage : (1) l'apprentissage par transfert multilingue, (2) les plongements contextuels profonds et (3) le co-entraînement.

Acknowledgments

I am extremely happy and fortunate to have met all the members of Lattice. Words can't express my deep appreciation to my supervisor, Thierry. He is not only an adviser to me but more of a life mentor. Thierry patiently guided and helped me to grow all throughout my PhD. In the first year of my PhD, I focused only on improving my implementation skills by participating in the CoNLL shared task. Thierry guided me along the right path and provided a supportive environment, and through his help and constant effort I reached my goal in the shared task. I have a good memory of the ACL 2017 conference; I was in Vancouver to present our shared task results and met many brilliant researchers who had participated in the same shared task. They were professionals not only in the technical aspects but also incredibly passionate about sharing their ideas. This motivated me to become one of them and to share my own ideas by publishing conference papers. I knew it was not easy to publish an article at a good conference, but it was much harder than I expected. Sometimes I got frustrated and too emotional. Whenever I became overly emotional, Thierry encouraged me and helped me remember why I had started all of this, and that thought kept me motivated. Thanks to him. He was kind and understanding every time I was not quite myself and when I needed a person to lean on.

I also want to thank my doctoral committee members, Benjamin and Remi. Their thoughtful comments and wisdom led to the acceptance of my CICLing paper. I would also like to thank Jamie and Jay-Yoon at CMU, who gave me many of the ideas for the co-training work in our AAAI conference paper. Thanks to Niko and Alex, who always kept me busy (and a little crazy) parsing low-resource languages.

I am grateful to the many LATTICE lab members: Loïc, Pablo, Martine, Sophie, Clément, Frédérique, Fabien, and others. Whenever I was in trouble, they always supported me. As most of you know, it is tough to make a living in Paris as an international student aged over 30. Back in 2017, when I first came to Paris, many lab members helped me find accommodation, helped me study French, and even tried to find a French class for my wife.

During my time as a student, I was fortunate to have many friends and professors to support me. I would like to give special thanks to the Paris NLP study group: Djame, Benoît, Éric, Benjamin, Pedro, Gael, Clementine, and others. From them I learned not only NLP theory but also a way of thinking from a linguistic point of view, and the fun memories, such as our regular beer time, will be cherished forever.

Finally, I wouldn't be where I am today without the support of my family. I am always mindful of the many sacrifices and the commitment of my parents and my wife. I want to thank everyone and say that I love you with all my heart. My journey with my wife in Paris will always remain an unforgettable memory until the end of my life.

Contents

1 Introduction 1
  1.1 Research Questions 3
  1.2 Contributions 6
  1.3 Thesis Structure 7
  1.4 Publications Related to the Thesis 9
2 Background 11
  2.1 Syntactic Representation 11
  2.2 Dependency Parsing 17
    2.2.1 Transition-based Parsing 19
    2.2.2 Graph-based Parsing 23
    2.2.3 Neural Network based Parsers 24
    2.2.4 A Typical Neural Dependency Parser: the BIST-Parser 29
    2.2.5 Evaluation Metrics 34
  2.3 Transfer Learning for Dependency Parsing 35
  2.4 Semi-Supervised Learning for Dependency Parsing 39
3 A Baseline Monolingual Parser, Derived from The BIST Parser 41
  3.1 A Baseline Parser Derived from the BIST Parser 43
  3.2 Experiments during the CoNLL 2017 Shared Task 47
    3.2.1 The CoNLL 2017 Shared Task 49
    3.2.2 Experimental Setup 50
    3.2.3 Results 51
  3.3 Summary 55
4 A Multilingual Parser based on Transfer Learning 56
  4.1 Our Approach 59
  4.2 A Multilingual Dependency Parsing Model 61
    4.2.1 Cross-Lingual Word Representations 62
    4.2.2 Cross-Lingual Dependency Parsing Model 64
  4.3 Experiments on Komi and Sami 67
    4.3.1 Experiment Setup 67
    4.3.2 Results 68
  4.4 Experiments on The CoNLL 2017 Data 70
    4.4.1 Experiment Setup 70
    4.4.2 Results 73
  4.5 Summary 75
5 A Deep Contextualized Tagger and Parser 76
  5.1 Multi-Attentive Character-Level Representations 79
  5.2 Deep Contextualized Representation (ELMo) 86
  5.3 Deep Contextualized Tagger 88
    5.3.1 Two Taggers from Character Models 89
    5.3.2 Joint POS Tagger 90
    5.3.3 Experiments and Results 91
  5.4 A Deep Contextualized Multi-task Parser 97
    5.4.1 Multi-Task Learning for Tagging and Parsing 100
    5.4.2 Experiments on The CoNLL 2018 Shared Task 103
    5.4.3 Results and Analysis 106
  5.5 Summary 115
6 A Co-Training Parser on Meta Structure 116
  6.1 Parsing on Meta Structure 119
    6.1.1 The Baseline Model 121
    6.1.2 Supervised Learning on Meta Structure (meta-base) 123
  6.2 Parsing on Co-Training 124
    6.2.1 Co-meta 125
    6.2.2 Joint Semi-Supervised Learning 126
  6.3 Experiments 127
    6.3.1 Data Sets 127
    6.3.2 Evaluation Metrics 127
    6.3.3 Experimental Setup 128
  6.4 Results and Analysis 129
    6.4.1 Results in Low-Resource Settings 132
    6.4.2 Results in High-Resource Settings 137
  6.5 Summary 139
7 Multilingual Co-Training 141
  7.1 Integration of Co-Training and Multilingual Transfer Learning 142
  7.2 Experiments 143
    7.2.1 Preparation of Language Resources 143
    7.2.2 Experiment Strategies 144
  7.3 Results 144
  7.4 Summary 146
8 Conclusion 147
  8.1 Summary of the Thesis 147
  8.2 Discussion over the Research Questions of the Thesis 148
  8.3 Perspectives 153
A Universal Dependency 155
  A.1 The CoNLL-U Format 155
  A.2 Tagsets 157
B Résumé en français de la thèse 160
  B.1 Introduction 160
  B.2 État de l'art 166
  B.3 Mise au point d'un modèle lexical multilingue 169
    B.3.1 Préparation de ressources linguistiques 170
    B.3.2 Projection de plongements de mots pour obtenir une ressource multilingue 170
    B.3.3 Corpus annotés au format Universal Dependencies 172
  B.4 Modèle d'analyse en dépendances cross-lingue 172
    B.4.1 Architecture du système d'analyse 173
    B.4.2 Modèle d'analyse 174
  B.5 Expériences 175
  B.6 Résultats et analyse 178
  B.7 Conclusion 182
References 188

List of Figures

2-1 Syntactic representation of the sentence "The big dog chased the cat". On the left a constituent analysis, on the right the dependency analysis. 12

2-2 An example of English Universal Dependency corpus

17

2-3 Representation of the structure of the sentence "I prefer the morning flight through Denver" using a dependency representation. The goal of a parser is to produce this kind of representation for unseen sentences, i.e., find relations among words and represent these relations with directed labeled arcs. We call this a typed dependency structure because the labels are drawn from a fixed inventory of grammatical relations. (taken from the Stanford lecture: https://web.stanford.edu/~jurafsky/slp3/15.pdf) 18

2-4 Basic transition-based parser. (taken from the Stanford lecture: https://web.stanford.edu/~jurafsky/slp3/15.pdf) 19

2-5 An example of a dependency tree and the transition-based parsing process (taken from (Zhang et al., 2019)) 21

2-6 An example of graph-based dependency parsing (taken from (Yu, 2018)) 25

2-7 An example of binary feature representations (from https://blog. ) 26
2-8 An example of the continuous representations (same source as for the previous figure). 26

2-9 An example of the skip-gram model. Here, it predicts the center (focus) word "learning" based on the context words (same source as for the previous figure). 27

2-10 Illustration of the neural model scheme of the graph-based parser when calculating the score of a given parse tree (this figure and caption are taken from the original paper (Kiperwasser and Goldberg, 2016a)). The parse tree is depicted below the sentence. Each dependency arc in the sentence is scored using an MLP that is fed by the BiLSTM encoding of the words at the arc's end points (the colors of the arcs correspond to colors of the MLP inputs above), and the individual arc scores are summed to produce the final score. All the MLPs share the same parameters. The figure depicts a single-layer BiLSTM, while in practice they use two layers. When parsing a sentence, they compute scores for all possible n² arcs, and find the best scoring tree using a dynamic-programming algorithm. 31
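The scoring scheme this caption describes can be sketched in a few lines. The snippet below is a simplified, hypothetical reconstruction (random stand-in vectors and a toy greedy decoder instead of the dynamic-programming search), not the parser's actual implementation.

import numpy as np

rng = np.random.default_rng(0)
n, dim, hidden = 5, 8, 16            # sentence length (including ROOT), vector sizes
H = rng.normal(size=(n, dim))        # stand-in for the BiLSTM outputs, H[0] = ROOT
W1 = rng.normal(size=(2 * dim, hidden))
b1 = np.zeros(hidden)
w2 = rng.normal(size=hidden)

def arc_score(h, d):
    """MLP score of the candidate arc head h -> dependent d."""
    x = np.concatenate([H[h], H[d]])
    return float(w2 @ np.tanh(x @ W1 + b1))

# Score all n*n candidate arcs.
scores = np.array([[arc_score(h, d) for d in range(n)] for h in range(n)])

# The real parser decodes the highest-scoring *tree* with dynamic programming
# (e.g., the Eisner algorithm); here we simply pick the best head per word.
heads = scores[:, 1:].argmax(axis=0)                  # predicted head for tokens 1..n-1
tree_score = sum(scores[heads[d - 1], d] for d in range(1, n))
print(heads, tree_score)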

2-11 Illustration of multilingual transfer learning in NLP (the figure is based on (Jamshidi et al., 2017)) 37

2-12 "How does transfer learning transfer knowledge in parsing?" A parser learns the shared parameters (Wd) through supervised learning. Since learning is a data-driven task, the source language can influence how the parameter (Wd) is tuned for the target language (the figure is taken from (Yu et al., 2018)). 38

3-1 Overall system structure for training language models. (1) Embedding Layer: vectorized features that are fed into the bidirectional LSTM. (2) Bidirectional LSTM: trains a representation of each token as vector values using a bidirectional LSTM neural network. (3) Multi-Layer Perceptron: builds candidate parse trees from the features transformed by the bidirectional LSTM layer, and then calculates probabilistic scores for each candidate. Finally, if the result has multiple roots, it is revised, and the best parse tree is selected. 44

4-1 An example of the cross-lingual representation learning method between English (Source Language) and French (Target Language) 63
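The figure itself is only referenced here, but the general idea can be illustrated with one common realization of cross-lingual word representations: a linear map from the target-language embedding space into the source-language space, estimated from a small bilingual dictionary (see Table 4.1). The orthogonal-Procrustes fit below is a hedged sketch with random stand-in vectors; the actual procedure is described in Section 4.2.1 and may differ.

import numpy as np

rng = np.random.default_rng(1)
dim, pairs = 100, 500
X_tgt = rng.normal(size=(pairs, dim))   # target-language vectors of dictionary entries
X_src = rng.normal(size=(pairs, dim))   # their source-language translations

# Orthogonal Procrustes: W = argmin ||X_tgt W - X_src||_F with W orthogonal.
U, _, Vt = np.linalg.svd(X_tgt.T @ X_src)
W = U @ Vt

def to_source_space(v_tgt):
    """Project a target-language word vector into the shared (source) space."""
    return v_tgt @ W

projected = to_source_space(X_tgt)
print(np.linalg.norm(projected - X_src) / pairs)   # average alignment error over the dictionary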

4-2 An example of our cross-lingual dependency parsing for Russian (Source

Language) and Komi (Target Language)

66

5-1 An example of the word-based character model with a single attention

representation (Dozat et al., 2017b) 83

5-2 An example of the word-based character model with three attention

representations. 84
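The mechanism behind these two figures can be sketched as follows: each attention "head" computes a softmax weighting over a word's character-level hidden states and returns a weighted sum, and the per-head summaries are concatenated into one word representation. This is a hedged illustration with random stand-in values; the exact parameterization used in Section 5.1 may differ.

import numpy as np

rng = np.random.default_rng(2)
n_chars, dim, n_heads = 7, 32, 3
H = rng.normal(size=(n_chars, dim))        # character-level hidden states (e.g., from a character LSTM)
Q = rng.normal(size=(n_heads, dim))        # one query vector per attention head

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

summaries = []
for q in Q:                                # one summary vector per head
    weights = softmax(H @ q)               # attention weights over the characters
    summaries.append(weights @ H)          # weighted sum of the hidden states

word_repr = np.concatenate(summaries)      # multi-attentive word representation
print(word_repr.shape)                     # (n_heads * dim,) = (96,)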

5-3 (A) Structure of the tagger proposed by Dozat et al. (2017b) using a

word-based character model and (B) structure of the tagger proposed by Bohnet et al. (2018a) using a sentence-based character model with meta-LSTM. 85

5-4 Overall structure of our contextualized tagger with three different clas-

sifiers. 90

5-5 An example of the procedure to generate a weighted POS embedding.

91

5-6 Overall structure of our multi-task dependency parser.

101

6-1 An example of word similarity captured by different Views (from the CS224N Stanford lecture: http://web.stanford.edu/class/cs224n/) 121

6-2 Overall structure of our baseline model. This system generates word- and character-level representation vectors, and concatenates them as a unified word embedding for every token in a sentence. To transform this embedding into a context-sensitive one, the system encodes it based on the individual BiLSTM for each tagger and parser. 122

6-3 Overall structure of our Co-meta model. The system consists of three different pairs of taggers and parsers that are trained using limited context information. Based on the input representation of the word, character, and meta, each model draws a differently shaped parse tree. Finally, our co-training module induces the models to learn from each other using each model's predicted result. 124
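The co-training loop this caption outlines can be sketched schematically. In the snippet below, Model is a trivial stand-in (not the thesis's taggers and parsers), and agreement of the other views is used as the selection criterion; the actual selection strategies are the ensemble and voting methods of Figure 6-4.

class Model:
    """Trivial stand-in for one view (word-, character-, or meta-based model)."""
    def __init__(self, name):
        self.name, self.pool = name, []
    def train(self, data):
        self.pool = list(data)              # a real model would update its parameters here
    def predict(self, sentence):
        return [max(0, i - 1) for i in range(len(sentence))]   # dummy head for every token

def co_train(views, labeled, unlabeled, rounds=3):
    for view in views:
        view.train(labeled)
    for _ in range(rounds):
        for i, view in enumerate(views):
            others = [v for j, v in enumerate(views) if j != i]
            silver = []
            for sent in unlabeled:
                guesses = [v.predict(sent) for v in others]
                if all(g == guesses[0] for g in guesses):       # the other views agree on this sentence
                    silver.append((sent, guesses[0]))           # keep the agreed tree as silver data
            view.train(labeled + silver)                        # retrain this view on gold + silver data
    return views

views = [Model("word"), Model("char"), Model("meta")]
labeled = [(["ROOT", "She", "sleeps"], [0, 2, 0])]
unlabeled = [["ROOT", "Dogs", "bark"], ["ROOT", "Time", "flies"]]
co_train(views, labeled, unlabeled)
print(len(views[0].pool))   # one gold sentence plus two agreed silver parses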

6-4 An example of the label selection method for ensemble and voting. 133
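Two selection strategies of the kind compared in this figure can be sketched as follows: voting keeps the label predicted by the most models, while ensemble sums the models' score vectors before choosing. The labels, scores, and tie-breaking below are illustrative assumptions, not the exact rules used in Chapter 6.

import numpy as np

LABELS = ["nsubj", "obj", "obl"]

def voting(predicted_labels):
    """Keep the label predicted by the largest number of models (ties broken arbitrarily)."""
    return max(set(predicted_labels), key=predicted_labels.count)

def ensemble(score_rows):
    """Sum the per-model score vectors over LABELS, then take the argmax."""
    total = np.sum(np.asarray(score_rows, dtype=float), axis=0)
    return LABELS[int(total.argmax())]

print(voting(["obj", "nsubj", "obj"]))                                   # obj
print(ensemble([[0.2, 0.7, 0.1], [0.6, 0.3, 0.1], [0.1, 0.8, 0.1]]))     # obj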

6-5 Evaluation results for Chinese (zh_gsd) based on different sizes of the unlabeled set and the proposed models. We apply ensemble-based Co-meta with the fixed size of 50 training sentences while varying the unlabeled set size. 134

6-6 Evaluation results for Chinese (zh_gsd) based on different sizes of the training set and the proposed models. We apply ensemble-based Co-meta with the fixed size of 12k unlabeled sentences while varying the training set size. 136

7-1 The overall structure of our Co-metaM model. This system generates word- and character-level representation vectors and concatenates them into a unified word embedding for every token in a sentence. The word-level representation can be a multilingual embedding as proposed in Section 4.2. Thus, this system can train a dependency model using both labeled and unlabeled resources from several languages. 142
A-1 An example of tokenization of Universal Dependency 155
A-2 An example of syntactic annotation of Universal Dependency 156
B-1 Architecture du réseau de neurones 174

List of Tables

3.1 Official results with rank. (number): number of corpora

50

3.2 Official results with monolingual models (1).

52

3.3 Official results with monolingual models (2).

53

3.4 Relative contribution of the different representation methods on the

English development set (English_EWT).

54

3.5 Contribution of the multi-source trainable methods on the English development set (English_EWT). 54

4.1 Dictionary sizes and size of bilingual word embeddings generated by

each dictionary. 64

4.2 Labeled attachment scores (LAS) and unlabeled attachment scores

(UAS) for Northern Sami (sme) 68

4.3 The highest results of this experiment (Finnish+Sami model) compared with the top 3 results for Sami from the CoNLL 2017 Shared Task. 69

4.4 Labeled attachment scores (LAS) and unlabeled attachment scores (UAS) for Komi (kpv). We did not conduct training for the "kpv + eng + rus" language combination because of an unrealistic training scenario (it requires more than 40 GB of memory for training). 69

4.5 Languages trained by a multilingual model. Embedding model: languages that were used for making multilingual word embeddings. Bilingual Dic: resources used to generate bilingual dictionaries. Training corpora: training corpora that were used. 7 languages: English, Italian, French, Spanish, Portuguese, German, Swedish. (number): the multiplication factor used to expand the total amount of corpus. 72

4.6 Official experiment results with rank. (number): number of corpora

74

4.7 Official experiment results processed by multilingual models.

74

5.1 Hyperparameter Details
