Quantifying Language Understanding of Neural Language Models

Prasanna Parthasarathi
Doctor of Philosophy
School of Computer Science
McGill University
Montreal, Quebec, Canada

A thesis submitted to McGill University in partial fulfillment of the requirements of the degree of Doctor of Philosophy.

© Prasanna Parthasarathi, 2022
Abstract

Language understanding has been a topic of study that has drawn attention from a variety of disciplines like linguistics, formal semantics, computer science, and psychology. Modern neural language understanding models with a trainable "end-to-end" setup have replaced the classical language pipeline. Although such approaches have shown remarkable success, the opaqueness of their mechanisms has raised concerns recently. Several contemporary works argue that such end-to-end neural models do mimic the classical pipeline, while a few works take a more critical stand. This thesis too takes a critical stand, and proposes novel techniques to quantify the lack of understanding of conventional syntax and semantics. First, we quantify the semantic understanding of neural models in the task of dialogue prediction by analysing the representation of the input learned by the end-to-end models. The results highlight a lack of correlation between the models' performance and the discriminative abilities of the representations learned by the neural language models. Following that, we propose a framework to evaluate the syntactic understanding of neural models by analyzing their performance on samples stripped of any conventional notion of syntax in the task of natural language inference. Observing the lack of understanding of syntax, we explore the cost of the models hallucinating a probably correct input by analysing the trade-off between faithfulness and robustness in machine translation in the subsequent chapter. Finally, we attempt to quantify the unnaturalness in language understanding through novel metrics that capture the local and global order of tokens. The work compiled together aims at building interpretable techniques for language understanding in neural models and, towards that, does a comprehensive study on quantifying the language understanding of neural models on a variety of language tasks.

Abrégé
La compréhension du langage a été un sujet d'étude qui a attiré l'attention de diverses disciplines comme la linguistique, la sémantique formelle, l'informatique et la psychologie. Les modèles neuronaux modernes de compréhension du langage, pouvant être entraînés avec une configuration « de bout en bout », ont remplacé l'approche séquentielle classique de modules spécialisés. Bien que de tels modèles neuronaux aient connu un succès remarquable, l'opacité de leurs mécanismes a récemment soulevé des inquiétudes. Il existe plusieurs travaux contemporains qui soutiennent que de tels modèles neuronaux de bout en bout imitent certains modules spécialisés dans le traitement du langage, tandis que d'autres travaux adoptent une position plus critique. La thèse adopte également une position critique et propose de nouvelles techniques pour quantifier le manque de compréhension de la syntaxe et de la sémantique conventionnelles. Dans un premier temps, nous quantifions la compréhension sémantique des modèles neuronaux dans une tâche de prédiction de dialogue en analysant la représentation des entrées apprise de bout en bout. Les résultats mettent en évidence un manque de corrélation entre les performances du modèle neuronal et les capacités discriminantes des représentations apprises par celui-ci. Suite à cela, nous proposons un cadre pour évaluer la compréhension syntaxique des modèles neuronaux en analysant leurs performances sur des échantillons dépourvus de toute notion conventionnelle de syntaxe dans une tâche d'inférence de langage naturel. Observant le manque de compréhension de la syntaxe, nous explorons le coût des modèles hallucinant une entrée probablement correcte en analysant le compromis entre fidélité et robustesse en traduction automatique dans le chapitre suivant. Enfin, nous tentons de quantifier le manque de naturel dans la compréhension du langage grâce à de nouvelles métriques qui capturent l'ordre local et global des mots. L'ensemble du travail vise à construire des techniques interprétables pour la compréhension du langage dans les modèles neuronaux et, à cet effet, réalise une étude approfondie sur la quantification de la compréhension du langage dans les modèles neuronaux sur une variété de tâches linguistiques.

Contributions to Original Knowledge
The thesis contributes to the topic of natural language processing by proposing novel techniques to interpret the almost opaque and over-parameterized neural models. Specifically,

1. We highlight the discrepancies in language generation by evaluating the specificity of the representation learnt by neural dialogue models in estimating the dialogue state through linear probing.
2. We identify a serious issue of systematic unnatural syntactic understanding of neural language models in the language inference task. This work showcases that the neural models lack understanding of any conventional notion of syntax.
3. Towards pointing out a potential social issue of such unnaturally robust models, we propose a framework to understand the trade-off between faithfulness and robustness in the domain of neural machine translation.
4. We propose novel metrics to quantify the sensitivity to perturbations in an attempt to interpret the word-order insensitivity of neural models. We find evidence suggesting that models may be learning syntactic rules that are governed more so at the level of sub-words and characters than at the word-level.
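The linear probing mentioned in contribution 1 works by freezing a trained encoder and fitting only a linear classifier on its representations, so that the probe's accuracy reflects what the representation encodes rather than what the probe can compute. The following is a minimal, self-contained sketch of the idea; the toy data, the random-projection stand-in for a pre-trained encoder, and all helper names are illustrative assumptions, not the thesis's actual setup.

```python
# Minimal sketch of linear probing (illustrative; not the thesis's exact setup).
# A frozen "encoder" maps dialogue contexts to fixed vectors; a linear
# classifier is then trained on top to predict a probe label (e.g., a
# dialogue-state slot). The encoder here is simulated with a fixed random
# projection of bag-of-words counts.
import numpy as np

rng = np.random.default_rng(0)

def frozen_encoder(texts, vocab, proj):
    # Bag-of-words counts pushed through a fixed (untrained) projection,
    # standing in for a pre-trained encoder's hidden states.
    X = np.zeros((len(texts), len(vocab)))
    for i, t in enumerate(texts):
        for w in t.split():
            X[i, vocab[w]] += 1.0
    return X @ proj

def train_linear_probe(H, y, lr=0.1, steps=500):
    # Logistic-regression probe: only these weights are trained;
    # the encoder producing H stays frozen throughout.
    w = np.zeros(H.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(H @ w + b)))
        g = p - y
        w -= lr * H.T @ g / len(y)
        b -= lr * g.mean()
    return w, b

texts = ["book a table", "reserve a table", "play some music", "music please"]
y = np.array([1, 1, 0, 0])  # probe label: is this a restaurant request?
vocab = {w: i for i, w in enumerate(sorted({w for t in texts for w in t.split()}))}
proj = rng.normal(size=(len(vocab), 16))

H = frozen_encoder(texts, vocab, proj)
w, b = train_linear_probe(H, y)
preds = (H @ w + b > 0).astype(int)
print(preds.tolist())
```

Because only `w` and `b` are trained, high probe accuracy indicates that the frozen representation linearly separates the probe label; chance-level accuracy suggests the information is absent or only non-linearly encoded.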
Contribution of Authors

Most of Chapters 1 and 2 were written specifically for this thesis.

Chapter 3 is based on a conference paper in which I was the primary student author. My contributions were on developing the tools for setting up the probe tasks: devising experiments and setting up Amazon Mechanical Turk for human evaluations. The paper was jointly written with Sarath Chandar and Joelle Pineau.

Chapter 4 is based on a conference paper (Sinha et al., 2020a). The primary observation of the unnaturalness in syntax was by Koustuv Sinha. I contributed to extending the experiments to recurrent and convolutional language models. Although the experiments on transformer models were contributed by Koustuv, we present them in the thesis for a better understanding of the work. The idea of using a metric to quantify the unnaturalness, and of designing perturbations through the POS mini-tree hypothesis, germinated over discussions. Koustuv prototyped the metric and the word-shuffling operation. We attribute huge credit to Adina Williams, who advised us on the project and contributed the majority of the writing. This work received the "Outstanding Paper" award at ACL 2021. Joelle Pineau supervised the project by providing useful comments and helping with the writing.

Chapter 5 is based on a conference paper (Parthasarathi et al., 2021c). This is an extension of the previous work to the language generation task of neural machine translation. Here, I was the primary contributor to the experiments, taking suggestions from Koustuv on their design and organization. We attribute joint credit for the conceptualization of the metrics, the writing, and the narrative on faithfulness vs. robustness. Adina Williams helped a lot with the writing of the paper. Joelle Pineau supervised the project by providing useful comments and helping with the writing.

Chapter 6 is based on a conference paper (Clouâtre et al., 2021). This is joint work with Louis Clouâtre, Sarath Chandar and Amal Zouaq. My contribution to this work was predominantly the conceptualization of quantifying perturbations and correlating them with the performance of neural models on language understanding tasks. Louis Clouâtre took the lead on the experiments, and I contributed to the writing of the paper. Louis made the initial observations drawing a connection between the IDC metric and the use of positional encoding, and formalized the experiments. Sarath Chandar and Amal Zouaq supervised the project by providing useful suggestions and helping with the writing.

Chapter 7 was written specifically for this thesis. The ideas presented in the future work section are attributable to the discussions I have had with Saujas Vaduguru, Marc-Alexandre Côté, Eric Yuan, Sarath Chandar, Koustuv Sinha, Adina Williams and Joelle Pineau.
Throughout my Ph.D. I have also been fortunate to collaborate on much other research that is not part of this thesis (Rajendran et al., 2017; Truong et al., 2017; Parthasarathi and Pineau, 2018; Gontier et al., 2018; Sinha et al., 2020b; Parthasarathi et al., 2020a, 2021a; McRae et al., 2021).
Acknowledgements

Thank you Joelle, for the opportunity you provided and your insightful advice. I am immensely thankful for your guidance towards a career in research. Thank you Radhika for your unconditional support throughout my Ph.D. journey. I cannot thank you enough for being there for me and consistently cheering me on. Thank you Amma and Appa, for instilling the confidence in me to dream bigger and to be an independent person. Thank you Anand for your support and constant motivation. Thank you Sarath, for being a friend, mentor and a well-wisher.¹

Thank you Saujas, Mamal, Igor, Arjun, Srinivas, Varsha, Sruthika, Sankari, Sai Rajeshwar, Shagun, Disha and Gunshi for the many conversations and board game nights, which were a welcome respite.

I would like to thank all my mentors, collaborators and colleagues at McGill University, Mila, Facebook AI Research (Meta AI) Montreal, Google Brain and Noah's Ark Lab (Huawei) for many great discussions over coffees and lunches.

I would like to also thank the McGill and Mila administration for facilitating my academic journey.

¹ Special thanks for helping me with the French abstract.

Contents
1 Introduction
2 Background
  2.1 Language Model
  2.2 Generalized Language Models Through Distributed Representations
    2.2.1 Feed Forward Neural Network based LM
    2.2.2 Recurrent Neural Network based LM
    2.2.3 Transformer Language Model
  2.3 Natural Language Processing Tasks
    2.3.1 Text prediction
    2.3.2 Text Classification
  2.4 Overview of Metrics
  2.5 Probe tasks
3 On quantifying the semantics of language encoders
  3.1 Related Work
  3.2 Probe Tasks
    3.2.1 Datasets
    3.2.2 Models
    3.2.3 Motivating Semantic Probe Tasks for Dialogue Generation
    3.2.4 Dialogue Probe Tasks
  3.3 Experiments
    3.3.1 Results
  3.4 Discussion
  3.5 Summary
4 On the effective role of word-orders in natural language tasks
  4.1 Related Work
  4.2 Our Approach
  4.3 Methods
  4.4 Results
  4.5 Analyzing Syntactic Structure Associated with Tokens
  4.6 Human Evaluation
  4.7 Summary
5 On the effects of unchecked normativity in language generation
  5.1 Related Work
  5.2 Metrics
  5.3 Perturbations
    5.3.1 Random Shuffles
    5.3.2 Part-of-Speech tag Based Perturbations
    5.3.3 Dependency Tree Based
    5.3.4 Distribution
  5.4 Experiments
  5.5 Results
    5.5.1 Faithfulness vs. Robustness
    5.5.2 Patterns in the Two Metrics, and Length
  5.6 Discussion
  5.7 Summary
6 On quantifying the perceived unnaturalness
  6.1 Related Work
  6.2 Proposed Metrics
  6.3 Perturbation Functions
  6.4 Experiments
  6.5 Analysis
    6.5.1 Correlation with other metrics
    6.5.2 Comparison of Perturbation Functions
    6.5.3 IDC/DND vs GLUE tasks
    6.5.4 Model specific analysis
    6.5.5 Character-Level Experimentation
  6.6 Summary
7 Final Conclusion & Future Work
  7.1 Final Conclusion
  7.2 Future Work
Bibliography
Acronyms
List of Figures

2.1 A Recurrent Neural Network Language Model.
2.2 A single LSTM-RNN cell.
2.3 Bidirectional Recurrent Neural Network.
2.4 Transformer architecture (Vaswani et al., 2017).
3.1 The mean of the distribution of tie in three different experiments.
3.2 Progression of performance of models on the probe tasks in MultiWoZ dataset.
3.3 Progression of performance of models on the probe tasks in MultiWoZ dataset (Continued).
3.4 Downsampled encoder hidden states on MultiWoZ dataset with PCA.
4.1 Graphical representation of the Permutation Acceptance class of metrics.
4.2 Average entropy of model confidences on permutations across different models.
4.3 BLEU-2 score versus acceptability of permuted sentences across all test datasets.
4.4 POS Tag Mini Tree overlap score.
5.1 Effect of the different perturbation functions.
5.2 Tree based perturbation example.
5.3 Heatmap illustrating the average of Levenshtein distances between different perturbations.
5.4 Plot showing the trend of one metric's scores being generally higher than the other's.
5.5 Analysing the BLEURT score as a choice of metric.
5.6 Analysing the BERT-score as a choice of metric.
5.7 Analysing the Levenshtein score as a choice of metric.
5.8 The robustness of the NMT systems having a strong correlation with the performance of the machine translation system.
5.9 Correlation of length of the text to the two different metrics.
5.10 Le…
5.11 General trend of SoTA NMT systems to favor robust or faithful translations.
5.12 Averaging the difference between faithfulness and robustness across languages and NMT systems.
6.1 Example of perturbation operations applied at different granularity.
6.2 Word-level full shuffling perturbation example.
6.3 Subword-level phrase shuffling perturbation example.
6.4 Character-level full neighbor flip perturbation example.
6.5 Pairwise correlation between the different metrics on the GLUE tasks.
6.6 Relation between the different choices of metrics measuring the amount of perturbation.
6.7 Analysing different perturbation functions discussed in the literature with the proposed metrics - IDC and DND.
6.8 Comparison of different neural architectures' performances with different levels of perturbation as measured by DND.
6.9 Correlation between the models' performance on perturbed samples on the different GLUE tasks.
6.10 Correlation between perturbations measured by different metrics and the performance on GLUE Tasks of pre-trained Transformers.
6.11 Correlation between perturbations measured by different metrics and the performance on GLUE Tasks of different non-pretrained architectures.
6.12 Correlation between perturbations measured by different metrics and the performance on GLUE Tasks of ConvNets and BiLSTMs using only characters as tokens.
6.13 Difference in GLUE scores between a Transformer and the same Transformer without positional embeddings.

List of Tables
3.1 Distribution of the dialogues in the datasets.
3.2 Size of parameters of the models used in all the experiments on the two datasets. M for Million.
3.3 BLEU scores of the models from runs with different seeds on PersonaChat and MultiWoZ dataset.
3.4 The difficulty levels of different tasks as measured with the average performance of an untrained encoder.
3.5 Performance of neural models on the probe tasks constructed with PersonaChat dataset.
3.6 Performance of different neural models on the probe tasks constructed with MultiWoZ 2.0 dataset.
3.7 Aggregate F1 scores of the models on performance in probe tasks on MultiWoZ dataset.
4.1 Statistics for Transformer-based models trained on MNLI corpus.
4.2 Results on evaluation on OCNLI Dev set.
4.3 Human (expert) evaluation on 200 permuted examples from the MNLI matched development set.
5.1 Performances in BLEU-4 of our NMT models.
5.2 Number of flips by language and NMT models.
5.3 The distribution count of flips by every perturbation function across the languages and models.

1 Introduction
Natural language, as we speak, write and plan with it, has myriad connections with the evolution, culture, and knowledge of humans as a species. The origins of the early (proto) languages can be traced back tens of thousands of years (Everett, 2017). Language served the purpose of communicating information about predator and prey to an individual or a group. As humans and their societies evolved, language too became sophisticated at describing events that occurred in the past, are occurring in the present, or are going to occur in the future. Such sophistication in description allowed humans to plan future actions, analyse the actions of the past, and so on. Although planning a task and communicating an event directly benefited from the development of language, language also served other purposes like community building, where speakers engaged in conversations to advise, argue or chat without a direct purpose.

The evolution of languages for the most part of history was endemic (Harari, 2014). Within a region, the primary learning involved familiarizing oneself with the rules applicable in the different use cases (syntax) and learning to use the vocabulary appropriately (semantics). The geographical expansion of different communities added variety to the different tasks within natural languages, like translation, understanding a common grammar, and developing novel languages by fusing vocabularies, among others (Pinker, 2003). Modern day studies on natural language attempt to learn the structure dictated by the grammar, generate language, understand language emergence, or perform natural language understanding (NLU) tasks like summarizing, answering questions from a passage, recognizing textual entailment, classifying sentiment, and machine translation, among many other tasks. Widespread usage of modern commercial applications that are powered by models solving the aforementioned language tasks has been aiding the advancement of such technologies.

Need for Neural Language Models. The use cases in several commercial language applications enabled the collection of large corpora of user interactions that could in turn be used to learn statistical solutions. The premise of such sophisticated data-driven models is straightforward, in that a sequence of projection operations is applied to identify a representation space that allows maximal distinction of the possible classes in the samples. Recently, such architectures with deep representations and over-parameterized models for the task of language learning have garnered attention (Vaswani et al., 2017). The prospects of transformers piqued the interest of the NLP community. Pre-training the transformer architectures with large corpora of data was observed to be an effective technique to learn sophisticated latent representations for several language understanding tasks (Radford et al., 2018; Devlin et al., 2018a; Liu et al., 2019d; Raffel et al., 2019).

While applying powerful transformer networks to NLP tasks has shown success, verifying whether their predictions are indeed an entailment of the information in the text is necessary. For example, consider a sentence that conveys some specific instruction like, "Give the blue ball to the child playing with the green toy". The syntax, or the grammar rules, of the English language allows comprehending the instruction to appropriately extract the information (action and entities). If the words were ordered in a different way, the syntactic rules could not be effectively applied to decode the meaning. Psycho-linguistic research has observed that humans find it easier to identify or recall words presented in canonical orders than in disordered, ungrammatical sentences; this phenomenon is called the "sentence superiority effect" (Cattell, 1886; Scheerer, 1981; Toyota, 2001; Baddeley et al., 2009; Snell and Grainger, 2017, 2019; Wen et al., 2019, i.a.).

The role of syntax in NLU. Generally, knowing the syntax of a sentence is taken to be a prerequisite for understanding what that sentence means (Heim and Kratzer, 1998). Models should, then, have to know the syntax first if performing any particular NLU task that genuinely requires a humanlike understanding of meaning (cf. Bender and Koller (2020)). The null hypothesis for what a model should do when it encounters a sentence that probably does not make sense is very much dependent on the task. For a text classification task, where the objective is to predict a label from a finite set of categories, the lack of understanding could be correlated with high perplexity. On the other hand, in a generative task (predicting an utterance or generating a translation) it becomes less straightforward. In such scenarios, arguments can be constructed in support of the model staying robust to the noisy input, remaining faithful to it, or, if possible (in the case of dialogue prediction), asking for clarification. Neural models are known to be brittle when evaluated on out-of-domain data (Luong and Manning, 2015), or when trained with noisy input data containing small orthographic errors (Sakaguchi et al., 2017; Belinkov and Bisk, 2018).
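The word-order perturbations that the later chapters build on can be illustrated with two toy operations: a full word-level shuffle, which destroys conventional syntax while keeping the bag of words intact, and milder neighbor flips that only reorder locally. This is an illustrative sketch under assumed helper names, not the thesis's implementation.

```python
# Illustrative word-order perturbations (hypothetical helper names; not the
# thesis's implementation). A model whose predictions are unchanged between a
# sentence and its shuffle arguably is not relying on conventional syntax.
import random

def shuffle_words(sentence, seed=0):
    # Full word-level shuffle: destroys conventional word order while
    # preserving the bag of words (and hence most lexical cues).
    words = sentence.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)

def neighbor_flips(sentence, n_flips=2, seed=0):
    # Milder perturbation: swap n_flips pairs of adjacent words,
    # a local reordering in the spirit of neighbor-flip operations.
    words = sentence.split()
    rng = random.Random(seed)
    for _ in range(n_flips):
        i = rng.randrange(len(words) - 1)
        words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)

s = "Give the blue ball to the child playing with the green toy"
print(shuffle_words(s))
print(neighbor_flips(s))
```

Because both operations preserve the multiset of words, comparing a model's performance on the original and perturbed inputs isolates its reliance on word order from its reliance on lexical content.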