Statistics in Corpus Linguistics PDF



Statistics in Corpus Linguistics

Do you use language corpora in your research or study, but find that you struggle with statistics? This practical introduction will equip you to understand the key principles of statistical thinking and apply these concepts to your own research, without the need for prior statistical knowledge. The book gives step-by-step guidance through the process of statistical analysis and provides multiple examples of how statistical techniques can be used to analyse and visualize linguistic data. It also includes a useful selection of discussion questions and exercises which you can use to check your understanding. The book comes with a companion website, which provides additional materials (including answers to exercises, datasets, advanced materials, teaching slides etc.) and Lancaster Stats Tools online (http://corpora.lancs.ac.uk/stats), a free click-and-analyse statistical tool for easy calculation of the statistical measures discussed in the book.

Vaclav Brezina is a senior lecturer at the Department of Linguistics and English Language, Lancaster University. He specializes in corpus linguistics, statistics and applied linguistics, and has designed a number of different tools for corpus analysis.

Statistics in Corpus Linguistics: A Practical Guide

VACLAV BREZINA

Lancaster University

University Printing House, Cambridge CB2 8BS, United Kingdom

One Liberty Plaza, 20th Floor, New York, NY 10006, USA

477 Williamstown Road, Port Melbourne, VIC 3207, Australia

314-321, 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre, New Delhi-110025, India

79 Anson Road, #06-04/06, Singapore 079906

Cambridge University Press is part of the University of Cambridge.

It furthers the University's mission by disseminating knowledge in the pursuit of education, learning, and research at the highest international levels of excellence.

www.cambridge.org

Information on this title: www.cambridge.org/9781107125704

DOI: 10.1017/9781316410899

© Vaclav Brezina 2018

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2018

Printed in the United Kingdom by TJ International Ltd. Padstow Cornwall

A catalogue record for this publication is available from the British Library.

Library of Congress Cataloging-in-Publication Data

Names: Brezina, Vaclav, 1979- author.

Title: Statistics in corpus linguistics : a practical guide / Vaclav Brezina, Lancaster University.

Description: Cambridge ; New York : Cambridge University Press, 2018. | Includes bibliographical references and index.

Identifiers: LCCN 2018007010 | ISBN 9781107125704 (alk. paper)

Subjects: LCSH: Corpora (Linguistics) | Linguistics-Statistical methods.

Classification: LCC P128.C68 B76 2018 | DDC 410.1/88-dc23

LC record available at https://lccn.loc.gov/2018007010

ISBN 978-1-107-12570-4 Hardback

ISBN 978-1-107-56524-1 Paperback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.

List of Figures

List of Tables

About This Book

Acknowledgements

1.1 What Is This Chapter About?

1.2 What Is Statistics? Science, Corpus Linguistics and Statistics

1.3 Basic Statistical Terminology

1.4 Building of Corpora and Research Design

1.5 Exploring Data and Data Visualization

1.6 Application and Further Examples: Do Fiction Writers Use More Adjectives than Academics?

1.7 Exercises

Things to Remember

Advanced Reading

2.1 What Is This Chapter About?

2.2 Tokens, Types, Lemmas and Lexemes

2.3 Words in a Frequency List

2.4 The Whelk Problem: Dispersion

2.5 Which Words Are Important? Average Reduced Frequency

2.6 Lexical Diversity: Type/Token Ratio (TTR), STTR and MATTR

2.7 Application and Further Examples: Do the British Talk about Weather All the Time?

2.8 Exercises

Things to Remember

Advanced Reading

Reliability of Manual Coding

3.1 What Is This Chapter About?

3.2 Collocations and Association Measures

3.3 Collocation Graphs and Networks: Exploring Cross-associations

3.4 Keywords and Lockwords

3.5 Inter-rater Agreement Measures

3.6 Application and Further Examples: What Do Readers of British Newspapers Think about Immigration?

3.7 Exercises

Things to Remember

Advanced Reading

4.1 What Is This Chapter About?

4.2 Analysing a Lexico-grammatical Feature

4.3 Cross-tabulation, Percentages and Chi-squared Test

4.4 Logistic Regression

4.5 Application: That or Which?

4.6 Exercises

Things to Remember

Advanced Reading

5.1 What Is This Chapter About?

5.2 Relationships between Variables: Correlations

5.3 Classification: Hierarchical Agglomerative Cluster Analysis

5.4 Multidimensional Analysis (MD)

5.5 Application: Registers in New Zealand English

5.6 Exercises

Things to Remember

Advanced Reading

Variation

6.1 What Is This Chapter About?

6.2 Individual Style and Social Variation: Where Does a Sociolinguistic Variable Start?

6.3 Group Comparison: T-Test, ANOVA, Mann-Whitney U Test, Kruskal-Wallis Test

6.4 Individual Style: Correspondence Analysis

6.5 Linguistic Context: Mixed-Effects Models

6.6 Application: Who Is This Person from the White House?

6.7 Exercises

Things to Remember

Advanced Reading

7.1 What Is This Chapter About?

7.2 Time as a Variable: Measuring and Visualizing Time

7.3 Finding and Interpreting Differences: Percentage Change and the Bootstrap Test

7.4 Grouping Time Periods: Neighbouring Cluster Analysis

7.5 Modelling Changes in Discourse: Peaks and Troughs and UFA

7.6 Application: Colours in the Seventeenth Century

7.7 Exercises

Things to Remember

Advanced Reading

Thinking, Meta-analysis and Effect Sizes

8.1 What Is This Chapter About?

8.2 Ten Principles of Statistical Thinking

8.3 Meta-analysis: Statistical Synthesis of Research Results

8.4 Effect Sizes: A Guide for Meaningful Use

8.5 Exercises

Things to Remember

Advanced Reading

Final Remarks

References

Index

Figures

1.1 The relationship between the relative frequency of adjectives and verbs

1.2 Process of statistical analysis

1.3 Example of a dataset

1.4 The distribution of the first-person pronoun in the Trinity Lancaster Corpus

1.5 Standard normal distribution

1.6 Dispersion of adjective frequencies in 11 corpus files

1.7 Confidence intervals: two situations

1.8 Research designs in corpus linguistics

1.9 Bar chart: variable x in three corpora

1.10 Boxplot: variable x in three corpora

1.11 Error bars: variable x in three corpora

1.12 Histogram: the definite article in BE06

1.13 Histogram: the f-word in BNC64

1.14 Scatterplot: the and I in BNC64

1.15 Scatterplot: the, I and you in BNC64

1.16 Top ten places connected with 'going' or 'travelling' in the BNC

1.17 Other types of visualizations

1.18 The use of adjectives by fiction and academic writers: boxplot

1.19 The use of adjectives by fiction and academic writers: error bars

1.20 Great Britain: main island

2.1 Distribution of word frequencies in the BNC

2.2 Example corpus: calculation of SD

2.3 Distribution of words w1 and w2

3.1 Frequency and exclusivity scale

3.2 Collocation graph: 'love' in BE06 (10a-log Dice (7), L3-R3, C5-NC5)

3.3 Collocation networks: concept demonstration

3.4 Third-order collocates of time in LOB (3a-MI(5), R5-L5, C4-NC4; no filter applied)

3.5 Collocation network of 'university' based on BE06 (3b-MI(3), L5-R5, C8-NC8)

3.6 Collocation networks around 'immigrants' in the Guardian (3a-MI(6), R5-L5, C10-NC10; no filter applied)

3.7 Collocation networks around 'immigrants' in the Daily Mail (3a-MI(6), R5-L5, C20-NC20; no filter applied)

3.8 Selected collocation networks

4.1 The definite and indefinite articles in BNC subcorpora

4.2 The vs a(n) dataset: linguistic feature design (an excerpt)

4.3 A mosaic plot: article type by contextual determination

4.4 Logistic regression: a basic schema

4.5 Article use in English: a dataset (an excerpt)

4.6 A sentence from this book corrected for 'grammar'

4.7 Visualization of the relationship between which and that and a separator

4.8 Must, have to and need to in British English (BE06)

5.1 Nouns and adjectives in BE06

5.2 Verbs and adjectives in BE06

5.3 Pronouns and coordinators in BE06

5.4 Correlation: five data points

5.5 Correlation: covariance

5.6 Statistically significant (p < 0.05) Pearson's correlations in relation to the number of observations

5.7 Multi-panel scatterplot: nouns, adjectives, verbs, pronouns and coordinators

5.8 Correlation matrix: nouns, adjectives, verbs, pronouns and coordinators

5.9 Colour terms in the BNC

5.10 Creating clusters: Steps 1-4

5.11 Creating clusters: final result

5.12 Colour terms: a tree plot (dendrogram) - z-score2 normalized, Euclidean distance, SLINK method

5.13 Tree plot: SLINK method

5.14 Tree plot: CLINK method

5.15 Tree plot: average linkage method

5.16 Tree plot: Ward's method

5.17 A dataset for multidimensional analysis (a small extract)

5.18 Data reduction: ten variables into two factors

5.19 Promax factor rotation

5.20 Factor extraction: scree plot

5.21 Mean scores of registers placed on Dimension 1: Involved vs Informational

5.22 Correlation matrix: 44 variables

5.23 Correlation between mean word length and contractions:

5.24 Cluster plot: registers in New Zealand English

5.25 Dimension 1: New Zealand English - full MD analysis

5.26 Dimension 2: New Zealand English - full MD analysis

5.27 Relationship between mean word length (number of characters) and mean sentence length (number of words) in BNC

5.28 Relationship between the use of the past and the present tense in BE06

5.29 Relationship between the use of adjectives and colour terms in BE06

5.30 Relationship between text length (tokens) and type-token ratio (TTR) in BNC

5.31 Dimension 3

5.32 Dimension 4

6.1 Distribution of personal pronouns in BNC64 female speakers

6.2 ANOVA calculation: between-group variance (top), within-group variance (bottom)

6.3 Dataset from BNC64 - relative frequencies and ranks: use of personal pronouns

6.4 Distribution of ain't in BNC64 speakers: social-class effect

6.5 Ain't in BNC64: 95% CI

6.6 A correspondence plot: word classes in the speech of individual speakers

6.7 Speaker (row) profiles: Euclidean distance

6.8 Speaker (row) profiles: chi-squared distance

6.9 Sociolinguistic dataset: internal and external factors (an excerpt)

6.10 Mixed-effects models: output

6.11 Correspondence analysis: use of word classes by White House press secretaries

6.12 Correspondence analysis: use of epistemic markers in BNC64

7.1 Modals in the Brown family corpora

7.2 Modals in the Brown family corpora: an alternative interpretation

7.3 Google n-gram viewer: 'man' and 'woman'

7.4 Modals in the Brown family corpora: original (top) and rescaled (bottom)

7.5 Modals in British English: (a) boxplots; (b) 95% CI error bars

7.6 Candlestick plot: the development of individual modals 1931-2006

7.7 Bootstrapping: demonstration of the concept

7.8 Example of a dataset for the bootstrap test: its in EEBO

7.9 Data points over time: an invented example

7.10 Two clustering principles: (a) hierarchical agglomerative clustering; (b) variability-based neighbour clustering

7.11 Dendrograms: (a) hierarchical agglomerative clustering; (b) variability-based neighbour clustering

7.12 Dendrogram: use of the possessive pronoun its in the seventeenth century

7.13 Scree plot: use of the possessive pronoun its in the seventeenth century

7.14 Resulting peaks and troughs graphs: settings as indicated

7.15 Results of UFA for war 1940-2009 (3a-MI(3), L5-R5, C10relative-NC10relative; AC1)

7.16 Frequency of colour terms in the seventeenth century

7.17 Candlestick plot: colours in the seventeenth century

7.18 Results of UFA for red 1600-99 (3a-MI(3), L5-R5, C10relative-NC10relative; AC1)

7.19 VNC: red in the seventeenth century

7.20 Number of tweets related to an episode of the UK X-Factor (16/11/2014, 7-11pm)

7.21 Development of frequencies of handsome, pretty and beautiful followed by a male (M) or female (F) person in the seventeenth century

7.22 Development of frequencies of the possessive pronoun its in the seventeenth century

7.23 Four frequency change scenarios

7.24 Handsome in the seventeenth century

7.25 Pretty in the seventeenth century

8.1 Overview of genres in BE06 (Baker 2009)

8.2 Past tense in different written genres of BE06

8.3 Past tense (a) and present tense (b) in different written genres of BE06: boxplot rendition

8.4 Finding the Globe

8.5 Forest plot: meta-analysis of four studies

8.6 Comparison of two subcorpora

8.7 Forest plot: example 1

8.8 Forest plot: example 2

Tables

1.1 The effect size r and its standard interpretation

1.2 Brown family sampling frame

1.3 Frequencies of selected words and expressions in three English corpora

1.4 Different levels of analysis in corpus linguistics

1.5 Subcorpora in mini-research

2.1 Type, lemma and lexeme: advantages and disadvantages

2.2 Top ten words in the BNC

2.3 Example corpus: one million tokens

2.4 Calculation of DP with the example corpus

2.5 BE06

2.6 Weather-related lemmas in BE06

2.7 Ranks of weather-related lemmas in BE06

2.8 BNC: distribution of four selected words

3.1 Observed frequencies

3.2 Expected frequencies: random occurrence baseline

3.3 Association measures: overview

3.4 Ranking of collocates of 'new' in BE06 (L3-R3)

3.5 Collocation parameters notation (CPN)

3.6 AmE06: American English keywords

3.7 Decisions about keywords: BASIC options

3.8 Comparison of selected lexical items in BE06 and AmE06
