Automatic Identification of Cognates and False Friends in French and English

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.148.9112&rep=rep1&type=pdf

Diana Inkpen and Oana Frunza
School of Information Technology and Eng.
University of Ottawa
Ottawa, ON, K1N 6N5, Canada

Department of Computing Science
University of Alberta
Edmonton, AB, T6G 2E8, Canada
kondrak@cs.ualberta.ca

Abstract

Cognates are words in different languages that have similar spelling and meaning. They can help a second-language learner on the tasks of vocabulary expansion and reading comprehension. The learner also needs to pay attention to pairs of words that appear similar but are in fact false friends: they have different meaning [...] we propose a method to automatically classify a pair of words as cognates or false friends. We focus on French and English, but the methods are applicable to other language pairs. We use [...] features for classification. We study the impact [...] and combining them through machine learning techniques.

Keywords: similarity measures, machine learning, cognates and false friends, second-language learning, machine translation.

1 Introduction

When learning a second language, a student can benefit from knowledge in his/her first language (Gass 87) (Ringbom 87). Cognates, words that [...] are also pairs of words that appear similar, but have different meaning in some or all contexts: [...] 96).

Cognates have also been employed in natural [...] sentence alignment (Simard et al. 92; Melamed 99), inducing translation lexicons (Mann & [...] Kondrak & Dorr 04). All those applications depend on an effective method of identifying cognates by computing a numerical score that reflects the like- [...] tures. Then we explore various ways to combine [...] ing techniques from the Weka package (Witten & [...] as Cognates; otherwise, they are assumed to be False-Friends.

Although French and English belong to different branches of the Indoeuropean family of languages, they share an extraordinarily high number of cognates. The cognates derive from sev- [...] Latin and Greek origin that permeate the vocab- [...] very old, "genetic" cognates go back all the way to Proto-Indoeuropean, e.g., mère - mother and pied - foot. Other cognates can be traced to the [...] lapse of the Roman Empire, and by the period of French domination of England after the Norman conquest.

While our focus is on French and English, the methods that we describe are also applicable to other language pairs. Nowadays, new terms related to modern technology are often adopted [...] has the same meaning.

2 Related Work

Previous work on automatic cognate identification is mostly related to bilingual corpora and translation lexicons. Simard et al. (Simard et al. 92) use cognates to align sentences in bitexts. They employ a very simple test: French-English word pairs are assumed to be cognates if [...] McKelvie (Brew & McKelvie 96) extract French- [...] sures. Mann and Yarowsky (Mann & Yarowsky [...] the basis of cognate pairs. They found that edit [...] hidden Markov models and stochastic transduc- [...] guages by combining the phonetic similarity of [...] Kondrak & Dorr 04) report that a simple average [...] performs all individual measures on the task of the identification of drug names.

For French and English, substantial work on cognate detection was done manually. LeBlanc and Seguin (LeBlanc & Seguin 96) [...] (Dubois 81). 6,447 of the cognates had identi- [...] cognates appear to make up over 30% of the vocabulary.

The use of cognates in second language teach- [...] version from English to French were also proved [...] rules. An example is: cal → que in pairs such as [...]

3.1 Definitions

We adopt the following definitions. The defini- [...] are pairs of French and English words, respectively.

Cognates, or True Friends (Vrais Amis), are pairs of words that are perceived as similar and [...] reconnaissance.

False Friends (Faux Amis) are pairs of words in two languages that are perceived as similar but have different meanings, e.g., main "hand" - main, blesser "to injure" - bless.

Partial Cognates are pairs of words that have the same meaning in both languages in some but not all contexts. They behave as cognates or as [...] in each context. For example, in French, fac- [...] while étiquette can also mean "label".

Genetic Cognates are word pairs in related [...] in the ancestor (proto-)language. Because of [...] periods of time, genetic cognates often differ in form and/or meaning, e.g., père - father, chef - [...] other at some point of time, such as concierge.

Unrelated pairs are words that exhibit no or- [...] e.g., glace - chair.

3.2 Orthographic Similarity Measures

[...] subjective. In this section, we briefly describe the measures that we use as features for the cognate classification task.

IDENT is a baseline measure that returns 1 if the words are identical, and 0 otherwise.

PREFIX is a simple measure that returns the length of the common prefix divided by the length of the longer string.1 E.g., the common prefix for factory and fabrique has length 2 (the first two letters) which, divided by the length of 8, yields 0.25.

1 [...] of Simard et al. (Simard et al. 92) approach.
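The two simplest measures can be sketched in a few lines of Python. This is only an illustration of the definitions above, not the authors' implementation; the function names `ident` and `prefix` are chosen here for convenience.

```python
def ident(x: str, y: str) -> int:
    """IDENT: 1 if the two words are identical, 0 otherwise."""
    return 1 if x == y else 0


def prefix(x: str, y: str) -> float:
    """PREFIX: length of the common prefix divided by the length of the longer word."""
    if not x or not y:
        return 0.0
    common = 0
    for a, b in zip(x, y):
        if a != b:
            break
        common += 1
    return common / max(len(x), len(y))


# The worked example from the text: "fa" is shared, the longer word has 8 letters.
print(prefix("factory", "fabrique"))  # 2 / 8 = 0.25
```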

DICE (Adamson & Boreham 74) is calculated by dividing twice the number of shared letter bigrams by the total number of bigrams in both words:

$$\mathrm{DICE}(x, y) = \frac{2\,|\mathit{bigrams}(x) \cap \mathit{bigrams}(y)|}{|\mathit{bigrams}(x)| + |\mathit{bigrams}(y)|}$$

where bigrams(x) is a multi-set of character bigrams in word x. E.g., DICE(colour, couleur) = 6/11 = 0.55 (the shared bigrams are co, ou, ur).
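As a rough sketch of the DICE formula, the multi-set of bigrams can be modelled with `collections.Counter`, whose `&` operator takes the multi-set intersection. The helper names are illustrative and not taken from the paper.

```python
from collections import Counter


def bigrams(word: str) -> Counter:
    """Multi-set of character bigrams in a word."""
    return Counter(word[i:i + 2] for i in range(len(word) - 1))


def dice(x: str, y: str) -> float:
    """Twice the number of shared bigrams over the total number of bigrams."""
    bx, by = bigrams(x), bigrams(y)
    total = sum(bx.values()) + sum(by.values())
    if total == 0:
        return 0.0
    shared = sum((bx & by).values())  # multi-set intersection
    return 2 * shared / total


# colour has 5 bigrams, couleur has 6, and they share co, ou, ur.
print(round(dice("colour", "couleur"), 2))  # 6 / 11, i.e. 0.55
```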

TRIGRAM is defined in the same way as DICE, but employs trigrams instead of bigrams.

XDICE (Brew & McKelvie 96) is also defined in the same way as DICE, but employs "extended bigrams", which are trigrams without the middle letter.
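TRIGRAM and XDICE differ from DICE only in the units that are counted, so one shared helper is enough for a sketch. The code below follows the wording of this section (extended bigrams are trigrams with the middle letter removed); the original XDICE of Brew & McKelvie may differ in detail, and all names here are illustrative.

```python
from collections import Counter


def dice_over(units_x, units_y) -> float:
    """Dice coefficient over two multi-sets of arbitrary units."""
    cx, cy = Counter(units_x), Counter(units_y)
    total = sum(cx.values()) + sum(cy.values())
    if total == 0:
        return 0.0
    return 2 * sum((cx & cy).values()) / total


def trigrams(word: str):
    return [word[i:i + 3] for i in range(len(word) - 2)]


def extended_bigrams(word: str):
    """Trigrams without the middle letter: first and third character only."""
    return [word[i] + word[i + 2] for i in range(len(word) - 2)]


def trigram(x: str, y: str) -> float:
    return dice_over(trigrams(x), trigrams(y))


def xdice(x: str, y: str) -> float:
    return dice_over(extended_bigrams(x), extended_bigrams(y))


# colour / couleur share no trigram, but share the extended bigram "lu".
print(trigram("colour", "couleur"), xdice("colour", "couleur"))
```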

XXDICE (Brew & McKelvie 96) is an extension of the XDICE measure that takes into account the positions of bigrams. Each pair of shared bigrams is weighted by the factor:

$$\frac{1}{1 + (\mathrm{pos}(a) - \mathrm{pos}(b))^2}$$

where pos(a) is the string position of the bigram a.

LCSR (Melamed 99) stands for the Longest Common Subsequence Ratio, and is computed by dividing the length of the longest common subsequence by the length of the longer string. E.g., LCSR(colour, couleur) = 5/7 = 0.71.
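LCSR can be sketched with the textbook dynamic program for the longest common subsequence. This is a generic implementation written for illustration, not code from the paper.

```python
def lcs_length(x: str, y: str) -> int:
    """Length of the longest common subsequence, using a rolling DP row."""
    prev = [0] * (len(y) + 1)
    for a in x:
        curr = [0]
        for j, b in enumerate(y, start=1):
            if a == b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcsr(x: str, y: str) -> float:
    """LCS length divided by the length of the longer word."""
    longer = max(len(x), len(y))
    return lcs_length(x, y) / longer if longer else 0.0


# The longest common subsequence of colour and couleur is c-o-l-u-r.
print(round(lcsr("colour", "couleur"), 2))  # 5 / 7, i.e. 0.71
```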

NED is a normalized edit distance. The edit distance (Wagner & Fischer 74) is calculated by counting up the minimum number of edit operations necessary to transform one word into another. In the standard definition, the [...] and deletions, all with the cost of 1. A normalized edit distance is obtained by dividing the total edit cost by the length of the longer string.

[...] proximation to phonetic name matching. SOUNDEX transforms all but the first letter to numeric codes and after removing zeroes [...] For the purposes of comparison, our implementation of SOUNDEX returns the edit distance between the corresponding codes.
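NED and the SOUNDEX comparison both rest on the edit distance. A minimal sketch of NED follows, assuming the usual unit-cost operations (insertion, deletion, substitution); the SOUNDEX encoding itself is not reproduced here because its description is incomplete in this text.

```python
def edit_distance(x: str, y: str) -> int:
    """Unit-cost edit distance via the Wagner-Fischer dynamic program."""
    prev = list(range(len(y) + 1))
    for i, a in enumerate(x, start=1):
        curr = [i]
        for j, b in enumerate(y, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a
                curr[j - 1] + 1,         # insert b
                prev[j - 1] + (a != b),  # substitute, free when letters match
            ))
        prev = curr
    return prev[-1]


def ned(x: str, y: str) -> float:
    """Edit distance divided by the length of the longer word."""
    longer = max(len(x), len(y))
    return edit_distance(x, y) / longer if longer else 0.0


# colour -> couleur needs one insertion and one substitution.
print(edit_distance("colour", "couleur"), ned("colour", "couleur"))
```

Note that NED is a distance: identical words score 0 and larger values mean less similar words, which is the opposite orientation to DICE or LCSR.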

BI-SIM, TRI-SIM, BI-DIST, and TRI-DIST belong to a family of n-gram measures (Kondrak & Dorr 04) that generalize LCSR and NED measures. The difference lies in considering letter bigrams or trigrams instead of single letter (i.e., unigrams). For example, BI-SIM finds the longest common subsequence of bigrams, while TRI-DIST calculates the edit distance between sequences of trigrams. n-gram similarity is calculated by the formula:

$$s(x_1 \ldots x_n, y_1 \ldots y_n) = \frac{1}{n} \sum_{i=1}^{n} \mathrm{id}(x_i, y_i)$$

where id(a, b) returns 1 if a and b are identical, and 0 otherwise.
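The formula above scores a single pair of n-grams by the fraction of positions at which their letters agree. The sketch below implements only that formula; how the full BI-SIM and TRI-SIM measures combine these scores along the two words is described in (Kondrak & Dorr 04) and is not reconstructed here.

```python
def ngram_similarity(x: str, y: str) -> float:
    """Fraction of positions at which two equal-length n-grams agree."""
    if len(x) != len(y) or not x:
        raise ValueError("n-grams must be non-empty and of equal length")
    return sum(a == b for a, b in zip(x, y)) / len(x)


# Two bigrams that agree only on their first letter score 1/2.
print(ngram_similarity("ou", "ol"))   # 0.5
print(ngram_similarity("cou", "col")) # 2 of 3 positions agree
```

Unlike the all-or-nothing bigram matching in DICE, this gives partial credit to n-grams that differ in a single letter.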

4 The Data

[...] pairs of French and English words (see Table 1).

[...] and expressions. (After excluding multi-word expressions, we manually classified 203 pairs as Cognates and 527 pairs as Unrelated.)

2. A manually word-aligned bitext (Melamed 98). (We manually identified 258 Cognate pairs among the aligned word pairs.)

3. A set of exercises for Anglophone learners of French (Treville 90) (152 Cognate pairs).

[...] nates" (314 False-Friends).

A separate test set is composed of 1040 pairs (see Table 1), extracted from the following sources:

1. A random sample of 1000 word pairs from an automatically generated translation lexicon. (We manually classified 603 pairs as Cognates and 343 pairs as Unrelated.)

[...] English False Cognates" (94 additional False-Friends).

                 Training set    Test set
Cognates         613 (73)        603 (178)
False-Friends    314 (135)       94 (46)
Unrelated        527 (0)         343 (0)
Total            1454            1040

Table 1: The composition of data sets. The num- [...] identical (ignoring accents).

In order to avoid any overlap between the two sets, we removed from the test set all pairs that happened to be already included in the training set. The data set has a 2:1 imbalance in favour [...] in the experiments presented in Section 5). All [...] tion pairs. It would have been easy to add more [...] sample translation lexicons.

5 Evaluation

We present evaluation experiments using the two data sets described in Section 4: a train- [...] returned by all the 13 measures. Then, in order to combine the measures, we run several machine learning classifiers from the Weka package.

5.1 Results on the Training Data Set

[...] measure, we need to choose a specific similarity [...] from the Unrelated pairs. For the IDENT mea- [...]

Orthographic similarity measure    Threshold    Accuracy
IDENT                              1            43.90%
PREFIX                             0.03845      92.70%
DICE                               0.29669      89.40%
LCSR                               0.45800      92.91%
NED                                0.34845      93.39%
SOUNDEX                            0.62500      85.28%
TRI                                0.0476       88.30%
XDICE                              0.21825      92.84%
XXDICE                             0.12915      91.74%
BI-SIM                             0.37980      94.84%
BI-DIST                            0.34165      94.84%
TRI-SIM                            0.34845      95.66%
TRI-DIST                           0.34845      95.11%
Average measure                    0.14770      93.83%

Table 2: Results of each orthographic similar- [...] The last line presents a new measure which is the average of all measures for each pair of words.

[...] the best split. The values of the thresholds obtained in this way are also included in Table 2.
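One simple way to pick such a threshold for a single similarity measure is to try every observed score as a cut point and keep the one with the highest training accuracy. The sketch below is an assumption about the general idea, not the procedure used to produce Table 2; the toy scores and labels are invented for illustration.

```python
def best_threshold(scores, labels):
    """Return (threshold, accuracy) maximizing accuracy on the given pairs.

    scores: similarity values, one per word pair.
    labels: True for Cognates/False-Friends, False for Unrelated.
    A pair is predicted as Cognates/False-Friends when score >= threshold.
    """
    best_t, best_acc = None, -1.0
    for t in sorted(set(scores)):
        correct = sum((s >= t) == lab for s, lab in zip(scores, labels))
        acc = correct / len(scores)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


# Invented toy data: two dissimilar pairs and two similar pairs.
toy_scores = [0.1, 0.2, 0.6, 0.8]
toy_labels = [False, False, True, True]
print(best_threshold(toy_scores, toy_labels))  # (0.6, 1.0)
```

For distance-like measures such as NED the comparison would be reversed, since smaller values indicate more similar words.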

The training data set for machine learning ex- [...] ing classifiers from the Weka package: OneRule (a [...] sion of Support Vector Machine.

The Decision Tree classifier has the advantage [...] an instance as Unrelated if the BI-SIM value is greater than 0.3. Since all measures attempt to [...] to dissimilar pairs, the presence of such a node [...] ering the confidence level threshold from the default CF = 0.25 until we obtained a tree without [...]

Classifier           Accuracy on training set    Accuracy cross-val
Baseline             63.75%                      63.75%
OneRule              95.94%                      95.66%
Naive Bayes          94.91%                      94.84%
Decision Trees       97.45%                      95.66%
Dec Tree (pruned)    96.28%                      95.66%
IBK                  99.10%                      93.81%
AdaBoost             95.66%                      95.66%
Perceptron           95.73%                      95.11%
SVM (SMO)            95.66%                      95.46%

TRI-SIM <= 0.3333
|   TRI-SIM <= 0.2083: UNREL (447.0/17.0)
|   TRI-SIM > 0.2083
|   |   XDICE <= 0.2: UNREL (97.0/20.0)
|   |   XDICE > 0.2
|   |   |   BI-SIM <= 0.3: UNREL (3.0)
|   |   |   BI-SIM > 0.3: CG_FF (9.0)
TRI-SIM > 0.3333: CG_FF (898.0/17.0)

Figure 1: Example of Decision Tree classifier, [...] CF=16%).
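Read as code, the tree in Figure 1 is a short chain of threshold tests on three of the similarity measures. The function below is a direct transcription of the figure for illustration; the function name and argument names are chosen here, and the leaf labels are those printed in the figure.

```python
def classify_pair(tri_sim: float, xdice: float, bi_sim: float) -> str:
    """Transcription of the pruned decision tree shown in Figure 1.

    Returns "CG_FF" (Cognates/False-Friends) or "UNREL" (Unrelated).
    """
    if tri_sim > 0.3333:
        return "CG_FF"
    if tri_sim <= 0.2083:
        return "UNREL"
    # 0.2083 < TRI-SIM <= 0.3333: fall back on XDICE, then BI-SIM
    if xdice <= 0.2:
        return "UNREL"
    return "CG_FF" if bi_sim > 0.3 else "UNREL"


print(classify_pair(tri_sim=0.50, xdice=0.10, bi_sim=0.10))  # CG_FF
print(classify_pair(tri_sim=0.25, xdice=0.30, bi_sim=0.40))  # CG_FF
print(classify_pair(tri_sim=0.25, xdice=0.10, bi_sim=0.90))  # UNREL
```

The leaf counts in the figure (447 + 97 + 3 + 9 + 898) add up to the 1454 training pairs, so almost all decisions are made by the first TRI-SIM test.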

[...] ure 1). Our hypothesis was that the latter tree would perform better on a test set.

The results presented in the rightmost column [...] on training data set (the data is randomly split in 10 parts, a classifier is trained on 9 parts and [...] ing set: they are artificially high, due to overtraining. The baseline algorithm in the Table 3 always chooses the most frequent class in the data set, which happened to be Cognates/False-Friends. The best classification accuracy (for [...] OneRule, and AdaBoost (95.66%). The performance equals the one achieved by the TRI-SIM measure alone in Table 2.
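The baseline figure can be checked against the class counts in Table 1. The sketch below runs a majority-class baseline under a plain 10-fold split; it is a generic illustration written for this purpose, not Weka's implementation, and the shuffling seed is arbitrary.

```python
import random
from collections import Counter


def k_fold_indices(n: int, k: int = 10, seed: int = 0):
    """Shuffle the indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_baseline(labels, k: int = 10, seed: int = 0) -> float:
    """Accuracy of always predicting the majority class of the training folds."""
    correct = 0
    for held_out in k_fold_indices(len(labels), k, seed):
        held = set(held_out)
        train = [lab for i, lab in enumerate(labels) if i not in held]
        majority = Counter(train).most_common(1)[0][0]
        correct += sum(labels[i] == majority for i in held_out)
    return correct / len(labels)


# Training-set counts from Table 1: 613 Cognates + 314 False-Friends vs 527 Unrelated.
labels = ["CG_FF"] * (613 + 314) + ["UNREL"] * 527
print(f"{cross_validated_baseline(labels):.4f}")
```

With 927 of the 1454 training pairs in the Cognates/False-Friends class, the majority class wins in every fold, so the result is 927/1454, in line with the baseline of about 63.75% reported in Table 3.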

Error analysis: We examined the misclassified pairs for the classifiers built on the training data. There were many shared pairs among the 60-70 pairs misclassified by several of the best [...] form due to changes of language over time. False [...] the word, which is a strong clue of cognation.

Also, the presence of an identical prefix made [...] unless the word roots are related.

5.2 Results on the Test Set

[...] obtained on the test set described in Section 4. [...] as features. The classifiers are the ones built on the training set.

The ranking of measures on the test set differs from the ranking obtained on the training set, which may be caused by the absence of ge- [...] average of orthographic measures. The pruned Decision Tree shown in Figure 1 achieves higher [...] still below the simple average. Among the individual orthographic measures, XXDICE per- [...] English cognates reported in (Brew & McKelvie [...] set. We conclude that our classifiers are generic enough: they perform very well on the test set.

5.3 Results on the Genetic Cognates Dataset

Greenberg (Greenberg 87) gives a list of "most [...] of the cognates from French and English". The [...] to demonstrate that French and English are ge- [...] cognates between those two languages. We transcribed the list of 82 cognate pairs from IPA to standard orthography. We augmented the list [...] Data Corpus and 17 pairs that we identified ourselves. The final list contains 113 true genetic cognates that go back to Proto-Indoeuropean.

Classifier (measure     Accuracy on genetic    Accuracy on
or combination)         cognates set           test set
IDENT                   1.76%                  55.00%
PREFIX                  36.28%                 90.97%
DICE                    13.27%                 93.37%
LCSR                    24.77%                 94.24%
NED                     23.89%                 93.57%
SOUNDEX                 39.82%                 84.54%
TRI                     4.42%                  92.13%
XDICE                   15.92%                 94.52%
XXDICE                  13.27%                 95.39%
BI-SIM                  29.20%                 93.95%
BI-DIST                 29.20%                 94.04%
TRI-SIM                 35.39%                 93.28%
TRI-DIST                34.51%                 93.85%
Average measure         36.28%                 94.14%
Baseline                -                      66.98%
OneRule                 35.39%                 92.89%
Naive Bayes             29.20%                 94.62%
Decision Trees          35.39%                 92.08%
Dec Tree (pruned)       38.05%                 93.18%
IBK                     43.36%                 92.80%
AdaBoost                35.39%                 93.47%
Perceptron              42.47%                 91.55%
