Learning Antonyms with Paraphrases and a Morphology-aware Neural Network
Sneha Rajana¹, Chris Callison-Burch¹, Marianna Apidianaki¹,², Vered Shwartz³
¹ Computer and Information Science Department, University of Pennsylvania, USA
² LIMSI, CNRS, Université Paris-Saclay, 91403 Orsay, France
³ Computer Science Department, Bar-Ilan University, Israel
{srajana,ccb,marapi}@seas.upenn.edu, vered1986@gmail.com
Abstract
Recognizing and distinguishing antonyms from other types of semantic relations is an essential part of language understanding systems. In this paper, we present a novel method for deriving antonym pairs using paraphrase pairs containing negation markers. We further propose a neural network model, AntNET, that integrates morphological features indicative of antonymy into a path-based relation detection algorithm. We demonstrate that our model outperforms state-of-the-art models in distinguishing antonyms from other semantic relations and is capable of efficiently handling multi-word expressions.
1 Introduction
Identifying antonymy and expressions with contrasting meanings is valuable for NLP systems which go beyond recognizing semantic relatedness and require to identify specific semantic relations. While manually created semantic taxonomies, like WordNet (Fellbaum, 1998), define antonymy relations between some word pairs that native speakers consider antonyms, they have limited coverage. Further, as each term of an antonymous pair can have many semantically close terms, the contrasting word pairs far outnumber those that are commonly considered antonym pairs, and they remain unrecorded. Therefore, automated methods have been proposed to determine, for a given term-pair (x, y), whether x and y are antonyms of each other, based on their occurrences in a large corpus.
Charles and Miller (1989) put forward the co-occurrence hypothesis that antonyms occur together in a sentence more often than chance. However, non-antonymous semantically related words such as hypernyms, holonyms, meronyms, and near-synonyms also tend to occur together more often than chance. Thus, separating antonyms from pairs linked by other relationships has proven to be difficult. Approaches to antonym detection have exploited distributional vector representations, relying on the distributional hypothesis of semantic similarity (Harris, 1954; Firth, 1957) that words co-occurring in similar contexts tend to be semantically close. Two main information sources are used to recognize semantic relations: path-based and distributional. Path-based methods consider the joint occurrences of the two terms in a given sentence and use the dependency paths that connect the terms as features (Hearst, 1992; Roth and Schulte im Walde, 2014; Schwartz et al., 2015). For distinguishing antonyms from other relations, Lin et al. (2003) proposed to use antonym patterns (such as "either X or Y" and "from X to Y"). Distributional methods are based on the disjoint occurrences of each term and have recently become popular using word embeddings (Mikolov et al., 2013; Pennington et al., 2014), which provide a distributional representation for each term. Recently, combined path-based and distributional methods for relation detection have also been proposed (Shwartz et al., 2016; Nguyen et al., 2017). They showed that a good path representation can provide substantial complementary information to the distributional signal for distinguishing between different semantic relations.

Paraphrase Pair              Antonym Pair
not sufficient/insufficient  sufficient/insufficient
insignificant/negligible     significant/negligible
dishonest/lying              honest/lying
unusual/pretty strange       usual/pretty strange

Table 1: Examples of antonyms derived from PPDB paraphrases. The antonym pairs in column 2 were derived from the corresponding paraphrase pairs in column 1.
While antonymy applies to expressions that represent contrasting meanings, paraphrases are phrases expressing the same meaning, which usually occur in similar textual contexts (Barzilay and McKeown, 2001) or have common translations in other languages (Bannard and Callison-Burch, 2005). Specifically, if two words or phrases are paraphrases, they are unlikely to be antonyms of each other. Our first approach to antonym detection exploits this fact and uses paraphrases for detecting and generating antonyms (The dementors caught Sirius Black / Black could not escape the dementors). We start by focusing on phrase pairs that are most salient for deriving antonyms. Our assumption is that phrases (or words) containing negating words (or prefixes) are more helpful for identifying opposing relationships between term-pairs. For example, from the paraphrase pair (caught / not escape), we can derive the antonym pair (caught/escape) by just removing the negating word 'not'.
Our second method is inspired by the recent success of deep learning methods for relation detection. Shwartz et al. (2016) proposed an integrated path-based and distributional model to improve hypernymy detection between term-pairs, and later extended it to classify multiple semantic relations (Shwartz and Dagan, 2016) (LexNET). Although LexNET was the best performing system in the semantic relation classification task of the CogALex 2016 shared task, the model performed poorly on synonyms and antonyms compared to other relations. The path-based component is weak in recognizing synonyms, which do not tend to co-occur, and the distributional information caused confusion between synonyms and antonyms, since both tend to occur in the same contexts. We propose AntNET, a novel extension of LexNET that integrates information about negating prefixes as a new morphological pattern feature and is able to distinguish antonyms from other semantic relations. In addition, we optimize the vector representations of dependency paths between the given term pair, encoded using a neural network, by replacing the embeddings of words with negating prefixes by the embeddings of the base, non-negated, forms of the words.
For example, for the term pair unhappy/joyful, we record the negating prefix of unhappy using a new path feature and replace the word embedding of unhappy with happy in the vector representation of the dependency path between unhappy and sad. The proposed model improves the path embeddings to better distinguish antonyms from other semantic relations and achieves higher performance than prior path-based methods on this task.
We used the antonym pairs extracted from the Paraphrase Database (PPDB) (Ganitkevitch et al., 2013; Pavlick et al., 2015b) in the paraphrase-based method as training data for our neural network model.
The main contributions of this paper are:

• We present a novel technique of using paraphrases for antonym detection and successfully derive antonym pairs from paraphrases in the PPDB, the largest paraphrase resource currently available.
• We demonstrate improvements to an integrated path-based and distributional model, showing that our morphology-aware neural network model, AntNET, performs better than state-of-the-art methods for antonym detection.
2 Related Work
Paraphrase Extraction Methods  Paraphrases are words or phrases expressing the same meaning. Paraphrase extraction methods that exploit distributional or translation similarity might however propose paraphrase pairs that are not meaning-equivalent but linked by other types of relations. These methods often extract pairs having a related but not equivalent meaning, such as contradictory pairs. For instance, Lin and Pantel (2001) extracted 12 million "inference rules" from monolingual text by exploiting shared dependency contexts. Their method learns paraphrases that are truly meaning-equivalent, but it just as readily learns contradictory pairs such as (X rises, X falls). Ganitkevitch et al. (2013) extract over 150 million paraphrase rules from parallel corpora by pivoting through foreign translations. This multilingual paraphrasing method often learns hypernym/hyponym pairs, due to variation in the discourse structure of translations, and unrelated pairs due to misalignments or polysemy in the foreign language. Pavlick et al. (2015a) added interpretable semantics to PPDB (see Section 3.1 for details) and showed that paraphrases in this resource represent a variety of relations other than equivalence, including contradictory pairs like nobody/someone and close/open.

Method                                   #pairs
(x,y) from paraphrase (~x,y)/(x,~y)      80,669
(x, paraphrase(y)), (paraphrase(x), y)   81,221
(x, synset(y)), (synset(x), y)           692,231

Table 2: Number of unique antonym pairs derived from PPDB at each step. Paraphrases and synsets were obtained from PPDB and WordNet respectively.
Pattern-based Methods  Pattern-based methods for inducing semantic relations between a pair of terms (x, y) consider the lexico-syntactic paths that connect the joint occurrences of x and y in a large corpus. A variety of approaches have been proposed that rely on patterns between terms in a corpus to distinguish antonyms from other relations. Lin et al. (2003) used translation information and lexico-syntactic patterns to extract distributionally similar words, and then filtered out words that appeared with the patterns 'from X to Y' or 'either X or Y' significantly often. The intuition behind this was that if two words X and Y appear in one of these patterns, they are unlikely to represent a synonymous pair. Roth and Schulte im Walde (2014) combined general lexico-syntactic patterns with discourse markers as indicators for the specific semantic relations between word pairs (e.g. contrast relations might indicate antonymy and elaborations may indicate synonymy or hyponymy). Unlike previous pattern-based methods which relied on the standard distribution of patterns, Schwartz et al. (2015) used patterns to learn word embeddings. They presented a symmetric pattern-based model for representing word vectors in which antonyms are assigned dissimilar vector representations. More recently, Nguyen et al. (2017) presented a pattern-based neural network model that exploits lexico-syntactic patterns from syntactic parse trees for the task of distinguishing between antonyms and synonyms. They applied HypeNET (Shwartz et al., 2016) to the task of distinguishing between synonyms and antonyms, replacing the direction feature with the distance in the path representation.
Source   #pairs
WordNet  18,306
PPDB     773,452

Table 3: Number of unique antonym pairs derived from different sources. The number of pairs obtained from PPDB far outnumbers the antonym pairs present in EVALution and WordNet.
3 Paraphrase-based Antonym Derivation
Existing semantic resources like WordNet (Fellbaum, 1998) contain a much smaller set of antonyms compared to other semantic relations (synonyms, hypernyms and meronyms). Our aim is to create a large resource of high-quality antonym pairs using paraphrases.
3.1 The Paraphrase Database
The Paraphrase Database (PPDB) contains over 150 million paraphrase rules covering three paraphrase types: lexical (single word), phrasal (multi-word), and syntactic restructuring rules, and is the largest collection of paraphrases currently available. In this paper, we focus on lexical and phrasal paraphrases up to two words in length. We examine the relationships between phrase pairs in the PPDB, focusing on phrase pairs that are most salient for deriving antonyms.
3.2 Antonym Derivation
Selection of Paraphrases  We consider all phrase pairs from PPDB (p1, p2) up to two words in length such that one of the two phrases either begins with a negating word like not, or contains a negating prefix.¹ We chose these two types of paraphrase pairs since we believe them to be the most indicative of an antonymy relationship between the target words. There are 7,878 unordered phrase pairs of the form (p′1, p2) where p′1 begins with 'not', and 183,159 phrases of the form (p′1, p2) where p′1 contains a negating prefix.
Paraphrase Transformation  For paraphrases containing a negating prefix, we perform morphological analysis to identify and remove the negating prefixes. For a phrase pair like unhappy/sad, an antonymy relation is derived between the base form of the negated word, without the negation prefix, and its paraphrase (happy/sad). We use MORSEL (Lignos, 2010) to perform morphological analysis and identify negation markers. For multi-word phrases with a negating word, the negating word is simply dropped to obtain an antonym pair (e.g. different / not identical → different/identical). Some examples of PPDB paraphrase pairs and antonym pairs derived from them are shown in Table 1. The derived antonym pairs are further expanded by associating the synonyms (from WordNet) and lexical paraphrases (from PPDB) of each phrase with the other phrase in the derived pair. While expanding each phrase in the derived pair by its paraphrases, we filter out paraphrase pairs with a PPDB score (Pavlick et al., 2015a) of less than 2.5. In the above example, unhappy/sad, we first derive happy/sad as an antonym pair and expand it by considering all synonyms of happy as antonyms of sad (e.g. joyful/sad), and all synonyms of sad as antonyms of happy (e.g. happy/gloomy). Table 2 shows the number of pairs derived at each step using PPDB. In total, we were able to derive around 773K unique pairs from PPDB. This is a much larger dataset than existing resources like WordNet and EVALution, as shown in Table 3.

¹ Negating prefixes include de, un, in, anti, il, non, dis.

Unrelated           Paraphrases           Categories   Entailment                         Other relation
much/worthless      correct/that's right  Japan/Korea  investing/increased investment     twinkle/dark
disability/present  simply/merely         black/red    efficiency/operational efficiency  naw/not gonna
equality/gap        till/until            Jan/Feb      valid/equally valid                access/available

Table 4: Examples of different types of non-antonyms derived from PPDB.
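The derivation step above can be sketched in a few lines. This is a simplified illustration, not the authors' code: the paper uses MORSEL for proper morphological analysis, whereas the naive prefix matching below (with the prefix list from the footnote) would over-generate on words like universe; the helper names `strip_negation` and `derive_antonym` are invented for the example.

```python
# Simplified sketch of deriving antonym candidates from paraphrase pairs.
# Assumption: naive prefix matching stands in for MORSEL's morphological analysis.
NEGATING_PREFIXES = ("de", "un", "in", "anti", "il", "non", "dis")

def strip_negation(phrase):
    """Return the base form of a negated phrase, or None if it is not negated."""
    words = phrase.split()
    # Multi-word phrase beginning with a negating word: drop 'not'.
    if len(words) > 1 and words[0] == "not":
        return " ".join(words[1:])
    # Single word with a negating prefix: strip the prefix.
    for prefix in NEGATING_PREFIXES:
        if phrase.startswith(prefix) and len(phrase) > len(prefix) + 2:
            return phrase[len(prefix):]
    return None

def derive_antonym(p1, p2):
    """Given a paraphrase pair, derive a candidate antonym pair, or None."""
    for negated, other in ((p1, p2), (p2, p1)):
        base = strip_negation(negated)
        if base is not None:
            return (base, other)
    return None

print(derive_antonym("not escape", "caught"))  # ('escape', 'caught')
print(derive_antonym("unhappy", "sad"))        # ('happy', 'sad')
```

In the actual pipeline, candidates derived this way are then expanded through WordNet synonyms and PPDB paraphrases and filtered by PPDB score, as described above.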
Analysis  We performed a manual evaluation of the quality of the extracted antonyms by randomly selecting 1,000 pairs classified as 'antonym' and observed that the dataset contained about 63% antonyms. Errors mostly consisted of phrases and words that do not have an opposing meaning after the removal of the negation pattern; for example, the equivalent pair till/until that was derived from the PPDB paraphrase rule not till/until. Other non-antonyms derived from the above methods can be classified into unrelated pairs (background/figure), paraphrases or pairs that have an equivalent meaning (admissible/permissible), words that belong to a category (Africa/Asia), pairs that have an entailment relation (valid/equally valid), and pairs that are related but not with an antonym relationship (twinkle/dark). Table 4 gives some examples of categories of non-antonyms.
Annotation  Since the pairs derived from PPDB seemed to contain a variety of relations in addition to antonyms, we crowdsourced the task of labelling a subset of these pairs in order to obtain the true labels.² We asked workers to choose between the labels: antonym, synonym (or paraphrase for multi-word expressions), unrelated, other, entailment, and category. We showed each pair to 3 workers, taking the majority label as truth.
4 LSTM-based Antonym Detection
In this section we describe AntNET, a long short-term memory (LSTM) based, morphology-aware neural network model for antonym detection. We first focus on improving the neural embeddings of the path representation (Section 4.1), and then integrate distributional signals into our network, resulting in a combined method (Section 4.2).
4.1 Path-based Network
Similarly to prior work, we represent each dependency path as a sequence of edges that leads from x to y in the dependency tree. We use the same path-based features proposed by Shwartz et al. (2016) for recognizing hypernym relations: lemma and part-of-speech (POS) tag of the source node, the dependency label, and the edge direction between two subsequent nodes. Additionally, we add a new feature that indicates whether the source node is negated.
Rather than treating an entire dependency path as a single feature, we encode the sequence of edges using a long short-term memory network (Hochreiter and Schmidhuber, 1997). The vectors obtained for the different paths of a given (x, y) pair are pooled, and the resulting vector is used for classification. The overall network structure is depicted in Figure 1.
Edge Representation  We denote each edge as lemma/pos/dep/dir/neg. We are only interested in checking whether x and/or y have negation markers, but not the intermediate edges, since negation information for intermediate lemmas is unlikely to contribute to identifying whether there is an antonym relationship between x and y. Hence, in our model, neg is represented in one of three ways: negated if x or y is negated, not-negated if x or y is not negated, and unavailable for the intermediate edges. If the source node is negated, we replace the lemma by the lemma of its base, non-negated, form. For example, if we identified unhappy as a 'negated' word, we replace the lemma embedding of unhappy by the embedding of happy in the path representation. The negation feature will help in separating antonyms from other semantic relations, especially those that are hard to distinguish from, like synonyms.

² 5,884 pairs were randomly chosen and were annotated on www.crowdflower.com.

Figure 1: Illustration of the AntNET model. Each pair is represented by several paths and each path is a sequence of edges. An edge consists of five features: lemma, POS, dependency label, dependency direction, and negation marker.
The replacement of a negated word's embedding by its base form's embedding is done for a few reasons. First, words and their polar antonyms are more likely to co-occur in sentences compared to words and their negated forms. For example, Neither happy nor sad is probably a more common phrase than Neither happy nor unhappy, so this technique will help our model to identify an opposing relationship between both types of pairs, happy/unhappy and happy/sad. Second, a common practice for creating word embeddings for multi-word expressions (MWEs) is averaging over the embeddings of each word in the expression. This is not a good representation for phrases like not identical, since we lose out on the negating information obtained from not. Indicating the presence of not using a negation feature and replacing the embedding of not identical by identical will increase the classifier's probability of identifying not identical/different as paraphrases and identical/different as antonyms. And finally, this method helps us distinguish between terms that are seemingly negated but are not in reality (e.g. invaluable). We encode the sequence of edges using an LSTM network. The vectors obtained for all the paths connecting x and y are pooled and combined, and the resulting vector is used for classification. The vector representation of each edge is the concatenation of its feature vectors:

    v_edge = [v_lemma ; v_pos ; v_dep ; v_dir ; v_neg]

where v_lemma, v_pos, v_dep, v_dir and v_neg represent the vector embeddings of the lemma, POS tag, dependency label, dependency direction, and negation marker, respectively.
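As a concrete illustration, the edge encoding can be sketched as below; the embedding tables, vocabulary sizes, and dimensions are invented for the example and are not the paper's settings.

```python
import numpy as np

# Sketch of the edge representation v_edge = [v_lemma; v_pos; v_dep; v_dir; v_neg].
# All tables and dimensions here are illustrative placeholders.
rng = np.random.default_rng(0)
LEMMA_DIM, POS_DIM, DEP_DIM, DIR_DIM, NEG_DIM = 50, 4, 5, 1, 1

def embed(vocab_size, dim):
    return rng.normal(size=(vocab_size, dim))

lemma_emb = embed(1000, LEMMA_DIM)  # lemmas (base forms for negated words)
pos_emb = embed(40, POS_DIM)        # part-of-speech tags
dep_emb = embed(50, DEP_DIM)        # dependency labels
dir_emb = embed(4, DIR_DIM)         # edge directions
neg_emb = embed(3, NEG_DIM)         # {negated, not-negated, unavailable}

def edge_vector(lemma_id, pos_id, dep_id, dir_id, neg_id):
    """Concatenate the five feature embeddings into one edge vector."""
    return np.concatenate([
        lemma_emb[lemma_id], pos_emb[pos_id], dep_emb[dep_id],
        dir_emb[dir_id], neg_emb[neg_id],
    ])

print(edge_vector(3, 1, 2, 0, 0).shape)  # (61,)
```

A sequence of such edge vectors for one path is then fed to the LSTM described next.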
Path Representation  The representation for a path p composed of a sequence of edges edge_1, edge_2, ..., edge_k is a sequence of edge vectors: p = [v_edge_1, v_edge_2, ..., v_edge_k]. The edge vectors are fed in order to a recurrent neural network (RNN) with LSTM units, resulting in the encoded path vector v_p.
Classification Task  Given a lexical or phrasal pair (x, y), we induce patterns from a corpus where each pattern represents a lexico-syntactic path connecting x and y. The vector representation for each term pair (x, y) is computed as the weighted average of its path vectors, by applying average pooling as follows:

    (1) v_p(x,y) = ( Σ_{p∈P(x,y)} f_p · v_p ) / ( Σ_{p∈P(x,y)} f_p )

v_p(x,y) refers to the vector of the pair (x, y); P(x,y) is the multi-set of paths connecting x and y in the corpus, and f_p is the frequency of p in P(x,y). The vector v_p(x,y) is then fed into a neural network that outputs the class distribution c for each class (relation type), and the pair is assigned to the relation with the highest score r:

    (2a) c = softmax(MLP(v_p(x,y)))
    (2b) r = argmax_i c[i]
MLP stands for Multilayer Perceptron and can be computed with or without a hidden layer (equations (4) and (5) respectively):

    (3) h = tanh(W1 · v_p(x,y) + b1)
    (4) MLP(v_p(x,y)) = W2 · h + b2
    (5) MLP(v_p(x,y)) = W1 · v_p(x,y) + b1

W refers to a matrix of weights that projects information between two layers; b is a layer-specific vector of bias terms, and h is the hidden layer.
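Equations (1)-(5) amount to a frequency-weighted average of path vectors followed by a small classifier. The sketch below uses the hidden-layer-free variant of equation (5); dimensions, weights, and the number of classes are illustrative placeholders, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
PATH_DIM, NUM_CLASSES = 60, 2  # illustrative sizes

def pooled_pair_vector(path_vectors, frequencies):
    """Equation (1): average of path vectors weighted by path frequency f_p."""
    f = np.asarray(frequencies, dtype=float)
    P = np.stack(path_vectors)                # (num_paths, PATH_DIM)
    return (f[:, None] * P).sum(axis=0) / f.sum()

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

W1 = rng.normal(size=(NUM_CLASSES, PATH_DIM))
b1 = np.zeros(NUM_CLASSES)

def classify(path_vectors, frequencies):
    """Equations (2a)-(2b), with MLP(v) = W1·v + b1 as in equation (5)."""
    v = pooled_pair_vector(path_vectors, frequencies)
    c = softmax(W1 @ v + b1)                  # (2a) class distribution
    return int(np.argmax(c))                  # (2b) predicted relation index

paths = [rng.normal(size=PATH_DIM) for _ in range(3)]
print(classify(paths, frequencies=[5, 2, 1]))
```

In training, W1 and b1 would of course be learned jointly with the LSTM path encoder rather than drawn at random.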
4.2 Combined Path-based and Distributional Network
The path-based supervised model in Section 4.1 classifies each pair (x, y) based on the lexico-syntactic patterns that connect x and y in a corpus. Inspired by the improved performance of Shwartz et al.'s (2016) integrated path-based and distributional method over a simpler path-based algorithm, we integrate distributional features into our path-based network. We create a combined vector representation using both the syntactic path features and the co-occurrence distributional features of x and y for each pair (x, y). The combined vector representation for (x, y), v_c(x,y), is computed by simply concatenating the word embeddings of x (v_x) and y (v_y) to the path-based feature vector v_p(x,y):

    (6) v_c(x,y) = [v_x ; v_p(x,y) ; v_y]
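A minimal sketch of equation (6), assuming made-up dimensions for the word and path vectors:

```python
import numpy as np

WORD_DIM, PATH_DIM = 100, 60  # illustrative dimensions

def combined_vector(v_x, v_p_xy, v_y):
    """Equation (6): v_c(x,y) = [v_x; v_p(x,y); v_y]."""
    return np.concatenate([v_x, v_p_xy, v_y])

v_c = combined_vector(np.zeros(WORD_DIM), np.ones(PATH_DIM), np.zeros(WORD_DIM))
print(v_c.shape)  # (260,)
```

The concatenated vector then replaces v_p(x,y) as the input to the classifier of Section 4.1.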
5 Experiments
We experiment with the path-based and combined models for antonym identification by performing two types of classification: binary and multiclass classification.
Train  Test   Val  Total
5,122  1,829  367  7,318

Table 5: Number of instances present in the train/test/validation splits of the crowdsourced dataset.
5.1 Dataset
Neural networks require a large amount of training data. We use the labelled portion of the dataset that we created using PPDB, as described in Section 3. In order to induce paths for the pairs in the dataset, we identify sentences in the corpus that contain the pair and extract all patterns for the given pair. Pairs with an antonym relationship are considered as positive instances in both classification experiments. In the binary classification experiment, we consider all pairs related by other relations (entailment, synonymy, category, unrelated, other) as negative instances. We also perform a variant of the multiclass classification with three classes (antonym, other, unrelated). Due to the skewed nature of the dataset, we combined category, entailment and synonym/paraphrases into one class. For both classification experiments, we perform a random split with 70% train, 25% test, and 5% validation sets. Table 5 displays the number of relations in our dataset. Wikipedia³ was used as the underlying corpus for all methods, and we perform model selection on the validation set to tune the hyper-parameters of each method. We apply grid search over a range of values and pick the ones that yield the highest F1 score on the validation set. The best hyper-parameters are reported in the appendix.
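The 70/25/5 random split described above can be sketched as follows; the seed and the integer IDs standing in for labelled pairs are placeholders (the real split is over the labelled PPDB-derived pairs):

```python
import random

def split_dataset(pairs, train_frac=0.70, test_frac=0.25, seed=0):
    """Randomly split labelled pairs into 70% train, 25% test, 5% validation."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    n = len(pairs)
    n_train, n_test = int(n * train_frac), int(n * test_frac)
    return (pairs[:n_train],                   # train
            pairs[n_train:n_train + n_test],   # test
            pairs[n_train + n_test:])          # validation (remainder, ~5%)

train, test, val = split_dataset(range(7318))
print(len(train), len(test), len(val))  # 5122 1829 367
```

With 7,318 labelled instances this reproduces the split sizes shown in Table 5.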
5.2 Baselines
Majority Baseline  The majority baseline is achieved by labelling all the instances with the most frequent class occurring in the dataset, i.e. FALSE (binary) or UNRELATED (multiclass).
Distributed Baseline  The method proposed by Schwartz et al. (2015) uses symmetric patterns (SPs) for generating word embeddings. The authors automatically acquired symmetric patterns (defined as a sequence of 3-5 tokens consisting of exactly 2 wildcards and 1-3 words) from a large plain-text corpus, and generated vectors where each co-ordinate represented the co-occurrence in symmetric patterns of the represented word with another word of the vocabulary. For antonym representation, the authors relied on the patterns suggested by Lin et al. (2003) to construct word embeddings containing an antonym parameter that can be turned on in order to represent antonyms as dissimilar, and that can be turned off to represent antonyms as similar. To evaluate the SP method on our data, we used the pre-trained SP embeddings⁴ with 500 dimensions. We use the SVM classifier with an RBF kernel for the classification of word pairs.

³ We used the English Wikipedia dump from May 2015 as the corpus.

Model                   Binary                 Multiclass
                        P      R      F1      P      R      F1
Majority baseline       0.304  0.551  0.392   0.222  0.472  0.303
SP baseline             0.661  0.568  0.436   0.583  0.488  0.344
Path-based SD baseline  0.723  0.724  0.722   0.636  0.675  0.651
Path-based AntNET       0.732  0.722  0.713   0.652  0.687  0.661**
Combined SD baseline    0.790  0.788  0.788   0.744  0.750  0.738
Combined AntNET         0.803  0.802  0.802*  0.746  0.757  0.746*

Table 6: Performance of the AntNET models in comparison to the baseline models.

Feature   Model       Binary                 Multiclass
                      P      R      F1      P      R      F1
Distance  Path-based  0.727  0.727  0.724   0.665  0.692  0.664
          Combined    0.789  0.788  0.788   0.732  0.743  0.734
Negation  Path-based  0.732  0.722  0.713   0.652  0.687  0.661
          Combined    0.803  0.802  0.802   0.746  0.757  0.746

Table 7: Comparing the novel negation-marking feature with the distance feature proposed by Nguyen et al. (2017).
Path-based and Combined Baseline  Since AntNET is an extension of the path-based and combined models proposed by Shwartz and Dagan (2016) for classifying multiple semantic relations, we use their models as additional baselines. Because their model used a different dataset that contained very few antonym instances, we replicated the baseline (SD) with the dataset and corpus information as in Section 5.1 rather than comparing to the reported results.
5.3 Results
Table 6 displays the performance scores of AntNET and the baselines in terms of precision, recall and F1. Our combined model significantly⁵ outperforms all baselines in both binary and multiclass classifications. Both path-based and combined models of AntNET achieve a much better performance in comparison to the majority class and SP baselines.

⁴ https://homes.cs.washington.edu/~roysch/papers/sp_embeddings/sp_embeddings.html
Comparing the path-based methods, the AntNET model achieves a higher precision compared to the path-based SD baseline for binary classification, and outperforms the SD model in precision, recall and F1 in the multiclass classification experiment. The low precision of the SD model stems from its inability to distinguish between antonyms and synonyms, and between related and unrelated pairs which are common in our dataset, causing many false positive pairs such as difficult/harsh, bad/cunning, and finish/far, which were classified as antonyms.
Comparing the combined models, the AntNET model outperforms the SD model in precision, recall and F1, achieving state-of-the-art results for antonym detection. In all the experiments, the performance of the model in the binary classification task was better than in the multiclass classification. Multiclass classification seems to be inherently harder for all methods, due to the large number of relations and the smaller number of instances for each relation. We also observed that as we increased the size of the training dataset used in our experiments, the results improved for both path-based and combined models, confirming the need for large-scale datasets that will benefit training neural models.

⁵ We used a paired t-test. *p < 0.1, **p < 0.05.
Effect of the Negation-marking Feature  In our models, the novel negation-marking feature is successfully integrated along the syntactic path to represent the paths between x and y. In order to evaluate the effect of our novel negation-marking feature for antonym detection, we compare this feature to the distance feature proposed by Nguyen et al. (2017). In their approach, they integrate the distance between related words in a lexico-syntactic path as a new pattern feature, along with lemma, POS and dependency, for the task of distinguishing antonyms and synonyms. We re-implemented this model by making use of the same information regarding dataset and patterns as in Section 5.1 and then replacing the direction feature in the SD models by the distance feature.

The results are shown in Table 7 and indicate that the negation-marking feature and the replacement of the embeddings of negated words by the ones of their base forms enhance the performance of our models more effectively than the distance feature does, across both binary and multiclass classifications. Although the distance feature has previously been shown to perform well for the task of distinguishing antonyms from synonyms, this feature is not very effective in the multiclass setting.
5.4 Error Analysis
Figure 2 displays the confusion matrices for the binary and multiclass experiments of the best performing AntNET model. The confusion matrices show that pairs were mostly assigned to the correct relation more often than to any other class.
False Positives  We analyzed the false positives from both the binary and multiclass experiments. We sampled about 20% of the false positive pairs and identified the following common errors. The majority of the misclassification errors stem from antonym-like or near-antonym relations: these are relations that could be considered as antonymy but were annotated by crowd-workers as other relations because they contain polysemous terms, for which the relation holds in a specific sense. For example, north/south and polite/sassy were labelled as category and other respectively. Other errors stem from confusing antonyms and unrelated pairs.

Figure 2: Confusion matrices for the combined AntNET model for binary (left) and multiclass (right) classifications. Rows indicate gold labels and columns indicate predictions. The matrix is normalized along rows, so that the predictions for each (true) class sum to 100%.
False Negatives  We again sampled about 20% of the false negative pairs from both the binary and multiclass experiments and analyzed the major types of errors. Most of these pairs had only few co-occurrences in the corpus, often due to infrequent terms (e.g. cisc/risc, which denote computer architectures). While our model effectively handled negative prefixes, it failed to handle negative suffixes, causing incorrect classification of pairs like spiritless/spirited. A possible future work is to extend this model to handle negative suffixes as well.
6 Conclusion
In this paper, we presented an original technique for deriving antonyms using paraphrases from PPDB. We also proposed a novel morphology-aware neural network model, AntNET, which improves antonymy prediction for path-based and combined models. In addition to lexical and syntactic information, we suggested to include a novel morphological negation-marking feature.

Our models outperform the baselines in two relation classification tasks. We also demonstrated that the negation-marking feature outperforms previously suggested path-based features for this task.

Since our proposed techniques for antonymy detection are corpus-based, they can be applied to different languages and relations. The paraphrase-based method can be applied to other languages by extracting the paraphrases for these languages from the PPDB and using a morphological analysis tool (e.g. Morfette for French (Chrupala et al., 2008)), or by looking up the negation prefixes in a grammar book for languages that do not dispose of such a tool. The LSTM-based model could also be used in other languages since the method is corpus-based, but we would need to create a training set for new languages. This would not, however, be too difficult; the training set used by the model is not that big (the one used here was ~5,000 pairs) and could be easily labelled through crowdsourcing.

We release our code and the large-scale dataset derived from PPDB, annotated with semantic relations.
Acknowledgments
This material is based in part on research sponsored by DARPA under grant number FA8750-13-2-0017 (the DEFT program). The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes. The views and conclusions contained in this publication are those of the authors and should not be interpreted as representing official policies or endorsements of DARPA and the U.S. Government.

This work has also been supported by the French National Research Agency under project ANR-16-CE33-0013 and partially supported by an Intel ICRI-CI grant, the Israel Science Foundation grant 880/12, and the German Research Foundation through the German-Israeli Project Cooperation (DIP, grant DA 1600/1-1).

We would like to thank our anonymous reviewers for their thoughtful and helpful comments.
References
ColinBannardand ChrisCallison-Burch.2005. Para-
phrasingwithBilingual ParallelCorpora. InPro- ceedingsofthe 43rdAnnual MeetingonAssociation forComputationalLinguistics (ACL'05) .Strouds- burg,PA,pages597-604.
ReginaBarzilayandKathleenR. McKeo wn.2001.Ex-
tractingP araphrasesfromaParallelCorpus. InPro- ceedingsof the39thAnnual MeetingonAssociation forComputational Linguistics(ACL '01).Toulouse,
France,pages 50-57.
WalterG.CharlesandGeor geA.Miller .1989.Con-
textsofantonymousadjecti ves.AppliedPsycholo gy
10:357-375.
GrzegorzChrupala,GeorgianaDinu, andJosefv an
Genabith.2008.Learning MorphologywithMor -
fette.InProceedingsoftheSixthInternational
ConferenceonLanguage Resourcesand Evalua-
tion(LREC'08).Marrakech, Morocco,pages2362- 2367.
ChristianeFellbaum,editor .1998. WordNet:anelec-
troniclexicaldatabase.MITPress.
J.R.Firth.1957.Asynopsisoflinguistictheory,1930-
1955.InStudiesinLinguistic Analysis,BasilBlack-
well,Oxford, UnitedKingdom,pages 1-32.
JuriGanitke vitch,BenjaminVanDurme,andChris
Callison-Burch.2013. PPDB:TheP araphrase
Database.InProceedingsofthe2013Confer -
enceofthe NorthAmericanChapter oftheAssoci- ationforComputational Linguistics:HumanLan- guageTechnologies(NAA CL/HLT).Atlanta,Geor - gia,pages758-764. ZelligS.Harris. 1954.Distributional structure.Word
10(23):146-162.
MartiHearst.1992. Automaticacquisitionof hy-
ponymsfromlargete xtcorpora.In Proceedings ofthe14th InternationalConference onCompu- tationalLinguistics(COLING'92) .Nantes,France, pages539-545.
Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9(8):1735-1780.
Constantine Lignos. 2010. Learning from Unseen Data. In Proceedings of the Morpho Challenge 2010 Workshop. Aalto University School of Science and Technology, Helsinki, Finland, pages 35-38.
Dekang Lin and Patrick Pantel. 2001. DIRT - Discovery of Inference Rules from Text. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'01). San Francisco, California, pages 323-328.
Dekang Lin, Shaojun Zhao, Lijuan Qin, and Ming Zhou. 2003. Identifying synonyms among distributionally similar words. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI'03). Acapulco, Mexico, pages 1492-1493.
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. In Proceedings of the 26th International Conference on Neural Information Processing Systems (NIPS'13). Lake Tahoe, Nevada, pages 3111-3119.
Kim Anh Nguyen, Sabine Schulte im Walde, and Ngoc Thang Vu. 2017. Distinguishing antonyms and synonyms in a pattern-based neural network. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL'17). Valencia, Spain, pages 76-85.
Ellie Pavlick, Johan Bos, Malvina Nissim, Charley Beller, Benjamin Van Durme, and Chris Callison-Burch. 2015a. Adding Semantics to Data-Driven Paraphrasing. In The 53rd Annual Meeting of the Association for Computational Linguistics (ACL'15). Beijing, China, pages 1512-1522.
Ellie Pavlick, Pushpendre Rastogi, Juri Ganitkevitch, Benjamin Van Durme, and Chris Callison-Burch. 2015b. PPDB 2.0: Better paraphrase ranking, fine-grained entailment relations, word embeddings, and style classification. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL'15). Beijing, China, pages 425-430.
Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP'14). Doha, Qatar, pages 1532-1543.
Michael Roth and Sabine Schulte im Walde. 2014. Combining Word Patterns and Discourse Markers for Paradigmatic Relation Classification. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL'14). Baltimore, MD, pages 524-530.
Roy Schwartz, Roi Reichart, and Ari Rappoport. 2015. Symmetric Pattern Based Word Embeddings for Improved Word Similarity Prediction. In Proceedings of the Nineteenth Conference on Computational Natural Language Learning (CoNLL'15). Beijing, China, pages 258-267.
Vered Shwartz and Ido Dagan. 2016. CogALex-V Shared Task: LexNET - Integrated Path-based and Distributional Method for the Identification of Semantic Relations. In Proceedings of the 5th Workshop on Cognitive Aspects of the Lexicon (CogALex-V). Osaka, Japan, pages 80-85.
Vered Shwartz, Yoav Goldberg, and Ido Dagan. 2016. Improving Hypernymy Detection with an Integrated Path-based and Distributional Method. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL'16). Berlin, Germany, pages 2389-2398.
A Supplemental Material
For deriving antonyms using PPDB, we used the XXXL size of PPDB version 2.0 found in http://paraphrase.org/.
To compute the metrics in Tables 6 and 7, we used scikit-learn with the "averaged" setup, which computes the metrics for each relation and reports their average weighted by support (the number of true instances for each relation). Note that this can result in an F1 score that is not the harmonic mean of precision and recall.
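The support-weighted averaging described above corresponds to scikit-learn's `average='weighted'` option. The sketch below illustrates it on a toy set of relation labels (the labels and predictions are invented placeholders, not data from this paper):

```python
# Illustrative sketch of the support-weighted metric computation.
# Labels/predictions below are toy placeholders, not the paper's data.
from sklearn.metrics import precision_recall_fscore_support

y_true = ["antonym", "synonym", "antonym", "hypernym", "synonym", "antonym"]
y_pred = ["antonym", "antonym", "antonym", "hypernym", "synonym", "synonym"]

# average='weighted' computes precision/recall/F1 per relation, then
# averages them weighted by support (number of true instances per relation).
p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred,
                                              average="weighted",
                                              zero_division=0)
print(p, r, f1)
```

Because the per-relation F1 scores are averaged directly, the reported F1 need not be the harmonic mean of the reported precision and recall.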
During preprocessing we handled removal of punctuation. Since our dataset only contains short phrases, we removed any stopwords occurring at the beginning of a sentence (Example: a man → man) and we also removed plurals. The best hyperparameters for all models mentioned in this paper are shown in Table 8. The learning rate was set to 0.001 for all experiments.
Model            Type        Dropout
SD-path          Binary      0.2
SD-path          Multiclass  0.4
SD-combined      Binary      0.4
SD-combined      Multiclass  0.2
ASD-path         Binary      0.0
ASD-path         Multiclass  0.2
ASD-combined     Binary      0.0
ASD-combined     Multiclass  0.2
AntNET-path      Binary      0.0
AntNET-path      Multiclass  0.2
AntNET-combined  Binary      0.4
AntNET-combined  Multiclass  0.2

Table 8: The best hyperparameters in every model.
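The preprocessing steps described above (punctuation removal, dropping a leading stopword, and de-pluralization) can be sketched roughly as follows. The stopword list and the plural-stripping rule here are illustrative simplifications, not the paper's exact implementation:

```python
# Rough sketch of the appendix's preprocessing; the stopword set and the
# naive trailing-"s" plural rule are illustrative assumptions only.
import string

STOPWORDS = {"a", "an", "the"}  # illustrative subset

def preprocess(phrase: str) -> str:
    # remove punctuation
    phrase = phrase.translate(str.maketrans("", "", string.punctuation))
    tokens = phrase.lower().split()
    # drop a stopword occurring at the beginning of the phrase
    if tokens and tokens[0] in STOPWORDS:
        tokens = tokens[1:]
    # naive de-pluralization of the final token
    if tokens and tokens[-1].endswith("s") and len(tokens[-1]) > 3:
        tokens[-1] = tokens[-1][:-1]
    return " ".join(tokens)

print(preprocess("a man"))  # -> "man"
```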