Learning Antonyms with Paraphrases and a Morphology-aware Neural Network
Sneha Rajana¹, Chris Callison-Burch¹, Marianna Apidianaki¹,², Vered Shwartz³
¹ Computer and Information Science Department, University of Pennsylvania, USA
² LIMSI, CNRS, Université Paris-Saclay, 91403 Orsay, France
³ Computer Science Department, Bar-Ilan University, Israel
{srajana,ccb,marapi}@seas.upenn.edu, vered1986@gmail.com
Abstract
Recognizing and distinguishing antonyms from other types of semantic relations is an essential part of language understanding systems. In this paper, we present a novel method for deriving antonym pairs using paraphrase pairs containing negation markers. We further propose a neural network model, AntNET, that integrates morphological features indicative of antonymy into a path-based relation detection algorithm. We demonstrate that our model outperforms state-of-the-art models in distinguishing antonyms from other semantic relations and is capable of efficiently handling multi-word expressions.
1 Introduction
Identifying antonymy and expressions with contrasting meanings is valuable for NLP systems which go beyond recognizing semantic relatedness and require to identify specific semantic relations. While manually created semantic taxonomies, like WordNet (Fellbaum, 1998), define antonymy relations between some word pairs that native speakers consider antonyms, they have limited coverage. Further, as each term of an antonymous pair can have many semantically close terms, the contrasting word pairs far outnumber those that are commonly considered antonym pairs, and they remain unrecorded. Therefore, automated methods have been proposed to determine, for a given term-pair (x, y), whether x and y are antonyms of each other, based on their occurrences in a large corpus.
Charles and Miller (1989) put forward the co-occurrence hypothesis that antonyms occur together in a sentence more often than chance. However, non-antonymous semantically related words such as hypernyms, holonyms, meronyms, and near-synonyms also tend to occur together more often than chance. Thus, separating antonyms from pairs linked by other relationships has proven to be difficult. Approaches to antonym detection have exploited distributional vector representations, relying on the distributional hypothesis of semantic similarity (Harris, 1954; Firth, 1957) that words co-occurring in similar contexts tend to be semantically close. Two main information sources are used to recognize semantic relations: path-based and distributional. Path-based methods consider the joint occurrences of the two terms in a given sentence and use the dependency paths that connect the terms as features (Hearst, 1992; Roth and Schulte im Walde, 2014; Schwartz et al., 2015). For distinguishing antonyms from other relations, Lin et al. (2003) proposed to use antonym patterns (such as "either X or Y" and "from X to Y"). Distributional methods are based on the disjoint occurrences of each term and have recently become popular using word embeddings (Mikolov et al., 2013; Pennington et al., 2014), which provide a distributional representation for each term. Recently, combined path-based and distributional methods for relation detection have also been proposed (Shwartz et al., 2016; Nguyen et al., 2017). They showed that a good path representation can provide substantial complementary information to the distributional signal for distinguishing between different semantic relations.

Paraphrase Pair              Antonym Pair
not sufficient/insufficient  sufficient/insufficient
insignificant/negligible     significant/negligible
dishonest/lying              honest/lying
unusual/pretty strange       usual/pretty strange

Table 1: Examples of antonyms derived from PPDB paraphrases. The antonym pairs in column 2 were derived from the corresponding paraphrase pairs in column 1.
While antonymy applies to expressions that represent contrasting meanings, paraphrases are phrases expressing the same meaning, which usually occur in similar textual contexts (Barzilay and McKeown, 2001) or have common translations in other languages (Bannard and Callison-Burch, 2005). Specifically, if two words or phrases are paraphrases, they are unlikely to be antonyms of each other. Our first approach to antonym detection exploits this fact and uses paraphrases for detecting and generating antonyms (The dementors caught Sirius Black / Black could not escape the dementors). We start by focusing on phrase pairs that are most salient for deriving antonyms. Our assumption is that phrases (or words) containing negating words (or prefixes) are more helpful for identifying opposing relationships between term-pairs. For example, from the paraphrase pair (caught / not escape), we can derive the antonym pair (caught/escape) by just removing the negating word 'not'.
Our second method is inspired by the recent success of deep learning methods for relation detection. Shwartz et al. (2016) proposed an integrated path-based and distributional model to improve hypernymy detection between term-pairs, and later extended it to classify multiple semantic relations (Shwartz and Dagan, 2016) (LexNET). Although LexNET was the best performing system in the semantic relation classification task of the CogALex 2016 shared task, the model performed poorly on synonyms and antonyms compared to other relations. The path-based component is weak in recognizing synonyms, which do not tend to co-occur, and the distributional information caused confusion between synonyms and antonyms, since both tend to occur in the same contexts. We propose AntNET, a novel extension of LexNET that integrates information about negating prefixes as a new morphological pattern feature and is able to distinguish antonyms from other semantic relations. In addition, we optimize the vector representations of dependency paths between the given term pair, encoded using a neural network, by replacing the embeddings of words with negating prefixes by the embeddings of the base, non-negated, forms of the words.
For example, for the term pair unhappy/joyful, we record the negating prefix of unhappy using a new path feature and replace the word embedding of unhappy with happy in the vector representation of the dependency path between unhappy and sad. The proposed model improves the path embeddings to better distinguish antonyms from other semantic relations and achieves higher performance than prior path-based methods on this task.
We used the antonym pairs extracted from the Paraphrase Database (PPDB) (Ganitkevitch et al., 2013; Pavlick et al., 2015b) in the paraphrase-based method as training data for our neural network model.
The main contributions of this paper are:

• We present a novel technique of using paraphrases for antonym detection and successfully derive antonym pairs from paraphrases in the PPDB, the largest paraphrase resource currently available.
• We demonstrate improvements to an integrated path-based and distributional model, showing that our morphology-aware neural network model, AntNET, performs better than state-of-the-art methods for antonym detection.
2 Related Work
Paraphrase Extraction Methods  Paraphrases are words or phrases expressing the same meaning. Paraphrase extraction methods that exploit distributional or translation similarity might however propose paraphrase pairs that are not meaning-equivalent but linked by other types of relations. These methods often extract pairs having a related but not equivalent meaning, such as contradictory pairs. For instance, Lin and Pantel (2001) extracted 12 million "inference rules" from monolingual text by exploiting shared dependency contexts. Their method learns paraphrases that are truly meaning-equivalent, but it just as readily learns contradictory pairs such as (X rises, X falls). Ganitkevitch et al. (2013) extract over 150 million paraphrase rules from parallel corpora by pivoting through foreign translations. This multilingual paraphrasing method often learns hypernym/hyponym pairs, due to variation in the discourse structure of translations, and unrelated pairs due to misalignments or polysemy in the foreign language. Pavlick et al. (2015a) added interpretable semantics to PPDB (see Section 3.1 for details) and showed that paraphrases in this resource represent a variety of relations other than equivalence, including contradictory pairs like nobody/someone and close/open.

Method                                   #pairs
(x,y) from paraphrase (~x,y)/(x,~y)      80,669
(x, paraphrase(y)), (paraphrase(x), y)   81,221
(x, synset(y)), (synset(x), y)           692,231

Table 2: Number of unique antonym pairs derived from PPDB at each step. Paraphrases and synsets were obtained from PPDB and WordNet respectively.
Pattern-based Methods  Pattern-based methods for inducing semantic relations between a pair of terms (x, y) consider the lexico-syntactic paths that connect the joint occurrences of x and y in a large corpus. A variety of approaches have been proposed that rely on patterns between terms in a corpus to distinguish antonyms from other relations. Lin et al. (2003) used translation information and lexico-syntactic patterns to extract distributionally similar words, and then filtered out words that appeared with the patterns 'from X to Y' or 'either X or Y' significantly often. The intuition behind this was that if two words X and Y appear in one of these patterns, they are unlikely to represent a synonymous pair. Roth and Schulte im Walde (2014) combined general lexico-syntactic patterns with discourse markers as indicators for the specific semantic relations between word pairs (e.g. contrast relations might indicate antonymy and elaborations may indicate synonymy or hyponymy). Unlike previous pattern-based methods which relied on the standard distribution of patterns, Schwartz et al. (2015) used patterns to learn word embeddings. They presented a symmetric pattern-based model for representing word vectors in which antonyms are assigned dissimilar vector representations. More recently, Nguyen et al. (2017) presented a pattern-based neural network model that exploits lexico-syntactic patterns from syntactic parse trees for the task of distinguishing between antonyms and synonyms. They applied HypeNET (Shwartz et al., 2016) to the task of distinguishing between synonyms and antonyms, replacing the direction feature with the distance in the path representation.
Source   #pairs
WordNet  18,306
PPDB     773,452

Table 3: Number of unique antonym pairs derived from different sources. The number of pairs obtained from PPDB far outnumbers the antonym pairs present in EVALution and WordNet.
3 Paraphrase-based Antonym Derivation
Existing semantic resources like WordNet (Fellbaum, 1998) contain a much smaller set of antonyms compared to other semantic relations (synonyms, hypernyms and meronyms). Our aim is to create a large resource of high-quality antonym pairs using paraphrases.
3.1 The Paraphrase Database
The Paraphrase Database (PPDB) contains over 150 million paraphrase rules covering three paraphrase types: lexical (single word), phrasal (multi-word), and syntactic restructuring rules, and is the largest collection of paraphrases currently available. In this paper, we focus on lexical and phrasal paraphrases up to two words in length. We examine the relationships between phrase pairs in the PPDB, focusing on phrase pairs that are most salient for deriving antonyms.
3.2 Antonym Derivation
Selection of Paraphrases  We consider all phrase pairs from PPDB (p1, p2) up to two words in length such that one of the two phrases either begins with a negating word like not, or contains a negating prefix.¹ We chose these two types of paraphrase pairs since we believe them to be the most indicative of an antonymy relationship between the target words. There are 7,878 unordered phrase pairs of the form (p′1, p2) where p′1 begins with 'not', and 183,159 phrases of the form (p′1, p2) where p′1 contains a negating prefix.
Paraphrase Transformation  For paraphrases containing a negating prefix, we perform morphological analysis to identify and remove the negating prefixes. For a phrase pair like unhappy/sad, an antonymy relation is derived between the base form of the negated word, without the negation prefix, and its paraphrase (happy/sad). We use MORSEL (Lignos, 2010) to perform morphological analysis and identify negation markers. For multi-word phrases with a negating word, the negating word is simply dropped to obtain an antonym pair (e.g. different / not identical → different/identical). Some examples of PPDB paraphrase pairs and antonym pairs derived from them are shown in Table 1. The derived antonym pairs are further expanded by associating the synonyms (from WordNet) and lexical paraphrases (from PPDB) of each phrase with the other phrase in the derived pair. While expanding each phrase in the derived pair by its paraphrases, we filter out paraphrase pairs with a PPDB score (Pavlick et al., 2015a) of less than 2.5. In the above example, unhappy/sad, we first derive happy/sad as an antonym pair and expand it by considering all synonyms of happy as antonyms of sad (e.g. joyful/sad), and all synonyms of sad as antonyms of happy (e.g. happy/gloomy). Table 2 shows the number of pairs derived at each step using PPDB. In total, we were able to derive around 773K unique pairs from PPDB. This is a much larger dataset than existing resources like WordNet and EVALution, as shown in Table 3.

¹ Negating prefixes include de, un, in, anti, il, non, dis.

Unrelated           Paraphrases           Categories   Entailment                         Other relation
much/worthless      correct/that's right  Japan/Korea  investing/increased investment     twinkle/dark
disability/present  simply/merely         black/red    efficiency/operational efficiency  naw/not gonna
equality/gap        till/until            Jan/Feb      valid/equally valid                access/available

Table 4: Examples of different types of non-antonyms derived from PPDB.
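The derivation step above can be sketched in a few lines. This is a simplified illustration, not the authors' code: the paper uses MORSEL for proper morphological analysis, whereas the naive prefix matching below (with the prefix list from the footnote) would over-generate on words like universe; the helper names `strip_negation` and `derive_antonym` are invented for the example.

```python
# Simplified sketch of deriving antonym candidates from paraphrase pairs.
# Assumption: naive prefix matching stands in for MORSEL's morphological analysis.
NEGATING_PREFIXES = ("de", "un", "in", "anti", "il", "non", "dis")

def strip_negation(phrase):
    """Return the base form of a negated phrase, or None if it is not negated."""
    words = phrase.split()
    # Multi-word phrase beginning with a negating word: drop 'not'.
    if len(words) > 1 and words[0] == "not":
        return " ".join(words[1:])
    # Single word with a negating prefix: strip the prefix.
    for prefix in NEGATING_PREFIXES:
        if phrase.startswith(prefix) and len(phrase) > len(prefix) + 2:
            return phrase[len(prefix):]
    return None

def derive_antonym(p1, p2):
    """Given a paraphrase pair, derive a candidate antonym pair, or None."""
    for negated, other in ((p1, p2), (p2, p1)):
        base = strip_negation(negated)
        if base is not None:
            return (base, other)
    return None

print(derive_antonym("not escape", "caught"))  # ('escape', 'caught')
print(derive_antonym("unhappy", "sad"))        # ('happy', 'sad')
```

In the actual pipeline, candidates derived this way are then expanded through WordNet synonyms and PPDB paraphrases and filtered by PPDB score, as described above.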
Analysis  We performed a manual evaluation of the quality of the extracted antonyms by randomly selecting 1,000 pairs classified as 'antonym' and observed that the dataset contained about 63% antonyms. Errors mostly consisted of phrases and words that do not have an opposing meaning after the removal of the negation pattern; for example, the equivalent pair till/until that was derived from the PPDB paraphrase rule not till/until. Other non-antonyms derived from the above methods can be classified into unrelated pairs (background/figure), paraphrases or pairs that have an equivalent meaning (admissible/permissible), words that belong to a category (Africa/Asia), pairs that have an entailment relation (valid/equally valid), and pairs that are related but not with an antonym relationship (twinkle/dark). Table 4 gives some examples of categories of non-antonyms.
Annotation  Since the pairs derived from PPDB seemed to contain a variety of relations in addition to antonyms, we crowdsourced the task of labelling a subset of these pairs in order to obtain the true labels.² We asked workers to choose between the labels: antonym, synonym (or paraphrase for multi-word expressions), unrelated, other, entailment, and category. We showed each pair to 3 workers, taking the majority label as truth.
4 LSTM-based Antonym Detection
In this section we describe AntNET, a long short-term memory (LSTM) based, morphology-aware neural network model for antonym detection. We first focus on improving the neural embeddings of the path representation (Section 4.1), and then integrate distributional signals into our network, resulting in a combined method (Section 4.2).
4.1 Path-based Network
Similarly to prior work, we represent each dependency path as a sequence of edges that leads from x to y in the dependency tree. We use the same path-based features proposed by Shwartz et al. (2016) for recognizing hypernym relations: lemma and part-of-speech (POS) tag of the source node, the dependency label, and the edge direction between two subsequent nodes. Additionally, we add a new feature that indicates whether the source node is negated.
Rather than treating an entire dependency path as a single feature, we encode the sequence of edges using a long short-term memory network (Hochreiter and Schmidhuber, 1997). The vectors obtained for the different paths of a given (x, y) pair are pooled, and the resulting vector is used for classification. The overall network structure is depicted in Figure 1.
Edge Representation  We denote each edge as lemma/pos/dep/dir/neg. We are only interested in checking whether x and/or y have negation markers, but not the intermediate edges, since negation information for intermediate lemmas is unlikely to contribute to identifying whether there is an antonym relationship between x and y. Hence, in our model, neg is represented in one of three ways: negated if x or y is negated, not-negated if x or y is not negated, and unavailable for the intermediate edges. If the source node is negated, we replace the lemma by the lemma of its base, non-negated, form. For example, if we identified unhappy as a 'negated' word, we replace the lemma embedding of unhappy by the embedding of happy in the path representation. The negation feature will help in separating antonyms from other semantic relations, especially those that are hard to distinguish from, like synonyms.

² 5,884 pairs were randomly chosen and were annotated on www.crowdflower.com.

Figure 1: Illustration of the AntNET model. Each pair is represented by several paths and each path is a sequence of edges. An edge consists of five features: lemma, POS, dependency label, dependency direction, and negation marker.
The replacement of a negated word's embedding by its base form's embedding is done for a few reasons. First, words and their polar antonyms are more likely to co-occur in sentences compared to words and their negated forms. For example, Neither happy nor sad is probably a more common phrase than Neither happy nor unhappy, so this technique will help our model to identify an opposing relationship between both types of pairs, happy/unhappy and happy/sad. Second, a common practice for creating word embeddings for multi-word expressions (MWEs) is averaging over the embeddings of each word in the expression. This is not a good representation for phrases like not identical, since we lose out on the negating information obtained from not. Indicating the presence of not using a negation feature and replacing the embedding of not identical by identical will increase the classifier's probability of identifying not identical/different as paraphrases and identical/different as antonyms. And finally, this method helps us distinguish between terms that are seemingly negated but are not in reality (e.g. invaluable). We encode the sequence of edges using an LSTM network. The vectors obtained for all the paths connecting x and y are pooled and combined, and the resulting vector is used for classification. The vector representation of each edge is the concatenation of its feature vectors:

    v_edge = [v_lemma ; v_pos ; v_dep ; v_dir ; v_neg]

where v_lemma, v_pos, v_dep, v_dir and v_neg represent the vector embeddings of the lemma, POS tag, dependency label, dependency direction, and negation marker, respectively.
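As a concrete illustration, the edge encoding can be sketched as below; the embedding tables, vocabulary sizes, and dimensions are invented for the example and are not the paper's settings.

```python
import numpy as np

# Sketch of the edge representation v_edge = [v_lemma; v_pos; v_dep; v_dir; v_neg].
# All tables and dimensions here are illustrative placeholders.
rng = np.random.default_rng(0)
LEMMA_DIM, POS_DIM, DEP_DIM, DIR_DIM, NEG_DIM = 50, 4, 5, 1, 1

def embed(vocab_size, dim):
    return rng.normal(size=(vocab_size, dim))

lemma_emb = embed(1000, LEMMA_DIM)  # lemmas (base forms for negated words)
pos_emb = embed(40, POS_DIM)        # part-of-speech tags
dep_emb = embed(50, DEP_DIM)        # dependency labels
dir_emb = embed(4, DIR_DIM)         # edge directions
neg_emb = embed(3, NEG_DIM)         # {negated, not-negated, unavailable}

def edge_vector(lemma_id, pos_id, dep_id, dir_id, neg_id):
    """Concatenate the five feature embeddings into one edge vector."""
    return np.concatenate([
        lemma_emb[lemma_id], pos_emb[pos_id], dep_emb[dep_id],
        dir_emb[dir_id], neg_emb[neg_id],
    ])

print(edge_vector(3, 1, 2, 0, 0).shape)  # (61,)
```

A sequence of such edge vectors for one path is then fed to the LSTM described next.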
Path Representation  The representation for a path p composed of a sequence of edges edge_1, edge_2, ..., edge_k is a sequence of edge vectors: p = [v_edge_1, v_edge_2, ..., v_edge_k]. The edge vectors are fed in order to a recurrent neural network (RNN) with LSTM units, resulting in the encoded path vector v_p.
Classification Task  Given a lexical or phrasal pair (x, y), we induce patterns from a corpus where each pattern represents a lexico-syntactic path connecting x and y. The vector representation for each term pair (x, y) is computed as the weighted average of its path vectors, by applying average pooling as follows:

    (1) v_p(x,y) = ( Σ_{p∈P(x,y)} f_p · v_p ) / ( Σ_{p∈P(x,y)} f_p )

v_p(x,y) refers to the vector of the pair (x, y); P(x,y) is the multi-set of paths connecting x and y in the corpus, and f_p is the frequency of p in P(x,y). The vector v_p(x,y) is then fed into a neural network that outputs the class distribution c for each class (relation type), and the pair is assigned to the relation with the highest score r:

    (2a) c = softmax(MLP(v_p(x,y)))
    (2b) r = argmax_i c[i]
MLP stands for Multilayer Perceptron and can be computed with or without a hidden layer (equations (4) and (5) respectively):

    (3) h = tanh(W1 · v_p(x,y) + b1)
    (4) MLP(v_p(x,y)) = W2 · h + b2
    (5) MLP(v_p(x,y)) = W1 · v_p(x,y) + b1

W refers to a matrix of weights that projects information between two layers; b is a layer-specific vector of bias terms, and h is the hidden layer.
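Equations (1)-(5) amount to a frequency-weighted average of path vectors followed by a small classifier. The sketch below uses the hidden-layer-free variant of equation (5); dimensions, weights, and the number of classes are illustrative placeholders, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
PATH_DIM, NUM_CLASSES = 60, 2  # illustrative sizes

def pooled_pair_vector(path_vectors, frequencies):
    """Equation (1): average of path vectors weighted by path frequency f_p."""
    f = np.asarray(frequencies, dtype=float)
    P = np.stack(path_vectors)                # (num_paths, PATH_DIM)
    return (f[:, None] * P).sum(axis=0) / f.sum()

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

W1 = rng.normal(size=(NUM_CLASSES, PATH_DIM))
b1 = np.zeros(NUM_CLASSES)

def classify(path_vectors, frequencies):
    """Equations (2a)-(2b), with MLP(v) = W1·v + b1 as in equation (5)."""
    v = pooled_pair_vector(path_vectors, frequencies)
    c = softmax(W1 @ v + b1)                  # (2a) class distribution
    return int(np.argmax(c))                  # (2b) predicted relation index

paths = [rng.normal(size=PATH_DIM) for _ in range(3)]
print(classify(paths, frequencies=[5, 2, 1]))
```

In training, W1 and b1 would of course be learned jointly with the LSTM path encoder rather than drawn at random.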
4.2 Combined Path-based and Distributional Network
The path-based supervised model in Section 4.1 classifies each pair (x, y) based on the lexico-syntactic patterns that connect x and y in a corpus. Inspired by the improved performance of Shwartz et al.'s (2016) integrated path-based and distributional method over a simpler path-based algorithm, we integrate distributional features into our path-based network. We create a combined vector representation using both the syntactic path features and the co-occurrence distributional features of x and y for each pair (x, y). The combined vector representation for (x, y), v_c(x,y), is computed by simply concatenating the word embeddings of x (v_x) and y (v_y) to the path-based feature vector v_p(x,y):

    (6) v_c(x,y) = [v_x ; v_p(x,y) ; v_y]
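A minimal sketch of equation (6), assuming made-up dimensions for the word and path vectors:

```python
import numpy as np

WORD_DIM, PATH_DIM = 100, 60  # illustrative dimensions

def combined_vector(v_x, v_p_xy, v_y):
    """Equation (6): v_c(x,y) = [v_x; v_p(x,y); v_y]."""
    return np.concatenate([v_x, v_p_xy, v_y])

v_c = combined_vector(np.zeros(WORD_DIM), np.ones(PATH_DIM), np.zeros(WORD_DIM))
print(v_c.shape)  # (260,)
```

The concatenated vector then replaces v_p(x,y) as the input to the classifier of Section 4.1.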
5 Experiments
We experiment with the path-based and combined models for antonym identification by performing two types of classification: binary and multiclass classification.
Train  Test   Val  Total
5,122  1,829  367  7,318

Table 5: Number of instances present in the train/test/validation splits of the crowdsourced dataset.
5.1 Dataset
Neural networks require a large amount of training data. We use the labelled portion of the dataset that we created using PPDB, as described in Section 3. In order to induce paths for the pairs in the dataset, we identify sentences in the corpus that contain the pair and extract all patterns for the given pair. Pairs with an antonym relationship are considered as positive instances in both classification experiments. In the binary classification experiment, we consider all pairs related by other relations (entailment, synonymy, category, unrelated, other) as negative instances. We also perform a variant of the multiclass classification with three classes (antonym, other, unrelated). Due to the skewed nature of the dataset, we combined category, entailment and synonym/paraphrases into one class. For both classification experiments, we perform a random split with 70% train, 25% test, and 5% validation sets. Table 5 displays the number of relations in our dataset. Wikipedia³ was used as the underlying corpus for all methods, and we perform model selection on the validation set to tune the hyper-parameters of each method. We apply grid search over a range of values and pick the ones that yield the highest F1 score on the validation set. The best hyper-parameters are reported in the appendix.
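The 70/25/5 random split described above can be sketched as follows; the seed and the integer IDs standing in for labelled pairs are placeholders (the real split is over the labelled PPDB-derived pairs):

```python
import random

def split_dataset(pairs, train_frac=0.70, test_frac=0.25, seed=0):
    """Randomly split labelled pairs into 70% train, 25% test, 5% validation."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    n = len(pairs)
    n_train, n_test = int(n * train_frac), int(n * test_frac)
    return (pairs[:n_train],                   # train
            pairs[n_train:n_train + n_test],   # test
            pairs[n_train + n_test:])          # validation (remainder, ~5%)

train, test, val = split_dataset(range(7318))
print(len(train), len(test), len(val))  # 5122 1829 367
```

With 7,318 labelled instances this reproduces the split sizes shown in Table 5.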
5.2 Baselines
Majority Baseline  The majority baseline is achieved by labelling all the instances with the most frequent class occurring in the dataset, i.e. FALSE (binary) or UNRELATED (multiclass).
Distributed Baseline  The method proposed by Schwartz et al. (2015) uses symmetric patterns (SPs) for generating word embeddings. The authors automatically acquired symmetric patterns (defined as a sequence of 3-5 tokens consisting of exactly 2 wildcards and 1-3 words) from a large plain-text corpus, and generated vectors where each co-ordinate represented the co-occurrence in symmetric patterns of the represented word with another word of the vocabulary. For antonym representation, the authors relied on the patterns suggested by Lin et al. (2003) to construct word embeddings containing an antonym parameter that can be turned on in order to represent antonyms as dissimilar, and that can be turned off to represent antonyms as similar. To evaluate the SP method on our data, we used the pre-trained SP embeddings⁴ with 500 dimensions. We use the SVM classifier with an RBF kernel for the classification of word pairs.

³ We used the English Wikipedia dump from May 2015 as the corpus.

Model                   Binary                 Multiclass
                        P      R      F1      P      R      F1
Majority baseline       0.304  0.551  0.392   0.222  0.472  0.303
SP baseline             0.661  0.568  0.436   0.583  0.488  0.344
Path-based SD baseline  0.723  0.724  0.722   0.636  0.675  0.651
Path-based AntNET       0.732  0.722  0.713   0.652  0.687  0.661**
Combined SD baseline    0.790  0.788  0.788   0.744  0.750  0.738
Combined AntNET         0.803  0.802  0.802*  0.746  0.757  0.746*

Table 6: Performance of the AntNET models in comparison to the baseline models.

Feature   Model       Binary                 Multiclass
                      P      R      F1      P      R      F1
Distance  Path-based  0.727  0.727  0.724   0.665  0.692  0.664
          Combined    0.789  0.788  0.788   0.732  0.743  0.734
Negation  Path-based  0.732  0.722  0.713   0.652  0.687  0.661
          Combined    0.803  0.802  0.802   0.746  0.757  0.746

Table 7: Comparing the novel negation-marking feature with the distance feature proposed by Nguyen et al. (2017).
Path-based and Combined Baseline  Since AntNET is an extension of the path-based and combined models proposed by Shwartz and Dagan (2016) for classifying multiple semantic relations, we use their models as additional baselines. Because their model used a different dataset that contained very few antonym instances, we replicated the baseline (SD) with the dataset and corpus information as in Section 5.1 rather than comparing to the reported results.
5.3 Results
Table 6 displays the performance scores of AntNET and the baselines in terms of precision, recall and F1. Our combined model significantly⁵ outperforms all baselines in both binary and multiclass classifications. Both path-based and combined models of AntNET achieve a much better performance in comparison to the majority class and SP baselines.

⁴ https://homes.cs.washington.edu/~roysch/papers/sp_embeddings/sp_embeddings.html
Comparing the path-based methods, the AntNET model achieves a higher precision compared to the path-based SD baseline for binary classification, and outperforms the SD model in precision, recall and F1 in the multiclass classification experiment. The low precision of the SD model stems from its inability to distinguish between antonyms and synonyms, and between related and unrelated pairs which are common in our dataset, causing many false positive pairs such as difficult/harsh, bad/cunning, and finish/far, which were classified as antonyms.
Comparing the combined models, the AntNET model outperforms the SD model in precision, recall and F1, achieving state-of-the-art results for antonym detection. In all the experiments, the performance of the model in the binary classification task was better than in the multiclass classification. Multiclass classification seems to be inherently harder for all methods, due to the large number of relations and the smaller number of instances for each relation. We also observed that as we increased the size of the training dataset used in our experiments, the results improved for both path-based and combined models, confirming the need for large-scale datasets that will benefit training neural models.

⁵ We used a paired t-test. *p < 0.1, **p < 0.05.
Effect of the Negation-marking Feature  In our models, the novel negation-marking feature is successfully integrated along the syntactic path to represent the paths between x and y. In order to evaluate the effect of our novel negation-marking feature for antonym detection, we compare this feature to the distance feature proposed by Nguyen et al. (2017). In their approach, they integrate the distance between related words in a lexico-syntactic path as a new pattern feature, along with lemma, POS and dependency, for the task of distinguishing antonyms and synonyms. We re-implemented this model by making use of the same information regarding dataset and patterns as in Section 5.1 and then replacing the direction feature in the SD models by the distance feature.

The results are shown in Table 7 and indicate that the negation-marking feature and the replacement of the embeddings of negated words by the ones of their base forms enhance the performance of our models more effectively than the distance feature does, across both binary and multiclass classifications. Although the distance feature has previously been shown to perform well for the task of distinguishing antonyms from synonyms, this feature is not very effective in the multiclass setting.
5.4 Error Analysis
Figure 2 displays the confusion matrices for the binary and multiclass experiments of the best performing AntNET model. The confusion matrices show that pairs were mostly assigned to the correct relation more often than to any other class.
False Positives  We analyzed the false positives from both the binary and multiclass experiments. We sampled about 20% of the false positive pairs and identified the following common errors. The majority of the misclassification errors stem from antonym-like or near-antonym relations: these are relations that could be considered as antonymy but were annotated by crowd-workers as other relations because they contain polysemous terms, for which the relation holds in a specific sense. For example, north/south and polite/sassy were labelled as category and other respectively. Other errors stem from confusing antonyms and unrelated pairs.

Figure 2: Confusion matrices for the combined AntNET model for binary (left) and multiclass (right) classifications. Rows indicate gold labels and columns indicate predictions. The matrix is normalized along rows, so that the predictions for each (true) class sum to 100%.
False Negatives  We again sampled about 20% of the false negative pairs from both the binary and multiclass experiments and analyzed the major types of errors. Most of these pairs had only few co-occurrences in the corpus, often due to infrequent terms (e.g. cisc/risc, which denote computer architectures). While our model effectively handled negative prefixes, it failed to handle negative suffixes, causing incorrect classification of pairs like spiritless/spirited. A possible future work is to extend this model to handle negative suffixes as well.
6 Conclusion
In this paper, we presented an original technique for deriving antonyms using paraphrases from PPDB. We also proposed a novel morphology-aware neural network model, AntNET, which improves antonymy prediction for path-based and combined models. In addition to lexical and syntactic information, we suggested to include a novel morphological negation-marking feature.

Our models outperform the baselines in two relation classification tasks. We also demonstrated that the negation-marking feature outperforms previously suggested path-based features for this task.

Since our proposed techniques for antonymy detection are corpus-based, they can be applied to different languages and relations. The paraphrase-based method can be applied to other languages by extracting the paraphrases for these languages from the PPDB and using a morphological analysis tool (e.g. Morfette for French (Chrupala et al., 2008)), or by looking up the negation prefixes in a grammar book for languages that do not dispose of such a tool. The LSTM-based model could also be used in other languages since the method is corpus-based, but we would need to create a training set for new languages. This would not, however, be too difficult; the training set used by the model is not that big (the one used here was ~5,000 pairs) and could be easily labelled through crowdsourcing.

We release our code and the large-scale dataset derived from PPDB, annotated with semantic relations.
Acknowledgments
This material is based in part on research sponsored by DARPA under grant number FA8750-13-2-0017 (the DEFT program). The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes. The views and conclusions contained in this publication are those of the authors and should not be interpreted as representing official policies or endorsements of DARPA and the U.S. Government.

This work has also been supported by the French National Research Agency under project ANR-16-CE33-0013 and partially supported by an Intel ICRI-CI grant, the Israel Science Foundation grant 880/12, and the German Research Foundation through the German-Israeli Project Cooperation (DIP, grant DA 1600/1-1).

We would like to thank our anonymous reviewers for their thoughtful and helpful comments.
References
ColinBannardand ChrisCallison-Burch.2005. Para-
phrasingwithBilingual ParallelCorpora. InPro- ceedingsofthe 43rdAnnual MeetingonAssociation forComputationalLinguistics (ACL'05) .Strouds- burg,PA,pages597-604.
ReginaBarzilayandKathleenR. McKeo wn.2001.Ex-
tractingP araphrasesfromaParallelCorpus. InPro- ceedingsof the39thAnnual MeetingonAssociation forComputational Linguistics(ACL '01).Toulouse,
France,pages 50-57.
WalterG.CharlesandGeor geA.Miller .1989.Con-
textsofantonymousadjecti ves.AppliedPsycholo gy
10:357-375.
GrzegorzChrupala,GeorgianaDinu, andJosefv an
Genabith.2008.Learning MorphologywithMor -
fette.InProceedingsoftheSixthInternational
ConferenceonLanguage Resourcesand Evalua-
tion(LREC'08).Marrakech, Morocco,pages2362- 2367.
ChristianeFellbaum,editor .1998. WordNet:anelec-
troniclexicaldatabase.MITPress.
J.R.Firth.1957.Asynopsisoflinguistictheory,1930-
1955.InStudiesinLinguistic Analysis,BasilBlack-
well,Oxford, UnitedKingdom,pages 1-32.
JuriGanitke vitch,BenjaminVanDurme,andChris
Callison-Burch.2013. PPDB:TheP araphrase
Database.InProceedingsofthe2013Confer -
enceofthe NorthAmericanChapter oftheAssoci- ationforComputational Linguistics:HumanLan- guageTechnologies(NAA CL/HLT).Atlanta,Geor - gia,pages758-764. ZelligS.Harris. 1954.Distributional structure.Word
10(23):146-162.
MartiHearst.1992. Automaticacquisitionof hy-
ponymsfromlargete xtcorpora.In Proceedings ofthe14th InternationalConference onCompu- tationalLinguistics(COLING'92) .Nantes,France, pages539-545.
Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9(8):1735-1780.
Constantine Lignos. 2010. Learning from Unseen Data. In Proceedings of the Morpho Challenge 2010 Workshop. Aalto University School of Science and Technology, Helsinki, Finland, pages 35-38.
Dekang Lin and Patrick Pantel. 2001. DIRT - Discovery of Inference Rules from Text. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'01). San Francisco, California, pages 323-328.
Dekang Lin, Shaojun Zhao, Lijuan Qin, and Ming Zhou. 2003. Identifying synonyms among distributionally similar words. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI'03). Acapulco, Mexico, pages 1492-1493.
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. In Proceedings of the 26th International Conference on Neural Information Processing Systems (NIPS'13). Lake Tahoe, Nevada, pages 3111-3119.
Kim Anh Nguyen, Sabine Schulte im Walde, and Ngoc Thang Vu. 2017. Distinguishing antonyms and synonyms in a pattern-based neural network. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL'17). Valencia, Spain, pages 76-85.
Ellie Pavlick, Johan Bos, Malvina Nissim, Charley Beller, Benjamin Van Durme, and Chris Callison-Burch. 2015a. Adding Semantics to Data-Driven Paraphrasing. In The 53rd Annual Meeting of the Association for Computational Linguistics (ACL'15). Beijing, China, pages 1512-1522.
Ellie Pavlick, Pushpendre Rastogi, Juri Ganitkevitch, Benjamin Van Durme, and Chris Callison-Burch. 2015b. PPDB 2.0: Better paraphrase ranking, fine-grained entailment relations, word embeddings, and style classification. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL'15). Beijing, China, pages 425-430.
Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP'14). Doha, Qatar, pages 1532-1543.
Michael Roth and Sabine Schulte im Walde. 2014. Combining Word Patterns and Discourse Markers for Paradigmatic Relation Classification. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL'14). Baltimore, MD, pages 524-530.
Roy Schwartz, Roi Reichart, and Ari Rappoport. 2015. Symmetric Pattern Based Word Embeddings for Improved Word Similarity Prediction. In Proceedings of the Nineteenth Conference on Computational Natural Language Learning (CoNLL'15). Beijing, China, pages 258-267.
Vered Shwartz and Ido Dagan. 2016. CogALex-V Shared Task: LexNET - Integrated Path-based and Distributional Method for the Identification of Semantic Relations. In Proceedings of the 5th Workshop on Cognitive Aspects of the Lexicon (CogALex-V). Osaka, Japan, pages 80-85.
Vered Shwartz, Yoav Goldberg, and Ido Dagan. 2016. Improving Hypernymy Detection with an Integrated Path-based and Distributional Method. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL'16). Berlin, Germany, pages 2389-2398.
A Supplemental Material
For deriving antonyms using PPDB, we used the XXXL size of PPDB version 2.0 found in http://paraphrase.org/.
To compute the metrics in Tables 6 and 7, we used scikit-learn with the "averaged" setup, which computes the metrics for each relation and reports their average weighted by support (the number of true instances for each relation). Note that this can result in an F1 score that is not the harmonic mean of precision and recall.
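The support-weighted averaging described above corresponds to scikit-learn's `average='weighted'` option. The sketch below illustrates it on a toy set of relation labels (the labels and predictions are invented placeholders, not data from this paper):

```python
# Illustrative sketch of the support-weighted metric computation.
# Labels/predictions below are toy placeholders, not the paper's data.
from sklearn.metrics import precision_recall_fscore_support

y_true = ["antonym", "synonym", "antonym", "hypernym", "synonym", "antonym"]
y_pred = ["antonym", "antonym", "antonym", "hypernym", "synonym", "synonym"]

# average='weighted' computes precision/recall/F1 per relation, then
# averages them weighted by support (number of true instances per relation).
p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred,
                                              average="weighted",
                                              zero_division=0)
print(p, r, f1)
```

Because the per-relation F1 scores are averaged directly, the reported F1 need not be the harmonic mean of the reported precision and recall.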
During preprocessing we handled removal of punctuation. Since our dataset only contains short phrases, we removed any stopwords occurring at the beginning of a sentence (Example: a man → man) and we also removed plurals. The best hyperparameters for all models mentioned in this paper are shown in Table 8. The learning rate was set to 0.001 for all experiments.
Model            Type        Dropout
SD-path          Binary      0.2
SD-path          Multiclass  0.4
SD-combined      Binary      0.4
SD-combined      Multiclass  0.2
ASD-path         Binary      0.0
ASD-path         Multiclass  0.2
ASD-combined     Binary      0.0
ASD-combined     Multiclass  0.2
AntNET-path      Binary      0.0
AntNET-path      Multiclass  0.2
AntNET-combined  Binary      0.4
AntNET-combined  Multiclass  0.2

Table 8: The best hyperparameters in every model.
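The preprocessing steps described above (punctuation removal, dropping a leading stopword, and de-pluralization) can be sketched roughly as follows. The stopword list and the plural-stripping rule here are illustrative simplifications, not the paper's exact implementation:

```python
# Rough sketch of the appendix's preprocessing; the stopword set and the
# naive trailing-"s" plural rule are illustrative assumptions only.
import string

STOPWORDS = {"a", "an", "the"}  # illustrative subset

def preprocess(phrase: str) -> str:
    # remove punctuation
    phrase = phrase.translate(str.maketrans("", "", string.punctuation))
    tokens = phrase.lower().split()
    # drop a stopword occurring at the beginning of the phrase
    if tokens and tokens[0] in STOPWORDS:
        tokens = tokens[1:]
    # naive de-pluralization of the final token
    if tokens and tokens[-1].endswith("s") and len(tokens[-1]) > 3:
        tokens[-1] = tokens[-1][:-1]
    return " ".join(tokens)

print(preprocess("a man"))  # -> "man"
```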