The SENSEVAL-3 Multilingual English-Hindi Lexical Sample Task

Timothy Chklovski
Information Sciences Institute
University of Southern California
Marina del Rey, CA 90292
timc@isi.edu

Rada Mihalcea
Department of Computer Science
University of North Texas
Dallas, TX 76203
rada@cs.unt.edu

Ted Pedersen
Department of Computer Science
University of Minnesota
Duluth, MN 55812
tpederse@d.umn.edu

Amruta Purandare
Department of Computer Science
University of Minnesota
Duluth, MN 55812
pura0010@d.umn.edu

Abstract
This paper describes the English-Hindi Multilingual lexical sample task in SENSEVAL-3. Rather than tagging an English word with a sense from an English dictionary, this task seeks to assign the most appropriate Hindi translation to an ambiguous target word. Training data was solicited via the Open Mind Word Expert (OMWE) from Web users who are fluent in English and Hindi.

1 Introduction

The goal of the MultiLingual lexical sample task is to create a framework for the evaluation of systems that perform Machine Translation, with a focus on the translation of ambiguous words. The task is very similar to the lexical sample task, except that rather than using the sense inventory from a dictionary we follow the suggestion of (Resnik and Yarowsky, 1999) and use the translations of the target words into a second language. In this task for SENSEVAL-3, the contexts are in English, and the "sense tags" for the English target words are their translations in Hindi.

This paper outlines some of the major issues that arose in the creation of this task, and then describes the participating systems and summarizes their results.

2 Open Mind Word Expert

The annotated corpus required for this task was built using the Open Mind Word Expert system (Chklovski and Mihalcea, 2002), adapted for multilingual annotations.¹

To overcome the current lack of tagged data and the limitations imposed by the creation of such data using trained lexicographers, the Open Mind Word Expert system enables the collection of semantically annotated corpora over the Web. Tagged examples are collected using a Web-based application that allows contributors to annotate words with their meanings.

The tagging exercise proceeds as follows. For each target word the system extracts a set of sentences from a large textual corpus. These examples are presented to the contributors, together with all possible translations for the given target word. Users are asked to select the most appropriate translation for the target word in each sentence. The selection is made using check-boxes, which list all possible translations, plus two additional choices, "unclear" and "none of the above." Although users are encouraged to select only one translation per word, the selection of two or more translations is also possible. The results of the classification submitted by other users are not presented to avoid artificial biases.

¹ Multilingual Open Mind Word Expert can be accessed at http://teach-computers.org/word-expert/english-hindi

3 Sense Inventory Representation

The sense inventory used in this task is the set of Hindi translations associated with the English words in our lexical sample. Selecting an appropriate English-Hindi dictionary was a major decision early in the task, and it raised a number of interesting issues.

We were unable to locate any machine readable or electronic versions of English-Hindi dictionaries, so it became apparent that we would need to manually enter the Hindi translations from printed materials. We briefly considered the use of Optical Character Recognition (OCR), but found that our available tools did not support Hindi. Even after deciding to enter the Hindi translations manually, it wasn't clear how those words should be encoded. Hindi is usually represented in Devanagari script, which has a large number of possible encodings and no clear standard has emerged as yet.

SENSEVAL-3: Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text, Barcelona, Spain, July 2004. Association for Computational Linguistics.

We decided that Romanized or transliterated Hindi text would be the most portable encoding, since it can be represented in standard ASCII text. However, it turned out that the number of English-Hindi bilingual dictionaries is much smaller than the number of Hindi-English dictionaries, and the number that use transliterated text is smaller still.

Still, we located one promising candidate, the English-Hindi Hippocrene Dictionary (Raker and Shukla, 1996), which represents Hindi in a transliterated form. However, we found that many English words only had two or three translations, making it too coarse grained for our purposes.²

In the end we selected the Chambers English-Hindi dictionary (Awasthi, 1997), which is a high quality bilingual dictionary that uses Devanagari script. We identified 41 English words from the Chambers dictionary to make up our lexical sample. Then one of the task organizers, who is fluent in English and Hindi, manually transliterated the approximately 500 Hindi translations of the 41 English words in our lexical sample from the Chambers dictionary into the ITRANS format (http://www.aczone.com/itrans/). ITRANS software was used to generate Unicode for display in the OMWE interfaces, although the sense tags used in the task data are the Hindi translations in transliterated form.

4 Training and Test Data

The MultiLingual lexical sample is made up of 41 words: 18 nouns, 15 verbs, and 8 adjectives. This sample includes English words that have varying degrees of polysemy as reflected in the number of possible Hindi translations, which range from a low of 3 to a high of 39.

Text samples made up of several hundred instances for each of 31 of the 41 words were drawn from the British National Corpus, while samples for the other 10 words came from the SENSEVAL-2 English lexical sample data. The BNC data is in a "raw" text form, where the part of speech tags have been removed. However, the SENSEVAL-2 data includes the English sense-tags as determined by human taggers.

After gathering the instances for each word in the lexical sample, we tokenized each instance and removed those that contain collocations of the target word. For example, the training/test instances for arm.n do not include examples for contact arm, pickup arm, etc., but only examples that refer to arm as a single lexical unit (not part of a collocation). In our experience, disambiguation accuracy on collocations of this sort is close to perfect, and we aimed to concentrate the annotation effort on the more difficult cases.

² We have made available transcriptions of the entries for approximately 70 Hippocrene nouns, verbs, and adjectives at http://www.d.umn.edu/~pura0010/hindi.html, although these were not used in this task.

The data was then annotated with Hindi translations by web volunteers using the Open Mind Word Expert (bilingual edition). At various points in time we offered gift certificates as a prize for the most productive tagger in a given day, in order to spur participation. A total of 40 volunteers contributed to this task.

To create the test data we collected two independent tags per instance, and then discarded any instances where the taggers disagreed. Thus, each instance that remains in the test data has complete agreement between two taggers. For the training data, we only collected one tag per instance, and therefore this data may be noisy. Participating systems could choose to apply their own filtering methods to identify and remove the less reliably annotated examples.

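The agreement filter described above can be sketched in a few lines. This is only an illustration under stated assumptions: the triple layout, the instance identifiers, and the placeholder labels `T1` and `T2` (standing in for transliterated Hindi translations) are hypothetical and are not taken from the task data.

```python
from collections import defaultdict


def build_test_set(tags):
    """Keep only instances whose two independent tags agree.

    `tags` is an iterable of (instance_id, tagger_id, translation)
    triples; this layout is an assumption made for the sketch.
    """
    by_instance = defaultdict(dict)
    for instance_id, tagger_id, translation in tags:
        by_instance[instance_id][tagger_id] = translation

    agreed = {}
    for instance_id, per_tagger in by_instance.items():
        labels = list(per_tagger.values())
        # Require exactly two taggers and identical translations.
        if len(labels) == 2 and labels[0] == labels[1]:
            agreed[instance_id] = labels[0]
    return agreed


# Hypothetical annotations: T1 and T2 are placeholder translation labels.
sample_tags = [
    ("bank.n.001", "tagger_a", "T1"),
    ("bank.n.001", "tagger_b", "T1"),  # agreement: kept
    ("bank.n.002", "tagger_a", "T2"),
    ("bank.n.002", "tagger_b", "T1"),  # disagreement: discarded
    ("bank.n.003", "tagger_a", "T2"),  # only one tag: discarded
]
test_set = build_test_set(sample_tags)
print(test_set)
```

Discarding disagreements trades test-set size for label reliability, which is consistent with the test set being much smaller than the singly tagged training set.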
After tagging by the Web volunteers, there were two data sets provided to task participants: one where the English sense of the target word is unknown, and another where it is known in both the training and test data. These are referred to as the translation only (t) data and the translation and sense (ts) data, respectively. The t data is made up of instances drawn from the BNC as described above, while the ts data is made up of the instances from SENSEVAL-2. Evaluations were run separately for each of these two data sets, which we refer to as the t and ts subtasks.

The t data contains 31 ambiguous words: 15 nouns, 10 verbs, and 6 adjectives. The ts data contains 10 ambiguous words: 3 nouns, 5 verbs, and 2 adjectives, all of which have been used in the English lexical sample task of SENSEVAL-2. These words, the number of possible translations, and the number of training and test instances are shown in Table 1. The total number of training instances in the two subtasks is 10,449, and the total number of test instances is 1,535.

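As a quick consistency check, the combined figures quoted above follow from the per-subtask TOTAL rows of Table 1. The dictionary layout below is just a convenience for this sketch; the numbers are the ones reported in the paper.

```python
# Per-subtask totals as reported in Table 1 and in the text above.
t_totals = {"words": 31, "train": 8945, "test": 1336}
ts_totals = {"words": 10, "train": 1504, "test": 199}

# Sum the two subtasks field by field.
combined = {key: t_totals[key] + ts_totals[key] for key in t_totals}
print(combined)  # {'words': 41, 'train': 10449, 'test': 1535}
```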
5 Participating Systems

Five teams participated in the t subtask, submitting a total of eight systems. Three teams (a subset of those five) participated in the ts subtask, submitting a total of five systems. All submitted systems employed supervised learning, using the training examples provided. Some teams used additional resources as noted in the more detailed descriptions below.

Table 1: Target words in the SENSEVAL-3 English-Hindi task

Lexical Unit     Translations   Train   Test

TRANSLATION ONLY (T-DATA)
band.n                      8     224     91
bank.n                     21     332     52
case.n                     13     348     42
different.a                 4     320     25
eat.v                       3     271     48
field.n                    14     300    100
glass.n                     8     379     13
hot.a                      18     348     32
line.n                     39     360     11
note.v                     11     220     12
operate.v                   9     280     50
paper.n                     8     264     73
plan.n                      8     210     35
produce.v                   7     265     67
rest.v                     14     172     10
rule.v                      8     160     18
shape.n                     8     320     32
sharp.a                    16     248     48
smell.v                     5     210     17
solid.a                    16     327     37
substantial.a              15     250    100
suspend.v                   4     370     28
table.n                    21     378     16
talk.v                      6     341     35
taste.n                     6     350     40
terrible.a                  4     200     99
tour.n                      5     240      9
vision.n                   14     318     20
volume.n                    9     309     54
watch.v                    10     300    100
way.n                      16     331     22
TOTAL                     348    8945   1336

TRANSLATION AND SENSE ONLY (TS-DATA)
bar.n                      19     278     39
begin.v                     6     360     15
channel.n                   6      92     16
green.a                     9     175     26
nature.n                   15      71     14
play.v                     14     152     10
simple.a                    9     166     19
treat.v                     7     100     32
wash.v                     16      10     11
work.v                     24     100     17
TOTAL                     125    1504    199

5.1 NUS

The NUS team from the National University of Singapore participated in both the t and ts subtasks. The t system (nusmlst) uses a combination of knowledge sources as features, and the Support Vector Machine (SVM) learning algorithm. The knowledge sources used include part of speech of neighboring words, single words in the surrounding context, local collocations, and syntactic relations. The ts system (nusmlsts) does the same, but adds the English sense of the target word as a knowledge source.

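To make two of these knowledge sources concrete, the sketch below extracts surrounding-word and local-collocation features for one instance. It is not the NUS implementation: the function name, feature naming scheme, window size, and example sentence are all assumptions, and the part-of-speech and syntactic-relation features are omitted because they require an external tagger and parser.

```python
def extract_features(tokens, target_index, window=3):
    """Return a sparse binary feature dict for the target token.

    Hypothetical feature naming; only two of the four knowledge
    sources mentioned in the text are illustrated here.
    """
    features = {}

    # Single words in the surrounding context (position-independent).
    for i, tok in enumerate(tokens):
        if i != target_index:
            features["word=" + tok.lower()] = 1

    # Local collocations: the word at each fixed offset from the target.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = target_index + offset
        tok = tokens[j].lower() if 0 <= j < len(tokens) else "<pad>"
        features["colloc[%+d]=%s" % (offset, tok)] = 1

    return features


tokens = "She sat on the bank of the river".split()
feats = extract_features(tokens, tokens.index("bank"), window=2)
for name in sorted(feats):
    print(name)
```

A feature dictionary of this kind could then be vectorized and passed to any supervised learner; for the ts setting, the English sense tag of the target word would simply be added as one more feature.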
5.2 LIA-LIDILEM

The LIA-LIDILEM team from the Université d'Avignon and the Université Stendhal Grenoble had two systems which participated in both the t and ts subtasks. In the ts subtask, only the English sense tags were used, not the Hindi translations.

The FL-MIX system uses a combination of three probabilistic models, which compute the most probable sense given a six word window of context. The three models are a Poisson model, a Semantic Classification Tree model, and a K nearest neighbors search model. This system also used a part of speech tagger and a lemmatizer.

The FC-MIX system is the same as the FL-MIX system, but replaces context words by more general synonym-like classes computed from a word-aligned English-French corpus of approximately 850,000 words in each language.

5.3 HKUST

The HKUST team from the Hong Kong University of Science and Technology had three systems that participated in both the t and ts subtasks. The HKUST_me_t and HKUST_me_ts systems are maximum entropy classifiers. The HKUST_comb_t and HKUST_comb_ts systems are voted classifiers that combine a new Kernel PCA model with a maximum entropy model and a boosting-based model. The HKUST_comb2_t and HKUST_comb2_ts systems are voted classifiers that combine a new Kernel PCA model with a maximum entropy model, a boosting-based model, and a Naive Bayesian model.

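A voted classifier of this general kind can be sketched as a plurality vote over the outputs of the component models. The paper does not say how the HKUST systems weighted votes or resolved ties, so the unweighted vote and the tie-breaking rule below are assumptions, and `T1` to `T3` are placeholder translation labels.

```python
from collections import Counter


def vote(predictions):
    """Plurality vote over component-model predictions.

    Ties are broken in favour of the earliest-listed model; this
    rule is an assumption of the sketch, not taken from the paper.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Hypothetical outputs of three component models for one test instance.
print(vote(["T1", "T2", "T1"]))  # T1 (two of three models agree)
print(vote(["T2", "T1", "T3"]))  # T2 (three-way tie, first model wins)
```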
5.4 UMD

The UMD team from the University of Maryland entered one system (UMD-SST) in the t subtask. UMD-SST is a supervised sense tagger based on the Support Vector Machine learning algorithm, and is described more fully in (Cabezas et al., 2001).

5.5 Duluth

The Duluth team from the University of Minnesota, Duluth had one system (Duluth-ELSS) that participated in the t subtask. This system is an ensemble of three bagged decision trees, each based on a different type of lexical feature. This system was known as Duluth3 in SENSEVAL-2, and it is described more fully in (Pedersen, 2001).

6 Results

All systems attempted all of the test instances, so precision and recall are identical, hence we report the single Accuracy figure. Tables 2 and 3 show results for the t and ts subtasks, respectively.

Table 2: t Subtask Results

System                 Accuracy
nusmlst                    63.4
HKUST_comb_t               62.0
HKUST_comb2_t              61.4
HKUST_me_t                 60.6
FL-MIX                     60.3
FC-MIX                     60.3
UMD-SST                    59.4
Duluth-ELSS                58.2
Baseline (majority)        51.9

Table 3: ts Subtask Results

System                 Accuracy
nusmlsts                   67.3
FL-MIX                     64.1
FC-MIX                     64.1
HKUST_comb_ts              63.8
HKUST_comb2_ts             63.8
HKUST_me_ts                60.8
Baseline (majority)        55.8

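The reason precision and recall coincide when every instance is attempted can be seen in a small sketch. The scoring function below uses the usual definitions (correct over attempted, correct over total) and is not the official SENSEVAL scorer; the instance identifiers and labels are hypothetical.

```python
def precision_recall(gold, predicted):
    """Precision = correct / attempted; recall = correct / all instances."""
    attempted = [i for i in gold if predicted.get(i) is not None]
    correct = sum(1 for i in attempted if predicted[i] == gold[i])
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(gold)
    return precision, recall


# Hypothetical gold labels and two system outputs.
gold = {1: "T1", 2: "T2", 3: "T1", 4: "T3"}
full = {1: "T1", 2: "T1", 3: "T1", 4: "T3"}   # attempts every instance
partial = {1: "T1", 2: "T1"}                  # attempts only two

print(precision_recall(gold, full))     # (0.75, 0.75)
print(precision_recall(gold, partial))  # (0.5, 0.25)
```

With full coverage the two denominators are equal, so a single accuracy figure carries all the information.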
We note that the participating systems all exceeded the baseline (majority) classifier by some margin, suggesting that the sense distinctions made by the translations are clear and provide sufficient information for supervised methods to learn effective classifiers.

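For reference, a majority baseline of the kind reported in Tables 2 and 3 can be sketched as follows. The paper does not spell out how its baseline was computed; this sketch assumes it assigns each target word its most frequent translation in the training data. The data and the labels `T1` to `T4` are hypothetical.

```python
from collections import Counter


def majority_baseline(train, test):
    """Accuracy (in percent) of always choosing each word's most
    frequent training translation.

    `train` and `test` are lists of (word, translation) pairs.
    """
    by_word = {}
    for word, translation in train:
        by_word.setdefault(word, Counter())[translation] += 1
    most_frequent = {w: c.most_common(1)[0][0] for w, c in by_word.items()}

    correct = sum(1 for w, gold in test if most_frequent.get(w) == gold)
    return 100.0 * correct / len(test)


# Hypothetical toy data, not taken from the task.
train = [("bank.n", "T1"), ("bank.n", "T1"), ("bank.n", "T2"), ("eat.v", "T3")]
test = [("bank.n", "T1"), ("bank.n", "T2"), ("eat.v", "T3"), ("eat.v", "T4")]
baseline_accuracy = majority_baseline(train, test)
print(baseline_accuracy)  # 50.0
```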
Interestingly, the average results on the ts data are higher than the average results on the t data, which suggests that sense information is likely to be helpful for the task of targeted word translation. Additional investigations are, however, required to draw final conclusions.

7 Conclusion

The Multilingual Lexical Sample task in SENSEVAL-3 featured English ambiguous words that were to be tagged with their most appropriate Hindi translation. The objective of this task is to determine the feasibility of translating words of various degrees of polysemy, focusing on translation of specific lexical items. The results of five teams that participated in this event tentatively suggest that machine learning techniques can significantly improve over the most frequent sense baseline. Additionally, this task has highlighted the creation of testing and training data by leveraging the knowledge of bilingual Web volunteers. The training and test data sets used in this exercise are available online from http://www.senseval.org and http://teach-computers.org.

Acknowledgments

Many thanks to all those who contributed to the Multilingual Open Mind Word Expert project, making this task possible. We are also grateful to all the participants in this task, for their hard work and involvement in this evaluation exercise. Without them, all these comparative analyses would not be possible.

We are particularly grateful to a research grant from the University of North Texas that provided the funding for contributor prizes, and to the National Science Foundation for their support of Amruta Purandare under a Faculty Early CAREER Development Award (#0092784).

References

S. Awasthi, editor. 1997. Chambers English-Hindi Dictionary. South Asia Books, Columbia, MO.

C. Cabezas, P. Resnik, and J. Stevens. 2001. Supervised sense tagging using Support Vector Machines. In Proceedings of the Senseval-2 Workshop, Toulouse, July.

T. Chklovski and R. Mihalcea. 2002. Building a sense tagged corpus with the Open Mind Word Expert. In Proceedings of the ACL Workshop on Word Sense Disambiguation: Recent Successes and Future Directions, Philadelphia.

T. Pedersen. 2001. Machine learning with lexical features: The Duluth approach to Senseval-2. In Proceedings of the Senseval-2 Workshop, pages 139-142, Toulouse, July.

J. Raker and R. Shukla, editors. 1996. Hippocrene Standard Dictionary English-Hindi Hindi-English (With Romanized Pronunciation). Hippocrene Books, New York, NY.

P. Resnik and D. Yarowsky. 1999. Distinguishing systems and distinguishing senses: New evaluation methods for word sense disambiguation. Natural Language Engineering, 5(2):113-133.