Clustering Writing Styles with a Self-Organizing Map
Vuokko Vuori
Laboratory of Computer and Information Science
Helsinki University of Technology
P.O. Box 9800, FIN-02015 HUT, Finland
Abstract
This work shows how a Self-Organizing Map (SOM) can be applied in the analysis of different handwriting styles. The analyzed handwriting samples have been collected in on-line fashion with special writing equipment such as pressure-sensitive tablets. The handwriting style of an individual subject is represented by a vector, the components of which reflect the tendencies of the writer to use certain prototypical styles for isolated alphanumeric characters. This study shows that correlations between different writing styles, both character-wise and writer-wise, can be found. Clusters of different personal writing styles can be found by studying the U-matrix visualization of the SOM trained with data collected from over 700 subjects. An examination of the component planes of the SOM reveals some interesting correlations between the prototypical character styles.

1. Introduction
In this work, the natural writing styles of several hundred writers are analyzed. The aim of the study is to find a representation for personal writing styles which enables their comparison and the detection of possible clusters. In addition, correlations between the writing styles of characters of different classes are searched for. This work tries to find answers to questions such as: "If I know how you write letter 'a', can I infer something about the way you write letter 'd' based on what I know about other writers?". This kind of information might be useful in automatic recognition of handwritten characters [7] by helping to distinguish confusing characters without using any linguistic or geometrical context of the characters, dictionary, or any other language model. In addition, it might be useful by speeding up the recognition process when used in the pruning or ordering of the prototype set representing the different writing styles of the characters. For earlier studies on automatic characterization of handwriting styles, see [1], [3], [10], [11], and [17].

The writing style of a single writer is represented by a vector, the components of which indicate the writer's tendencies to use the writing styles identified by the character prototypes. The prototypes have been selected by hand from the results of four different clustering algorithms applied to a database of handwritten character samples collected in an on-line mode from over 700 subjects [14]. In order to find correlations between and within the writing styles of different writers, the writing style vectors are analyzed and visualized with a Self-Organizing Map (SOM) [6]. The SOM algorithm performs a nonlinear mapping which preserves the local topological properties of the data set. Clusters of the writing style vectors can be found by studying the U-matrix [12] of a SOM. The clusters can be explained by examining the component planes of the SOM. Also, correlated writing styles for isolated characters can be detected easily as they produce similar component planes.

2. Writing style vectors
The writing style of an individual writer is represented by a vector called here a writing style vector. Each component of a writing style vector corresponds to a specific character prototype and indicates the tendency of the writer to use that particular style for writing characters of the class of the prototype. The next sections will explain in detail the steps which have been taken in order to form the writing style vectors for the writers. First, the dissimilarity measure between the character samples is described. Next, the clustering algorithms and the final prototype selection procedure are explained. Finally, the transformation from a dissimilarity measure into a similarity measure is presented and the formation of the writing style vectors from averaged similarity measures is explained.

2.1. Dissimilarity measure
The dissimilarity measure used in the character comparisons is based on the Dynamic Time Warping (DTW) algorithm [9], which is a nonlinear curve matching method. The connected parts of a drawn curve in which the pen is pressed down on the writing surface are considered as strokes. The dissimilarity measure is defined on a stroke basis so that it is infinite between two characters having different numbers of strokes. The strokes and data points are matched in the same order as they were produced, and the first and last data points of the two curves are strictly matched against each other. The DTW algorithm finds the point-to-point correspondence between the curves which satisfies these constraints and yields the minimum sum of the costs associated with the matchings of the data points. The cost for matching two data points is their squared Euclidean distance.

Prototype-based classifiers using DTW-based distances have been shown to be well suited for the handwriting recognition task by several researchers, and good recognition accuracies can be obtained if the prototype set has a good coverage of the different handwriting styles [15]. In this work, the DTW-based dissimilarity measure is used in the clustering algorithms as a distance measure.

2.2. Clustering and prototype selection
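As a rough illustration of the two ingredients the clustering in this section relies on, the sketch below implements a stroke-wise DTW dissimilarity (squared Euclidean point costs, first and last points forced to match) and picks a cluster prototype as the sample with the smallest summed distance to the other members. The function names, the representation of a character as a list of strokes of (x, y) tuples, and the allowed DTW steps (diagonal, horizontal, vertical) are assumptions made for the example; the text does not specify them, and the normalizations of Section 2.3 are left out.

```python
import math

def stroke_dtw(a, b):
    """DTW cost between two non-empty strokes given as lists of (x, y) points.

    Assumed step pattern: each match extends a diagonal, horizontal or
    vertical predecessor. The path is anchored at (0, 0) and ends at
    (n-1, m-1), so first and last points are always matched to each other.
    """
    n, m = len(a), len(b)
    inf = math.inf
    # cost[i][j]: minimum summed cost of aligning a[:i+1] with b[:j+1]
    cost = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = (a[i][0] - b[j][0]) ** 2 + (a[i][1] - b[j][1]) ** 2
            if i == 0 and j == 0:
                best = 0.0
            else:
                best = min(
                    cost[i - 1][j] if i > 0 else inf,
                    cost[i][j - 1] if j > 0 else inf,
                    cost[i - 1][j - 1] if i > 0 and j > 0 else inf,
                )
            cost[i][j] = d + best
    return cost[n - 1][m - 1]

def char_dissimilarity(c1, c2):
    """Sum of stroke-wise DTW costs; infinite if the stroke counts differ."""
    if len(c1) != len(c2):
        return math.inf
    # Strokes are compared in the order in which they were drawn.
    return sum(stroke_dtw(s1, s2) for s1, s2 in zip(c1, c2))

def medoid(samples, dist=char_dissimilarity):
    """Sample with the minimum sum of distances to the samples of its cluster."""
    return min(samples, key=lambda s: sum(dist(s, t) for t in samples))
```

For example, `medoid` applied to three one-stroke characters returns the middle one when the other two lie on either side of it. The merging strategies of TreeClust, MinSwap and the C-means variants are not reproduced here.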
The character database was clustered in order to find all the different writing styles for each character class and to select a set of prototypes which captures the within-class style variations well. All the character classes and stroke number variations were treated separately. This approach does not take into account the between-class variations, and the found prototypes are not optimized in the sense of their classification capacity. For some previous works on prototype selection, see [2], [8], and [18].

Four different algorithms were used for the clustering of the character samples: TreeClust, MinSwap, and two variations of the C-means algorithm [4], named here CMeans1 and CMeans2. All four clustering algorithms were agglomerative and hierarchical. Clusters were represented by prototypes, which were the samples having the minimum sum of distances to the other samples in the same cluster.

TreeClust, MinSwap, and CMeans2 started from a situation in which all the samples were prototypes, i.e. formed their own clusters, while in the beginning of the CMeans1 algorithm only a random subset of the samples was selected to be the initial prototype set.

As the clustering algorithms proceeded, the number of clusters was reduced by merging of clusters. In the TreeClust, CMeans1, and CMeans2 algorithms, those two clusters whose prototypes were most similar to each other were merged into one. The MinSwap algorithm tried several alternative mergings: first the clusters with the most similar prototype pair, then the clusters with the next most similar pair, etc. A new prototype was selected among the samples which belonged to the new cluster. After that, MinSwap, CMeans1, and CMeans2 reassigned the samples into the clusters according to the closest prototypes and then reselected the prototypes. This was continued until a stable division was found. MinSwap did the same thing but also calculated how many of the samples were swapped out from the new cluster into the other clusters, or vice versa, and selected the alternative merging which gave rise to the minimum number of these swappings.

The number of clusters was first determined automatically by using two clustering indices. However, it turned out that much better results could be obtained by selecting the prototypes by hand among the cluster centers found by the four clustering algorithms, because the results obtained with the different clustering algorithms and indices varied considerably [14]. This guaranteed that each different writing style found with any of the clustering algorithms was present in the final prototype set and that the prototypes were not too similar to each other. The total number of selected prototypes was 2591. Some of the selected prototypes can be seen in Figure 2. Even if some of the prototypes look very similar to each other, say the prototypes of letter 'I' in the 5th and 6th rows or the prototypes of digit '5' in the last row, they do have different numbers of strokes, different drawing orders, or directions for the strokes.

2.3. Transforming dissimilarity into similarity
The dissimilarity measure obtained with the DTW algorithm has a range from zero to infinity, and it depends on the numbers of data points and strokes. Therefore, the dissimilarities between strokes have been normalized by the number of data point matchings and the total dissimilarities have been divided by the number of strokes. After these normalizations, the dissimilarities (?) have been transformed into similarity measures (?) in the following way:

(1)

The similarity measure is a decreasing function of the normalized dissimilarity measure and its range is between zero and one. The value of the parameter was selected so that the distribution of the similarity measures between character samples and their best matching correct prototypes is approximately even. In practice, this was achieved by fitting a linear function, which was defined by the parameter (?), in the minimum squared error sense to the logarithm of the cumulative probability function of the dissimilarity measures.

2.4. Forming the writing style vectors
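The averaging procedure described in this section can be sketched as follows. This is a minimal illustration under stated assumptions rather than the author's code: prototypes and samples are modelled as hypothetical `(class_label, stroke_count, data)` tuples, `similarity` is a caller-supplied stand-in for the measure of Section 2.3, and missing components are encoded as `None`. Assigning 0.0 to a prototype whose stroke count matches none of the writer's samples of that class extends the rule that the text states for the single-sample case.

```python
def writing_style_vector(prototypes, samples, similarity):
    """Average similarity of one writer's samples to every prototype.

    prototypes, samples: lists of (class_label, stroke_count, data) tuples
    (a hypothetical representation chosen for this sketch).
    similarity: function (prototype_data, sample_data) -> value in [0, 1].
    Returns one component per prototype, in prototype order.
    """
    classes_written = {label for label, _, _ in samples}
    vector = []
    for p_label, p_strokes, p_data in prototypes:
        if p_label not in classes_written:
            # No samples of this class at all: the component is missing.
            vector.append(None)
            continue
        # Only samples of the same class AND the same stroke count count.
        sims = [
            similarity(p_data, s_data)
            for s_label, s_strokes, s_data in samples
            if s_label == p_label and s_strokes == p_strokes
        ]
        # Assumption: no sample with this stroke count means tendency zero.
        vector.append(sum(sims) / len(sims) if sims else 0.0)
    return vector
```

With a single sample for a class, the list `sims` has one element, so the component reduces to a single similarity value, as the text describes.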
A writer's tendencies to use the prototypical styles for isolated characters are measured by average similarity values. The average similarity value of a prototype is calculated by: 1) evaluating the similarity values between the prototype and all the writer's character samples of the same class and having the same number of strokes, 2) summing up the similarity values, and 3) finally dividing the sum by the number of its terms. The average similarity values are concatenated into a writing style vector. The dimensionality of a writing style vector is the same as the size of the prototype set. If a subject had no samples at all for some class, all the average similarity values corresponding to that class were considered to be missing from the writing style vector and did not have any effect in the training of the SOM. If a writer had only one character sample for some class, his or her tendencies to use the prototypical styles of that class were estimated by single similarity values instead of averaged similarity values. In such cases, the writer's tendencies to use the prototypical styles consisting of a different number of strokes than the collected sample are zero. In addition, a single sample leads to an assumption that the writer uses only the writing style corresponding to the best matching prototype, as the similarity values between the sample and the other prototypes are in most cases very close to zero. For the same reason, the sum of the average similarity values calculated for prototypes of the same class and having the same number of strokes is rarely over one.

3. Data
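The preprocessing applied to the character samples in this section (scaling, equidistant resampling along the pen trace, and centering) can be sketched as below. The function names, the target side length `side`, the spacing `step`, and the choice to center after resampling are assumptions for illustration; the text only states that scaling precedes resampling. In this sketch a stroke's final point is dropped unless it falls exactly on the resampling grid.

```python
import math

def scale_to_box(strokes, side=1.0):
    """Scale so the longer side of the axis-aligned bounding box equals `side`."""
    xs = [x for s in strokes for x, _ in s]
    ys = [y for s in strokes for _, y in s]
    longer = max(max(xs) - min(xs), max(ys) - min(ys))
    k = side / longer if longer > 0 else 1.0
    return [[(x * k, y * k) for x, y in s] for s in strokes]

def resample(stroke, step):
    """Points spaced `step` apart along the polyline through `stroke`."""
    out = [stroke[0]]
    need = step  # arc length still to travel before the next emitted point
    for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        pos = 0.0  # distance already consumed on this segment
        while seg - pos >= need:
            pos += need
            t = pos / seg
            out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            need = step
        need -= seg - pos
    return out

def center(strokes):
    """Move the mean of all points (the mass center) to the origin."""
    pts = [p for s in strokes for p in s]
    cx = sum(x for x, _ in pts) / len(pts)
    cy = sum(y for _, y in pts) / len(pts)
    return [[(x - cx, y - cy) for x, y in s] for s in strokes]

def preprocess(strokes, side=1.0, step=0.1):
    """Scale first, then resample each stroke, then center (assumed order)."""
    scaled = scale_to_box(strokes, side)
    resampled = [resample(s, step) for s in scaled]
    return center(resampled)
```

Only the x- and y-coordinates are used, in line with the text; pressure or timing information is not modelled.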
The experiments were performed with two public databases: IRONOFF [13] and UNIPEN train_r01_v07 [5]. Only isolated digits and upper- and lowercase letters were used in the experiments. The two databases were combined into one, all the character samples were manually checked, and obviously erroneous ones were removed. Most of the erroneous samples were incorrectly segmented. In total, 3174 erroneous samples were found. The total number of samples in the cleaned database was 130831. These samples were written by 728 subjects. The subjects were of various ages and from several countries, and both handedness groups were represented. In my opinion, it is justified to assume that the database has a rather good coverage of the existing writing styles.

The character samples have been collected with pressure-sensitive displays or tablets which are able to record the x- and y-coordinates of a moving pen point. As there were several contributors and therefore many different collection software packages and devices, all the character samples were preprocessed so that their data points were similarly distributed. This was done by first interpolating straight lines between the original data points and then resampling new data points which were equally spaced on the estimated pen trace. In order to make the DTW-based comparison of the character samples reasonable, the size and location variations of the characters were normalized. The mass centers of the characters were moved to the origin of the coordinate system. The characters were scaled so that the longer sides of the smallest boxes drawn around the characters and aligned with the coordinate axes had a constant value. The scaling of the characters was performed prior to the resampling. No other features were used for representing pen traces but the x- and y-coordinates.

4. Creating a SOM of different writing styles
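The two quality measures used in this section to compare candidate maps, quantization error and the proportion of vectors whose first and second BMUs are not adjacent, can be sketched as follows. The row-major unit indexing, the "odd rows shifted right" hexagonal layout, and the rule of skipping `None` components in the distance are assumptions of this example; the text only says that missing values had no effect on training.

```python
import math

def distance(x, w):
    """Euclidean distance over the components of x that are not missing."""
    return math.sqrt(sum((xi - wi) ** 2 for xi, wi in zip(x, w) if xi is not None))

def two_best_units(x, codebook):
    """Indices of the best and second-best matching map units for x."""
    order = sorted(range(len(codebook)), key=lambda u: distance(x, codebook[u]))
    return order[0], order[1]

def hex_adjacent(u, v, cols):
    """True if units u and v touch on a hexagonal sheet.

    Units are numbered row by row; odd rows are assumed to be shifted
    half a cell to the right.
    """
    (r1, c1), (r2, c2) = divmod(u, cols), divmod(v, cols)
    if r1 == r2:
        return abs(c1 - c2) == 1
    if abs(r1 - r2) != 1:
        return False
    # An odd row touches columns c and c+1 of the rows above and below,
    # an even row touches columns c-1 and c.
    shift = 1 if r1 % 2 else -1
    return c2 in (c1, c1 + shift)

def map_quality(data, codebook, cols):
    """Return (quantization error, share of vectors with non-adjacent BMUs)."""
    total_dist, violations = 0.0, 0
    for x in data:
        b1, b2 = two_best_units(x, codebook)
        total_dist += distance(x, codebook[b1])
        violations += not hex_adjacent(b1, b2, cols)
    return total_dist / len(data), violations / len(data)
```

Here `codebook` is the list of reference vectors, one per map unit. Lower values are better for both measures.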
A SOM is a neural network in which the neurons are connected to each other so that they form a regular lattice. Each neuron acts both as an input and an output neuron and is associated with a reference vector. The reference vectors are compared with the network's input. The outputs of the neurons depend on how similar the input and reference vectors are. The neuron whose reference vector is most similar to the input vector is called the best-matching map unit (BMU). During the training of the network, the reference vectors of the BMUs and their neighboring neurons are updated so that they better represent the input vectors, in this work the writing style vectors. Due to such training, different neurons will specialize in representing different areas of the input space. In addition, neurons near to each other in the neuron lattice tend to correspond to areas close to each other in the input space. Therefore, a SOM can be seen as a nonlinear mapping from the input space to the lower-dimensional lattice space. The SOM's ability to represent the training data faithfully depends on the true dimensionality of the data set and on the size and dimensionality of the neuron lattice.

As the main interest of this work is to find correlations between the writers, all the styles used by only a single writer were omitted from the writing style vectors. So, all the prototypes for which the average similarity was above 0.05 only for a single writer were considered uninteresting.
This way, the dimensionality of the writing style vectors was reduced from 2591 down to 1764. The kept prototypes were used by 146 subjects on average. Approximately 11% of the average similarity values were missing from the writing style vectors. The 1764-dimensional writing style vectors were further analyzed with a SOM in the hope of finding interesting structures such as clusters of writers.

Various alternatives for the SOM's size, lattice, neighborhood function, training algorithm, training parameters and epochs, initialization, and updating rule were experimented with. Different SOMs were compared with each other by using two quality measures: quantization error and ability to preserve the topology of the data. The former measure is the average distance between each writing style vector and its BMU. The latter one is the proportion of all data vectors for which the first and second BMUs are not adjacent units.

Figure 1. U-matrix formed for the 1764-dimensional writing style vectors.

The size of the SOM was fixed to 20 × 10 neuron units, which is approximately 30% of the number of writers. The topology of the map was selected to be a sheet with hexagonal lattice and Gaussian neighborhood function. A linear initialization along the first two principal directions of the data proved to produce better results than a random initialization. The batch training algorithm was applied with the Euclidean metric, as their combination provided much faster and more reliable convergence than an on-line training algorithm or a metric based on the angle between two vectors.

The training was carried out in three phases. In the first phase, rough training, the radius of the neighborhood was linearly decreased from 10 to 6 during 10 training epochs. In the second phase, the radius was decreased from 5 to 3 during 50 epochs. Finally, in the fine-tuning phase, the radius was decreased from 2 to 1 during 100 epochs. An epoch means that the BMUs are found for all the training samples and total errors are calculated for all the map units, both for the BMUs and their neighboring map units on the hexagonal lattice. The neighborhood function and its radius determine how the errors are distributed to the map units around the BMUs. After finding the total errors, all the map units are then updated simultaneously on the basis of the total errors so that they better represent the training samples. The number of epochs in the fine-tuning phase is perhaps unnecessarily large, but there was no need to optimize it as the batch training was rather fast, taking about ten minutes in total. The proportion of all writing style vectors for which the first and second best-matching map units were not adjacent was 0.01. Therefore, it can be said that the map preserves the local topological relations of the writing style vectors rather well.

5. Analysis of the writing style map
The U-matrix of a SOM is helpful in detecting clusters on the map. Its coloring is based on the distances between neighboring map units. Areas in which the neighboring map units are similar to each other are colored with dark gray, whereas light shades indicate that the differences between the neighboring units are more significant. Therefore, clusters of personal writing styles can be seen on the U-matrix as dark areas surrounded by lighter areas. The SOM can also be visualized with images colored according to the values of the components of the reference vectors. These images are called component planes. Component planes show how the tendencies to use the corresponding prototypical character styles vary over the map.

The U-matrix and some interesting component planes of the constructed SOM are shown in Figures 1 and 2. It can be seen from the U-matrix of the SOM that the writing styles can roughly be divided into several clusters. There are small clusters in the left and right lower corners of the SOM surface, a slightly bigger one above them on the vertical middle line of the map, three small clusters on the right edge of the map, a triangular-shaped cluster near the upper edge of the map, and three clusters on the left edge on and above the horizontal middle line of the map.

The interesting component planes are those which show significant variance between the map units. Here, the component planes whose range is at least 0.30 have been selected for further examination. In these cases, it can be claimed that there really are some differences in the ten-

Figure 2. Some interesting component planes with the corresponding prototypes.
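The two visual tools discussed in this section can be approximated in a few lines. The sketch below computes, for every map unit, the mean distance to its lattice neighbors (a simplified per-unit form of the U-matrix, where small values correspond to the dark cluster areas) and lists the component planes whose value range reaches the 0.30 threshold mentioned in the text. The function names, the row-major codebook layout, and the "odd rows shifted right" hexagonal neighborhood are assumptions of this example, and the map is assumed to have at least two units.

```python
import math

def hex_neighbors(r, c, rows, cols):
    """Lattice neighbors of unit (r, c) on a hexagonal sheet (odd rows shifted right)."""
    shift = 1 if r % 2 else -1
    candidates = [
        (r, c - 1), (r, c + 1),
        (r - 1, c), (r - 1, c + shift),
        (r + 1, c), (r + 1, c + shift),
    ]
    return [(i, j) for i, j in candidates if 0 <= i < rows and 0 <= j < cols]

def u_heights(codebook, rows, cols):
    """Mean Euclidean distance from each reference vector to its neighbors'."""
    def ref(r, c):
        return codebook[r * cols + c]  # row-major storage (assumed)
    heights = []
    for r in range(rows):
        row = []
        for c in range(cols):
            ds = [math.dist(ref(r, c), ref(i, j))
                  for i, j in hex_neighbors(r, c, rows, cols)]
            row.append(sum(ds) / len(ds))
        heights.append(row)
    return heights

def interesting_planes(codebook, min_range=0.30):
    """Indices of components whose values span at least `min_range` over the map."""
    dims = len(codebook[0])
    return [
        k for k in range(dims)
        if max(w[k] for w in codebook) - min(w[k] for w in codebook) >= min_range
    ]
```

Plotting `u_heights` as a gray-scale image and each selected component as its own image would give rough counterparts of Figures 1 and 2.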