Cross-Domain Image Retrieval With a Dual Attribute-Aware Ranking PDF

Despite recent advances, FIR still has limitations for application to real-world visual searches. The main reason for this is not only the trade-off between the

Memory-Augmented Attribute Manipulation Networks for Interactive

Fashion search with attribute manipulation. The user provides a clothing image with additional description about wanted attributes yet not included in the

Which Is Plagiarism: Fashion Image Retrieval Based on Regional

With the proposed regional attention we compare images region by region and find it is better than the typical attention for the plagiarized clothes retrieval

Generative Attribute Manipulation Scheme for Flexible Fashion Search

Fashion Search; Generative Adversarial Networks; Attribute Ma- prototype image generation and metric learning for fashion search.

Studio2Shop: from studio photo shoots to fashion articles

Shaw. 18 1439 AH new means of making fashion searchable and helping shoppers find the articles ... shop where the query picture is a photo shoot image of.

A new clothing image retrieval algorithm based on sketch

tive sketch-based clothing image retrieve system to search clothing image by using mobile visual sensors. Although sketching in mobile sensors is an expres-.

Learning Attribute Representations With Localization for Flexible

credible fashion search platform should be able to (1) find images that share the same attributes as the query image. (2) allow users to manipulate certain

Where to Buy It: Matching Street Clothing Photos in Online Shops

Given a real-world photo of a clothing item, e.g. taken on the street

Personal Clothing Retrieval on Photo Collections by Color and

the query image. However visual search of dressed clothing in photo collections is a challenging task

Cross-Domain Image Retrieval With a Dual Attribute-Aware Ranking

user photo depicting a clothing image our goal is to re- image search [36] aims at identifying a product

scenarios. In addition, we have also obtained corresponding fine-grained clothing attributes (e.g., clothing color, collar pattern, sleeve shape, sleeve length, etc.) from the available online product description, without significant annotation cost. As data pre-processing, in order to remove the impact of cluttered backgrounds, which predominantly exist for the offline images, we employ an enhanced R-CNN detector to localize the clothing area in the image, with some refinements particularly made for the clothing detection problem.

To address the problem of cross-domain retrieval, we propose a novel Dual Attribute-aware Ranking Network (DARN) for retrieval feature learning. DARN consists of two sub-networks with similar structure. Images from each of the two domains are fed into one of the two sub-networks. This specific design aims to diminish the discrepancy between online and offline images.

The two sub-networks are designed to be driven by semantic attribute learning, so we call them attribute-aware networks. The intuition is to create a powerful semantic representation of clothing in each domain, by leveraging the vast amounts of data annotated with fine-grained clothing attributes. Tree-structure layers are embedded into each sub-network for the comprehensive integration of attributes and their full relations. Specifically, the low-level layers of the sub-network are shared for learning the low-level representation. Then, a set of fully connected layers in a tree-structure are used to construct the high-level component, with each branch modelling one attribute.

Based on the learned semantic features from each [...] rank objective to further enhance the retrieval feature representation. Specifically, the triplet ranking loss is used to constrain the feature similarity of triplets, i.e., the feature distance between the online-offline image pair must be smaller than that of the offline image and any other dissimilar online images.

Generally, the retrieval features from DARN have several advantages compared with the deep features of other works [19, 8]. (1) By using the dual-structure network, our model can handle the cross-domain problem more appropriately. (2) In each sub-network, the scenario-specific semantic representation of clothing is elaborately captured by [...] semantic representation, the visual similarity constraint enables more effective feature learning for the retrieval problem.

In summary, the main contributions of our paper are:

1. We collect a unique dataset composed of cross-scenario image pairs with fine-grained attributes. The number of online images is about 450,000, with additional 90,000 offline counterparts collected. Each image has about 5-9 semantic attribute categories, with more than a hundred possible attribute values. This online-offline image pair dataset provides a training/testing platform for many real-world applications related to clothing analytics. We are planning to release the full dataset to the community for research purposes only.

2. We propose the Dual Attribute-Aware Ranking Network which simultaneously integrates the attributes and visual similarity constraint into the retrieval feature learning. We design tree-structure layers to com- [...] full relations, which provides a new insight on multi-label learning. We also introduce the triplet loss function which perfectly fits into the deep network training.

3. We conduct extensive experiments proving the effectiveness and robustness of the framework and each one of its components for the clothing retrieval problem. The top-20 retrieval accuracy is doubled when using the proposed DARN other than using pre-trained CNN feature only (0.570 vs. 0.268). The proposed method is general and could be applied to other cross-domain image retrieval problems.
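As a rough illustration of what a top-k retrieval accuracy such as the top-20 figure above measures, the following sketch counts a query as a hit when its matching shop image appears among the k nearest gallery items. The function name, the use of squared Euclidean distance, and the one-match-per-query setup are assumptions made for this example; the evaluation protocol is not spelled out in this section.

```python
import numpy as np

def top_k_accuracy(query_feats, gallery_feats, true_idx, k=20):
    """Fraction of queries whose true gallery item is among the k nearest.

    query_feats:   (Q, D) array of query (street photo) features
    gallery_feats: (G, D) array of gallery (shop image) features
    true_idx:      (Q,) index of the matching gallery item for each query
    Distance metric (squared Euclidean) is an assumption of this sketch.
    """
    # Pairwise squared distances between every query and every gallery item.
    diff = query_feats[:, None, :] - gallery_feats[None, :, :]
    dists = np.sum(diff ** 2, axis=2)              # shape (Q, G)
    k = min(k, gallery_feats.shape[0])
    # Indices of the k closest gallery items for each query.
    top_k = np.argsort(dists, axis=1)[:, :k]
    hits = np.any(top_k == np.asarray(true_idx)[:, None], axis=1)
    return float(np.mean(hits))
```

With learned features in place of the toy arrays, calling `top_k_accuracy(q, g, idx, k=20)` would give a number comparable in spirit to the accuracies quoted above, though not necessarily computed the same way.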

2. Related Work

Fashion Datasets. Recently, several datasets containing a wide variety of clothing images captured from fashion websites have been carefully annotated with attribute labels [45, 9, 32, 18]. These datasets are primarily designed for training and evaluation of clothing parsing and attribute estimation algorithms. In contrast, our data is comprised of a large set of clothing image pairs depicting user photos and corresponding garments from online shopping, in addition to fine-grained attributes. Notably, this real-world data is essential to bridge the gap between the two domains.

Visual Analysis of Clothing. Many methods have been recently proposed for automated analysis of clothing images, spanning a wide range of application domains. In particular, clothing recognition has been used for context-aided people identification [13], fashion style recognition [21], occupation recognition [39], and social tribe prediction [26]. Clothing parsing methods, which produce semantic labels for each pixel in the input image, have received significant attention in the past few years [45, 9]. In the surveillance domain, matching clothing images across cameras is a fundamental task for the well-known person re-identification problem [28, 37].

Recently, there is a growing interest in methods for clothing retrieval [20, 33, 31, 44] and outfit recommendation [18]. Most of those methods do not model the discrepancy between the user photos and online clothing images. An exception is the work of Liu et al. [31], which follows a very different methodology than ours based on part-based alignment and features derived from sparse reconstruction, and does not exploit the richness of our data obtained by mining images from customer reviews.

Visual Attributes. Research on attribute-based visual representations has received renewed attention from the computer vision community in the past few years [27, 11, 34, 43]. Attributes are usually referred to as semantic properties of objects or scenes that are shared across categories. Among other applications, attributes have been used for zero-shot learning [27], image ranking and retrieval [38, 22, 17], fine-grained categorization [3], scene understanding [35], and sentence generation from images [25]. Related to our application domain, Kovashka et al. [22] developed a system called "WhittleSearch", which is able to answer queries such as "Show me shoe images like these, but sportier". They used the concept of relative attributes proposed by Parikh and Grauman [34] for relevance feedback. Attributes for clothing have been explored in several recent papers [4, 5, 2]. They allow users to search visual content based on fine-grained descriptions, such as a "blue striped polo-style shirt".

Attribute-based representations have also shown compelling results for matching images of people across domains [37, 29]. The work by Donahue and Grauman [7] demonstrates that richer supervision conveying annotator rationales based on visual attributes can be considered as a form of privileged information [42]. Along this direction, in our work, we show that cross-domain image retrieval can benefit from feature learning that simultaneously optimizes a loss function that takes into account visual similarity and attribute classification.

Deep Learning. Deep convolutional neural networks have achieved dramatic accuracy improvements in many areas of computer vision [23, 14, 40]. The work of Zhang et al. [46] combined poselet classifiers [2] with convolutional nets to achieve compelling results in human attribute prediction. Sun et al. [40] discovered that attributes can be implicitly encoded in high-level features of networks for identity discrimination. In our work, we instead explicitly use attribute prediction as a regularizer in deep networks for cross-domain image retrieval.

Existing approaches for image retrieval based on deep learning have outperformed previous methods based on other image representations [1]. However, they are not designed to handle the problem of cross-domain image retrieval. Several domain adaptation methods based on deep learning have been recently proposed [16, 6]. Related to our work, Chen et al. [5] use a double-path network with alignment cost layers for attribute prediction. In contrast, our work addresses the problem of cross-domain retrieval feature learning, proposing a novel network architecture that learns effective features for measuring visual similarity across domains. We note that other domain adaptation methods [24, 15] could even be applied on top of our learned features to further refine retrieval results.

Attribute categories    Examples (total number)
Clothes Button          Double Breasted, Pullover, ... (12)
Clothes Category        T-shirt, Skirt, Leather Coat ... (20)
Clothes Color           Black, White, Red, Blue ... (56)
Clothes Length          Regular, Long, Short ... (6)
Clothes Pattern         Pure, Stripe, Lattice, Dot ... (27)
Clothes Shape           Slim, Straight, Cloak, Loose ... (10)
Collar Shape            Round, Lapel, V-Neck ... (25)
Sleeve Length           Long, Three-quarter, Sleeveless ... (7)
Sleeve Shape            Puff, Raglan, Petal, Pile ... (16)

Table 1. Clothing attribute categories and example values. The number in brackets is the total number of values for each category.

Figure 2. Some examples of online-offline image pairs, containing images of different human pose, illumination, and varying background. Particularly, the offline images contain many selfies with high occlusion.

3. Data Collection

We have collected about 453,983 online upper-clothing images in high-resolution (about 800×500 on average) from several online-shopping websites. Generally, each image contains a single frontal-view person. From the surrounding text of images, semantic attributes (e.g., clothing color, collar shape, sleeve shape, clothing style) are [...] key corresponds to an attribute category (e.g., color), and the value is the attribute label (e.g., red, black, white, etc.). Then, we manually pruned the noisy labels, merged similar labels based on human perception, and removed those with a small number of samples. After that, 9 categories of clothing attributes are extracted and the total number of attribute values is 179. As an example, there are 56 values for the color attribute. The specified attribute categories and example attribute values are presented in Table 1. This large-scale dataset annotated with fine-grained clothing attributes is used to learn a powerful semantic representation of clothing, as we will describe in the next section.

Recall that the goal of our retrieval problem is to find the online shopping images that correspond to a given query photo in the "street" domain uploaded by the user. To analyze the discrepancy between the images in the shopping scenario (online images) and street scenario (offline images), we collect a large set of offline images with their online counterparts. The key insight to collect this dataset is that there are many customer review websites where users post photos of the clothing they have purchased. As the link to the corresponding clothing images from the shopping store is available, it is possible to collect a large set of online-offline image pairs.

We initially crawled 381,975 online-offline image pairs of different categories from the customer review pages. Then, after a data curation process, where several annotators helped removing unsuitable images, the data was reduced to 91,390 image pairs. For each of these pairs, fine-grained clothing attributes were extracted from the online image descriptions. Some examples of cropped online-offline image pairs are presented in Figure 2. As can be seen, each pair of images depict the same clothing, but in different scenarios, exhibiting variations in pose, lighting, and background clutter. The distribution of the collected online-offline images is illustrated in Figure 3. Generally, the number of images of different categories in both scenarios are almost in the same order of magnitude, which is helpful for training the retrieval model.

Figure 3. The distribution of online-offline image pairs. [Bar chart: # of Images, shown separately as # of Online Images and # of Offline Images, for the categories base shirt, cotton clothes, cotton coat, denim jacket, down jacket, formal skirt, fur clothes, hoodies, knitwear, lace shirt, leather clothes, printed skirt, shirt, short dress, small suit, sweater, T-shirt, vest, wind coat, woolen coat.]
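The label clean-up described in this section (merging similar labels, then removing labels with few samples) was done manually by the authors. Purely as an illustration of the same two steps, a programmatic version might look like the sketch below. The function name, the `merge_map` dictionary, and the `min_count` threshold are all hypothetical and do not come from the paper.

```python
from collections import Counter

def clean_attribute_labels(records, merge_map, min_count):
    """Merge near-duplicate labels, then drop labels that are too rare.

    records:   list of dicts mapping attribute category -> raw label
    merge_map: hypothetical dict mapping a raw label to its canonical label
    min_count: hypothetical minimum number of samples a label must have
    """
    # Step 1: map near-duplicate labels onto one canonical label.
    merged = [
        {cat: merge_map.get(label, label) for cat, label in rec.items()}
        for rec in records
    ]
    # Step 2: count how often each (category, label) pair occurs.
    counts = Counter(
        (cat, label) for rec in merged for cat, label in rec.items()
    )
    # Step 3: drop rare labels; the attribute is then missing for that image.
    return [
        {cat: label for cat, label in rec.items()
         if counts[(cat, label)] >= min_count}
        for rec in merged
    ]
```

An image whose label is dropped simply has no value for that attribute category afterwards.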

In summary, our dataset is suitable to the clothing retrieval problem for several reasons. First, the large amount of images enables effective training of retrieval models, especially deep neural network models. Second, the information about fine-grained clothing attributes allows learning of semantic representations of clothing. Last but not least, the online-offline image pairs bridge the gap between the shopping scenario and the street scenario, providing rich information for real-world applications.

4. Technical Approach

The unique dataset introduced in the previous section serves as the fuel to power up our attribute-driven feature learning approach for cross-domain retrieval. Next we describe the main components of our proposed approach, and how they are assembled to create a real-world cross-domain clothing retrieval system.

4.1. Dual Attribute-aware Ranking Network

In this section, the Dual Attribute-aware Ranking Network (DARN) is introduced for retrieval feature learning. Compared to existing deep features [19, 8], DARN simultaneously integrates semantic attributes with visual similarity constraints into the feature learning stage, while at the same time modeling the discrepancy between domains.

Network Structure. The structure of DARN is illustrated in Figure 4. Two sub-networks with similar Network-in-Network (NIN) models [30] are constructed as its foundation. During training, the images from the online shopping domain are fed into one sub-network, and the images from the street domain are fed into the other. Each sub-network aims to represent the domain-specific information and generate high level comparable features as output. The NIN model in each sub-network consists of five stacked convolutional layers followed by MLPConv layers as defined in [30], and two fully connected layers (FC1, FC2). To increase the representation capability of the intermediate layer, the fourth layer, named Conv4, is followed by two MLPConv layers.

On top of each sub-network, we add tree-structured fully-connected layers to encode information about semantic attributes. Given the semantic features learned by the two sub-networks, we further impose a triplet-based ranking loss function, which separates the dissimilar images with a fixed margin under the framework of learning to rank. The details of semantic information embedding and the ranking loss are introduced next.
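To make the idea of a triplet-based ranking loss with a fixed margin concrete, here is a minimal sketch of the common hinge-style form. It assumes squared Euclidean distance and a margin of 1.0, neither of which is stated in this section, and the function name is made up for this example. It should be read as one plausible formulation and not as the exact loss used in DARN.

```python
import numpy as np

def triplet_ranking_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss over batches of feature vectors, shape (N, D).

    anchor:   features of street (offline) query photos
    positive: features of the matching shop (online) images
    negative: features of dissimilar shop (online) images
    margin:   assumed fixed margin separating dissimilar images
    """
    # Squared Euclidean distances (an assumption of this sketch).
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    # A triplet is penalized unless the negative is farther than the
    # positive by at least `margin`.
    return float(np.mean(np.maximum(0.0, margin + d_pos - d_neg)))
```

The loss is zero once every dissimilar image is at least `margin` farther from the query than its true match, which is the separation property described above.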

Semantic Information Embedding. In the clothing domain, attributes often refer to the specific description of certain parts (e.g., collar shape, sleeve length) or clothing (e.g., clothes color, clothes style). Complementary to the visual appearance, this information can be used to form a powerful semantic representation for the clothing retrieval problem. [...] structure layers to comprehensively capture the information of attributes and their full relations.

Specifically, we transmit the FC2 response of each sub-network [...] fully-connected network to model each attribute separately. In this tree-structured network, the visual features from the low-level layers are shared among attributes, while the semantic features from the high-level layers are learned separately. The neuron number in the output layer of each branch equals the number of corresponding attribute values (see Table 1). Since each attribute has a single value, the cross-entropy loss is used in each branch. Note that the values of some attributes may be missing for some clothing images. In this case, the gradients from the corresponding branches are simply set to zero.
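The branch structure just described can be sketched as follows: one output layer per attribute on top of a shared feature, a cross-entropy loss per branch, and no contribution from branches whose label is missing. The output sizes come from Table 1. The identifier names, the single linear layer per branch, and the weight initialization are assumptions of this sketch, since the actual branch depth is not given in this section.

```python
import numpy as np

# Number of values per attribute category, taken from Table 1.
# The key names are hypothetical identifiers chosen for this example.
NUM_VALUES = {
    "clothes_button": 12, "clothes_category": 20, "clothes_color": 56,
    "clothes_length": 6, "clothes_pattern": 27, "clothes_shape": 10,
    "collar_shape": 25, "sleeve_length": 7, "sleeve_shape": 16,
}

def init_branches(feature_dim, rng):
    """One linear output layer per attribute (assumed single-layer branch)."""
    return {name: rng.normal(0.0, 0.01, size=(feature_dim, n))
            for name, n in NUM_VALUES.items()}

def attribute_loss(shared_feat, branches, labels):
    """Sum of per-branch cross-entropy losses for one image.

    shared_feat: (D,) feature shared by all branches (e.g. an FC2 response)
    labels:      dict attribute -> class index, or None when missing
    """
    total = 0.0
    for name, weights in branches.items():
        target = labels.get(name)
        if target is None:
            # Missing attribute: no loss term, hence zero gradient
            # flows back from this branch.
            continue
        logits = shared_feat @ weights
        logits = logits - logits.max()             # numerical stability
        log_probs = logits - np.log(np.sum(np.exp(logits)))
        total += -log_probs[target]
    return float(total)
```

Skipping a branch in the sum has the same effect as setting its gradient to zero, which is how missing attribute values are handled in the text.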
[Figure 4. Structure of DARN. The figure shows two sub-networks, each taking a 3×227×227 input and labelled with the layers Conv1 (7×7×3×96), Conv2 (5×5×96×256), Conv3 (3×3×256×512), Conv4 (3×3×512×1024) and Conv5 (3×3×384×512), with 3×3, 3×3, 3×3, 2×2 and 5×5 max pooling and two 4096-dimensional layers.]
