arXiv:1807.06653v2 [cs.CV] 21 Jul 2018
Invariant Information Clustering for Unsupervised Image Classification and Segmentation

Xu Ji, University of Oxford, xuji@robots.ox.ac.uk
João F. Henriques, University of Oxford, joao@robots.ox.ac.uk
Andrea Vedaldi, University of Oxford, vedaldi@robots.ox.ac.uk

Abstract
We present a novel clustering objective that learns a neural network classifier from scratch, given only unlabelled data samples. The model discovers clusters that accurately match semantic classes, achieving state-of-the-art results in eight unsupervised clustering benchmarks spanning image classification and segmentation. These include STL10, an unsupervised variant of ImageNet, and CIFAR10, where we significantly beat the accuracy of our closest competitors by 6.6 and 9.5 absolute percentage points respectively. The method is not specialised to computer vision and operates on any paired dataset samples; in our experiments we use random transforms to obtain a pair from each image. The trained network directly outputs semantic labels, rather than high dimensional representations that need external processing to be usable for semantic clustering. The objective is simply to maximise mutual information between the class assignments of each pair. It is easy to implement and rigorously grounded in information theory, meaning we effortlessly avoid degenerate solutions that other clustering methods are susceptible to. In addition to the fully unsupervised mode, we also test two semi-supervised settings. The first achieves 88.8% accuracy on STL10 classification, setting a new global state-of-the-art over all existing methods (whether supervised, semi-supervised or unsupervised). The second shows robustness to 90% reductions in label coverage, of relevance to applications that wish to make use of small amounts of labels. github.com/xu-ji/IIC

1. Introduction
Most supervised deep learning methods require large quantities of manually labelled data, limiting their applicability in many scenarios. This is true for large-scale image classification and even more for segmentation (pixel-wise classification), where the annotation cost per image is very high [38, 21]. Unsupervised clustering, on the other hand, aims to group data points into classes entirely without labels [25].

Figure 1: Models trained with IIC on entirely unlabelled data learn to cluster images (top, STL10) and patches (bottom, Potsdam-3). The raw clusters found directly correspond to semantic classes (dogs, cats, trucks, roads, vegetation etc.) with state-of-the-art accuracy. Training is end-to-end and randomly initialised, with no heuristics used at any stage.

Many authors have sought to combine mature clustering algorithms with deep learning, for example by bootstrapping network training with k-means style objectives [51, 24, 7]. However, trivially combining clustering and representation learning methods often leads to degenerate solutions [7, 51]. It is precisely to prevent such degeneracy that cumbersome pipelines involving pre-training, feature post-processing (whitening or PCA) and clustering mechanisms external to the network have evolved [7, 17, 18, 51].
In this paper, we introduce Invariant Information Clustering (IIC), a method that addresses this issue in a more principled manner. IIC is a generic clustering algorithm that directly trains a randomly initialised neural network into a classification function, end-to-end and without any labels. It involves a simple objective function, which is the mutual information between the function's classifications for paired data samples. The input data can be of any modality and, since the clustering space is discrete, mutual information can be computed exactly.
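Because the assignments are discrete, the mutual information between the paired classifications can be computed in closed form from a batch of softmax outputs, using the decomposition I(z, z') = H(z) − H(z|z'). The following NumPy sketch shows one way such a computation can be written; the batch estimate of the joint distribution and its symmetrisation follow the idea described in the text, but the helper name and the numerical clipping are our own illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def iic_mutual_information(p1, p2, eps=1e-10):
    """Exact mutual information between discrete cluster assignments.

    p1, p2: (n, C) arrays of softmax cluster probabilities for the two
    halves of each pair (e.g. an image and its random transform).
    Returns an estimate of I(z, z') = H(z) - H(z|z').
    """
    n, _ = p1.shape
    # Joint distribution over cluster pairs, estimated from the batch.
    P = p1.T @ p2 / n                  # (C, C)
    P = (P + P.T) / 2                  # symmetrise: pairs are unordered
    P = np.clip(P, eps, None)          # avoid log(0); illustrative choice
    P /= P.sum()                       # renormalise after clipping
    Pi = P.sum(axis=1, keepdims=True)  # marginal of z
    Pj = P.sum(axis=0, keepdims=True)  # marginal of z'
    return float((P * (np.log(P) - np.log(Pi) - np.log(Pj))).sum())
```

Note how the two failure modes discussed below fall out of this quantity: assigning every sample to one cluster collapses the marginals and drives the information to zero, while confident, balanced one-hot assignments that agree across each pair maximise it (at log C).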
Despite its simplicity, IIC is intrinsically robust to two issues that affect other methods. The first is clustering degeneracy, which is the tendency for a single cluster to dominate the predictions or for clusters to disappear (which can be observed with k-means, especially when combined with representation learning [7]). Due to the entropy maximisation component within mutual information, the loss is not minimised if all images are assigned to the same class. At the same time, it is optimal for the model to predict for each image a single class with certainty (i.e. one-hot) due to the conditional entropy minimisation (fig. 3). The second issue is noisy data with unknown or distractor classes (present in STL10 [10] for example). IIC addresses this issue by employing an auxiliary output layer that is parallel to the main output layer, trained to produce an overclustering (i.e. the same loss function but a greater number of clusters than the ground truth) that is ignored at test time. Auxiliary overclustering is a general technique that could be useful for other algorithms. These two features of IIC contribute to making it the only method amongst our unsupervised baselines that is robust enough to make use of the noisy unlabelled subset of STL10, a version of ImageNet [14] specifically designed as a benchmark for unsupervised clustering.

In the rest of the paper, we begin by explaining the difference between semantic clustering and intermediate representation learning (section 2), which separates our method
from the majority of work in unsupervised deep learning. We then describe the theoretical foundations of IIC in statistical learning (section 3), demonstrating that maximising mutual information of paired data under a bottleneck is a principled clustering objective which is equivalent to distilling their shared abstract content (co-clustering). We propose that for static images, an easy way to generate pairs with shared abstract content from unlabelled data is to take each image and its random transformation, or each patch and a neighbour. We show that maximising MI automatically avoids degenerate solutions and can be written as a convolution in the case of segmentation, allowing for efficient implementation with any deep learning library.

We perform experiments on a large number of datasets (section 4) including STL, CIFAR, MNIST,
COCO-Stuff and Potsdam, setting a new state-of-the-art on unsupervised clustering and segmentation in all cases, with results of 59.6%, 61.7% and 72.3% on STL10, CIFAR10 and COCO-Stuff-3, beating the closest competitors (53.0%, 52.2%, 54.0%) by significant margins.

Figure 2: IIC for image clustering. Dashed line denotes shared parameters, g is a random transformation, and I denotes mutual information (eq. (3)).

Note that training deep neural networks to perform large scale, real-world segmentation without labels is a highly challenging task with negligible precedent. We also perform an ablation study and additionally test two semi-supervised modes, setting a new global state-of-the-art of 88.8% on STL10 over all supervised, semi-supervised and
unsupervised methods, and demonstrating the robustness in semi-supervised accuracy when 90% of labels are removed.

2. Related work
Co-clustering and mutual information. The use of information as a criterion to learn representations is not new. One of the earliest works to do so is by Becker and Hinton [3]. More generally, learning from paired data has been explored in co-clustering [25, 16] and in other works [50] that build on the information bottleneck principle [20]. Several recent papers have used information as a tool to train deep networks in particular. IMSAT [28] maximises mutual information between data and its representation, and DeepINFOMAX [27] maximises information between spatially-preserved features and compact features. However, both combine information with other criteria, whereas in our method information is the only criterion used. Furthermore, both IMSAT and DeepINFOMAX compute mutual information over continuous random variables, which requires complex estimators [4], whereas IIC does so for discrete variables with
simple and exact computations. Finally, DeepINFOMAX considers the information I(x, f(x)) between the features x and a deterministic function f(x) of it, which is in principle the same as the entropy H(x); in contrast, in IIC information does not trivially reduce to entropy.

Figure 3: Training with IIC on unlabelled MNIST in successive epochs from random initialisation (left). The network directly outputs cluster assignment probabilities for input images, and each is rendered as a coordinate by convex combination of 10 cluster vertices. There is no cherry-picking as the entire dataset is shown in every snapshot. Ground truth labelling (unseen by the model) is given by colour. At each cluster the average image of its assignees is shown. With neither labels nor heuristics, the clusters discovered by IIC correspond perfectly to unique digits, with one-hot certain prediction (right).

Semantic clustering versus intermediate representation learning. In semantic clustering, the learned function directly outputs discrete assignments for high level (i.e. semantic) clusters. Intermediate representation learners, on the other hand, produce continuous, distributed, high-dimensional representations that must be post-processed, for example by k-means, to obtain the discrete low-cardinality assignments required for unsupervised semantic clustering. The latter includes objectives such as generative autoencoder image reconstruction [48], triplets [46] and spatial-temporal order or context prediction [37, 12, 17], for example predicting patch proximity [30], solving jigsaw puzzles [41] and inpainting [43]. Note it also includes a number of clustering methods (DeepCluster [7], exemplars [18]) where the clustering is only auxiliary; a clustering-style objective is used but does not produce groups with semantic correspondence. For example, DeepCluster [7] is a state-of-the-art method for learning highly-transferable intermediate features using overclustering as a proxy task, but does not automatically find semantically meaningful clusters. As these methods use auxiliary objectives divorced from the semantic clustering objective, it is unsurprising that they perform worse than IIC (section 4), which directly optimises for it, training the network end-to-end with the final clusterer implicitly wrapped inside.

Optimising image-to-image distance. Many approaches
to deep clustering, whether semantic or auxiliary, utilise a distance function between input images that approximates a given grouping criterion. Agglomerative clustering [2] and partially ordered sets [1] of HOG features [13] have been used to group images, and exemplars [18] define a group as a set of random transformations applied to a single image. Note the latter does not scale easily, in particular to image segmentation, where a single 200×200 image would call for 40k classes. DAC [8], JULE [52], DeepCluster [7], ADC [24] and DEC [51] rely on the inherent visual consistency and disentangling properties [23] of CNNs to produce cluster assignments, which are processed and reinforced in each iteration. The latter three are based on k-means style mechanisms to refine feature centroids, which is prone to degenerate solutions.
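In contrast to the exemplar approach above, which turns every image into its own class, IIC needs only one random transformation per image to form a training pair (section 1). A minimal sketch of such pair generation follows; the specific transforms (flip, small shift, brightness jitter) are illustrative assumptions, not the paper's exact augmentation set, and any transform g to which the clusters should be invariant can be substituted:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_transform(img):
    """Apply a random content-preserving transform to a (H, W) image in [0, 1]."""
    out = img
    if rng.random() < 0.5:
        out = out[:, ::-1]                    # horizontal flip
    dy, dx = rng.integers(-2, 3, size=2)      # small random translation
    out = np.roll(out, (dy, dx), axis=(0, 1))
    out = np.clip(out * rng.uniform(0.8, 1.2), 0.0, 1.0)  # brightness jitter
    return out

def make_pair(img):
    """An unlabelled image and its random transform share abstract content."""
    return img, random_transform(img)
```

The two halves of each pair are then fed through the shared-parameter network of fig. 2, and the objective rewards cluster assignments that agree across the pair.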