Domain-Adaptive Single-View 3D Reconstruction


Pedro O. Pinheiro (Element AI)

Negar Rostamzadeh (Element AI)

Sungjin Ahn (Rutgers University)

Abstract

Single-view 3D shape reconstruction is an important but challenging problem, mainly for two reasons. First, as shape annotation is expensive to acquire, current methods rely on synthetic data, in which ground-truth 3D annotation is easy to obtain. However, this results in a domain adaptation problem when applied to natural images. The second challenge is that there are multiple shapes that can explain a given 2D image. In this paper, we propose a framework to improve over these challenges using adversarial training. On one hand, we impose domain confusion between natural and synthetic image representations to reduce the distribution gap. On the other hand, we impose the reconstruction to be 'realistic' by forcing it to lie on a (learned) manifold of realistic object shapes. Our experiments show that these constraints improve performance by a large margin over baseline reconstruction models. We achieve results competitive with the state of the art with a much simpler architecture.

1. Introduction

Humans can easily understand the underlying 3D structure of scenes and objects from single images. This is a hallmark of the human visual system and it is an essential step towards higher-level visual understanding. This is an extremely ill-posed problem because a single image does not contain enough information to allow 3D reconstruction. Therefore, a machine vision system needs to rely on priors over the shape to infer 3D structure.

Efficient and effective 3D prototyping plays an important role in many different fields, such as virtual/augmented reality, architecture, robotics and 3D printing, to name a few. Perhaps more importantly, studying 3D object representations could bring insights on how this information is encoded in intermediate and higher-level visual cortices [53, 26].

Traditional reconstruction methods rely on multiple images of the same object instance [28, 4, 6, 39, 14]. These methods possess two strong limitations due to some key assumptions [8]: (i) they require a large number of views to achieve reconstruction, and (ii) the objects' appearance is expected to be Lambertian (i.e., non-reflective) and their albedos are supposed to be non-uniform (i.e., rich in non-homogeneous textures).

Figure 1: We propose a framework for (natural) single-view 3D reconstruction exploiting adversarial training in two ways. These constraints are achieved with additional loss terms. We impose domain confusion between natural and rendered images (top) and exploit shape priors to force reconstructions to look realistic (bottom). [Diagram labels: Model, Domain Confusion, Shape prior.]

Another way to achieve 3D reconstruction is to leverage knowledge of the object's appearance and shape. The main advantage of relying on shape priors is that we do not need to rely on accurate feature correspondences across different views. In this case 3D reconstruction can, in principle, be done from a single-view 2D image (assuming the priors are rich enough).

Recently, there has been a growing interest in learning-based approaches to tackle the problem of predicting the canonical shape of an object from a single image [24, 8, 16, 41, 54, 22, 48, 33, 44, 47, 49, 55]. Two technical advances were responsible for this surge: (i) the easy access to large-scale 3D Computer-Aided Design (CAD) repositories, such as ShapeNet [7], Pascal3D+ [52], ObjectNet3D [51], Pix3D [40] and (ii) advances in deep learning techniques [17].

Most of these methods share a similar high-level architecture that regresses a 3D shape from (rendered) images: an encoder transforms a 2D image into a latent representation and a decoder reconstructs the 3D representation. They differ in how constraints from the 3D world are imposed, e.g., [8, 54, 44] force multi-view consistency to learn the 3D representation, while [47, 49] make use of 2.5D sketches. These approaches use a large number of CAD models to leverage shape priors (either making explicit use of 3D representation or not).

Single-view 3D reconstruction is a very ill-posed problem. In order to learn strong shape priors to infer 3D structure, deep learning methods require a large amount of 3D object annotations. However, acquiring good 3D object annotation from natural images is an extremely challenging task. For this reason, current methods make use of synthetic images (which can be rendered easily if a proper 3D representation is given).

Convolutional neural networks (CNNs) [29] are known to perform sub-optimally when the data distribution of inputs changes, a problem known in the computer vision literature as domain shift [43]. For this reason, CNN-based 3D reconstruction, trained on synthetic images, performs worse when applied to natural images.

In this paper, we introduce a method to improve the performance of reconstruction models on natural images, where proper 3D labels are very difficult to acquire. To achieve this goal, we impose two constraints on the network's reconstruction loss (expressed as additional loss terms) based on a shape prior learned from a large 3D CAD repository (see Figure 1).

First, inspired by the domain adaptation literature [9, 15], we force the encoded 2D features to be invariant with respect to the domain they come from (rendered or natural). This way, a decoder trained on synthetic images will naturally perform better on real images. Second, we constrain the encoded 2D features to lie in the manifold of realistic object shapes. This constraint forces the decoded 3D reconstruction to look more realistic. These two loss terms are characterized through adversarial training [18, 15], an active research topic.
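
The way the two adversarial terms could enter the training objective can be sketched as follows. This is a minimal illustration under assumptions that the text above does not state: each constraint is handled by a binary discriminator that outputs a probability, the losses are plain binary cross-entropy, and the names `discriminator_loss`, `confusion_loss`, `encoder_objective`, `lam_domain` and `lam_shape` are hypothetical. For the domain term, "real" would be features of rendered images and "fake" features of natural images. For the shape term, "real" would be codes of realistic shapes and "fake" the encoded image features.

```python
import math


def bce(p, label):
    """Binary cross-entropy for one predicted probability p and a 0/1 label."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def discriminator_loss(p_real, p_fake):
    """Loss minimized by a discriminator: score 'real' inputs 1, 'fake' inputs 0."""
    losses = [bce(p, 1) for p in p_real] + [bce(p, 0) for p in p_fake]
    return sum(losses) / len(losses)


def confusion_loss(p_fake):
    """Loss minimized by the encoder: make 'fake' inputs look 'real'."""
    return sum(bce(p, 1) for p in p_fake) / len(p_fake)


def encoder_objective(rec_loss, p_domain, p_shape, lam_domain=0.1, lam_shape=0.1):
    """Reconstruction loss plus the two adversarial terms (weights are assumed).

    p_domain: domain-discriminator outputs on natural-image features.
    p_shape:  shape-discriminator outputs on encoded image features.
    """
    return (rec_loss
            + lam_domain * confusion_loss(p_domain)
            + lam_shape * confusion_loss(p_shape))
```

In such a setup the discriminators and the encoder would be updated in alternation: the discriminators minimize `discriminator_loss`, while the encoder minimizes `encoder_objective`, which is smallest when neither discriminator can tell the two kinds of input apart.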

Our main contributions can be summarized as follows: (i) we propose a model and a loss function that exploit learned shape priors to improve performance of natural image 3D reconstructions (using adversarial training in two different ways), (ii) we show that this method boosts performance in both voxel and point cloud representations, and (iii) the proposed method achieves results competitive with the state of the art on different datasets, with a much simpler architecture. Moreover, the proposed approach is independent of the encoder-decoder architecture and can be applied to different single-view 3D reconstruction models.

The rest of the paper is organized as follows: Section 2 presents related work, Section 3 describes how we learn the shape prior and leverage it in two different ways for learning reconstruction, and Section 4 describes our experiments on different datasets. We conclude in Section 5.

2. Related Work

Single-view 3D reconstruction. Traditional reconstruction methods rely on multiple images of the same object instance to achieve reconstruction [28, 4, 6, 39, 14]. Recently, data-driven approaches to 3D reconstruction from a single image have appeared. These methods can roughly be divided into two types: (i) those that explicitly use 3D structures [16, 8, 48, 13, 19, 47, 50] and (ii) those that use other sources of information to infer the 3D structure [46, 24, 54, 22, 20, 6, 44, 55].

These approaches, based on deep learning techniques, usually share a similar (high-level) architecture: an encoder that maps 2D (rendered) images into a latent representation and a decoder that maps this representation into a 3D object. They tend to differ in the way 3D world constraints are imposed. For instance, [8, 54, 44, 20, 22, 27] force multi-view consistency to learn the 3D representation, while [46, 24, 23] leverage keypoints and silhouette annotations. Other approaches [47, 49] leverage 2.5D sketch (surface normals, depth and silhouette) information to improve prediction.

More recently, Zhang, Zhang et al. [56] consider spherical maps (in addition to 2.5D sketches) to learn 3D representations. Contrary to most work on single-view 3D reconstruction, the proposed method does not use canonical shape: every ground-truth 3D representation is on the same [...] look at reconstructing shapes for unseen classes; however, it does not deal with domain-adaptation issues.

Contrary to all these methods, our approach does not use any additional information besides RGB images. However, in addition to rendered images, we also use unlabeled natural images (which are easy to acquire). We note that our contributions are independent of the encoder and decoder architecture (as long as they are differentiable), and could be applied in many of these more powerful encoder-decoder architectures. In experiments, we show that our approach improves performance over two baselines: a simple voxel encoder-decoder architecture and AtlasNet [19], a state-of-the-art encoder-decoder architecture based on a point cloud representation.

Domain adaptation. The difficulty of acquiring 3D annotations for natural images forces reconstruction models to learn from rendered images. It is well known in the literature [43, 9] that the performance of a model drops if applied to data coming from a distribution different from the one used during training. Ganin et al. [15] deal with this
