Packing and Padding: Coupled Multi-index for Accurate Image Retrieval

Liang Zheng1, Shengjin Wang1, Ziqiong Liu1, and Qi Tian2

1State Key Laboratory of Intelligent Technology and Systems;

1Tsinghua National Laboratory for Information Science and Technology;

1Department of Electronic Engineering, Tsinghua University, Beijing 100084, China

2University of Texas at San Antonio, TX, 78249, USA

zheng-l06@mails.tsinghua.edu.cn wgsgj@tsinghua.edu.cn liuziqiong@ocrserv.ee.tsinghua.edu.cn qitian@cs.utsa.edu

Abstract

In Bag-of-Words (BoW) based image retrieval, the SIFT visual word has low discriminative power, so false positive matches occur prevalently. Apart from the information loss during quantization, another cause is that the SIFT feature only describes the local gradient distribution. To address this problem, this paper proposes a coupled Multi-Index (c-MI) framework to perform feature fusion at the indexing level. Basically, complementary features are coupled into a multi-dimensional inverted index. Each dimension of c-MI corresponds to one kind of feature, and the retrieval process votes for images similar in both SIFT and other feature spaces. Specifically, we exploit the fusion of a local color feature into c-MI. While the precision of visual matching is greatly enhanced, we adopt Multiple Assignment to improve recall. The joint cooperation of SIFT and color features significantly reduces the impact of false positive matches. Extensive experiments on several benchmark datasets demonstrate that c-MI improves the retrieval accuracy significantly, while consuming only half of the query time compared to the baseline. Importantly, we show that c-MI is well complementary to many prior techniques. Assembling these methods, we have obtained an mAP of 85.8% and an N-S score of 3.85 on the Holidays and Ukbench datasets, respectively, which compare favorably with the state of the art.

1. Introduction

This paper considers the task of near-duplicate image retrieval in large-scale databases. Specifically, given a query image, our goal is to find all images sharing a similar appearance in real time.

[Figure 1. Three examples of image retrieval from Ukbench (top and middle) and Holidays (bottom) datasets. For each query (left), results obtained by the baseline (first row) and by c-MI (second row) are shown. The retrieval results start from the second image in the rank list.]

Many state-of-the-art image retrieval systems rely on the Bag-of-Words (BoW) representation. In this model, local features such as the SIFT descriptor [10] are extracted and quantized to visual words using a pre-trained codebook. Typically, each visual word is weighted using the tf-idf scheme [20, 28]. Then, an inverted index is leveraged to reduce the computational burden and memory requirements, enabling fast online retrieval.

One crucial aspect of the BoW model concerns visual matching between images based on visual words. However, the reliance on the SIFT feature leads to an ignorance of other characteristics of an image, such as color.
This problem, together with the information loss during quantization, leads to many false positive matches and thus compromises the retrieval accuracy.

To enhance the discriminative power of SIFT visual words, we present a coupled Multi-Index (c-MI) framework to perform local feature fusion at the indexing level. To the best of our knowledge, it is the first time that a multi-dimensional inverted index is employed in the field of image retrieval. Particularly, this paper “couples” SIFT and color features into a multi-index [1], so that efficient yet effective image retrieval can be achieved. The final system of this paper consists of “packing” and “padding” modules.

In the “packing” step, we construct the coupled Multi-Index by taking each of the SIFT and color features as one dimension of the multi-index. Therefore, the multi-index becomes a joint cooperation of two heterogeneous features. Since each SIFT descriptor is coupled with a color feature, its discriminative power is greatly enhanced. On the other hand, to improve recall, Multiple Assignment (MA) is employed. Particularly, to make c-MI more robust to illumination changes, we adopt a large MA value on the side of the color feature. Fig. 1 presents three sample retrieval results of our method. We observe that c-MI improves the retrieval accuracy and returns some challenging results.

Moreover, in the “padding” step, we further incorporate some prior techniques to enhance retrieval performance. We show in the experiments that c-MI is well compatible with methods such as rootSIFT [17], Hamming Embedding [4], burstiness weighting [5], graph fusion [25], etc. As another major contribution, we have achieved new state-of-the-art results on the Holidays [4] and Ukbench [12] datasets. Namely, we obtained an mAP of 85.8% and an N-S score of 3.85 on Holidays and Ukbench, respectively.

The remainder of this paper is organized as follows. After an overview of related work in Section 2, we describe the “packing” of the c-MI framework in Section 3. In Section 4, the “padding” methods and results are presented and discussed.

Finally, we conclude in Section 5.

2. Related Work

In the image retrieval community, a myriad of works have been proposed to improve the accuracy of image retrieval. In this section, we provide a brief review of several closely related aspects.

Matching Refinement: In visual matching, a large codebook [14] typically means high precision but low recall, while constructing a small codebook (e.g., 20K) [6] guarantees high recall. To improve precision given high recall, some works explore contextual cues of visual words, such as spatial information [14, 19, 31, 22, 2, 27]. To name a few, Shen et al. [19] perform image retrieval and localization simultaneously by a voting-based method. Alternatively, Wang et al. [22] weight visual matching based on the local spatial context similarity. Meanwhile, the precision of visual matching can also be improved by embedding binary features [4, 23, 32, 9]. Specifically, methods such as Hamming Embedding [4] rebuild the discriminative ability of visual words by projecting the SIFT descriptor into binary features. Then, an efficient xor operation between binary signatures is employed, providing a further check of visual matching.

Feature Fusion: The fusion of multiple cues has been proven effective in many tasks [18, 30, 13]. Since the SIFT descriptor used in most image retrieval systems only describes the local gradient distribution, feature fusion can be performed to capture complementary information. For example, Wengert et al. [23] embed a local color feature into the inverted index to provide local color information. To perform feature fusion between global and local features, Zhang et al. [25] combine BoW and global features by graph fusion and maximizing weighted density, while co-indexing [26] expands the inverted index according to global attribute consistency.

Indexing Strategy: The inverted index [20] significantly promotes the efficiency of BoW based image retrieval. Motivated by the text retrieval framework, each entry in the inverted index stores information associated with each indexed feature, such as image IDs [14, 26], binary features [4, 23], etc. Recent state-of-the-art works include the joint inverted index [24], which jointly optimizes all visual words in all codebooks. The closest inspiring work to ours is the inverted multi-index [1], which addresses the NN search problem by “de-composing” the SIFT vector into different dimensions of the multi-index. Our work departs from [1] in two aspects. First, the problem considered in this paper consists in indexing-level feature fusion, applied to the task of large-scale image retrieval. Second, we actually “couple” different features into a multi-index, after which the “coupled Multi-Index (c-MI)” is named.

3. Proposed Approach

This section gives a formal description of the proposed c-MI framework.

3.1. Conventional Inverted Index Revisit

A majority of works in the BoW based image retrieval community employ a one-dimensional inverted index [28, 14, 12], in which each entry corresponds to a visual word defined in the codebook of the SIFT descriptor. Assume that a total of $N$ images are contained in an image database, denoted as $\mathcal{D} = \{I_i\}_{i=1}^{N}$. Each image $I_i$ has a set of local features $\{x_j\}_{j=1}^{d_i}$, where $d_i$ is the number of local features. Given a codebook $\{w_i\}_{i=1}^{K}$ of size $K$, a conventional 1-D inverted index is represented as $W = \{W_1, W_2, \ldots, W_K\}$. In $W$, each entry $W_i$ contains a list of indexed features, in which the image ID, TF score, or other metadata [4, 31, 22] are stored.

[Figure 2. Conventional 1-D inverted index. Only one kind of feature (typically the SIFT feature) is used to build the inverted index. Each visual word entry stores a list of indexed features with image ID, TF data, and other metadata.]

An example of the conventional inverted index is illustrated in Fig. 2. Given a query feature, an entry $W_i$ in the inverted index is identified after feature quantization. Then, the indexed features are taken as the candidate nearest neighbors of the query feature. In this scenario, the matching function $f_q(\cdot)$ of two local features $x$ and $y$ is defined as

$$f_q(x, y) = \delta_{q(x), q(y)}, \qquad (1)$$

where $q(\cdot)$ is the quantization function mapping a local feature to its nearest centroid in the codebook, and $\delta$ is the Kronecker delta response.

The 1-D inverted index votes for candidate images similar to the query in one feature space, typically the SIFT descriptor space. However, intensity-based features are unable to capture other characteristics of a local region. Moreover, due to quantization artifacts, the SIFT visual word is prone to producing false positive matches: local patches, similar or not, may be mapped to the same visual word. Therefore, it is undesirable to take the visual word as the only ticket to feature matching. While many previous works use spatial contexts [31, 22] or binary features [4] to filter out false matches, our work instead proposes to incorporate a local color feature to provide additional discriminative power via the coupled Multi-Index (c-MI).
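To make the mechanics of the conventional scheme concrete, the following is a minimal Python sketch, not the authors' implementation, of a 1-D inverted index built over a pre-trained codebook; the helper names and data layout are our own hypothetical choices, and tf-idf weighting is omitted.

```python
# Minimal 1-D inverted index sketch. The match function of Eq. 1 is implicit:
# a database feature votes for its image iff it quantizes to the same visual
# word as the query feature (Kronecker delta).
from collections import defaultdict

import numpy as np

def quantize(feature, codebook):
    """Return the id of the nearest centroid (codebook: (K, D) ndarray)."""
    return int(np.argmin(np.linalg.norm(codebook - feature, axis=1)))

def build_index(database, codebook):
    """database: iterable of (image_id, list_of_descriptors)."""
    index = defaultdict(list)          # visual word -> list of image ids
    for image_id, descriptors in database:
        for x in descriptors:
            index[quantize(x, codebook)].append(image_id)
    return index

def search(index, query_descriptors, codebook):
    """Vote for candidate images sharing visual words with the query."""
    votes = defaultdict(int)
    for x in query_descriptors:
        for image_id in index[quantize(x, codebook)]:
            votes[image_id] += 1       # f_q(x, y) = delta_{q(x), q(y)}
    return sorted(votes.items(), key=lambda kv: -kv[1])
```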

3.2. Feature Extraction and Quantization

This paper considers the coupling of SIFT and color features. The primary reason is that feature fusion works better for features with low correlation, such as SIFT and color. In feature matching, complementary information may be of vital importance. For example, given two keypoints quantized to the same SIFT visual word, if the coupled color features are largely different, they may be considered a false match (see Fig. 3 for an illustration). To this end, SIFT and color features are extracted and subsequently quantized as follows.

SIFT extraction: Scale-invariant keypoints are detected with detectors, e.g., DoG [10], Hessian-affine [14], etc. Then, a 16 × 16 patch around each keypoint is considered, from which a 128-dimensional SIFT vector is calculated.

[Figure 3. An example of a visual match. Top: a matched SIFT pair between two images; the Hamming distance between their 64-D SIFT Hamming signatures is 12. The 11-D Color Names descriptors of the two keypoints in the left (middle) and right (bottom) images are shown below, together with the prototypes of the 11 basic colors (colored discs). In this example, the two local features are considered a good match both by visual word equality and by Hamming distance consistency. However, they differ greatly in color space, and are thus considered a false positive match in c-MI.]

Color extraction: We employ the Color Names (CN) descriptor [18]. CN assigns an 11-D vector to each pixel, in which each entry encodes one of the eleven basic colors: black, blue, brown, grey, green, orange, pink, purple, red, white, and yellow. Around each detected keypoint, we consider a local patch with an area proportional to the scale of the keypoint. Then, the CN vectors of all pixels in this area are calculated. We take the mean CN vector as the color descriptor coupling SIFT for the current keypoint.

Quantization: For SIFT and CN descriptors, we use the conventional quantization scheme as in [14]. Codebooks are trained using independent SIFT and CN descriptors, respectively. Each descriptor is quantized to the nearest centroid in the corresponding codebook by an Approximate Nearest Neighbor (ANN) algorithm. To improve recall, Multiple Assignment (MA) is applied. Particularly, to deal with illumination variations, MA is set large for the CN feature.

Binary signature calculation: In order to reduce quantization error, we calculate binary signatures from the original descriptors. For a SIFT descriptor, we follow the method proposed in [4], resulting in a 64-D binary signature. Nevertheless, on the side of the color feature, since each dimension of the CN descriptor has an explicit semantic meaning, we employ the binarization scheme introduced in [32]. Specifically, given a CN descriptor represented as $(f_1, f_2, \ldots, f_{11})^T$, a 22-bit binary feature $b$ can be produced as follows:

$$(b_i, b_{i+11}) = \begin{cases} (1, 1), & \text{if } f_i > \hat{th}_1, \\ (1, 0), & \text{if } \hat{th}_2 < f_i \le \hat{th}_1, \\ (0, 0), & \text{if } f_i \le \hat{th}_2, \end{cases} \qquad (2)$$

where $b_i$ ($i = 1, 2, \ldots, 11$) is the $i$-th entry of the resulting binary feature $b$. The thresholds are $\hat{th}_1 = g_2$ and $\hat{th}_2 = g_5$, where $(g_1, g_2, \ldots, g_{11})^T$ is the vector $(f_1, f_2, \ldots, f_{11})^T$ sorted in descending order.
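As an illustration, the following Python sketch implements the mean-pooled CN descriptor and the 22-bit binarization of Eq. 2; the function names are hypothetical, and the per-pixel CN assignment itself is assumed to come from [18].

```python
# Sketch of the 22-bit CN binarization of Eq. 2: each of the 11 dimensions
# contributes the bit pair (b_i, b_{i+11}) using the rank-based thresholds
# th1 = g_2 and th2 = g_5 of the descriptor sorted in descending order.
import numpy as np

def mean_cn_descriptor(cn_per_pixel):
    """cn_per_pixel: (num_pixels, 11) CN vectors of the patch at a keypoint."""
    return np.asarray(cn_per_pixel, dtype=float).mean(axis=0)

def cn_binary_signature(f):
    """f: 11-D CN descriptor -> 22-bit binary signature."""
    f = np.asarray(f, dtype=float)
    g = np.sort(f)[::-1]                 # sorted in descending order
    th1, th2 = g[1], g[4]                # th1 = g_2, th2 = g_5 (1-based)
    b = np.zeros(22, dtype=np.uint8)
    for i, fi in enumerate(f):
        if fi > th1:
            b[i], b[i + 11] = 1, 1       # (1, 1): dominant color
        elif fi > th2:
            b[i], b[i + 11] = 1, 0       # (1, 0): th2 < f_i <= th1
        # else (0, 0): f_i <= th2, bits already zero
    return b
```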

3.3. Coupled Multi-Index

Structure of c-MI: In [1], the 128-D SIFT descriptor is decomposed into blocks, each quantized with its own codebook by product quantization [7]. The multi-index is thus organized around the codebooks of the corresponding blocks. Their approach enables more accurate nearest neighbor (NN) search for SIFT features. In our work, however, we consider the task of image retrieval, which differs from pure NN search. Moreover, contrary to [1], we couple different features into a multi-index, so that feature fusion is performed at the indexing level. In this paper, we consider the 2-D inverted index, which is also called second-order in [1].

Let $\tilde{x} = [x_s; x_c] \in \mathbb{R}^{D_s + D_c}$ be a coupled feature descriptor at keypoint $p$, where $x_s \in \mathbb{R}^{D_s}$ and $x_c \in \mathbb{R}^{D_c}$ are the SIFT and color descriptors of dimensions $D_s$ and $D_c$, respectively. For c-MI, two codebooks are trained, one for each feature. Specifically, for the SIFT and color descriptors, codebooks $U = \{u_i\}_{i=1}^{K_s}$ and $V = \{v_j\}_{j=1}^{K_c}$ are generated, where $K_s$ and $K_c$ are the codebook sizes. As a consequence, c-MI consists of $K_s \times K_c$ entries, denoted as $W = \{W_{11}, W_{12}, \ldots, W_{ij}, \ldots, W_{K_s K_c}\}$, $i = 1, \ldots, K_s$, $j = 1, \ldots, K_c$, as illustrated in Fig. 4. When building the multi-index, all feature tuples $\tilde{x} = [x_s; x_c]$ are quantized into visual word pairs $(u_i, v_j)$, $i = 1, \ldots, K_s$, $j = 1, \ldots, K_c$, using codebooks $U$ and $V$, so that $u_i$ and $v_j$ are the nearest centroids to features $x_s$ and $x_c$ in codebooks $U$ and $V$, respectively. Then, in the entry $W_{ij}$, the information (e.g., image ID, CN binary signature, and other metadata) associated with the current feature tuple $\tilde{x}$ is stored contiguously in memory.

[Figure 4. Structure of c-MI. The codebook sizes are $K_s$ and $K_c$ for the SIFT and color features, respectively. During online retrieval, the entry of word tuple $(u_i, v_j)$ is checked; each indexed feature stores the image ID, CN binary signature, and other metadata.]
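A minimal sketch of this “packing” step under the definitions above; quantize is the nearest-centroid helper from the Section 3.1 sketch, and the data layout is a hypothetical choice.

```python
# "Packing": every feature tuple (SIFT, CN) is quantized to a word pair
# (u_i, v_j), and its metadata is stored in the entry W_ij of the 2-D index.
from collections import defaultdict

import numpy as np

def quantize(feature, codebook):
    """Nearest-centroid assignment (same helper as the Section 3.1 sketch)."""
    return int(np.argmin(np.linalg.norm(codebook - feature, axis=1)))

def build_cmi(database, sift_codebook, cn_codebook):
    """database: iterable of (image_id, [(sift_vec, cn_vec, cn_signature)])."""
    cmi = defaultdict(list)                      # (i, j) -> posting list
    for image_id, tuples in database:
        for x_s, x_c, signature in tuples:
            i = quantize(x_s, sift_codebook)     # SIFT dimension of the index
            j = quantize(x_c, cn_codebook)       # color dimension of the index
            cmi[(i, j)].append((image_id, signature))
    return cmi
```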

Querying c-MI: Given a query feature tuple $\tilde{x} = [x_s; x_c]$, we first quantize it into a visual word pair $(u_i, v_j)$ as in the offline phase. Then, the corresponding entry $W_{ij}$ in c-MI is identified, and the list of indexed features is taken as the candidate images, similar to the classic inverted index described in Section 3.1. In essence, the matching function $f'_{q_s, q_c}(\cdot)$ of two local feature tuples $\tilde{x} = [x_s; x_c]$ and $\tilde{y} = [y_s; y_c]$ is written as

$$f'_{q_s, q_c}(\tilde{x}, \tilde{y}) = \delta_{q_s(x_s), q_s(y_s)} \cdot \delta_{q_c(x_c), q_c(y_c)}, \qquad (3)$$

where $q_s(\cdot)$ and $q_c(\cdot)$ are the quantization functions for the SIFT and CN features, respectively, and $\delta$ is the Kronecker delta response as in Eq. 1. As a consequence, a local match is valid only if the two feature tuples are similar in both the SIFT and color feature spaces.
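Continuing the previous sketch, querying realizes the double Kronecker delta of Eq. 3: only features falling in the same entry $W_{ij}$ can match. A single assignment is shown; Multiple Assignment would probe several word pairs per query feature.

```python
# Querying c-MI (Eq. 3): a candidate must agree with the query in BOTH the
# SIFT and the color codebooks, i.e. fall in the same entry W_ij. Uses
# quantize() and the cmi structure from the previous sketch.
from collections import defaultdict

def query_cmi(cmi, query_tuples, sift_codebook, cn_codebook):
    votes = defaultdict(float)
    for x_s, x_c, _signature in query_tuples:
        i = quantize(x_s, sift_codebook)
        j = quantize(x_c, cn_codebook)
        for image_id, _db_signature in cmi[(i, j)]:
            votes[image_id] += 1.0               # both deltas equal 1 here
    return sorted(votes.items(), key=lambda kv: -kv[1])
```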

Moreover, the Inverse Document Frequency (IDF) scheme can be applied in the multi-index directly. Specifically, the IDF value of entry $W_{ij}$ is defined as

$$\mathrm{idf}(i, j) = \frac{N}{n_{ij}}, \qquad (4)$$

where $N$ is the total number of images in the database, and $n_{ij}$ encodes the number of images containing the visual word pair $(u_i, v_j)$. Furthermore, the $\ell_2$ normalization can also be adopted in the 2-D case. Let an image $I$ be represented as a 2-D histogram $\{h_{i,j}\}$, $i = 1, \ldots, K_s$, $j = 1, \ldots, K_c$, where $h_{i,j}$ is the term frequency (TF) of visual word pair $(u_i, v_j)$ in image $I$; the $\ell_2$ norm is calculated as

$$\|I\|_2 = \left( \sum_{i=1}^{K_s} \sum_{j=1}^{K_c} h_{i,j}^2 \right)^{\frac{1}{2}}. \qquad (5)$$

Since our multi-index structure mainly works by achieving high precision, we employ Multiple Assignment (MA) to improve recall. To address illumination variations, we set a relatively large MA value on the side of the color feature. In our experiments, we find that $\ell_2$-normalization produces slightly higher performance than Eq. 5, which is probably due to the asymmetric structure of the coupled multi-index.
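For concreteness, the 2-D idf of Eq. 4 and the norm of Eq. 5 amount to the following small sketch; the helper names and the dictionary representation of the TF histogram are hypothetical.

```python
# Eq. 4: idf of entry W_ij; Eq. 5: l2 norm of an image's 2-D TF histogram.
import math

def idf(total_images, n_ij):
    return total_images / n_ij       # n_ij: images containing word pair (i, j)

def image_norm(tf_histogram):
    """tf_histogram: dict mapping (i, j) -> h_ij for one image."""
    return math.sqrt(sum(h * h for h in tf_histogram.values()))
```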

Furthermore, to enhance the discriminative power of CN visual words, we incorporate color Hamming Embedding (HE$_c$) into c-MI. Two feature tuples are considered a match iff Eq. 3 is satisfied and the Hamming distance $d_b$ between their binary signatures is below a pre-defined threshold $\kappa$. The matching strength is defined as $\exp(-d_b^2 / \sigma^2)$. Therefore, the matching function in Eq. 3 is updated as

$$f_{q_s, q_c}(\tilde{x}, \tilde{y}) = \begin{cases} f'_{q_s, q_c}(\tilde{x}, \tilde{y}) \cdot \exp\left(-\frac{d_b^2}{\sigma^2}\right), & d_b < \kappa, \\ 0, & \text{otherwise}. \end{cases} \qquad (6)$$

Then, in the framework of c-MI, the similarity score $\mathrm{sim}(Q, I)$ between a database image $I$ and the query image $Q$ is defined as the sum of the matching strengths of Eq. 6 over all matched feature tuples, weighted by the idf of Eq. 4 and normalized by the norms of $Q$ and $I$ as in Eq. 5.
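The “padded” match of Eq. 6 can be sketched as follows; the threshold $\kappa$ and bandwidth $\sigma$ are free parameters whose values are not specified in the text above, so the defaults below are placeholders only.

```python
# Eq. 6: gate a c-MI match by the Hamming distance between CN binary
# signatures and weight it by a Gaussian of that distance.
import numpy as np

def match_strength(sig_x, sig_y, kappa=10, sigma=8.0):
    """sig_x, sig_y: 22-bit CN signatures of two features in the same entry
    W_ij (so f' of Eq. 3 is already 1). kappa, sigma: placeholder values."""
    d_b = int(np.count_nonzero(np.asarray(sig_x) != np.asarray(sig_y)))
    if d_b < kappa:
        return float(np.exp(-(d_b ** 2) / sigma ** 2))
    return 0.0
```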

4. Experiments

In this section, we evaluate the proposed method on five publicly available datasets: Ukbench [12], Holidays [4], DupImage [31], Mobile [22], and MIR Flickr 1M [11].

4.1. Datasets

Ukbench: A total of 10,200 images are contained in this dataset, divided into 2,550 groups. Each image is taken as the query in turn. Performance is measured by the average recall of the top four ranked images, referred to as the N-S score (maximum 4).

Holidays: This dataset consists of 1,491 images from personal holiday photos. There are 500 queries, most of which have 1-2 ground truth images. mAP (mean average precision) is employed to measure retrieval accuracy.

DupImage: This dataset is composed of 1,104 images divided into 33 groups of partial-duplicate images. 108 images are selected as queries, and mAP is again used as the accuracy measurement.