Histogram-based Image Retrieval

As introduced by Swain and Ballard ([SB91]), 3-dimensional color histograms can be applied for specific object recognition. We consider the image-retrieval case: given a large database of images and a novel query image, the task is to determine the database image which is most similar to the query image. The images in the database as well as the query image are represented by their 3-dimensional color histograms. Hence, searching for the most similar image in the database amounts to finding the 3-d color histogram which best matches the 3-d color histogram of the query image.
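The following minimal sketch illustrates this principle. The helper names color_histogram and retrieve as well as the file names are hypothetical; the demo below computes the histograms with OpenCV and compares several distance measures instead of the plain Euclidean distance used here.

import numpy as np
from PIL import Image

def color_histogram(filename, bins=5):
    # flattened, L1-normalized bins x bins x bins RGB histogram of one image
    pixels = np.array(Image.open(filename).convert("RGB")).reshape(-1, 3)
    h, _ = np.histogramdd(pixels, bins=(bins, bins, bins),
                          range=((0, 256), (0, 256), (0, 256)))
    return (h / h.sum()).flatten()

def retrieve(query_file, database_files):
    # index of the database image whose histogram is closest to the query histogram
    q = color_histogram(query_file)
    dists = [np.linalg.norm(q - color_histogram(f)) for f in database_files]
    return int(np.argmin(dists))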

In this demo the Another 53 objects database is applied. The database contains 5 images for each of the 53 objects; the images differ in viewpoint, rotation, and the vertical and horizontal alignment of the object. The dataset actually contains one object for which only a single image exists; this image (image215) has been removed for the test below. A sample of the image database is shown in the picture below.

Test Procedure

In order to evaluate the recognition performance of the 3-d color histogram image retrieval system, a leave-one-out test procedure is applied. This means that in each of the 265 iterations one image is selected as the query image and the remaining 264 images constitute the database. The recognition is considered correct if the best matching image contains the same object as the query image.
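A minimal sketch of this test procedure is given below. It assumes a precomputed symmetric distance matrix dist with zero diagonal and 5 consecutive images per object, as in this database; the function name leave_one_out_accuracy is hypothetical, and the actual evaluation loop used in the demo follows further below.

import numpy as np

def leave_one_out_accuracy(dist, images_per_object=5):
    n = dist.shape[0]
    correct = 0
    for i in range(n):
        order = np.argsort(dist[i])   # image indices sorted by increasing distance to image i
        best_match = order[1]         # order[0] is the query image itself
        if best_match // images_per_object == i // images_per_object:
            correct += 1              # best match shows the same object
    return correct / n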

Recognition performance depends on the metric used for comparing multidimensional histograms. In this demo the following distance measures are compared:

  • Euclidean

  • Pearson Correlation

  • Canberra

  • Bray-Curtis

  • Intersection

  • Bhattacharyya

  • Chi-Square

For a formal definition of these metrics see the previous section.

Implementation

The following modules are imported:

import os
import numpy as np
from PIL import Image
import cv2
from scipy.spatial.distance import euclidean, correlation, canberra, braycurtis
from scipy.stats import wasserstein_distance

Distance functions

The SciPy module scipy.spatial.distance provides functions for calculating the Euclidean, Pearson-correlation, Canberra and Bray-Curtis distances. The remaining distance measures, as well as a corresponding normalization function, are defined in this Python program.
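For reference, the three manually implemented measures below correspond to the following standard definitions of the intersection-based distance, the Hellinger distance and the chi-square distance, respectively (with \(\hat{a}\), \(\hat{b}\) denoting the L1-normalized histograms):

\[ d_{\cap}(a,b) = \frac{1}{1 + \sum_i \min(a_i, b_i)}, \qquad
   d_{H}(a,b) = \sqrt{1 - \sum_i \sqrt{\hat{a}_i \hat{b}_i}}, \qquad
   d_{\chi^2}(a,b) = \sum_{i:\,a_i+b_i>0} \frac{(a_i - b_i)^2}{a_i + b_i} \]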

def normalize(a):
    # scale the histogram so that its entries sum to 1
    s=np.sum(a)
    return a/s

def mindist(a,b):
    # intersection-based distance: large histogram overlap -> small distance
    return 1.0/(1+np.sum(np.minimum(a,b)))

def bhattacharyya(a,b): # this is actually the Hellinger distance,
                        # which is a modification of Bhattacharyya
                        # Implementation according to http://en.wikipedia.org/wiki/Bhattacharyya_distance
    anorm=normalize(a)
    bnorm=normalize(b)
    BC=np.sum(np.sqrt(anorm*bnorm)) # Bhattacharyya coefficient

    if BC > 1: # guard against numerical round-off pushing BC slightly above 1
        #print(BC)
        return 0
    else:
        return np.sqrt(1-BC)

def chi2(a,b):
    # chi-square distance; bins that are empty in both histograms are skipped
    idx=np.nonzero(a+b)
    af=a[idx]
    bf=b[idx]
    return np.sum((af-bf)**2/(af+bf))

The applied metrics are collected in a list of functions and a corresponding list of name strings:

methods=[euclidean,correlation,canberra,braycurtis,mindist,bhattacharyya,chi2,wasserstein_distance]
methodName=["euclidean","correlation","canberra","braycurtis","intersection","bhattacharyya","chi2","emd"]

Next, the number of bins per dimension B is defined. The 3-dimensional color histogram then consists of \(B\cdot B\cdot B\) bins. The images are imported from the locally saved image database, which contains the 265 .JPG images. The 3-dimensional color histogram of each image is calculated with the OpenCV function cv2.calcHist. The histograms are normalized and flattened (transformed into a 1-dimensional array).

B=5
# create a list of images
path = '../Data/66obj/images'
imlist = [os.path.join(path,f) for f in os.listdir(path) if f.endswith(".JPG")]
imlist.sort()
#print(imlist)
features = np.zeros([len(imlist), B*B*B])
hists=[]
for i,f in enumerate(imlist):
    im = np.array(Image.open(f))
    # multi-dimensional histogram
    h=cv2.calcHist([im],[0,1,2],None,[B,B,B],[0,256,0,256,0,256])
    h=normalize(h) #normalization
    features[i] = h.flatten()
    hists.append(h)
print(features.shape)
N_IM=features.shape[0]
(265, 125)
import pandas as pd
resultDF=pd.DataFrame(index=methodName,columns=["Correct", "Error"])
resultDF
Correct Error
euclidean NaN NaN
correlation NaN NaN
canberra NaN NaN
braycurtis NaN NaN
intersection NaN NaN
bhattacharyya NaN NaN
chi2 NaN NaN
emd NaN NaN

The following outer loop iterates over all distance metrics defined in the list methods. For each metric, the distance between all pairs of images is calculated first and stored in the 2-dimensional array dist: the entry in row i, column j of this matrix is the distance between the histograms of the i-th and the j-th image. Applying NumPy's argsort() method to each row of dist yields another 2-dimensional array s. The i-th row of s contains the indices of all images, sorted with respect to increasing distance to image i, i.e. the first column of the i-th row of s holds the index i itself, and the second column holds the index of the image which matches image i best. In order to determine whether the recognition is correct, one must just check whether the index in the second column of s belongs to the same object as the index of the corresponding row. This can be done by a simple integer division, since the images are indexed such that indices 0 to 4 belong to the same object, indices 5 to 9 belong to the next object, and so on. The results are saved in the pandas dataframe.

values = np.arange(0,features.shape[1]) # bin positions, needed only for the Wasserstein (EMD) distance
for j,m in enumerate(methods):
    dist=np.zeros((N_IM,N_IM))
    for h1 in range(N_IM):
        for h2 in range(h1,N_IM):
            if j == len(methods)-1:
                # wasserstein_distance expects the bin positions and the two histograms as weights
                dist[h1,h2]=wasserstein_distance(values,values,features[h1,:],features[h2,:])
            else:
                dist[h1,h2]=m(features[h1,:],features[h2,:])
            dist[h2,h1]=dist[h1,h2] # the distance matrix is symmetric
    s=np.argsort(dist,axis=1) # row i: image indices sorted by increasing distance to image i
    correct=0
    error=0
    for i in range(N_IM):
        for a in s[i,1:2]: # best match only (column 0 is the query image itself)
            if int(a/5) != int(i/5): # 5 images per object => integer division yields the object index
                error+=1
            else:
                correct+=1
    resultDF.loc[methodName[j],"Correct"]=correct
    resultDF.loc[methodName[j],"Error"]=error
resultDF
Correct Error
euclidean 181 84
correlation 175 90
canberra 257 8
braycurtis 222 43
intersection 222 43
bhattacharyya 243 22
chi2 239 26
emd 159 106

Use pretrained CNNs for Retrieval

Convolutional Neural Networks (CNNs) consist of a feature-extractor part and a classifier. The last part of the CNN (blue part in the image below) is the classifier, consisting of one or more dense layers. All other layers (orange part in the image below) constitute the feature-extractor part, usually realised by convolutional and pooling layers. In a well-trained CNN, the feature-extractor part provides a good representation of the input image to the classifier.

In the context of image retrieval one can apply the pretrained feature-extractor part to calculate a good representation of all images - hopefully a representation that is better than e.g. a global color histogram. Instead of comparing color histograms of images in order to find the best matching image, one can compare these CNN-generated features for image retrieval.
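As a minimal sketch (with a hypothetical file name), a single feature vector can be obtained from the pretrained VGG16 feature extractor as follows; the complete batch pipeline for all 265 images and four different networks is given below.

import numpy as np
from tensorflow.keras.applications import vgg16
from tensorflow.keras.preprocessing.image import load_img, img_to_array

feature_extractor = vgg16.VGG16(weights='imagenet', include_top=False)  # classifier part removed
img = img_to_array(load_img('some_image.jpg', target_size=(224, 224)))  # hypothetical file name
batch = vgg16.preprocess_input(np.expand_dims(img, axis=0))             # shape (1, 224, 224, 3)
feature_vector = feature_extractor.predict(batch).flatten()             # one descriptor per image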

In the remainder of this notebook we apply different pretrained deep neural networks to calculate good representations. Based on these representations, the best matching image pairs are determined in the same way as above. All CNNs have been pretrained on the ImageNet dataset.

from tensorflow.keras.applications import vgg16, inception_v3, resnet50, mobilenet
from tensorflow.keras.preprocessing.image import load_img
from tensorflow.keras.preprocessing.image import img_to_array
#from tensorflow.keras.applications.imagenet_utils import decode_predictions
import matplotlib.pyplot as plt

Load pretrained CNNs without the classifier part, i.e. only the feature-extractor part:

vgg_model = vgg16.VGG16(weights='imagenet',include_top=False)
inception_model = inception_v3.InceptionV3(weights='imagenet',include_top=False)
resnet_model = resnet50.ResNet50(weights='imagenet',include_top=False)
mobilenet_model = mobilenet.MobileNet(weights='imagenet',include_top=False)
WARNING:tensorflow:`input_shape` is undefined or non-square, or `rows` is not in [128, 160, 192, 224]. Weights for input shape (224, 224) will be loaded as the default.
IMGSIZE_X=224
IMGSIZE_Y=224
image_batch=np.zeros((len(imlist),IMGSIZE_X,IMGSIZE_Y,3))
for i,filename in enumerate(imlist):
    #im = np.array(Image.open(f))
    original = load_img(filename, target_size=(IMGSIZE_X,IMGSIZE_Y))
    numpy_image = img_to_array(original)
    image_batch[i,:,:,:] = np.expand_dims(numpy_image, axis=0)
print(image_batch.shape)
(265, 224, 224, 3)
nets=[vgg16, inception_v3, resnet50, mobilenet]
models=[vgg_model, inception_model, resnet_model, mobilenet_model]
modelNames=["vgg","inception","resnet","mobilenet"]
import pandas as pd
resultDF2=pd.DataFrame(index=modelNames,columns=["Correct", "Error"])
resultDF2
Correct Error
vgg NaN NaN
inception NaN NaN
resnet NaN NaN
mobilenet NaN NaN
metric=correlation #euclidean
for net,model,name in zip(nets,models,modelNames):
    processed_image = net.preprocess_input(image_batch.copy())
    predictions = model.predict(processed_image)
    features=predictions.reshape(265,-1) # flatten the CNN feature maps into one feature vector per image
    #modelDict[name]["features"]=features
    
    dist=np.zeros((N_IM,N_IM))
    for h1 in range(N_IM):
        for h2 in range(h1,N_IM):
           dist[h1,h2]=metric(features[h1,:],features[h2,:])
           dist[h2,h1]=dist[h1,h2]
    s=np.argsort(dist,axis=1)
    correct=0
    error=0
    for i in range(N_IM):
        for a in s[i,1:2]:
            if int(a/5) != int(i/5):
                error+=1
            else:
                correct+=1
    resultDF2.loc[name,"Correct"]=correct
    resultDF2.loc[name,"Error"]=error
    #print("Name: %s: \t Correct: %3d \t Error: %3d"%(name,correct,error))
resultDF2
Correct Error
vgg 248 17
inception 248 17
resnet 260 5
mobilenet 253 12