CBCL for RGB-D Indoor Scene Classification

Ali Ayub and Alan R. Wagner

Abstract: Classifying images taken from indoor scenes is an important area of research. An accurate indoor scene classifier has the potential to improve indoor localization and decision-making for domestic robots, to enable new applications for wearable computer users, and, more generally, to improve vision-based situational awareness across a wide variety of applications. Yet, high intra-class variance and low inter-class variance make indoor scene classification an extremely challenging task. To cope with this problem, we propose a clustering approach, inspired by the concept learning model of the hippocampus and the neocortex, that generates clusters and centroids for different scene categories. Test images depicting different scenes are classified using their distances to the closest centroids (concepts). Modeling RGB-D scenes as centroids not only leads to state-of-the-art (SOTA) classification performance on benchmark datasets (SUN RGB-D and NYU Depth V2), but also offers a way to inspect and interpret the space of centroids. Inspecting the centroids generated by our approach on the RGB-D datasets leads us to propose a method for merging conceptually similar categories, which improves accuracy for all approaches.

Centroid Based Concept Learning (CBCL)

CBCL is inspired by the concept learning model of the hippocampus and the neocortex. CBCL treats each RGB-D image pair as an episode and extracts high-level features for the RGB and depth modalities using a fixed data representation. After feature extraction, CBCL generates a set of concepts for each class in the form of centroid pairs (one centroid per modality) using a cognitively-inspired clustering approach, denoted Agg-Var clustering. To predict the label of a test RGB-D image, the distances of the test image's feature-vector pair (RGB and depth) to the n closest centroid pairs are used.
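To make the pipeline concrete, below is a minimal, hypothetical Python sketch of per-class Agg-Var clustering and centroid-based prediction as described above. The distance threshold, the 1/distance vote weighting, and the summing of RGB and depth distances for each centroid pair are illustrative assumptions, not the paper's exact design or hyperparameters.

```python
import numpy as np

def agg_var_clustering(features, distance_threshold):
    """Sketch of Agg-Var clustering for one class and one modality:
    each feature vector is merged into its closest centroid if the
    distance falls below a threshold; otherwise it seeds a new centroid."""
    centroids = []  # centroid vectors for this class
    counts = []     # number of images merged into each centroid
    for x in features:
        if centroids:
            dists = [np.linalg.norm(x - c) for c in centroids]
            j = int(np.argmin(dists))
            if dists[j] < distance_threshold:
                # Weighted mean update keeps the centroid equal to the
                # average of all feature vectors assigned to it so far.
                centroids[j] = (counts[j] * centroids[j] + x) / (counts[j] + 1)
                counts[j] += 1
                continue
        centroids.append(x.astype(float))
        counts.append(1)
    return centroids

def predict(rgb_feat, depth_feat, rgb_centroids, depth_centroids, labels, n=5):
    """Classify a test RGB-D feature pair by a weighted vote over the n
    closest centroid pairs; labels[i] is the class of centroid pair i.
    RGB and depth distances for a pair are summed (an assumption here)."""
    dists = np.array([np.linalg.norm(rgb_feat - cr) + np.linalg.norm(depth_feat - cd)
                      for cr, cd in zip(rgb_centroids, depth_centroids)])
    order = np.argsort(dists)[:n]
    votes = {}
    for i in order:
        # Closer centroid pairs contribute larger votes for their class.
        votes[labels[i]] = votes.get(labels[i], 0.0) + 1.0 / (dists[i] + 1e-12)
    return max(votes, key=votes.get)
```

In this sketch, clustering is run separately per class and per modality, and the resulting centroids from all classes are pooled (with their class labels) before prediction.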

Results on SUN RGB-D Dataset

Comparison with baselines and state-of-the-art methods on the SUN RGB-D test set. Performance is reported as classification accuracy (%). Aug. denotes the use of augmented data.

Model Interpretation and Category Merging

Using silhouette analysis, we merged four category pairs into single categories in the SUN RGB-D dataset, reducing the total number of classes from 19 to 15, and then reevaluated CBCL on the merged dataset. The new accuracy was 66.2%, 6.7% higher than on the original dataset. We also tested the VGG baseline on the merged dataset; its accuracy increased to 55.2%, a 5.4% improvement, but still 11% below CBCL.
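As a rough illustration of how such an analysis might be run, the sketch below treats each centroid as a point labeled by its class and uses scikit-learn's silhouette_samples to surface categories whose centroids sit closer to another category's centroids than to their own. The helper name flag_confusable_categories and the per-class mean-score criterion are assumptions for illustration; the paper's exact merging criterion may differ.

```python
import numpy as np
from sklearn.metrics import silhouette_samples

def flag_confusable_categories(centroids, labels):
    """Treat each centroid as a sample and its class as its cluster
    assignment; low or negative silhouette values indicate centroids
    that lie closer to another category than to their own, flagging
    candidate category pairs for merging."""
    X = np.vstack(centroids)
    y = np.asarray(labels)
    scores = silhouette_samples(X, y)
    per_class = {cls: float(scores[y == cls].mean()) for cls in np.unique(y)}
    # Sorted from most to least confusable category.
    return sorted(per_class.items(), key=lambda kv: kv[1])
```

Categories surfaced at the top of this ranking (e.g., visually similar pairs such as classroom and lecture_theatre in the figure below) would then be inspected and, where conceptually justified, merged.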

(a) Images from the category bedroom with different layouts are represented by different centroids after Agg-Var clustering. (b) Images from the scene categories classroom and lecture_theatre are represented by different centroids even though they are visually very similar. Because the distance between the corresponding centroids of the two categories is small, the classifier misclassifies these images. Sample images are from the SUN RGB-D dataset.

Related Publications

  • A. Ayub and A. R. Wagner, “Centroid Based Concept Learning for RGB-D Indoor Scene Classification,” British Machine Vision Conference (BMVC), 2020 [Pdf] [Code] [Talk]