Details of included features

The following features are provided in this toolbox:

Color: Convert the image to color names [1,12], then extract dense overlapping patches at multiple sizes and describe each patch by a histogram of color names. Then apply the bag-of-words + spatial pyramid pipeline explained below.
Gist: GIST descriptor describing the spatial envelope of the image [10].
Dense HOG2x2, HOG3x3: Extract HOG [9] in a dense manner (i.e. on a grid) [3] and concatenate 2x2 or 3x3 cells to obtain a descriptor at each grid location. Then apply the bag-of-words + spatial pyramid pipeline explained below. This feature is also used in [11].
LBP: Extract non-uniform Local Binary Pattern [8] descriptor (neighborhood: 8, transitions: 2), and concatenate 3 levels of the spatial pyramid to obtain the final feature vector.
Dense SIFT: Extract SIFT [5] descriptor in a dense manner (i.e. on a grid) at multiple patch sizes, and then apply the bag-of-words + spatial pyramid pipeline explained below.
SSIM: Extract Self-Similarity Image Matching [7] descriptor in a dense manner and apply the bag-of-words + spatial pyramid pipeline to obtain the final feature vector.
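
To make the LBP parameters above more concrete, here is a minimal Python sketch (not the toolbox's MATLAB code). It assumes that "neighborhood: 8" means the 8 pixels surrounding the center and that "transitions: 2" refers to the 0/1 transition count used in [8] to separate patterns with at most two transitions from the rest. The function names and the 3x3 square neighborhood (instead of an interpolated circle) are choices made for this illustration only.

```python
def lbp_code(patch):
    """Return the 8-bit LBP code of the centre pixel of a 3x3 patch."""
    center = patch[1][1]
    # Neighbours visited clockwise, starting at the top-left corner.
    ring = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
            patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for bit, value in enumerate(ring):
        if value >= center:  # neighbour at least as bright as the centre
            code |= 1 << bit
    return code


def transitions(code, bits=8):
    """Count 0/1 changes when the bit pattern is read circularly."""
    count = 0
    for i in range(bits):
        a = (code >> i) & 1
        b = (code >> ((i + 1) % bits)) & 1
        count += a != b
    return count


# All 8-bit patterns with at most two transitions.
low_transition_codes = [c for c in range(256) if transitions(c) <= 2]
```

For example, a patch whose top row is brighter than the center and whose other neighbors are darker gives a code with a single run of ones, which has exactly two transitions.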

Bag-of-words pipeline: from a random sample of the features extracted from various patches, learn a dictionary using k-means [2], then apply locality-constrained linear coding (LLC) [6] to soft-encode each patch over a small number of dictionary entries. Then, as shown in [6], apply max pooling with a spatial pyramid [4] to obtain the final feature vector. We use LLC because it allows a linear classifier to be used for classification instead of non-linear kernels.
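
The encoding and pooling steps can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the toolbox's implementation. It uses the k-nearest-neighbor approximation of LLC described in [6], adds a small trace-scaled regularizer for numerical stability (an assumption of this sketch), and takes a dictionary that has already been learned. All function names, the toy dictionary, and the normalized patch positions are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def llc_encode(x, dictionary, k=2, reg=1e-4):
    """Approximated LLC: code x over its k nearest dictionary entries."""
    dist = [sum((xi - bi) ** 2 for xi, bi in zip(x, b)) for b in dictionary]
    nearest = sorted(range(len(dictionary)), key=dist.__getitem__)[:k]
    # Shift the selected entries so that x sits at the origin.
    z = [[bi - xi for bi, xi in zip(dictionary[j], x)] for j in nearest]
    cov = [[sum(p * q for p, q in zip(zi, zj)) for zj in z] for zi in z]
    trace = sum(cov[i][i] for i in range(k))
    for i in range(k):
        cov[i][i] += reg * trace  # stabilising regulariser (assumption)
    w = solve(cov, [1.0] * k)
    total = sum(w)
    code = [0.0] * len(dictionary)
    for j, wj in zip(nearest, w):
        code[j] = wj / total  # enforce the sum-to-one constraint
    return code


def pyramid_max_pool(codes, positions, levels=2):
    """Max-pool codes over 1x1, 2x2, ... grids; positions lie in [0, 1)."""
    size = len(codes[0])
    pooled = []
    for level in range(levels):
        cells = 2 ** level
        grid = [None] * (cells * cells)
        for code, (px, py) in zip(codes, positions):
            cell = int(py * cells) * cells + int(px * cells)
            if grid[cell] is None:
                grid[cell] = list(code)
            else:
                grid[cell] = [max(g, c) for g, c in zip(grid[cell], code)]
        for cell_vec in grid:
            # Cells that contain no patch contribute zeros.
            pooled.extend(cell_vec if cell_vec is not None else [0.0] * size)
    return pooled


# Toy data: 3 dictionary entries, 2 patches with normalised (x, y) centres.
dictionary = [[0.0, 0.0], [2.0, 0.0], [0.0, 10.0]]
patches = [[0.5, 0.0], [0.0, 9.0]]
positions = [(0.1, 0.1), (0.9, 0.9)]
codes = [llc_encode(p, dictionary) for p in patches]
feature = pyramid_max_pool(codes, positions, levels=2)
```

Each code is nonzero only on the k selected entries and sums to one, which is what "soft-encode" refers to here. The pooled vector simply concatenates the per-cell maxima from every pyramid level, so it can be passed directly to a linear classifier.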

The feature configurations can be accessed through the conf structure: c.feature_config.(feature_name), where feature_name is one of the following:

'color', 'gist', 'hog2x2', 'hog3x3', 'lbp', 'sift', 'ssim'

For the features using bag-of-words, some important feature configurations to consider are pyramid_levels and dictionary_size.
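
As a rough guide to how these two settings interact, the sketch below computes the length of the pooled feature vector. It assumes the standard spatial pyramid layout from [4], in which level l (counting from 0) splits the image into 2^l x 2^l cells, and that pyramid_levels counts the levels starting from the whole image. Whether the toolbox counts levels exactly this way is an assumption, and the example values are hypothetical rather than toolbox defaults.

```python
def pyramid_feature_length(dictionary_size, pyramid_levels):
    """Length of a max-pooled spatial pyramid feature vector."""
    # Level l has 2**l * 2**l = 4**l cells, each pooled to dictionary_size values.
    cells = sum(4 ** level for level in range(pyramid_levels))
    return dictionary_size * cells


# Hypothetical settings: 1024 dictionary entries, 3 pyramid levels (1 + 4 + 16 cells).
length = pyramid_feature_length(1024, 3)
```

Under these assumptions, each extra pyramid level roughly quadruples the feature length, while dictionary_size scales it linearly. Both settings therefore trade memory and classifier training time against descriptive power.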

References

[1] J. van de Weijer, C. Schmid, J. Verbeek, Learning Color Names from Real-World Images, CVPR 2007
[2] C. Elkan, Using the Triangle Inequality to Accelerate k-Means, ICML 2003
[3] B. C. Russell, A. Torralba, K. P. Murphy, W. T. Freeman, LabelMe: a database and web-based tool for image annotation, IJCV 2008
[4] S. Lazebnik, C. Schmid, J. Ponce, Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, CVPR 2006
[5] D. Lowe, Distinctive image features from scale-invariant keypoints, IJCV 2004
[6] J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, Y. Gong, Locality-constrained linear coding for image classification, CVPR 2010
[7] E. Shechtman, M. Irani, Matching Local Self-Similarities across Images and Videos, CVPR 2007
[8] T. Ojala, M. Pietikäinen, T. Mäenpää, Multiresolution gray-scale and rotation invariant texture classification with Local Binary Patterns, PAMI 2002
[9] N. Dalal, B. Triggs, Histograms of oriented gradients for human detection, CVPR 2005
[10] A. Oliva, A. Torralba, Modeling the shape of the scene: a holistic representation of the spatial envelope, IJCV 2001
[11] J. Xiao, J. Hays, K. Ehinger, A. Oliva, A. Torralba, SUN database: Large-scale scene recognition from abbey to zoo, CVPR 2010
[12] R. Khan, J. van de Weijer, F. Khan, D. Muselet, C. Ducottet, C. Barat, Discriminative Color Descriptors, CVPR 2013
