This lesson will show how to identify and track reliable and stable features through a sequence of images.
- Locate keypoints (points of interest, salient points)
Most keypoint detectors rely on the distribution of brightness information over the image. A high intensity gradient indicates that the brightness changes rapidly, so such locations may be suitable keypoints.
Mathematically, the gradient consists of the partial derivatives of the image intensity in the x and y directions. It can be approximated by the intensity differences between neighboring pixels, divided by the distance between those pixels. From these two components we can compute both the direction and the magnitude of the intensity gradient vector.
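In symbols, with I(x, y) the image intensity, the gradient, its magnitude and its direction can be written as below. The central difference (one neighbor on each side, distance 2 pixels) is only one possible choice of neighboring pixels and is assumed here for illustration.

```latex
\nabla I(x, y) =
\begin{pmatrix} \frac{\partial I}{\partial x} \\[4pt] \frac{\partial I}{\partial y} \end{pmatrix}
\approx
\begin{pmatrix} \frac{I(x+1, y) - I(x-1, y)}{2} \\[4pt] \frac{I(x, y+1) - I(x, y-1)}{2} \end{pmatrix},
\qquad
\lVert \nabla I \rVert = \sqrt{\left(\frac{\partial I}{\partial x}\right)^2 + \left(\frac{\partial I}{\partial y}\right)^2},
\qquad
\theta = \operatorname{atan2}\!\left(\frac{\partial I}{\partial y}, \frac{\partial I}{\partial x}\right)
```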
The Sobel operator is one of the most widely used approaches to compute the gradient. It consists of two 3x3 kernels, one per direction, which are small integer-valued filters.
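Here is a minimal sketch of the two Sobel kernels and of applying one of them at a single pixel. The `Image` struct (row-major float intensities) and the function `apply3x3` are hypothetical helpers made up for this illustration and are not part of the lesson's code. Like cv::filter2D(), the helper computes a correlation, meaning the kernel is not flipped.

```cpp
#include <array>
#include <vector>

// Hypothetical row-major grayscale image used only for this sketch.
struct Image {
    int rows;
    int cols;
    std::vector<float> data;
    float at(int r, int c) const { return data[r * cols + c]; }
};

// Sobel kernels: the x kernel responds to horizontal intensity changes,
// the y kernel to vertical ones. Each combines a derivative with light smoothing.
const std::array<std::array<int, 3>, 3> kSobelX = {{{-1, 0, 1},
                                                     {-2, 0, 2},
                                                     {-1, 0, 1}}};
const std::array<std::array<int, 3>, 3> kSobelY = {{{-1, -2, -1},
                                                     { 0,  0,  0},
                                                     { 1,  2,  1}}};

// Apply a 3x3 kernel centered at (r, c) as a correlation (no kernel flip).
// The caller must keep (r, c) at least one pixel away from the image border.
float apply3x3(const Image& img, const std::array<std::array<int, 3>, 3>& k,
               int r, int c) {
    float sum = 0.0f;
    for (int dr = -1; dr <= 1; ++dr)
        for (int dc = -1; dc <= 1; ++dc)
            sum += k[dr + 1][dc + 1] * img.at(r + dr, c + dc);
    return sum;
}
```

Evaluating `apply3x3` with `kSobelX` and `kSobelY` at every interior pixel produces the x and y gradient images. On a vertical edge, the x response is large and the y response is zero.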
Before computing the intensity gradient, we should apply noise filtering, because the gradient is sensitive to noise. A Gaussian filter is shifted over the image and combined with the intensity values beneath it. Two parameters can be adjusted:
- Standard deviation: controls the spatial extension of the filter in the image plane. The larger the standard deviation, the wider the area covered by the filter.
- Kernel size: defines how many pixels around the center location contribute to the smoothing operation.
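For reference, the standard 2D Gaussian from which the kernel coefficients are sampled is shown below. σ is the standard deviation and K (odd) is the kernel size, so the integer offsets x and y cover the kernel around its center.

```latex
G(x, y) = \frac{1}{2\pi\sigma^2} \exp\!\left( -\frac{x^2 + y^2}{2\sigma^2} \right),
\qquad x, y \in \left\{ -\tfrac{K-1}{2}, \dots, \tfrac{K-1}{2} \right\}
```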
Gaussian filtering works by assigning each pixel a weighted sum of the surrounding pixels, where each weight is given by the height of the Gaussian curve at that point. The largest contribution comes from the center pixel itself. The general filtering steps are:
- Create a filter kernel with the desired properties (e.g. Gaussian smoothing or edge detection)
- Define the anchor point within the kernel (usually the center position) and place it on top of the first pixel of the image
- Compute the sum of the products of kernel coefficients with the corresponding image pixel values beneath
- Store the result at the location of the kernel anchor in the output image
- Repeat the process for all pixels over the entire image.
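The steps above can be sketched as a small filtering function. This is an illustration under stated assumptions: the `Image` struct, the row-major layout, the name `filterImage`, and leaving border pixels at 0 are all choices made for this sketch. OpenCV instead extrapolates border pixels according to a border mode.

```cpp
#include <vector>

// Hypothetical row-major grayscale image used only for this sketch.
struct Image {
    int rows;
    int cols;
    std::vector<float> data;
};

// Generic filtering that follows the steps above. The kernel is a square of
// odd size ksize, and its center is the anchor. Results go into a separate
// output image, so already-filtered values never feed back into later sums.
// Pixels where the kernel would leave the image stay at 0.
Image filterImage(const Image& in, const std::vector<float>& kernel, int ksize) {
    Image out{in.rows, in.cols, std::vector<float>(in.data.size(), 0.0f)};
    const int half = ksize / 2;  // anchor offset from the kernel's top-left corner
    for (int r = half; r < in.rows - half; ++r) {
        for (int c = half; c < in.cols - half; ++c) {
            float sum = 0.0f;
            // Sum of products of kernel coefficients and the pixels beneath them.
            for (int kr = 0; kr < ksize; ++kr)
                for (int kc = 0; kc < ksize; ++kc)
                    sum += kernel[kr * ksize + kc] *
                           in.data[(r + kr - half) * in.cols + (c + kc - half)];
            out.data[r * in.cols + c] = sum;  // store at the anchor location
        }
    }
    return out;
}
```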
In gaussian_smoothing.cpp, the kernel coefficients must be normalized before applying cv::filter2D(). (6854a85)
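The lesson's own kernel coefficients are not reproduced here. Instead, the sketch below builds a kernel from the Gaussian formula and then normalizes it; the function name `makeGaussianKernel` is hypothetical. Normalization makes the weights sum to 1, which keeps the average brightness of the image unchanged. cv::filter2D() applies the kernel exactly as given, so it does not perform this normalization for you.

```cpp
#include <cmath>
#include <vector>

// Build a ksize x ksize Gaussian kernel (ksize odd) with standard deviation
// sigma, stored row-major, then divide every coefficient by the total so the
// weights sum to 1. Without this step, filtering would brighten or darken the image.
std::vector<float> makeGaussianKernel(int ksize, double sigma) {
    std::vector<float> kernel(ksize * ksize);
    const int half = ksize / 2;
    double total = 0.0;
    for (int y = -half; y <= half; ++y) {
        for (int x = -half; x <= half; ++x) {
            // The constant 1/(2*pi*sigma^2) is omitted: normalization cancels it anyway.
            const double w = std::exp(-(x * x + y * y) / (2.0 * sigma * sigma));
            kernel[(y + half) * ksize + (x + half)] = static_cast<float>(w);
            total += w;
        }
    }
    for (float& k : kernel) k = static_cast<float>(k / total);
    return kernel;
}
```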
In gradient_sobel.cpp, create the Sobel kernels and apply the filters in the x and y directions, respectively. (73a5b01)
In magnitude_sobel.cpp, the processing pipeline is: convert the image to grayscale -> smooth it using cv::GaussianBlur() -> apply cv::filter2D() with the Sobel kernels in the x and y directions -> calculate the magnitude for each pixel from the x/y gradients. (4d5fc96)
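The last step of that pipeline can be sketched as follows. The sketch assumes the two gradient images are already available as row-major float arrays of equal size, and the helper names are hypothetical. The gradient direction is included as well because it was mentioned earlier. OpenCV also provides cv::magnitude() for the same per-element magnitude computation.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Per-pixel gradient magnitude: |G| = sqrt(gx^2 + gy^2).
std::vector<float> gradientMagnitude(const std::vector<float>& gx,
                                     const std::vector<float>& gy) {
    std::vector<float> mag(gx.size());
    for (std::size_t i = 0; i < gx.size(); ++i)
        mag[i] = std::sqrt(gx[i] * gx[i] + gy[i] * gy[i]);
    return mag;
}

// Per-pixel gradient direction in radians, in the range (-pi, pi].
std::vector<float> gradientDirection(const std::vector<float>& gx,
                                     const std::vector<float>& gy) {
    std::vector<float> dir(gx.size());
    for (std::size_t i = 0; i < gx.size(); ++i)
        dir[i] = std::atan2(gy[i], gx[i]);
    return dir;
}
```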
Keypoint detector: chooses points from an image based on a local maximum of a function, such as the "cornerness" metric of the Harris corner detector.
Descriptor: a vector of values, which describes the image patch around a keypoint.
The idea of keypoint detection is to detect a unique structure in an image that can be precisely located in both coordinate directions. As discussed in the previous section, corners are ideally suited for this purpose. In order to locate a corner, we consider how the content of the window would change when shifting it by a small amount. Such change is described by the sum of squared differences (SSD).
A covariance matrix H_w is part of the result of computing that change. The matrix H_w can be visualized as an ellipse whose axis lengths and directions are given by its eigenvalues and eigenvectors. For the Harris detector, we can derive a corner response measure at every pixel location, with the factor k being an empirical constant (0.04 ~ 0.06).
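The derivation can be summarized as follows. It uses a uniform window W (a Gaussian-weighted window is also common) and the image gradients I_x and I_y; λ1 and λ2 denote the eigenvalues of H_w. The key step is the first-order Taylor expansion, which turns the SSD into a quadratic form in the shift (u, v).

```latex
E(u, v) = \sum_{(x, y) \in W} \big[ I(x + u, y + v) - I(x, y) \big]^2,
\qquad I(x + u, y + v) \approx I(x, y) + u\, I_x + v\, I_y

E(u, v) \approx \begin{pmatrix} u & v \end{pmatrix} H_w \begin{pmatrix} u \\ v \end{pmatrix},
\qquad
H_w = \sum_{(x, y) \in W} \begin{pmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{pmatrix}

R = \det(H_w) - k \, \operatorname{trace}(H_w)^2 = \lambda_1 \lambda_2 - k \, (\lambda_1 + \lambda_2)^2
```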
After computing the Harris corner response, we perform non-maximum suppression (NMS) to:
- ensure that only the pixel with the maximum corner response in a local neighborhood is kept
- prevent corners from being too close to each other
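One common way to implement NMS is to keep a pixel only if its response is above a threshold and is the maximum within a small window around it. The sketch below follows that approach. The window radius, the threshold, and the `Corner` struct are assumptions made for this illustration, and the lesson's exact NMS strategy may differ.

```cpp
#include <vector>

// Hypothetical result type for this sketch.
struct Corner {
    int row;
    int col;
    float response;
};

// Keep a pixel only if its response exceeds minResponse and is strictly
// larger than every other response inside a (2*radius+1) x (2*radius+1)
// window. The map is row-major. Because the test is strict, plateaus of
// equal values produce no corner, which is a simplification.
std::vector<Corner> nonMaxSuppression(const std::vector<float>& response,
                                      int rows, int cols, int radius,
                                      float minResponse) {
    std::vector<Corner> corners;
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            const float v = response[r * cols + c];
            if (v <= minResponse) continue;
            bool isMax = true;
            for (int dr = -radius; dr <= radius && isMax; ++dr) {
                for (int dc = -radius; dc <= radius; ++dc) {
                    const int rr = r + dr;
                    const int cc = c + dc;
                    if ((dr == 0 && dc == 0) || rr < 0 || rr >= rows || cc < 0 || cc >= cols)
                        continue;  // skip the pixel itself and out-of-bounds neighbors
                    if (response[rr * cols + cc] >= v) { isMax = false; break; }
                }
            }
            if (isMax) corners.push_back({r, c, v});
        }
    }
    return corners;
}
```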
In cornerness_harris.cpp, the Harris corner response matrix is calculated first. Then we locate local maxima in the response matrix and apply NMS to them. (2776d68)
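For a single window, the corner response reduces to a few sums over the gradients. The sketch below uses a hypothetical helper with uniform weights and k = 0.04 as the default. OpenCV's cv::cornerHarris() computes a response map of this kind for the whole image.

```cpp
#include <cstddef>
#include <vector>

// Harris response of one window, given the x/y gradients of the pixels inside it.
// H = [[sum Ix^2, sum IxIy], [sum IxIy, sum Iy^2]], R = det(H) - k * trace(H)^2.
double harrisResponse(const std::vector<double>& ix, const std::vector<double>& iy,
                      double k = 0.04) {
    double sxx = 0.0, syy = 0.0, sxy = 0.0;
    for (std::size_t i = 0; i < ix.size(); ++i) {
        sxx += ix[i] * ix[i];
        syy += iy[i] * iy[i];
        sxy += ix[i] * iy[i];
    }
    const double det = sxx * syy - sxy * sxy;
    const double trace = sxx + syy;
    return det - k * trace * trace;
}
```

Gradients that point in two different directions give R > 0 (a corner). Gradients that all point in one direction give R < 0 (an edge). A window without gradients gives R = 0 (a flat region).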
There are four basic transformation types to consider when selecting a suitable keypoint detector:
- Rotation
- Scale change
- Intensity change
- Affine transformation
The Harris detector is robust under rotation and additive intensity shifts, but sensitive to scale changes, multiplicative intensity shifts (i.e. changes in contrast), and affine transformations.
- Classic detectors
  - aim at maximizing the detection accuracy
  - Harris Corner Detector
  - Good Features to Track (Shi-Tomasi)
  - Scale-Invariant Feature Transform (SIFT)
  - Speeded Up Robust Features (SURF)
- Modern detectors
In detect_keypoints.cpp, cv::FastFeatureDetector::create() returns a pointer to a cv::FastFeatureDetector object, and calling its detect() member function extracts the keypoints. Compare the number of keypoints, the keypoint distribution, and the processing speed with those of the Shi-Tomasi detector. (4d09290)
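FAST is based on a simple segment test, sketched below without the high-speed pre-test and the non-maximum suppression used in practice. The threshold t, the run length n = 9 (the 9-of-16 variant, which is OpenCV's default TYPE_9_16), and the helper names are assumptions made for this illustration.

```cpp
#include <array>
#include <vector>

// Offsets (dx, dy) of the 16 pixels on a Bresenham circle of radius 3,
// in clockwise order starting at the top (y grows downward).
const std::array<std::array<int, 2>, 16> kCircle = {{
    {0, -3}, {1, -3}, {2, -2}, {3, -1}, {3, 0}, {3, 1}, {2, 2}, {1, 3},
    {0, 3}, {-1, 3}, {-2, 2}, {-3, 1}, {-3, 0}, {-3, -1}, {-2, -2}, {-1, -3}}};

// Segment test: (x, y) is a corner if at least n contiguous circle pixels are
// all brighter than I(p) + t or all darker than I(p) - t. The image is
// row-major, and the caller keeps (x, y) at least 3 pixels from the border.
bool isFastCorner(const std::vector<int>& img, int cols, int x, int y, int t, int n = 9) {
    const int center = img[y * cols + x];
    int brighterRun = 0;
    int darkerRun = 0;
    // Walk the circle twice so runs that wrap from index 15 to 0 are counted.
    for (int i = 0; i < 32; ++i) {
        const auto& o = kCircle[i % 16];
        const int v = img[(y + o[1]) * cols + (x + o[0])];
        brighterRun = (v > center + t) ? brighterRun + 1 : 0;
        darkerRun = (v < center - t) ? darkerRun + 1 : 0;
        if (brighterRun >= n || darkerRun >= n) return true;
    }
    return false;
}
```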
A descriptor provides distinctive information about the area surrounding a keypoint. It helps to assign similar keypoints in different images to each other.
- Gradient-based descriptor: SIFT
  - based on Histograms of Oriented Gradients (HOG)
  - the idea is to describe the structure of an object by the distribution of its intensity gradients in a local neighborhood
  - advantages
    - robust at identifying objects even among clutter and under partial occlusion
    - invariant to uniform changes in scale, to rotation, and to changes in both brightness and contrast
    - partially invariant to affine distortions
  - disadvantages
    - low speed due to computing the intensity gradients
    - patented and historically not free to use (the patent expired in 2020)
- Binary descriptor: BRISK
  - relies solely on intensity information and encodes the information around a keypoint in a string of binary numbers
  - components
    - sampling pattern: describes where sample points are located around the keypoint
    - orientation compensation: removes the influence of rotation of the image patch around the keypoint
    - sample-pair selection: generates pairs of sample points whose intensity values are compared against each other
- OpenCV for detectors and descriptors
  - FeatureDetector and DescriptorExtractor share a few common algorithms
  - Overview of feature detection and descriptor classes
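The core idea of gradient-based descriptors is a histogram of gradient orientations weighted by gradient magnitude. The sketch below computes such a histogram for a single patch. It is a simplified illustration, not SIFT: SIFT additionally uses Gaussian weighting, a 4x4 grid of sub-regions with 8 bins each, interpolation between bins, and normalization. The function name and the bin count are assumptions.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Orientation histogram of one patch. gx and gy hold the gradients of the
// patch pixels, and each pixel votes into the bin of its gradient direction
// with a weight equal to its gradient magnitude.
std::vector<double> orientationHistogram(const std::vector<double>& gx,
                                         const std::vector<double>& gy,
                                         int numBins = 8) {
    const double kTwoPi = 6.283185307179586;
    std::vector<double> hist(numBins, 0.0);
    for (std::size_t i = 0; i < gx.size(); ++i) {
        double angle = std::atan2(gy[i], gx[i]);  // in (-pi, pi]
        if (angle < 0.0) angle += kTwoPi;          // map to [0, 2*pi)
        int bin = static_cast<int>(angle / kTwoPi * numBins);
        if (bin >= numBins) bin = numBins - 1;     // guard against rounding near 2*pi
        hist[bin] += std::sqrt(gx[i] * gx[i] + gy[i] * gy[i]);
    }
    return hist;
}
```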
In describe_keypoints.cpp, add the HOG-based SIFT algorithm to compute the keypoints, and compare its processing speed, number of keypoints, and keypoint distribution with those of the binary BRISK algorithm. (c51971b)
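Binary descriptors come down to intensity comparisons between sample pairs. The sketch below encodes one bit per pair into a 64-bit integer. The `SamplePair` struct and the pairs themselves are hypothetical. BRISK's actual concentric sampling pattern, smoothing, and orientation compensation are omitted.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One sample pair: two pixel offsets (dx, dy) relative to the keypoint.
struct SamplePair {
    int dx1, dy1, dx2, dy2;
};

// Bit i is set to 1 if the first sample of pair i is brighter than the second.
// The image is row-major, at most 64 pairs are used, and the caller keeps all
// samples inside the image.
std::uint64_t binaryDescriptor(const std::vector<int>& img, int cols, int x, int y,
                               const std::vector<SamplePair>& pairs) {
    std::uint64_t bits = 0;
    for (std::size_t i = 0; i < pairs.size() && i < 64; ++i) {
        const SamplePair& p = pairs[i];
        const int a = img[(y + p.dy1) * cols + (x + p.dx1)];
        const int b = img[(y + p.dy2) * cols + (x + p.dx2)];
        if (a > b) bits |= (std::uint64_t{1} << i);
    }
    return bits;
}
```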
Once a set of keypoints has been located and described in a sequence of images, the next step is to find the best match for each keypoint in successive frames. We need a suitable similarity measure to uniquely assign keypoint pairs. One simple measure is the distance between two descriptors.
- Sum of Absolute Differences (SAD)
  - the SAD norm is also referred to as the L1-norm
- Sum of Squared Differences (SSD)
  - the SSD norm is also referred to as the L2-norm (strictly, the L2-norm is the square root of the SSD)
  - better suited for gradient-based descriptors
- Hamming Distance (HD)
  - best suited for binary descriptors, which consist only of 0s and 1s
  - computed using the XOR function, which returns 0 if two bits are identical and 1 if they are different
  - the sum of all XOR operations is simply the number of differing bits between both descriptors
- Methods to find matching pairs based on distance measures
  - Brute-force matching
    - for a given keypoint from the first image, calculates the distances to every keypoint in the second image
  - Fast Library for Approximate Nearest Neighbors (FLANN)
    - builds a KD-tree to search for potential matching pairs efficiently instead of performing an exhaustive search
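The three distance measures and brute-force nearest-neighbor matching can be sketched as below. Descriptors are modeled as `std::vector<double>` (gradient-based) or byte vectors (binary); these types and the function names are assumptions made for this illustration. In OpenCV, cv::BFMatcher with cv::NORM_HAMMING or cv::NORM_L2 plays the role of the matcher.

```cpp
#include <bitset>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// SAD (L1): sum of absolute component differences.
double sad(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += std::fabs(a[i] - b[i]);
    return s;
}

// SSD: sum of squared differences (the Euclidean L2 distance is its square root).
double ssd(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
    return s;
}

// Hamming distance between binary descriptors stored as bytes: XOR sets a 1
// for every differing bit, and counting those 1s gives the distance.
int hamming(const std::vector<std::uint8_t>& a, const std::vector<std::uint8_t>& b) {
    int d = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        d += static_cast<int>(std::bitset<8>(a[i] ^ b[i]).count());
    return d;
}

// Brute-force nearest-neighbor matching of binary descriptors: every query
// descriptor is compared with every train descriptor (O(N*M) comparisons).
// Returns, for each query, the index of its closest train descriptor (-1 if none).
std::vector<int> bruteForceMatch(const std::vector<std::vector<std::uint8_t>>& query,
                                 const std::vector<std::vector<std::uint8_t>>& train) {
    std::vector<int> bestIdx(query.size(), -1);
    for (std::size_t q = 0; q < query.size(); ++q) {
        int bestDist = std::numeric_limits<int>::max();
        for (std::size_t t = 0; t < train.size(); ++t) {
            const int d = hamming(query[q], train[t]);
            if (d < bestDist) {
                bestDist = d;
                bestIdx[q] = static_cast<int>(t);
            }
        }
    }
    return bestIdx;
}
```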
Both matching algorithms accept a descriptor distance threshold T, which limits the matches to the "good" ones and discards pairs that are not true correspondences. False matches are still inevitable. To counteract them, we can use cross-check matching, which applies the matching procedure in both directions and keeps only those matches whose best match in one direction equals the best match in the other direction.
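Cross-check matching with a distance threshold T can be sketched as follows. The L2 distance, the `Descriptors` type alias, and the function names are assumptions made for this illustration. In OpenCV, the cv::BFMatcher constructor accepts a crossCheck flag for the same purpose.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Descriptors = std::vector<std::vector<double>>;

struct Match {
    int queryIdx;
    int trainIdx;
    double distance;
};

static double l2(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
    return std::sqrt(s);
}

// Index of the descriptor in `set` closest to `d` (-1 if `set` is empty);
// the corresponding distance is written to bestDist.
static int nearest(const std::vector<double>& d, const Descriptors& set, double& bestDist) {
    int best = -1;
    bestDist = std::numeric_limits<double>::infinity();
    for (int i = 0; i < static_cast<int>(set.size()); ++i) {
        const double dist = l2(d, set[i]);
        if (dist < bestDist) { bestDist = dist; best = i; }
    }
    return best;
}

// Keep the pair (q, t) only if t is q's nearest neighbor in the second set,
// q is t's nearest neighbor in the first set, and the distance is at most T.
std::vector<Match> crossCheckMatch(const Descriptors& a, const Descriptors& b, double T) {
    std::vector<Match> matches;
    for (int q = 0; q < static_cast<int>(a.size()); ++q) {
        double dForward = 0.0;
        double dBackward = 0.0;
        const int t = nearest(a[q], b, dForward);
        if (t < 0 || dForward > T) continue;  // threshold check
        if (nearest(b[t], a, dBackward) == q) matches.push_back({q, t, dForward});
    }
    return matches;
}
```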
In descriptor_matching.cpp, implement FLANN matching and k-nearest-neighbor (KNN) selection (keeping the best k matches per keypoint). Compare the results with brute-force matching using nearest-neighbor (NN) selection (keeping only the best match). (720b82d)
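A brute-force version of KNN selection is sketched below; the function name is hypothetical. FLANN would replace the exhaustive inner loop with an approximate KD-tree search, so its neighbors may occasionally differ from the exact ones computed here. OpenCV exposes this kind of selection as cv::DescriptorMatcher::knnMatch().

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct Match {
    int queryIdx;
    int trainIdx;
    double distance;
};

// For each query descriptor, return its k closest train descriptors (L2
// distance), sorted by increasing distance. Fewer than k are returned if the
// train set has fewer than k entries.
std::vector<std::vector<Match>> knnMatch(const std::vector<std::vector<double>>& query,
                                         const std::vector<std::vector<double>>& train,
                                         int k) {
    std::vector<std::vector<Match>> result;
    for (int q = 0; q < static_cast<int>(query.size()); ++q) {
        std::vector<Match> candidates;
        for (int t = 0; t < static_cast<int>(train.size()); ++t) {
            double s = 0.0;
            for (std::size_t i = 0; i < query[q].size(); ++i) {
                const double d = query[q][i] - train[t][i];
                s += d * d;
            }
            candidates.push_back({q, t, std::sqrt(s)});
        }
        // Only the k best candidates need to be ordered.
        const std::size_t keep = std::min<std::size_t>(static_cast<std::size_t>(k), candidates.size());
        std::partial_sort(candidates.begin(),
                          candidates.begin() + static_cast<std::ptrdiff_t>(keep),
                          candidates.end(),
                          [](const Match& x, const Match& y) { return x.distance < y.distance; });
        candidates.resize(keep);
        result.push_back(candidates);
    }
    return result;
}
```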