One incredibly important aspect of human and animal vision is the ability to follow objects and people in our view. Whether it is a tiger chasing its prey or you trying to catch a basketball, tracking is so integral to our everyday lives that we forget how much we rely on it.
In this assignment, you will implement an algorithm that tracks an object in a video.
You will first implement the Lucas-Kanade tracker, and then a more computationally efficient version called the Matthews-Baker (or inverse compositional) method. This method is one of the most commonly used methods in computer vision due to its simplicity and wide applicability. We have provided two video sequences: a car on a road, and a helicopter approaching a runway.
To initialize the tracker, you need to define a template by drawing a bounding box around the object to be tracked in the first frame of the video. For each subsequent frame, the tracker updates an affine transform that warps the current frame so that the template in the first frame is aligned with the warped current frame.
An image transformation or warp is an operation that acts on pixel coordinates and maps pixel values from one place to another in an image. Translation, rotation and scaling are all examples of warps. We will use the symbol $W$ to denote warps. A warp function $W$ has a set of parameters $p$ associated with it and maps a pixel with coordinates $x = [u\ v]^T$ to $x' = [u'\ v']^T$:

$$x' = W(x; p) \qquad (1)$$
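As a minimal illustration of equation (1), the sketch below implements the simplest warp, a pure translation with two parameters. The function name `warp_translation` and the ordering $p = [p_1\ p_2]^T$ (shift in $u$, then shift in $v$) are illustrative assumptions. They are not part of the assignment's starter code.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def warp_translation(x: Point, p: Sequence[float]) -> Point:
    """W(x; p) for a pure translation: p[0] shifts u and p[1] shifts v."""
    u, v = x
    return (u + p[0], v + p[1])


# The pixel at x = [10, 5]^T moves to x' = [12, 4]^T under p = [2, -1]^T.
print(warp_translation((10.0, 5.0), [2.0, -1.0]))  # (12.0, 4.0)
```

A translation is the two-parameter warp class mentioned in the Lucas-Kanade section. The affine warp below generalizes it to six parameters.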
An affine transform is a warp that can include any combination of translation, anisotropic scaling and rotation. An affine warp can be parametrized in terms of 6 parameters $p = [p_1\ p_2\ p_3\ p_4\ p_5\ p_6]^T$. One of the convenient things about an affine transformation is that it is linear in homogeneous coordinates. Its action on a point with coordinates $x = [u\ v]^T$ can be described as a matrix operation:

$$\begin{bmatrix} u' \\ v' \\ 1 \end{bmatrix} = W(p) \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}$$
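The sketch below shows the affine warp both as a matrix $W(p)$ and as a function $W(x; p)$. This section does not specify where each of $p_1, \dots, p_6$ sits inside $W(p)$. The code therefore assumes one common convention, in which $p = 0$ gives the identity warp:

$$W(p) = \begin{bmatrix} 1+p_1 & p_2 & p_3 \\ p_4 & 1+p_5 & p_6 \\ 0 & 0 & 1 \end{bmatrix}.$$

The assignment may order the parameters differently. The helper names `affine_matrix` and `affine_warp` are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Point = Tuple[float, float]


def affine_matrix(p: Sequence[float]) -> Matrix:
    """W(p): 3x3 homogeneous matrix of an affine warp.

    Assumed parametrization (p = 0 is the identity warp):
        [[1 + p1, p2,     p3],
         [p4,     1 + p5, p6],
         [0,      0,      1 ]]
    """
    p1, p2, p3, p4, p5, p6 = p
    return [[1.0 + p1, p2, p3],
            [p4, 1.0 + p5, p6],
            [0.0, 0.0, 1.0]]


def affine_warp(x: Point, p: Sequence[float]) -> Point:
    """W(x; p): multiply W(p) by the homogeneous point [u, v, 1]^T."""
    W = affine_matrix(p)
    xh = (x[0], x[1], 1.0)
    u_new = sum(W[0][k] * xh[k] for k in range(3))
    v_new = sum(W[1][k] * xh[k] for k in range(3))
    return (u_new, v_new)


print(affine_warp((3.0, 4.0), [0, 0, 0, 0, 0, 0]))   # (3.0, 4.0): identity
print(affine_warp((3.0, 4.0), [0, 0, 2, 0, 0, -1]))  # (5.0, 3.0): translation
print(affine_warp((3.0, 4.0), [1, 0, 0, 0, 1, 0]))   # (6.0, 8.0): scale by 2
```

Homogeneous coordinates make the translation part $(p_3, p_6)$ a column of the matrix. As a result, composing two affine warps reduces to multiplying their $3 \times 3$ matrices.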
Note that, for convenience, when we want to refer to the warp as a function we will use W(x; p), and when we want to refer to the matrix for an affine warp we will use W(p). Table 1 contains a summary of the variables used in the next two sections. It will be useful to keep these in mind.
Lucas-Kanade: Forward Additive Alignment
A Lucas-Kanade tracker maintains a warp W(x; p) that aligns a sequence of images $I_t$ to a template T. We denote pixel locations by x, so I(x) is the pixel value at location x in image I. For the purposes of this derivation, I and T are treated as column vectors (think of them as unrolled image matrices). W(x; p) is the point obtained by warping x with a transform that has parameters p. W can be any transformation that is continuous in its parameters p. Examples of valid warp classes for W include translations (2 parameters), affine transforms (6 parameters) and full projective transforms (8 parameters). The Lucas-Kanade tracker minimizes the pixel-wise sum of squared differences between the warped image I(W(x; p)) and the template T.
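The quantity being minimized is

$$\sum_{x} \big[ I(W(x; p)) - T(x) \big]^2,$$

where the sum runs over the pixels x of the template. The sketch below evaluates this objective for a given p. It only computes the cost and does not perform the Lucas-Kanade parameter update itself. It assumes grayscale images stored as nested lists indexed `[row][column]`, and it uses bilinear interpolation because W(x; p) generally lands between pixel centers. Out-of-range samples are clamped to the border, which is a simplifying choice. The names `bilinear`, `ssd` and `translate` are hypothetical helpers, not part of the assignment's API.

```python
import math
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]  # indexed as image[v][u], i.e. [row][column]
Point = Tuple[float, float]
Warp = Callable[[Point, Sequence[float]], Point]


def bilinear(img: Image, u: float, v: float) -> float:
    """Sample img at a sub-pixel location (u, v); out-of-range coordinates are clamped."""
    h, w = len(img), len(img[0])
    u = min(max(u, 0.0), w - 1.0)
    v = min(max(v, 0.0), h - 1.0)
    u0, v0 = int(math.floor(u)), int(math.floor(v))
    u1, v1 = min(u0 + 1, w - 1), min(v0 + 1, h - 1)
    a, b = u - u0, v - v0  # fractional offsets inside the pixel cell
    top = (1 - a) * img[v0][u0] + a * img[v0][u1]
    bottom = (1 - a) * img[v1][u0] + a * img[v1][u1]
    return (1 - b) * top + b * bottom


def ssd(I: Image, T: Image, warp: Warp, p: Sequence[float]) -> float:
    """Sum over template pixels x of (I(W(x; p)) - T(x))^2."""
    total = 0.0
    for v in range(len(T)):
        for u in range(len(T[0])):
            uw, vw = warp((float(u), float(v)), p)
            r = bilinear(I, uw, vw) - T[v][u]
            total += r * r
    return total


def translate(x: Point, p: Sequence[float]) -> Point:
    """Hypothetical 2-parameter warp W(x; p) = x + p, used only for this example."""
    return (x[0] + p[0], x[1] + p[1])


# Synthetic 4x4 "frame" with I(u, v) = u + 10 v, and a 2x2 template cut at (1, 1).
I = [[float(u + 10 * v) for u in range(4)] for v in range(4)]
T = [row[1:3] for row in I[1:3]]

print(ssd(I, T, translate, [1.0, 1.0]))  # 0.0: the warp aligns T exactly
print(ssd(I, T, translate, [0.0, 0.0]))  # 484.0: every pixel is off by 11
```

A tracker searches for the p that drives this cost to a minimum. The Lucas-Kanade method does this iteratively by linearizing I(W(x; p)) around the current estimate, rather than evaluating the cost over a grid of candidate warps.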