zicongwu / Kalman-Filter-Object-Tracking

2D Object Tracking Using Kalman filter


#Object Tracking Using Kalman Filter

##Shahin Khobahi

###I. Introduction

In this project, we propose an adaptive-filter approach to track a moving object in a video. Object tracking is currently an important problem in many applications such as video surveillance, traffic management, video indexing, machine learning, artificial intelligence, and many other related fields. As we will discuss in the following sections, moving-object tracking can be interpreted as an estimation problem. The Kalman filter is a powerful algorithm for state-estimation problems, which is why we use it to estimate and predict the position of a moving object. In the first stage of this project we use background subtraction to detect the moving object in the video, and then we use the Kalman filter to predict and estimate the next state of the object.

###II. Problem Formulation

####A. Object Detection Using Background Subtraction

A video is composed of a series of frames, each of which can be considered a 2D signal; a video can therefore be seen as a two-dimensional (2D) signal evolving through time. Moreover, there are two types of objects in a video: steady objects and moving objects. Steady objects do not change from frame to frame and can be considered the background scene. The goal is to distinguish the moving objects from the steady ones. First, let us provide an example: suppose you are looking at a wall and suddenly a bird flies over it. The steady object in this scene is the wall, and the moving object is the bird. The bird is in fact disturbing your observation of the wall (the background), so in the context of signal processing the bird (moving object) can be seen as noise added to that background. In other words, in a video a moving object acts like noise added to the background scene, which is a fixed signal, and this moving object corrupts our observation of that background. Consequently, each frame of the video can be interpreted as a noisy observation of the background. Therefore, the problem is simply noise detection in a signal. The following model can be used for our problem:

$$y = x + v \qquad (1)$$

Where y is our noisy measurement of x (the background signal), and v denotes the disturbance, which is in fact our moving object disturbing the background signal x. As mentioned earlier, we need to extract the noise v from our noisy signal y (the video). Each frame of the video is a noisy realization of the signal y, and we refer to the i-th frame of the video as u_i. Further, we assume that the video has N frames. Our approach to extracting the noise from the observations u_i is to first obtain an estimate x̂ of the background signal, and then subtract this estimate x̂ from each observation u_i to obtain an estimate of the noise at each frame:

$$\hat{v}_i = u_i - \hat{x} \qquad (2)$$

Given two random variables x and y, we define the least-mean-squares estimator (l.m.s.e.) of x given y as the conditional expectation of x given y:

$$\hat{x} = E(x \mid y) = E[x \mid u_0, u_1, \ldots, u_{N-1}] \qquad (3)$$

For simplicity, we model x as an unknown constant rather than a random variable. Further, we assume that we are given N frames of the video and that this is all the information we have. We then model our problem as follows:

$$y(i) = x + v(i), \qquad i = 0, 1, \ldots, N-1 \qquad (4)$$

and we define the column vector $\mathbb{1} = [1, 1, \ldots, 1]^T$. Then,

$$y = \mathbb{1}x + v \qquad (5)$$

In this case, according to the Gauss-Markov theorem, the optimal linear estimator (the minimum-variance unbiased estimator, m.v.u.e.) of x is:

$$\hat{x}_{\text{mvue}} = \frac{1}{N}\sum_{i=0}^{N-1} y(i) = \frac{1}{N}\sum_{i=0}^{N-1} u_i \qquad (6)$$

Namely, (6) means that the optimal linear estimator of x given {y(i)} is simply the mean of the samples (measurements). So, in order to obtain an estimate of the background of the video, we take the average of all frames and store it as the background scene. Fig. 1 illustrates 4 frames of a sample video; these frames are in fact 4 noisy measurements of our signal, and the yellow ball we are trying to track acts as the disturbance to the background of the video (the door and the wall). Fig. 2 shows the background scene obtained by averaging over all of the frames. Please note that in this project we assume the background does not change, so the sample-mean estimator is a good estimate of the background. In the case of real-time tracking with a static background, one can update the running-average estimator as the frames arrive; evidently, the estimate then improves over time.
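As an illustrative MATLAB sketch (the file name and frame handling are assumptions, not the repository's exact code), the background estimate of (6) can be computed as a running average of the frames:

```matlab
% Sketch: background estimation by averaging all frames (Eq. (6)).
% The file name 'ball.mp4' is an assumption; replace it with your own video.
vr = VideoReader('ball.mp4');
bg = 0;                                           % running sum of frames
N  = 0;                                           % frame counter
while hasFrame(vr)
    frame = mean(double(readFrame(vr)), 3) / 255; % grayscale frame in [0, 1]
    bg = bg + frame;
    N  = N + 1;
end
bg = bg / N;                                      % sample mean = background estimate (cf. Fig. 2)
imagesc(bg); colormap gray; axis image;
```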

Now that we have obtained x̂, we can extract the noise from the signal (video) by subtracting the background from each frame. Fig. 3 shows four realizations of the noise (moving object) at different frames. Because in our problem we do not care about the energy of the noise (the gray level of an image, i.e. its pixels, is proportional to the energy, or the amount of information it contains, the entropy of the image), we can use a one-bit compressed-sensing method to store the noise. That is, we use the following model to store the noise:

$$z_i = \operatorname{sgn}(v_i - \tau_i), \qquad i = 0, 1, \ldots, N-1 \qquad (7)$$

Where z_i denotes the quantized measurement of v_i with respect to the threshold τ_i. Namely, instead of saving the real value of the measurement v_i, we only save its sign with respect to the defined threshold τ_i. In practice, one may use an adaptive threshold, but in our application we use a fixed threshold for all measurements. It is also worth mentioning that in the 1-bit compressed-sensing model (7) we lose all information about the magnitude of v_i, but as mentioned earlier we do not care about the energy of the noise, so this model can be used to store the measurements, speed up the tracking process, and lower the dimension of the calculations (an image typically carries intensity information for the red, green, and blue channels, and after quantizing the measurements according to (7) we have only one channel, known as a binary image). Fig. 4 illustrates one estimate of the noise after quantization. An anomaly can be observed at the bottom-right corner of Fig. 4, which emphasizes that this is not an exact realization of the noise (moving object) but rather an estimate of it, prone to some error. Now that we have estimated the moving object, we can easily find its center by inspecting the binary image of each frame, finding the regions that contain 1s, and choosing the largest such region as the object. We then estimate the center of that region and store it as the position of the moving object at each frame. Finally, Fig. 5 shows that we have detected the moving object at each frame; the yellow circle denotes our detection.
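A minimal sketch of this per-frame detection step follows, assuming `frame` is one grayscale frame (as computed in the previous sketch), `bg` is the averaged background, and `tau` is an arbitrary fixed threshold; `regionprops` requires the Image Processing Toolbox:

```matlab
% Sketch: detect the moving object in a single grayscale frame (Eqs. (2) and (7)).
tau   = 0.15;                               % assumed fixed threshold
vhat  = abs(frame - bg);                    % estimated noise (moving object), Eq. (2)
z     = vhat > tau;                         % 1-bit quantization, Eq. (7): binary image
stats = regionprops(z, 'Area', 'Centroid'); % connected regions of 1s
if ~isempty(stats)
    [~, k] = max([stats.Area]);             % largest region is taken as the object
    center = stats(k).Centroid;             % [x, y] position stored for this frame
end
```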


Fig. 1. Four sample frames of the video. In other words, these are four noisy measurements of the background (which is a 2D signal).



Fig. 2. An estimate of the background signal resulting from averaging over all frames.



Fig. 3. Four realizations of the noise (moving object) at different frames.



Fig. 4. One realization of the noise after quantization.



Fig. 5. The moving object detected at four sample frames; the yellow circle denotes our detection.


####B. Kalman Filter

In this section we describe the formulation and system model of the Kalman filter. Intuitively, the Kalman filter takes the current state of the system and the current uncertainty of our measurements, and produces a prediction of the next state of the system together with an associated uncertainty. It then compares its prediction with the received input and corrects itself based on the error. First we need to define the state for the Kalman filter. We want to predict the position of a moving object based on the current information about the object. For simplicity, we assume a constant-velocity model. The dynamics of a moving object in one dimension can be described as follows:

$$x_t = \tfrac{1}{2} a T^2 + v_{t-1} T + x_{t-1} \qquad (8)$$
$$v_t = a T + v_{t-1} \qquad (9)$$

Where x_t and v_t denote the position and velocity at time t, and a denotes the acceleration. So, the dynamics of a moving object in one dimension can be modeled by the position and its first derivative. Without loss of generality, we can extend the one-dimensional case to a 2D object and conclude that the dynamics of a two-dimensional object can be described by x, y, ẋ, and ẏ. We define the state X_t with the following variables of interest:

$$X_t = \begin{bmatrix} x_t \\ y_t \\ \dot{x}_t \\ \dot{y}_t \end{bmatrix} \qquad (10)$$

Next, we need to specify the expected behaviour of the state variables when the system moves from one state to the next. Based on Eqs. (8) and (9), the state variables evolve as follows:

$$
\begin{aligned}
x_t &= x_{t-1} + \dot{x}_{t-1} T + \tfrac{1}{2} a T^2 \\
y_t &= y_{t-1} + \dot{y}_{t-1} T + \tfrac{1}{2} a T^2 \\
\dot{x}_t &= \dot{x}_{t-1} + a T \\
\dot{y}_t &= \dot{y}_{t-1} + a T
\end{aligned}
$$

So, the following model can be used to define the state transition:

$$
\begin{bmatrix} x_t \\ y_t \\ \dot{x}_t \\ \dot{y}_t \end{bmatrix} =
\begin{bmatrix} 1 & 0 & T & 0 \\ 0 & 1 & 0 & T \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x_{t-1} \\ y_{t-1} \\ \dot{x}_{t-1} \\ \dot{y}_{t-1} \end{bmatrix} +
\begin{bmatrix} \tfrac{1}{2}T^2 \\ \tfrac{1}{2}T^2 \\ T \\ T \end{bmatrix} a + W_{t-1} \qquad (11)
$$

We can formulate (11) as follows:

$$X_t = A X_{t-1} + B u_{t-1} \qquad (12)$$

Where B u_{t−1} can be seen as the noise (or an external force acting through the acceleration). In this project, we observe the position of the moving object. Therefore, we define the following measurement matrix H:

$$H = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} \qquad (13)$$

The measurement model is then:

$$
\begin{bmatrix} x_t \\ y_t \end{bmatrix} =
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}
\begin{bmatrix} x_t \\ y_t \\ \dot{x}_t \\ \dot{y}_t \end{bmatrix} + V_t \qquad (14)
$$

Where $V_t = [\mathcal{N}(0,\sigma_1^2), \mathcal{N}(0,\sigma_2^2)]^T$ is the measurement noise. Basically, the Kalman filter involves three noise covariance matrices:

  • Dynamic Noise: During the transition from one state to another, the system can be disturbed by an external force, which adds noise to the system. In our problem, such an external force can be modeled as a disturbance to the object's acceleration. It contributes to the prediction of the next error covariance matrix.
  • Measurement Noise: All of our sensors are prone to noise, which corrupts our measurements. We refer to this disturbance as the measurement noise.
  • Covariance of State Variables: The uncertainty of the current state estimate, which the filter updates at every step.

Assuming that the state variables are independent, we initialize the covariance matrix of the state variables as follows. Note that this matrix can also be regarded as the a posteriori error covariance matrix.

$$
S_0 = \begin{bmatrix}
\sigma_{x_0}^2 & 0 & 0 & 0 \\
0 & \sigma_{y_0}^2 & 0 & 0 \\
0 & 0 & \sigma_{\dot{x}_0}^2 & 0 \\
0 & 0 & 0 & \sigma_{\dot{y}_0}^2
\end{bmatrix} \qquad (15)
$$

Also, we further assume that the measurement noises are independent; the covariance matrix of V can then be described as:

$$
\operatorname{cov}(V) = R = \begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix} \qquad (16)
$$

Finally, we need to define the covariance matrix of the dynamic noise. As described earlier, this noise represents the disturbance during the transition from one state to another. It can be written as:

$$
Q = \begin{bmatrix}
\sigma_x^2 & 0 & \sigma_{x\dot{x}} & 0 \\
0 & \sigma_y^2 & 0 & \sigma_{y\dot{y}} \\
\sigma_{x\dot{x}} & 0 & \sigma_{\dot{x}}^2 & 0 \\
0 & \sigma_{y\dot{y}} & 0 & \sigma_{\dot{y}}^2
\end{bmatrix} \qquad (17)
$$

From (11), we can define Q as:

$$
Q = \begin{bmatrix}
\tfrac{1}{4}T^4 & 0 & \tfrac{1}{2}T^3 & 0 \\
0 & \tfrac{1}{4}T^4 & 0 & \tfrac{1}{2}T^3 \\
\tfrac{1}{2}T^3 & 0 & T^2 & 0 \\
0 & \tfrac{1}{2}T^3 & 0 & T^2
\end{bmatrix} \qquad (18)
$$
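To make these quantities concrete, a minimal MATLAB sketch of the model matrices follows. The time step T and all variances below are illustrative placeholders, not values taken from the original experiments:

```matlab
% Sketch: model matrices for the constant-velocity Kalman filter (Eqs. (11), (13), (15), (16), (18)).
T = 1;                          % assumed time step of one frame
A = [1 0 T 0;                   % state-transition matrix, Eq. (11)
     0 1 0 T;
     0 0 1 0;
     0 0 0 1];
B = [T^2/2; T^2/2; T; T];       % acceleration input vector, Eq. (11)
a = 0;                          % nominal acceleration input (treated here as zero / noise)
H = [1 0 0 0;                   % measurement matrix, Eq. (13): only x and y are observed
     0 1 0 0];
S = diag([10 10 10 10]);        % initial state covariance, Eq. (15) (placeholder variances)
R = diag([5 5]);                % measurement-noise covariance, Eq. (16) (placeholder variances)
Q = [T^4/4, 0,     T^3/2, 0;    % dynamic-noise covariance, Eq. (18)
     0,     T^4/4, 0,     T^3/2;
     T^3/2, 0,     T^2,   0;
     0,     T^3/2, 0,     T^2];
```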

We use our original tracker (Section II-A) as the input to the Kalman filter and define the input vector as:

$$Y_t = \begin{bmatrix} \hat{x}_t \\ \hat{y}_t \end{bmatrix} \qquad (19)$$

We have now defined all of the matrices required for the Kalman filter. We can use the filter, with our original tracker (Section II-A) as its input, to predict the position of the moving object according to the following algorithm. The Kalman filter has two stages, prediction and correction:

$$
\begin{aligned}
\textbf{Prediction:}\quad
& X_t^- = A X_{t-1} + B u_{t-1} \\
& S_t^- = A S_{t-1} A^T + Q \\[4pt]
\textbf{Correction:}\quad
& K_t = S_t^- H^T \left( H S_t^- H^T + R \right)^{-1} \\
& X_t = X_t^- + K_t \left( Y_t - H X_t^- \right) \\
& S_t = \left( I - K_t H \right) S_t^-
\end{aligned}
$$
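As a minimal sketch (not the repository's exact code), one iteration of this algorithm can be written as a MATLAB function operating on the matrices defined in the earlier sketch:

```matlab
function [X, S] = kalman_step(X, S, Y, A, B, a, H, Q, R)
% One Kalman-filter iteration: prediction followed by correction.
% X: 4x1 state, S: 4x4 error covariance, Y: 2x1 measured position (Section II-A tracker).
% --- Prediction ---
X = A*X + B*a;                  % predicted (a priori) state
S = A*S*A' + Q;                 % predicted (a priori) error covariance
% --- Correction ---
K = (S*H') / (H*S*H' + R);      % Kalman gain (matrix right division instead of inv())
X = X + K*(Y - H*X);            % corrected state using the measurement residual
S = (eye(size(S,1)) - K*H) * S; % corrected (a posteriori) error covariance
end
```

On frames where no detection is available, only the two prediction lines are executed, which is exactly how Scenario 1 below is driven.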

###III. Results

In order to observe the behaviour of the Kalman filter under different circumstances, we considered three different cases. We examine each of them in the following subsections.

####A. Scenario 1: Prediction

In the first scenario, we sense the position of the object only every 3 frames and want a good prediction of the position of the moving object from these samples. Fig. 6 illustrates the result at four different frames. The yellow circle is our main tracker (which is fed to the Kalman filter every 3 frames) and the black circle is the Kalman filter's prediction. It can be observed that the Kalman filter tracks the moving object with very good accuracy.
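A sketch of how this scenario might be driven, assuming the `kalman_step` function above, the matrices of the earlier sketch, and a hypothetical 2-by-N array `centers` holding the detector output for each frame:

```matlab
% Sketch: feed the filter a measurement only every third frame (Scenario 1).
% centers is an assumed 2-by-N array of detected positions; A, B, a, H, Q, R, S as above.
N = size(centers, 2);
X = [centers(:, 1); 0; 0];          % initialize at the first detection with zero velocity
predicted = zeros(2, N);            % Kalman prediction per frame (black circle in Fig. 6)
for t = 2:N
    if mod(t, 3) == 0
        [X, S] = kalman_step(X, S, centers(:, t), A, B, a, H, Q, R);  % predict + correct
    else
        X = A*X + B*a;              % prediction only (no measurement this frame)
        S = A*S*A' + Q;
    end
    predicted(:, t) = X(1:2);
end
```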


Fig. 6. Scenario 1, in which the Kalman filter tracks the moving object when it is fed a measurement only every third frame.


####B. Scenario 2: Prediction In The Presence of Noise

In this scenario, we add a large noise to the input of the Kalman filter. It turns out that the Kalman filter is more robust to the noise than the original tracker. So, if our measurements are corrupted by noise, one can use the Kalman filter to obtain a better estimate than any single sensor provides (data fusion), because this algorithm is an adaptive filter and is more robust to noise than each individual sensor. Fig. 7 illustrates this scenario. It can be seen that the yellow circle jumps around and is often far from the object, whereas the Kalman filter provides a better estimate of the position. Please note that a low gain smooths out the noise but also slows down the Kalman filter (it detects changes more slowly).
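One way to emulate this scenario, assuming the Scenario 1 loop above, is simply to perturb the detector output before feeding it to the filter; the noise level below is an arbitrary assumption:

```matlab
% Sketch: corrupt the detector output with additive Gaussian noise (Scenario 2).
% Replaces the measurement line inside the Scenario 1 loop above.
sigma_meas = 20;                                     % assumed noise std. in pixels
Y_noisy = centers(:, t) + sigma_meas * randn(2, 1);  % noisy input to the filter
[X, S] = kalman_step(X, S, Y_noisy, A, B, a, H, Q, R);
```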


Fig. 7. Scenario 2, in which the Kalman filter tracks the moving object in the presence of large noise.


####C. Scenario 3: Blind Prediction

In this case, we let the Kalman filter learn during the first half of the frames and then stopped updating its input. In (10) we defined the dynamics of the system for a constant-velocity object; that is, we are not capturing the acceleration of the system. So, we should expect that the Kalman filter cannot track the trajectory of the ball, because the object is under gravity and has a negative vertical acceleration. If we want to track the trajectory without the input, we must use a more complex system model with the following state:

$$X = \begin{bmatrix} x \\ y \\ \dot{x} \\ \dot{y} \\ \ddot{x} \\ \ddot{y} \end{bmatrix} \qquad (20)$$
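For illustration only (this matrix does not appear in the original report), the constant-acceleration analogue of the transition matrix in (11) for the state (20) would be:

$$
A = \begin{bmatrix}
1 & 0 & T & 0 & \tfrac{1}{2}T^2 & 0 \\
0 & 1 & 0 & T & 0 & \tfrac{1}{2}T^2 \\
0 & 0 & 1 & 0 & T & 0 \\
0 & 0 & 0 & 1 & 0 & T \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
$$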

Fig. 8 shows the result of this scenario. As can be seen, the Kalman filter is unable to track the moving object after the input is cut off, and it follows a linear path from that point on.


Fig. 8. Scenario 3, in which the Kalman filter blindly tracks the moving object.


###IV. Conclusion

In this project we designed a Kalman filter to track a moving object in a video. In fact, as mentioned earlier, a moving object in a video can be seen as noise added to the background scene, so this project was essentially noise detection combined with Kalman filtering. The same approach can be used to estimate and cancel the noise of other signals. As we saw in Scenarios 1 and 2, the Kalman filter can be used whenever we need to predict the next state of a system from noisy measurements, and it can be used for sensor fusion as well. It must be mentioned that this algorithm is defined for linear systems (we used linear algebra throughout). In the case of nonlinear systems, the extended Kalman filter (EKF), a nonlinear version of the Kalman filter, can be used.


