EE3901/EE5901 Sensor Technologies
Week 3 Notes
Introduction to the Kalman filter

College of Science and Engineering, James Cook University
Last updated: 21 March 2023

This week we will study an algorithm called the Kalman filter. The Kalman filter is a widely used method to combine information from multiple sensors, especially in dynamic systems where the measurands are continuously changing. The method is named after Rudolf E Kalman who published a key paper in 1960, but similar ideas were also developed by others at around the same time. It became famous because it was used in the Apollo moon landing program. Today, it sees applications in many fields: navigation, aerospace, object tracking, control systems, consumer electronics, and more.

Building your intuition

The essence of the Kalman filter is summarised in Figure 1. In this example, the variable being measured is the position of a robot. The position cannot be known perfectly, so the Kalman filter represents its belief as a Gaussian probability distribution. The spread of the probability distribution indicates the uncertainty in the position.

As shown in the figure, the essence of the Kalman filter is to merge together information from a physics-based model (the prediction step) with information from the real world (the measurement step). This information is merged in such a way that the overall uncertainty is minimised.

Figure 1: A Kalman filter illustrated through the example of a little robot that needs to know its position in the world. (a) Prediction step: The robot predicts where it should be, based on its knowledge of its speed, etc. (b) Measurement step: The robot uses its sensors to measure its location. (c) Update step: The Kalman filter algorithm combines the prediction and the measurement to produce a high quality estimate of position.

Why use a prediction step?

You may be tempted to simply rely upon measurements alone. However, there is much to be gained by also incorporating knowledge of the physics of your system. As a human, you can use your intuition to recognise when an apparent measurement doesn’t “seem right”. If you disbelieve a single measurement, your uncertainty may increase, but you won’t necessarily be fooled by measurement noise. The prediction step in the Kalman filter allows a computer system to do a similar thing.

Notice in Figure 1 that the uncertainty in the new position is lower than the uncertainty in the measurement. This is true in general. A Kalman filter produces a better result than simply using the raw measurements alone.

The Kalman filter, by example

We will teach the Kalman filter by starting from a simple example. Our example problem is that of a robot moving along a one-dimensional train track. The robot has 2 measurands: position $p$ and velocity $v$. However, the position and velocity are measured by noisy sensors.

The position and velocity can change over time, hence the robot must periodically obtain new measurements in order to keep up. Let the time interval between measurements be $\Delta t$. We will call each of these intervals a “time step”. Time steps will be counted using integer values $k$, where the elapsed time $t = k \Delta t$. It is convenient in software to count in integer steps.

At time step $k$, we write our best estimate of the state of the robot in a vector

\begin{equation} \hat{\boldsymbol{x}}_{k}=\left(\begin{array}{c} \hat{p}_k\\ \hat{v}_k \end{array}\right), \end{equation}

where $\hat{p}_k$ and $\hat{v}_k$ are the estimated position and velocity at the $k^\text{th}$ time step, respectively. The hat indicates that these are estimates.

Equation (1) is the state vector of the robot. It is a vector that contains all the variables that we wish to measure. The Kalman filter is an algorithm that allows us to calculate this state vector.

We will also need to define the covariance of the state vector, which we will call $P$:

\begin{equation} P_{k} = \text{cov}\left(\hat{\boldsymbol{x}}_k - \boldsymbol{x}_k\right) =\left[\begin{array}{cc} \sigma_{p}^{2} & \rho_{pv}\sigma_{p}\sigma_{v}\\ \rho_{pv}\sigma_{p}\sigma_{v} & \sigma_{v}^{2} \end{array}\right], \end{equation}

where $\sigma_p$ is the standard deviation of position, $\sigma_v$ is the standard deviation of velocity, and $\rho_{pv}$ is the correlation between position and velocity. Recall that the standard deviation refers to our uncertainty in the position and velocity. If the standard deviation is small, then it means that we have high confidence. If the standard deviation is large, then we have low confidence.
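To make Eq. (2) concrete, the sketch below assembles a covariance matrix from assumed standard deviations and an assumed correlation coefficient. All the numbers here are made up for illustration only.

```python
import numpy as np

# Illustrative values (not from the notes): 0.5 m position uncertainty,
# 0.2 m/s velocity uncertainty, and a mild positive correlation.
sigma_p, sigma_v, rho_pv = 0.5, 0.2, 0.3

P = np.array([
    [sigma_p**2,                 rho_pv * sigma_p * sigma_v],
    [rho_pv * sigma_p * sigma_v, sigma_v**2],
])

# The diagonal holds the variances; the two off-diagonal entries are
# equal, so P is symmetric, as every covariance matrix must be.
```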

Self check quiz 3.1

The estimation error is the difference between $\hat{\boldsymbol{x}}_k$ and the true state $\boldsymbol{x}_k$. Equation (2) states that $P_k$ is the covariance of the estimation error. If the filter is working correctly and all assumptions are met, what would you predict to be the expected value of the estimation error?

Answer: (b)

The prediction step

Given the previous state estimate $\hat{\boldsymbol{x}}_{k-1}$, can you predict the new state estimate $\hat{\boldsymbol{x}}_{k}$?

Of course you can: there is a relationship between position and velocity!

\begin{alignat}{2} p_{k} & =p_{k-1} & \ +\ \Delta t\,v_{k-1}\\ v_{k} & = & v_{k-1} \end{alignat}

In words, this says that the new position is the old position plus the distance travelled during this time step. For velocity, we assume that the new velocity is equal to the old velocity. Of course, if you have further knowledge about the physics of the robot, then you could amend these equations. For instance, if you know about drag or friction forces, you may have a better estimate for the new velocity. However, for now, let’s assume the simplest model, where the velocity tends to remain constant. It is fine if the predictive model is not perfect, because the measurement will be used to correct for any small errors.

We can write Eqs. (3) and (4) in matrix form:

\begin{align} \hat{\boldsymbol{x}}_{k|k-1} & =\left[\begin{array}{cc} 1 & \Delta t\\ 0 & 1 \end{array}\right]\hat{\boldsymbol{x}}_{k-1}\\ & =F_{k}\hat{\boldsymbol{x}}_{k-1}. \end{align}

Here $F_{k}$ is called the prediction matrix. It gives a mathematical model of the robot’s dynamics. The notation $k|k-1$ means “the estimate at time step $k$ given information from time step $k-1$.”

To predict the new covariance, we use the variance propagation law from last week on Eq. (6). The covariance $P_{k-1}$ must be propagated through the calculation. The result will be $J P_{k-1} J^T$, where $J$ is the Jacobian of our update step ($F_k \hat{\boldsymbol{x}}_{k-1}$). The Jacobian of a simple matrix multiplication is just the matrix itself, i.e. the Jacobian of $F_k \hat{\boldsymbol{x}}_{k-1}$ is simply $F_k$. Therefore, we are ready to write down our first version of the prediction step of the Kalman filter:

\begin{align} \hat{\boldsymbol{x}}_{k|k-1} & =F_{k}\hat{\boldsymbol{x}}_{k-1}\\ P_{k|k-1} & =F_{k}P_{k-1}F_{k}^{T}. \end{align}

To recap our progress so far: we have written down a physics-based model of the robot, and used that to write some equations for running our state vector and covariance forward in time.
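The prediction step of Eqs. (7) and (8) can be sketched in a few lines of code. This is an illustrative NumPy version with made-up numbers for the time step, previous estimate, and previous covariance; the notes' own implementation later uses Matlab.

```python
import numpy as np

dt = 0.1  # assumed time step, seconds

# Prediction matrix from Eqs. (3)-(4)
F = np.array([[1.0, dt],
              [0.0, 1.0]])

# Assumed previous estimate: 2 m from the origin, moving at 1 m/s,
# with independent position and velocity uncertainties.
x_hat = np.array([2.0, 1.0])
P = np.diag([0.25, 0.04])

# Prediction step, Eqs. (7)-(8)
x_pred = F @ x_hat
P_pred = F @ P @ F.T

# The position advances by dt * velocity, and the velocity variance
# leaks into the position variance via the off-diagonal terms of P_pred.
```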

Self check quiz 3.2

Suppose you have a system that evolves according to the equations:

\begin{align*} a(t+\Delta t) & =a(t)+0.2b(t)\\ b(t+\Delta t) & =b(t)-0.1b(t), \end{align*}

where $\Delta t$ is the length of a time step, and the state variables at time $t$ are $a(t)$ and $b(t)$.

If the state vector is

\hat{\boldsymbol{x}}=\left[\begin{array}{c} \hat{a}\\ \hat{b} \end{array}\right],

then what is the prediction matrix $F$?

Answer: (d)

Control inputs

Our current model says that velocity remains unchanged, but this is not always accurate. For example, the robot might be given a command to accelerate. Our software can see the acceleration command, so we should include that information in our model.

Suppose that our motor controller is generating an acceleration aa. We can amend our prediction model to read:

\begin{alignat}{2} p_{k} & =p_{k-1} & \ +\ \Delta t\,v_{k-1} & \ +\frac{1}{2}a\left(\Delta t\right)^{2}\\ v_{k} & = & v_{k-1} & \ +a\,\Delta t. \end{alignat}

The last term in Eq. (9) represents the change in displacement due to constant acceleration, while the last term in Eq. (10) represents the change in velocity due to the acceleration.

In general there could be multiple control inputs. Hence we introduce a control vector $\boldsymbol{u}_{k}$ which contains the control inputs at time step $k$. Our little robot only has one control input, so in our case $\boldsymbol{u}_{k} = a$.

This enables us to convert Eqs. (9) and (10) to matrix form:

\begin{align} \hat{\boldsymbol{x}}_{k|k-1} & =\left[\begin{array}{cc} 1 & \Delta t\\ 0 & 1 \end{array}\right]\hat{\boldsymbol{x}}_{k-1}+\left[\begin{array}{c} \frac{1}{2}\left(\Delta t\right)^{2}\\ \Delta t \end{array}\right]a\\ & =F_{k}\hat{\boldsymbol{x}}_{k-1}+B_{k}\boldsymbol{u}_{k}, \end{align}

where $B_{k}$ is called the control matrix.

As an aside: if you have studied modern control theory, then you should recognise Eq. (12) as the state-space representation of the system being measured.
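The control matrix of Eq. (11) slots into the prediction step as one extra term. Below is an illustrative NumPy sketch; the time step, acceleration, and previous estimate are assumed values chosen only for demonstration.

```python
import numpy as np

dt = 0.1  # assumed time step, seconds
a = 2.0   # assumed commanded acceleration, m/s^2

F = np.array([[1.0, dt],
              [0.0, 1.0]])
B = np.array([0.5 * dt**2, dt])  # control matrix from Eq. (11)
u = a                            # single control input

x_hat = np.array([2.0, 1.0])     # assumed previous estimate: 2 m, 1 m/s

# Prediction with a control input, Eq. (12)
x_pred = F @ x_hat + B * u

# The acceleration adds (1/2) a dt^2 to position and a dt to velocity.
```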

Self check quiz 3.3

What is the difference between the state vector $\boldsymbol{x}_k$ and the control vector $\boldsymbol{u}_k$? Select all that apply.

Answer: (a), (b), (c)

Process uncertainty

What if our model is inaccurate? What if there are other random perturbations that we can’t predict, e.g. the robot being pushed by wind or its wheels slipping a bit?

We add some process covariance $Q_{k}$ to represent these unknown factors:

\begin{equation} P_{k|k-1}=F_{k}P_{k-1}F_{k}^{T}+Q_{k}. \end{equation}

The process covariance $Q_{k}$ represents the “unknown unknowns” in the model. It tells the Kalman filter that the prediction model is never perfect. The model uncertainty grows over time because we add $Q_k$ onto the state covariance at every time step.

Intuitively, this is like walking around blindfolded: in the beginning you know where you are pretty well (small $P_k$), but as more time goes by your uncertainty gets bigger. The role of the process uncertainty is to represent this trend of increasing uncertainty over time.

The only way to reduce the uncertainty is by making a measurement. We will see below how measurements allow us to reduce the state covariance.
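This growth in uncertainty is easy to demonstrate numerically. The sketch below (with assumed values for $\Delta t$, $Q$, and the initial covariance) runs the prediction step of Eq. (13) repeatedly with no measurements and tracks the trace of $P$, i.e. the total variance.

```python
import numpy as np

dt = 0.1
F = np.array([[1.0, dt],
              [0.0, 1.0]])
Q = np.diag([1e-4, 1e-4])    # assumed small process covariance

P = np.diag([0.01, 0.01])    # assumed initial state covariance
traces = []
for _ in range(50):          # predict repeatedly, never measuring
    P = F @ P @ F.T + Q      # Eq. (13)
    traces.append(np.trace(P))

# traces increases every step: without measurements the filter
# only ever becomes less certain.
```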

Self check quiz 3.4

Suppose that the control vector $\boldsymbol{u}_k$ is subject to some uncertainty, described by a covariance $W_k$. Use the variance propagation law (week 2) to determine the resulting process covariance. Write down the state covariance update step, assuming that there are no other sources of process uncertainty.

Hint: given a function of the form

\hat{\boldsymbol{x}}_{k|k-1} = F_{k}\hat{\boldsymbol{x}}_{k-1}+b(\boldsymbol{u}_{k}),

where

b(\boldsymbol{u}_{k}) = B_k\boldsymbol{u}_{k},

propagate $W_k$, which is the covariance of $\boldsymbol{u}_{k}$, through the function $b$ to find the resulting contribution to the process covariance.

Answer: (c)

Summary of the prediction step

Self check quiz 3.5


Select the item from the list below that best describes the yellow highlighted part of the equation.


Answer: Page 1: f; Page 2: d; Page 3: h; Page 4: b; Page 5: c; Page 6: a; Page 7: e; Page 8: i; Page 9: g

Measurement predictions

Suppose our robot has a GNSS (global navigation satellite system) receiver giving position, an infrared rangefinder giving position, and a wheel encoder giving velocity. Note that here we have 3 sensors, but there are only 2 elements in the state vector. Suppose the GNSS coordinates are converted to metres, but the rangefinder reports its results in mm.

We can predict what the measurements would be if our state estimate were correct:

\begin{align} \boldsymbol{z}_{predicted(k)} & =\left(\begin{array}{c} \text{GNSS distance}\\ \text{rangefinder distance}\\ \text{wheel encoder velocity} \end{array}\right)\\ & =\left(\begin{array}{c} p_k\\ 1000p_k\\ v_k \end{array}\right)\\ & =\left[\begin{array}{cc} 1 & 0\\ 1000 & 0\\ 0 & 1 \end{array}\right]\left(\begin{array}{c} p_k\\ v_k \end{array}\right)\\ & =H_{k}\hat{\boldsymbol{x}}_{k|k-1}, \end{align}

where $H$ is called the measurement matrix.

This may seem “backwards” to you. We predict the measurements based on the state, instead of the other way around. This is a critical feature of the Kalman filter!

There does not need to be a unique 1:1 relationship between elements in the state vector and the sensors we use. We can have multiple sensors that each measure some function of the state. Mathematically, the measurement matrix $H$ does not need to be invertible. It doesn’t even need to be square.
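The three-sensor measurement matrix of Eq. (16) makes this concrete: it is 3x2, so it has no inverse, yet it still maps a two-element state to three predicted readings. An illustrative NumPy sketch (the predicted state is an assumed value):

```python
import numpy as np

# Measurement matrix from Eq. (16): GNSS position (m),
# rangefinder position (mm), wheel-encoder velocity (m/s).
H = np.array([[1.0,    0.0],
              [1000.0, 0.0],
              [0.0,    1.0]])

x_pred = np.array([2.0, 1.0])  # assumed predicted state: 2 m, 1 m/s

z_predicted = H @ x_pred
# Three predicted sensor readings from two state variables;
# the rangefinder row converts metres to millimetres.
```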

We also have a predicted measurement covariance

\Sigma_{predicted}=H_{k}P_{k|k-1}H_{k}^{T}.

Self check quiz 3.6

Select all the true statements regarding the measurement matrix $H_k$.

Answer: (a), (c), (d)

The measurement residual

Above we used our model of the system to predict the measurements that we should get, if our state estimation were perfect. However, in reality, the state estimation is probably not perfect.

Let’s finally turn on the robot’s sensors and take some measurements. Suppose the sensors provide some measurements $\boldsymbol{z}_{k}$ with covariance $R_{k}$. The difference between the true measurement and the predicted measurement is called the residual:

\begin{equation} \boldsymbol{y}_{k}=\boldsymbol{z}_{k}-H_{k}\hat{\boldsymbol{x}}_{k|k-1}. \end{equation}

There is also a covariance in the residual

\begin{equation} S_{k}=H_{k}P_{k|k-1}H_{k}^{T}+R_{k}. \end{equation}

If the prediction and measurement were perfect then $\boldsymbol{y}_{k}$ would be zero. Any differences indicate new information that will be used to update the state.
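Continuing the three-sensor example, the residual and its covariance can be computed directly from Eqs. (18) and (19). The predicted covariance, sensor noise covariance, and sensor readings below are all assumed numbers for illustration.

```python
import numpy as np

H = np.array([[1.0, 0.0],
              [1000.0, 0.0],
              [0.0, 1.0]])
x_pred = np.array([2.0, 1.0])
P_pred = np.diag([0.25, 0.04])          # assumed predicted covariance
R = np.diag([1.0, 100.0**2, 0.05**2])   # assumed sensor noise covariance

z = np.array([2.3, 2250.0, 0.9])        # made-up sensor readings

y = z - H @ x_pred                      # residual, Eq. (18)
S = H @ P_pred @ H.T + R                # residual covariance, Eq. (19)

# y has one entry per sensor, each in that sensor's own units.
```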

Update step

Finally we are ready to bring it all together. We have:

  1. A predicted state $\hat{\boldsymbol{x}}_{k|k-1}$, which we calculated based on the physics of our robot.
  2. A measurement residual, which tells us how much agreement there is between the predicted state and the actual sensor readings.

The state estimate is calculated as follows:

\begin{equation} \hat{\boldsymbol{x}}_{k}=\hat{\boldsymbol{x}}_{k|k-1}+K_{k}\boldsymbol{y}_{k} \end{equation}

where $K_{k}$ is a matrix called the “Kalman gain”. The clever part of the Kalman filter is how to calculate this matrix. Its job is to convert a measurement residual into a state update.

The proof of the Kalman gain $K_k$ is optional for students in this class. If you are happy to simply accept the result, it is:

\begin{equation} K_{k}=P_{k|k-1}H_{k}^{T}S_{k}^{-1}. \end{equation}

There is one final piece, which is the equation used to produce the final updated state covariance:

\begin{equation} P_{k} =\left(I-K_{k}H_{k}\right)P_{k|k-1}. \end{equation}

The proof of Equations (21) and (22) is available by pressing the button below, if you are interested. The essence of the proof is to minimise the sum of the squares of the error in the state estimate. Hence, the Kalman filter is an optimal estimator in the sense that it minimises this sum of squares.
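To build intuition for what the gain does, consider a one-dimensional case where a single state variable is measured directly, so $H = 1$ and every matrix reduces to a scalar. The numbers below are assumed for illustration.

```python
# Scalar illustration of the update step (assumed numbers).
p_pred, var_pred = 10.0, 4.0   # predicted position and its variance
z, var_meas = 12.0, 1.0        # measurement and its variance

S = var_pred + var_meas        # residual covariance, Eq. (19), with H = 1
K = var_pred / S               # Kalman gain, Eq. (21): here 0.8
p_new = p_pred + K * (z - p_pred)   # state update, Eq. (20)
var_new = (1 - K) * var_pred        # covariance update, Eq. (22)

# K = 0.8 because the measurement (variance 1.0) is more trustworthy
# than the prediction (variance 4.0). The updated estimate p_new = 11.6
# lies between prediction and measurement, and var_new = 0.8 is smaller
# than either input variance.
```

This matches the intuition from Figure 1: the merged estimate is more certain than either the prediction or the measurement alone.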

Summary of the update step

Self check quiz 3.7


Select the item from the list below that best describes the yellow highlighted part of the equation.


Answer: Page 1: f; Page 2: h; Page 3: i; Page 4: b; Page 5: g; Page 6: k; Page 7: c; Page 8: e; Page 9: a; Page 10: j; Page 11: d

Example implementation

The outline of a Matlab implementation is shown below.

% Run this code for every time step:

% Prediction step
xhat_prediction = F * xhat;
P_prediction = F * P * F' + Q;

% Get a measurement
z = read_sensors(); % obtain latest sensor data

% Run the update step
y = z - H*xhat_prediction; % measurement residual
S = H*P_prediction*H' + R; % measurement residual covariance
K = P_prediction * H' / S; % Kalman gain

% Calculate the new state
xhat = xhat_prediction + K*y; % new state vector
P = (eye(2) - K*H)*P_prediction; % new covariance

The complete Matlab source code is available for download. The output from this program is shown in Figure 2. Notice how the Kalman filter produces a high quality position estimate despite the extremely noisy sensors.

Figure 2: The implemented Kalman filter for the scenario developed in these notes.

Implementation hints

  • To design a Kalman filter you need to determine the system matrices $F_{k}$, $B_{k}$, $H_{k}$, etc. Often these are independent of time, in which case the $k$ subscript can be dropped.
  • Once all those matrices have been determined, you end up with a system of equations that you can simply enter into a computer.
  • The Kalman gain is written as if it contains a matrix inverse, but in Matlab, you should not use inv(S) to calculate it. Use the mrdivide (/) operator instead, because explicitly forming a matrix inverse is slower and less numerically stable than solving the equivalent linear system.
  • In some cases you might have measurements arriving at different rates, for instance, a GNSS receiver may be quite slow in comparison to an on-board accelerometer. In this scenario, there are several options:
    1. Use some interpolation scheme to fill in the gaps between slow sensors, to pretend that you have data from all sensors at every time step; or
    2. Write a different measurement matrix for the scenario where only some of the sensors are reporting data. You can still perform the update step using partial sensor data.
  • Often the process covariance is not obvious. There are several strategies that you can consider.
    • If your system is tracking position and velocity, then the main source of error will be accelerations (changes in velocity) that you did not predict. You could treat unknown accelerations as a random variable with expected value of 0 but some assumed or measured variance. You could then propagate this variance through to the state vector update, in much the same way as you did in Self-check Quiz 3.4.
    • If you truly have no idea about what are the sources of error, a plausible starting point is a diagonal matrix:
      Q=\left[\begin{array}{cccc}\sigma_{x_{1}}^{2} & 0 & 0 & \cdots\\ 0 & \sigma_{x_{2}}^{2} & 0 & \cdots\\ 0 & 0 & \sigma_{x_{3}}^{2} & \cdots\\ \vdots & \vdots & \vdots & \ddots \end{array}\right].
      Then you can use a trial-and-error approach to adjust the variances $\sigma_{x_{n}}^{2}$ until the filter achieves good performance. You might consider choosing each $\sigma_{x_{n}}$ to be proportional to $\Delta t$ and consider how much standard deviation you expect to accumulate per second.
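Following that proportionality heuristic, a diagonal $Q$ can be built by scaling an assumed per-second standard deviation by the time step and squaring it. All values below are made-up starting points for trial-and-error tuning, not recommended settings.

```python
import numpy as np

dt = 0.1  # assumed time step, seconds

# Assumed standard deviation accumulated per second for each state
# variable (made-up numbers for a position/velocity state).
sigma_per_second = np.array([0.05, 0.1])

# Scale each sigma by the time step, then square to get the per-step
# variances on the diagonal of Q.
Q = np.diag((sigma_per_second * dt)**2)
```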

Conclusion

The Kalman filter is a method of estimating the state of a system in the presence of sensor noise. It uses a predictive model of the system’s dynamics as well as measurement data to produce a high quality result.
