EE3901/EE5901 Sensor Technologies Chapter 2 Notes
The Kalman filter
In this chapter we will study an algorithm called the Kalman filter. The Kalman filter is a widely used method to combine information from multiple sensors, especially in dynamic systems where the measurands are continuously changing. The method is named after Rudolf E. Kalman, who published a key paper in 1960, but similar ideas were also developed by others at around the same time. It became famous because it was used in the Apollo moon landing program. Today, it sees applications in many fields: navigation, aerospace, object tracking, control systems, consumer electronics, and more.
Building your intuition
The essence of the Kalman filter is summarised in Figure 1. In this example, the variable being measured is the position of a robot. The position cannot be known perfectly, so the Kalman filter represents its belief as a Gaussian probability distribution. The spread of the probability distribution indicates the uncertainty in the position.
As shown in the figure, the essence of the Kalman filter is to merge together information from a physics-based model (the prediction step) with information from the real world (the measurement step). This information is merged in such a way that the overall uncertainty is minimised.
A Kalman filter illustrated through the example of a little robot that needs to know its position in the world. (a) Prediction step: The robot predicts where it should be, based on its knowledge of its speed, etc. (b) Measurement step: The robot uses its sensors to measure its location. (c) Update step: The Kalman filter algorithm combines the prediction and the measurement to produce a high quality estimate of position.
Why use a prediction step?
You may be tempted to simply rely upon measurements alone. However, there is much to be gained by also incorporating knowledge of the physics of your system. As a human, you can use your intuition to recognise when an apparent measurement doesn’t “seem right”. If you disbelieve a single measurement, your uncertainty may increase, but you won’t necessarily be fooled by measurement noise. The prediction step in the Kalman filter allows a computer system to do a similar thing.
Notice in Figure 1 that the uncertainty in the new position is lower than the uncertainty in the measurement. This is true in general. A Kalman filter produces a better result than simply using the raw measurements alone.
The Kalman filter, by example
We will teach the Kalman filter by starting from a simple example. Our example problem is that of a robot moving along a one dimensional train track. The robot has 2 measurands: position $p$ and velocity $v$.

The position and velocity can change over time, hence the robot must periodically obtain new measurements in order to keep up. Let the time interval between measurements be $\Delta t$.

At time step $k$, we collect the measurands into a state vector

$$\hat{x}_k = \begin{bmatrix} p_k \\ v_k \end{bmatrix},$$

where $p_k$ is the position and $v_k$ is the velocity at time step $k$. The state vector contains every variable that the Kalman filter will estimate.

We will also need to define the covariance of the state vector, which we will call $P_k$:

$$P_k = \begin{bmatrix} \sigma_p^2 & \sigma_{pv} \\ \sigma_{pv} & \sigma_v^2 \end{bmatrix},$$

where the diagonal elements are the variances in position and velocity, and the off-diagonal elements are the covariances between them.
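To make this concrete, here is a minimal Matlab sketch of how the state vector and its covariance might be initialised for this robot. The numerical values are illustrative assumptions only:

% Initial state estimate: position (m) and velocity (m/s)
xhat = [0; 0];

% Initial state covariance: large values express that we are
% not yet confident in the starting position or velocity
P = [10 0;
     0  1];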
The prediction step
Given the previous state estimate $\hat{x}_{k-1}$, can you predict the new state $\hat{x}_k$ before the next measurement arrives? Of course you can: there is a relationship between position and velocity!

$$p_k = p_{k-1} + v_{k-1}\,\Delta t$$

$$v_k = v_{k-1}$$
In words, this says that the new position is the old position plus how far we travel during this time step. For velocity, we assume that the new velocity is equal to the old velocity. Of course, if you have further knowledge about the physics of the robot, then you could amend these equations. For instance, if you know about the drag or friction forces, you may have a better estimate for the new velocity. However, for now, let's assume the simplest model where the velocity tends to remain constant. It is fine if the predictive model is not perfect, because the measurement will be used to correct for any small errors.
We can write these equations in matrix form:

$$\hat{x}_k = F_k \hat{x}_{k-1}, \qquad F_k = \begin{bmatrix} 1 & \Delta t \\ 0 & 1 \end{bmatrix}.$$

Here $F_k$ is called the prediction matrix. It describes how the state vector evolves from one time step to the next.

To predict the new covariance, we use the variance propagation law from last week on the equation above:

$$P_k = F_k P_{k-1} F_k^T.$$
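As a quick sketch, the prediction step for this robot can be written in Matlab as follows; the time step value is an assumption for illustration:

dt = 0.1;              % time step in seconds (illustrative)
F = [1 dt;
     0 1];             % prediction matrix from the equations above

xhat = F * xhat;       % predict the new state
P = F * P * F';        % predict the new covariance (process noise is added below)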
To recap our progress so far: we have written down a physics-based model of the robot, and used that to write some equations for running our state vector and covariance forward in time.
Control inputs
Our current model says that velocity remains unchanged, but this is not always accurate. For example, the robot might be given a command to accelerate. Our software can see the acceleration command, so we should include that information in our model.
Suppose that our motor controller is generating a known acceleration $a$. The kinematics equations then become:

$$p_k = p_{k-1} + v_{k-1}\,\Delta t + \tfrac{1}{2} a\,\Delta t^2$$

$$v_k = v_{k-1} + a\,\Delta t$$

The last term in each equation describes how the control input (the acceleration) perturbs the state.

In general there could be multiple control inputs. Hence we introduce a control vector $u_k$ that collects all of the control inputs; in our example, $u_k$ contains just the acceleration $a$.

This enables us to convert the kinematics equations to matrix form:

$$\hat{x}_k = F_k \hat{x}_{k-1} + B_k u_k,$$

where $B_k$ is called the control matrix. In our example,

$$B_k = \begin{bmatrix} \tfrac{1}{2}\Delta t^2 \\ \Delta t \end{bmatrix}.$$

As an aside: if you have studied modern control theory, then you should recognise this as the discrete-time state-space model of a linear system.
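Continuing the sketch above, a known acceleration command could be incorporated like this; the acceleration value is an assumed example:

a = 0.5;                    % commanded acceleration in m/s^2 (illustrative)
u = a;                      % control vector (a single element in this example)
B = [0.5*dt^2;
     dt];                   % control matrix from the kinematics equations

xhat = F * xhat + B * u;    % prediction step including the control input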
Process uncertainty
What if our model is inaccurate? What if there are other random perturbations that we can’t predict, e.g. the robot being pushed by wind or its wheels slipping a bit?
Add some process covariance $Q_k$ when predicting the new state covariance:

$$P_k = F_k P_{k-1} F_k^T + Q_k.$$

The process covariance $Q_k$ describes the extra uncertainty that is added at every time step due to effects that the model does not capture. Notice that under this equation, the state covariance grows with every prediction.

Intuitively, this is like walking around blind-folded: in the beginning you know where you are pretty well (small $P$), but the longer you walk without looking, the less certain you become about your position (growing $P$).
The only way to reduce the uncertainty is by making a measurement. We will see below how measurements allow us to reduce the state covariance.
Summary of the prediction step

$$\hat{x}_k = F_k \hat{x}_{k-1} + B_k u_k$$

$$P_k = F_k P_{k-1} F_k^T + Q_k$$
Measurement predictions
Suppose our robot has a GNSS (global navigation satellite system) receiver giving position, an infrared rangefinder giving position, and a wheel encoder giving velocity. Note that we now have 3 sensors but only 2 elements in the state vector. Suppose also that the GNSS coordinates are converted to metres but the rangefinder results are in mm.
We can predict what the measurements would be if our state estimate were correct:

$$\hat{z}_k = H_k \hat{x}_k,$$

where $H_k$ is called the measurement matrix. It maps from state space to measurement space.

This may seem "backwards" to you. We predict the measurements based on the state, instead of the other way around. This is a critical feature of the Kalman filter!

There does not need to be a unique 1:1 relationship between elements in the state vector and the sensors we use. We can have multiple sensors that each measure some function of the state. Mathematically, the measurement matrix $H_k$ simply needs one row per sensor and one column per state variable, so it is not necessarily square.

We also have a predicted measurement covariance, found by propagating the state covariance into measurement space: $H_k P_k H_k^T$.
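For example, a measurement matrix consistent with the three sensors described above might look like the following sketch, where the factor of 1000 converts the state's position in metres into the rangefinder's millimetres (the exact values depend on your sensors):

% Rows: GNSS (m), rangefinder (mm), wheel encoder (m/s)
% Columns: position, velocity
H = [1    0;
     1000 0;
     0    1];

zhat = H * xhat;       % predicted measurements
Pz = H * P * H';       % predicted measurement covariance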
The measurement residual
Above we used our model of the system to predict the measurements that we should get, if our state estimation were perfect. However, in reality, the state estimation is probably not perfect.
Let’s finally turn on the robot’s sensors and take some measurements. Suppose the sensors provide some measurements $z_k$ with covariance $R_k$. The measurement residual is the difference between the actual and the predicted measurements:

$$y_k = z_k - H_k \hat{x}_k.$$

There is also a covariance in the residual:

$$S_k = H_k P_k H_k^T + R_k.$$

If the prediction and measurement were both perfect, then the residual would be zero. In practice, the residual tells us how much new information the measurement contains.
Update step
Finally we are ready to bring it all together. We have:

- A predicted state $\hat{x}_k$ with covariance $P_k$, which we calculated based on the physics of our robot.
- A measurement residual $y_k$ with covariance $S_k$, which tells us how much agreement there is between the predicted state and the actual sensor readings.

The state estimate will be updated as follows:

$$\hat{x}_k' = \hat{x}_k + K_k y_k,$$

where $K_k$ is called the Kalman gain:

$$K_k = P_k H_k^T S_k^{-1}.$$
The proof of the Kalman gain
There is one final piece, which is the equation used to produce the final updated state covariance:

$$P_k' = (I - K_k H_k) P_k.$$

The proof of the Kalman gain and covariance update equations is given below for those who are interested.
Proof of the optimal Kalman gain matrix
We start with the definition of the updated state covariance

$$P_k' = \operatorname{cov}(x_k - \hat{x}_k')$$

and substitute the update equation $\hat{x}_k' = \hat{x}_k + K_k\left(z_k - H_k \hat{x}_k\right)$ together with the measurement model $z_k = H_k x_k + v_k$.

Here $x_k$ is the true state and $v_k$ is the measurement error, which has zero mean and covariance $R_k$.

This gives

$$x_k - \hat{x}_k' = (I - K_k H_k)(x_k - \hat{x}_k) - K_k v_k.$$

Since the measurement error is uncorrelated with the other terms, we can split the covariance as follows.

$$P_k' = \operatorname{cov}\left[(I - K_k H_k)(x_k - \hat{x}_k)\right] + \operatorname{cov}\left[K_k v_k\right]$$

Now use the identity $\operatorname{cov}(Ax) = A \operatorname{cov}(x) A^T$ to pull the constant matrices outside:

$$P_k' = (I - K_k H_k)\operatorname{cov}(x_k - \hat{x}_k)(I - K_k H_k)^T + K_k \operatorname{cov}(v_k) K_k^T$$

Recognise that $\operatorname{cov}(x_k - \hat{x}_k) = P_k$ and $\operatorname{cov}(v_k) = R_k$, hence

$$P_k' = (I - K_k H_k) P_k (I - K_k H_k)^T + K_k R_k K_k^T. \tag{A.1}$$

This gives an equation for the updated state covariance and is the first major result of this proof.

Next we need to find an expression for the Kalman gain $K_k$. The optimal gain is the one that minimises the sum of the squared errors in the state estimate, i.e. the trace of $P_k'$.

This can be obtained using matrix calculus to solve for the $K_k$ satisfying

$$\frac{\partial}{\partial K_k}\operatorname{tr}(P_k') = 0.$$

Firstly, take the expression for $P_k'$ in Eq. (A.1) and expand the products:

$$P_k' = P_k - K_k H_k P_k - P_k H_k^T K_k^T + K_k \left(H_k P_k H_k^T + R_k\right) K_k^T$$

Notice that the term in parentheses is the measurement residual covariance $S_k$:

$$P_k' = P_k - K_k H_k P_k - P_k H_k^T K_k^T + K_k S_k K_k^T \tag{A.2}$$

We can evaluate the derivative using the identities in The Matrix Cookbook by Petersen and Pedersen, section 2.5.

Proceeding term by term, we have

$$\frac{\partial}{\partial K_k}\operatorname{tr}(K_k H_k P_k) = (H_k P_k)^T = P_k H_k^T$$

$$\frac{\partial}{\partial K_k}\operatorname{tr}(P_k H_k^T K_k^T) = P_k H_k^T$$

$$\frac{\partial}{\partial K_k}\operatorname{tr}(K_k S_k K_k^T) = 2 K_k S_k$$

Above, we used the fact that the covariance matrices are symmetric.

Adding up the result, we have

$$\frac{\partial}{\partial K_k}\operatorname{tr}(P_k') = -2 P_k H_k^T + 2 K_k S_k = 0.$$

From here it is simple to solve for the gain:

$$K_k = P_k H_k^T S_k^{-1} \tag{A.3}$$

This is the second major result of this proof. We have found an expression for the "optimal" Kalman gain, where optimal is defined by minimising the sum of the squares of the error.

There is one result still to be developed. Substituting Eq. (A.3) into Eq. (A.2) causes the last two terms to cancel, because $K_k S_k K_k^T = P_k H_k^T S_k^{-1} S_k K_k^T = P_k H_k^T K_k^T$. This leaves

$$P_k' = (I - K_k H_k) P_k.$$
The last equation, which is valid only for the “optimal” Kalman gain given by Eq. (A.3), is the commonly used form for the state covariance update.
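A quick numerical experiment can verify this final step: with the optimal gain of Eq. (A.3), the Joseph form of Eq. (A.1) collapses to the short form. The matrices below are arbitrary illustrative values:

P = [2 0.5; 0.5 1];          % predicted state covariance (example values)
H = [1 0; 1000 0; 0 1];      % measurement matrix (example values)
R = diag([4 2500 0.01]);     % measurement covariance (example values)

S = H*P*H' + R;              % residual covariance
K = P*H'/S;                  % optimal Kalman gain, Eq. (A.3)

P_joseph = (eye(2) - K*H)*P*(eye(2) - K*H)' + K*R*K';
P_short = (eye(2) - K*H)*P;
disp(norm(P_joseph - P_short))   % approximately zero (round-off only)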
Summary of the update step

$$y_k = z_k - H_k \hat{x}_k$$

$$S_k = H_k P_k H_k^T + R_k$$

$$K_k = P_k H_k^T S_k^{-1}$$

$$\hat{x}_k' = \hat{x}_k + K_k y_k$$

$$P_k' = (I - K_k H_k) P_k$$
Example implementation
The outline of a Matlab implementation is shown below.
% Run this code for every time step:
% Prediction step
xhat_prediction = F * xhat;
P_prediction = F * P * F' + Q;
% Get a measurement
z = read_sensors(); % obtain latest sensor data
% Run the update step
y = z - H*xhat_prediction; % measurement residual
S = H*P_prediction*H' + R; % measurement residual covariance
K = P_prediction * H' / S; % Kalman gain (note: uses the predicted covariance)
% Calculate the new state
xhat = xhat_prediction + K*y; % new state vector
P = (eye(2) - K*H)*P_prediction; % new covariance
Click here to download the complete Matlab source code. The output from this program is shown in Figure 2. Notice how the Kalman filter produces a high quality position estimate despite the extremely noisy sensors.
The implemented Kalman filter for the scenario developed in these notes.
Implementation hints
- To design a Kalman filter you need to determine the system matrices $F_k$, $B_k$, $H_k$, $Q_k$, $R_k$, etc. Often these are independent of time, in which case the subscript $k$ can be dropped.
- Once all those matrices have been determined, you end up with a system of equations that you can simply enter into a computer.
- The Kalman gain is written as if it contains a matrix inverse, but in Matlab, you should not use inv(S) to calculate it. You should use the mrdivide (/) operator instead, because solving the linear system directly is faster and more numerically accurate than forming an explicit inverse.
- In some cases you might have measurements arriving at different rates; for instance, a GNSS receiver may be quite slow in comparison to an on-board accelerometer. In this scenario, there are several options:
  - Use some interpolation scheme to fill in the gaps between slow sensors, to pretend that you have data from all sensors at every time step; or
  - Write a different measurement matrix for the scenario where only some of the sensors are reporting data. You can still perform the update step using partial sensor data.
- Often the process covariance is not obvious. There are several strategies that you can consider.
  - If your system is tracking position and velocity, then the main source of error will be accelerations (changes in velocity) that you did not predict. You could treat unknown accelerations as a random variable with expected value of 0 but some assumed or measured variance. You could then propagate this variance through to the state vector update, in much the same way as you did in Self-check Quiz 3.4.
  - If you truly have no idea about the sources of error, a plausible starting point is a diagonal matrix,
    $$Q = \operatorname{diag}(\sigma_1^2, \sigma_2^2, \dots),$$
    after which you can use a trial-and-error approach to adjust the variances $\sigma_i^2$ until the filter achieves good performance. You might consider choosing each $\sigma_i$ to be proportional to $\Delta t$, thinking about how much standard deviation you expect to accumulate per second. A sketch of this approach is given below this list.
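As promised above, here is a sketch of the trial-and-error approach to choosing a diagonal process covariance. The growth rates are assumptions that you would tune for your own system:

% Assumed growth in uncertainty per second (to be tuned):
% 0.1 m/s for position, 0.5 (m/s)/s for velocity
sigma_p = 0.1 * dt;          % position std. dev. accumulated per time step
sigma_v = 0.5 * dt;          % velocity std. dev. accumulated per time step
Q = diag([sigma_p^2, sigma_v^2]);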
Recap of the Kalman filter
Let’s remind ourselves of the structure of the Kalman filter, so we can consider how to extend it for the case of non-linear systems.
To design a Kalman filter, you must determine:

- Which variables go into the state vector $\hat{x}$.
- Which variables go into the control vector $u$.
- Which variables go into the measurement vector $z$.
- The prediction matrix $F$, which describes how the state vector tends to evolve over time.
- The control matrix $B$, which describes how the state vector is perturbed by control inputs.
- The process covariance $Q$, which describes the growth in the uncertainty in the predicted state.
- The measurement matrix $H$, which gives the mapping from state vector to measurements.
- The measurement covariance $R$, which is the covariance in the measurements.
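Pulling the recap together, the complete design for the robot example developed in these notes might be assembled as below. The covariance values are illustrative assumptions, not measured sensor specifications:

dt = 0.1;                              % time step (s), assumed
F = [1 dt; 0 1];                       % prediction matrix
B = [0.5*dt^2; dt];                    % control matrix (acceleration input)
H = [1 0; 1000 0; 0 1];                % GNSS (m), rangefinder (mm), encoder (m/s)
Q = diag([(0.1*dt)^2, (0.5*dt)^2]);    % assumed process covariance
R = diag([4, 2500, 0.01]);             % assumed sensor variances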
The Extended Kalman Filter (EKF)
Motivation
The classical Kalman filter uses a prediction model of the form

$$\hat{x}_k = F_k \hat{x}_{k-1} + B_k u_k,$$

where the dynamics of the system are entirely described by the matrices $F_k$ and $B_k$.

However, what if the system dynamics cannot be expressed as a matrix of numbers? Suppose instead we have some arbitrary non-linear function $f$ such that

$$\hat{x}_k = f(\hat{x}_{k-1}, u_k).$$
How can we develop a Kalman filter for such a system?
Deriving the EKF
The key insight is as follows. There is no particular reason that the prediction step must be a matrix multiplication. If we know some arbitrary function $f$ that describes the system dynamics, we can use it directly to predict the new state.

The EKF modifies the prediction step to read:

$$\hat{x}_k = f(\hat{x}_{k-1}, u_k),$$

where $f$ is the (possibly non-linear) prediction function.

Similarly the measurement step can be updated to read:

$$\hat{z}_k = h(\hat{x}_k),$$

where $h$ is the (possibly non-linear) measurement function.

A further insight: the state covariance and measurement covariance can be calculated using the error propagation laws. Recall from week 2 that a non-linear function propagates variance through its Jacobian.

We will use the following notation. Let $F_k$ be the Jacobian of $f$ evaluated at the current operating point:

$$F_k = \left.\frac{\partial f}{\partial x}\right|_{\hat{x}_{k-1},\, u_k}.$$

Then the predicted covariance is

$$P_k = F_k P_{k-1} F_k^T + Q_k,$$

which is the same equation as for the linear Kalman filter, except that $F_k$ is now a Jacobian that must be re-evaluated at every time step.

Similarly the measurement residual covariance is:

$$S_k = H_k P_k H_k^T + R_k.$$

Again the equation is the same, except that $H_k$ is now the Jacobian of the measurement function $h$, evaluated at the current operating point.
All other parts of the filter can remain the same.
Intuitively, the extended Kalman filter is simply finding a linear model that best matches the non-linear functions $f$ and $h$ in the vicinity of the current operating point.
The extended Kalman filter is not necessarily an optimal estimator, in the sense that the linear Kalman filter is mathematically proven to minimise the sum of squares of the error for linear systems. If the system state is linear then the EKF reduces to the normal Kalman filter, in which case it is optimal. For non-linear systems, the optimality cannot be guaranteed. However, in practice the EKF often works well.
Example of an Extended Kalman Filter (EKF)
Suppose that you want to track the motion of a pendulum (Figure 3).
A pendulum of mass $m$ suspended from a rigid rod of length $l$, at an angle $\theta$ from the vertical.

Define the state vector

$$\hat{x} = \begin{bmatrix} \theta \\ \omega \end{bmatrix},$$

where $\theta$ is the angle of the pendulum and $\omega = d\theta/dt$ is its angular velocity.

By analysing the forces, we can write the differential equations that describe this state:

$$\frac{d\theta}{dt} = \omega$$

$$\frac{d\omega}{dt} = -\frac{g}{l}\sin\theta$$
This is our physics model, which we will now use to develop an Extended Kalman Filter.
The prediction model
We must convert our system of continuous differential equations into a discrete-time prediction model of the form

$$\hat{x}_k = f(\hat{x}_{k-1}),$$

where $f$ steps the state forward by one time interval $\Delta t$. A good choice for mechanical systems is the leapfrog method, which is simple yet numerically well behaved.

Applying the leapfrog method to the pendulum equations gives:

$$\theta_k = \theta_{k-1} + \omega_{k-1}\,\Delta t - \frac{g\,\Delta t^2}{2l}\sin\theta_{k-1}$$

$$\omega_k = \omega_{k-1} - \frac{g\,\Delta t}{2l}\left(\sin\theta_{k-1} + \sin\theta_k\right)$$
This is our prediction model, which says how to step the state vector forward in time.
Recall that the EKF relies upon linearising the prediction model around the current operating point. In other words, we need the Jacobian of $f$ with respect to the state variables $\theta$ and $\omega$.
To reduce the risk of a mistake, it is recommended that you use a computer algebra system to calculate the derivatives. The provided Matlab code shows you how to automatically find Jacobians.
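For instance, a Jacobian for the leapfrog prediction model above can be found symbolically along the following lines. This sketch assumes the Symbolic Math Toolbox is available; the variable names are illustrative:

syms theta omega g l dt real
theta_new = theta + omega*dt - (g/(2*l))*sin(theta)*dt^2;
omega_new = omega - (g/(2*l))*(sin(theta) + sin(theta_new))*dt;
f = [theta_new; omega_new];
F = jacobian(f, [theta, omega])   % 2x2 Jacobian of the prediction model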
The resulting Jacobian, writing $\theta_k$ for the new angle computed in the first leapfrog equation, is:

$$F_k = \begin{bmatrix} 1 - \frac{g\,\Delta t^2}{2l}\cos\theta_{k-1} & \Delta t \\ -\frac{g\,\Delta t}{2l}\left[\cos\theta_{k-1} + \cos\theta_k\left(1 - \frac{g\,\Delta t^2}{2l}\cos\theta_{k-1}\right)\right] & 1 - \frac{g\,\Delta t^2}{2l}\cos\theta_k \end{bmatrix}$$
Measurement
Suppose that the sensors actually record the kinetic and potential energy of the pendulum, i.e. a non-linear function of the state:

$$z = h(\hat{x}) = \begin{bmatrix} \tfrac{1}{2} m l^2 \omega^2 \\ m g l (1 - \cos\theta) \end{bmatrix},$$

where $m$ is the mass of the pendulum and $g$ is the acceleration due to gravity.

Again we need a Jacobian matrix, this time of the measurement function with respect to the state:

$$H_k = \begin{bmatrix} 0 & m l^2 \omega \\ m g l \sin\theta & 0 \end{bmatrix}.$$
EKF example code
Refer to the attached Matlab script for a complete working example.
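For orientation, a minimal sketch of the EKF loop for this pendulum is given below; the attached script remains the authoritative version. Here jacobian_F is a hypothetical helper that evaluates the prediction Jacobian derived above, read_sensors is assumed to return the latest energy measurements, and N, Q, R, xhat and P are assumed to be initialised beforehand:

g = 9.81; l = 1; m = 1; dt = 0.01;   % parameters (illustrative)

% Leapfrog prediction model and energy measurement model
f = @(x) [x(1) + x(2)*dt - (g/(2*l))*sin(x(1))*dt^2;
          x(2) - (g/(2*l))*(sin(x(1)) + sin(x(1) + x(2)*dt - (g/(2*l))*sin(x(1))*dt^2))*dt];
h = @(x) [0.5*m*l^2*x(2)^2;
          m*g*l*(1 - cos(x(1)))];

for k = 1:N
    % Prediction step, re-linearising at the current estimate
    Fk = jacobian_F(xhat, dt, g, l);    % hypothetical helper (see symbolic code above)
    xhat_prediction = f(xhat);
    P_prediction = Fk*P*Fk' + Q;

    % Update step with the measurement Jacobian at the predicted state
    z = read_sensors();
    Hk = [0, m*l^2*xhat_prediction(2);
          m*g*l*sin(xhat_prediction(1)), 0];
    y = z - h(xhat_prediction);
    S = Hk*P_prediction*Hk' + R;
    K = P_prediction*Hk'/S;
    xhat = xhat_prediction + K*y;
    P = (eye(2) - K*Hk)*P_prediction;
end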
Implementation notes
A few points to note:

- The Jacobian matrices $F_k$ and $H_k$ vary with each time step, so you must re-calculate them inside your loop.
- You will typically need to tune the process covariance to get good results. Essentially you are trading off how much the filter believes the prediction vs how much it believes the measurement. As with the linear Kalman filter, a diagonal matrix with standard deviation proportional to the time step is a plausible starting point in the absence of a more detailed error model.
The Unscented Kalman Filter (UKF)
The UKF is another method for estimating a non-linear process. Its key advantage is that it does not require the computation of Jacobians. Hence it can be applied to models that are non-differentiable (for example, arbitrary computer code, or models containing discontinuous functions where derivatives would be undefined).
The key idea is that instead of propagating just the state vector $\hat{x}$ and linearising to propagate its covariance, we propagate a small set of carefully chosen samples (called sigma points) through the full non-linear function, and then reconstruct the mean and covariance from the transformed samples.
The Unscented Transform
The Unscented Transform solves this problem: given some calculation $y = f(x)$, where $x$ is a random variable with mean $\bar{x}$ and covariance $P$, estimate the mean $\bar{y}$ and covariance $P_y$ of the output.
The steps are as follows:
First define some parameters: let $n$ be the number of elements in the state vector, and let $\kappa$ be a tuning parameter that controls the spread of the sigma points (a common choice is $\kappa = 3 - n$).

Calculate the matrix

$$A = \sqrt{n + \kappa}\; \mathrm{chol}(P)^T,$$

where chol denotes the Cholesky factorisation, which behaves like a matrix square root (e.g. Matlab's chol function).

Next, expand the vector $\bar{x}$ into a matrix of sigma points

$$X = \begin{bmatrix} \bar{x} & \bar{x} + A_1 & \cdots & \bar{x} + A_n & \bar{x} - A_1 & \cdots & \bar{x} - A_n \end{bmatrix},$$

where $A_i$ is the $i$-th column of $A$. (In Matlab you can simply write X = [xbar, xbar + A, xbar - A], with + and - applied between a vector and a matrix, because that is the behaviour of those operators.) The matrix $X$ has $n$ rows and $2n + 1$ columns.
Each column of $X$ is called a sigma point.
Transform each sigma point through the function to produce a new matrix $Y$:

$$Y_i = f(X_i), \qquad i = 0, 1, \dots, 2n,$$

where each column of $Y$ is a transformed sigma point. If $f$ is vectorised so that it can operate on the columns of a matrix, then all of the sigma points can be transformed in a single call.
We will construct the outputs $\bar{y}$ and $P_y$ as weighted combinations of the transformed sigma points. Define the weights

$$W_0 = \frac{\kappa}{n + \kappa}, \qquad W_i = \frac{1}{2(n + \kappa)} \quad \text{for } i = 1, \dots, 2n.$$

Form a vector by concatenating these weights:

$$w = \begin{bmatrix} W_0 & W_1 & \cdots & W_{2n} \end{bmatrix}.$$

Note that other definitions of the weights are also possible. Different authors use different weights. A necessary constraint is that the weights sum to one.
Then the expected value of the output is given by:

$$\bar{y} = \sum_{i=0}^{2n} W_i\, Y_i.$$
The covariance is given by

$$P_y = \sum_{i=0}^{2n} W_i\,(Y_i - \bar{y})(Y_i - \bar{y})^T.$$

In Matlab, the weights can be applied to the columns of $(Y - \bar{y})$ using element-wise multiplication (the .* operator).
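The whole transform can be packaged as a small Matlab function. This is a sketch following the equations above; the function name and interface are our own invention:

function [ybar, Py, Y] = unscented_transform(fun, xbar, P, kappa)
    % Propagate a mean and covariance through a non-linear function
    n = numel(xbar);
    A = sqrt(n + kappa) * chol(P)';    % columns are the sigma point offsets
    X = [xbar, xbar + A, xbar - A];    % 2n+1 sigma points

    Y = zeros(numel(fun(xbar)), 2*n + 1);
    for i = 1:2*n + 1
        Y(:,i) = fun(X(:,i));          % transform each sigma point
    end

    w = [kappa/(n + kappa), repmat(1/(2*(n + kappa)), 1, 2*n)];  % weights sum to 1
    ybar = Y * w';                     % weighted mean
    D = Y - ybar;                      % deviations from the mean
    Py = (w .* D) * D';                % weighted covariance
end

For example, the prediction step of a UKF could then be written as [xhat_pred, P_pred] = unscented_transform(f, xhat, P, 1), followed by adding the process covariance Q to P_pred.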
Applying the Unscented Transform to the Kalman filter
Prediction step
Given the previous state $\hat{x}_{k-1}$ and its covariance $P_{k-1}$, generate a set of sigma points $X$ as described above.

Propagate the sigma points through the prediction model:

$$X_i' = f(X_i, u_k).$$

Following the Unscented Transform, use the weights to reconstruct the predicted state $\hat{x}_k$ and its covariance from the transformed sigma points.

Finally add in the process covariance to account for prediction errors:

$$P_k = \sum_{i=0}^{2n} W_i\,(X_i' - \hat{x}_k)(X_i' - \hat{x}_k)^T + Q_k.$$
Measurement step
If the measurement is a linear function of the state, then you can use the normal Kalman filter from this point forwards.
If your measurement model is non-linear, then apply the Unscented Transform a second time to propagate the predicted state through the measurement model. Define a new set of sigma points $X$ based on the predicted state and predicted covariance. Then propagate these through the measurement model:

$$Z_i = h(X_i).$$

Reconstruct the predicted measurements

$$\hat{z}_k = \sum_{i=0}^{2n} W_i\, Z_i,$$

the measurement residual

$$y_k = z_k - \hat{z}_k,$$

and the measurement residual covariance

$$S_k = \sum_{i=0}^{2n} W_i\,(Z_i - \hat{z}_k)(Z_i - \hat{z}_k)^T + R_k,$$

where $R_k$ is the measurement covariance.
Update step
If the measurement model is linear, use the standard Kalman filter update step. Otherwise, proceed as below.
The update step needs a slight adjustment because there is no measurement matrix to easily link measurement space and state space. Instead we can use the cross-correlation between the state and measurement sigma points:

$$C_k = \sum_{i=0}^{2n} W_i\,(X_i - \hat{x}_k)(Z_i - \hat{z}_k)^T.$$

The Kalman gain in this case is

$$K_k = C_k S_k^{-1}.$$

The updated state and covariance are given by

$$\hat{x}_k' = \hat{x}_k + K_k y_k$$

$$P_k' = P_k - K_k S_k K_k^T$$
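Putting the non-linear update step into Matlab form, a sketch might read as follows, where X holds the sigma points used in the measurement step, Z the transformed measurement sigma points, and w the weight vector defined earlier; all other variable names follow the earlier sketches:

Dx = X - xhat_prediction;    % state sigma point deviations
Dz = Z - zhat;               % measurement sigma point deviations
C = (w .* Dx) * Dz';         % cross-covariance between state and measurement

K = C / S;                   % Kalman gain (no measurement matrix required)
xhat = xhat_prediction + K*(z - zhat);
P = P_prediction - K*S*K';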
Example of an Unscented Kalman Filter (UKF)
Refer to the attached Matlab script for a complete working example.
Implementation notes
- The initial state covariance cannot be 0, because the Cholesky factorisation is only defined for positive definite matrices (those with strictly positive eigenvalues); any non-degenerate covariance matrix satisfies this. If the Cholesky factorisation throws an error, then most likely your prediction model is diverging or you have made a mistake in the equations.
- A major advantage of the UKF is the ability to handle sophisticated prediction models where derivatives are impossible or unreasonable to compute. For example, your prediction model could be an entire computer program. All you need to do is run each of the sigma points through your prediction model and you can thereby use it in your Kalman filter.
Conclusion
The Kalman filter is a method of estimating the state of a system in the presence of sensor noise. It uses a predictive model of the system’s dynamics as well as measurement data to produce a high quality result.