EE3901/EE5901 Sensor Technologies
Week 4 Notes
Non-linear Kalman filters

Profile picture
College of Science and Engineering, James Cook University
Last updated: 24 January 2022

Last week we studied the Kalman filter, which is the optimal estimator for linear systems in the presence of Gaussian noise. This week we address the problem of non-linear systems.

Recap of the Kalman filter

Let’s remind ourselves of the structure of the Kalman filter, so we can consider how to extend it for the case of non-linear systems.

Predicted state:x^kk1=Fkx^k1+BkukPredicted covariance:Pkk1=FkPk1FkT+QkMeasurement residual:yk=zkHkx^kk1Measurement residual covariance:Sk=HkPkk1HkT+RkKalman gain:Kk=Pkk1HTSk1Updated state:x^k=x^kk1+KkykUpdated covariance:Pk=(IKkHk)Pkk1.\begin{alignat}{2} \text{Predicted state:} & \qquad & \hat{\boldsymbol{x}}_{k|k-1} & =F_{k}\hat{\boldsymbol{x}}_{k-1}+B_{k}\boldsymbol{u}_{k}\\ \text{Predicted covariance:} & & P_{k|k-1} & =F_{k}P_{k-1}F_{k}^{T}+Q_{k}\\ \text{Measurement residual:} & & \boldsymbol{y}_{k} & =\boldsymbol{z}_{k}-H_{k}\hat{\boldsymbol{x}}_{k|k-1}\\ \text{Measurement residual covariance:} & & S_{k} & =H_{k}P_{k|k-1}H_{k}^{T}+R_{k}\\ \text{Kalman gain:} & & K_{k} & =P_{k|k-1}H^{T}S_{k}^{-1}\\ \text{Updated state:} & & \hat{\boldsymbol{x}}_{k} & =\hat{\boldsymbol{x}}_{k|k-1}+K_{k}\boldsymbol{y}_{k}\\ \text{Updated covariance:} & & P_{k} & =\left(I-K_{k}H_{k}\right)P_{k|k-1}. \end{alignat}

To design a Kalman filter, you must determine:

  • Which variables go into the state vector xk\mathbf{x}_k.
  • Which variables go into the control vector uk\mathbf{u}_k.
  • Which variables go into the measurement vector zk\mathbf{z}_k.
  • The prediction matrix FkF_k, which describes how the state vector tends to evolve over time.
  • The control matrix BkB_k, which describes how the state vector is perturbed by control inputs.
  • The process covariance QkQ_k, which describes the growth in the uncertainty in the predicted state.
  • The measurement matrix HkH_k, which gives the mapping from state vector to measurements.
  • The measurement covariance RkR_k, which is the covariance in the measurements.

The Extended Kalman Filter (EKF)


The classical Kalman filter uses a prediction model of the form


where FkF_{k} is a matrix of numbers.

However, what if the system dynamics cannot be expressed as a matrix of numbers? Suppose instead we have some arbitrary function


How can we develop a Kalman filter for such a system?

Deriving the EKF

The key insight is as follows. There is no particular reason that the prediction step must be matrix multiplication. If we know that some arbitrary function f(x)f(\mathbf{x}) describes the time evolution of the system, then we should just use it directly!

The EKF modifies the prediction step to read:

EKF predicted state:x^kk1=f(x^k1)+b(uk),\begin{alignat}{2} \text{EKF predicted state:} & \qquad & \hat{\boldsymbol{x}}_{k|k-1} & =f\left(\hat{\boldsymbol{x}}_{k-1}\right)+b\left(\boldsymbol{u}_{k}\right), \end{alignat}

where f()f(\cdot) is a function that steps the state vector forward in time, and b()b(\cdot) is a function that calculates the influence of control inputs.

Similarly the measurement step can be updated to read:

EKF measurement residual:yk=zkh(x^kk1),\begin{alignat}{2} \text{EKF measurement residual:} & \qquad & \boldsymbol{y}_{k} & =\boldsymbol{z}_{k}-h\left(\hat{\boldsymbol{x}}_{k|k-1}\right), \end{alignat}

where h()h(\cdot) is a function that calculates the expected measurements based on the given state vector.

A further insight: the state covariance and measurement covariance can be calculated using the error propagation laws. Recall from week 2 that we have Σy=JΣxJT\Sigma_{y}=J\Sigma_{x}J^{T} where JJ is the Jacobian of y=f(x)\boldsymbol{y}=f(\mathbf{x}).

We will use the following notation. Let FkF_{k} be the Jacobian of f(x^k1)f(\hat{\boldsymbol{x}}_{k-1}) and let HkH_{k} be the Jacobian of h(x^kk1h(\hat{\boldsymbol{x}}_{k|k-1}). Then using the variance propagation law, the predicted state covariance is

EKF predicted covariance:Pkk1=FkPk1FkT+Qk,\begin{alignat}{2} \text{EKF predicted covariance:} & \qquad & P_{k|k-1} & =F_{k}P_{k-1}F_{k}^{T}+Q_{k}, \end{alignat}

which is the same equation as for the linear Kalman filter, except that FkF_{k} is a Jacobian matrix that will change at every timestep. Note that we assume zero covariance in the control inputs uk\boldsymbol{u}_{k}. If you believe the control inputs uk\boldsymbol{u}_{k} or control model b()b\left(\cdot\right) have uncertainty then you must use the error propagation laws on the b(uk)b\left(\boldsymbol{u}_{k}\right) term as well.

Similarly the measurement residual covariance is:

EKF measurement covariance:Sk=HkPkk1HkT+Rk.\begin{alignat}{2} \text{EKF measurement covariance:} & \qquad & S_{k} & =H_{k}P_{k|k-1}H_{k}^{T}+R_{k}. \end{alignat}

Again the equation is the same, except that HkH_{k} is a Jacobian matrix.

All other parts of the filter can remain the same.

Intuitively, the extended Kalman filter is simply finding a linear model that best matches the non-linear functions ff and hh. It is effectively the (multi-dimensional) “tangent to the curve” at the current state estimate.

The extended Kalman filter is not necessarily an optimal estimator, in the sense that the linear Kalman filter is mathematically proven to minimise the sum of squares of the error for linear systems. If the system state is linear then the EKF reduces to the normal Kalman filter, in which case it is optimal. For non-linear systems, the optimality cannot be guaranteed. However, in practice the EKF often works well.

Self check quiz 4.1

What is the difference between a standard Kalman filter and the extended Kalman filter? Select all that apply.

Answer: (a), (c), (d), (e)

Example of an Extended Kalman Filter (EKF)

Suppose that you want to track the motion of a pendulum (Figure 1).

Figure 1: A pendulum of mass mm is suspended on a rigid string of length LL. The force of gravity in the direction perpendicular to the string is indicated. The state of the pendulum is described by the angle θ\theta as well as the angular velocity θ˙\dot{\theta}. Zoom:

Define the state vector

x^=(θθ˙).\begin{equation} \boldsymbol{\hat{x}}=\left(\begin{array}{c} \theta\\ \dot{\theta} \end{array}\right). \end{equation}

By analysing the forces, we can write the differential equations that describe this state:

dθdt=θ˙dθ˙dt=gLsinθ.\begin{align} \frac{d\theta}{dt} & =\dot{\theta}\\ \frac{d\dot{\theta}}{dt} & =-\frac{g}{L}\sin\theta. \end{align}

This is our physics model, which we will now use to develop an Extended Kalman Filter.

The prediction model

We must convert our system of continuous differential equations (13)-(14) into discrete time steps. There are many ways to do this. Here, we will use a method called “leapfrog integration”. Leapfrog integration is a very good scheme for classical mechanics problems, e.g. where the physics gives us an equation for acceleration (F=maF=ma) and the goal is to solve for velocity and displacement. The leapfrog integration scheme is as follows:

xkk1=xk1+x˙k1Δt+12x¨k1(Δt)2x˙kk1=x˙k1+12(x¨k1+x¨kk1)Δt,\begin{align} x_{k|k-1} & =x_{k-1}+\dot{x}_{k-1}\Delta t+\frac{1}{2}\ddot{x}_{k-1}\left(\Delta t\right)^{2}\\ \dot{x}_{k|k-1} & =\dot{x}_{k-1}+\frac{1}{2}\left(\ddot{x}_{k-1}+\ddot{x}_{k|k-1}\right)\Delta t, \end{align}

where xx is displacement (or angular displacement) and dots indicate time derivatives. In words, this method says that the new velocity should be computed based on the average of the old acceleration and the new acceleration.

Applying the leapfrog method to Eqs. (13)-(14), we find:

x^kk1=f(x^k1)=(θk1+θ˙k1Δtg2Lsin(θk1)(Δt)2θ˙k1g2L(sinθk1+sinθkk1)Δt)\begin{equation} \hat{\boldsymbol{x}}_{k|k-1}=f\left(\hat{\boldsymbol{x}}_{k-1}\right)=\left(\begin{array}{c} \theta_{k-1}+\dot{\theta}_{k-1}\Delta t-\frac{g}{2L}\sin\left(\theta_{k-1}\right)\left(\Delta t\right)^{2}\\ \dot{\theta}_{k-1}-\frac{g}{2L}\left(\sin\theta_{k-1}+\sin\theta_{k|k-1}\right)\Delta t \end{array}\right) \end{equation}

This is our prediction model, which says how to step the state vector forward in time.

Recall that the EKF relies upon linearising the prediction model around the current operating point. In other words, we need the Jacobian of f()f(\cdot). However, the Jacobian would be quite messy to calculate by hand. Notice that you would need to substitute the first element of the vector into the second, and then apply the chain rule to get the derivatives.

To reduce the risk of a mistake, it is recommended that you use a computer algebra system to calculate the derivatives. The provided Matlab code shows you how to automatically find Jacobians:

The resulting Jacobian, copy-pasted from the provided Matlab code, is:

F=(1(Δt)2gcos(θ)2LΔtΔtg(cos(θ)σ1((Δt)2gcos(θ)2L1))2L1(Δt)2gσ12L)where    σ1=cos(θ+Δtθ˙(Δt)2gsin(θ)2L)\begin{array}{l} F=\left(\begin{array}{cc} 1-\frac{\left(\Delta t\right)^{2}\,g\,\cos\left(\theta\right)}{2\,L} & \Delta t\\ -\frac{\Delta t\,g\,{\left(\cos\left(\theta\right)-\sigma_{1}\,{\left(\frac{\left(\Delta t\right)^{2}\,g\,\cos\left(\theta\right)}{2\,L}-1\right)}\right)}}{2\,L} & 1-\frac{\left(\Delta t\right)^{2}\,g\,\sigma_{1}}{2\,L} \end{array}\right)\\ \mathrm{}\\ \textrm{where}\\ \mathrm{}\\ \;\;\sigma_{1}=\cos\left(\theta+\Delta t\,\dot{\theta}-\frac{\left(\Delta t\right)^{2}\,g\,\sin\left(\theta\right)}{2\,L}\right) \end{array}


Suppose that the measurement actually records the kinetic and potential energy, i.e. a non-linear function of the state.

h(x^)=(kinetic energypotential energy)=(12mv2mgh)=(12m(Lθ˙)2mg(LLcosθ)),\begin{equation} h\left(\hat{\boldsymbol{x}}\right)=\left(\begin{array}{c} \text{kinetic energy}\\ \text{potential energy} \end{array}\right)=\left(\begin{array}{c} \frac{1}{2}mv^{2}\\ mgh \end{array}\right)=\left(\begin{array}{c} \frac{1}{2}m\left(L\dot{\theta}\right)^{2}\\ mg\left(L-L\cos\theta\right) \end{array}\right), \end{equation}

where v=Lθ˙v=L\dot{\theta} is the velocity of the pendulum and h=LLcosθh=L-L\cos\theta is the height of the pendulum above its lower-most point.

Again we need a Jacobian matrix

H=[dh1dθdh1dθ˙dh2dθdh2dθ˙]=[0mL2θ˙mgLsinθ0].\begin{equation} H=\left[\begin{array}{cc} \frac{dh_{1}}{d\theta} & \frac{dh_{1}}{d\dot{\theta}}\\ \frac{dh_{2}}{d\theta} & \frac{dh_{2}}{d\dot{\theta}} \end{array}\right]=\left[\begin{array}{cc} 0 & mL^{2}\dot{\theta}\\ mgL\sin\theta & 0 \end{array}\right]. \end{equation}

EKF example code

Refer to the attached Matlab script for a complete working example.

Implementation notes

A few points to note:

  • The Jacobian matrices FF and HH vary with each time step so you must re-calculate them inside your loop.
  • You will typically need to tune the process covariance to get good results. Essentially you are trading off how much the filter believes the prediction vs how much it believes the measurement. As with the linear Kalman filter, a diagonal matrix with standard deviation proportional to the time step is a plausible starting point in the absence of a more detailed error model.

The Unscented Kalman Filter (UKF)

The UKF is another method for estimating a non-linear process. Its key advantage is that it does not require the computation of Jacobians. Hence it can be applied to models that are non-differentiable (for example arbitrary computer code, or models containing with discontinuous functions where derivatives would be undefined).

The key idea is that instead of propagating just the state vector x^\hat{\boldsymbol{x}} through the model, we also propagate some so-called “sigma points” around x^\hat{\boldsymbol{x}}. These points are carefully chosen to sample the state covariance. You can think of them as exploring various possible states near the estimate, to test how the prediction model would behave on neighbouring states as well. This allows non-linearities in the model to be properly accounted for.

The Unscented Transform

The Unscented Transform solves this problem: given some calculation y=f(x)\boldsymbol{y}=f(\boldsymbol{x}) find the covariance Σy\Sigma_{y} without using a Jacobian matrix to approximate ff. Hence it can be applied to any function ff including arbitrary computer code.

The steps are as follows:

First define some parameters: let α\alpha be a scaling factor of typical size α=0.001\alpha=0.001 (chosen empirically). Let β\beta be a factor that incorporates knowledge of the statistical distribution of the states, where β=2\beta=2 is the optimal value for a Gaussian distribution. Let nn be the number of elements in the state vector.


λ=α2nn\begin{equation} \lambda=\alpha^{2}n-n \end{equation}


A=n+λ chol(Σx),\begin{equation} A=\sqrt{n+\lambda}\ \text{chol}\left(\Sigma_{x}\right), \end{equation}

where chol(Σx)\text{chol}\left(\Sigma_{x}\right) is the Cholesky factorisation of the covariance matrix Σx\Sigma_{x}. The Cholesky factorisation is roughly akin to a matrix square root. It is implemented in Matlab with the chol function.

Next, expand the vector x\boldsymbol{x} into a matrix XX

X=[x(x+AT)(xAT)],\begin{equation} X=\left[\begin{array}{ccc} \boldsymbol{x} & \left(\boldsymbol{x}+A^{T}\right) & \left(\boldsymbol{x}-A^{T}\right)\end{array}\right], \end{equation}

where x±AT\boldsymbol{x}\pm A^{T} performs addition/subtraction between the vector x\boldsymbol{x} and each column in the matrix ATA^T. (If you are using Matlab, you can simply write + and - because that is the behaviour of those operators.) The matrix XX has 2n+12n+1 columns.

Each column of XX is one of the sigma points.

X=[X1X2X2n+1]\begin{equation} X=\left[\begin{array}{cccc} \vdots & \vdots & & \vdots\\ \boldsymbol{X}_{1} & \boldsymbol{X}_{2} & \cdots & \boldsymbol{X}_{2n+1}\\ \vdots & \vdots & & \vdots \end{array}\right] \end{equation}

Transform each sigma point through the function to produce a new matrix FF, i.e.

F=[F1F2F2n+1]F=\left[\begin{array}{cccc} \vdots & \vdots & & \vdots\\ \boldsymbol{F}_{1} & \boldsymbol{F}_{2} & \cdots & \boldsymbol{F}_{2n+1}\\ \vdots & \vdots & & \vdots \end{array}\right]

where each column of FF is given by


If ff is arbitrary computer code, then write a for loop to process each column of XX separately and concatenate these columns to form FF.

We will construct the outputs y\boldsymbol{y} and Σy\Sigma_{y} as a weighted sum over the columns of FF. For each column we define a mean weight wmw^{m} and a covariance weight wcw^{c}:

w0m=λn+λw0c=λn+λ+1α2+βwjm=wjc=12(n+λ),j=1,2,...,2n.\begin{align} w_{0}^{m} & =\frac{\lambda}{n+\lambda}\\ w_{0}^{c} & =\frac{\lambda}{n+\lambda}+1-\alpha^{2}+\beta\\ w_{j}^{m}=w_{j}^{c} & =\frac{1}{2(n+\lambda)},\qquad j=1,2,...,2n. \end{align}

Form vectors by concatenating these weights

wm=[w0mw1mw2nm]wc=[w0cw1cw2nc].\begin{align*} \boldsymbol{w}^{m} & =\left[\begin{array}{cccc} w_{0}^{m} & w_{1}^{m} & \cdots & w_{2n}^{m}\end{array}\right]\\ \boldsymbol{w}^{c} & =\left[\begin{array}{cccc} w_{0}^{c} & w_{1}^{c} & \cdots & w_{2n}^{c}\end{array}\right]. \end{align*}

Note that other definitions of the weights are also possible. Different authors use different weights. A necessary constraint is that jwjm=1\sum_{j}w_{j}^{m}=1.

Then the expected value of the output is given by:

y=F×(wm)T.\begin{align} \boldsymbol{y} & =F\times\left(\boldsymbol{w}^{m}\right)^{T}. \end{align}

The covariance is given by

Σy=wc[Fy][Fy]T,\begin{align} \Sigma_{y} & =\boldsymbol{w}^{c}\cdot\left[F-\boldsymbol{y}\right]\left[F-\boldsymbol{y}\right]^{T}, \end{align}

where \cdot is the elementwise product (i.e. the Matlab .* operator).

Applying the Unscented Transform to the Kalman filter

Prediction step

Given the previous state x^k1\hat{\boldsymbol{x}}_{k-1} and covariance Pk1P_{k-1}, compute the 2n+12n+1 sigma points using the equations above and put these in the columns of the matrix XX.

Propagate the sigma points through the prediction model ff to obtain the matrix FF:

F=f(X,uk).\begin{equation} F=f\left(X,\boldsymbol{u}_{k}\right). \end{equation}

Following the Unscented Transform, use FF to calculate the predicted state x^kk1\hat{\boldsymbol{x}}_{k|k-1} and covariance Pkk1P_{k|k-1}.

Finally add in the process covariance to Pkk1P_{k|k-1}.

Measurement step

If the measurement is a linear function of the state, then you can use the normal Kalman filter from this point forwards.

If your measurement model is non-linear, then apply the Unscented Transform a second time to propagate the predicted state through the measurement model. Define a new set of sigma points based on the predicted state and predicted covariance. Then propagate these through the measurement model:


Reconstruct the predicted measurements

zpred=H×(wm)T,\begin{equation} \boldsymbol{z}_{pred}=H\times\left(\boldsymbol{w}^{m}\right)^{T}, \end{equation}

the measurement residual

y=zzpred,\begin{equation} \boldsymbol{y}=\boldsymbol{z}-\boldsymbol{z}_{pred}, \end{equation}

and the measurement residual covariance

S=wc[Hzpred][Hzpred]T+Rk,\begin{equation} S=\boldsymbol{w}^{c}\cdot\left[H-\boldsymbol{z}_{pred}\right]\left[H-\boldsymbol{z}_{pred}\right]^{T}+R_{k}, \end{equation}

where RkR_{k} is the measurement covariance.

Update step

If the measurement model is linear, use the standard Kalman filter update step. Otherwise, proceed as below.

The update step needs a slight adjustment because there is no measurement matrix to easily link measurement space and state space. Instead we can use the cross-correlation between state and measurement:

T=wc[Fx^kk1][Hzpred]T.\begin{equation} T=\boldsymbol{w}^{c}\cdot\left[F-\hat{\boldsymbol{x}}_{k|k-1}\right]\left[H-\boldsymbol{z}_{pred}\right]^{T}. \end{equation}

The Kalman gain in this case is

K=TS1.\begin{equation} K=TS^{-1}. \end{equation}

The updated state and covariance are given by

x^k=x^kk1+KyPk=Pkk1KSKT.\begin{align} \hat{\boldsymbol{x}}_{k} & =\hat{\boldsymbol{x}}_{k|k-1}+K\boldsymbol{y}\\ P_{k} & =P_{k|k-1}-KSK^{T}. \end{align}

Self check quiz 4.2

What is the difference between the extended Kalman filter and the unscented Kalman filter? Select all that apply.

Answer: (a), (c)

Example of an Unscented Kalman Filter (UKF)

Refer to the attached Matlab script for a complete working example.

Implementation notes

  • The initial state covariance cannot be 0 because Cholesky factorisation is only defined for positive definite matrices (those with positive eigenvalues). This is true for any legitimate covariance matrix. If the Cholesky factorisation throws an error then most likely your prediction model is diverging or you have made a mistake in the equations.

  • A major advantage of the UKF is the ability to handle sophisticated prediction models where derivatives are impossible or unreasonable to compute. For example, your prediction model could be an entire computer program. All you need to do is run each of the sigma points through your prediction model and you can thereby use it in your Kalman filter.