Wednesday, June 22, 2005

No to EM algorithm for computing Gaussian mixtures in background modeling

A mixture of Gaussians becomes necessary for modelling the background pixels, because pixel values in a scene recorded over time can follow a multi-modal distribution. Below are some examples taken from the paper by Stauffer et al., "Adaptive background mixture models for real-time tracking", which plot the red and green values of a single pixel in a scene viewed over time. Note the bi-modal distribution of the pixel's values.



So now the question: given these clusters of data, how do we come up with the Gaussian distributions? Normally, we would use the EM (expectation-maximization) algorithm to estimate the values of mu and sigma. A great tutorial/slides on EM can be found here.

EM achieves accuracy and "wraps" properly around the clusters after a number of iterations. In background modeling, however, each pixel is modeled as a mixture of Gaussians, so running EM on every pixel is a costly process, because EM is iterative and every iteration touches all the samples of that pixel. Stauffer et al. suggest an alternative approach in their paper "Adaptive background mixture models for real-time tracking". More on this later.
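To make the cost concrete, here is a minimal sketch of plain EM for a one-dimensional mixture, written in Python for brevity rather than in the C++ of the actual system. The function names, the starting guesses and the synthetic "pixel history" are all assumptions made for illustration; this is not the update rule from the Stauffer et al. paper.

```python
import math
import random


def gaussian_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def em_1d(samples, mus, sigmas, weights, iterations=50, min_sigma=1e-3):
    """Fit a 1-D Gaussian mixture with plain EM, starting from the given guesses."""
    k = len(mus)
    mus, sigmas, weights = list(mus), list(sigmas), list(weights)
    n = len(samples)
    for _ in range(iterations):
        # E-step: responsibility of every component for every sample.
        resp = []
        for x in samples:
            p = [weights[j] * gaussian_pdf(x, mus[j], sigmas[j]) for j in range(k)]
            total = sum(p) or 1e-300  # guard against division by zero
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean and sigma of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # component owns no samples; leave it unchanged
            mus[j] = sum(r[j] * x for r, x in zip(resp, samples)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, samples)) / nj
            sigmas[j] = max(math.sqrt(var), min_sigma)
            weights[j] = nj / n
    return mus, sigmas, weights


# Synthetic bi-modal history of one channel of one pixel (made-up values).
random.seed(0)
samples = ([random.gauss(80, 5) for _ in range(500)]
           + [random.gauss(170, 8) for _ in range(500)])
mus, sigmas, weights = em_1d(samples, [50.0, 200.0], [20.0, 20.0], [0.5, 0.5])
```

Every iteration loops over all samples and all components, and this would have to be repeated for every channel of every pixel in the image, which is where the cost comes from.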

Saturday, June 11, 2005

The implementation of the GaussianModeler

Implementing the data structure for the Gaussian modeler required careful thought and consideration. Here the advanced course in Java that I took for my M.Sc. helped me come up with a good design. For most parts of the system, I want all my components to be loosely coupled. By "loosely coupled" I mean that every component should have a great degree of autonomy: if I don't like a component, I can easily replace it with another one without making major changes to the controller/main code.

I figured that my Gaussian model might change in the future, owing to the number of different conditions that I will be experimenting in. A single Gaussian background model may not be sufficient.

I built a SingleGaussianBackground Model class that encapsulates in itself a structure that is illustrated below:



The RGBGaussianModeler class stores all the information about the Gaussian distribution (mean and standard deviation), and also has methods for updating the Gaussian model. For example, adding a red, green or blue value to the distribution changes the distribution. The RGBGaussianModeler doesn't store a single Gaussian distribution; it stores three separate distributions, one for each channel (RGB) of the pixel.
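As a rough sketch of that idea (in Python, not the C++ of the actual system), the class below keeps a running mean and standard deviation for each channel. Only the name RGBGaussianModeler comes from the description above; the ChannelGaussian helper, the method names and the use of Welford's incremental update are assumptions for illustration.

```python
import math


class ChannelGaussian:
    """Running mean and standard deviation of one colour channel.

    Uses Welford's incremental update (an assumption, chosen so that the
    individual samples never need to be stored).
    """

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def add(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def std(self):
        # Population standard deviation; 0.0 before any sample is added.
        return math.sqrt(self._m2 / self.count) if self.count else 0.0


class RGBGaussianModeler:
    """Three independent Gaussians, one per channel of a single pixel."""

    def __init__(self):
        self.red = ChannelGaussian()
        self.green = ChannelGaussian()
        self.blue = ChannelGaussian()

    def add_pixel(self, r, g, b):
        # Hypothetical method name: feed one observation of the pixel.
        self.red.add(r)
        self.green.add(g)
        self.blue.add(b)


model = RGBGaussianModeler()
for r, g, b in [(10, 0, 5), (20, 0, 5), (30, 0, 5)]:
    model.add_pixel(r, g, b)
```

Keeping the per-channel statistics behind one small class is what makes it easy to swap in a different model later without touching the code that feeds it pixels.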

The SingleGaussianBackground is less abstract, in the sense that it has methods that can access a pixel's RGBGaussianModel. Its constructor creates an object out of an array of images of type IplImage, OpenCV's image type.

Friday, June 10, 2005

Back to the old gaussian model

I was thinking too hard about the background model, and moreover my initial background model was seriously flawed. After a series of experiments, I realized that my previous model picked up only pixels whose color differed significantly from the background as foreground. If you look at the previous posts, you will see that the system has been picking up the purple and yellow balls. I needed it to pick up humans in the scene, who appear in shades of black and grey from a distance.

My model had a fundamental problem. I had only three Gaussians for the whole background model, one for each of the color channels (R, G and B), which explains why it was good at detecting strongly differing colors.

However, this model had to be changed. A background model should have three Gaussians for every pixel of the background that the machine is to learn.

Here is an illustration of how this works. The system computes Gaussians for each channel of each pixel of the background model.



After the background is learnt (i.e. the single Gaussians have been calculated), an image to which background subtraction is applied undergoes the following tests, as illustrated below:
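In code, one plausible form of such a test is to flag a pixel as foreground when any of its channels lies too far from the learnt mean of that channel. The threshold k, the floor on the standard deviation and the "any channel" rule are assumptions, and the function names are hypothetical; the sketch is in Python rather than the C++ of the actual system.

```python
def is_foreground(pixel, means, stds, k=2.5, min_std=1.0):
    """Return True if any channel of `pixel` is more than k sigmas from its mean.

    pixel, means and stds are (R, G, B) triples for one pixel position.
    min_std keeps a perfectly static background pixel from flagging on noise.
    """
    return any(abs(v - m) > k * max(s, min_std)
               for v, m, s in zip(pixel, means, stds))


def subtract_background(image, mean_image, std_image, k=2.5):
    """Build a boolean foreground mask for an image given as rows of RGB triples."""
    return [[is_foreground(image[y][x], mean_image[y][x], std_image[y][x], k)
             for x in range(len(image[y]))]
            for y in range(len(image))]


# One row, two pixels: the first is close to the background, the second is a
# dark pixel such as a person seen from a distance (made-up values).
mean_image = [[(100, 100, 100), (100, 100, 100)]]
std_image = [[(4, 4, 4), (4, 4, 4)]]
image = [[(102, 98, 101), (30, 30, 30)]]
mask = subtract_background(image, mean_image, std_image)
```

Because every pixel now has its own three Gaussians, a dark grey figure is detected wherever it differs from the local background, not only where its color is unusual for the scene as a whole.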

Monday, June 06, 2005

Implementing W4's background model

The simple background model that I have implemented doesn't seem to be very useful. However, it's a first step towards the considerable amount of work that is needed to get the background model right.

My background model detector needs to take care of scenes that start out with moving leaves of trees and, if time permits, it should also be able to process and store backgrounds that start off with some moving traffic, etc. The background also needs to update itself recursively. For example, a person could come into the scene and leave an object behind, in which case the object becomes part of the background. Or there could be illumination changes in the scene.

I came across a paper by Haritaoglu et al., titled "W4: Real-Time Surveillance of People and Their Activities" (published in IEEE Transactions on Pattern Analysis and Machine Intelligence, August 2000), where they describe a background model that can learn a background even if there are moving objects in the scene (e.g. traffic or leaves). It can also adapt itself recursively using so-called "support maps", hence allowing for objects that become part of the background later on.

W4, however, fails when there are sudden changes in illumination, in which case it "thinks" most of the background is foreground. W4 tries to recover from such catastrophes by using a percentage indicator: if, let's say, more than 80% of the background is thought to be foreground, it rejects the background model and builds a new one immediately.
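The percentage check itself is simple. A minimal sketch, assuming the foreground mask is a list of rows of booleans and using the 80% figure from above as the default (the function name is hypothetical):

```python
def needs_rebuild(mask, max_foreground_fraction=0.8):
    """Return True when more than the given fraction of pixels is foreground."""
    total = sum(len(row) for row in mask)
    if total == 0:
        return False  # empty mask: nothing to judge
    foreground = sum(1 for row in mask for value in row if value)
    return foreground / total > max_foreground_fraction


# 9 of 10 pixels flagged, e.g. right after the lights are switched on.
mostly_foreground = [[True] * 5, [True] * 4 + [False]]
# 2 of 10 pixels flagged, e.g. one person walking through the scene.
mostly_background = [[False] * 5, [True] * 2 + [False] * 3]
```

A controller would call this once per frame and, when it returns True, throw away the current model and start learning a new one.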

To begin implementing W4 in my system, I required a median filter to distinguish between moving and stationary pixels. I am able to use the median filter from Intel's cv library; however, I still need to investigate how it can be used to distinguish moving from non-moving pixels.
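One possible reading of this step is a median taken over time for each pixel, rather than the spatial median used for smoothing a single image: a pixel counts as stationary in a frame when its value stays within a couple of standard deviations of its median over the training frames. The sketch below assumes that reading, uses grayscale frames given as rows of numbers, and is in Python rather than the C++ of the actual system. The function names and the factor k=2 are assumptions.

```python
import statistics


def temporal_median(frames):
    """Per-pixel median over a list of equally sized grayscale frames."""
    height, width = len(frames[0]), len(frames[0][0])
    return [[statistics.median([f[y][x] for f in frames]) for x in range(width)]
            for y in range(height)]


def temporal_std(frames):
    """Per-pixel population standard deviation over the same frames."""
    height, width = len(frames[0]), len(frames[0][0])
    return [[statistics.pstdev([f[y][x] for f in frames]) for x in range(width)]
            for y in range(height)]


def stationary_mask(frame, median_image, std_image, k=2.0):
    """True where the frame stays within k sigmas of the per-pixel median."""
    return [[abs(frame[y][x] - median_image[y][x]) <= k * std_image[y][x]
             for x in range(len(frame[y]))]
            for y in range(len(frame))]


# Five frames of a 1x2 image: the second pixel is briefly covered by
# something bright in the third frame (made-up values).
frames = [[[10, 50]], [[11, 50]], [[10, 200]], [[12, 50]], [[10, 50]]]
median_image = temporal_median(frames)
std_image = temporal_std(frames)
mask = stationary_mask(frames[2], median_image, std_image)
```

Only the observations flagged as stationary would then be used to build the per-pixel background statistics, so a passing car or a fluttering leaf does not end up in the model.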

The next step is to store a vector (5 x 1) for each pixel that is modelled as background. For this I required a data structure that looked something like this:


However, when you are trying to implement such a structure in C++, it ends up looking something like this: