Wednesday, November 16, 2005

Meeting with supervisor

Today we had our third VIP research seminar. Dr. Chandrashekhara talked about how he has been trying to reduce the number of degrees of freedom/control points for his 3D model of the left ventricle. I am guessing that control points are points at which the model may be deformed. Later, Dr. Chandra told me that a control point has three degrees of freedom (i.e. 3D). He also talked about subdivision surfaces, a concept commonly used in computer graphics and animation. From what I understand, you smooth a mesh by iterative refinement, and the refinement can in principle be carried on indefinitely.

There were also a few things said about B-splines. I was a bit embarrassed not to know what a B-spline is; I first came across it in a computer graphics course in Toronto. To refresh my memory: a spline is a special (usually complex) curve defined piecewise by polynomials. In computer graphics, splines are normally used for curve fitting, i.e. to approximate some complex shape with a curve. The most popular B-spline is the cubic spline.

Dr. Chandra's talk was followed by one from a PhD student. I couldn't quite figure out what she was saying. The paper she presented was pretty complicated, owing to the amount of heavy-duty equations in it. There was something about the EM algorithm that the paper repeatedly referred to.

The seminar was followed by my meeting with Prof. Daniel. I had more or less completed what he had asked me to do last week: implement Cootes' statistical shape model in MATLAB. The task itself was straightforward; however, looking at the approximations the model produced, there is a strong likelihood of a bug somewhere.

I also learned a few important things about modes. If we train our model with n training samples (x1, ..., xn), we get a covariance matrix with n eigenvectors (strictly, at most n-1 of them have non-zero eigenvalues). These n eigenvectors are our n modes. Using this "trained" model, we can produce an exact replica of an instance xi of our training set if we use all the eigenvectors (i.e. all modes). Note that we use the original xi to produce its approximation (xi hat). The reconstruction is not exact, however, if xi is not in the training set, or if we do not use all the eigenvectors (i.e. we drop some modes). Dr. Daniel told me later that this is a way of testing whether our model is correct: if xi is in the training set, the reconstructed xi hat should be identical to xi (provided all modes are used).
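This reconstruction test is easy to sketch. Below is my own toy sketch in Python/NumPy (not the MATLAB code from the meeting; the data is made up): reconstructing a training sample with all modes is exact, while truncating the modes gives only an approximation.

```python
import numpy as np

# Hypothetical toy training set: n shape vectors of dimension d.
rng = np.random.default_rng(0)
n, d = 5, 8
X = rng.normal(size=(n, d))

mean = X.mean(axis=0)
# The eigenvectors of the covariance matrix are the modes.
cov = np.cov(X - mean, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]          # sort modes by variance explained
P = eigvecs[:, order]

# Reconstruct a training sample using ALL modes: exact replica.
x_i = X[0]
b = P.T @ (x_i - mean)                     # mode coefficients
x_hat = mean + P @ b
assert np.allclose(x_hat, x_i)

# Using only the first k modes gives an approximation instead.
k = 2
b_k = P[:, :k].T @ (x_i - mean)
x_hat_k = mean + P[:, :k] @ b_k            # generally differs from x_i
```

Since P is an orthogonal basis, projecting onto all of its columns and back recovers the vector exactly, which is precisely the sanity check Dr. Daniel described.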

Hence, using all modes (all eigenvectors) explains about 98% of the variation in the training set (~100%, with the remaining 2% being noise). It would be interesting and worth investigating how this percentage varies with the number of modes. This week I will investigate the relationship between the number of modes and the percentage of the data that can be explained. I will also try to see if this varies across different training set sizes.
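The quantity I want to investigate falls straight out of the eigenvalues: the fraction of variance explained by the first k modes is the cumulative sum of the k largest eigenvalues over their total. A quick sketch with made-up data (the per-dimension scales are arbitrary, just to make some modes dominate):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 10, 6
# Scale the dimensions unevenly so a few modes dominate the variance.
X = rng.normal(size=(n, d)) * np.array([3.0, 2.0, 1.0, 0.5, 0.2, 0.1])

cov = np.cov(X, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]   # largest first

# Fraction of total variance explained by the first k modes.
explained = np.cumsum(eigvals) / eigvals.sum()
for k, frac in enumerate(explained, start=1):
    print(f"{k} modes explain {100 * frac:.1f}% of the variance")
```

The curve is monotonically increasing and reaches 100% when all modes are used, which is the relationship I want to plot against training set size.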

The motivation comes from the fact that if, for smaller training sets, we can explain a large portion of our data using some number of modes (I am not sure how many), we would be able to reduce the task of manual annotation (landmarking), i.e. smaller training sets require fewer annotations.




Tuesday, October 25, 2005

PhD :: XP programming

Extreme programming (XP) is an agile software development method, which I came across in the undergraduate software engineering course I took at Toronto. It didn't make much sense to me, and it usually doesn't make much sense to people who haven't programmed in the real world for the real world. All I remembered of XP was that it required programmers to program in pairs, but that is probably one of the less important characteristics of the method. Just the fact that two programmers had to sit together and program probably amused everyone.

XP is a rapid application development methodology in which less time is spent on design. This immediately requires a lot of refactoring (cleaning up) of code, which is another important characteristic of XP. So when do we pick XP over other methods? XP is best used for developing systems that require relatively little up-front design. For example, e-commerce systems built on the Java frameworks require far less time spent on design, due to the availability of ready-built frameworks (such as JavaBeans, etc.). The developers build on top of the framework, which in a way dictates that the programmers follow a certain design. I say this from my experience at BEA, where portals developed using BEA's WebLogic Workshop followed more-or-less standard design rules set by BEA. These rules were well established and known to work. The rules could also, of course, be changed by re-configuring a few XML files.

Thursday, October 20, 2005

PhD :: Learning how to teach and what to teach

(Written for yesterday)
Today I have been observing how the computing department here at Imperial College introduces programming to first-year computer science students. Haskell is taught to first years here, and it appears that Haskell is quite popular amongst hackers, as discovered at a programming contest for hackers. More details here. Haskell is a functional language; not many computer science graduates will even have heard of it, and I think it's quite unpopular in the wider programming community. I have noticed that MIT Scheme, a functional language we were taught at Toronto, resembles Haskell to a certain extent, except that Haskell is purely functional, whereas Scheme allows side effects.

No first-year computer science student gets to cross the bridge without doing Java. At Imperial, Kenya is used to introduce students to Java. Kenya has a different story: it hides from its learners the load of boilerplate which, for someone learning Java for the first time, is really a pain. Kenya is a front end to Java, where Kenya code is translated to Java code. For example, a new learner can type "void main" in Kenya and this gets translated internally to "public static void main". They can learn what "public static ... (String[] args)" means later on, and concentrate on the logic required to learn programming.

Other UK universities, which are not proponents of Kenya (or not aware of it), have started using a fairly easy-to-use IDE called BlueJ.

This year I am tutoring for Software Engineering Methods and Hardware. Dr. Huth and Dr. Gillies are my supervisors, and it is a pleasure chatting with them. I try to spend informative sessions with them about the course material, and in the process teach myself how to teach these courses.

My meeting with Dr. Ruckert was cancelled early this morning. I am meeting him this Friday. I have spent most of the day trying to teach myself Haskell.

Tuesday, October 18, 2005

PhD :: Modelling shape variation using PCA

I spent a major amount of my time reading and trying to grasp the basic concepts of principal components analysis (PCA) in Lindsay Smith's tutorial on PCA. It is a very gentle introduction to PCA and lays out the mathematical concepts in a very "gentle" way. At first I thought the paper was meant for psychology students, only to find later a chapter on the use of PCA in machine vision.

A few days earlier, I had learned the true meaning of covariance, which is really the extension of variance to higher dimensions: it measures how variables vary together. Covariance is usually represented in matrix form, and it is most useful in higher dimensions (the higher dimensions give it the matrix form). Lindsay's tutorial describes covariance really well.
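To make this concrete, here is a small sketch (my own toy numbers, two hypothetical variables) showing how the covariance matrix is built from centered data: the diagonal entries are the ordinary variances, and the off-diagonal entries measure how the two variables vary together.

```python
import numpy as np

# Hypothetical 2-D data: two measurements per sample.
data = np.array([[170.0, 65.0],
                 [160.0, 55.0],
                 [180.0, 80.0],
                 [175.0, 70.0]])

mean = data.mean(axis=0)
centered = data - mean
# Sample covariance matrix: diagonal = variances,
# off-diagonal = covariance between the two variables.
cov = centered.T @ centered / (len(data) - 1)

assert np.allclose(cov, np.cov(data, rowvar=False))
```

The assert checks the hand-rolled formula against NumPy's built-in `np.cov`, which computes the same thing.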

I later referred back to Cootes' paper on statistical models, which is on my list of literature to review over the next few weeks. I have understood the idea behind capturing shape variation; however, I still need to understand further the mathematical equations that Cootes has laid out. I find it remarkable how a shape described by n points in d dimensions can easily be represented by a single vector, and compared using PCA.

The fact that an equation can be obtained (using just a small set of parameters) for a training set on which PCA has been performed is also beautiful. I have peeked further into the paper, and Cootes describes how the distribution of the parameters in the equation can be modelled (from the training set) to produce plausible shapes.
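As I currently understand it, the idea is that a new shape is generated as the mean shape plus a weighted sum of modes, with each weight limited to a few standard deviations of what the training set exhibited. A hedged sketch (toy data, and the plus/minus three standard deviation limit is the convention I have seen, not necessarily Cootes' exact choice):

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical training set of flattened shape vectors.
X = rng.normal(size=(20, 10))
mean = X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(X - mean, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, P = eigvals[order], eigvecs[:, order]

# New shapes are generated as x = mean + P b.  Restricting each
# parameter b_k to +/- 3 sqrt(lambda_k) keeps the generated shape
# "plausible", i.e. within three standard deviations of the training set.
k = 3
b = rng.uniform(-3, 3, size=k) * np.sqrt(eigvals[:k])
new_shape = mean + P[:, :k] @ b
```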

I have a meeting with Dr. Daniel tomorrow, and will attempt to clarify a few things that I have been wondering about.


Wednesday, June 22, 2005

No to EM algorithm for computing Gaussian mixtures in background modeling

A mixture of Gaussians becomes necessary for modelling the background pixels, owing to the multi-modal distribution of values a pixel can take in scenes recorded over time. Below are some examples taken from Stauffer et al.'s paper "Adaptive background mixture models for real-time tracking", which illustrate the red and green values of a certain pixel in a scene viewed over time. Note the bi-modal distribution of the pixel's values.



So now the question: given that we have these clusters of data, how do we come up with the Gaussian distributions? Normally, we would use the EM (expectation maximization) algorithm to estimate the values of mu and sigma. A great tutorial/slides on EM can be found here.
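For my own notes, here is a minimal EM sketch for a two-component 1-D mixture, on made-up bi-modal "pixel values" like the ones in Stauffer's figures (this is the generic textbook algorithm, not the paper's implementation):

```python
import math
import random

random.seed(0)
# Hypothetical bi-modal pixel values: two clusters seen over time.
data = [random.gauss(50, 5) for _ in range(200)] + \
       [random.gauss(150, 10) for _ in range(200)]

def pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# EM alternates between computing responsibilities (E-step) and
# re-estimating the mixture parameters (M-step).
mu = [40.0, 160.0]          # rough initial guesses
sigma = [20.0, 20.0]
pi = [0.5, 0.5]
for _ in range(30):
    # E-step: responsibility of each component for each point.
    r = []
    for x in data:
        w = [pi[k] * pdf(x, mu[k], sigma[k]) for k in range(2)]
        s = sum(w)
        r.append([wk / s for wk in w])
    # M-step: update weights, means and standard deviations.
    for k in range(2):
        nk = sum(ri[k] for ri in r)
        pi[k] = nk / len(data)
        mu[k] = sum(ri[k] * x for ri, x in zip(r, data)) / nk
        sigma[k] = math.sqrt(sum(ri[k] * (x - mu[k]) ** 2
                                 for ri, x in zip(r, data)) / nk)
```

After a few iterations the two means settle near the true cluster centres (about 50 and 150 here), which is the "wrapping around the clusters" behaviour described below.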

EM achieves accuracy and "wraps" properly around the clusters after a number of iterations. In background modelling, each pixel is modelled as a mixture of Gaussians; hence, running EM on every pixel is a costly process, owing to the complexity of the EM algorithm. Stauffer et al. suggest an alternative approach, described in their paper "Adaptive background mixture models for real-time tracking". More on this later.

Saturday, June 11, 2005

The implementation of the GaussianModeler

Implementing the data structure for the Gaussian modeler required careful thought and consideration. Here the advanced course in Java that I took for my M.Sc. helped me come up with a good design. For most parts of the system, I want all my components to be loosely coupled. By "loosely coupled" I mean I want all the components to have a great degree of autonomy, in the sense that if I don't like a component, I can easily replace it with another, without making major changes to the controller/main code.

I figured that my Gaussian model might change in the future, owing to the number of different conditions I will be experimenting under: a single-Gaussian background model may not be sufficient.

I built a SingleGaussianBackground model class that encapsulates a structure illustrated below:



The RGBGaussianModeler class stores all the information about the Gaussian distribution (mean and standard deviation), and also has methods for updating the Gaussian model; for example, adding a red, green or blue color value changes the distribution. The RGBGaussianModeler doesn't store a single Gaussian distribution; it stores three separate distributions, one for each channel (RGB) of the pixel.

The SingleGaussianBackground is less abstract, in the sense that it has methods that can access a pixel's RGBGaussianModeler. Its constructor creates an object out of an array of images of type IplImage, OpenCV's image type.
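The per-pixel modeler idea can be sketched as follows (a Python sketch of the concept, not my actual C++ class; the incremental update via Welford's online algorithm is my own choice here):

```python
import math

class RGBGaussianModeler:
    """Keeps a running Gaussian (mean, std) for each of the R, G, B
    channels of one pixel, updated incrementally as samples arrive."""

    def __init__(self):
        self.n = 0
        self.mean = [0.0, 0.0, 0.0]
        self.m2 = [0.0, 0.0, 0.0]    # running sum of squared deviations

    def add(self, r, g, b):
        """Add one RGB sample; updates all three channel distributions."""
        self.n += 1
        for i, v in enumerate((r, g, b)):
            delta = v - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (v - self.mean[i])

    def std(self, channel):
        if self.n < 2:
            return 0.0
        return math.sqrt(self.m2[channel] / (self.n - 1))
```

Usage: call `add(r, g, b)` once per frame for the pixel, then read off `mean` and `std` per channel. Keeping the modeler self-contained like this is exactly the loose coupling I want: the background class only sees its small interface.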

Friday, June 10, 2005

Back to the old gaussian model

I was thinking too hard about the background model, and moreover my initial background model was seriously flawed. After a series of experiments, I realized that my previous model picked up only pixels of significantly different color as foreground. If you look at the previous posts, you will see that the system has been picking up the purple and yellow balls. I needed it to pick up humans in the scene, who seem to appear in shades of black and grey from a distance.

My model had a fundamental problem: I had only three Gaussians for the whole background model, one for each of the color channels (R, G and B), which explains why it could only detect strongly differing colors.

However, this model had to be changed. A background model should have three Gaussians per pixel of the background that is to be learnt by the machine.

Here is an illustration of how this works: the system computes Gaussians for each channel of each pixel of the background model.



After the background is learnt (i.e. the single Gaussians calculated), an image on which background subtraction is applied undergoes the following tests, as illustrated below:
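The per-pixel test boils down to something like this (a sketch of the idea; the threshold k and the "any channel deviates" rule are my assumptions, since the original illustration is not reproduced here):

```python
# A pixel is classified as foreground if any channel deviates from its
# learned Gaussian by more than k standard deviations.
def is_foreground(pixel, means, stds, k=2.5):
    return any(abs(v - m) > k * s
               for v, m, s in zip(pixel, means, stds))

# Hypothetical learned background Gaussians for one pixel (R, G, B):
means = (100.0, 120.0, 90.0)
stds = (5.0, 4.0, 6.0)

assert not is_foreground((102, 118, 92), means, stds)   # near background
assert is_foreground((200, 118, 92), means, stds)       # red channel deviates
```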

Monday, June 06, 2005

Implementing W4's background model

The simple background model that I have implemented doesn't seem to be very useful. However, it's a first step in the considerable amount of work that needs to be done to get the background model right.

My background model detector needs to take care of scenes that start out with moving leaves of trees, or, if time permits, it should also be able to process and store backgrounds that start off with some moving traffic, etc. The background also needs to update itself recursively: for example, a person could come into the scene and leave an object behind, in which case the object becomes part of the background. Or there could be illumination changes in the scene.

I came across a paper by Haritaoglu et al., titled "W4: Real-time surveillance of people and their activities" (published in IEEE Transactions on Pattern Analysis and Machine Intelligence, August 2000), where they describe a background model that can learn a background even if there are moving objects in the scene (e.g. traffic or leaves), and that can also adapt itself recursively using so-called "support maps", hence allowing for objects that become part of the background later on.

W4, however, fails when there are sudden changes in illumination, in which case it "thinks" most of the background is foreground. W4 tries to recover from such catastrophes by using a percentage indicator: if, say, more than 80% of the background is classified as foreground, reject the background model and build a new one immediately.

To begin implementing W4 in my system, I required a median filter to distinguish between moving and stationary pixels. I am able to use the median filter from Intel's cv library; however, I still need to investigate how it can be used to distinguish moving from non-moving pixels.
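My current understanding of why a median helps, as a sketch (toy numbers, and the deviation threshold is my assumption): the temporal median of a pixel's history is robust to brief spikes caused by objects passing through, so frames where the value strays far from the median can be flagged as "moving".

```python
import statistics

# One pixel's intensity over 10 frames (hypothetical values).  Frame 4
# has a passing object; the median ignores the brief spike.
history = [100, 101, 99, 100, 240, 100, 98, 101, 100, 99]
med = statistics.median(history)
moving_frames = [t for t, v in enumerate(history) if abs(v - med) > 20]
```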

The next step is to store a vector (5 x 1) for each pixel that is modelled as background. For this I required a data structure that looks something like this:


However, when you try to implement such a structure in C++, it ends up looking something like this:
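Since the original illustrations are not reproduced here, the intent is roughly this (sketched in Python for brevity; the actual implementation is C++, and what the five entries hold is left unspecified, as in the post):

```python
import numpy as np

# One 5-vector per pixel of the background model.  The five entries
# are placeholders here; the post does not spell out their meaning.
height, width = 240, 320
model = np.zeros((height, width, 5))

model[120, 160] = [1.0, 2.0, 3.0, 4.0, 5.0]   # update one pixel's vector
vec = model[120, 160]                          # read it back
```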

Tuesday, May 31, 2005

Through the "eyes" of my program

I finally have results from my program. It can now detect "major" foregrounds by learning the background. In the image below, what you see are three images: the background (top left), the foreground (the colored balls) on the background (top right), and what the program has detected (bottom left).

Right now, I am using a single Gaussian distribution for the learning function. The results are not that great on real humans walking in the scene; I might need to use a more complicated Gaussian distribution. I am thinking that I might implement the background subtraction technique by Yamada et al. They use the intensity (which is a linear function of R, G and B) of each pixel to fit the single Gaussian distribution, and then compare the Gaussian model against the intensity of each pixel in the image where the foreground is sought, to differentiate between foreground and background.
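The intensity-based scheme would look roughly like this (a sketch only: the standard luma weights are my assumption, as Yamada et al. may use a different linear combination, and the threshold is arbitrary):

```python
# Intensity as a linear combination of R, G, B.  The 0.299/0.587/0.114
# luma weights are assumed here, not taken from the paper.
def intensity(r, g, b):
    return 0.299 * r + 0.587 * g + 0.114 * b

# Learn a single Gaussian over a pixel's intensity history.
samples = [intensity(100, 100, 100) for _ in range(10)]
mu = sum(samples) / len(samples)
var = sum((s - mu) ** 2 for s in samples) / len(samples)
sigma = var ** 0.5

# Classify by deviation from the learned intensity Gaussian
# (the floor on sigma guards against degenerate training data).
def is_foreground(r, g, b, k=2.5):
    return abs(intensity(r, g, b) - mu) > k * max(sigma, 1.0)
```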

Monday, May 30, 2005

Selecting multiple files using an open file dialog

It's a pain to program using the MFC (Microsoft Foundation Classes). Even people at Microsoft use the Win32 API. MFC is a wrapper for the Win32 API, and it is meant to be easy to use. I haven't really had the chance to look at the Win32 API.

My task was to implement a multiple-select open file dialog. I found this forum post on the net; it is the simplest way I have found so far. Other implementations involve extending the CDocManager class, which can get quite complicated, especially if you are not familiar with MFC.

Earlier, I was unable to properly construct a multiple file open dialog. My file open dialog would only open a limited number of files (usually 5-6). I soon realized that it was limited by the size of CString. Using a TCHAR buffer instead solves the problem (TCHAR is a character type, much like char). The only difference is that if you compile with Unicode (compiler options set to Unicode), TCHAR becomes wchar_t (used for i18n). MSDN's documentation on TCHAR is here.

GNU MP - arithmetic without limitations

GNU MP (a.k.a. GMP) is a library written in C for arbitrary-precision arithmetic. I was able to compile version 4.1 of GMP successfully in the Visual C++ 6.0 environment, using its cl.exe compiler.

I followed instructions that I found on this site. You also need to download the core library from that site, where you will find patch files (such as .dsw) that let you open a project in Visual C++ and compile from there. To link your application to the library, you need to include the header and some library files that are available through a download of the static GMP library (see site).

Unfortunately, the C++ wrapper wouldn't compile, which was disappointing. However, it is safe to call GMP's C functions from C++. I attempted to overload the arithmetic operators by defining my own wrapper class around a GMP float, but I abandoned the effort and am currently using the mpf_xxxx functions directly. GMP's functions are quite cumbersome to use; operator overloading might be required as the number of calculations grows.

links:
GNU MP's official site
GNU MP's documentation - priceless if you are thinking of integrating GMP into your application.

Saturday, May 28, 2005

Stuck with computations that don't fit

There has been good progress lately. I have implemented most of the code required to open multiple images, and spent some time trying to implement the multiple file open dialog in MFC. It was a pain. There are different implementations on the web, such as ones that extend the CDocManager class and write their own methods.

However, I came across a much simpler version at: click here and implemented that. It uses a CStringList to store the filenames, which seems to be limited in terms of the file paths it can store. A reader posted that it can store on average about 40 files, depending on the file name length. An alternative solution is to use TCHAR. I haven't implemented that yet, so my multiple-select file dialog is still limited in functionality.

I am able to read pixels off multiple images, which was a goal I had set for the weekend; I think I have achieved it. I also created a GaussianModeler class that creates a single Gaussian distribution out of a list of Red/Blue/Green color values: a single Gaussian model for each of the color channels. The constructor reads from a vector, which I also spent some time implementing. It took a while to figure out how to iterate through a vector given only a pointer to it: you use an iterator.

Right now, I am stuck trying to accommodate the numbers that arise in the computations my program performs. I require data types that can hold some 20-30 digits; the maximum an unsigned long int can hold is 4,294,967,295. After asking Haseeb, he pointed me to a few arbitrary-precision arithmetic libraries that allow unlimited digits.
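As an aside on the kind of range needed (sketched in Python, which has arbitrary-precision integers built in, unlike the C++ I am actually working in):

```python
# Cubing the 32-bit unsigned long maximum overflows any built-in C
# integer type, but Python integers grow as needed.
ULONG_MAX = 4294967295
big = ULONG_MAX ** 3
print(len(str(big)))   # 29 digits: well into the 20-30 digit range
```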