
Research interests and activities

My research interests and activities can roughly be grouped into the topics below. For guidance, I take inspiration from research into human learning, especially developmental learning during infancy.

Scalable incremental learning

This activity aims at learning algorithms that are usable in a "big data" context. The focus on generative algorithms is easily explained: in real applications it is imperative to classify outliers as such, which generative methods are capable of doing but discriminative ones are not. Furthermore, for the discovery of correlations between high-dimensional data flows, a generative algorithm is beneficial because it represents the whole distribution and can therefore detect relations better than a discriminative one that just represents a hyperplane. The PROPRE algorithm, as proposed in this paper, already fulfills a good deal of these requirements. Namely, it is generative, scalable, efficient, applicable to high-dimensional data and robust to irrelevant dimensions. By changing a small detail in the learning architecture (learning when classification is WRONG instead of learning when it is correct), incremental learning becomes possible. This has been described in a preliminary work and is an ongoing activity of high priority, see this recent publication. My interest in this topic led me to co-organize a special session on incremental learning.
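The outlier argument above can be illustrated with a deliberately tiny generative classifier. This is a sketch under stated assumptions, not the PROPRE algorithm: each class is modeled by a one-dimensional Gaussian, and the names `fit_gaussian`, `classify` and the value of `THRESHOLD` are hypothetical choices made for illustration only.

```python
import math

def fit_gaussian(samples):
    """Fit a 1-D Gaussian by maximum likelihood; returns (mean, variance)."""
    mean = sum(samples) / len(samples)
    var = sum((x - mean) ** 2 for x in samples) / len(samples)
    return mean, var

def log_density(x, mean, var):
    """Log of the Gaussian density N(x | mean, var)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def classify(x, models, threshold):
    """Return the most likely class, or 'outlier' if even the best
    class assigns x a log-density below the threshold."""
    scored = [(name, log_density(x, m, v)) for name, (m, v) in models.items()]
    label, score = max(scored, key=lambda pair: pair[1])
    return label if score >= threshold else "outlier"

# Toy training data for two classes (made up for this example).
train = {
    "a": [0.9, 1.0, 1.1, 1.2, 0.8],
    "b": [4.8, 5.0, 5.2, 5.1, 4.9],
}
models = {name: fit_gaussian(xs) for name, xs in train.items()}
THRESHOLD = -10.0  # hypothetical rejection threshold on the log-density

print(classify(1.05, models, THRESHOLD))  # close to class "a"
print(classify(5.0, models, THRESHOLD))   # close to class "b"
print(classify(50.0, models, THRESHOLD))  # far from both classes
```

A purely discriminative rule such as "class b if x > 3" would confidently assign 50.0 to class b; the density model has the information needed to reject it instead.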


Deep Convolutional Gaussian Mixture Models

This endeavor is related to my research on incremental learning, which must, to my mind, necessarily contain an element of replay. However, the standard way of achieving this through GANs has its problems, which stem from the fact that GANs do not have an associated loss function whose value indicates training progress, so there is no way to know whether a GAN is currently undergoing mode collapse, a frequent problem. Replacing GANs by GMMs as generators would be the ideal solution; however, the quality of sampling from vanilla GMMs is strongly inferior to that of GANs. Therefore, I am looking for ways to create stacked convolutional variants of GMMs that can leverage the inherent compositionality of natural images through a hierarchical structure with local receptive fields, analogous to CNNs. Extensions of GMMs that allow better sampling behavior, like Mixtures of Factor Analyzers (MFA), are also under investigation.
A first prominent result of these activities is an SGD-based training algorithm for GMMs that works for large sets of natural images and is superior, both in performance and manageability, to sEM, a stochastic version of EM, the usual training algorithm for GMMs.
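To make the idea of training a GMM by stochastic gradient steps concrete, here is a minimal one-dimensional sketch. It is not the published algorithm: it assumes a fixed, shared standard deviation, parameterizes the mixing weights by a softmax over logits, and uses plain gradient ascent on the per-sample log-likelihood. The function names, the learning rate and the toy data are all assumptions made for illustration.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax."""
    top = max(logits)
    exps = [math.exp(a - top) for a in logits]
    total = sum(exps)
    return [e / total for e in exps]

def responsibilities(x, means, weights, sigma):
    """Posterior probability of each component given sample x.
    The shared Gaussian normalization constant cancels out."""
    dens = [w * math.exp(-0.5 * ((x - m) / sigma) ** 2)
            for w, m in zip(weights, means)]
    total = sum(dens) or 1e-300  # guard against numerical underflow
    return [d / total for d in dens]

def sgd_gmm(data, init_means, sigma=1.0, lr=0.01, epochs=5, seed=0):
    """One gradient-ascent step on log p(x) per sample."""
    rng = random.Random(seed)
    data = list(data)
    means = list(init_means)
    logits = [0.0] * len(means)
    for _ in range(epochs):
        rng.shuffle(data)
        for x in data:
            weights = softmax(logits)
            resp = responsibilities(x, means, weights, sigma)
            for k in range(len(means)):
                # d log p(x) / d mean_k = resp_k * (x - mean_k) / sigma^2
                means[k] += lr * resp[k] * (x - means[k]) / sigma ** 2
                # d log p(x) / d logit_k = resp_k - weight_k
                logits[k] += lr * (resp[k] - weights[k])
    return means, softmax(logits)

# Toy data: two well-separated clusters around -2 and 3.
gen = random.Random(1)
data = ([gen.gauss(-2.0, 0.5) for _ in range(500)]
        + [gen.gauss(3.0, 0.5) for _ in range(500)])
means, weights = sgd_gmm(data, init_means=[-1.0, 1.0])
print(means, weights)
```

Unlike batch EM, each update touches a single sample, which is what makes this style of training attractive for large data sets and streaming settings.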

Object detection in context

This line of research aims at learning high-level knowledge such as "pedestrians are usually found on sidewalks and not on roofs", and translating it into lower-level descriptions that can be used to guide local pattern-based detection methods. Not only can such approaches increase the detection accuracy significantly, they also strongly reduce design time. I have already succeeded in showing this in the context of vehicle detection, see the paper. I believe that such "common sense models" (I term them "context models"), which humans have learned for more or less all types of objects in different situations, together with the ability to translate them into precise and efficient search strategies, are what make human perception so powerful.
What interests me currently is the question of how to learn situation-specific context models. As a very obvious example, consider the search for pedestrians in inner-city and highway traffic: in the former case one might have to look preferentially at the sidewalk, while in the latter case one does not look for pedestrians at all since they are rarely encountered on highways.
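The pedestrian example can be sketched as a lookup of situation-specific priors that re-weights the scores of a local detector. This is only an illustration of the idea: the table `CONTEXT_PRIOR`, its numbers, the region names and the function `rescore` are hypothetical, and in the research described here such models would be learned rather than written by hand.

```python
# Hypothetical context model: how plausible a pedestrian is in a given
# image region, depending on the driving situation. Values are made up.
CONTEXT_PRIOR = {
    "inner_city": {"sidewalk": 0.30, "road": 0.05, "roof": 0.001},
    "highway": {"sidewalk": 0.0, "road": 0.001, "roof": 0.0},
}

def rescore(detections, situation, min_prior=0.01):
    """Weight local detector scores by the context prior.

    detections: list of (region, detector_score) pairs.
    Regions whose prior falls below min_prior are not searched at all.
    Returns the remaining detections, best first.
    """
    prior = CONTEXT_PRIOR[situation]
    kept = []
    for region, score in detections:
        p = prior.get(region, 0.0)
        if p < min_prior:
            continue  # context says: do not look here
        kept.append((region, score * p))
    return sorted(kept, key=lambda item: item[1], reverse=True)

# Raw scores from a purely local, pattern-based detector (made up).
detections = [("roof", 0.9), ("sidewalk", 0.6), ("road", 0.7)]

print(rescore(detections, "inner_city"))
print(rescore(detections, "highway"))
```

In the inner-city case the strong but implausible roof detection is discarded and the sidewalk candidate is ranked first; in the highway case nothing is searched, which mirrors the example in the text.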


Multi-modal, weakly supervised learning

In object detection, training databases are usually created by human inspection of video images. Therefore, a sufficient number of training examples is hard to come by, as the inspection ("labelling") process is very time-consuming and thus expensive. Often, semi-automatic approaches are used that employ tracking methods to reduce human effort but, especially for multi-class problems like pedestrian pose classification, the number of examples for each class is still low. What we need are therefore learning methods that can cope with the absence of direct supervision in the form of crisp, symbolic labels ("pedestrian", "cat", "bike"). Instead, "weak supervision" signals need to be discovered in the high-dimensional data provided by the processing system into which learning is usually embedded. In this sense, a system would no longer learn that a certain pattern class is a "car" but rather that it usually co-occurs with other events, e.g., a certain pattern in another sensor stream. An initial step in this direction has been taken in this preliminary work, using two simulated sensor streams, between which the PROPRE learning algorithm detects correlated sub-spaces which are subsequently enhanced. A currently ongoing effort is to transfer this to real-world data coming from the KITTI vehicle detection benchmark, where the sensor streams are based on visual and LIDAR information.
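The notion of finding co-occurring structure between two sensor streams without labels can be illustrated with a very simple stand-in: Pearson correlation between every pair of dimensions across two simulated streams. This is not what PROPRE does internally; the stream layout, the hidden event, the threshold and the function names below are assumptions chosen to keep the sketch small.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def correlated_pairs(stream_a, stream_b, threshold=0.8):
    """Return (i, j, r) for every dimension i of stream A and j of
    stream B whose absolute correlation reaches the threshold."""
    cols_a = list(zip(*stream_a))
    cols_b = list(zip(*stream_b))
    pairs = []
    for i, col_a in enumerate(cols_a):
        for j, col_b in enumerate(cols_b):
            r = pearson(col_a, col_b)
            if abs(r) >= threshold:
                pairs.append((i, j, r))
    return pairs

# Two simulated streams driven by one hidden binary event (e.g. "object
# present"). Only A[0] and B[2] depend on it; the rest is pure noise.
rng = random.Random(0)
stream_a, stream_b = [], []
for _ in range(200):
    event = 1.0 if rng.random() < 0.5 else 0.0
    stream_a.append([event + 0.1 * rng.gauss(0, 1), rng.gauss(0, 1)])
    stream_b.append([rng.gauss(0, 1), rng.gauss(0, 1),
                     2.0 * event + 0.1 * rng.gauss(0, 1)])

pairs = correlated_pairs(stream_a, stream_b)
print([(i, j) for i, j, _ in pairs])
```

No label such as "car" is ever given; the only supervision is that one dimension in each stream tends to change together with the other.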


Probabilistic information processing with recurrent neural hierarchies

Although no biological organism ever has all the information it needs at its disposal, we know that at least humans have the capacity to take optimal decisions even in the face of incomplete and noisy data. Here, "optimal" means that the probability of a correct decision is maximal when the problem is analyzed using probability theory. This capacity of humans is somewhat surprising because individual neurons, or populations of them, do not seem very good candidates for performing the special kind of mathematical operations required for optimal decision making in probability theory. Basic issues are: I propose a simple way out of this dilemma by proposing that: The beauty of this approach is that it does not rely on the particular properties of a certain model, just on generic mechanisms like lateral competition. It is very probable that both spiking and non-spiking models can be parametrized to achieve the same effect. Furthermore, the probability of the leading input interpretation under the internal model is expressed by latency, which can be easily transmitted and decoded by subsequent neural layers. In this way, deep hierarchies may be built which pass around latency information. This theoretical construct (see here) has been applied to simple object recognition tasks so far (click here). It has, in my view, the potential to scale up to real-world recognition tasks, both conceptually and computationally. Stay tuned!
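As a purely illustrative sketch of "probability expressed by latency": the code below computes a posterior with Bayes' rule, picks the winning interpretation (a stand-in for lateral competition), and encodes its probability as a response latency that a later stage can decode. The specific mapping latency = -TAU * ln(p), the constant `TAU` and all function names are assumptions of this example and are not taken from the work referenced above.

```python
import math

TAU = 10.0  # hypothetical time constant, in milliseconds

def posterior(priors, likelihoods):
    """Bayes' rule over a discrete set of input interpretations."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

def winner_with_latency(post, tau=TAU):
    """Winner-take-all: index of the most probable interpretation and a
    latency that shrinks as its probability approaches 1 (assumed code)."""
    k = max(range(len(post)), key=post.__getitem__)
    return k, -tau * math.log(post[k])

def decode_latency(latency, tau=TAU):
    """What a subsequent layer could do: recover the probability."""
    return math.exp(-latency / tau)

priors = [0.5, 0.3, 0.2]
clear = posterior(priors, [0.9, 0.05, 0.05])      # unambiguous input
ambiguous = posterior(priors, [0.4, 0.35, 0.25])  # noisy input

k_clear, t_clear = winner_with_latency(clear)
k_amb, t_amb = winner_with_latency(ambiguous)
print(k_clear, t_clear, decode_latency(t_clear))
print(k_amb, t_amb, decode_latency(t_amb))
```

Both inputs lead to the same decision, but the ambiguous one responds later, so the latency alone tells the next layer how much confidence to place in it.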
