
Research interests and activities

My research interests and activities can roughly be grouped into the topics below. For guidance, I take inspiration from research into human learning, especially developmental learning during infancy.

System-level integration in cognitive systems


This is quite a new endeavour (see
the paper): linking different components of present-day "cognitive systems" the way our brain does! The main inspiration for this project came from studies on predictive visual attention that guides eye movements to where an object *will* be. For the investigated objects, squash balls, this is not a simple task and requires considerable model building, especially when dealing with rebounding balls. Keeping things simple, however, I decided to implement something much less challenging, but still complicated to do, which I term "dynamic attention priors": improving a visual detector with a prediction of where objects (here: pedestrians) will be in the next frame. This knowledge comes from a second module performing trajectory analysis of detected pedestrians ("tracking"). Detection scores are boosted at places where pedestrians will presumably appear, leading to a strong gain in the reliability of the detector, even where pedestrians are actually too small for the detection windows used. In particular, this technique works well where pedestrians appear in front of difficult backgrounds, or are partly occluded.
The more general issue here is information exchange in cognitive systems: it seems to me that brain-like performance comes not so much from an extraordinary single algorithm, but from many ordinary algorithms that work very closely together: tracking talking to object detection, ground-plane estimation talking to tracking, color analysis talking to object detection, inertial sensing talking to tracking, and so on. Enjoy the video!
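The score-boosting idea can be sketched in a few lines. This is a minimal, hypothetical illustration (not the actual HRI implementation): tracker-predicted positions are turned into a Gaussian spatial prior that is added, with a gain factor, to the detector's score map. The function name, the additive combination and all parameter values are assumptions for the sketch.

```python
import numpy as np

def boost_detections(det_scores, predicted_positions, sigma=8.0, gain=0.5):
    """Boost a detector's score map with a spatial prior built from
    tracker-predicted object positions (a 'dynamic attention prior')."""
    h, w = det_scores.shape
    ys, xs = np.mgrid[0:h, 0:w]
    prior = np.zeros_like(det_scores)
    for (py, px) in predicted_positions:
        # Gaussian bump centred on each predicted position
        prior += np.exp(-((ys - py) ** 2 + (xs - px) ** 2) / (2 * sigma ** 2))
    prior = np.clip(prior, 0.0, 1.0)
    # Additive boost: raw detector score plus gain-weighted prior
    return det_scores + gain * prior

# Two equally weak detections; the tracker predicts a pedestrian at (30, 40)
scores = np.zeros((60, 80))
scores[30, 40] = 0.4
scores[10, 10] = 0.4
boosted = boost_detections(scores, [(30, 40)])
```

After boosting, the detection at the predicted position clearly outranks the one elsewhere, which is the intended effect for weak responses on small or partly occluded pedestrians.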


Object detection in context

This line of research aims at learning high-level knowledge such as "pedestrians are usually found on sidewalks and not on roofs", and translating it into lower-level descriptions that can be used to guide local pattern-based detection methods. Not only can such approaches increase detection accuracy significantly, but the design time is also strongly reduced. I have already succeeded in showing this in the context of vehicle detection, see
the paper. I believe that this kind of "common sense model" (I term them "context models"), which humans have learned for more or less all types of objects in different situations, together with the ability to translate them into precise and efficient search strategies, is what makes human perception so powerful.
What currently interests me is the question of how to learn situation-specific context models. As a very obvious example, consider the search for pedestrians in inner-city versus highway traffic: in the former case one might look preferentially at the sidewalk, while in the latter case one does not look for pedestrians at all, since pedestrians are rarely encountered on highways.
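A context model of the "pedestrians are on sidewalks" kind can be illustrated very simply. The sketch below is a hypothetical toy version, not the method from the paper: it learns a smoothed histogram over the (normalized) vertical image position of annotated pedestrians, and uses it as a situation-specific prior that reweights local detector scores. All function names, the bin count and the smoothing constant are assumptions.

```python
import numpy as np

def learn_context_prior(annotated_rows, n_bins=10, smoothing=1.0):
    """Learn a 1-D context model: a smoothed histogram over the normalized
    vertical image position at which a class (e.g. pedestrians) was annotated."""
    counts, _ = np.histogram(annotated_rows, bins=n_bins, range=(0.0, 1.0))
    prior = counts + smoothing            # Laplace smoothing avoids zero bins
    return prior / prior.sum()

def apply_context(det_score, norm_row, prior):
    """Reweight a local detector score by the learned positional prior."""
    b = min(int(norm_row * len(prior)), len(prior) - 1)
    # Scale so that a flat (uninformative) prior leaves scores unchanged
    return det_score * prior[b] * len(prior)

# Inner-city situation: annotated pedestrians cluster in the sidewalk rows
inner_city_rows = [0.62, 0.65, 0.68, 0.63, 0.66, 0.64, 0.67, 0.61]
prior = learn_context_prior(inner_city_rows)

sidewalk = apply_context(0.5, 0.65, prior)   # candidate at sidewalk height
roof = apply_context(0.5, 0.05, prior)       # candidate near the roof line
```

The same mechanism supports situation specificity: a highway prior, learned from highway annotations, would suppress pedestrian candidates almost everywhere.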


Multi-class classification for human-machine interaction

This activity aims at recognizing hand poses observed by a low-cost time-of-flight sensor, with the aim of obtaining a robust solution. The interesting part here is the multi-class classification aspect, which is not really well understood theoretically. So we have proposed our own modest contribution that is very pragmatic in that it does not take sides in the fight over the correct classification or decomposition method (SVM, MLP, one-versus-one, one-versus-all, ...) but proposes a simple way of improving on top of virtually any of these architectures. The basic idea is to take the graded outputs of an initial multi-class system and train a second classifier on top of them, thus exploiting any residual correlations. Studies on a very large database of 3D hand poses have shown the efficiency and practical applicability of this approach (depicted below), see also
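The two-stage idea is architecture-agnostic and easy to sketch. Below is a minimal illustration with scikit-learn on synthetic data standing in for the hand-pose database; the choice of logistic regression for both stages is an assumption of the sketch, since the approach works on top of virtually any base architecture that produces graded per-class outputs.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic 4-class stand-in for the 3D hand-pose data
X, y = make_classification(n_samples=600, n_features=20, n_informative=10,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Stage 1: any multi-class architecture producing graded per-class outputs
stage1 = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
graded_tr = stage1.predict_proba(X_tr)
graded_te = stage1.predict_proba(X_te)

# Stage 2: a second classifier trained on the graded outputs,
# exploiting residual correlations between the per-class scores
stage2 = LogisticRegression(max_iter=1000).fit(graded_tr, y_tr)

acc1 = stage1.score(X_te, y_te)
acc2 = stage2.score(graded_te, y_te)
```

Whether stage 2 improves on stage 1 depends on how much correlated structure remains in the graded outputs; on confusable classes such as similar hand poses, that residue is typically substantial.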


Benchmark databases for vehicle detection

In the past few years at Honda Research Institute, I have spent considerable time on creating a benchmark database for vehicle detection that can be made publicly available for comparison purposes. In 2011, this activity resulted in the creation of the HRI RoadTraffic dataset, which provides approximately 7= minutes of high-resolution stereo RGB video, along with ego-motion information and, above all, object annotations, to all interested researchers. The dataset is divided into 5 video streams recorded while driving the same route 5 times under very different environmental and lighting conditions: overcast, rain, low sun, night, and snow. Annotated classes include generic obstacles, traffic signs, cars, trucks and pedestrians, although the number of the latter is not large.
This technical report contains a description of the dataset. Based on the HRI RoadTraffic dataset, we conducted a vehicle detection benchmark using an object detection system developed at Honda Research Institute Europe GmbH.
Due to company policy at Honda, the dataset is not directly downloadable: instructions on how to obtain it are given in the benchmark paper we wrote using the HRI RoadTraffic dataset. Alternatively, a mail to me (alexander dot gepperth at ensta dot fr) will achieve the same effect.


Multi-modal, weakly supervised learning

In object detection, training databases are usually created by human inspection of video images. A sufficient number of training examples is therefore hard to come by, as the inspection ("labelling") process is very time-consuming and therefore expensive. Often, semi-automatic approaches are used that employ tracking methods to reduce human effort, but, especially for multi-class problems like pedestrian pose classification, the number of examples per class is still low. What we need are therefore learning methods that can cope with the absence of direct supervision in the form of crisp, symbolic labels ("pedestrian", "cat", "bike"). Instead, "weak supervision" signals need to be discovered in the high-dimensional data provided by the processing system into which learning is usually embedded. In this sense, a system would no longer learn that a certain pattern class is a "car", but rather that it usually co-occurs with other events, such as a certain pattern in another sensor stream. An initial step in this direction has been taken in this
preliminary work, using two simulated sensor streams, between which the PROPRE learning algorithm detects correlated sub-spaces which are subsequently enhanced. A currently ongoing effort is to transfer this to real-world data from the KITTI vehicle detection benchmark, where the sensor streams are based on visual and LIDAR information.
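The notion of a "correlated sub-space" between two sensor streams can be illustrated with a much simpler mechanism than PROPRE itself. In the hypothetical sketch below, two simulated streams share a latent cause in a few of their dimensions; standardized cross-correlation then exposes which dimensions of each stream carry the shared (weakly supervising) signal. The threshold of 0.5 and the stream layout are assumptions of the toy example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
shared = rng.normal(size=(n, 2))     # latent cause observed by both sensors

# Stream A: dimensions 0-1 carry the shared signal, 2-4 are pure noise
stream_a = np.hstack([shared + 0.1 * rng.normal(size=(n, 2)),
                      rng.normal(size=(n, 3))])
# Stream B: dimensions 3-4 carry the shared signal, 0-2 are pure noise
stream_b = np.hstack([rng.normal(size=(n, 3)),
                      shared + 0.1 * rng.normal(size=(n, 2))])

# Cross-correlation matrix between the two standardized streams
a = (stream_a - stream_a.mean(0)) / stream_a.std(0)
b = (stream_b - stream_b.mean(0)) / stream_b.std(0)
cross = a.T @ b / n

# Dimensions whose absolute cross-stream correlation exceeds a threshold
# form the correlated sub-space -- the 'weak supervision' signal
corr_a = np.abs(cross).max(axis=1) > 0.5
corr_b = np.abs(cross).max(axis=0) > 0.5
```

For real visual and LIDAR streams the correlations are of course nonlinear and far noisier, which is where a learning algorithm like PROPRE comes in.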


Scalable incremental learning

This activity aims at learning algorithms that are usable in a "big data" context. The focus on generative algorithms is easily explained: in real applications it is imperative to classify outliers as such, which generative methods are capable of doing but discriminative ones are not. Furthermore, for the discovery of correlations between high-dimensional data flows, a generative algorithm is beneficial, as it represents the whole distribution and can therefore detect relations better than a discriminative one that just represents a hyperplane. The PROPRE algorithm, as proposed in
this paper, already fulfills a good deal of these requirements: it is generative, scalable, efficient, applicable to high-dimensional data and robust to irrelevant dimensions. By changing a small detail in the learning architecture (learning when classification is WRONG instead of learning when it is correct), incremental learning becomes possible. This has been described in a preliminary work and is an ongoing activity of high priority; see this recent publication. My interest in this topic led me to co-organize a special session on incremental learning.
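The outlier argument for generative models is worth making concrete. The minimal sketch below (a hypothetical stand-in, not PROPRE) fits one diagonal Gaussian per class; because it models p(x|class) rather than a decision boundary, a sample with low likelihood under every class model can be rejected as an outlier instead of being forced onto one side of a hyperplane. The rejection threshold is an assumption of the sketch.

```python
import numpy as np

class GaussianGenerativeClassifier:
    """Minimal generative classifier: one diagonal Gaussian per class.
    Modelling p(x|class) allows rejecting outliers that a purely
    discriminative hyperplane would silently misclassify."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.mu = np.array([X[y == c].mean(0) for c in self.classes_])
        self.var = np.array([X[y == c].var(0) + 1e-6 for c in self.classes_])
        return self

    def log_likelihood(self, x):
        # Per-class diagonal Gaussian log-likelihood of a single sample
        return -0.5 * (((x - self.mu) ** 2 / self.var)
                       + np.log(2 * np.pi * self.var)).sum(axis=1)

    def predict(self, x, reject_below=-50.0):
        ll = self.log_likelihood(x)
        if ll.max() < reject_below:   # low likelihood everywhere: outlier
            return None
        return self.classes_[ll.argmax()]

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
clf = GaussianGenerativeClassifier().fit(X, y)
```

A sample far from both class distributions, e.g. at (100, -100), is returned as `None` rather than being assigned to the nearer class.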


Probabilistic information processing with recurrent neural hierarchies

Although no biological organism ever has all the information it needs at its disposal, we know that at least humans have the capacity to take optimal decisions even in the face of incomplete and noisy data. Here, "optimal" means that the probability of a correct decision is maximal when analyzing the problem using probability theory. This capacity of humans is somewhat surprising because individual neurons, or populations of them, do not seem to be very good candidates for performing the special kind of mathematical operations required for optimal decision making in probability theory. I propose a simple way out of this dilemma. The beauty of this approach is that it does not rely on the particular properties of a certain model, just on generic mechanisms like lateral competition. It is very probable that both spiking and non-spiking models can be parametrized to achieve the same effect. Furthermore, the probability of the leading input interpretation under the internal model is expressed by latency, which can be easily transmitted and decoded by subsequent neural layers. In this way, deep hierarchies may be built which pass around latency information. This theoretical construct (see
here) has been applied to simple object recognition tasks so far (click here). It has, in my view, the potential to scale up to real-world recognition tasks, both conceptually and computationally. Stay tuned!
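The latency code can be sketched with one common, hypothetical choice of encoding (not necessarily the one in the paper): a logarithmic map from probability to spike time, so that a more probable interpretation fires earlier and a downstream layer can invert the map exactly. The time constant and latency cap are assumptions of the sketch.

```python
import numpy as np

def probability_to_latency(p, tau=10.0, t_max=100.0):
    """Encode the probability of the leading interpretation as spike latency:
    high probability -> early spike, low probability -> late spike."""
    p = np.clip(p, 1e-6, 1.0)
    return np.minimum(-tau * np.log(p), t_max)

def latency_to_probability(t, tau=10.0):
    """A downstream layer decodes the received latency back into a
    probability, allowing deep hierarchies to pass the code along."""
    return np.exp(-t / tau)

p = 0.8
t = probability_to_latency(p)       # earlier spike than for, say, p = 0.1
p_decoded = latency_to_probability(t)
```

Because encoding and decoding are exact inverses below the latency cap, each layer of a hierarchy can recover, transform and re-emit the probability of its leading interpretation purely through spike timing.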
