My research interests and activities can roughly be grouped into these topics; for guidance, I am taking inspiration from research into human learning, especially developmental learning during infancy.

- Efficient incremental learning in a "big data" context
- High-dimensional data description by Deep Gaussian Mixture Models
- Multi-modal, weakly supervised learning
- Object detection in context
- Probabilistic information processing by recurrent neural hierarchies

The first activity aims at learning algorithms that are usable in a "big data" context, i.e., algorithms that are:

- efficient
- scalable, with a linear dependence on the number of training samples
- incremental
- generative
- efficient for problems of very high dimensionality (>1000)
- robust against irrelevant dimensions

A first prominent result of these activities is an SGD-based training algorithm for GMMs that works on large sets of natural images and is superior, both in performance and manageability, to sEM, a stochastic version of Expectation-Maximization (EM), the usual training algorithm for GMMs.
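As a rough illustration of the underlying idea (not the actual algorithm from the text), stochastic gradient ascent on the GMM log-likelihood can be sketched as follows; component variances and mixture weights are held fixed for brevity:

```python
import numpy as np

# Illustrative sketch only: SGD on the GMM log-likelihood, with
# unit-variance components and fixed uniform mixture weights.

rng = np.random.default_rng(0)
K, D = 3, 2                        # mixture components, data dimension
mu = rng.normal(size=(K, D))       # component means, trained by SGD

def log_joint(x):
    """log[pi_k N(x; mu_k, I)] for each k, up to an x-independent constant."""
    return -0.5 * np.sum((x - mu) ** 2, axis=1)

def avg_loglik(X):
    """Average log p(x) over a data set (up to the same constant)."""
    lj = np.array([log_joint(x) for x in X])
    m = lj.max(axis=1)
    return float(np.mean(m + np.log(np.exp(lj - m[:, None]).sum(axis=1) / K)))

def sgd_step(x, lr=0.05):
    """One stochastic gradient step on log p(x) with respect to the means."""
    global mu
    lj = log_joint(x)
    r = np.exp(lj - lj.max())
    r /= r.sum()                       # responsibilities p(k | x)
    mu += lr * r[:, None] * (x - mu)   # gradient of log p(x) in mu_k

# toy data: three well-separated clusters
X = np.concatenate([rng.normal(c, 0.3, size=(100, D))
                    for c in ((-3.0, 0.0), (3.0, 0.0), (0.0, 3.0))])
ll_before = avg_loglik(X)
for _ in range(20):                # 20 epochs of sample-by-sample updates
    rng.shuffle(X)
    for x in X:
        sgd_step(x)
ll_after = avg_loglik(X)           # log-likelihood should have improved
```

Each update touches a single sample, which is what yields the linear dependence on the number of training samples required above.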

What interests me currently is the question of how to learn situation-specific context models. As a very obvious example, consider the search for pedestrians in inner-city versus highway traffic: in the former case one might have to look preferentially at the sidewalk, while in the latter case one does not look for pedestrians at all, since they are rarely encountered on highways.
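The effect of such a context model can be illustrated with a small Bayes-rule sketch; the context-dependent priors below are invented for illustration only:

```python
# Hypothetical priors P(pedestrian | context); the numbers are invented.
prior = {"inner_city": 0.10, "highway": 0.001}

def posterior(likelihood_ratio, context):
    """P(pedestrian | evidence, context), given the detector's likelihood
    ratio p(evidence | pedestrian) / p(evidence | background)."""
    p = prior[context]
    odds = likelihood_ratio * p / (1.0 - p)   # posterior odds
    return odds / (1.0 + odds)

score = 20.0                             # identical detector evidence
p_city = posterior(score, "inner_city")  # ~0.69: worth reporting
p_highway = posterior(score, "highway")  # ~0.02: almost surely a false alarm
```

The same detector evidence thus leads to very different conclusions once the situation-specific context is taken into account.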

In object detection, training databases are usually created by human inspection of video images. A sufficient number of training examples is therefore hard to come by, as the inspection ("labelling") process is very time-consuming and therefore expensive. Often, semi-automatic approaches are used that employ tracking methods to reduce human effort but, especially for multi-class problems like pedestrian pose classification, the number of examples per class remains low. What we need are therefore learning methods that can cope with the absence of direct supervision in the form of crisp, symbolic labels ("pedestrian", "cat", "bike"). Instead, "weak supervision" signals need to be exploited.
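One possible form of weak supervision (my choice of illustration, not necessarily the signals meant here) is multiple-instance learning: a label exists only per *bag* of candidate windows ("somewhere in this clip there is a pedestrian"), never per individual window. A minimal sketch:

```python
import numpy as np

# Toy multiple-instance setting with 1-D instance features: pedestrian
# windows cluster around +2, background windows around -2. Only bag-level
# labels are available.

rng = np.random.default_rng(1)

def make_bag(positive):
    """Five instance features; positive bags hide exactly one pedestrian."""
    bag = rng.normal(-2.0, 0.5, size=(5, 1))          # background windows
    if positive:
        bag[rng.integers(5)] = rng.normal(2.0, 0.5)   # one pedestrian window
    return bag

bags = [make_bag(i % 2 == 0) for i in range(40)]
labels = [1.0 if i % 2 == 0 else 0.0 for i in range(40)]

# train a linear scorer on the best instance per bag (max pooling): the
# weak bag label supervises only the most pedestrian-like window
w, b = np.ones(1), 0.0
for _ in range(100):
    for bag, y in zip(bags, labels):
        s = bag @ w + b
        i = int(np.argmax(s))             # credit assignment via the max
        p = 1.0 / (1.0 + np.exp(-s[i]))   # sigmoid bag probability
        g = p - y                         # logistic-loss gradient
        w -= 0.1 * g * bag[i]
        b -= 0.1 * g

# bag-level training accuracy (score > 0 means "pedestrian present")
correct = [(np.max(bag @ w + b) > 0.0) == (y > 0.5)
           for bag, y in zip(bags, labels)]
accuracy = float(np.mean(correct))
```

Despite never seeing an instance-level label, the scorer learns to separate pedestrian-like windows from background.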

Concerning probabilistic information processing by recurrent neural hierarchies, several fundamental questions arise:

- how are probability densities represented? A variety of proposals exists (population coding, sampling), each of which has some support from biology. Furthermore, the basic issues of neural information representation are still unresolved: do neurons primarily encode information in their firing rate, or in the precise timing of correlated spike sequences? And what information is encoded: a probability, a log-probability, or something totally different?
- neurons cannot multiply but mostly perform weighted sums. However, probability theory often requires the multiplication of probability densities, which seems to be impossible with neurons.
- Strong lateral connections, which seem to be present in all known neocortical areas in humans, can be argued to prevent the exact representation of distributions.
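The multiplication problem has a well-known partial answer that connects to the log-probability question above: in the log domain, a product of densities becomes a plain sum, i.e. an operation of the weighted-sum type that neurons can perform. A small numerical sketch:

```python
import numpy as np

# Evaluated on a discrete grid, multiplying two Gaussian densities
# reduces to adding their log-densities.

x = np.linspace(-5.0, 5.0, 201)
dx = x[1] - x[0]

def log_gauss(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))

log_a = log_gauss(x, -1.0, 1.0)    # log p_a(x)
log_b = log_gauss(x, 2.0, 1.5)     # log p_b(x)

log_prod = log_a + log_b           # multiplication becomes addition
prod = np.exp(log_prod)
prod /= prod.sum() * dx            # renormalize on the grid

# the product of two Gaussians is again Gaussian, with a
# precision-weighted mean
mean_est = float(np.sum(x * prod) * dx)
mean_theory = (-1.0 / 1.0**2 + 2.0 / 1.5**2) / (1.0 / 1.0**2 + 1.0 / 1.5**2)
```

Whether the brain actually uses such a log-domain code is, of course, exactly the unresolved question.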

My current working hypotheses address these issues:

- neural ensembles do not represent densities but only the interpretation of the input that is most likely under an internal model (i.e., the sought-after density)
- the internal model is not expressed through neural activity but through lateral connections
- lateral connections do not disrupt but support the representation of probabilities, as they help select the input interpretation with the highest probability under the underlying density
- inputs that are probable under the internal model create activity faster than less probable ones, due to the lateral connections; this is consistent, since these connections actually encode the internal model
- sub-leading interpretations can be recovered in order of decreasing probability by applying feedback inhibition (under research)
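A toy simulation (my own illustration, not a biological model) of the claim that lateral competition selects the most probable interpretation:

```python
import numpy as np

# Simple rate dynamics: lateral interactions implement a winner-take-all
# competition, so the ensemble settles on the input interpretation that is
# most probable under the internal model encoded in the lateral connections.

def settle(log_prob, steps=50, excite=1.2, inhibit=0.5):
    """Rate dynamics with self-excitation and lateral (global) inhibition."""
    a = np.exp(log_prob - log_prob.max())    # initial feedforward drive
    for _ in range(steps):
        a = np.maximum(0.0, excite * a - inhibit * (a.sum() - a))
        a /= max(a.max(), 1e-12)             # keep activities bounded
    return a

log_prob = np.log(np.array([0.1, 0.6, 0.3]))  # posteriors of 3 interpretations
a = settle(log_prob)
winner = int(np.argmax(a))                    # only interpretation 1 survives
```

Suppressing the winner and letting the dynamics settle again would then recover the second-most-probable interpretation, in the spirit of the feedback-inhibition idea above.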