Research interests and activities
My research interests and activities can roughly be grouped into the topics below;
for guidance, I am taking inspiration from research into human learning, especially developmental learning during infancy.
Scalable incremental learning
This activity aims at learning algorithms that are usable in a "big data" context:
- scalable, with a linear dependency on the number of training samples
- efficient for problems of very high dimensionality (>1000)
- robust against irrelevant dimensions
The focus on generative algorithms is easily explained: in real applications it is imperative to classify outliers as such, which generative methods are capable of doing but discriminative ones are not.
Furthermore, for discovering correlations between high-dimensional data flows, a generative algorithm is beneficial, as it represents the whole distribution and therefore
can detect relations better than a discriminative one that represents just a hyperplane.
The PROPRE algorithm, as proposed in this paper, already fulfills a good deal of these requirements:
it is generative, scalable, efficient, applicable to high-dimensional data and robust to irrelevant dimensions.
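The outlier argument can be illustrated with a minimal generative classifier: because it models class densities, it can reject inputs that are improbable under every class, which a pure decision boundary cannot. The sketch below is my own toy example with illustrative data and an illustrative rejection threshold, not the algorithm from the paper; it fits one Gaussian per class:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two toy classes in 2-D (illustrative data, not from the paper).
class_a = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
class_b = rng.normal(loc=6.0, scale=1.0, size=(500, 2))

def fit_gaussian(x):
    """Estimate mean and covariance: a one-Gaussian generative class model."""
    return x.mean(axis=0), np.cov(x.T)

def log_density(x, mean, cov):
    """Gaussian log-density; generative models assign one to every input."""
    d = x - mean
    return -0.5 * (d @ np.linalg.inv(cov) @ d
                   + np.log(np.linalg.det(cov))
                   + len(x) * np.log(2 * np.pi))

params = [fit_gaussian(class_a), fit_gaussian(class_b)]

def classify(x, reject_logp=-10.0):
    scores = [log_density(x, m, c) for m, c in params]
    if max(scores) < reject_logp:
        return "outlier"   # a generative model can say "none of the above"
    return ["a", "b"][int(np.argmax(scores))]

print(classify(np.array([0.1, -0.2])))   # near class a -> "a"
print(classify(np.array([50.0, 50.0])))  # far from both classes -> "outlier"
```

A discriminative classifier trained on the same data would confidently assign (50, 50) to one of the two classes, since every point lies on some side of its hyperplane.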
By changing a small detail in the learning architecture (learning when classification is WRONG instead of learning when it is correct),
incremental learning becomes possible. This has been described in a preliminary work
and is an ongoing activity of high priority; see this recent publication.
My interest in this topic led me to co-organize a special session on incremental learning.
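The error-driven modification can be illustrated with a toy nearest-prototype classifier (a sketch of the general principle only, not the PROPRE architecture): the prototype of the true class is adapted only when the current prediction is wrong, so regions that are already handled correctly remain untouched when new data arrive.

```python
import numpy as np

rng = np.random.default_rng(1)

# One prototype per class (illustrative toy setup).
prototypes = {0: np.array([0.0, 0.0]), 1: np.array([5.0, 5.0])}

def predict(x):
    """Nearest-prototype classification."""
    return min(prototypes, key=lambda k: np.linalg.norm(x - prototypes[k]))

def learn(x, label, lr=0.2):
    """Adapt the true class's prototype ONLY when the prediction is wrong;
    correctly classified inputs leave the model untouched."""
    wrong = predict(x) != label
    if wrong:
        prototypes[label] += lr * (x - prototypes[label])
    return wrong

# New class-0 data appears near (4, 4): at first it is misclassified, so
# adaptation kicks in; errors become rare once the prototype has moved.
errors = [learn(rng.normal((4.0, 4.0), 0.3), label=0) for _ in range(200)]
print(sum(errors[:20]), sum(errors[-20:]))
```

Because updates stop as soon as classification is correct, already-mastered regions of the input space are not overwritten, which is the property that makes the scheme attractive for incremental learning.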
Deep Convolutional Gaussian Mixture Models
This endeavor is related to my research on incremental learning, which must, to my mind, necessarily contain an element of replay.
However, the standard way of achieving replay through GANs has its problems, which stem from the fact that GANs do not have an associated loss function,
so there is no way to know whether a GAN is currently undergoing mode collapse, a frequent problem.
Replacing GANs by GMMs as generators would be an ideal solution; however, the quality of samples from vanilla GMMs is strongly inferior to that of GANs.
Therefore, I am looking for ways to create stacked convolutional variants of GMMs that can leverage the inherent compositionality of natural images
through a hierarchical structure with local receptive fields, analogous to CNNs. Extensions of GMMs that allow better sampling behavior, like Mixtures of Factor Analyzers (MFA),
are also under investigation.
A first prominent result of these activities is an SGD-based training algorithm for GMMs that works for large sets of natural images and is superior, both in performance and
manageability, to sEM, a stochastic version of EM, the usual training algorithm for GMMs.
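To make the idea concrete, here is a minimal sketch of SGD-based GMM training on toy 1-D data. This is not the published algorithm: weights and variances are kept fixed, and only the means are adapted, using the standard responsibility-weighted gradient of the log-likelihood.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy 1-D data from two well-separated modes (stand-in for image patches).
data = np.concatenate([rng.normal(-3.0, 0.5, 1000), rng.normal(3.0, 0.5, 1000)])

# Two-component GMM; only the means are trained by SGD in this sketch.
# Gradient of the log-likelihood w.r.t. mu_k:  r_k * (x - mu_k) / sigma^2,
# where r_k is the responsibility of component k for sample x.
mu = np.array([-0.5, 0.5])   # deliberately poor initialization
sigma2, lr = 0.5 ** 2, 0.05

def responsibilities(x):
    """Posterior component probabilities (equal weights, equal variances)."""
    logp = -0.5 * (x - mu) ** 2 / sigma2
    p = np.exp(logp - logp.max())
    return p / p.sum()

for x in rng.permutation(data):
    r = responsibilities(x)
    mu += lr * r * (x - mu) / sigma2   # one stochastic gradient ascent step

print(np.sort(mu))   # means should approach the two data modes near -3 and 3
```

Unlike sEM, every step here is an ordinary SGD update, so standard tooling (learning-rate schedules, minibatching, GPU frameworks) applies directly; this is one reason the SGD route is attractive.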
Object detection in context
This line of research aims at learning
high-level knowledge such as "pedestrians are usually found on sidewalks and not on roofs", and translating
it into lower-level descriptions that can be used to guide local pattern-based detection methods. Not only can such approaches
increase detection accuracy significantly, but design time is also strongly reduced. I have already succeeded in showing this in the context
of vehicle detection; see the paper.
I believe that such "common sense models" (I term them "context models"), which humans have learned for more or less all types of objects in different situations, together with the ability
to translate them into precise and efficient search strategies, are what makes human perception so powerful.
What currently interests me is the question of how to learn situation-specific context models. As a very obvious example, consider the search for pedestrians in inner-city and highway traffic: in the former case one might have to look preferentially at the sidewalk, while
in the latter case one does not look for pedestrians at all, since they are rarely encountered on highways.
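The translation of high-level knowledge into a search strategy can be sketched as a spatial prior that re-weights local detector scores. This is a toy illustration with invented numbers, not the method from the paper:

```python
import numpy as np

# A 4x4 image grid. The context model encodes "pedestrians appear on the
# sidewalk, not on roofs" as a spatial prior (all values illustrative).
prior = np.array([
    [0.01, 0.01, 0.01, 0.01],   # roof region
    [0.10, 0.10, 0.10, 0.10],   # building facades
    [0.80, 0.80, 0.80, 0.80],   # sidewalk region
    [0.30, 0.30, 0.30, 0.30],   # road
])

# Local pattern-based detector scores: a strong false alarm on the roof
# and a slightly weaker response at the true pedestrian location.
likelihood = np.zeros((4, 4))
likelihood[0, 2] = 0.9   # roof false alarm
likelihood[2, 1] = 0.7   # true pedestrian on the sidewalk

# Context-guided detection: combine prior and detector evidence per cell.
posterior = prior * likelihood
best = tuple(int(i) for i in np.unravel_index(posterior.argmax(),
                                              posterior.shape))
print(best)   # the sidewalk detection wins despite its lower raw score
```

The same prior can also serve as a search strategy: cells with negligible prior need not be scanned by the (expensive) local detector at all, which is where the design-time and run-time savings come from.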
Multi-modal, weakly supervised learning
In object detection, training databases are usually created by human inspection of video images. Therefore, a sufficient number of training examples is hard to come by, as the inspection ("labelling") process is very time-consuming and therefore expensive.
Often, semi-automatic approaches are used that employ tracking methods to reduce human effort but, especially for multi-class problems like pedestrian pose classification, the number of examples for each class is still low.
What we therefore need are learning methods that can cope with the absence of direct supervision in the form of crisp, symbolic labels ("pedestrian", "cat", "bike"). Instead, "weak supervision" signals need to be discovered
in the mass of high-dimensional data provided by the processing system into which learning is usually embedded. In this sense, a system would no longer learn that a certain pattern class is a "car", but rather that it usually co-occurs with other events, such as
a certain pattern in another sensor stream. An initial step in this direction has been taken in this preliminary work,
using two simulated sensor streams, between which
the PROPRE learning algorithm detects correlated sub-spaces, which are subsequently enhanced.
A currently ongoing effort is to transfer this approach to real-world data from the KITTI vehicle detection benchmark, where the sensor streams are based
on visual and LIDAR information.
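The principle of discovering weak supervision across streams can be sketched with plain cross-correlation between two simulated sensor streams (this stands in for the PROPRE mechanism, which works differently): dimensions driven by a shared latent event show up as strongly correlated pairs.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000

# A shared latent event (e.g. "object present") drives one dimension in
# each of two simulated sensor streams; all other dimensions are noise.
event = rng.standard_normal(n)
stream_a = rng.standard_normal((n, 5))
stream_b = rng.standard_normal((n, 5))
stream_a[:, 2] += 2.0 * event   # correlated sub-space in stream A
stream_b[:, 4] += 2.0 * event   # ...and its counterpart in stream B

# Cross-correlation matrix between all dimension pairs of the two streams.
a = (stream_a - stream_a.mean(0)) / stream_a.std(0)
b = (stream_b - stream_b.mean(0)) / stream_b.std(0)
cross = a.T @ b / n

# The strongest entry reveals which sub-spaces carry the shared signal,
# without any symbolic label ever being provided.
i, j = np.unravel_index(np.abs(cross).argmax(), cross.shape)
print(int(i), int(j))
```

No human ever labels the event here; the supervision signal is recovered purely from the statistical dependence between the two streams, which is the essence of the multi-modal, weakly supervised setting.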
Probabilistic information processing with recurrent neural hierarchies
Although no biological organism ever has all the information it needs at its disposal, we know that at least humans have the capacity to take optimal decisions even in the face of incomplete and noisy data. Here, "optimal" means that the probability of a correct decision is maximal when the problem is analyzed using probability theory. This capacity of humans is somewhat surprising because individual neurons, or populations of them, do not seem to be very good candidates for performing the special kind of mathematical operations required for optimal decision making in probability theory. Basic issues are:
- How should probability densities be represented? There is a variety of proposals around (population coding, sampling), each of which has some support from biology. Furthermore, the basic issues of neural information representation are still unresolved: do neurons primarily encode information in their firing rate, or in the precise timing of correlated spike sequences? And what information is encoded: a probability, a log-probability, or something totally different?
- Neurons cannot multiply but mostly perform weighted sums. However, probability theory often requires the multiplication of probability densities, which seems to be impossible with neurons.
- Strong lateral connections, which seem to be present in all known neocortical areas in humans, can be seen to prevent the exact representation of distributions.
I propose a simple way out of this dilemma:
- Neural ensembles do not represent densities but just the interpretation of the input that is most likely under an internal model (i.e., the sought-after density).
- The internal model is not expressed through neural activity but through lateral connections.
- Lateral connections do not disrupt but support the representation of probabilities, as they help select the input interpretation with the highest probability under the underlying density.
- Inputs that are probable under the internal model create activity faster, due to the lateral connections, than inputs that are less probable. This is coherent, as the lateral connections actually encode the internal model.
- Sub-leading interpretations can be recovered in a descendingly ordered temporal sequence by applying feedback inhibition (under research).
The beauty of this approach is that it does not rely on the particular properties of a certain model, just on generic mechanisms like lateral competition. It is very probable that both spiking and non-spiking models can be parametrized to achieve the same effect. Furthermore, the probability of the leading input interpretation under the internal model is expressed by latency, which can be easily transmitted and decoded by subsequent neural layers. In this way, deep hierarchies may be built which pass around latency information. This theoretical construct (see here)
has so far been applied to simple object recognition tasks (click here).
It has, in my view, the potential to scale up to real-world recognition tasks, both conceptually and computationally. Stay tuned!
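The latency mechanism can be sketched with a toy integrator network in which the internal model lives only in Hopfield-style lateral weights (my own minimal illustration, not the published model): an input that matches a stored pattern receives lateral support and crosses threshold sooner than an equally strong input that does not.

```python
import numpy as np

# The internal model is stored in lateral weights: two "familiar" binary
# patterns, i.e. the input interpretations the network has learned.
patterns = np.array([[1, 1, 0, 0], [0, 0, 1, 1]], dtype=float)
lateral = patterns.T @ patterns / 2.0   # Hebbian-style lateral weights
np.fill_diagonal(lateral, 0.0)

def time_to_threshold(stimulus, threshold=3.0, max_steps=100):
    """Integrate feedforward drive plus lateral support; return the step at
    which any unit first crosses threshold (fewer steps = shorter latency)."""
    act = np.zeros(4)
    for step in range(1, max_steps + 1):
        act += 0.1 * (stimulus + lateral @ act)
        if act.max() >= threshold:
            return step
    return max_steps

familiar = np.array([1.0, 1.0, 0.0, 0.0])     # matches a stored pattern
unfamiliar = np.array([1.0, 0.0, 1.0, 0.0])   # equally strong, improbable

print(time_to_threshold(familiar) < time_to_threshold(unfamiliar))  # True
```

Both stimuli deliver the same total feedforward drive; only the lateral support, which encodes the internal model, makes the probable interpretation reach threshold earlier. The resulting latency is exactly the quantity that downstream layers could transmit and decode.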