Designing a ‘neural puppeteer’ to recognize skeletal nodes
Imagine for a moment that we are on a safari watching a giraffe graze. After looking away for a second, we see the animal lower its head and sit down. But, we wonder, what happened in the meantime? Computer scientists from the University of Konstanz’s Centre for the Advanced Study of Collective Behaviour have found a way to encode an animal’s pose and appearance in order to show the intermediate motions that are statistically likely to have taken place.
One key problem in computer vision is that images are incredibly complex. A giraffe can take on an extremely wide range of poses. On a safari, it is usually no problem to miss part of a motion sequence, but for the study of collective behavior, this information can be critical. This is where the computer scientists' new model, the "neural puppeteer", comes in.
Predicting silhouettes based on 3D points
“One idea in computer vision is to describe the very complex space of images by encoding only as few parameters as possible,” explains Bastian Goldlücke, professor of computer vision at the University of Konstanz. One representation frequently used until now is the skeleton.
In a new paper published in the Proceedings of the 16th Asian Conference on Computer Vision, Bastian Goldlücke and doctoral researchers Urs Waldmann and Simon Giebenhain present a neural network model that makes it possible to represent motion sequences and render the full appearance of animals from any viewpoint based on just a few key points. The 3D representation is more malleable and precise than existing skeleton models.
“The idea was to be able to predict 3D key points and also to be able to track them independently of texture,” says doctoral researcher Urs Waldmann. “This is why we built an AI system that predicts silhouette images from any camera perspective based on 3D key points.”
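In the paper, the silhouette decoder is a learned neural network. Purely as a toy illustration of the underlying idea described above (predicting a silhouette from a set of 3D key points and a chosen camera perspective), here is a minimal sketch. It is not the authors' architecture: the pinhole projection, the Gaussian-free "union of disks" rendering, and all function names and parameters are illustrative assumptions.

```python
import numpy as np

def render_silhouette(keypoints_3d, focal=1.0, res=64, radius=0.15):
    """Toy stand-in for a keypoint-conditioned silhouette decoder.

    keypoints_3d : (K, 3) array of 3D key points, all with z > 0,
                   expressed in the camera's coordinate frame.
    Returns a (res, res) binary silhouette image in normalized
    image coordinates spanning [-1, 1] on both axes.
    """
    # Pinhole projection: camera at the origin looking along +z.
    kp2d = focal * keypoints_3d[:, :2] / keypoints_3d[:, 2:3]  # (K, 2)

    # Normalized pixel grid.
    xs, ys = np.meshgrid(np.linspace(-1, 1, res), np.linspace(-1, 1, res))
    grid = np.stack([xs, ys], axis=-1)                          # (res, res, 2)

    # Distance from every pixel to every projected key point.
    dists = np.linalg.norm(grid[:, :, None, :] - kp2d[None, None], axis=-1)

    # A pixel is inside the silhouette if it is near any key point
    # (a crude placeholder for the learned shape decoder).
    return (dists.min(axis=-1) < radius).astype(float)

# Three hypothetical key points, viewed from two camera perspectives:
kps = np.array([[0.0, 0.0, 2.0], [0.2, 0.1, 2.0], [-0.2, -0.1, 2.0]])
front_view = render_silhouette(kps)
shifted_view = render_silhouette(kps + np.array([0.5, 0.0, 0.0]))
```

Changing the camera perspective (here simulated by transforming the key points into a different camera frame) produces a different silhouette from the same 3D key points, which is the property the quoted idea relies on.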