Research – Paper 359

Improving Visual Relationship Detection using Semantic Modeling of Scene Descriptions

Stephan Baier, Volker Tresp and Yunpu Ma


October 24, 2017, 10:50
Lehár 1-3
Download paper (preprint)

Abstract

Structured scene descriptions of images are useful for the automatic processing and querying of large image databases. We show how the combination of a statistical semantic model and a visual model can improve on the task of mapping images to their associated scene description. In this paper we consider scene descriptions that are represented as a set of triples (subject, predicate, object), where each triple consists of a pair of visual objects, which appear in the image, and the relationship between them (e.g., man-riding-elephant, man-wearing-hat). We combine a standard visual model for object detection, based on convolutional neural networks, with a latent variable model for link prediction. We apply multiple state-of-the-art link prediction methods and compare their capability for visual relationship detection. One of the main advantages of link prediction methods is that they can also generalize to triples that have never been observed in the training data. Our experimental results on the recently published Stanford Visual Relationship dataset, a challenging real-world dataset, show that the integration of a statistical semantic model using link prediction methods can significantly improve visual relationship detection. Our combined approach achieves superior performance compared to the state-of-the-art method from the Stanford computer vision group.
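To make the idea of combining the two models concrete, here is a minimal sketch in Python. It assumes DistMult-style factorized embeddings as the link prediction component and a simple product rule for fusing the semantic predicate distribution with CNN detection probabilities; the embedding sizes, class indices, and the fusion rule are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch (not the paper's exact formulation): scoring a candidate
# triple (subject, predicate, object) by combining CNN detection
# probabilities with a DistMult-style link prediction score.
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 100 object classes, 70 predicates, rank-10 embeddings.
num_entities, num_predicates, rank = 100, 70, 10

# Latent embeddings for entity classes and predicates; in practice these
# would be learned from the triples in the training scene descriptions.
E = rng.normal(size=(num_entities, rank))
P = rng.normal(size=(num_predicates, rank))

def distmult_score(s, p, o):
    """DistMult triple score: sum_k E[s,k] * P[p,k] * E[o,k]."""
    return float(np.sum(E[s] * P[p] * E[o]))

def semantic_prob(s, o):
    """Softmax over all predicates for a fixed subject/object class pair."""
    scores = np.array([distmult_score(s, p, o) for p in range(num_predicates)])
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

def combined_score(p_subj_visual, p_obj_visual, p_pred_visual, s, p, o):
    """Fuse visual probabilities with the semantic predicate distribution.
    A simple product is used here; the actual fusion rule is a modeling choice."""
    return p_subj_visual * p_obj_visual * p_pred_visual * semantic_prob(s, o)[p]

# Example: hypothetical indices for (man, riding, elephant) with CNN confidences.
man, riding, elephant = 3, 12, 41
print(combined_score(0.9, 0.8, 0.4, man, riding, elephant))
```

Because the semantic component scores triples through factorized embeddings rather than a lookup table, it assigns meaningful scores even to subject-predicate-object combinations that never occur in the training data, which is the generalization property the abstract highlights.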

Comments

Svitlana (Guest):

#dataset Stanford Visual Relationship Detection dataset (Lu et al., 2016)