Jia-Bin Huang, assistant professor in the Bradley Department of Electrical and Computer Engineering and a faculty member at the Discovery Analytics Center, has received a Google Faculty Research Award to support his work in detecting human-object interaction in images and videos.
According to Huang, significant progress has been made in classifying, detecting, and segmenting objects, but representing images and videos as a collection of isolated object instances fails to capture the information essential for understanding activity.
"By improving the model and scaling up the training, we aim to move a step further toward building socially intelligent machines," Huang said.
Given an image or a video, the goal is to localize person and object instances and to recognize the interaction, if any, between each person-object pair.
The result is a structured representation: a visually grounded graph over the humans and the object instances they interact with.
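For example, consider an image of two men next to each other on the sidelines of a tennis court, one standing and holding an umbrella, the other sitting on a chair, holding a tennis racquet, and looking at a bag on the ground beside him. The graph for this scene would link the standing man to the umbrella (holding) and the seated man to the chair (sitting on), the racquet (holding), and the bag (looking at).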
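To make the idea of a visually grounded graph concrete, the tennis-court scene can be written down as a small data structure. The sketch below is only an illustration, not Huang's model or its output format: the class names (`Instance`, `InteractionGraph`), the verb labels, and the bounding-box coordinates are all hypothetical placeholders chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Instance:
    """A localized person or object; box is (x1, y1, x2, y2) in pixels."""
    name: str
    category: str
    box: tuple


@dataclass
class InteractionGraph:
    """Edges are (person, verb, object) triplets over detected instances."""
    edges: list = field(default_factory=list)

    def add(self, person, verb, obj):
        # Every interaction starts from a person instance.
        if person.category != "person":
            raise ValueError("the subject of an interaction must be a person")
        self.edges.append((person, verb, obj))

    def interactions_of(self, person):
        # Return (verb, object name) pairs for one person, in insertion order.
        return [(verb, obj.name) for p, verb, obj in self.edges if p == person]


# Hypothetical detections; the box coordinates are made-up placeholders.
man_standing = Instance("man_standing", "person", (40, 60, 160, 400))
man_sitting = Instance("man_sitting", "person", (220, 150, 360, 400))
umbrella = Instance("umbrella", "umbrella", (20, 10, 200, 120))
chair = Instance("chair", "chair", (210, 260, 370, 410))
racquet = Instance("tennis_racquet", "tennis_racquet", (300, 200, 380, 280))
bag = Instance("bag", "bag", (390, 340, 470, 410))

graph = InteractionGraph()
graph.add(man_standing, "hold", umbrella)
graph.add(man_sitting, "sit_on", chair)
graph.add(man_sitting, "hold", racquet)
graph.add(man_sitting, "look_at", bag)

for person in (man_standing, man_sitting):
    print(person.name, graph.interactions_of(person))
```

Each edge ties a verb to two localized instances, which is the information lost when the same scene is described only as a list of isolated objects.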