We present Dense Object Nets, which build on recent developments in self-supervised dense descriptor learning, as a consistent object representation for visual understanding and manipulation.
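To give a rough sense of what a dense descriptor representation enables (a minimal, hypothetical sketch, not the paper's implementation): assume a trained network maps every pixel of an image to a D-dimensional descriptor; corresponding points on an object can then be located across views by nearest-neighbor search in descriptor space. The function and variable names below are illustrative.

```python
import numpy as np

def match_pixel(descriptors_a, descriptors_b, pixel_a):
    """Find the pixel in image B whose descriptor is closest (L2) to the
    descriptor of `pixel_a` in image A."""
    # descriptors_a, descriptors_b: (H, W, D) per-pixel descriptor maps
    u, v = pixel_a
    query = descriptors_a[v, u]                 # (D,) descriptor at the query pixel
    diffs = descriptors_b - query               # broadcast to (H, W, D)
    dists = np.linalg.norm(diffs, axis=-1)      # (H, W) distance per pixel
    row, col = np.unravel_index(np.argmin(dists), dists.shape)
    return (col, row)                           # return as (u, v)

# Stand-in descriptor maps (random, in place of real network output)
rng = np.random.default_rng(0)
desc_a = rng.normal(size=(64, 64, 4))
desc_b = rng.normal(size=(64, 64, 4))
print(match_pixel(desc_a, desc_b, pixel_a=(10, 20)))
```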
Investigating the inside of the human body often requires cutting a patient open or having them swallow long tubes with built-in cameras. But what if physicians could get a clear view in a way that is less expensive, less invasive, and less time-consuming?
Amateur and professional musicians alike may spend hours poring over YouTube clips to figure out exactly how to play certain parts of their favorite songs. But what if there were a way to play a video and isolate only the instrument you wanted to hear?
Getting robots to do things isn’t easy: usually, scientists must either explicitly program them or train them to understand how humans communicate through language.
But what if we could control robots more intuitively, using just hand gestures and brainwaves?