Powerful new technologies emerge when human experts and Artificial Intelligence (AI) collaborate. Cornell Tech Associate Dean Serge Belongie is a pioneer in this approach to innovation. Belongie helped establish the tech campus in 2013 and has been a faculty member ever since. In April, he was appointed the inaugural Andrew H. and Ann R. Tisch Professor in the Department of Computer Science.

Belongie’s research in computer vision, machine learning, and augmented reality is motivated by his desire to build technology with a human dimension that serves people in beneficial ways. “That was part of the vision when the campus was created. I’ve clicked with it and I feel very at home,” he said. 

When computer vision meets bird watching

Belongie completed his Ph.D. in Electrical Engineering & Computer Sciences at UC Berkeley in 2000. He spent twelve years on the faculty at UC San Diego before embarking on a collaboration with the Cornell Lab of Ornithology. The partnership grew out of Belongie’s interest in fine-grained visual categorization, a technique that combines computing and expert human knowledge to identify objects within subcategories, such as bird species, rather than at generic levels.

“To go down that path, you need to have powerful AI and really smart human experts and so a bunch of different people pointed me toward birding as the place to find that,” said Belongie.

The project, called Merlin, was a collaboration between the Cornell Lab of Ornithology, UC San Diego, Northeastern University, Caltech, and UC Berkeley. The partners had already produced an app that classified birds using a system of field guide questions, such as breast color and bill shape. Belongie’s students embedded with the team to develop a classification method based on uploaded images. “We borrowed their infrastructure of asking those questions but we provided the image and the computer vision system tried to help make it faster,” he said.

Two years into Belongie’s work on Merlin, deep learning exploded onto the scene and revolutionized the identification process. “It turned out you didn’t need to label anything other than the whole image,” he said. “We just threw it all away. No more questions, no more field guide stuff. Just put in the image, pop out the answer.”

This allowed Belongie and his team to shift their focus from recognition algorithms to methods for capturing and sharing visual expertise. The outcome was Visipedia.

Visipedia uses machine learning to harness crowdsourced expertise for the classification of visual data, such as images of flora and fauna. Communities of experts, from keen hobbyists to academics, gather and annotate data while machine learning trains and evaluates the system.

Visipedia’s approach is to go beyond basics and focus on more complex levels of categorization that are of interest to specialist communities. When expert crowds are challenged, they become mobilized to contribute. “They don’t want to waste their time on tons of American robins. Their attitude might be more like, ‘Save it. My time is valuable. When you have a really tricky case, bring it to my attention,’” said Belongie.

The system has been widely deployed in Merlin’s Bird Photo ID app and in iNaturalist, which allow users to identify thousands of species of birds, insects, and other animals using smartphones.

Future vision and entrepreneurship 

Google supported Visipedia for six years, and Belongie remains a member of the Visiting Faculty Researcher program at Google Research. Belongie and the team behind Visipedia are now looking at how they can support collaborations between expert communities and big tech companies, allowing the approach to be scaled up to a level that would not be possible within an academic lab alone.

On campus, Belongie is encouraging his students to look to Augmented Reality and Virtual Reality (AR/VR). For years, computer vision has been based on data sets that are captured with cell phones or pulled off the internet, but that is about to change, he said. “That’s how things have been done for a long time but when you’re talking about AR/VR, it’s always-on wearable cameras constantly moving around the world.”

To support future research into AR/VR, computer vision, and human-computer interaction, Belongie has been involved in the creation of a new cross-campus initiative called the Mixed Reality Collaboratory.

Cornell Tech is the ideal environment for this interdisciplinary approach to human-focused technology and entrepreneurship, he said. “At any given time, I’m advising lots of little start-ups that pop out of Cornell Tech, and the Google interaction has been a fantastic way to amplify what Visipedia was doing at a small scale.”