By Grace Stanley

Andrew Owens wants computers to learn like humans do — by watching, listening, and feeling the world around them.

As an associate professor of computer science at Cornell Tech and the Cornell Bowers College of Computing and Information Science, Owens is building systems that can perceive the world through sight, sound, and touch, without needing human-labeled training data. His research has led to surprising and playful creations, from AI that generates soundtracks for silent videos to models that create visual illusions, like images that flip between a penguin and a giraffe depending on your perspective.

But beneath the whimsy is a serious goal: to make AI systems more intuitive, more autonomous, and more aligned with how humans experience reality. Owens’ work has implications for everything from robotics to misinformation detection — including recent efforts to identify AI-generated images.

A Cornell alum with a bachelor’s degree in computer science, Owens returns to the university after earning a Ph.D. from the Massachusetts Institute of Technology, working as a postdoctoral scholar at the University of California, Berkeley, and serving as a faculty member at the University of Michigan. He’s a recipient of both a Sloan Research Fellowship and a National Science Foundation Faculty Early Career Development Award.

Owens is excited to be in New York City, where tech and AI communities offer ample opportunity for real-world impact. Read more about his work in the Q&A below.

What is your academic and research focus?

My research deals with creating multimodal systems that learn to see, hear, and touch without human teachers. The systems that I study mostly learn from co-occurring sensory signals — for example, the correlations between the visual and audio streams of a video. My group applies these techniques to a variety of problems, particularly in computer vision, but also in audio processing and robotics.
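One common way to learn from co-occurring audio and video without human labels — not necessarily Owens' exact method, just an illustrative sketch — is a contrastive objective: embeddings of a video clip and its own soundtrack are pulled together, while mismatched clip/soundtrack pairs are pushed apart. The minimal NumPy example below assumes the embeddings (here random stand-ins) have already been produced by some video and audio encoders.

```python
import numpy as np

def log_softmax(x, axis=-1):
    """Numerically stable log-softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def av_contrastive_loss(video_emb, audio_emb, temperature=0.1):
    """Symmetric contrastive (InfoNCE-style) loss for a batch of
    paired video/audio embeddings: row i of each matrix is assumed
    to come from the same clip."""
    # L2-normalize so dot products are cosine similarities.
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    logits = v @ a.T / temperature  # (N, N); diagonal = true pairs
    idx = np.arange(len(logits))
    # Cross-entropy in both directions: video->audio and audio->video.
    loss_v2a = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_a2v = -log_softmax(logits, axis=0)[idx, idx].mean()
    return (loss_v2a + loss_a2v) / 2

# Toy check: correctly paired embeddings score a lower loss than
# deliberately shuffled (mismatched) ones.
rng = np.random.default_rng(0)
video = rng.normal(size=(8, 16))
audio = video + 0.01 * rng.normal(size=(8, 16))  # near-duplicates of the video
matched = av_contrastive_loss(video, audio)
shuffled = av_contrastive_loss(video, audio[::-1])
```

Minimizing this loss is what lets a model discover structure (which objects make which sounds) purely from the correlation between the two streams, with no human annotation.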

What motivated you to come to Cornell Tech?

I was a computer science undergraduate at Cornell and had an amazing experience, so I’m honored to be back and working with the wonderful people here. Beyond that, I’m happy to be at a place where it’s possible to do ambitious research projects in an extremely collaborative and supportive environment, with a strong connection to the community.

What are you most looking forward to about working in New York City?

I’m excited to be a part of New York’s incredibly vibrant AI and tech communities. I think that these connections will make it much easier to direct our research efforts toward important real-world problems.

What past professional work are you most proud of and why?

For many years, I’ve been developing new methods for learning from paired images and sound. For example, as part of my Ph.D., I created a computer vision system that learned about the world by generating soundtracks from silent videos of a person striking objects with a drumstick. What excites me about this direction going forward is that it provides a way to potentially learn from the vast amounts of video that are available in the world.

More recently, I’ve had a lot of fun creating methods for generating visual illusions. I’ve really enjoyed seeing all of the unexpected solutions that our models come up with when we give them seemingly impossible constraints — say, an image that looks like a penguin from one perspective, but then transforms into a giraffe when flipped upside down.

What scientific questions are you looking to answer next?

One direction that I’m excited about is creating new multimodal learning methods that can support robotics and enable systems to continue learning about the world over time without human supervision. I’m also interested in reducing some of the negative impacts of AI systems. For example, one of my lab’s major recent research directions has been developing methods for detecting AI-generated images.

Which courses are you most looking forward to teaching?

Currently, I’m teaching computer vision. Beyond the applications themselves, which I think are among the most compelling in computer science, the course is a great way to learn about machine learning and signal processing in practice. In the future, I’m looking forward to developing new courses on multimodal perception and generative models for computer vision.

Grace Stanley is the staff writer-editor for Cornell Tech.