
Is sound fundamentally linked to meaning? When you hear “Kiki,” what kind of shape comes to mind – something hard-edged or spiky? What about “Bouba” – something more bulbous and round? This phenomenon, called “sound symbolism,” reflects a deep feature of human language: we process the sounds of words as well as their definitions. It was made famous by what is known as the Kiki-Bouba study.

Now Hadar Averbuch-Elor, a new assistant professor of computer science at Cornell Tech, is taking things a step further. Part of her research explores whether sound symbolism also manifests in artificial intelligence and machine learning. AI, after all, learns from humans.

When an iteration of the study was conducted in 2001 – long before ChatGPT – researchers found that more than 95% of people associated a rounded shape with “Bouba” and an angular shape with “Kiki.” Twenty years later, Averbuch-Elor and her Ph.D. student Morris Alper asked the same of machines, setting artificial intelligence models the task of drawing a “Kiki-shaped object” or a “Bouba-shaped object.” The models produced images that matched the human responses.
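The article does not name the specific models the researchers used, but the experiment is easy to picture. As a minimal sketch, assuming a publicly available text-to-image model such as Stable Diffusion, the probe might look like this:

```python
# Illustrative only: prompt an off-the-shelf text-to-image model with the
# pseudowords and inspect the shapes it draws. The model choice and prompt
# wording here are assumptions, not the researchers' actual setup.
# Requires: pip install torch diffusers
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

for pseudoword in ["Kiki", "Bouba"]:
    # e.g., "a 3D rendering of a Kiki-shaped object"
    image = pipe(f"a 3D rendering of a {pseudoword}-shaped object").images[0]
    image.save(f"{pseudoword.lower()}_object.png")
```

If the effect holds, the “Kiki” images should skew angular and the “Bouba” images rounded, mirroring the human responses.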

“Machine learning raises many questions, including what these models learn from training on large amounts of data, such as images and associated text captions. It’s clear that these vision and language models have learned the Kiki-Bouba effect as part of their learning processes,” said Averbuch-Elor, who holds her Cornell Tech appointment through the Cornell Ann S. Bowers College of Computing and Information Science. “But why do people – and, as a result, machine learning models – have these preferences?”

Although debate about sound symbolism continues, Averbuch-Elor and Alper’s work supports the hypothesis that a word’s sound carries meaning beyond its definition, shaping the way people – and machines – construct language. These intrinsic patterns of human language then carry over to artificial language models, which are built by “learning” from human data. Future research could further map the relationship between sound and meaning, both in human language and in computational tools like artificial intelligence.

LEARNING FROM PROFESSOR SNAVELY

During her undergraduate studies at the Technion – Israel Institute of Technology, Averbuch-Elor realized she would need more than an undergraduate degree to advance in the tech sector, a realization that eventually led her to Cornell Tech for postdoctoral research.

At Cornell Tech, Averbuch-Elor conducted postdoctoral research in computer graphics and computer vision on Professor Noah Snavely’s team, where she honed her ability to adapt her research to collaborate with, and complement, related disciplines. Her work centers on visual semantic understanding – the ability of machines to process the meaning and context of real-world visual information – in tasks involving images and vision. With Snavely, she was able to combine his expertise in 3D with high-level semantic understanding tasks. One example is the Doppelgangers project, on which Averbuch-Elor worked: a visual disambiguation effort focused on detecting whether a pair of similar images “depict the same or distinct 3D surfaces.”

Doppelgangers are people or things that look exactly alike. In this research project, Averbuch-Elor and the team examined pairs of images that appear to show the same thing but may in fact capture two different surfaces of it. Take two digital photos of Big Ben in London that, at first glance, look identical: because the tower’s faces closely resemble one another, the photos could actually show different sides. The team’s approach was to determine whether two similar images observe the same surface of the landmark. Averbuch-Elor and the team then developed a means to remove these inaccurate, or doppelganger, image connections from downstream 3D reconstruction pipelines.
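The team’s actual method is more sophisticated, but the filtering idea can be sketched simply: score each candidate image pair with a learned classifier and discard pairs that likely depict different surfaces before reconstruction begins. In the sketch below, the `pair_classifier` model and the 0.8 threshold are illustrative assumptions, not details from the project:

```python
# A simplified sketch of doppelganger filtering, assuming a pretrained
# binary classifier that outputs a logit for "same 3D surface."
import torch

def prune_doppelgangers(image_pairs, pair_classifier, threshold=0.8):
    """Keep only image pairs the classifier believes show the same surface."""
    kept = []
    with torch.no_grad():
        for img_a, img_b in image_pairs:
            # Probability that the two images depict the same 3D surface.
            logit = pair_classifier(img_a.unsqueeze(0), img_b.unsqueeze(0))
            if torch.sigmoid(logit).item() >= threshold:
                kept.append((img_a, img_b))
    # Surviving pairs feed into the downstream 3D reconstruction pipeline.
    return kept
```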

“We were able to optimize and create 3D reconstructions that are a lot better and don’t have false structures. Now, future reconstruction frameworks can directly integrate our team’s solution into their pipeline. This can be a core part of what 3D reconstruction methods do to create outputs in the future,” she said.

FROM RESEARCH TO REAL WORLD

Having held internship and research positions at Meta (formerly Facebook) and Amazon, Averbuch-Elor draws on experience in both academia and industry to advance that integration through her scholarship. Although the sectors can overlap, she says, “As a researcher in academia, it’s crucial to be aware of and enjoy what is happening in the industry to understand how it can advance our research and inform projects to meet and compete with the problems in the space.”

In line with current needs, the real-world applications of Averbuch-Elor’s research, which deals largely with generative modeling, can help users realize their vision through controllable models. Editing real 3D models is time-consuming, costly, and requires extensive expertise. Her research stands in for that expert work, creating automatic, accessible pipelines that bring a user’s vision to fruition. “Really, my 10-year-old daughter could use these pipelines to create her vision with 3D models,” Averbuch-Elor added.

INNOVATING AT CORNELL TECH

Averbuch-Elor’s career journey speaks to the draw of Cornell Tech for people looking to fuel a thriving technology industry. After two years on the faculty at Tel Aviv University, she is returning to Cornell Tech as a faculty member.

“Even though New York is vast, we have a small, close community that I love on Roosevelt Island. There’s something unique about this experience that I can’t quite put into words,” she shared. “There are brilliant people here with fresh minds doing strong research, and it’s an environment that promotes cross-disciplinary research, comprehension, and innovation.”