By Louis DiPietro
Today’s artificial intelligence models can’t even tie their own shoes.
In new research that puts the latest models to the test in a 3D environment, Cornell scholars found that AI fares well at untangling basic knots but can’t quite tie knots from simple loops or convert one knot into another.
The findings suggest that, for all its value in generating text- and image-based information, AI still has a long way to go in spatial reasoning and manipulation, skills that will prove essential in other AI-powered areas like robotics.
“With current AI, it works great with big blocks of text. Once moved to reason in the 3D world, AI breaks,” said Zoe (Zizhao) Chen, a doctoral student in the field of computer science at Cornell Tech and lead author of “Knot So Simple: A Minimalistic Environment for Spatial Reasoning,” which was presented at the Annual Conference on Neural Information Processing Systems (NeurIPS) on Dec. 5 in San Diego, California. “Most reasoning we see from AI today is text-based. That’s great, but it’s not enough.”
In the paper, Chen and Yoav Artzi, associate professor of computer science at Cornell Tech and co-author, present KnotGym, a 3D simulator for testing different kinds of reinforcement learning models and large language models (LLMs), such as GPT-4, in a virtual environment.
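Simulators like this typically evaluate an agent through a reset/step interaction loop. The sketch below is purely illustrative and is not KnotGym's actual API; the toy environment, its state, and all names here are hypothetical, assuming only the standard Gym-style pattern of resetting an environment and stepping it with an agent's actions.

```python
# Illustrative sketch of a Gym-style evaluation loop. The toy environment
# below is a hypothetical stand-in, NOT KnotGym's real interface: the
# "knot" state is just a crossing count, and action 0 removes a crossing
# while action 1 adds one.

class ToyKnotEnv:
    """Hypothetical environment: the episode succeeds when all crossings
    are removed (the 'knot' is untangled) within the step budget."""

    def __init__(self, crossings=3, max_steps=20):
        self.start = crossings
        self.max_steps = max_steps

    def reset(self):
        self.crossings = self.start
        self.steps = 0
        return self.crossings  # observation

    def step(self, action):
        self.crossings = max(self.crossings + (-1 if action == 0 else 1), 0)
        self.steps += 1
        done = self.crossings == 0 or self.steps >= self.max_steps
        reward = 1.0 if self.crossings == 0 else 0.0
        return self.crossings, reward, done


def evaluate(env, policy, episodes=10):
    """Run the policy for several episodes and return its success rate."""
    successes = 0
    for _ in range(episodes):
        obs, done, reward = env.reset(), False, 0.0
        while not done:
            obs, reward, done = env.step(policy(obs))
        successes += reward > 0
    return successes / episodes


env = ToyKnotEnv()
rate = evaluate(env, policy=lambda obs: 0)  # always remove a crossing
print(rate)
```

In a benchmark such as this, the policy would be an RL agent or an LLM choosing manipulation actions; the success rate over many episodes is what lets researchers compare models on the same spatial task.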