Phys.org

February 12, 2025

AI program plays the long game to solve decades-old math problems

Credit: AI-generated image

A game of chess requires its players to think several moves ahead, a skill that computer programs have mastered over the years. In 1997, IBM's Deep Blue supercomputer famously defeated then world chess champion Garry Kasparov. Later, in 2017, an artificial intelligence (AI) program developed by Google DeepMind, called AlphaZero, triumphed over the best computerized chess engines of the time after training itself to play the game in a matter of hours.

More recently, some mathematicians have begun to actively pursue the question of whether AI programs can also help in cracking some of the world's toughest problems. But, whereas an average game of chess lasts about 30 to 40 moves, these research-level math problems require solutions that take a million or more steps, or moves.

In a paper on the arXiv preprint server, a team led by Caltech's Sergei Gukov, the John D. MacArthur Professor of Theoretical Physics and Mathematics, describes a new type of machine-learning algorithm that can solve math problems requiring extremely long sequences of steps. The team used their algorithm to solve families of problems related to an overarching decades-old math problem called the Andrews–Curtis conjecture. In essence, the algorithm can think farther ahead than even advanced programs like AlphaZero.

"Our program aims to find long sequences of steps that are rare and hard to find," says study first author Ali Shehper, a postdoctoral scholar at Rutgers University who will soon join Caltech as a research scientist. "It's like trying to find your way through a maze the size of Earth. These are very long paths that you have to test out, and there's only one path that works."

The use of AI to solve math problems has become increasingly popular. Google DeepMind's AlphaProof performed at the level of a silver medalist in the 2024 International Mathematical Olympiad, a high-school level math competition. And OpenAI's o3 program recently reasoned its way through benchmark problems in math, science, and computer programming.


The Caltech-led mathematicians are focusing not on routine problems but on the toughest in their field. In the new study, they used AI to solve two families of problems within the Andrews–Curtis conjecture, a group theory problem first proposed about 60 years ago.

While they did not solve the main conjecture itself, they disproved families of problems, referred to as potential counterexamples, which had remained open for roughly 25 years; they also made significant progress on another family of counterexamples that has been open for 44 years. Counterexamples are basically mathematical cases that would disprove an original conjecture. If the counterexamples themselves are disproved, then the original conjecture may still be true.
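For background, the article does not spell out the conjecture itself. Its standard statement, added here for context, concerns "balanced" presentations of the trivial group (equal numbers of generators and relators) and three elementary moves:

```latex
% Andrews–Curtis conjecture (standard statement, added for background):
% every balanced presentation of the trivial group,
%   $\langle x_1,\dots,x_n \mid r_1,\dots,r_n \rangle$,
% can be transformed into the trivial presentation,
%   $\langle x_1,\dots,x_n \mid x_1,\dots,x_n \rangle$,
% by a finite sequence of AC moves:
\begin{itemize}
  \item replace some relator $r_i$ by $r_i r_j^{-1}$ for $j \neq i$;
  \item replace some relator $r_i$ by its inverse $r_i^{-1}$;
  \item replace some relator $r_i$ by a conjugate $w\, r_i\, w^{-1}$,
        where $w$ is any word in the generators.
\end{itemize}
```

A "potential counterexample" is then a balanced presentation of the trivial group for which no such sequence of moves has been found; exhibiting one, as the team did for several families, removes it from the list of candidates that could disprove the conjecture.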

"Ruling out some of the counterexamples gives us confidence in the validity of the original conjecture and helps build our intuition about the main problem. It gives us new ways to think about it," Shehper says.

Gukov says that navigating these math problems is like "getting from A to B" via convoluted routes that require thousands, millions, or even billions of steps. He compares the problems to solving an incredibly complex Rubik's Cube.

"Can you take this scrambled, complicated Rubik's Cube and get it back to its original state? You have to test out these very long sequences of moves, and you won't know if you are on the right path until the very end," says Gukov, who is also the director of Caltech's new Richard N. Merkin Center for Pure and Applied Mathematics.
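Some rough arithmetic shows why testing "very long sequences of moves" is hopeless by brute force. The branching factor and path length below are illustrative assumptions, not numbers from the paper:

```python
import math

# Back-of-the-envelope size of the search space the article alludes to.
# With k candidate moves available at every step and solution paths of
# length L, exhaustive search faces k**L possible move sequences.
k, L = 12, 1_000_000            # assumed branching factor and path length
digits = int(L * math.log10(k)) + 1   # decimal digits in k**L
print(f"k**L has about {digits} digits")
```

Numbers with over a million digits dwarf anything prunable by clever bookkeeping alone, which is why the team trains a policy to propose promising moves rather than enumerating them.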

The maximum increase in the length of a presentation relative to its initial length along the AC trivialization path. The increase is plotted as a function of the initial length of the presentation on the left and as a function of n on the right. Credit: arXiv (2024). DOI: 10.48550/arxiv.2408.15332

The team's AI program learned to produce long sequences of moves—what the researchers term "super moves"—that are rare and unexpected: outliers, in their words. This contrasts with how AI programs like ChatGPT operate.

"If you ask ChatGPT to write a letter, it will come up with something typical. It's unlikely to come up with anything unique and highly original. It's a good parrot," Gukov says. "Our program is good at coming up with outliers."

To train their AI program, the researchers used a machine-learning model known as reinforcement learning. First, the team showed the AI easy problems to solve, and then progressively gave it harder and harder problems.

"It tries various moves and gets rewarded for solving the problems," Shehper explains. "We encourage the program to do more of the same while still keeping some level of curiosity. In the end, it develops new strategies that are better than what humans can do. That's the magic of reinforcement learning."
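The training loop described above—try moves, reward solved episodes, and ramp up difficulty—can be sketched as follows. This is a toy stand-in, not the authors' code: reaching 0 on the integers plays the role of trivializing a presentation, and a tabular value function stands in for the neural policy.

```python
import random

# Toy moves standing in for AC moves: increment, decrement, negate.
MOVES = [lambda s: s + 1, lambda s: s - 1, lambda s: -s]

class TrivializationEnv:
    """Apply moves to an integer until it reaches the 'trivial' state 0."""
    def __init__(self, start, max_steps):
        self.start, self.max_steps = start, max_steps

    def rollout(self, policy):
        state, path = self.start, []
        for _ in range(self.max_steps):
            a = policy(state)
            state = MOVES[a](state)
            path.append(a)
            if state == 0:                 # solved
                return True, path
        return False, path

def train(levels, episodes=500, eps=0.3, lr=0.1):
    """Curriculum reinforcement learning: easy start states first, then
    harder ones. The agent keeps some curiosity (eps-greedy exploration)
    and is rewarded only when an episode ends in the solved state."""
    q = {}
    rng = random.Random(0)
    for start in levels:                   # easy -> hard
        for _ in range(episodes):
            state, trace = start, []
            for _ in range(4 * abs(start) + 4):
                vals = q.setdefault(state, [0.0] * len(MOVES))
                a = (rng.randrange(len(MOVES)) if rng.random() < eps
                     else max(range(len(MOVES)), key=lambda i: vals[i]))
                trace.append((state, a))
                state = MOVES[a](state)
                if state == 0:
                    break
            reward = 1.0 if state == 0 else -1.0
            for s, a in trace:             # Monte Carlo-style update
                q[s][a] += lr * (reward - q[s][a])
    return q

q = train(levels=[1, 2, 3, 5])
```

A greedy policy read off from `q` can then be evaluated with `rollout`; the real system replaces the table with a deep network and integers with group presentations, but the curriculum-plus-sparse-reward shape of the loop is the same.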

At present, AI programs are typically not very good at predicting outlying, rare events that have dramatic consequences, such as financial market crashes. The team's new algorithm cannot make predictions like this either, but it may contain the seeds of what would be required to make intelligent predictions of this nature. "Basically, our program knows how to learn to learn," Gukov says. "It's thinking outside the box."

The team's new algorithm has already made a big splash in the math community.

"We made a lot of improvements in an area of math that was decades old," Gukov says. "Progress had been relatively slow, but now it's hustling and bustling."

In fact, three more mathematicians have joined the team—Lucas Fagan and Zhenghan Wang of UC Santa Barbara, and Yang Qiu of Nankai University in Tianjin, China—and the group has posted another preprint reporting solutions to even more families of potential counterexamples to the Andrews–Curtis conjecture.

Rather than scale up the AI models, the team's approach has been to find new clever tricks and strategies that do not require large amounts of computing power.

"We try to demonstrate good performance on small-scale computers, easily accessible to a small academic collaboration, so that any of our colleagues around the globe can easily reproduce these results," say the researchers.

More information: Ali Shehper et al, What makes math problems hard for reinforcement learning: a case study, arXiv (2024). DOI: 10.48550/arxiv.2408.15332

Journal information: arXiv


This article has been reviewed according to Science X's editorial process and policies. Editors have highlighted the following attributes while ensuring the content's credibility: fact-checked, preprint, trusted source, proofread.
