Phys.org

February 12, 2025

AI program plays the long game to solve decades-old math problems

Credit: AI-generated image

A game of chess requires its players to think several moves ahead, a skill that computer programs have mastered over the years. In 1997, IBM's Deep Blue supercomputer famously defeated then world chess champion Garry Kasparov. Later, in 2017, an artificial intelligence (AI) program developed by Google DeepMind, called AlphaZero, triumphed over the best computerized chess engines of the time after training itself to play the game in a matter of hours.

More recently, some mathematicians have begun to actively pursue the question of whether AI programs can also help in cracking some of the world's toughest problems. But, whereas an average game of chess lasts about 30 to 40 moves, these research-level math problems require solutions that take a million or more steps, or moves.

In a paper on the arXiv preprint server, a team led by Caltech's Sergei Gukov, the John D. MacArthur Professor of Theoretical Physics and Mathematics, describes a new type of machine-learning algorithm that can solve math problems requiring extremely long sequences of steps. The team used their algorithm to solve families of problems related to an overarching decades-old math problem called the Andrews–Curtis conjecture. In essence, the algorithm can think farther ahead than even advanced programs like AlphaZero.

"Our program aims to find long sequences of steps that are rare and hard to find," says study first author Ali Shehper, a postdoctoral scholar at Rutgers University who will soon join Caltech as a research scientist. "It's like trying to find your way through a maze the size of Earth. These are very long paths that you have to test out, and there's only one path that works."

The use of AI to solve math problems has become increasingly popular. Google DeepMind's AlphaProof performed at the level of a silver medalist in the 2024 International Mathematical Olympiad, a high-school level math competition. And OpenAI's o3 program recently reasoned its way through benchmark problems in math, science, and computer programming.


The Caltech-led mathematicians are focusing not on routine problems but on the toughest in their field. In the new study, they used AI to solve two families of problems within the Andrews–Curtis conjecture, a group theory problem first proposed about 60 years ago.

While they did not solve the main conjecture itself, they disproved families of problems, referred to as potential counterexamples, which had remained open for roughly 25 years; they also made significant progress on another family of counterexamples that has been open for 44 years. Counterexamples are basically mathematical cases that would disprove an original conjecture. If the counterexamples themselves are disproved, then the original conjecture may still be true.
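For background, the article does not spell out the conjecture itself. Its standard statement, added here for context, concerns "balanced" presentations of the trivial group (equal numbers of generators and relators) and three elementary moves:

```latex
% Andrews–Curtis conjecture (standard statement, added for background):
% every balanced presentation of the trivial group,
%   $\langle x_1,\dots,x_n \mid r_1,\dots,r_n \rangle$,
% can be transformed into the trivial presentation,
%   $\langle x_1,\dots,x_n \mid x_1,\dots,x_n \rangle$,
% by a finite sequence of AC moves:
\begin{itemize}
  \item replace some relator $r_i$ by $r_i r_j^{-1}$ for $j \neq i$;
  \item replace some relator $r_i$ by its inverse $r_i^{-1}$;
  \item replace some relator $r_i$ by a conjugate $w\, r_i\, w^{-1}$,
        where $w$ is any word in the generators.
\end{itemize}
```

A "potential counterexample" is then a balanced presentation of the trivial group for which no such sequence of moves has been found; exhibiting one, as the team did for several families, removes it from the list of candidates that could disprove the conjecture.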

"Ruling out some of the counterexamples gives us confidence in the validity of the original conjecture and helps build our intuition about the main problem. It gives us new ways to think about it," Shehper says.

Gukov says that navigating these math problems is like "getting from A to B" via convoluted routes that require thousands, millions, or even billions of steps. He compares the problems to solving an incredibly complex Rubik's Cube.

"Can you take this scrambled, complicated Rubik's Cube and get it back to its original state? You have to test out these very long sequences of moves, and you won't know if you are on the right path until the very end," says Gukov, who is also the director of Caltech's new Richard N. Merkin Center for Pure and Applied Mathematics.
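Some rough arithmetic shows why testing "very long sequences of moves" is hopeless by brute force. The branching factor and path length below are illustrative assumptions, not numbers from the paper:

```python
import math

# Back-of-the-envelope size of the search space the article alludes to.
# With k candidate moves available at every step and solution paths of
# length L, exhaustive search faces k**L possible move sequences.
k, L = 12, 1_000_000            # assumed branching factor and path length
digits = int(L * math.log10(k)) + 1   # decimal digits in k**L
print(f"k**L has about {digits} digits")
```

Numbers with over a million digits dwarf anything prunable by clever bookkeeping alone, which is why the team trains a policy to propose promising moves rather than enumerating them.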

The maximum increase in the length of a presentation relative to its initial length along the AC trivialization path. The increase is plotted as a function of the initial length of the presentation on the left and as a function of n on the right. Credit: arXiv (2024). DOI: 10.48550/arxiv.2408.15332

The team's AI program learned to produce long sequences of moves—what the researchers term "super moves"—that are rare and unexpected: outliers, in their words. This contrasts with how AI programs like ChatGPT operate.

"If you ask ChatGPT to write a letter, it will come up with something typical. It's unlikely to come up with anything unique and highly original. It's a good parrot," Gukov says. "Our program is good at coming up with outliers."

To train their AI program, the researchers used a machine-learning model known as reinforcement learning. First, the team showed the AI easy problems to solve, and then progressively gave it harder and harder problems.

"It tries various moves and gets rewarded for solving the problems," Shehper explains. "We encourage the program to do more of the same while still keeping some level of curiosity. In the end, it develops new strategies that are better than what humans can do. That's the magic of reinforcement learning."
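The training loop described above—try moves, reward solved episodes, and ramp up difficulty—can be sketched as follows. This is a toy stand-in, not the authors' code: reaching 0 on the integers plays the role of trivializing a presentation, and a tabular value function stands in for the neural policy.

```python
import random

# Toy moves standing in for AC moves: increment, decrement, negate.
MOVES = [lambda s: s + 1, lambda s: s - 1, lambda s: -s]

class TrivializationEnv:
    """Apply moves to an integer until it reaches the 'trivial' state 0."""
    def __init__(self, start, max_steps):
        self.start, self.max_steps = start, max_steps

    def rollout(self, policy):
        state, path = self.start, []
        for _ in range(self.max_steps):
            a = policy(state)
            state = MOVES[a](state)
            path.append(a)
            if state == 0:                 # solved
                return True, path
        return False, path

def train(levels, episodes=500, eps=0.3, lr=0.1):
    """Curriculum reinforcement learning: easy start states first, then
    harder ones. The agent keeps some curiosity (eps-greedy exploration)
    and is rewarded only when an episode ends in the solved state."""
    q = {}
    rng = random.Random(0)
    for start in levels:                   # easy -> hard
        for _ in range(episodes):
            state, trace = start, []
            for _ in range(4 * abs(start) + 4):
                vals = q.setdefault(state, [0.0] * len(MOVES))
                a = (rng.randrange(len(MOVES)) if rng.random() < eps
                     else max(range(len(MOVES)), key=lambda i: vals[i]))
                trace.append((state, a))
                state = MOVES[a](state)
                if state == 0:
                    break
            reward = 1.0 if state == 0 else -1.0
            for s, a in trace:             # Monte Carlo-style update
                q[s][a] += lr * (reward - q[s][a])
    return q

q = train(levels=[1, 2, 3, 5])
```

A greedy policy read off from `q` can then be evaluated with `rollout`; the real system replaces the table with a deep network and integers with group presentations, but the curriculum-plus-sparse-reward shape of the loop is the same.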

At present, AI programs are typically not very good at predicting outlying, rare events that have dramatic consequences, such as financial market crashes. The team's new algorithm cannot make predictions like this either, but it may contain the seeds of what would be required to make intelligent predictions of this nature. "Basically, our program knows how to learn to learn," Gukov says. "It's thinking outside the box."

The team's new algorithm has already made a big splash in the math community.

"We made a lot of improvements in an area of math that was decades old," Gukov says. "Progress had been relatively slow, but now it's hustling and bustling."

In fact, three more mathematicians have joined the team—Lucas Fagan and Zhenghan Wang of UC Santa Barbara, and Yang Qiu of Nankai University in Tianjin, China—and the group has posted another preprint reporting solutions to even more families of potential counterexamples to the Andrews–Curtis conjecture.

Rather than scale up the AI models, the team's approach has been to find new clever tricks and strategies that do not require large amounts of computing power.

"We try to demonstrate good performance on small-scale computers, easily accessible to a small academic collaboration, so that any of our colleagues around the globe can easily reproduce these results," say the researchers.

More information: Ali Shehper et al, What makes math problems hard for reinforcement learning: a case study, arXiv (2024). DOI: 10.48550/arxiv.2408.15332

Journal information: arXiv


This article has been reviewed according to Science X's editorial process and policies. Editors have highlighted the following attributes while ensuring the content's credibility: fact-checked, preprint, trusted source, proofread.
