AI model simulates 500 million years of evolution to generate a new fluorescent protein

Multimodal protein editing with ESM3. Credit: Science (2025). DOI: 10.1126/science.ads0018

A team of AI researchers, biologists and evolutionary specialists at EvolutionaryScale and the Arc Institute, both in the U.S., has designed and built an AI model capable of generating the code to synthesize novel proteins. In their paper in the journal Science, the group describes the factors that went into developing their new AI model, which they call ESM3, and how they used it to synthesize a previously unknown bright, fluorescent protein.

Prior research has shown that synthesizing proteins can provide unique insights into the structure and function of natural proteins. To date, most such proteins are copies of those found in nature. For this new study, the researchers used an AI model to mimic the evolutionary process of a protein that never existed naturally.

Generating novel proteins opens new avenues of research, both in better understanding the nature of proteins and their uses and in developing novel applications. The research team used data about existing proteins as a basis for generating new proteins.

ESM3 is a multimodal generative language model, which means that, like its chatbot cousins, it learns about the nature of things when trained on massive amounts of data. In this case, the model was trained on 771 billion tokens generated from 3.15 billion protein sequences, 236 million protein structures and 539 million protein annotations.

According to the researchers, this was like giving the model 500 million years of evolutionary knowledge, which allowed it to start with basic code that evolved over virtual time into a modern virtual protein. The virtual protein was then converted to a real-world artificial protein using standard protein synthesis techniques. The result was a protein with a genetic sequence that was different from other known proteins.

The research team specifically asked their model to generate a new green fluorescent protein. Other such proteins, which fluoresce when exposed to blue or ultraviolet light, are often used as markers. The team named the new protein esmGFP. They suggest their model and others like it could be used to create new proteins for use in medicine and a wide variety of other applications.

More information: Thomas Hayes et al, Simulating 500 million years of evolution with a language model, Science (2025).

Journal information: Science

© 2025 Science X Network

Citation: AI model simulates 500 million years of evolution to generate a new fluorescent protein (2025, January 21) retrieved 23 May 2025 from /news/2025-01-ai-simulates-million-years-evolution.html
This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
