How to Train AI to Generate Medicines and Vaccines

Scientists have developed artificial intelligence software that can create proteins that may be useful as vaccines, cancer treatments, or even tools for pulling carbon pollution out of the air. This research was led by the University of Washington School of Medicine and Harvard University.

“The proteins we find in nature are amazing molecules, but designed proteins can do so much more,” said senior author David Baker, a professor of biochemistry at UW Medicine. “In this work, we show that machine learning can be used to design proteins with a wide variety of functions.”

For decades, scientists have used computers to try to engineer proteins. Some proteins, such as antibodies and synthetic binding proteins, have been adapted into medicines to combat COVID-19. Others, such as enzymes, aid in industrial manufacturing. But a single protein molecule often contains thousands of bonded atoms, so proteins are difficult to study and engineer even with specialized scientific software.

Inspired by how machine learning algorithms can generate stories or even images from prompts, the team set out to build similar software for designing new proteins. “The idea is the same: neural networks can be trained to see patterns in data. Once trained, you can give it a prompt and see if it can generate an elegant solution. Often the results are compelling, or even beautiful,” said lead author Joseph Watson, a postdoctoral scholar at UW Medicine.

The team trained multiple neural networks using information from the Protein Data Bank, which is a public repository of hundreds of thousands of protein structures from across all kingdoms of life. The neural networks that resulted have surprised even the scientists who created them.

[Image: Deep machine learning program hallucinating new ideas for vaccine molecules]

The team developed two approaches for designing proteins with new functions. The first, dubbed “hallucination,” is akin to DALL-E or other generative A.I. tools that produce new output based on simple prompts. The second, dubbed “inpainting,” is analogous to the autocomplete feature found in modern search bars and email clients.

“Most people can come up with new images of cats or write a paragraph from a prompt if asked, but with protein design, the human brain cannot do what computers now can,” said lead author Jue Wang, a postdoctoral scholar at UW Medicine. “Humans just cannot imagine what the solution might look like, but we have set up machines that do.”

To explain how the neural networks ‘hallucinate’ a new protein, the team compares the process to how a computer might write a book: “You start with a random assortment of words, total gibberish. Then you impose a requirement such as that in the opening paragraph, it needs to be a dark and stormy night. Then the computer will change the words one at a time and ask itself ‘Does this make my story make more sense?’ If it does, it keeps the changes until a complete story is written,” explains Wang.

Both books and proteins can be understood as long sequences of letters. In the case of proteins, each letter corresponds to a chemical building block called an amino acid. Beginning with a random chain of amino acids, the software mutates the sequence over and over until a final sequence that encodes the desired function is generated. These final amino acid sequences encode proteins that can then be manufactured and studied in the laboratory.
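The mutate-and-keep loop described above can be illustrated with a minimal Python sketch. This is a toy under stated assumptions, not the team's software. In the real system a trained neural network judges each candidate. Here the hypothetical function toy_score simply counts how many positions match a made-up target motif, and the names hallucinate, toy_score, and the example motif are all invented for illustration.

```python
import random

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_score(sequence: str, target: str) -> int:
    """Hypothetical stand-in for the neural network's judgment.

    Counts positions where the candidate matches a made-up target motif.
    The real software scores candidates with trained networks instead.
    """
    return sum(a == b for a, b in zip(sequence, target))


def hallucinate(target: str, max_steps: int = 100_000, seed: int = 0) -> str:
    """Mutate a random sequence one letter at a time, keeping improvements."""
    rng = random.Random(seed)
    # Begin with a random chain of amino acids ("total gibberish").
    sequence = [rng.choice(AMINO_ACIDS) for _ in target]
    best = toy_score("".join(sequence), target)

    for _ in range(max_steps):
        if best == len(target):
            break  # every position satisfies the toy requirement
        # Change one letter at a random position.
        pos = rng.randrange(len(sequence))
        old = sequence[pos]
        sequence[pos] = rng.choice(AMINO_ACIDS)
        new = toy_score("".join(sequence), target)
        if new > best:
            best = new  # the change helped, so keep it
        else:
            sequence[pos] = old  # otherwise undo the mutation

    return "".join(sequence)


if __name__ == "__main__":
    # "MKTAYIAKQRQI" is an arbitrary example motif, not a designed protein.
    print(hallucinate("MKTAYIAKQRQI"))
```

Calling hallucinate with a short motif shows the idea at small scale: random letters are replaced one at a time, and only changes that raise the score survive. The sketch accepts strict improvements only, a simplification chosen to mirror the "keep it if it makes more sense" description in the text.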

The research is published in the journal Science.