How to Train AI to Generate Medicines and Vaccines

Scientists have developed artificial intelligence software that can create proteins that may be useful as vaccines, cancer treatments, or even tools for pulling carbon pollution out of the air. This research was led by the University of Washington School of Medicine and Harvard University.

“The proteins we find in nature are amazing molecules, but designed proteins can do so much more,” said senior author David Baker, a professor of biochemistry at UW Medicine. “In this work, we show that machine learning can be used to design proteins with a wide variety of functions.”

For decades, scientists have used computers to try to engineer proteins. Some proteins, such as antibodies and synthetic binding proteins, have been adapted into medicines to combat COVID-19. Others, such as enzymes, aid in industrial manufacturing. But a single protein molecule often contains thousands of bonded atoms, so even with specialized scientific software, proteins are difficult to study and engineer.

Inspired by how machine learning algorithms can generate stories or even images from prompts, the team set out to build similar software for designing new proteins. “The idea is the same: neural networks can be trained to see patterns in data. Once trained, you can give it a prompt and see if it can generate an elegant solution. Often the results are compelling — or even beautiful,” said lead author Joseph Watson, a postdoctoral scholar at UW Medicine.

The team trained multiple neural networks using information from the Protein Data Bank, which is a public repository of hundreds of thousands of protein structures from across all kingdoms of life. The neural networks that resulted have surprised even the scientists who created them.

Deep machine learning program hallucinating new ideas for vaccine molecules

The team developed two approaches for designing proteins with new functions. The first, dubbed “hallucination,” is akin to DALL-E or other generative A.I. tools that produce new output based on simple prompts. The second, dubbed “inpainting,” is analogous to the autocomplete feature found in modern search bars and email clients.

“Most people can come up with new images of cats or write a paragraph from a prompt if asked, but with protein design, the human brain cannot do what computers now can,” said lead author Jue Wang, a postdoctoral scholar at UW Medicine. “Humans just cannot imagine what the solution might look like, but we have set up machines that do.”

To explain how the neural networks ‘hallucinate’ a new protein, the team compares the process to writing a book: “You start with a random assortment of words — total gibberish. Then you impose a requirement such as that in the opening paragraph, it needs to be a dark and stormy night. Then the computer will change the words one at a time and ask itself ‘Does this make my story make more sense?’ If it does, it keeps the changes until a complete story is written,” explains Wang.

Both books and proteins can be understood as long sequences of letters. In the case of proteins, each letter corresponds to a chemical building block called an amino acid. Beginning with a random chain of amino acids, the software mutates the sequence over and over until a final sequence that encodes the desired function is generated. These final amino acid sequences encode proteins that can then be manufactured and studied in the laboratory.
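The mutate-and-check loop Wang describes can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the team's software: in the research, trained neural networks judge whether a sequence is improving, whereas here a hypothetical `toy_score` function simply rewards sequences whose residues follow a target pattern of hydrophobic (H) and polar (P) positions. The hydrophobic grouping, the pattern, and the function names are illustrative choices.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids (one-letter codes)
HYDROPHOBIC = set("AVILMFWC")         # rough, illustrative grouping (an assumption)


def toy_score(seq: str, pattern: str) -> int:
    """Hypothetical stand-in for a trained network.

    Counts positions whose residue class matches the target pattern, where
    pattern letter "H" means hydrophobic and any other letter means polar/other.
    """
    return sum((aa in HYDROPHOBIC) == (want == "H") for aa, want in zip(seq, pattern))


def hallucinate(pattern: str, steps: int = 5000, seed: int = 0) -> tuple[str, int]:
    rng = random.Random(seed)
    # Start from a random chain of amino acids ("total gibberish").
    seq = [rng.choice(AMINO_ACIDS) for _ in pattern]
    score = toy_score("".join(seq), pattern)
    for _ in range(steps):
        if score == len(pattern):
            break  # every position already satisfies the requirement
        i = rng.randrange(len(seq))
        old = seq[i]
        seq[i] = rng.choice(AMINO_ACIDS)  # mutate one position
        new_score = toy_score("".join(seq), pattern)
        if new_score > score:
            score = new_score  # the change helped, so keep it
        else:
            seq[i] = old  # otherwise revert the mutation
    return "".join(seq), score


design, score = hallucinate("HPPHHPPHHPPHHPPH")
print(design, score)
```

The greedy rule (keep a mutation only if the score rises) mirrors the “does this make more sense?” test in Wang's analogy; in a real design pipeline, the scoring step would be a trained network rather than a fixed rule.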

The research is published in the journal Science.


AI Detects Visual Signs Of Covid-19

Zhongnan Hospital of Wuhan University in Wuhan, China, is at the heart of the outbreak of Covid-19, the disease caused by the new coronavirus SARS-CoV-2 that has shut down cities in China, South Korea, Iran, and Italy. That’s forced the hospital to become a testbed for how quickly a modern medical center can adapt to a new infectious disease epidemic.

One experiment is underway in Zhongnan’s radiology department, where staff are using artificial intelligence software to detect visual signs of the pneumonia associated with Covid-19 on lung CT scan images. Haibo Xu, professor and chair of radiology at Zhongnan Hospital, says the software helps overworked staff screen patients and prioritize those most likely to have Covid-19 for further examination and testing.
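The article does not describe how Infervision's software ranks patients internally. As a generic sketch of the prioritization step only, the Python below keeps a worklist ordered by a model score, so staff always open the most suspicious scan next. The class name, the scan IDs, and the assumption that the model emits a probability between 0 and 1 are all hypothetical.

```python
import heapq
import itertools


class TriageQueue:
    """Hypothetical worklist that always hands staff the most suspicious scan next."""

    def __init__(self) -> None:
        self._heap: list[tuple[float, int, str]] = []
        self._arrival = itertools.count()  # tie-breaker: first come, first served

    def add(self, scan_id: str, score: float) -> None:
        # `score` is an assumed model probability (0 to 1) of Covid-19 pneumonia.
        # heapq is a min-heap, so the score is negated to pop the highest first.
        heapq.heappush(self._heap, (-score, next(self._arrival), scan_id))

    def next_scan(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


q = TriageQueue()
for scan_id, score in [("scan-A", 0.12), ("scan-B", 0.91), ("scan-C", 0.47)]:
    q.add(scan_id, score)
q.add("scan-D", 0.91)  # arrives later with the same score as scan-B
order = [q.next_scan() for _ in range(len(q))]
print(order)  # ['scan-B', 'scan-D', 'scan-C', 'scan-A']
```

A heap suits this workflow because new scans can be added at any time while the highest-priority scan is still retrieved in logarithmic time; the arrival counter keeps equally scored scans in first-come, first-served order.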

Detecting pneumonia on a scan doesn’t alone confirm a person has the disease, but Xu says doing so helps staff diagnose, isolate, and treat patients more quickly. The software “can identify typical signs or partial signs of Covid-19 pneumonia,” he wrote. Doctors can then follow up with other examinations and lab tests to confirm a diagnosis of the disease. Xu says his department was quickly overwhelmed as the virus spread through Wuhan in January.

The software in use at Zhongnan was created by Beijing startup Infervision, which says its Covid-19 tool has been deployed at 34 hospitals in China and used to review more than 32,000 cases. The startup, founded in 2015 with funding from investors including early Google backer Sequoia Capital, is an example of how China has embraced applying artificial intelligence to medicine.

China’s government has urged development of AI tools for healthcare as part of sweeping national investments in artificial intelligence. China’s relatively lax privacy rules allow companies such as Infervision to gather medical data for training machine learning algorithms on tasks like reading scans more easily than their US or European rivals can.

Infervision created its main product, software that flags possible lung problems on CT scans, using hundreds of thousands of lung images collected from major Chinese hospitals. The software is in use at hospitals in China and is being evaluated by clinics in Europe and the US, primarily to detect potentially cancerous lung nodules.

Infervision began work on its Covid-19 detector early in the outbreak after noticing a sudden shift in how existing customers were using its lung-scan-reading software. In mid-January, not long after the US Centers for Disease Control advised against travel to Wuhan due to the new disease, hospitals in Hubei Province began employing a previously little-used feature of Infervision’s software that looks for evidence of pneumonia, says CEO Kuan Chen. “We realized it was coming from the outbreak,” he says.


AI Classifies Chest X-Rays With Human-Level Accuracy

Analyzing chest X-ray images with machine learning algorithms is easier said than done. That’s because typically, the clinical labels required to train those algorithms are obtained with rule-based natural language processing or human annotation, both of which tend to introduce inconsistencies and errors. Additionally, it’s challenging to assemble data sets that represent an adequately diverse spectrum of cases, and to establish clinically meaningful and consistent labels given only images.

In an effort to move the goalposts forward in X-ray image classification, researchers at Google devised AI models to spot four findings on human chest X-rays: pneumothorax (collapsed lung), nodules and masses, fractures, and airspace opacities (filling of the pulmonary tree with material). In a paper published in the journal Nature, the team claims the model family, which was evaluated using thousands of images across data sets with high-quality labels, demonstrated “radiologist-level” performance in an independent review conducted by human experts.

The study’s publication comes months after Google AI and Northwestern Medicine scientists created a model capable of detecting lung cancer from screening tests better than human radiologists with an average of eight years of experience, and roughly a year after New York University used Google’s Inception v3 machine learning model to detect lung cancer. AI also underpins the tech giant’s advances in diabetic retinopathy diagnosis through eye scans, as well as Alphabet subsidiary DeepMind’s AI that can recommend the proper line of treatment for 50 eye diseases with 94% accuracy.

This newer work tapped over 600,000 images sourced from two de-identified data sets, the first of which was developed in collaboration with Apollo Hospitals and which consists of X-rays collected over years from multiple locations. As for the second corpus, it’s the publicly available ChestX-ray14 image set released by the National Institutes of Health, which has historically served as a resource for AI efforts but which suffers shortcomings in accuracy.

The researchers developed a text-based system to extract labels using radiology reports associated with each X-ray, which they then applied to provide labels for over 560,000 images from the Apollo Hospitals data set. To reduce errors introduced by the text-based label extraction and provide the relevant labels for a number of ChestX-ray14 images, they recruited radiologists to review approximately 37,000 images across the two corpora.
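The article does not say how Google's text-based system works. As an illustration of the simplest kind of rule-based extraction mentioned earlier, the Python sketch below marks a finding as present when a keyword appears in a report sentence without a preceding negation cue such as “no.” The keyword lists, negation cues, and function name are assumptions for illustration; rules this simple are prone to the kinds of errors that motivated the radiologist review.

```python
import re

# Hypothetical keyword rules for the four findings; real systems need far richer vocabularies.
FINDING_PATTERNS = {
    "pneumothorax": r"\bpneumothorax\b",
    "nodule_or_mass": r"\b(nodules?|mass(es)?)\b",
    "fracture": r"\bfractures?\b",
    "airspace_opacity": r"\b(consolidation|airspace opacit(y|ies))\b",
}
# Simple negation cues; a cue only counts if it precedes the finding in the same sentence.
NEGATION = r"\b(no|without|negative for|free of)\b"


def extract_labels(report: str) -> dict[str, bool]:
    labels = {name: False for name in FINDING_PATTERNS}
    # Split into rough sentences so a negation only applies locally.
    for sentence in re.split(r"[.;\n]+", report.lower()):
        for name, pattern in FINDING_PATTERNS.items():
            match = re.search(pattern, sentence)
            if match is None:
                continue
            # Skip the finding if a negation cue appears before it.
            if re.search(NEGATION, sentence[: match.start()]):
                continue
            labels[name] = True
    return labels


report = "No pneumothorax. Small nodule in the right upper lobe; no acute fracture."
print(extract_labels(report))
# {'pneumothorax': False, 'nodule_or_mass': True, 'fracture': False, 'airspace_opacity': False}
```

Only `nodule_or_mass` comes back `True` for the sample report, because the pneumothorax and fracture mentions are both negated. Phrasings the rules do not anticipate, such as “cannot exclude pneumothorax,” would be mislabeled, which is one reason human review of extracted labels matters.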

Google notes that while the models achieved expert-level accuracy overall, performance varied across corpora. For example, the sensitivity for detecting pneumothorax among radiologists was approximately 79% for the ChestX-ray14 images, but was only 52% for the same radiologists on the other data set.
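Sensitivity is the share of truly positive cases that a reader (human or model) flags: TP / (TP + FN), where TP counts true positives and FN counts missed cases. A sensitivity of 79% therefore means roughly 79 of every 100 pneumothoraxes were caught, versus 52 of every 100 on the other data set. The Python sketch below computes it from paired ground-truth labels and reads; the six example reads are made up for illustration.

```python
def sensitivity(y_true: list[bool], y_pred: list[bool]) -> float:
    """Share of truly positive cases that were flagged: TP / (TP + FN)."""
    tp = sum(1 for truth, pred in zip(y_true, y_pred) if truth and pred)
    fn = sum(1 for truth, pred in zip(y_true, y_pred) if truth and not pred)
    if tp + fn == 0:
        raise ValueError("sensitivity is undefined without positive cases")
    return tp / (tp + fn)


# Hypothetical reads of 6 images, 4 of which truly show a pneumothorax.
truth = [True, True, True, True, False, False]
reads = [True, False, True, True, True, False]
print(sensitivity(truth, reads))  # 0.75
```

The false positive at index 4 does not affect sensitivity; that kind of error is captured by specificity instead.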

Chest X-ray depicting a pneumothorax identified by Google’s AI model and the panel of radiologists, but missed by individual radiologists. On the left is the original image; on the right is the same image with the regions most important to the model’s prediction highlighted in orange.

“The performance differences between datasets … emphasize the need for standardized evaluation image sets with accurate reference standards in order to allow comparison across studies,” wrote Google research scientist Dr. David Steiner and Google Health technical lead Shravya Shetty in a blog post, both of whom contributed to the paper. “[Models] often identified findings that were consistently missed by radiologists, and vice versa. As such, strategies that combine the unique ‘skills’ of both the [AI] systems and human experts are likely to hold the most promise for realizing the potential of AI applications in medical image interpretation.”

The research team hopes to lay the groundwork for better methods by releasing its adjudicated labels for the ChestX-ray14 data set as open source. The release covers 2,412 training and validation set images and 1,962 test set images, or 4,374 images in total.

“We hope that these labels will facilitate future machine learning efforts and enable better apples-to-apples comparisons between machine learning models for chest X-ray interpretation,” wrote Steiner and Shetty.