It is the holy grail of modern medicine: identifying the alterations in the genome that cause diseases of genetic origin. The task is not easy: each person carries thousands of mutations relative to the genetic information inherited from their parents. Most are benign, but a percentage can be pathogenic. Now, researchers from Google DeepMind, Alphabet’s artificial intelligence company, have cataloged 71 million of these mutations. Their AI program was also able to classify most of them, finding that a third could modify the functioning of proteins, causing serious pathologies.
DNA contains the instructions for the development of every living thing. This book holds the recipes for creating cells, organs and functions, written as sequences of basic components. The building blocks of life are proteins. They are made up of chains of amino acids, sometimes hundreds of them, and each amino acid is in turn encoded by a trio of nucleotides, the letters of the genetic alphabet. When one of these nucleotides is replaced by another and the change alters an amino acid, the mutation is called a missense variant. For the most part, these variants do not affect the function of the protein. But in other cases the mutation is catastrophic, leading to pathologies such as genetically based amyotrophic lateral sclerosis (ALS) or sickle cell anemia.
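To make the idea of a single-letter change concrete, the following is a minimal sketch in Python, not part of the DeepMind work. It uses a deliberately tiny excerpt of the standard genetic code, and the function name `classify_substitution` is hypothetical. The example codon change GAG to GTG (glutamic acid to valine) is the well-known substitution behind sickle cell anemia.

```python
# Tiny excerpt of the standard genetic code: only the codons this example needs.
CODON_TO_AMINO_ACID = {
    "GAG": "Glu",   # glutamic acid
    "GAA": "Glu",   # glutamic acid (a different spelling of the same amino acid)
    "GTG": "Val",   # valine
    "TAG": "Stop",  # stop signal: the protein is cut short
}


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Label a codon change by its effect on the encoded amino acid.

    Hypothetical helper for illustration; it only knows the codons listed above.
    """
    ref_aa = CODON_TO_AMINO_ACID[ref_codon]
    alt_aa = CODON_TO_AMINO_ACID[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"  # the letter changed, the amino acid did not
    if alt_aa == "Stop":
        return "nonsense"    # the change introduces a premature stop
    return "missense"        # one amino acid is swapped for another


if __name__ == "__main__":
    # Sickle cell anemia: a single letter turns glutamic acid into valine.
    print(classify_substitution("GAG", "GTG"))
    # A silent change: both codons encode glutamic acid.
    print(classify_substitution("GAG", "GAA"))
```

The sketch only labels the type of change. Whether a given missense variant is harmless or harmful is the much harder question.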
Until now, about 4 million of these missense variants had been identified in the 19,233 proteins that make up each human being. But only 2% of them had been annotated, that is, classified as benign (the majority) or as a possible source of disease. Now, artificial intelligence (AI) has multiplied the number of cataloged variants by 18 and classified most of them by their potential impact on protein function.
The authors of this achievement, published in the prestigious scientific journal Science, are DeepMind scientists. It is the same group that developed AlphaFold a few years ago, an AI program capable of predicting the structure of almost all proteins and considered one of the greatest advances in computational biology. What they have done now is redesign and reorient it to detect missense mutations in proteins. Furthermore, thanks to its training, the new tool, AlphaMissense, estimates with high probability the impact that each variant may have on the function of the protein.
DeepMind researcher Jun Cheng, first author of the study, explains what AlphaMissense does: “We knew that AlphaFold was a very good model for predicting the three-dimensional structure of proteins from their sequence. We also knew that this 3D structure of proteins is very important for their function, basically revealing what it is,” explains Cheng. If a protein's function can be deduced from its structure, any alteration in that structure could be the result of a mutation. Another fundamental piece is AlphaMissense’s ability to learn from the evolutionary constraints on related sequences. That is, evolution has shaped what the structure of a protein can be and what it should not be if problems are to be avoided. To improve its knowledge of this aspect, the system was trained with the structures of human and primate proteins. “Through training, it sees millions of protein sequences and learns what a normal protein sequence looks like. And when it is given one with a mutation, it can tell us if it is bad or not,” he adds.
Cheng closes with a comparison: “This is very similar to human language. If we substitute a word in an English sentence, a person who is familiar with the language can immediately see whether this word substitution will change the meaning of the sentence or not.” AlphaMissense was able to classify 89% of the 71 million missense variants it cataloged. Of the total, 57% were rated probably benign and about a third probably pathogenic. For the remaining 11%, the AI could not determine the impact. “The model assigns a score between zero and one to each of the variants and indicates the probability that the variant is pathogenic. By pathogenic, we mean that the variant is more likely to be associated with or cause a disease,” the scientist details.
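The step from a score between zero and one to the three categories mentioned above can be sketched in Python as follows. This is a minimal illustration, not DeepMind's code: the function names and the two cutoffs (`benign_below`, `pathogenic_above`) are assumptions made for the example, since the article does not state which thresholds the authors use.

```python
from collections import Counter


def classify_variant(score: float,
                     benign_below: float = 0.34,
                     pathogenic_above: float = 0.564) -> str:
    """Turn a pathogenicity score in [0, 1] into one of three labels.

    The default cutoffs are illustrative assumptions, not values from the article.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score < benign_below:
        return "likely benign"
    if score > pathogenic_above:
        return "likely pathogenic"
    return "ambiguous"  # the model does not commit either way


def summarize(scores) -> dict:
    """Count how many variants fall into each category."""
    return dict(Counter(classify_variant(s) for s in scores))


if __name__ == "__main__":
    # Hypothetical scores for four variants, invented for this example.
    example_scores = [0.05, 0.20, 0.45, 0.91]
    print(summarize(example_scores))
```

Keeping an explicit "ambiguous" band between the two cutoffs is what allows a tool of this kind to decline to label a share of the variants instead of forcing every one into benign or pathogenic.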
Cheng’s clarifications highlight both the strength of AlphaMissense, its very high capacity to classify variants, and one of its weaknesses: the percentages refer to probabilities. Until the era of powerful computers and AI, characterizing the structure of a protein, or its mutations, was a titanic job. Before the arrival of these technologies, the structure of about 200,000 proteins had been determined, a task that took 60 years and the participation of thousands of scientists. It required many laboratory hours or the use of particle accelerators. But those were real observations of the real structure of a real protein. Computational biology, by contrast, works with virtual proteins and variants, which must then be confirmed experimentally. In the case of AlphaMissense, the precision achieved by its calculations is 90%.
“Understanding the disease”
Regarding possible applications, Žiga Avsec, also from DeepMind and senior co-author of the study, said in an online conference that “the first step in finding treatments is to try to understand the disease well, and for both complex and rare diseases, that means finding genes associated with them.” For Avsec, tools like AlphaMissense “can help us better identify variants, help us discover potentially new genes; by better understanding genetics, we will be able to have stronger opinions about some genes that before we may not have been sure were related to the disease.” “That’s the general idea, through better genetics, finding new genes, getting additional statistical power to detect new associations, but that won’t directly lead to new drugs as such,” he added.
A few days ago, an analysis was published of the 200 million proteins predicted by AlphaFold last year. The Spanish bioinformatician Íñigo Barrio participated in this key analysis. “AlphaFold changed the world,” says Barrio, who is not so enthusiastic about AlphaMissense. “It is relevant, it is a new way to evaluate variants and it could be used to monitor rare diseases. But there is already other prediction software.” Barrio also highlights one of the limitations of this artificial intelligence. AlphaMissense catalogs missense variants individually, but many of the pathologies with a genetic basis “are the product of the combination of several of these mutations,” he recalls.
A similar opinion is expressed by biologist José Antonio Márquez, who directs the Crystallography Platform of the European Molecular Biology Laboratory: “It is one of the applications of the method (AlphaFold). Perhaps it is not so relevant at a scientific level, but it is in the sense of beginning to translate a discovery into possible applications.” Among these applications, Márquez highlights its use to accelerate “research in genetic diseases and particularly rare diseases, since it helps generate hypotheses about the mechanism that causes the disease.”