
How the team at DeepMind are changing the world

The team at Google’s DeepMind are one of the most impactful groups in the field of AI today. From beating the world champion of Go (a very hard Chinese board game) to beating top professional StarCraft II players, this team is truly ready to overcome any challenge. While beating up nerds is a very honourable cause, DeepMind has also been tackling another big challenge: protein folding.

Every 2 years since 1994, the Critical Assessment of protein Structure Prediction (CASP) has challenged research teams to predict protein structures, with the goal of reducing the time, effort, and money needed to do so. The organizers release around 100 amino acid sequences whose structures have been found in the lab but not revealed to the public. The competing teams then attempt to predict the structures of these proteins and are graded using the global distance test (GDT), which measures on a scale from 0 to 100 how close a predicted structure is to the shape identified in the lab. In 2020, DeepMind shocked the world by predicting the structures of over two thirds of the proteins with a score above 90, or within the width of an atom. For any score above 90, the differences between the predicted and actual structure could be down to problems with the experiments. Proteins are also fundamentally floppy, so a score above 90 could be within the range of natural variation.
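To make the 0 to 100 GDT scale a bit more concrete, here is a minimal Python sketch of the commonly used GDT_TS variant: the average, over cutoffs of 1, 2, 4, and 8 ångströms, of the percentage of residues whose predicted position lands within that cutoff of the lab position. It assumes the two structures have already been superimposed and uses one (x, y, z) point per residue; the real CASP scoring also searches over many superpositions, which this sketch skips. The function name and the toy coordinates are made up for illustration.

```python
import math

def gdt_ts(predicted, experimental, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS on a 0-100 scale.

    Assumes both structures are already superimposed and that each list
    holds one (x, y, z) coordinate per residue, in the same order.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("structures must be non-empty and the same length")

    # Distance between each predicted residue and its lab counterpart.
    distances = [math.dist(p, e) for p, e in zip(predicted, experimental)]

    # Fraction of residues that fall within each distance cutoff.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]

    # Average the four fractions and rescale to 0-100.
    return 100.0 * sum(fractions) / len(cutoffs)

# Toy example: four residues, off by 0.5, 1.5, 3.0, and 9.0 angstroms.
experimental = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
predicted = [(0, 0, 0.5), (1, 0, 1.5), (2, 0, 3.0), (3, 0, 9.0)]
print(gdt_ts(predicted, experimental))  # 56.25
```

A perfect prediction scores 100, and a residue that is wildly off (like the last one in the toy example) drags down every cutoff at once.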

How AlphaFold2 works

AlphaFold2 was developed in 2020. It was trained on data from almost 170 000 publicly available protein structures and used 16 TPUv3s, which is equivalent to ~100–200 GPUs, or in layman’s terms: a whole freaking lot of computing power. When you enter a protein sequence, it builds something called a multiple sequence alignment (MSA), which lines up the input with similar sequences from related proteins. It then feeds this data into the neural network (a program that mimics the operation of the human brain by identifying phenomena, weighing options, and arriving at conclusions), which highlights the differences and similarities between these sequences and creates a set of “pair representations” describing every pair of amino acids in the protein. This allows the neural network to encode the co-evolutionary relationships (what effect changing one amino acid has on another) between them based on the MSA. AlphaFold2 uses a neural network called the Evoformer that looks at and updates both the MSA and the pair representations at the same time, which allows it to reason about evolutionary relationships (which amino acids stay the same over time and which ones change).

After all of this, it finds the structure by taking the updated pair representation and the Evoformer’s updated representation of the original sequence and turning them into the protein backbone. It then places the amino acid side chains and refines their positions. It also performs an iterative process called “recycling”, where it feeds the MSA, the pair representations, and the 3D structure back into the neural network and generates a new 3D structure. It repeats this 3 times, which lets it improve the final structure’s accuracy immensely.
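To get a feel for what “co-evolutionary relationships” in an MSA look like, here is a small Python sketch. It is not how the Evoformer works; it swaps in a classic, much simpler statistic (mutual information between alignment columns) to show the kind of signal hiding in an MSA. The four toy sequences are invented for illustration, and all the function names are my own.

```python
import math
from collections import Counter

# Toy alignment (invented): each row is a related sequence,
# each column is one aligned position in the protein.
TOY_MSA = [
    "ACDK",
    "ACEK",
    "GCDR",
    "GCER",
]

def column(msa, i):
    """All the amino acids found at position i across the alignment."""
    return [seq[i] for seq in msa]

def entropy(symbols):
    """Shannon entropy in bits: 0 means the position never changes."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def mutual_information(msa, i, j):
    """How much knowing position i tells you about position j (in bits)."""
    col_i, col_j = column(msa, i), column(msa, j)
    joint = list(zip(col_i, col_j))
    return entropy(col_i) + entropy(col_j) - entropy(joint)

def pair_scores(msa):
    """An L x L table with one score for every pair of positions."""
    length = len(msa[0])
    return [
        [mutual_information(msa, i, j) for j in range(length)]
        for i in range(length)
    ]

print(entropy(column(TOY_MSA, 1)))         # position 1 is always "C": no variation
print(mutual_information(TOY_MSA, 0, 3))   # positions 0 and 3 always change together
print(mutual_information(TOY_MSA, 0, 2))   # positions 0 and 2 change independently
```

In the toy alignment, position 1 is fully conserved, positions 0 and 3 always mutate together (a hint that they may touch in the folded protein), and positions 0 and 2 vary independently. The L x L table from `pair_scores` has the same shape as a pair representation, one entry per pair of amino acids, although AlphaFold2 learns far richer features than a single number.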

Key Players in Protein Folding

  • Google DeepMind: If you’ve been reading this article (which would actually be quite surprising), it should be obvious that DeepMind belongs on this list. They are far and away the most impactful team in this space, and have recently released AlphaFold3. If you’re into the applications of AI in, well, anything really, then you should definitely keep an eye on them.
  • RoseTTAFold: Developed at the University of Washington’s Institute for Protein Design, RoseTTAFold was inspired by the AlphaFold2 framework. Unlike AlphaFold, it uses a 3-track system, which means it considers the patterns in protein sequences, how the amino acids interact with each other, and the 3D structure, all at the same time. RoseTTAFold is notable because of its speed and low memory requirements. It requires only an 8GB GPU vs 24GB for DeepMind, and can run in as little as about 10 minutes, compared with AlphaFold, which reportedly took a few days to run its predictions, while conventional methods can take years. RoseTTAFold, however, has lower accuracy, getting a score of 80 on most of its tests, compared with 90 for AlphaFold.
  • Meta AI: Meta AI is working in a field called metagenomics (it’s a real field, I promise). Metagenomics, also called environmental genomics, studies the genetic material, and the proteins it encodes, of microbes that reside in the environment (shocker, I know). From the soil, to the stomach, to hydrothermal vents at the bottom of the ocean, these are the kinds of proteins Meta’s looking to examine. Their evolutionary scale modeling (ESM) approach uses a language model (an AI that predicts what will go next; for example, with “To be or not to”, it should predict “be”) to predict the structure of proteins. They were able to predict the structures of over 600 million proteins in just 2 weeks using approximately 2 000 GPUs, or in layman’s terms, a whole freaking lot of computing power.
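The “To be or not to” example from the Meta AI entry can be turned into a few lines of Python. This is a deliberately tiny toy: it just counts which word most often follows each word in a training sentence, which is nothing like the scale or architecture of Meta’s ESM models, but it shows the basic “predict what comes next” idea. The function names and the training sentence are my own choices for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count, for every token, which tokens come right after it."""
    following = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        following[current][nxt] += 1
    return following

def predict_next(model, token):
    """Return the most common follower of `token`, or None if unseen."""
    if token not in model:
        return None
    return model[token].most_common(1)[0][0]

corpus = "to be or not to be that is the question".split()
model = train_bigram(corpus)

# The last word of "to be or not to" is "to", so we ask what follows it.
print(predict_next(model, "to"))  # be
```

A protein language model plays the same game with amino acids instead of words: the “sentence” is a sequence like MKTAYI..., and the patterns it learns from millions of sequences turn out to carry information about structure.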

Economic Incentives

  • Price: Right now if you wanted to use X-ray crystallography to figure out the structure of a new protein (“de novo”), it could cost anywhere between 7 500 and 300 000 CAD. These new AI tools could help greatly reduce the price of finding the structure of a protein.
  • Time: In addition to costing a whole freaking lot, finding the structure of a new protein can take a lot of time, usually on the scale of years; AlphaFold2, with accuracy comparable to that of X-ray crystallography, can find the structure of a new protein within a weekend.
  • Democratization: The invention of AlphaFold2 could greatly help level the playing field in medicine and give startups with modest funding (5–10 million) a fighting chance.

TL;DR

  • Protein Folding: The process by which a protein’s sequence of amino acids determines its 3D structure. Conventional methods use X-rays to measure the structure of a protein, but can take years and hundreds of thousands of dollars.
  • Machine Learning: A type of artificial intelligence that uses statistical models to learn from past data and make predictions on new data.
  • AlphaFold2: Developed by Google’s DeepMind team. Able to predict the structure of a protein to within the width of an atom in a few days.
  • Other Technologies: Similar technologies include RoseTTAFold, which is able to predict the structure of a protein in about 10 minutes and with 1/3 of the memory, but is less accurate. Meta AI’s approach is different, as it uses a language model to predict the structure of a protein.

Conclusion

AI is truly going to change the world; from automating low-skilled work to performing reconnaissance in dangerous environments, there seems to be no limit to what it can be applied to. The use of AI in protein folding may be one of its most impactful applications, and the one that affects the most people the soonest. The importance of this technology in helping develop new medicines and vaccines truly cannot be overstated. Edward Teller, father of the hydrogen bomb, once said, “The science of today is the technology of tomorrow”. Never before has that been more true than today, with the developing field of AI.

About me:

My name is Amitav Krishna, and I am currently in grade 9. My passions include programming, technology, and writing. My current interest is in AI and how we can use it to help our everyday life. If you enjoyed this article or would like to give some feedback, feel free to reach out via email or schedule a meeting with me on Calendly.

Citations

DeepMind. (n.d.). AlphaFold: A solution to a 50-year-old grand challenge in biology. Retrieved from https://deepmind.google/technologies/alphafold/

Heaven, W. D. (2020, November 30). DeepMind’s protein-folding AI has solved a 50-year-old grand challenge of biology. MIT Technology Review. Retrieved from https://www.technologyreview.com/2020/11/30/1012712/deepmind-protein-folding-ai-solved-biology-science-drugs-disease/

LibreTexts. (n.d.). Protein Folding. In Biological Chemistry. Retrieved from https://chem.libretexts.org/Bookshelves/Biological_Chemistry/Supplemental_Modules_(Biological_Chemistry)/Proteins/Protein_Structure/Protein_Folding

LibreTexts. (n.d.). Protein Structure. In BIS 2A: Introductory Biology. Retrieved from https://bio.libretexts.org/Courses/University_of_California_Davis/BIS_2A%3A_Introductory_Biology_(Britt)/01%3A_Readings/1.17%3A_Protein_Structure

Stated Clearly. (2022, January 13). AlphaFold: The making of a scientific breakthrough [Video]. YouTube. https://www.youtube.com/watch?v=78QUeXVKiJ4

Nobel Prize. (n.d.). Christian B. Anfinsen – Biographical. Retrieved from https://www.nobelprize.org/prizes/chemistry/1972/anfinsen/biographical

Ahern, K., Rajagopal, I., & Tan, T. (n.d.). Structure & Function: Proteins I. In Biochemistry Free For All. LibreTexts. Retrieved from https://bio.libretexts.org/Bookshelves/Biochemistry/Book:_Biochemistry_Free_For_All(Ahern_Rajagopal_and_Tan)/02:_Structure_and_Function/203:_Structure__Function-_Proteins_I

Cooper, G. M. (2000). Protein Structure. In The Cell: A Molecular Approach (2nd ed.). Sinauer Associates. Retrieved from https://www.nature.com/scitable/topicpage/protein-structure-14122136/

CASP. (n.d.). Critical Assessment of protein Structure Prediction. Retrieved from https://predictioncenter.org/

European Bioinformatics Institute. (n.d.). A high-level overview of AlphaFold. In AlphaFold: Online Training. Retrieved from https://www.ebi.ac.uk/training/online/courses/alphafold/inputs-and-outputs/a-high-level-overview/

Institute for Protein Design. (2021, July 19). RoseTTAFold: Accurate protein structure prediction accessible to all. Retrieved from https://www.ipd.uw.edu/2021/07/rosettafold-accurate-protein-structure-prediction-accessible-to-all/

Baek, M., DiMaio, F., Anishchenko, I., Dauparas, J., Ovchinnikov, S., Lee, G. R., … & Baker, D. (2021). Accurate prediction of protein structures and interactions using a three-track neural network. Science, 373(6557), 871–876. https://doi.org/10.1126/science.abj8754

Meta AI. (2022, August 15). ESMFold: Advancing protein structure prediction with AI and open science. Retrieved from https://ai.meta.com/blog/protein-folding-esmfold-metagenomics/

Lin, Z., Akin, H., Rao, R., Hie, B., Zhu, Z., Lu, W., … & Rives, A. (2022). Evolutionary-scale prediction of atomic-level protein structure with a language model. bioRxiv. https://doi.org/10.1101/2022.07.20.500902

ESM Metagenomic Atlas. (n.d.). Retrieved from https://esmatlas.com/

ST Bio. (n.d.). Protein Structure Determination Costs. Retrieved from https://stbio.de/kosten_en.html