04/13/2021 | Trends

The "Holy Grail" of structural biology

one step closer through special software

For decades, the computational prediction of the 3D structures of proteins has been considered the Holy Grail of structural biology. Progress was moderate. Now, advanced computing is finally making a difference. The impact will be huge.

Imagine that you have a chain made of 100 links of different sizes, and some of these links are magnetic. Others bear a patch of Velcro® tape. If you let this chain tumble down on the floor – would you be able to predict the shape of the heap and where which link would end up?

That is, in a nutshell, the problem scientists face when trying to predict the structure of proteins. Proteins are the building blocks and the machinery of cells. They consist of chains of amino acids, some of which can form bonds with each other beyond their neighbours in the linear sequence of the protein. As “form dictates function” is the axiom of molecular biology, the elucidation of the resulting three-dimensional structures of proteins is key to understanding their biological functions: mode of action studies in drug discovery, the engineering of binding properties or enzymatic activities – all modern medical and biotechnical applications depend on accurate structural information.

Predicting protein structures is complicated

During the past 100 years, powerful experimental methods have been developed to analyse the 3D structure of proteins: X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy and, more recently, cryo-electron microscopy (cryo-EM) have matured to high levels of performance. However, they suffer from inherent limitations, and some methods, crystallography with synchrotron beams in particular, come with a chilling price tag. The experiments require a lot of lengthy, tedious work – and luck, starting with the preparation of the material. The crystallisation of proteins is a major bottleneck that usually requires an artist's skill and patience, sometimes over years. Numerous proteins, mostly found in cell membranes, refuse to crystallise at all. Fortunately, recent cryo-EM methods can bypass this barrier.

Computationally predicting the correct 3D structures of proteins from their amino acid sequences would be the ideal method. It has been considered the Holy Grail of structural biology for decades and would certainly be a huge boon to life sciences and drug discovery. Unfortunately, it is a very hard scientific problem: the number of possible conformations is astronomically large, and many of them lie at similar energy levels. Consequently, the advancements have been moderate despite increasing computing power and a growing stream of structural data.
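To give a feel for the size of this search space, here is a rough back-of-the-envelope count in Python. The numbers are illustrative assumptions, not figures from this article: a chain of 100 residues, only three possible conformations per residue, and a hypothetical sampling rate of 10^13 conformations per second.

```python
# Rough, illustrative estimate of the conformational search space.
# All three numbers below are assumptions chosen for illustration.
residues = 100                  # chain length, as in the 100-link analogy
states_per_residue = 3          # assumed number of conformations per residue
samples_per_second = 10 ** 13   # hypothetical sampling rate

conformations = states_per_residue ** residues
seconds_per_year = 365 * 24 * 3600
years = conformations // (samples_per_second * seconds_per_year)

print(f"{conformations:.2e} conformations")
print(f"about {years:.1e} years to try them all")
```

Even under these deliberately modest assumptions, simply trying every conformation is out of the question, which is why prediction methods have to be far smarter than brute-force enumeration.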

The biennial Critical Assessment of Techniques for Protein Structure Prediction (CASP) contest has been monitoring the progress in the field since 1994. For many years, the median accuracy of structures predicted by the leading software solutions never exceeded a score of 40 in the Global Distance Test (GDT). GDT values range from 0 to 100 and represent the percentage of amino acid residues that lie within a threshold distance of their correct position. A GDT score of 90 and above is considered to be competitive with experimental methods of protein structure elucidation.
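The idea behind such a score can be sketched in a few lines of Python. This is a simplified illustration, not the official CASP evaluation code: it assumes that the predicted and the reference coordinates (one point per residue, in ångström) have already been superimposed, and it averages over the cutoffs 1, 2, 4 and 8 Å commonly used for the GDT_TS variant. The function names and the toy coordinates are hypothetical.

```python
import math

def percent_within(pred, ref, cutoff):
    """Percentage of residues whose predicted position is within `cutoff` Å of the reference."""
    hits = sum(1 for p, r in zip(pred, ref) if math.dist(p, r) <= cutoff)
    return 100.0 * hits / len(ref)

def gdt_ts(pred, ref, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Average of the per-cutoff percentages (simplified: no superposition search)."""
    return sum(percent_within(pred, ref, c) for c in cutoffs) / len(cutoffs)

# Toy example: four residues, predicted positions displaced by 0.5, 1.5, 3 and 10 Å.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 10.0, 0.0)]

score = gdt_ts(predicted, reference)
print(score)  # 56.25
```

In the toy example, one residue is within 1 Å, two within 2 Å and three within both 4 Å and 8 Å, so the four percentages (25, 50, 75, 75) average to 56.25.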

In 2018, DeepMind’s AlphaFold software entered the competition (CASP13) and raised the bar to about 60. In November 2020, the latest version achieved a median score of 92.4 across all targets and a median score of 87.0 for the very hardest protein targets from the “free modelling” category.

For the first time, software generally achieved the accuracy levels of the experimental methods. With computing taking only a few days, this achievement marks a sensational breakthrough that many scientists never expected to see in their lifetime. The impact will be huge: theoretically, several million protein sequences stored in genome databases could be structurally analysed, far exceeding the 160,000 structures that have been collected in the Protein Data Bank (PDB) since 1971. Novel proteins from pathogens, for example, could be studied almost immediately, accelerating drug discovery to unprecedented speed, and the development of biocatalysts would benefit enormously from the availability of the 3D structures of the variants of an enzyme.

How does AlphaFold work?

A full research paper is expected to be published in the first half of 2021, but the company has already provided an overview. A folded protein can be thought of as a “spatial graph”, with nodes representing residues and edges connecting the residues in close proximity. This graph is essential for understanding the physical interactions within proteins as well as their evolutionary history. The latest version of AlphaFold is an attention-based neural network system that attempts to interpret the structure of this graph while “reasoning” over the implicit graph it is building. Evolutionarily related sequences, multiple sequence alignment, and a representation of amino acid residue pairs are used to refine this graph. The system iteratively develops strong predictions of the underlying physical structure of the protein, resulting in highly accurate structures. Using an internal confidence measure, the software can also predict the reliability of partial structures. The system was trained on the 160,000 publicly available protein structures from the Protein Data Bank together with a large number of protein sequences of unknown structure. It used sixteen third-generation tensor processing units (TPUv3s), which are designed for machine learning computations and are roughly equivalent in performance to 100 to 200 graphics processing units (GPUs) used in PCs.
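The “spatial graph” picture can be made concrete with a small Python sketch. This is not AlphaFold's code and says nothing about how the neural network works; it only shows how a graph of residues in close proximity could be derived from known coordinates. The function name, the toy coordinates and the 6 Å cutoff are assumptions made for illustration.

```python
import math
from itertools import combinations

def contact_graph(coords, cutoff):
    """Build an adjacency map: residue index -> set of residues within `cutoff` Å."""
    graph = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            graph[i].add(j)
            graph[j].add(i)
    return graph

# Toy "hairpin": six residues, three running out and three running back,
# so that residues far apart in the sequence end up close in space.
coords = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
    (7.6, 5.0, 0.0), (3.8, 5.0, 0.0), (0.0, 5.0, 0.0),
]

graph = contact_graph(coords, cutoff=6.0)
print(graph[0])  # neighbours of the first residue
```

In this toy chain, the first residue is connected both to its sequence neighbour and to the last residue, which folds back next to it. Edges of this second kind, between residues that are distant in the sequence but close in space, are what a prediction method ultimately has to get right.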

The rational de novo design of proteins with desired functions in silico is the logical next step, which may once again take decades. However, its eventual integration into an in silico workflow of strain development by metabolic engineering, genome synthesis and computational bioprocess development will certainly mark the era of completely digital biotechnology.


Dr Karsten Schürrle

Dr Karsten Schürrle is a bioeconomy expert and spokesman at DECHEMA e.V. He coordinates the activities of several working groups from DECHEMA’s biotechnology division, e.g. bioinformatics, synthetic biology and chemical biology, and is involved in publicly funded research projects.



