From Nature.com (Jan. 22):
From protein engineering and 3D printing to detection of deepfake media, here are seven areas of technology that Nature will be watching in the year ahead.
Deep learning for protein design
Two decades ago, David Baker at the University of Washington in Seattle and his colleagues achieved a landmark feat: they used computational tools to design an entirely new protein from scratch. ‘Top7’ folded as predicted, but it was inert: it performed no meaningful biological functions. Today, de novo protein design has matured into a practical tool for generating made-to-order enzymes and other proteins. “It’s hugely empowering,” says Neil King, a biochemist at the University of Washington who collaborates with Baker’s team to design protein-based vaccines and vehicles for drug delivery. “Things that were impossible a year and a half ago — now you just do it.”
Much of that progress comes down to increasingly massive data sets that link protein sequence to structure. But sophisticated methods of deep learning, a form of artificial intelligence (AI), have also been essential.
‘Sequence based’ strategies use the large language models (LLMs) that power tools such as the chatbot ChatGPT (see ‘ChatGPT? Maybe next year’). By treating protein sequences like documents comprising polypeptide ‘words’, these algorithms can discern the patterns that underlie the architectural playbook of real-world proteins. “They really learn the hidden grammar,” says Noelia Ferruz, a protein biochemist at the Molecular Biology Institute of Barcelona, Spain. In 2022, her team developed an algorithm called ProtGPT2 that consistently comes up with synthetic proteins that fold stably when produced in the laboratory [1]. Another tool co-developed by Ferruz, called ZymCTRL, draws on sequence and functional data to design members of naturally occurring enzyme families [2].
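The idea of learning a sequence "grammar" can be illustrated with a deliberately tiny sketch. This is not how ProtGPT2 or ZymCTRL work internally (those are large neural language models); it is a toy bigram model that only counts which residue tends to follow which. The training strings, function names and parameters below are all hypothetical and chosen for illustration.

```python
import random
from collections import defaultdict

# Hypothetical toy training set: made-up amino-acid strings, not real proteins.
TRAIN = ["MKVLAAGIV", "MKTLAAGLV", "MKVLSAGIV"]

def train_bigram(seqs):
    """Count how often each residue follows each other residue."""
    counts = defaultdict(lambda: defaultdict(int))
    for s in seqs:
        padded = "^" + s + "$"  # "^" marks the start, "$" marks the end
        for a, b in zip(padded, padded[1:]):
            counts[a][b] += 1
    return counts

def sample(counts, rng, max_len=50):
    """Generate a new sequence by repeatedly sampling the next residue."""
    out = []
    cur = "^"
    while len(out) < max_len:
        candidates = list(counts[cur].keys())
        weights = [counts[cur][c] for c in candidates]
        cur = rng.choices(candidates, weights=weights, k=1)[0]
        if cur == "$":  # the model chose to end the sequence
            break
        out.append(cur)
    return "".join(out)

model = train_bigram(TRAIN)
print(sample(model, random.Random(0)))
```

Each generated string reuses only residue-to-residue transitions seen in training, which is the crudest possible version of the "hidden grammar" Ferruz describes. A real protein language model conditions on far longer context than one preceding residue.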
ChatGPT? Maybe next year
Readers might detect a theme in this year’s technologies to watch: the outsized impact of deep-learning methods. But one such tool did not make the final cut: the much-hyped artificial-intelligence (AI)-powered chatbots. ChatGPT and its ilk seem poised to become part of many researchers’ daily routines and were feted as part of the 2023 Nature’s 10 round-up (see go.nature.com/3trp7rg). Respondents to a Nature survey in September (see go.nature.com/45232vd) cited ChatGPT as the most useful AI-based tool and were enthusiastic about its potential for coding, literature reviews and administrative tasks.
Such tools are also proving valuable from an equity perspective, helping those for whom English isn’t their first language to refine their prose and thereby ease their paths to publication and career growth. However, many of these applications represent labour-saving gains rather than transformations of the research process. Furthermore, ChatGPT’s persistent issuing of either misleading or fabricated responses was the leading concern of more than two-thirds of survey respondents. Although worth monitoring, these tools need time to mature and to establish their broader role in the scientific world.
Sequence-based approaches can build on and adapt existing protein features to form new frameworks, but they’re less effective for the bespoke design of structural elements or features, such as the ability to bind specific targets in a predictable fashion. ‘Structure based’ approaches are better for this, and 2023 saw notable progress in this type of protein-design algorithm, too. Some of the most sophisticated of these use ‘diffusion’ models, which also underlie image-generating tools such as DALL-E. These algorithms are initially trained to remove computer-generated noise from large numbers of real structures; by learning to discriminate realistic structural elements from noise, they gain the ability to form biologically plausible, user-defined structures.
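The noise-then-denoise training idea can be sketched in a few lines. This is a generic illustration of the diffusion principle, assuming plain 3D coordinates and a made-up cosine noise schedule; it is not the representation or schedule used by any specific protein-design tool, and no neural network is trained here. All function names and numbers are hypothetical.

```python
import math
import random

def alpha_bar(t, T=100):
    """Fraction of the original signal kept at step t (hypothetical schedule)."""
    return math.cos(0.5 * math.pi * t / T) ** 2

def add_noise(coords, t, rng, T=100):
    """Forward process: blend real coordinates with Gaussian noise."""
    a = alpha_bar(t, T)
    noise = [[rng.gauss(0.0, 1.0) for _ in xyz] for xyz in coords]
    noisy = [[math.sqrt(a) * x + math.sqrt(1 - a) * e
              for x, e in zip(xyz, eps)]
             for xyz, eps in zip(coords, noise)]
    return noisy, noise

def denoise(noisy, predicted_noise, t, T=100):
    """Undo the blend, given an estimate of the noise that was added."""
    a = alpha_bar(t, T)
    return [[(x - math.sqrt(1 - a) * e) / math.sqrt(a)
             for x, e in zip(xyz, eps)]
            for xyz, eps in zip(noisy, predicted_noise)]

def mse(pred, target):
    """Training loss: mean squared error between predicted and true noise."""
    diffs = [(p - q) ** 2 for a, b in zip(pred, target) for p, q in zip(a, b)]
    return sum(diffs) / len(diffs)

# Three made-up points standing in for backbone atom positions.
coords = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0]]
noisy, true_noise = add_noise(coords, t=50, rng=random.Random(0))

# An untrained model that predicts "no noise" has a positive loss;
# a perfect noise prediction has zero loss and recovers the input.
zero_guess = [[0.0] * 3 for _ in coords]
print(mse(zero_guess, true_noise), mse(true_noise, true_noise))
print(denoise(noisy, true_noise, t=50))
```

In a real system, a neural network would be trained to minimise this loss across many structures and noise levels; generation then starts from pure noise and applies the learned denoiser step by step.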
The RFdiffusion software [3], developed by Baker’s lab, and the Chroma tool [4], by Generate Biomedicines in Somerville, Massachusetts, exploit this strategy to remarkable effect. For example, Baker’s team is using RFdiffusion to engineer novel proteins that can form snug interfaces with targets of interest, yielding designs that “just conform perfectly to the surface,” Baker says. A newer ‘all atom’ iteration of RFdiffusion [5] allows designers to computationally shape proteins around non-protein targets such as DNA, small molecules and even metal ions. The resulting versatility opens new horizons for engineered enzymes, transcriptional regulators, functional biomaterials and more. [read more]