A look at: “Computational Protein Design”

Protein Purification Lab

ARVYS Proteins Inc. provides a full spectrum of protein biochemistry services – recombinant protein expression in bacterial, insect and mammalian cells, protein purification, refolding, assays and assay development, protein characterization, fermentation and endotoxin removal.

Outsource your protein biochemistry projects to ARVYS and enjoy superior results, team expertise and customer support after project completion.

Today we look at a publication from Humana Press.

Ilan Samish
Department of Plants and Environmental Sciences, Weizmann Institute of Science, Rehovot, Israel




This first book devoted to Computational Protein Design (CPD) aims to present the latest know-how on CPD methods, covering the process, successes, and pitfalls of the field. The book is organized to introduce the general methodology and main challenges first, followed by descriptions of specific software and applications. As the description below shows, the chapters can be clustered in more than one way, each highlighting a different aspect of the field.

While there has not been a book dedicated to CPD, books on protein design have often included chapters on CPD. Here, following a chapter on the framework of CPD (Chapter 1) and a summary of past achievements and future challenges (Chapter 2), a chapter on the experimental aspects of production of the designed protein is presented (Chapter 3). Beyond the need to understand the experimental aspects of the computational endeavor, this is to remind us that the final outcome of the computational process is the production of a real protein.

It is widely considered that the global minimum energy conformation (GMEC) reflects the actual native structure of the protein. The protein design process is intrinsically computationally intensive, as sequence and structure space should be rigorously sampled in the search for the GMEC of the requested target. Deterministic search methods (Chapter 4), of which dead-end elimination (DEE) was among the first to be used, are guaranteed to find the GMEC, while stochastic methods are not. Other methods, e.g., the A* search algorithm, were optimized to run in parallel, taking advantage of graphics processing unit (GPU) infrastructure (Chapter 13). Complementarily, the CPD effort should consider the solvating milieu, e.g., via a geometric potential (Chapter 5). In addition, the residue, the core building block on which CPD focuses, should be analyzed and predicted with respect to its phylogenetic, structural, and energetic properties. These properties should be treated according to the immediate and possibly changing microenvironment, e.g., as in protein–protein complexes (Chapter 6). The GMEC considers a single minimum conformation and can be applied for the redesign of a given scaffold (Chapter 10), for requested functional motifs (Chapter 11), or for emphasizing specific types of available data, e.g., evolutionary information (Chapter 12). Yet proteins in their native physiological surroundings are dynamic ensembles that intrinsically require conformational dynamics. As such, it is important to design the protein a priori as a multistate entity (Chapter 7), a characteristic that can be introduced by integrating into the design process methods that analyze dynamics, such as molecular dynamics (Chapter 8) or normal mode analysis (Chapter 9).
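The idea of a deterministic search that provably keeps the GMEC can be made concrete on a toy problem. The following is a minimal Python sketch, not the method of any particular chapter or software package. It assumes a pairwise-decomposable energy (self terms plus pair terms) over a small discrete set of side-chain conformations (rotamers) per position, it uses the Goldstein variant of the DEE criterion, and it fills the energy tables with random numbers. All function names are hypothetical.

```python
import itertools
import random


def pair_energy(pair_e, i, r, j, s):
    """Pair energy between rotamer r at position i and rotamer s at position j."""
    return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]


def total_energy(assignment, self_e, pair_e):
    """Self energies plus all pair energies for one full rotamer assignment."""
    n = len(assignment)
    energy = sum(self_e[i][assignment[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            energy += pair_e[(i, j)][assignment[i]][assignment[j]]
    return energy


def dead_end_elimination(self_e, pair_e):
    """Prune rotamers that cannot be part of the GMEC (Goldstein criterion).

    Rotamer r at position i is removed if some other rotamer t at i is
    strictly better no matter which surviving rotamers the other positions use.
    """
    n = len(self_e)
    alive = [set(range(len(rotamers))) for rotamers in self_e]
    changed = True
    while changed:  # repeat until a full pass removes nothing
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                for t in sorted(alive[i]):
                    if t == r:
                        continue
                    gap = self_e[i][r] - self_e[i][t]
                    for j in range(n):
                        if j != i:
                            # worst case for t relative to r at neighbor j
                            gap += min(
                                pair_energy(pair_e, i, r, j, s)
                                - pair_energy(pair_e, i, t, j, s)
                                for s in alive[j]
                            )
                    if gap > 0:  # r always loses to t, so r is a dead end
                        alive[i].discard(r)
                        changed = True
                        break
    return alive


def brute_force_gmec(self_e, pair_e, candidates):
    """Enumerate all combinations of candidate rotamers; return the lowest-energy one."""
    return min(
        itertools.product(*(sorted(c) for c in candidates)),
        key=lambda a: total_energy(a, self_e, pair_e),
    )


def make_toy_problem(n_positions=4, n_rotamers=3, seed=0):
    """Random energy tables (an assumption for illustration, not real force-field values)."""
    rng = random.Random(seed)
    self_e = [[rng.uniform(0, 10) for _ in range(n_rotamers)]
              for _ in range(n_positions)]
    pair_e = {
        (i, j): [[rng.uniform(-1, 1) for _ in range(n_rotamers)]
                 for _ in range(n_rotamers)]
        for i in range(n_positions)
        for j in range(i + 1, n_positions)
    }
    return self_e, pair_e
```

Calling `dead_end_elimination` on the tables from `make_toy_problem()` and then `brute_force_gmec` on the surviving rotamers should give the same assignment as brute force over the full space, only with fewer combinations to enumerate whenever pruning succeeds. A stochastic search such as Monte Carlo sampling carries no such guarantee, which is the distinction drawn above.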

The computational design scheme can be tailored to specific types of proteins or domains, which in turn should be assessed as to their resemblance to the requested domain or specific designated characteristic. Examples include protein–protein interaction interfaces (Chapter 14), drug-resistance mutations (Chapter 15), symmetric proteins of identical sequence repeats (Chapter 16), self-assemblies exploiting synthetic amino acids (Chapter 17), oligomerized conformations of the defensins (Chapter 18), ligand-binding proteins (Chapter 19), proteins with reduced immunogenicity (Chapter 20), antibodies (Chapter 21), membrane curvature-sensing peptides (Chapter 22), and allosteric drug-binding sites within proteins (Chapter 23). Taken together, these application focus areas present the breadth of the CPD field along with the intrinsic achievements and challenges upon examining the “devil” in the details of key examples.

The general field of protein design, let alone its computational aspect, is expected to grow exponentially in quality and quantity alike. This change is fostered by the need to expand protein space for understanding biology, for applying biotechnology, and for expanding pharmaceuticals from the common small molecules to biologics, i.e., specific and side-effect-free proteins. Importantly, while scientific research on proteins is often focused on pharmaceutical applications, CPD presents the possibility of expanding the use of proteins in food-tech and white biotechnology, namely, the use of proteins for industrial applications. In addition, the field is nurtured by the exponential increase in raw sequence and structure data, and by the increase in cost-effective computational hardware in general and hardware tailored to protein applications in particular. No less important is the careful feedback loop of quantitative parameterization of sequence and fold space, followed by software design that efficiently tests this parameterization and produces novel protein designs, which in turn can be materialized and characterized experimentally.



Computational protein design (CPD) has established itself as a leading field in basic and applied science with a strong coupling between the two. Proteins are computationally designed from the level of amino acids to the level of a functional protein complex. Design targets range from increased thermo- (or other) stability to specific requested reactions such as protein–protein binding, enzymatic reactions, or nanotechnology applications. The design scheme may encompass small regions of the proteins or the entire protein. In either case, the design may aim at the side-chains or at the full backbone conformation. Herein, the main framework for the process is outlined, highlighting key elements in the CPD iterative cycle. These include the very definition of CPD, the diverse goals of CPD, components of the CPD protocol, methods for searching sequence and structure space, scoring functions, and augmenting the CPD with other optimization tools. Taken together, this chapter aims to introduce the framework of CPD.

Key words Computational protein design, Protein structure prediction, Structural bioinformatics, Computational biophysics, Synthetic biology, Negative design

1 Introduction

“Most people make the mistake of thinking design is what it looks like. People think it’s this veneer—that the designers are handed this box and told, ‘Make it look good!’ That’s not what we think design is. It’s not just what it looks like and feels like. Design is how it works.”

Steve Jobs, Apple’s CEO, in an interview with the New York Times, Nov. 30, 2003, “The Guts of a New Machine,” http://www.nytimes.com/2003/11/30/magazine/the-guts-of-a-new-machine.html

Objectives of Computationally Designed Proteins

It is important to define the objectives underlying the development and use of the field, namely, what are the computationally designed proteins expected to achieve? Such goals include basic and applied goals alike and can be divided by the type of basic understanding of the protein and the type of application pursued:

  1. Protein folding or the inverse folding problem: the entropic hydrophobic effect [40] underlying protein folding has long been known, yet the details of protein folding are still not fully elucidated. The inverse protein folding problem, namely, the problem of finding which amino acid sequences fold into a known three-dimensional (3D) structure [37, 41, 42], is in essence the holy grail of protein design.
  2. Specificity: the design of specific interactions (protein–protein or protein–ligand) is related to the application of negative design rules (described below). Here, one can a priori focus the design efforts on regions that determine specificity or, alternatively, add similar templates (decoys or related molecules) to examine the target affinity with respect to a background of unwanted interactions.
  3. Stability and extremophilicity: our body invests energy in maintaining a mild, mesophilic environment for proteins, including a narrow range of temperature, salt concentration, pH, etc. Yet designed proteins are often expected to function in hostile environments, whether these are fermenters in the biotechnology industry, where protein yield is a goal, or synthetic biology applications, e.g., bio-detergents. Concomitantly, the CPD approach provides a unique method to study the very determinants underlying the requested extremophile trait.
  4. Synthetic biology: natural proteins were optimized according to the needs of organisms and the constraints of the evolutionary process, e.g., not enabling large leaps at a time and not focusing on traits that do not affect organism survivability. In vitro evolution attempts to harness turbo-mode rules of evolution with new survival assays to produce proteins of interest. Nevertheless, the process is still constrained by the aforementioned components. Taken together, CPD provides an important toolbox for synthetic biology applications [43, 44].
  5. Negative design rules: while intuition naturally focuses on the direct objective, avoiding the unwanted outcome is often no less important. CPD offers a focused path to study negative design rules, which are often overlooked due to methodological challenges in studying them. In other words, while the natural focus of biology is answering the question “how do things work?”, this is often the easy question. The question that is no less important is: “how do things not work in the wrong direction?” The two questions are not two sides of the same coin but rather two complementary fields that only when combined answer the question of “how do things work in a living system?” A good example of combining positive and negative design rules in a related field is the success of drugs as given by the therapeutic index (TI). The index combines the positive effect of manipulating the requested target with the negative side effects, generally expressed by the lethal dose (LD), which usually stem from a lack of specificity and/or from toxicity of the drug, its metabolites, or its degradation products (see Note 1).

In summary, while evolution (in vivo or in vitro) examines the overall fitness of the organism, CPD enables a focused design with positive and negative rules alike. These rules can be statistical, knowledge-based rules, where the underlying physics is not fully understood or may not be fully parameterized, or, alternatively, biophysical rules underlying specific enthalpic or entropic contributions, or the lack thereof, to the requested design.
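Two of the quantities mentioned in the objectives above, specificity against a background of decoys and the therapeutic index, can be illustrated numerically. The following is a minimal Python sketch under stated assumptions, not a scoring function from the book. The first function assumes a simple Boltzmann model in which the target state competes with unwanted decoy states, with energies in kcal/mol and kT of about 0.593 kcal/mol near room temperature. The second assumes the classical definition of the therapeutic index as LD50 divided by ED50, where the ED50 (median effective dose) is an assumption added here, since the text only names the lethal dose. Both function names are hypothetical.

```python
import math


def target_population(target_energy, decoy_energies, kT=0.593):
    """Boltzmann fraction of the target state among the target and all decoys.

    Positive design lowers target_energy; negative design raises the
    decoy energies. Both increase the returned fraction.
    """
    energies = [target_energy] + list(decoy_energies)
    e_min = min(energies)  # shift by the minimum for numerical stability
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    return weights[0] / sum(weights)


def therapeutic_index(ld50, ed50):
    """Classical therapeutic index: median lethal dose over median effective dose."""
    if ed50 <= 0:
        raise ValueError("ed50 must be positive")
    return ld50 / ed50
```

For instance, a target that is isoenergetic with a single decoy is populated only half of the time, so improving the target energy alone may not be enough if a competing state is stabilized as well.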



