Section 4: Proteins

Having explored the emergent genome in the form of DNA from a structural and informational perspective, we now move on to the globular polymers called "proteins," the real molecular machines that make things tick. Proteins are the polymers that DNA codes for, and they are the business end of life. They regulate the highly specific chemical reactions that keep living organisms alive. At 300 K (80°F), the approximate temperature of most living organisms, life processes are characterized by tightly controlled, highly specific chemical reactions that take place at a very high rate. In nonliving matter, highly specific reactions tend to proceed extremely slowly. This slow reaction rate is another result of entropy: selecting one highly specific reaction out of the many possible reactions is extremely unlikely. In living systems, these reactions proceed much faster because they are catalyzed by proteins called enzymes. The catalysis of very unlikely chemical reactions is the hallmark of living systems.
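To get a rough feel for how strongly a catalyst can speed up a reaction at 300 K, here is a minimal sketch. It assumes, as a simplification not spelled out in the text, that the reaction rate is proportional to exp(-barrier / kT), where the barrier is a free-energy barrier (so it includes entropy). The function name and the barrier reductions are hypothetical and do not describe any real enzyme.

```python
import math

K_B_EV_PER_K = 8.617333e-5  # Boltzmann constant in eV/K


def rate_enhancement(barrier_reduction_ev: float, temperature_k: float = 300.0) -> float:
    """Factor by which a rate grows when the free-energy barrier drops.

    Assumes the rate is proportional to exp(-barrier / kT), so lowering the
    barrier by barrier_reduction_ev multiplies the rate by
    exp(barrier_reduction_ev / kT).
    """
    kt = K_B_EV_PER_K * temperature_k
    return math.exp(barrier_reduction_ev / kt)


# Hypothetical barrier reductions in eV; not values for any real enzyme.
for reduction in (0.1, 0.3, 0.5):
    factor = rate_enhancement(reduction)
    print(f"barrier lowered by {reduction:.1f} eV -> rate x {factor:.2e}")
```

Because kT at 300 K is only about 0.026 eV, even a modest reduction in the barrier changes the rate by many orders of magnitude under this assumption.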

Figure 13: The enzyme on the left has a much easier time reading DNA than the enzyme on the right due to structural details that are difficult to predict from first principles.

Source: © RCSB Protein Data Bank.

The mystery of how these protein polymers do their magical chemical catalysis is basically the domain of chemistry, and we won't pursue it further here. As physicists, we will turn our attention to the emergent structure of biological molecules. We saw in the previous section how DNA, and its cousin RNA, have a relatively simple structure that leads, ultimately, to the most complex phenomena around. In this section, we will ask whether we can use the principles of physics to understand anything about how the folded structure of proteins, which is incredibly detailed and specific to biological processes, arises from their relatively simple chemical composition.

Proteins: the emergence of order from sequence

As polymers go, most proteins are relatively small, though still much bigger than you might expect to be necessary. A typical protein consists of about 100 to 200 monomer links; larger proteins are typically constructed of subunits, each a smaller ball formed from a single chain. For example, RNA polymerase, the protein that binds to DNA and creates the single-strand polymer RNA, is (in E. coli) a huge protein with about 500,000 times the mass of a hydrogen atom, divided into five subunits. Despite their small size, folded proteins form exceedingly complex structures. This complexity originates from the large number of different monomer units from which the polymers are formed: There are 21 different amino acids. We saw that RNA could form quite complex structures from a choice of four different bases. Imagine the complexity of the structures that can be formed in a protein if you are working with a choice of 21 building blocks.
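A quick count makes the comparison concrete. The sketch below assumes every position along the chain is chosen independently from the full alphabet, and it uses a chain length of 100, the lower end of the typical range quoted above. The function name is illustrative.

```python
import math


def log10_sequence_count(alphabet_size: int, chain_length: int) -> float:
    """Return log10 of the number of distinct linear sequences.

    Assumes each position is chosen independently from the alphabet,
    so the count is alphabet_size ** chain_length.
    """
    return chain_length * math.log10(alphabet_size)


# Illustrative chain length: lower end of the "100 to 200" range in the text.
CHAIN_LENGTH = 100

rna_exponent = log10_sequence_count(4, CHAIN_LENGTH)       # four bases
protein_exponent = log10_sequence_count(21, CHAIN_LENGTH)  # 21 amino acids

print(f"RNA, {CHAIN_LENGTH} bases: about 10^{rna_exponent:.0f} sequences")
print(f"Protein, {CHAIN_LENGTH} residues: about 10^{protein_exponent:.0f} sequences")
```

Under these assumptions, the protein alphabet gives roughly 70 more orders of magnitude of possible sequences than the four-letter RNA alphabet at the same chain length.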

Figure 14: As polarized light passes through corn syrup, which is full of right-handed sugar molecules, its plane of polarization is rotated.

Source: © Technical Services Group, MIT Department of Physics.

Note that you can assign a handedness to the bonding pattern within the protein: Some proteins are left-handed, and others are right-handed. Experimentally, it was observed that naturally occurring biological molecules (as opposed to molecules synthesized in the laboratory) rotate the plane of polarization of light when a beam is passed through a solution of the molecule. It is easy to see this by getting some corn syrup from a store and observing what happens when a polarized laser beam passes through it. First, orient an "analyzing" polarizer so that no laser light passes through it. Then put the syrup in the laser's path before the analyzing polarizer. You will notice that some light now passes through the polarizer. The beam polarization (as you look at it propagating toward you) has rotated counterclockwise, or in a right-handed sense using the right-hand rule. The notation is that the sugar in the syrup is dextrorotatory (D-rotatory), or right-handed. In the case of the amino acids, all but one are left-handed (the L form). Glycine is the one exception: It has mirror symmetry and therefore no handedness.
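How much light leaks through the crossed analyzer depends on how far the syrup rotates the polarization. The sketch below uses Malus's law and assumes ideal polarizers and no absorption in the sample. The rotation angles are illustrative inputs, not measured values for any particular syrup, and the function name is hypothetical.

```python
import math


def transmitted_fraction(rotation_deg: float) -> float:
    """Fraction of the beam intensity passing a crossed analyzer.

    The analyzer starts at 90 degrees to the beam polarization, so no light
    passes. If the sample rotates the polarization by rotation_deg, Malus's
    law gives I/I0 = cos^2(90 deg - rotation) = sin^2(rotation).
    """
    return math.sin(math.radians(rotation_deg)) ** 2


# Illustrative rotation angles, not measured values for any particular syrup.
for angle in (0, 10, 30, 90):
    fraction = transmitted_fraction(angle)
    print(f"rotation {angle:3d} deg -> transmitted fraction {fraction:.3f}")
```

Note that the transmitted intensity alone does not reveal the sense of rotation; for that, you turn the analyzer until the beam is extinguished again and note which way you had to turn it.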

We know how to denote the three-dimensional structure of a protein in a rather concise graphical form. But when you actually see the space-filling picture of a protein—what it would look like if you could see something that small—your physicist's heart must stop in horror. It looks like an ungodly tangled ball. Who in their right mind could possibly be interested in this unkempt beast?

Figure 15: The structure of myoglobin (left) and the form it actually takes in space (right).

Source: © Left: Wikimedia Commons, Public Domain. Author: AzaToth, 27 February 2008; right: Wikimedia Commons Creative Commons Attribution-Share Alike 3.0 Unported License. Author: Thomas Splettstoesser, 10 July 2006.

We can say some general things about protein structure. First, the nearest-neighbor interactions are not totally random; they often show a fair amount of order. Experiments have revealed that nature uses several "motifs" in forming a globular protein, roughly specified by the choice of amino acids which naturally combine to form a structure of interest. These structures, determined primarily by nearest-neighbor interactions, are called "secondary structures." We commonly see three basic secondary structures: the α-helix, the β-strand (these combine into sheets), and the polyproline helix.

We can now begin to roughly build up protein structures, using the secondary structures as building blocks. For example, one of my favorite proteins is myoglobin because it is supposed to be simple. It is not. We can view it as basically a construction of several alpha helices which surround a "prosthetic group," the highly conjugated heme structure used extensively in biology. Biologists often regard myoglobin as a simple protein. One possible function is to bind oxygen tightly as a storage reservoir in muscle cells. There may be much more to this molecule than meets the eye, however.

As their name indicates, globular proteins are roughly spherical in shape. Also, the polarity of the various amino acids covers quite a large range, and the protein is designed (unless it is membrane-bound) to exist in water, which is highly polar. As biologists see it, the polar amino acids are predominantly found in the outer layer of the globular protein, while the non-polar amino acids reside deep in the interior. This arrangement is not because the non-polar amino acids have a strong attraction for one another, but rather because the polar amino acids have strong interactions with water (the so-called hydrophilic effect) and because introducing non-polar residues into water gives rise to a large negative entropy change (the so-called hydrophobic effect). So, physics gives us some insight into structure, through electrostatic interactions and entropy.
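The entropy argument can be written compactly using the standard thermodynamic free energy at constant temperature and pressure. This is a sketch of the bookkeeping only; the symbols are the usual textbook ones and are not defined elsewhere in this unit.

```latex
% Free energy change for moving a residue from the protein interior into water
\Delta G = \Delta H - T\,\Delta S

% Hydrophobic effect: exposing a non-polar residue orders the surrounding water,
% so \Delta S < 0 and the entropic term raises the free energy:
-T\,\Delta S > 0 \quad \text{when} \quad \Delta S < 0
```

Burying the non-polar residues in the interior avoids this entropic penalty, which is why the folded arrangement is favored even without a strong attraction among the non-polar residues themselves.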

Figure 16: A schematic of how minimizing the free energy of a molecule could lead to protein folding.

One kind of emergence we wish to stress here is that, although you might expect a polymer with potentially 21 different amino acids at each position to form some sort of glue-ball, it doesn't. Many proteins in solution seem to fold into rather well-defined three-dimensional shapes. But can we predict these shapes from the amino acid sequence? This question is known as the "protein-folding problem," and it has occupied many physicists over the past 30-some years as they attempt to solve it with ever more powerful computers. While Peter Wolynes and José Onuchic have been able to sketch out some powerful ideas about the general path of protein folding that make use of the physics concept of free energy, it could well be that solving the puzzle precisely is impossible.

There may well be a fundamental reason why a precise answer to the folding problem is impossible: There may in fact be no precise answer! Experiments by Hans Frauenfelder have shown that even for a relatively simple protein like the myoglobin presented in Figure 15, there is not a unique ground state representing a single free energy minimum but rather a distribution of ground states with the same energy, also known as a conformation distribution, which are thermally accessible at 300 K. It is becoming clear that this distribution of states is of supreme importance in protein function. The distribution of conformations can be quite extreme, and the "landscape" of conformations can be extremely rugged: Within a given local valley, the protein cannot easily move over the landscape to another state. Because of this rugged landscape, a protein might often be found in a metastable state: trapped in a valley that is low in energy, but not the lowest, and unable to reach the true ground state without climbing over a large energy barrier.
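What "thermally accessible at 300 K" means can be illustrated with Boltzmann statistics. The sketch below computes equilibrium populations for a handful of states. The energies are hypothetical, chosen only to contrast states separated by less than kT (about 0.026 eV at 300 K) with one far above it; they are not measured values for myoglobin.

```python
import math

K_B_EV_PER_K = 8.617333e-5  # Boltzmann constant in eV/K


def boltzmann_populations(energies_ev, temperature_k=300.0):
    """Return equilibrium occupation probabilities for the given state energies."""
    kt = K_B_EV_PER_K * temperature_k
    lowest = min(energies_ev)
    # Measure energies from the lowest state so the exponentials stay well scaled.
    weights = [math.exp(-(energy - lowest) / kt) for energy in energies_ev]
    total = sum(weights)
    return [weight / total for weight in weights]


# Hypothetical energies in eV: three nearby substates and one high-lying state.
energies = [0.000, 0.010, 0.020, 0.500]
populations = boltzmann_populations(energies)

for energy, population in zip(energies, populations):
    print(f"E = {energy:.3f} eV -> population {population:.3e}")
```

This is an equilibrium picture only. It says nothing about how long a protein takes to cross the barriers between valleys, which is what matters for the metastable trapping described above.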

Figure 17: Two possible conformations of a prion protein: on the left as a beta sheet; on the right as an alpha helix.

Source: © Flickr, Creative Commons License. Author: AJC1, 18 April 2007.

An extreme example of this inherent metastability of many protein structures, and of its implications for biology, is the class of proteins called "prions." These proteins can fold into two different deep valleys of free energy: as an alpha-helix protein rather like myoglobin, or as a beta-sheet protein. In the alpha-helix conformation, the prion is highly soluble in water; but in the beta-sheet conformation, it tends to aggregate and drop out of solution, forming what are called "amyloid plaques," which are associated with certain forms of dementia. One energy valley leads to a structure that causes untreatable disease; the other is mostly harmless.

The apparent extreme roughness of biological landscapes, and the problems of ascertaining dynamics on such landscapes, will be one of the fundamental challenges for biological physics and the subject of the next section.