A critique of Chatterjee et al., The Time Scale of Evolutionary Innovation, PLOS Comp. Biol. 2014.

This paper uses lots of math to say little of consequence.

Sep 24, 2014 Science

A paper published by PLOS Comp. Biol. this month, Chatterjee et al., The Time Scale of Evolutionary Innovation, espouses ideas that are quite similar in spirit to long-standing creationist arguments. I said as much in a few tweets. After having made these comments, I have spent quite some time thinking this paper over. And I simply cannot convince myself that it makes an important contribution to evolutionary biology. It is possible that there’s something I’m missing, but to me the paper looks like a very convoluted and mathematically dense way of making a few tired, trivial, and maybe even tautological arguments.

The paper, written by Krishnendu Chatterjee and Andreas Pavlogiannis of IST Austria and Ben Adlam and Martin Nowak from Harvard University, makes two fundamental claims: 1. Evolutionary adaptation needs exponential time to discover novel phenotypes. 2. An alternative process, the “regeneration process,” can discover novel phenotypes in polynomial time. Further, the paper claims that the regeneration process provides fundamental new insight into the process of evolutionary adaptation.

I will break down my discussion of this paper into three parts: First, I’ll discuss the claim that evolutionary adaptation needs exponential time. Second, I’ll discuss the regeneration process. Third, I’ll ask whether this paper provides any novel biological insight. Let me also state upfront that this paper contains many pages of dense math, and I have not checked every equation in detail. In fact, I don’t think that’s necessary. I am happy to take the authors’ mathematical derivations at face value. What I’m discussing here are the assumptions the authors make going into their derivations and the conclusions they draw coming out.

Does evolutionary adaptation need exponential time?

Whether or not evolutionary adaptation needs exponential time has been discussed since forever, because, in fact, this is one of the favorite topics of intelligent design creationists. Apparently, Chatterjee et al. are not aware of this discussion, because they cite neither the creationist arguments nor the counter-arguments by evolutionary biologists. I don’t really want to delve into the depths of this long-standing dispute. I’ll just point out some of the key players and arguments. Then I’ll give a brief argument for why papers such as the current Chatterjee paper make no useful contribution to the question of whether or not evolution has had sufficient time to generate the life forms we see today.

The exponential-time argument in its essence goes back to the “junkyard tornado” metaphor originally coined by Hoyle:

The chance that higher life forms might have emerged in this way is comparable to the chance that a tornado sweeping through a junkyard might assemble a Boeing 747 from the materials therein.

The argument was subsequently made more sophisticated by Dembski in his book “No Free Lunch.” In this book, Dembski employs so-called “No-Free-Lunch” (NFL) theorems to argue that evolutionary search cannot be more efficient than just trying solutions at random. NFL theorems were discovered in the field of machine learning, and they state that all search algorithms perform equally poorly when averaged over all possible search spaces (i.e., fitness landscapes). When applied to evolution, the argument becomes that evolutionary adaptation is just as efficient (or rather, inefficient) at finding solutions as is the junkyard tornado. The main counter-argument is that evolution does not operate on all possible fitness landscapes but specifically on biological ones, which tend to be sufficiently smooth. On a perfectly smooth, single-peaked fitness landscape, evolutionary adaptation finds the peak in logarithmic time (Wilf and Ewens, 2010). On a more rugged landscape with some epistasis, the search can be slower, depending on the exact type and amount of epistasis (Ewert et al. 2012). Yet Covert et al. (2013) showed in simulations that certain types of epistasis can actually speed up evolution. (Disclaimer: I’m an author on the Covert paper.)
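To make the logarithmic-time claim concrete, here is a minimal sketch of a letter-guessing model in the spirit of Wilf and Ewens (2010). It is my own toy version, not their code: a target word of length `length` over an alphabet of size `alphabet_size`, where every round re-guesses each unsolved position and correct letters are kept. All function names and parameter values are assumptions chosen for illustration. Because the number of rounds is the maximum of `length` independent geometric waiting times, it should grow roughly like log(L) / log(K/(K-1)), whereas guessing the whole word at once needs on the order of K^L tries.

```python
import math
import random

def rounds_to_match(length, alphabet_size, rng):
    """Rounds until every position is correct when correct letters are kept."""
    unsolved = length
    rounds = 0
    while unsolved > 0:
        rounds += 1
        # Each unsolved position is re-guessed; a guess is correct with
        # probability 1/alphabet_size (here: randrange returns 0).
        unsolved = sum(1 for _ in range(unsolved)
                       if rng.randrange(alphabet_size) != 0)
    return rounds

def mean_rounds(length, alphabet_size, trials=100, seed=1):
    """Average number of rounds over several independent runs."""
    rng = random.Random(seed)
    total = sum(rounds_to_match(length, alphabet_size, rng) for _ in range(trials))
    return total / trials

def log_scale(length, alphabet_size):
    """Leading-order growth of the maximum of `length` geometric waiting times."""
    return math.log(length) / math.log(alphabet_size / (alphabet_size - 1))

if __name__ == "__main__":
    for length in (10, 100, 1000):
        # The last column is log10 of K^L, the all-at-once search space size.
        print(length, mean_rounds(length, 4),
              round(log_scale(length, 4), 1),
              round(length * math.log10(4), 1))
```

The simulated means should track the logarithmic scale up to an additive constant, while the exponent in the last column grows linearly with the word length. None of this says anything about real fitness landscapes; it only shows how strongly the answer depends on the assumption that partial progress is retained.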

I hope you can see where this is going. It is possible to construct mathematical models that produce any number of search times, from very rapid (logarithmic) to extremely slow (exponential or worse). The specific model that Chatterjee et al. propose is that of a fitness landscape in which peaks are surrounded by extended, flat valleys. On such a landscape, evolutionary search does indeed take exponentially long, because there are insufficient fitness gradients that can guide the population towards the fitness peaks. Instead, the population simply drifts randomly until it hits on a peak by accident.
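The contrast between a flat valley and a landscape with a gradient is easy to see in a toy simulation. The sketch below is not the model of Chatterjee et al.; it is a deliberately simple assumption of mine: a single bit string of length `length`, one random bit flipped per step, and a single target (all bits correct). With `use_gradient=False` every mutation is accepted (neutral drift on a flat landscape); with `use_gradient=True` mutations that destroy a correct bit are rejected. All names and parameter values are hypothetical.

```python
import random

def steps_to_target(length, use_gradient, rng, max_steps=10**6):
    """Single-bit-flip walk from all-wrong to all-correct bits; returns steps used."""
    correct = 0  # by symmetry only the number of correct bits matters
    steps = 0
    while correct < length and steps < max_steps:
        steps += 1
        # A uniformly chosen bit is currently correct with probability correct/length.
        hits_correct_bit = rng.random() < correct / length
        if hits_correct_bit:
            if not use_gradient:
                correct -= 1  # flat landscape: the loss is neutral and accepted
            # with a gradient, the deleterious flip is rejected
        else:
            correct += 1      # flipping a wrong bit always makes it correct
    return steps

def mean_steps(length, use_gradient, trials=50, seed=0):
    """Average search time over several independent runs."""
    rng = random.Random(seed)
    return sum(steps_to_target(length, use_gradient, rng) for _ in range(trials)) / trials

if __name__ == "__main__":
    for length in (6, 8, 10, 12):
        print(length, mean_steps(length, True), mean_steps(length, False))
```

In this toy setting the gradient-guided walk behaves like a coupon-collector problem, with a search time that grows only slightly faster than linearly in the string length, while the neutral walk needs a number of steps that roughly doubles with every added bit. Which of the two regimes resembles biology is exactly the empirical question the math cannot settle.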

However, whether any of these mathematical models are actually relevant to the process of evolution as it happens in the natural world is ultimately an empirical question. No amount of mathematical theorizing can produce more insight than we already have. We have pretty good experimental evidence (e.g., Blount et al. 2012) and evidence from non-trivial computer simulations (e.g., Lenski et al. 2003) that evolution can produce quite sophisticated, non-obvious, and non-trivial novel functions within a few thousand generations. We also have increasingly good evidence that protein-coding genes can arise de novo from non-coding DNA, through the accumulation of point mutations. For example, Knowles and McLysaght (2009) estimate that 18 such genes have arisen in humans since the human-chimpanzee divergence. Thus, in summary, the available empirical evidence is clearly at odds with the exponential-time argument.

Is the proposed regeneration process a viable alternative?

Now, Chatterjee et al. would argue (I assume) that the empirical evidence is at odds with the exponential-time argument because evolution is actually more accurately described by their “regeneration process.” They define the regeneration process as an iterated evolutionary search, where the search starts over and over in a relatively small sphere around the target solution. And, surprise surprise, under this assumption the search does not take exponentially long.
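A rough sketch shows how much work the starting distance does in such an iterated search. This is my own toy reading of the idea, not the construction in the paper: a neutral single-bit walk on strings of length `length` that starts `start_distance` bits away from the target, runs for at most `budget` steps, and is restarted from the same distance whenever it fails. The names `search_succeeds` and `restarts_needed`, and all parameter values, are assumptions for illustration.

```python
import random

def search_succeeds(length, start_distance, budget, rng):
    """Neutral single-bit walk; True if the target is hit within `budget` steps."""
    distance = start_distance
    for _ in range(budget):
        # A random bit flip repairs a wrong bit with probability distance/length.
        if rng.random() < distance / length:
            distance -= 1
            if distance == 0:
                return True
        else:
            distance += 1
    return False

def restarts_needed(length, start_distance, budget, rng, max_restarts=10**5):
    """Number of independent searches until one succeeds (capped at max_restarts)."""
    for attempt in range(1, max_restarts + 1):
        if search_succeeds(length, start_distance, budget, rng):
            return attempt
    return max_restarts

def mean_restarts(length, start_distance, budget, trials=30, seed=0):
    """Average number of restarts over several independent runs."""
    rng = random.Random(seed)
    return sum(restarts_needed(length, start_distance, budget, rng)
               for _ in range(trials)) / trials

if __name__ == "__main__":
    for start_distance in (1, 2, 3):
        print(start_distance, mean_restarts(20, start_distance, budget=20))
```

In this toy version the number of restarts climbs steeply as the starting distance grows, so the search is fast only because the restarts are assumed to land close to the target in the first place.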

What is the cause of the regeneration process? The authors mention gene duplications and recombination. Ok, I can see how these processes could keep producing sequences in a particular region of the sequence space, and this region might be within a small sphere of the target. But note that this argument assumes the target happens to be near where these processes produce new sequence variants in the first place. Further, point mutations can similarly be the cause of the regeneration process: If the population happens to sit on a local peak next to the sphere of interest, it will also feed the regeneration process. In other words, the regeneration process simply describes evolution on a fitness landscape that violates the assumptions made in the first part of Chatterjee et al.’s paper. In fact, the whole thing seems tautological. The paper basically says: Evolution can only succeed on fitness landscapes that are structured such that evolution can succeed.

There’s another issue I have with the way the authors present the regeneration process. They state that “evolution can be seen as a tinkerer playing around with small modifications of existing sequences rather than creating entirely new ones.” This phrasing makes me uncomfortable, because it reminds me of the discussion of “kinds” in the creationist literature. What exactly is an “entirely new sequence”? I’d argue that this concept is vague to the point of being useless. We know of examples where homologous protein sequences share less similarity (below 1%) than would be expected from two randomly chosen sequences (Kinch and Grishin, 2002). If a sequence has diverged so much that nearly every position has changed, is it entirely new relative to its ancestor? We also know of examples where protein sequences that are clearly evolutionarily related fold into entirely distinct structures, structures that seem to have absolutely no relationship with each other (Grishin 2001). Would this count as a case where an entirely new protein evolved? Finally, we know of cases where a functional, protein-coding sequence arose through a few point mutations from a non-coding sequence (Knowles and McLysaght, 2009). Again, is this an entirely new protein? Evolution, which proceeds by descent with modification, is necessarily a stepwise, local search process. Thus, anything that evolves can be traced back to something that previously existed, and nothing “entirely new” can ever evolve, if we define “entirely new” as “not connected by a number of small modifications to something that existed before.” If by contrast “entirely new” is simply to mean “something that doesn’t obviously, to the human eye, look like something that existed before,” then we have plenty of examples where entirely new things have evolved, and the papers I quote in this paragraph provide some of these examples.

Does this paper provide novel biological insight?

What are the biological insights we can draw from this study? There is no doubt that the authors have done a lot of math, and that this math deserves to be published somewhere. But for publication in PLOS Computational Biology, the stated requirement is that the paper “provide profound new biological insights.”

To be frank, there is little biology in this paper. However, one sentence drew my attention. In the first paragraph of the discussion, Chatterjee et al. state that their “process can also explain the emergence of orphan genes arising from non-coding regions [45].” This sentence got me excited. I’m quite interested in the origin of novel protein structures, and as I mentioned above, there is now pretty solid evidence that proteins can evolve from non-coding sequences. So I expected to find a section somewhere in the paper where they estimated the rate of such occurrences. Presumably, one could estimate the amount and sequence variation of non-coding DNA in a typical genome, and, assuming some model about the density of viable protein structures within that amount of random sequence, estimate the number of novel protein structures expected to evolve from non-coding DNA per genome and number of generations. Hopefully, that estimate would roughly agree with estimates from genomics data, such as the one provided by Knowles and McLysaght (2009).

Alas, I was sorely disappointed. I searched both the main body of the text and the supporting materials for the word “orphan” and got a total of two hits, one in the sentence I just quoted and one in the title of their reference 45, a review of orphan genes (Tautz and Domazet-Lošo, 2011). To be absolutely sure I hadn’t missed any relevant passages, I also searched for the words “protein,” “peptide,” and “coding,” both in the main text and in the supplement, but couldn’t find any meaningful discussion about the evolution of novel coding sequences through these terms either. I don’t think there is a single piece of evidence in the entire paper to support Chatterjee et al.’s one claim of biological significance, that the proposed regeneration process can explain the origin of orphan genes.

Conclusions

In summary:

1. Chatterjee et al.’s exponential time calculation seems similar in spirit to long-standing arguments raised by intelligent-design creationists such as Dembski and colleagues. Considering that these arguments have been raised and rebutted since forever, I don’t see that Chatterjee et al. make a particularly novel or useful contribution. They also fail to place their work into the context of these earlier arguments.
2. The primary open problem in evolutionary biology remains a detailed characterization of the genotype-phenotype map, and, in particular, the question of how close in genotype are distinct high-fitness phenotypes. The Chatterjee et al. paper makes no attempt at answering this question. Instead it offers the tautology that evolution succeeds on fitness landscapes that are structured such that evolution can succeed.
3. The paper makes the bold claim that it can explain the origin of orphan genes, without actually providing any concrete evidence. I am puzzled how this entirely unsupported claim made it past review.

References

Z. D. Blount, J. E. Barrick, C. J. Davidson, R. E. Lenski (2012). Genomic analysis of a key innovation in an experimental Escherichia coli population. Nature 489:513–518.
A. W. Covert, R. E. Lenski, C. O. Wilke, C. Ofria (2013). Experiments on the role of deleterious mutations as stepping stones in adaptive evolution. PNAS 110:E3171–E3178.
K. Chatterjee, A. Pavlogiannis, B. Adlam, M. A. Nowak (2014). The time scale of evolutionary innovation. PLoS Comput. Biol. 10:e1003818.
W. A. Dembski (2007). No Free Lunch: Why Specified Complexity Cannot Be Purchased without Intelligence. Rowman & Littlefield.
W. Ewert, W. A. Dembski, A. K. Gauger, R. J. Marks II (2012). Time and information in evolution. BIO-Complexity 4:1–7.
N. V. Grishin (2001). Fold change in evolution of protein structures. J. Struct. Biol. 134:167–185.
L. N. Kinch and N. V. Grishin (2002). Expanding the nitrogen regulatory protein superfamily: Homology detection at below random sequence identity. Proteins: Structure, Function, and Bioinformatics 48:75–84.
D. G. Knowles and A. McLysaght (2009). Recent de novo origin of human protein-coding genes. Genome Res. 19:1752–1759.
R. E. Lenski, C. Ofria, R. T. Pennock, C. Adami (2003). The evolutionary origin of complex features. Nature 423:139–144.
D. Tautz and T. Domazet-Lošo (2011). The evolutionary origin of orphan genes. Nature Reviews Genetics 12:692–702.
H. S. Wilf and W. J. Ewens (2010). There’s plenty of time for evolution. PNAS 107:22454–22456.
