PLOS Genetics

  • Chris M. Rands, Stephen Meader, Chris P. Ponting, Gerton Lunter
  • Published: July 24, 2014; DOI: 10.1371/journal.pgen.1004525

Abstract


Ten years on from the finishing of the human reference genome sequence, it remains unclear what fraction of the human genome confers function, where this sequence resides, and how much is shared with other mammalian species. When addressing these questions, functional sequence has often been equated with pan-mammalian conserved sequence. However, functional elements that are short-lived, including those contributing to species-specific biology, will not leave a footprint of long-lasting negative selection. Here, we address these issues by identifying and characterising sequence that has been constrained with respect to insertions and deletions for pairs of eutherian genomes over a range of divergences. Within noncoding sequence, we find increasing amounts of mutually constrained sequence as species pairs become more closely related, indicating that noncoding constrained sequence turns over rapidly. We estimate that half of present-day noncoding constrained sequence has been gained or lost in approximately the last 130 million years (half-life in units of divergence time, d1/2 = 0.25–0.31). While enriched with ENCODE biochemical annotations, much of the short-lived constrained sequence we identify is not detected by models optimized for wider pan-mammalian conservation. Constrained DNase 1 hypersensitivity sites, promoters and untranslated regions have been more evolutionarily stable than long noncoding RNA loci, which have turned over especially rapidly. By contrast, protein coding sequence has been highly stable, with an estimated half-life of over a billion years (d1/2 = 2.1–5.0). From extrapolations we estimate that 8.2% (7.1–9.2%) of the human genome is presently subject to negative selection and thus is likely to be functional, while only 2.2% has maintained constraint in both human and mouse since these species diverged. These results reveal that the evolutionary history of the human genome has been highly dynamic, particularly for its noncoding yet biologically functional fraction.
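To make the half-life figures concrete, the sketch below shows how a half-life expressed in units of divergence translates into the fraction of constrained sequence still shared by two species. It assumes simple exponential decay, which is an illustrative simplification and not necessarily the model the authors fitted. The function names and the example divergence value are hypothetical; only the d1/2 ranges and the 8.2% and 2.2% figures come from the abstract.

```python
def fraction_retained(d: float, d_half: float) -> float:
    """Fraction of constrained sequence still shared at divergence d.

    Assumes simple exponential decay with half-life d_half (an
    illustrative assumption); d and d_half are in the same units.
    """
    if d < 0 or d_half <= 0:
        raise ValueError("d must be >= 0 and d_half must be > 0")
    return 0.5 ** (d / d_half)


def retained_range(d: float, d_half_range: tuple) -> tuple:
    """Apply fraction_retained to both ends of a half-life range."""
    low, high = d_half_range
    return fraction_retained(d, low), fraction_retained(d, high)


# Half-life ranges quoted in the abstract (units of divergence).
NONCODING_D_HALF = (0.25, 0.31)
CODING_D_HALF = (2.1, 5.0)

# Genome percentages quoted in the abstract.
TOTAL_CONSTRAINED_PCT = 8.2    # presently under negative selection
SHARED_HUMAN_MOUSE_PCT = 2.2   # constrained in both human and mouse

# How many times larger the present-day estimate is than the shared one.
ratio = TOTAL_CONSTRAINED_PCT / SHARED_HUMAN_MOUSE_PCT

if __name__ == "__main__":
    d = 0.3  # hypothetical divergence, chosen only for illustration
    print("noncoding retained:", retained_range(d, NONCODING_D_HALF))
    print("coding retained:   ", retained_range(d, CODING_D_HALF))
    print("total / shared ratio:", round(ratio, 2))
```

Under this assumption, at any given divergence coding sequence retains a much larger shared fraction than noncoding sequence, which matches the qualitative contrast drawn in the abstract.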

Author Summary


Nearly 99% of the human genome does not encode proteins, and while there has recently been extensive biochemical annotation of the remaining noncoding fraction, it remains unclear whether or not the bulk of these DNA sequences have important functional roles. By comparing the genome sequences of different species we identify genomic regions that have evolved unexpectedly slowly, a signature of natural selection upon functional sequence. Using a high-resolution evolutionary approach to find sequence showing evolutionary signatures of functionality, we estimate that a total of 8.2% (7.1–9.2%) of the human genome is presently functional, more than three times as much as is functional and shared between human and mouse. This implies that there is an abundance of sequences with short-lived, lineage-specific functionality. As expected, most of the sequence involved in this functional “turnover” is noncoding, while protein coding sequence is stably preserved over longer evolutionary timescales. More generally, we find that the rate of functional turnover varies significantly across categories of functional noncoding elements. Our results provide a pan-mammalian and whole-genome perspective on how rapidly different classes of sequence have gained and lost functionality down the human lineage.
Read the full article: http://www.plosgenetics.org/article/info%3Adoi/10.1371/journal.pgen.1004525