Genes

2000

Key takeaways
  1. The Genon Theory challenges traditional gene definitions by separating genes (data) from genons (programs for expression).
  2. Gene expression is conceptualized as a computational cascade, emphasizing interactions over linear processing steps.
  3. Heritability is crucial for defining genes, as genomes transmit instructions for functional units.
  4. The text argues against equating function solely with protein-coding products, highlighting the role of non-coding RNAs.
  5. Genome annotation should focus on biochemical processes rather than a direct functional interpretation tied to DNA positions.

“Genes”

Sonja J. Prohaska
Peter F. Stadler

SFI WORKING PAPER: 2008-03-011

SFI Working Papers contain accounts of scientific work of the author(s) and do not necessarily represent the views of the Santa Fe Institute. We accept papers intended for publication in peer-reviewed journals or proceedings volumes, but not papers that have already appeared in print. Except for papers by our external faculty, papers must be based on work done at SFI, inspired by an invited visit to or collaboration at SFI, or funded by an SFI grant.

©NOTICE: This working paper is included by permission of the contributing author(s) as a means to ensure timely distribution of the scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the author(s). It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may be reposted only with the explicit permission of the copyright holder.

www.santafe.edu
SANTA FE INSTITUTE

February 11, 2008

S.J. Prohaska
Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA; and
Department of Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria
E-mail: [email protected]

P.F. Stadler
Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany; and
RNomics Group, Fraunhofer Institut for Cell Therapy and Immunology (IZI), Deutscher Platz 5e, D-04103 Leipzig, Germany; and
Department of Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria; and
Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA
E-mail: [email protected]

Abstract

In order to describe a cell at molecular level, a notion of a “gene” is neither necessary nor helpful. It is sufficient to consider the molecules (i.e. chromosomes, transcripts, proteins) and their interactions to describe cellular processes. The downside of the resulting high resolution is that it becomes very tedious to address features on the organismal and phenotypic levels with a language based on molecular terms. Looking for the missing link between biological disciplines dealing with different levels of biological organization, we suggest returning to the original intent behind the term “gene”. To this end, we propose to investigate whether a useful notion of “gene” can be constructed based on an underlying notion of function, and whether this can serve as the necessary link and embed the various distinct gene concepts of biological (sub)disciplines in a coherent theoretical framework.

In reply to the Genon Theory recently put forward by Klaus Scherrer and Jürgen Jost in this journal, we shall discuss a general approach to assess a gene definition that should then be tested for its expressiveness and potential cross-disciplinary relevance.

1 Introduction

In a recent issue of this journal, Klaus Scherrer and Jürgen Jost (Scherrer and Jost, 2007b) introduced an essentially computational account of gene expression which introduces a formal separation of the “gene” from the program that is required to orchestrate its expression.

The Genon Theory presents a fresh and stimulating contribution to a discussion of the “gene concept” that has re-emerged in recent years in response to evidence of greater genomic complexity than previous concepts of the gene are able to accommodate. It has become increasingly obvious that the classical molecular concept of a gene as a contiguous stretch of DNA encoding a functional product is inconsistent with the complexity and diversity of genomic organization (The ENCODE Project Consortium, 2007; Maeda et al., 2006; Carninci, 2006; Willingham and Gingeras, 2006). Many of the proposals from the “high-throughput community” lean towards a purely structural point of view, focusing on genes as structural units, often explicitly related to proteins as the link to a functional interpretation (Snyder and Gerstein, 2003; Gerstein et al., 2007). Dissenting opinions, on the other hand, question the usefulness of “genes” in genomic context (Gerstein et al., 2007).

The Genon Theory attempts to reconcile these views by advocating a functional, rather than structural, definition of the gene. While this is a welcome departure from the overly simplistic view of “genes as protein-coding DNA”, it remains oriented toward the simple representation of the “gene” as a contiguous stretch of code. It deliberately excludes the complex collection of regulatory signals from the notion of the “gene” and instead interprets them as a program of gene expression, the “genon”. It is grounded in a number of fundamental assumptions, some implicit and some explicit. Our discussion will start with these assumptions, which in several cases are not satisfying. Instead of presenting a particular fixed definition of what a gene “is”, we will explore here how a functional gene definition can be constructed depending on how the concept of “function” is formalized.

2 Gene Expression as Computation

The dichotomy of gene (data) and genon (program) is a fundamental assumption regarding the nature of biological information processing that is logically suspicious. In Computer Science, many of the familiar programming languages, including C, BASIC, or FORTRAN, make a clear syntactic distinction between data and program; functional programming languages such as LISP and Haskell, on the other hand, have no means at all for making this distinction. Since heritable biological information necessarily must encode both data and program, it is by no means clear that biological information processing is more like FORTRAN than LISP.

As an alternative to the separation into genes and genons, a separation into genetic material (data) and the machinery (program) that orchestrates its expression could be introduced. The latter respects an important intuitive property of data, namely the simple transfer and substitution of (parts of) the data. Similar to the platform-independence of data (in contrast to often platform-dependent programs), nucleic acids can be interpreted in a wide range of contexts. Biotechnology, and cloning techniques in general (Sambrook and Russell, 2001), take advantage of this property whenever a piece of genetic material is cloned into a vector and transferred to a different organism. There a different machinery evaluates the same sequence information and generates a product that is similar enough to the original context to be of practical use.

Notwithstanding the appealing intuition behind this distinction, RNA components of the machinery inherited by an RNA molecule (as in the case of RNA viruses) pose a problem for this separation, because the same molecule would be both data and program at the same time. Therefore, it remains to be shown that an unambiguous partitioning of the molecular components into data and program is possible and that it results in a reasonable representation of biological reality.

A central idea of Genon Theory is that one can speak of a program that governs the expression of a gene. This program is described as the union of the cis-genon, which is encoded by the same molecule(s) that carry the information of the gene, and the trans-genon. The latter is viewed as the collection of all “trans-acting” factors that influence gene expression. The implicit assumption here is that the expression of the gene of interest does not change its environment in an appreciable manner, e.g., by using up some of the trans-factors or by feeding back on the expression of these factors. Only in this limiting case does it make sense to view the environment as a static part of the expression program, i.e., to associate the trans-genon with the gene of interest, instead of interpreting the environment, including the relevant trans-acting factors, as the result of other programs that concurrently express their genes. This static view of a set of “trans-acting” factors also fails to account for the fact that the expression of these factors is a dynamic process and will typically not be in sync with the processing steps of the gene of interest. We argue that specifying the collection of trans-acting factors is insufficient to determine the “external” part of the program of gene expression because the temporal order in which they are produced and interact is crucial.

Scherrer & Jost pre-suppose several properties of the process of gene expression. It is assumed to be deterministic (at least under given environmental conditions), Markovian (in the sense that each processing step only requires the result of the previous step as input), and to proceed in a linear sequence of a few well-separated steps. Each of these assumptions is an idealization. The last two properties together are necessary to justify the “Cascade of Regulation” and to make the notions of pre-genon, proto-genon, etc. well-defined. As the authors themselves note in (Scherrer and Jost, 2007a), this assumption is often violated. Recent evidence for a strong coupling of transcription, splicing, and export in higher eukaryotes (Listerman et al., 2006; Swinburne et al., 2006; Maciag et al., 2006), and the concurrency of transcription and translation in bacterial cells (Gowrishankar and Harinarayanan, 2004; El-Sharoud and Graumann, 2007) imply that some of the processing stages may never exist as discrete molecules. This blurs the boundaries between the individual steps.

The separation of processing steps is, however, required to strictly distinguish cis- and trans-parts of the genon. Whenever a processing step results in joining two fragments (e.g. in trans-splicing), the element in trans becomes a cis-element after completing the step. The Markov property is also violated by splicing and some export mechanisms that specifically attach proteins that remain bound to the RNA during the next maturation step(s). Again it becomes impossible to strictly discriminate between cis- and trans-action. Exon-junction complexes and export co-factors such as the RNA binding protein HuR are of course not encoded in the final mRNA, but regulation of the mRNA depends on their presence and location in the pre-mRNA. This “annotation” is not seen in the final mRNA molecule, but is determined by the molecule’s particular processing history.

The Genon Theory describes gene expression as a simple sequential program, thereby ignoring the network structure of gene regulation. In our view, however, the network architecture is the very essence of biological regulation. Within a framework that interprets gene expression as a computational process, we suggest reformulation of the trans-genon as communication with other gene expression processes. This leads in a rather natural way to a picture of gene expression as a Distributed Computing System (Attiya and Welch, 2004). To this end, we must give up the idea that there is a single, independent program governing the expression of each individual gene (one mRNA/gene – one genon hypothesis). Instead, we need to model a collection of computational processes, one for each sequence of consecutive processing steps, that communicate via their trans-actions. Formal models of this type have recently been introduced in systems biology (Danos and Laneve, 2004; Danos et al., 2007; Kuttler and Niehren, 2006) using π-calculus and related formalisms.

[Figure 1 not reproduced. Recoverable labels: Environment; DNA; replication; transmission; transcription; transcript; RNA folding; splicing; processing; polypeptide; folding; protein modification; complex formation.]

Fig. 1 Cascade of Regulation: At each step, information content is not only reduced but might also increase due to integration of information provided by the surrounding. Global environmental factors (e.g. gravity, latitude, temperature, tide etc.) as well as local environmental factors including localization, timing and interaction of products provide information to all steps of the cascade and establish a network of communication. The influence of certain factors can be expected to show great variation among organisms. Localization is suggested to play an important role for many steps. The more environmental factors can be taken for granted, the less information needs to be encoded and transmitted from step to step.

3 Genes sensu Jost & Scherrer

The Genon Theory emphasizes a functional point of view and attempts to define the gene as a “basis of a unit function”. It deliberately “give[s] up the correspondence of the gene as functional unit and as a DNA locus.” While there are rules to map genes back to the genome, these rules are not considered a defining property of the gene. Heritability, on the other hand, is. Jost and Scherrer, though, seem to view heritability as irrelevant, arguing that modern molecular biology is essentially about function.

We strongly disagree with this view. The concept of the “Gene” is common ground to most disciplines of biology and historically has been instrumental in the synthesis of subdisciplines, e.g. evolution and development. We therefore argue that a meaningful notion of “Gene” cannot be constructed with only a particular sub-discipline in mind. Heritability is a crucial property since it is the purpose of genomes to transmit the encoded instructions for generating functional units, instead of transmitting the functional units themselves. Even within the scope of modern molecular biology, the concept of heritable genes is indispensable: we need to be able to speak of homology (most commonly defined as descent from a common ancestor) among genes. Common ancestry of functional units is the main justification for translational approaches that attempt to utilize information obtained for model organisms such as mouse or fruitfly to understand similar biological processes in humans. Furthermore, it appears that genes are necessary to understand the selection part of the evolutionary process: In order to describe what selection does on a molecular level, only nucleotide sequences are required; to conceptualize the why, however, a functionally defined gene is at least very useful.

Scherrer & Jost proceed to equate function with “functional products” derived from the genetic encoding: “A cellular function can be represented by a polypeptide or an RNA”, “Genetic function is carried out by proteins composed of folded polypeptides”. Despite a section on RNA genes, the text leaves no doubt that protein-coding genes are considered the paradigm of genetic information processing; indeed, the Genon Theory fails to provide concepts to incorporate non-protein-coding “genes” in general. A more implicit assumption of the Genon Theory is the idea that protein-coding mRNAs are the most interesting and most important type of products that are produced from DNA. In light of the results of the ENCODE and FANTOM projects we reject this “proteinocentric” point of view. Protein-coding sequence covers less than 2% of the genome, while approximately 10% is under stabilizing selection. This is at least indicative of some biological function. As almost all of this sequence is transcribed we have to assume that much of it exerts its function as some processing product of the primary transcript, which is often not associated with any protein (Pheasant and Mattick, 2007). From this point of view, nothing about the mature mRNA stage is so special as to warrant the definition of this stage, along with the regulation of translation, as the focal point of biological information processing.

From these assumptions, Scherrer and Jost deduce that there is a single stage in the life of a transcript that lends itself to a natural definition of the gene, namely the last processing product before translation: “[The gene] finally emerges as an uninterrupted nucleic acid sequence at mRNA level, just prior to translation, in faithful correspondence with the amino acid sequence to be produced as a polypeptide”. The gene concept thus coincides with the well-established notion of “Open Reading Frame”. Consequently, there are many more (protein-coding) genes than protein-coding loci (the authors estimate 500 000 vs. 25 000), since any two mRNAs giving rise to distinct polypeptides (e.g. via alternative splicing) are counted as distinct genes. On the other hand, the expression of the same function (i.e., the same functional molecule) at different times or in different cells counts as a single gene.

It is overly restrictive, however, to identify cellular functions with directly encoded gene products. Several classes of important molecules, all of which are “functional” (at least to most researchers), including steroid hormones, coenzymes, pigments, polysaccharides, etc., are not directly encoded, but are quite indirectly the consequence of genetic encoding. Conversely, the polypeptide that is obtained directly by decoding the mRNA is in many cases not functional at all. It may need the assistance of chaperones to fold into its active tertiary structure, it may need to be modified, e.g. by glycosylation or other chemical modification, or it may be cleaved or fused with other (possibly modified) peptide chains. More importantly, there are crucial regulatory functions in which a process, e.g. the act of transcription to modify the chromatin state (Shearwin et al., 2005; Mazo et al., 2007), or the act of initial translation to remove the exon-junction complexes (Isken and Maquat, 2007), is crucial, while the associated products created by these processes (a primary transcript and a polypeptide, respectively) are completely irrelevant for all we know.

On the other hand, function need not be associated with the generation of a product at all, as is the case with cis-acting regulatory elements. A classical example is the lac operator lacO (Jacob and Monod, 1961). Besides cis dominance, this sequence shows properties similar to a regulatory gene and can be mapped to a DNA locus by means of physical mapping just like a gene. The Genon Theory thus uses a notion of “genetic” function that appears to be inconsistent with the experimental evidence.

4 Structural Gene Definitions

Less than 15 years ago, the influential textbook Genes V (Lewin, 1994) defined: “Gene (cistron) is the segment of DNA involved in producing a polypeptide chain; it includes regions preceding and following the coding region (leader and trailer) as well as intervening sequences (introns) between individual coding segments (exons).” Older definitions explicitly included promoters as part of the gene. Once it had been realized, however, that the regulatory sequence associated with gene expression can be widely dispersed, many authors opted for viewing the “gene” as essentially synonymous with “protein-coding transcript” (Snyder and Gerstein, 2003).

With the availability of large amounts of “omics” data, many authors have advocated various versions of structural definitions of the gene that amount to collections of transcripts, see e.g. (Snyder and Gerstein, 2003; Gerstein et al., 2007). The same approach is taken by current genome databases: within the Ensembl [1] framework, a gene is defined as a set of (primary) transcripts. It seems that the gene definition of Scherrer & Jost was also influenced by this trend: even though introduced as a functional notion, a series of simplifying assumptions reduce it to another easily identifiable genomic structure: the Open Reading Frame.

[1] www.ensembl.org

A purely structural definition of a gene in terms of a genomic “source”, however, does not seem useful to us. Without any reference to function, there is no way of singling out a particular product of the regulatory cascade in general or a specific processing stage of a transcript in particular. As the end-product of every transcript is eventually a small degradation fragment, and presumably a single nucleotide, this approach does not lead to a meaningful definition. Alternatively, one might view every processing stage as a different transcript and consequently as a different gene. This would just rename “transcript” to “gene” and the set of all genes would become equivalent to the transcriptome. Another approach is to define a gene as a collection of overlapping transcripts. At least in eukaryotes, this leads to fairly large regions equivalent to genomic/transcriptional domains or, in the worst case, the whole genome, another trivial solution. Between these two extremes, Gerstein et al. (2007) consider genes as sets of overlapping transcripts that share open reading frames. As we have argued above, singling out particular processing stages or products is problematic since such a definition can be applied only to a (possibly small) subset of entities.

5 Genes Derived from Heritable Functional Units

We agree with Scherrer & Jost that a meaningful definition of gene has to be based on a notion of function because a purely structural gene definition is altogether dispensable as we have seen above. In this section, we will briefly outline a research agenda that may eventually lead to a useful function-based gene concept, or to the realization that such an endeavor cannot succeed.

First, we reject the idea of a one-to-one correspondence of function and “gene-product”, which seems much more a vestige of the history of the gene concept than a property of a biological system. The appeal of the equivalence of function and product is that it makes function “measurable” by virtue of detecting the product. We have argued above, however, that the existence of a product does not imply that it has any function at all, and conversely, the same product may have multiple and mechanistically diverse biochemical functions, depending on its context.

Hence, we expand the notion of function and postulate that function must be measurable directly by some experimental setup in finite time, and that one must be able to do this in such a way that functional equivalence can be determined. What constitutes a function, and whether two functions are distinguishable from each other, therefore depends on an experimental (or computational) procedure, which we will for short call a “measurement” in the following. Different procedures may represent “biological importance” more or less well. Time-honored procedures such as the classical complementation test of molecular genetics or the observation of the developmental effects of gene knock-outs are procedures that have proven useful. The approach of the Genon Theory, namely to determine whether a stretch of DNA is eventually translated into a polypeptide, is yet another possible way to measure. We view computational approaches as yet another procedure to assess information about function. Of course, as with any “functional test”, all these procedures come with inherent limitations and the possibility of false positive and negative results. Such results may eventually lead to erroneous conclusions about particular “genes”. This is, however, also true for seemingly straightforward procedures such as the assignment of ORFs (Brent, 2005), and does not affect the conceptual framework.

Entire cells, organs, and organisms certainly convey function. Thus we would not want to be forced to call everything that has a measurable function a “gene”. Just as Scherrer & Jost do, we consider a gene a unit of function. The nature of units, modules and their mutual relationships is a field of lively debate in theoretical biology, see e.g. (Kvasnicka and Pospíchal, 2002; Tanaka et al., 2006; Schlosser, 2002; Wagner et al., 2007), which we will not enter here. Instead, we use the term “unit” in a broad sense: A unit should show stronger cohesion to itself than to other components, thereby ensuring its integrity in isolation. Consequently, a unit of function should execute its function in isolation [2], thereby representing a “building block” or “basis element” of the space of functions [3]. Novel functions may emerge from collections of functional sub-units. Within a given experimental protocol we may be able to distinguish the function of higher level units from those of their components, thus functional units can be nested within each other. Intuitively, we would like to correlate the gene with the elementary functional unit, i.e., a unit that cannot be understood as a collection of functional units together with the emergent function(s) arising from their combination. Whereas single molecules and/or molecular complexes and their interactions play the central role in molecular biology, researchers in other biological disciplines might be more interested in higher order functional units. Such a coarse-grained level of functionality could be represented by chemical reactions, interaction networks, or phenotypic traits rather than products as functional units. We suggest that each of these is a valid starting point for a gene definition.

[2] Units whose function(s) rely on input and/or communication of course need to be provided with this stimulus.
[3] “Space” is used here in the formal mathematical sense as “a set endowed with a certain abstract structure.”

In contrast to the Genon Theory, we postulate that genes are heritable and therefore need to be part of the inherited material. In 1952, Hershey and Chase found that the “instructions” for functional units are made of genetic material, nucleic acid in general, DNA if present. However, exceptions to this rule are well known, e.g., epigenes, protein-based inheritance (i.e. centrioles and prions) and RNA-based inheritance (Lolle et al., 2005) do instruct heritable functional units. Heritability is determined by the process of inheritance, a sequence of reproduction and segregation. We may or may not want to restrict the concept of genes to entities that are inherited in a particular way, namely by means of the genetic material that comprises the genome.

A formal mathematical investigation of this schema should eventually be able to relate elementary functional units to their source in the inherited material. If a function-based gene concept is feasible at all, such a mapping is the indispensable pre-requisite for genes to become a useful notion for molecular biology. We suspect that such a mapping is not necessarily possible for all underlying definitions of “function”, “unit” and/or their combinations. It is even conceivable that such a mapping can never be constructed, in which case we will have to abandon the notion of “functional genes”. Even if we can construct the map, there is no guarantee that the genomic source [4] corresponding to a particular definition of functional unit will show properties that we would expect or desire from a gene. In particular, the genomic representation of our functionally defined genes may well be frustratingly complex and disparate from the physical entities that we deal with in the various flavors of “omics”.

[4] For simplicity of language we speak of the “genomic source” instead of the more general “encoding in the inheritable material”.

In line with our arguments above we suggest that an appropriate definition of a functional unit should not make explicit reference to a particular class of molecules. While determining the chemical composition is within the scope of acceptable experimental protocols, a consequence of this type of protocol is the disparate classification of molecules with similar or identical functions, e.g. a protein enzyme vs. a ribozyme that catalyzes the same chemical reaction. It is at least conceivable that the chemical implementation of a catalyst or regulator is irrelevant for a cell. Consequently, functional units may just as well be of DNA nature. Operators and other cis-regulatory elements behave much like regulatory genes when assayed with many procedures typically used in genetics. In such a context, we may well be obliged to treat them as functional units and consequently as genes. On the other hand, Developmentally Regulated DNA Rearrangements (DRDR) are not uncommon as mechanisms of expression regulation throughout eukaryotes (Zufall et al., 2005). Ciliate genome processing (which interestingly is regulated by small RNAs (Garnier et al., 2004)), chromatin diminution (i.e., the selective elimination of portions of chromosomes), the vertebrate immune system, and the amplification of rDNA genes are the most prominent examples. DRDR is also involved in mating type switching in yeast and prokaryotic differentiation, see e.g. (Carrasco et al., 1995). Hence processes operating on the genomic material have to be included in the processing program.

The boundaries of our genes as Heritable Elementary Functional Units are eventually determined by the underlying notion of function. Depending on this choice, genes may or may not contain the information necessary to orchestrate the production of the corresponding functional units from the heritable material.

6 Concluding Remarks

In our discussion, we started from assumptions similar to but less restrictive than those of the Genon Theory. We have arrived at the definition of a gene as the pre-image of elementary functional units on the heritable material. Abandoning the identification of function with a functional product, we highlight the logical separation between functions (measured by some experimental protocol) and expression products. Expression of products, as described in Section 2, is understood as a computation-like processing cascade that starts with the generation of a working copy of the inheritable genetic information. The understanding of the mechanics of expression (or the corresponding computation) does not require the notion of a gene at all. It is sufficient to consider the processing products and their molecular interactions. Indeed, a sufficiently detailed model of the expression processes is likely to be a good starting point to define function, functional units, and eventually genes.

The precise meaning of the term “gene expression” remains elusive. Logically, it refers to the construction of functional units from their heritable source. Since genes are not synonymous with “products in the expression cascade”, gene expression is not synonymous with the processing of individual transcripts (or other individual processing products). Instead, it must be understood as a composite of the expression program governing the construction of the molecular components of the functional unit, together with additional interactions that are not encapsulated in any expressed molecular product. A simple one-to-one relation between the chemical and logical expression programs exists only in limiting cases, for instance when functional units are identified with polypeptides as in the Genon Theory. In general, it remains to be seen to what extent (logical) gene expression can be modeled in a computational framework analogous to the physical expression of products (in the sense of Section 2). Even if gene expression can be modeled in this way, it is not clear a priori how the relations between the physical and the logical expression program can be described.

A simple but practically relevant implication of the distinction between expressed products and functionally defined genes, as advocated here, is that (at least at present) genes are irrelevant for genome annotation. This statement might be perceived as provocative. Nonetheless, we think there are good arguments to take such a radical step. Genome annotation, after all, is a pragmatic enterprise and hence has to concentrate on information that is readily available or can be generated with reasonable efforts. Therefore it is at least largely limited to the physical objects of the expression cascade and information such as binding sites. This information is about biochemical processes at best and is independent of the higher-level biological interpretation. Given the organization of the transcriptome as a complex structure of overlapping products in both reading directions (The ENCODE Project Consortium, 2007; Kapranov et al., 2007), it makes little sense to tie a functional interpretation or a disease relevance directly to a DNA position once the functional product involved has been identified. There are, indeed, an increasing number of examples where the same DNA locus gives rise to different products with different functions (Ikeda et al., 2007; Bender, 2008). Of course, if the information arose from a mutation or association study, we can only map it to a DNA region, since we do not know the responsible “gene” or expression product.

Acknowledgements

We thank Brendy Alexander, Gene T. Onic, and Margarita A.T. Thepool for stimulating discussions on the gene concept in September 2007, Claudia Copland for comments and editing assistance, and David Krakauer for suggestions on a preliminary version of this manuscript.

References

Attiya H, Welch J, 2004. Distributed Computing: Fundamentals, Simulations, and Advanced Topics. New York: Wiley.

Bender W, 2008. MicroRNAs in the Drosophila bithorax complex. Genes Dev 22:14–19.

Brent MR, 2005. Genome annotation past, present, and future: How to define an ORF at each locus. Genome Res 15:1777–1786.

Carninci P, 2006. Tagging mammalian transcription complexity. Trends Genetics 22:501–510.

Carrasco CD, Buettner JA, Golden JW, 1995. Programmed DNA rearrangement of a Cyanobacterial hupL gene in Heterocysts. Proc Natl Acad Sci USA 92:791–795.

Danos V, Feret J, Fontana W, Harmer R, Krivine J, 2007. Rule-based modelling of cellular signalling. In: Caires L, Vasconcelos VT, editors, CONCUR 2007 - Concurrency Theory, 18th International Conference, vol. 4703 of Lecture Notes in Computer Science, (pp. 17–41). Heidelberg: Springer.

Danos V, Laneve C, 2004. Formal molecular biology. Theoretical Computer Science 325:69–110.

El-Sharoud WM, Graumann PL, 2007. Cold shock proteins aid coupling of transcription and translation in bacteria. Sci Prog 90:15–27.

Garnier O, Serrano V, Duharcourt S, Meyer E, 2004. RNA-mediated programming of developmental genome rearrangements in Paramecium tetraurelia. Mol Cell Biol 24:7370–7379.

Gerstein MB, Bruce C, Rozowsky JS, Zheng D, Du J, Korbel JO, Emanuelsson O, Zhang ZD, Weissman S, Snyder M, 2007. What is a gene, post-ENCODE? History and updated definition. Genome Res 17:669–681.

Gowrishankar J, Harinarayanan R, 2004. Why is transcription coupled to translation in bacteria? Mol Microbiol 54:598–603.

Ikeda Y, Daughters RS, Ranum LP, 2007. Bidirectional expression of the SCA8 expansion mutation: One mutation, two genes.

[...] Beisel KW, Bult CJ, Fletcher CF, Forrest AR, Furuno M, Hill D, Itoh M, Kanamori-Katayama M, Katayama S, Katoh M, Kawashima T, Quackenbush J, Ravasi T, Ring BZ, Shibata K, Sugiura K, Takenaka Y, Teasdale RD, Wells CA, Zhu Y, Kai C, Kawai J, Hume DA, Carninci P, Hayashizaki Y, 2006. Transcript annotation in FANTOM3: Mouse gene catalog based on physical cDNAs. PLoS Genetics 2:e62. doi:10.1371/journal.pgen.0020062.

Mazo A, Hodgson JW, Petruk S, Sedkov Y, Brock HW, 2007. Transcriptional interference: an unexpected layer of complexity in gene regulation. J Cell Sci 120:2755–2761.

Pheasant M, Mattick JS, 2007. Raising the estimate of functional human sequences. Genome Res 17:1245–1253.

Sambrook J, Russell D, 2001. Molecular Cloning: A Laboratory Manual. Cold Spring Harbor: Cold Spring Harbor Laboratory Press.
Cerebellum Doi: Scherrer K, Jost J, 2007a. The gene and the genon concept: 10.1080/14734220701413781. A conceptual and information-theoretic analysis of ge- Isken O, Maquat LE, 2007. Quality control of eukaryotic netic storage and expression in the light of modern molec- mRNA: safeguarding cells from abnormal mRNA func- ular biology. Th Biosci 126:65–113. tion. Genes Dev 21:1833–1856. Scherrer K, Jost J, 2007b. The gene and the genon concept: Jacob F, Monod J, 1961. Genetic regulatory mechanisms in a functional and information-theoretic analysis. Mol Syst the synthesis of proteins. J Mol Biol 3:318–356. Biol 3:87. Kapranov P, Cheng J, Dike S, Nix D, Duttagupta R, Willing- Schlosser G, 2002. Modularity and the units of evolution. ham AT, Stadler PF, Hertel J, Hackerm¨uller J, Hofacker Theory in Biosciences 121:1–80. IL, Bell I, Cheung E, Drenkow J, Dumais E, Patel S, Helt Shearwin KE, Callen BP, Egan JB, 2005. Transcriptional G, Madhavan G, Piccolboni A, Sementchenko V, Tam- interference—a crash course. Trends Genet 21:339–345. mana H, Gingeras TR, 2007. RNA maps reveal new RNA Snyder M, Gerstein M, 2003. Genomics: Defining genes in classes and a possible function for pervasive transcription. the genomics era. Science 300:258–260. Science 316:1484–1488. Swinburne IA, Meyer CA, Liu XS, Silver PA, Brodsky AS, Kuttler C, Niehren J, 2006. Gene regulation in the Pi Calcu- 2006. Genomic localization of RNA binding proteins re- lus: Simulating cooperativity at the Lambda Switch. In: veals links between pre-mRNA processing and transcrip- Transactions on Computational Systems Biology VII, vol. tion. Genome Res 16:912–921. 4230 of Lecture Notes in Computer Science, (pp. 24–55). Tanaka RJ, Okano H, Kimura H, 2006. Mathematical de- Heidelberg: Springer Berlin. scription of gene regulatory units. Biophys J 91:1235– Kvasnicka V, Posp´ıchal J, 2002. Emergence of modularity 1247. in genotype-phenotype mappings. Artif Life 8:295–310. The ENCODE Project Consortium, 2007. 
Identification Lewin B, 1994. Genes V. Oxford, UK: Oxford Univ. Press. and analysis of functional elements in 1% of the human Listerman I, Sapra AK, Neugebauer KM, 2006. Cotran- genome by the ENCODE pilot project. Nature 447:799– scriptional coupling of splicing factor recruitment and 816. precursor messenger RNA splicing in mammalian cells. Wagner GP, Pavlicev M, Cheverud JM, 2007. The road to Nat Struct Mol Biol 13:815–822. modularity. Nat Rev Genet 8:921–931. Lolle SJ, Victor JL, Young JM, Pruitt RE, 2005. Genome- Willingham AT, Gingeras TR, 2006. TUF love for “junk” wide non-mendelian inheritance of extra-genomic infor- DNA. Cell 125:1215–1220. mation in Arabidopsis. Nature 434:505–509. Zufall RA, Robinson T, Katz LA, 2005. Evolution of devel- Maciag K, Altschuler SJ, Slack MD, Krogan NJ, Emili A, opmentally regulated genome rearrangements in eukary- Greenblatt JF, Maniatis T, Wu LF, 2006. Systems-level otes. J Exp Zool Mol Dev Evol 304B:448–455. analyses identify extensive coupling among gene expres- sion machines. Mol Syst Biol 3:0003. Maeda N, Kasukawa T, Oyama R, Gough J, Frith M, Engstr¨om PG, Lenhard B, Aturaliya RN, Batalov S,

References

  1. Attiya H, Welsh J, 2004. Distributed Computing: Fundamentals, Simulations, and Advanced Topics. New York: Wiley.
  2. Bender W, 2008. MicroRNAs in the Drosophila bithorax complex. Genes Dev 22:14-19.
  3. Brent MR, 2005. Genome annotation past, present, and future: How to define an ORF at each locus. Genome Res 15:1777-1786.
  4. Carninci P, 2006. Tagging mammalian transcription complexity. Trends Genetics 22:501-510.
  5. Carrasco CD, Buettner JA, Golden JW, 1995. Programed DNA rearrangement of a Cyanobacterial hupL gene in Heterocysts. Proc Natl Acad Sci USA 92:791-795.
  6. Danos V, Feret J, Fontana W, Harmer R, Krivine J, 2007. Rule-based modelling of cellular signalling. In: Caires L, Vasconcelos VT, editors, CONCUR 2007 - Concurrency Theory, 18th International Conference, vol. 4703 of Lecture Notes in Computer Science, (pp. 17-41). Heidelberg: Springer.
  7. Danos V, Laneve C, 2004. Formal molecular biology. Theoretical Computer Science 325:69-110.
  8. El-Sharoud WM, Graumann PL, 2007. Cold shock proteins aid coupling of transcription and translation in bacteria. Sci Prog 90:15-27.
  9. Garnier O, Serrano V, Duharcourt S, Meyer E, 2004. RNA-mediated programming of developmental genome rearrangements in Paramecium tetraurelia. Mol Cell Biol 24:7370-7379.
  10. Gerstein MB, Bruce C, Rozowsky JS, Zheng D, Du J, Korbel JO, Emanuelsson O, Zhang ZD, Weissman S, Snyder M, 2007. What is a gene, post-ENCODE? history and updated definition. Genome Res 17:669-681.
  11. Gowrishankar J, Harinarayanan R, 2004. Why is transcription coupled to translation in bacteria? Mol Microbiol 54:598-603.
  12. Ikeda Y, Daughters RS, Ranum LP, 2007. Bidirectional expression of the SCA8 expansion mutation: One mutation, two genes. Cerebellum. doi:10.1080/14734220701413781.
  13. Isken O, Maquat LE, 2007. Quality control of eukaryotic mRNA: safeguarding cells from abnormal mRNA function. Genes Dev 21:1833-1856.
  14. Jacob F, Monod J, 1961. Genetic regulatory mechanisms in the synthesis of proteins. J Mol Biol 3:318-356.
  15. Kapranov P, Cheng J, Dike S, Nix D, Duttagupta R, Willingham AT, Stadler PF, Hertel J, Hackermüller J, Hofacker IL, Bell I, Cheung E, Drenkow J, Dumais E, Patel S, Helt G, Madhavan G, Piccolboni A, Sementchenko V, Tammana H, Gingeras TR, 2007. RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science 316:1484-1488.
  16. Kuttler C, Niehren J, 2006. Gene regulation in the Pi Calculus: Simulating cooperativity at the Lambda Switch. In: Transactions on Computational Systems Biology VII, vol. 4230 of Lecture Notes in Computer Science, (pp. 24-55). Heidelberg: Springer Berlin.
  17. Kvasnicka V, Pospíchal J, 2002. Emergence of modularity in genotype-phenotype mappings. Artif Life 8:295-310.
  18. Lewin B, 1994. Genes V. Oxford, UK: Oxford Univ. Press.
  19. Listerman I, Sapra AK, Neugebauer KM, 2006. Cotranscriptional coupling of splicing factor recruitment and precursor messenger RNA splicing in mammalian cells. Nat Struct Mol Biol 13:815-822.
  20. Lolle SJ, Victor JL, Young JM, Pruitt RE, 2005. Genome-wide non-mendelian inheritance of extra-genomic information in Arabidopsis. Nature 434:505-509.
  21. Maciag K, Altschuler SJ, Slack MD, Krogan NJ, Emili A, Greenblatt JF, Maniatis T, Wu LF, 2006. Systems-level analyses identify extensive coupling among gene expression machines. Mol Syst Biol 3:0003.
  22. Maeda N, Kasukawa T, Oyama R, Gough J, Frith M, Engström PG, Lenhard B, Aturaliya RN, Batalov S, Beisel KW, Bult CJ, Fletcher CF, Forrest AR, Furuno M, Hill D, Itoh M, Kanamori-Katayama M, Katayama S, Katoh M, Kawashima T, Quackenbush J, Ravasi T, Ring BZ, Shibata K, Sugiura K, Takenaka Y, Teasdale RD, Wells CA, Zhu Y, Kai C, Kawai J, Hume DA, Carninci P, Hayashizaki Y, 2006. Transcript annotation in FANTOM3: Mouse gene catalog based on physical cDNAs. PLoS Genetics 2:e62. doi:10.1371/journal.pgen.0020062.
  23. Mazo A, Hodgson JW, Petruk S, Sedkov Y, Brock HW, 2007. Transcriptional interference: an unexpected layer of complexity in gene regulation. J Cell Sci 120:2755-2761.
  24. Pheasant M, Mattick JS, 2007. Raising the estimate of functional human sequences. Genome Res 17:1245-1253.
  25. Sambrook J, Russel D, 2001. Molecular Cloning: A Laboratory Manual. Cold Spring Harbor: Cold Spring Harbor Laboratory Press.
  26. Scherrer K, Jost J, 2007a. The gene and the genon concept: A conceptual and information-theoretic analysis of genetic storage and expression in the light of modern molecular biology. Th Biosci 126:65-113.
  27. Scherrer K, Jost J, 2007b. The gene and the genon concept: a functional and information-theoretic analysis. Mol Syst Biol 3:87.
  28. Schlosser G, 2002. Modularity and the units of evolution. Theory in Biosciences 121:1-80.
  29. Shearwin KE, Callen BP, Egan JB, 2005. Transcriptional interference-a crash course. Trends Genet 21:339-345.
  30. Snyder M, Gerstein M, 2003. Genomics: Defining genes in the genomics era. Science 300:258-260.
  31. Swinburne IA, Meyer CA, Liu XS, Silver PA, Brodsky AS, 2006. Genomic localization of RNA binding proteins reveals links between pre-mRNA processing and transcription. Genome Res 16:912-921.
  32. Tanaka RJ, Okano H, Kimura H, 2006. Mathematical description of gene regulatory units. Biophys J 91:1235-1247.
  33. The ENCODE Project Consortium, 2007. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447:799-816.
  34. Wagner GP, Pavlicev M, Cheverud JM, 2007. The road to modularity. Nat Rev Genet 8:921-931.
  35. Willingham AT, Gingeras TR, 2006. TUF love for "junk" DNA. Cell 125:1215-1220.
  36. Zufall RA, Robinson T, Katz LA, 2005. Evolution of developmentally regulated genome rearrangements in eukaryotes. J Exp Zool Mol Dev Evol 304B:448-455.