“Genes”
Sonja J. Prohaska
Peter F. Stadler
SFI WORKING PAPER: 2008-03-011
©NOTICE: This working paper is included by permission of the contributing author(s) as a means to ensure
timely distribution of the scholarly and technical work on a non-commercial basis. Copyright and all rights
therein are maintained by the author(s). It is understood that all persons copying this information will
adhere to the terms and constraints invoked by each author's copyright. These works may be reposted only
with the explicit permission of the copyright holder.
February 11, 2008
S.J. Prohaska
Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA; and
Department of Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria
E-mail: [email protected]

P.F. Stadler
Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany; and
RNomics Group, Fraunhofer Institute for Cell Therapy and Immunology (IZI), Deutscher Platz 5e, D-04103 Leipzig, Germany; and
Department of Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria; and
Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA
E-mail: [email protected]

Abstract In order to describe a cell at the molecular level, a notion of a “gene” is neither necessary nor helpful. It is sufficient to consider the molecules (i.e. chromosomes, transcripts, proteins) and their interactions to describe cellular processes. The downside of the resulting high resolution is that it becomes very tedious to address features on the organismal and phenotypic levels with a language based on molecular terms. Looking for the missing link between biological disciplines dealing with different levels of biological organization, we suggest returning to the original intent behind the term “gene”. To this end, we propose to investigate whether a useful notion of “gene” can be constructed based on an underlying notion of function, and whether this can serve as the necessary link and embed the various distinct gene concepts of biological (sub)disciplines in a coherent theoretical framework.

In reply to the Genon Theory recently put forward by Klaus Scherrer and Jürgen Jost in this journal, we shall discuss a general approach to assess a gene definition that should then be tested for its expressiveness and potential cross-disciplinary relevance.

1 Introduction

In a recent issue of this journal, Klaus Scherrer and Jürgen Jost (Scherrer and Jost, 2007b) introduced an essentially computational account of gene expression which introduces a formal separation of the “gene” from the program that is required to orchestrate its expression.

The Genon Theory presents a fresh and stimulating contribution to a discussion of the “gene concept” that has re-emerged in recent years in response to evidence of greater genomic complexity than previous concepts of the gene are able to accommodate. It has become increasingly obvious that the classical molecular concept of a gene as a contiguous stretch of DNA encoding a functional product is inconsistent with the complexity and diversity of genomic organization (The ENCODE Project Consortium, 2007; Maeda et al., 2006; Carninci, 2006; Willingham and Gingeras, 2006). Many of the proposals from the “high-throughput community” lean towards a purely structural point of view, focusing on genes as structural units, often explicitly related to proteins as the link to a functional interpretation (Snyder and Gerstein, 2003; Gerstein et al., 2007). Dissenting opinions, on the other hand, question the usefulness of “genes” in a genomic context (Gerstein et al., 2007).

The Genon Theory attempts to reconcile these views by advocating a functional, rather than structural, definition of the gene. While this is a welcome departure from the overly simplistic view of “genes as protein-coding DNA”, it remains oriented toward the simple representation of the “gene” as a contiguous stretch of code. It deliberately excludes the complex collection of regulatory signals from the notion of the “gene” and instead interprets them as a program of gene expression, the “genon”. It is grounded in a number of fundamental assumptions, some implicit and some explicit. Our discussion will start with these assumptions, which in several cases are not satisfactory. Instead of presenting a particular fixed definition of what a gene “is”, we will explore here how a functional gene definition can be constructed depending on how the concept of “function” is formalized.

2 Gene Expression as Computation

The dichotomy of gene (data) and genon (program) is a fundamental assumption regarding the nature of biological information processing that is logically suspect. In Computer Science, many of the familiar programming languages, including C, BASIC, or FORTRAN, make a clear syntactic distinction between data and program; functional programming languages such as LISP and Haskell, on the other hand, have no means at all for making this distinction. Since heritable biological information necessarily must encode both data and program, it is by no means clear that biological information processing is more like FORTRAN than LISP.

As an alternative to the separation into genes and genons, a separation into genetic material (data) and the machinery (program) that orchestrates its expression could be introduced. The latter respects an important intuitive property of data, namely the simple transfer and substitution of (parts of) the data. Similar to the platform-independence of data (in contrast to often platform-dependent programs), nucleic acids can be interpreted in a wide range of contexts. Biotechnology, and cloning techniques in general (Sambrook and Russel, 2001), take advantage of this property whenever a piece of genetic material is cloned into a vector and transferred to a different organism. There a different machinery evaluates the same sequence information and generates a product that is similar enough to the product of the original context to be of practical use.

Notwithstanding the appealing intuition behind this distinction, RNA components of the machinery inherited by an RNA molecule (as in the case of RNA viruses) pose a problem to this separation, because the same molecule would be both data and program at the same time. Therefore, it remains to be shown that an unambiguous partitioning of the molecular components into data and program is possible and that it results in a reasonable representation of biological reality.

A central idea of Genon Theory is that one can speak of a program that governs the expression of a gene. This program is described as the union of the cis-genon, which is encoded by the same molecule(s) that carry the information of the gene, and the trans-genon. The latter is viewed as the collection of all “trans-acting” factors that influence gene expression. The implicit assumption here is that the expression of the gene of interest does not change its environment in an appreciable manner, e.g., by using up some of the trans-factors or by feeding back on the expression of these factors. Only in this limiting case does it make sense to view the environment as a static part of the expression program, i.e., to associate the trans-genon with the gene of interest, instead of interpreting the environment, including the relevant trans-acting factors, as the result of other programs that concurrently express their genes. This static view of a set of “trans-acting” factors also fails to account for the fact that the expression of these factors is a dynamic process and will typically not be in sync with the processing steps of the gene of interest. We argue that specifying the collection of trans-acting factors is insufficient to determine the “external” part of the program of gene expression because the temporal order in which they are produced and interact is crucial.

Scherrer & Jost presuppose several properties of the process of gene expression. It is assumed to be deterministic (at least under given environmental conditions), Markovian (in the sense that each processing step only requires the result of the previous step as input), and to proceed in a linear sequence of a few well-separated steps. Each of these assumptions is an idealization. The last two properties together are necessary to justify the “Cascade of Regulation” and to make the notions of pre-genon, proto-genon, etc. well-defined. As the authors themselves note in (Scherrer and Jost, 2007a), this assumption is often violated. Recent evidence for a strong coupling of transcription, splicing, and export in higher eukaryotes (Listerman et al., 2006; Swinburne et al., 2006; Maciag et al., 2006), and the concurrency of transcription and translation in bacterial cells (Gowrishankar and Harinarayanan, 2004; El-Sharoud and Graumann, 2007) imply that some of the processing stages may never exist as discrete molecules. This blurs the boundaries between the individual steps.

The separation of processing steps is, however, required to strictly distinguish cis- and trans-parts of the genon. Whenever a processing step results in joining two fragments (e.g. in trans-splicing), the element in trans becomes a cis-element after completing the step. The Markov property is also violated by splicing and some export mechanisms that specifically attach proteins that remain bound to the RNA during the next maturation step(s). Again it becomes impossible to strictly discriminate between cis- and trans-action. Exon-junction complexes and export co-factors such as the RNA-binding protein HuR are of course not encoded in the final mRNA, but regulation of the mRNA depends on their presence and location in the pre-mRNA. This “annotation” is not seen in the final mRNA molecule, but is determined by the molecule’s particular processing history.

The Genon Theory describes gene expression as a simple sequential program, thereby ignoring the network structure of gene regulation. In our view, however, the network architecture is the very essence of biological regulation. Within a framework that interprets gene expression as a computational process, we suggest reformulating the trans-genon as communication with other gene expression processes. This leads in a rather natural way to a picture of gene expression as a Distributed Computing System (Attiya and Welsh, 2004). To this end, we must give up the idea that there is a single, independent program governing the expression of each individual gene (one mRNA/gene – one genon hypothesis). Instead, we need to model a collection of computational processes, one for each sequence of consecutive processing steps, that communicate via their trans-actions. Formal models of this type have recently been introduced in systems biology (Danos and Laneve, 2004; Danos et al., 2007; Kuttler and Niehren, 2006) using π-calculus and related formalisms.

[Figure 1: diagram not recoverable from the extraction. Labels: Environment, complex formation, protein, modification, polypeptide, folding, transcription, RNA, transcript, splicing, processing, transmission, replication, DNA.]

Fig. 1 Cascade of Regulation: At each step, information content is not only reduced but might also increase due to integration of information provided by the surroundings. Global environmental factors (e.g. gravity, latitude, temperature, tide etc.) as well as local environmental factors including localization, timing and interaction of products provide information to all steps of the cascade and establish a network of communication. The influence of certain factors can be expected to show great variation among organisms. Localization is suggested to play an important role for many steps. The more environmental factors can be taken for granted, the less information needs to be encoded and transmitted from step to step.

3 Genes sensu Jost & Scherrer

The Genon Theory emphasizes a functional point of view and attempts to define the gene as a “basis of a unit function”. It deliberately “give[s] up the correspondence of the gene as functional unit and as a DNA locus.” While there are rules to map genes back to the genome, these rules are not considered a defining property of the gene. Heritability, on the other hand, is. Jost and Scherrer, though, seem to view heritability as irrelevant, arguing that modern molecular biology is essentially about function.

We strongly disagree with this view. The concept of the “Gene” is common ground to most disciplines of biology and historically has been instrumental in the synthesis of subdisciplines, e.g. evolution and development. We therefore argue that a meaningful notion of “Gene” cannot be constructed with only a particular sub-discipline in mind. Heritability is a crucial property since it is the purpose of genomes to transmit the encoded instructions for generating functional units, instead of transmitting the functional units themselves. Even within the scope of modern molecular biology, the concept of heritable genes is indispensable: we need to be able to speak of homology, most commonly defined as descent from a common ancestor, among genes. Common ancestry of functional units is the main justification for translational approaches that attempt to utilize information obtained for model organisms such as mouse or fruit fly to understand similar biological processes in humans. Furthermore, it appears that genes are necessary to understand the selection part of the evolutionary process: In order to describe what selection does on a molecular level, only nucleotide sequences are required; to conceptualize the why, however, a functionally defined gene is at least very useful.

Scherrer & Jost proceed to equate function with “functional products” derived from the genetic encoding: “A cellular function can be represented by a polypeptide or an RNA”, “Genetic function is carried out by proteins composed of folded polypeptides”. Despite a section on RNA genes, the text leaves no doubt that protein-coding genes are considered the paradigm of genetic information processing; indeed, the Genon Theory fails to provide concepts to incorporate non-protein-coding “genes” in general. A more implicit assumption of the Genon Theory is the idea that protein-coding mRNAs are the most interesting and most important type of products that are produced from DNA. In light of the results of the ENCODE and FANTOM projects we reject this “proteinocentric” point of view. Protein-coding sequence covers less than 2% of the genome, while approximately 10% is under stabilizing selection. This is at least indicative of some biological function. As almost all of this sequence is transcribed we have to assume that much of it exerts its function as some processing product of the primary transcript, which is often not associated with any protein (Pheasant and Mattick, 2007). From this point of view, nothing about the mature mRNA stage is so special as to warrant the definition of this stage, along with the regulation of translation, as the focal point of biological information processing.

From these assumptions, Scherrer and Jost deduce that there is a single stage in the life of a transcript that lends itself to a natural definition of the gene, namely the last processing product before translation: “[The gene] finally emerges as an uninterrupted nucleic acid sequence at mRNA level, just prior to translation, in faithful correspondence with the amino acid sequence to be produced as a polypeptide”. The gene concept thus coincides with the well-established notion of “Open Reading Frame”. Consequently, there are many more (protein-coding) genes than protein-coding loci (the authors estimate 500 000 vs. 25 000), since any two mRNAs giving rise to distinct polypeptides (e.g. via alternative splicing) are counted as distinct genes. On the other hand, the expression of the same function (i.e., the same functional molecule) at different times or in different cells counts as a single gene.
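
The counting rule described above can be made concrete with a toy sketch. It is only an illustration under stated assumptions: the records, locus names, mRNA identifiers, and peptide strings are hypothetical placeholders, not data from Scherrer and Jost, and the sketch simply assumes that a "gene" in their sense can be identified with a distinct polypeptide sequence.

```python
from typing import NamedTuple


class Transcript(NamedTuple):
    """One observed mature mRNA (all field values below are hypothetical)."""
    locus: str        # label of the DNA locus the mRNA derives from
    mrna: str         # identifier of the mature mRNA isoform
    polypeptide: str  # amino acid sequence encoded by its open reading frame
    cell_type: str    # where the mRNA was observed


# Toy observations: locusA yields two isoforms by alternative splicing,
# and isoform A.1 is observed in two different cell types.
OBSERVATIONS = [
    Transcript("locusA", "A.1", "MKTLV", "liver"),
    Transcript("locusA", "A.1", "MKTLV", "brain"),
    Transcript("locusA", "A.2", "MKTQQ", "liver"),
    Transcript("locusB", "B.1", "MSSPR", "liver"),
]


def count_loci(observations):
    """Number of distinct protein-coding loci."""
    return len({t.locus for t in observations})


def count_orf_genes(observations):
    """Number of genes when one gene is counted per distinct polypeptide.

    Expression of the same polypeptide at another time or in another cell
    adds nothing; a new polypeptide from the same locus adds one gene.
    """
    return len({t.polypeptide for t in observations})


if __name__ == "__main__":
    print("protein-coding loci:", count_loci(OBSERVATIONS))
    print("genes (distinct polypeptides):", count_orf_genes(OBSERVATIONS))
```

In this toy data set two loci give rise to three genes, which mirrors in miniature why the number of genes under this definition exceeds the number of loci.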

It is overly restrictive, however, to identify cellular functions with directly encoded gene products. Several classes of important molecules, all of which are “functional” (at least to most researchers), including steroid hormones, co-enzymes, pigments, polysaccharides, etc., are not directly encoded, but are quite indirectly the consequence of genetic encoding. Conversely, the polypeptide that is obtained directly by decoding the mRNA is in many cases not functional at all. It may need the assistance of chaperones to fold into its active tertiary structure, it may need to be modified, e.g. by glycosylation or other chemical modification, or it may be cleaved or fused with other (possibly modified) peptide chains. More importantly, there are crucial regulatory functions in which a process, e.g. the act of transcription to modify the chromatin state (Shearwin et al., 2005; Mazo et al., 2007), or the act of initial translation to remove the exon-junction complexes (Isken and Maquat, 2007), is crucial, while the associated products created by these processes (a primary transcript and a polypeptide, respectively) are, for all we know, completely irrelevant.

On the other hand, function need not be associated with the generation of a product at all, as is the case with cis-acting regulatory elements. A classical example is the lac operator lacO (Jacob and Monod, 1961). Besides cis dominance, this sequence shows properties similar to a regulatory gene and can be mapped to a DNA locus by means of physical mapping just like a gene. The Genon Theory thus uses a notion of “genetic” function that appears to be inconsistent with the experimental evidence.

4 Structural Gene Definitions

Less than 15 years ago, the influential textbook Genes V (Lewin, 1994) defined: “Gene (cistron) is the segment of DNA involved in producing a polypeptide chain; it includes regions preceding and following the coding region (leader and trailer) as well as intervening sequences (introns) between individual coding segments (exons).” Older definitions explicitly included promoters as part of the gene. Once it had been realized, however, that the regulatory sequence associated with gene expression can be widely dispersed, many authors opted for viewing the “gene” as essentially synonymous with “protein-coding transcript” (Snyder and Gerstein, 2003).

With the availability of large amounts of “omics” data, many authors have advocated various versions of structural definitions of the gene that amount to collections of transcripts, see e.g. (Snyder and Gerstein, 2003; Gerstein et al., 2007). The same approach is taken by current genome databases: within the Ensembl¹ framework, a gene is defined as a set of (primary) transcripts. It seems that the gene definition of Scherrer & Jost was also influenced by this trend: even though introduced as a functional notion, a series of simplifying assumptions reduces it to another easily identifiable genomic structure: the Open Reading Frame.

¹ www.ensembl.org

A purely structural definition of a gene in terms of a genomic “source”, however, does not seem useful to us. Without any reference to function, there is no way of singling out a particular product of the regulatory cascade in general or a specific processing stage of a transcript in particular. As the end-product of every transcript is eventually a small degradation fragment, and presumably a single nucleotide, this approach does not lead to a meaningful definition. Alternatively, one might view every processing stage as a different transcript and consequently as a different gene. This would just rename “transcript” to “gene” and the set of all genes would become equivalent to the transcriptome. Another approach is to define a gene as a collection of overlapping transcripts. At least in eukaryotes, this leads to fairly large regions equivalent to genomic/transcriptional domains or, in the worst case, the whole genome, another trivial solution. Between these two extremes, Gerstein et al. (2007) consider genes as sets of overlapping transcripts that share open reading frames. As we have argued above, singling out particular processing stages or products is problematic since such a definition can be applied only to a (possibly small) subset of entities.

5 Genes Derived from Heritable Functional Units

We agree with Scherrer & Jost that a meaningful definition of gene has to be based on a notion of function because a purely structural gene definition is altogether dispensable as we have seen above. In this section, we will briefly outline a research agenda that may eventually lead to a useful function-based gene concept, or to the realization that such an endeavor cannot succeed.

First, we reject the idea of a one-to-one correspondence of function and “gene-product”, which seems much more a vestige of the history of the gene concept than a property of a biological system. The appeal of the equivalence of function and product is that it makes function “measurable” by virtue of detecting the product. We have argued above, however, that the existence of a product does not imply that it has any function at all, and conversely, the same product may have multiple and mechanistically diverse biochemical functions, depending on its context.

Hence, we expand the notion of function and postulate that function must be measurable directly by some experimental setup in finite time, and that one must be able to do this in such a way that functional equivalence can be determined. What constitutes a function, and whether two functions are distinguishable from each other, therefore depends
on an experimental (or computational) procedure, which we will for short call a “measurement” in the following. Different procedures may represent “biological importance” more or less well. Time-honored procedures such as the classical complementation test of molecular genetics or the observation of the developmental effects of gene knock-outs are procedures that have proven useful. The approach of the Genon Theory, namely to determine whether a stretch of DNA is eventually translated into a polypeptide, is yet another possible way to measure. We view computational approaches as yet another procedure to assess information about function. Of course, as with any “functional test”, all these procedures come with inherent limitations and the possibility of false positive and negative results. Such results may eventually lead to erroneous conclusions about particular “genes”. This is, however, also true for seemingly straightforward procedures such as the assignment of ORFs (Brent, 2005), and does not affect the conceptual framework.

Entire cells, organs, and organisms certainly convey function. Thus we would not want to be forced to call everything that has a measurable function a “gene”. Just as Scherrer & Jost do, we consider a gene a unit of function. The nature of units, modules and their mutual relationships is a field of lively debate in theoretical biology, see e.g. (Kvasnicka and Pospíchal, 2002; Tanaka et al., 2006; Schlosser, 2002; Wagner et al., 2007), which we will not enter here. Instead, we use the term “unit” in a broad sense: A unit should show stronger cohesion to itself than to other components, thereby ensuring its integrity in isolation. Consequently, a unit of function should execute its function in isolation², thereby representing a “building block” or “basis element” of the space of functions³. Novel functions may emerge from collections of functional sub-units. Within a given experimental protocol we may be able to distinguish the function of higher level units from those of their components, thus functional units can be nested within each other. Intuitively, we would like to correlate the gene with the elementary functional unit, i.e., a unit that cannot be understood as a collection of functional units together with the emergent function(s) arising from their combination. Whereas single molecules and/or molecular complexes and their interactions play the central role in molecular biology, researchers in other biological disciplines might be more interested in higher order functional units. Such a coarse-grained level of functionality could be represented by chemical reactions, interaction networks, or phenotypic traits rather than products as functional units. We suggest that each of these is a valid starting point for a gene definition.

² Units whose function(s) rely on input and/or communication of course need to be provided with this stimulus.
³ “Space” is used here in the formal mathematical sense as “a set endowed with a certain abstract structure.”

In contrast to the Genon Theory, we postulate that genes are heritable and therefore need to be part of the inherited material. In 1952, Hershey and Chase found that the “instructions” for functional units are made of genetic material, nucleic acid in general, DNA if present. However, exceptions to this rule are well known: epigenes, protein-based inheritance (i.e. centrioles and prions), and RNA-based inheritance (Lolle et al., 2005) also instruct heritable functional units. Heritability is determined by the process of inheritance, a sequence of reproduction and segregation. We may or may not want to restrict the concept of genes to entities that are inherited in a particular way, namely by means of the genetic material that comprises the genome.

A formal mathematical investigation of this schema should eventually be able to relate elementary functional units to their source in the inherited material. If a function-based gene concept is feasible at all, such a mapping is the indispensable prerequisite for genes to become a useful notion for molecular biology. We suspect that such a mapping is not necessarily possible for all underlying definitions of “function”, “unit” and/or their combinations. It is even conceivable that such a mapping can never be constructed, in which case we will have to abandon the notion of “functional genes”. Even if we can construct the map, there is no guarantee that the genomic source⁴ corresponding to a particular definition of functional unit will show properties that we would expect or desire from a gene. In particular, the genomic representation of our functionally defined genes may well be frustratingly complex and disparate from the physical entities that we deal with in the various flavors of “omics”.

⁴ For simplicity of language we speak of the “genomic source” instead of the more general “encoding in the inheritable material”.

In line with our arguments above we suggest that an appropriate definition of a functional unit should not make explicit reference to a particular class of molecules. While determining the chemical composition is within the scope of acceptable experimental protocols, a consequence of this type of protocol is the disparate classification of molecules with similar or identical functions, e.g. a protein enzyme vs. a ribozyme that catalyzes the same chemical reaction. It is at least conceivable that the chemical implementation of a catalyst or regulator is irrelevant for a cell. Consequently, functional units may just as well be of DNA nature. Operators and other cis-regulatory elements behave much like regulatory genes when assayed with many procedures typically used in genetics. In such a context, we may well be obliged to treat them as functional units and consequently as genes. On the other hand, Developmentally Regulated DNA Rearrangements (DRDR) are not uncommon as mechanisms of expression regulation throughout eukaryotes (Zufall et al., 2005). Ciliate genome processing (which interestingly is regulated by small RNAs (Garnier et al., 2004)),
chromatin diminution (i.e., the selective elimination of portions of chromosomes), the vertebrate immune system, and the amplification of rDNA genes are the most prominent examples. DRDR is also involved in mating type switching in yeast and prokaryotic differentiation, see e.g. (Carrasco et al., 1995). Hence processes operating on the genomic material have to be included in the processing program.

The boundaries of our genes as Heritable Elementary Functional Units are ultimately determined by the underlying notion of function. Depending on this choice, genes may or may not contain the information necessary to orchestrate the production of the corresponding functional units from the heritable material.

6 Concluding Remarks

In our discussion, we started from assumptions similar to but less restrictive than those of the Genon Theory. We have arrived at the definition of a gene as the pre-image of elementary functional units on the heritable material. Abandoning the identification of function with a functional product, we highlight the logical separation between functions (measured by some experimental protocol) and expression products. Expression of products, as described in Section 2, is understood as a computation-like processing cascade that starts with the generation of a working copy of the inheritable genetic information. The understanding of the mechanics of expression (or the corresponding computation) does not require the notion of a gene at all. It is sufficient to consider the processing products and their molecular interactions. Indeed, a sufficiently detailed model of the expression processes is likely to be a good starting point to define function, functional units, and eventually genes.

The precise meaning of the term “gene expression” remains elusive. Logically, it refers to the construction of functional units from their heritable source. Since genes are not synonymous with “products in the expression cascade”, gene expression is not synonymous with the processing of individual transcripts (or other individual processing products). Instead, it must be understood as a composite of the expression program governing the construction of the molecular components of the functional unit, together with additional interactions that are not encapsulated in any expressed molecular product. A simple one-to-one relation between the chemical and logical expression programs exists only in limiting cases, for instance when functional units are identified with polypeptides as in the Genon Theory. In general, it remains to be seen to what extent (logical) gene expression can be modeled in a computational framework analogous to the physical expression of products (in the sense of sec-

A simple, but practically relevant implication of the distinction between expressed products and functionally defined genes as advocated here is that (at least at present) genes are irrelevant for genome annotation. This statement might be perceived as provocative. Nonetheless, we think there are good arguments to take such a radical step. Genome annotation, after all, is a pragmatic enterprise and hence has to concentrate on information that is readily available or can be generated with reasonable effort. Therefore it is at least largely limited to the physical objects of the expression cascade and information such as binding sites. This information is about biochemical processes at best and is independent of the higher-level biological interpretation. Given the organization of the transcriptome as a complex structure of overlapping products in both reading directions (The ENCODE Project Consortium, 2007; Kapranov et al., 2007), it makes little sense to tie a functional interpretation or a disease relevance directly to a DNA position once the functional product involved has been identified. There are, indeed, an increasing number of examples where the same DNA locus gives rise to different products with different functions (Ikeda et al., 2007; Bender, 2008). Of course, if the information arose from a mutation or association study, we can only map it to a DNA region, since we do not know the responsible “gene” or expression product.

Acknowledgements We thank Brendy Alexander, Gene T. Onic, and Margarita A.T. Thepool for stimulating discussions on the gene concept in September 2007, Claudia Copland for comments and editing assistance, and David Krakauer for suggestions on a preliminary version of this manuscript.

References

Attiya H, Welsh J, 2004. Distributed Computing: Fundamentals, Simulations, and Advanced Topics. New York: Wiley.
Bender W, 2008. MicroRNAs in the Drosophila bithorax complex. Genes Dev 22:14–19.
Brent MR, 2005. Genome annotation past, present, and future: How to define an ORF at each locus. Genome Res 15:1777–1786.
Carninci P, 2006. Tagging mammalian transcription complexity. Trends Genetics 22:501–510.
Carrasco CD, Buettner JA, Golden JW, 1995. Programmed DNA rearrangement of a cyanobacterial hupL gene in heterocysts. Proc Natl Acad Sci USA 92:791–795.
Danos V, Feret J, Fontana W, Harmer R, Krivine J, 2007. Rule-based modelling of cellular signalling. In: Caires L, Vasconcelos VT, editors, CONCUR 2007 - Concurrency
tion 2). Even if gene expression can be modeled in this way, Theory, 18th International Conference, vol. 4703 of Lec-
it is not clear a priori how the relations between the physical ture Notes in Computer Science, (pp. 17–41). Heidelberg:
and the logical expression program can be described. Springer.
Danos V, Laneve C, 2004. Formal molecular biology. Theoretical Computer Science 325:69–110.
El-Sharoud WM, Graumann PL, 2007. Cold shock proteins aid coupling of transcription and translation in bacteria. Sci Prog 90:15–27.
Garnier O, Serrano V, Duharcourt S, Meyer E, 2004. RNA-mediated programming of developmental genome rearrangements in Paramecium tetraurelia. Mol Cell Biol 24:7370–7379.
Gerstein MB, Bruce C, Rozowsky JS, Zheng D, Du J, Korbel JO, Emanuelsson O, Zhang ZD, Weissman S, Snyder M, 2007. What is a gene, post-ENCODE? History and updated definition. Genome Res 17:669–681.
Gowrishankar J, Harinarayanan R, 2004. Why is transcription coupled to translation in bacteria? Mol Microbiol 54:598–603.
Ikeda Y, Daughters RS, Ranum LP, 2007. Bidirectional expression of the SCA8 expansion mutation: One mutation, two genes. Cerebellum Doi: 10.1080/14734220701413781.
Isken O, Maquat LE, 2007. Quality control of eukaryotic mRNA: safeguarding cells from abnormal mRNA function. Genes Dev 21:1833–1856.
Jacob F, Monod J, 1961. Genetic regulatory mechanisms in the synthesis of proteins. J Mol Biol 3:318–356.
Kapranov P, Cheng J, Dike S, Nix D, Duttagupta R, Willingham AT, Stadler PF, Hertel J, Hackermüller J, Hofacker IL, Bell I, Cheung E, Drenkow J, Dumais E, Patel S, Helt G, Madhavan G, Piccolboni A, Sementchenko V, Tammana H, Gingeras TR, 2007. RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science 316:1484–1488.
Kuttler C, Niehren J, 2006. Gene regulation in the Pi Calculus: Simulating cooperativity at the Lambda Switch. In: Transactions on Computational Systems Biology VII, vol. 4230 of Lecture Notes in Computer Science, (pp. 24–55). Heidelberg: Springer Berlin.
Kvasnicka V, Pospíchal J, 2002. Emergence of modularity in genotype-phenotype mappings. Artif Life 8:295–310.
Lewin B, 1994. Genes V. Oxford, UK: Oxford Univ. Press.
Listerman I, Sapra AK, Neugebauer KM, 2006. Cotranscriptional coupling of splicing factor recruitment and precursor messenger RNA splicing in mammalian cells. Nat Struct Mol Biol 13:815–822.
Lolle SJ, Victor JL, Young JM, Pruitt RE, 2005. Genome-wide non-mendelian inheritance of extra-genomic information in Arabidopsis. Nature 434:505–509.
Maciag K, Altschuler SJ, Slack MD, Krogan NJ, Emili A, Greenblatt JF, Maniatis T, Wu LF, 2006. Systems-level analyses identify extensive coupling among gene expression machines. Mol Syst Biol 3:0003.
Maeda N, Kasukawa T, Oyama R, Gough J, Frith M, Engström PG, Lenhard B, Aturaliya RN, Batalov S, Beisel KW, Bult CJ, Fletcher CF, Forrest AR, Furuno M, Hill D, Itoh M, Kanamori-Katayama M, Katayama S, Katoh M, Kawashima T, Quackenbush J, Ravasi T, Ring BZ, Shibata K, Sugiura K, Takenaka Y, Teasdale RD, Wells CA, Zhu Y, Kai C, Kawai J, Hume DA, Carninci P, Hayashizaki Y, 2006. Transcript annotation in FANTOM3: Mouse gene catalog based on physical cDNAs. PLoS Genetics 2:e62. Doi:10.1371/journal.pgen.0020062.
Mazo A, Hodgson JW, Petruk S, Sedkov Y, Brock HW, 2007. Transcriptional interference: an unexpected layer of complexity in gene regulation. J Cell Sci 120:2755–2761.
Pheasant M, Mattick JS, 2007. Raising the estimate of functional human sequences. Genome Res 17:1245–1253.
Sambrook J, Russell D, 2001. Molecular Cloning: A Laboratory Manual. Cold Spring Harbor: Cold Spring Harbor Laboratory Press.
Scherrer K, Jost J, 2007a. The gene and the genon concept: A conceptual and information-theoretic analysis of genetic storage and expression in the light of modern molecular biology. Th Biosci 126:65–113.
Scherrer K, Jost J, 2007b. The gene and the genon concept: a functional and information-theoretic analysis. Mol Syst Biol 3:87.
Schlosser G, 2002. Modularity and the units of evolution. Theory in Biosciences 121:1–80.
Shearwin KE, Callen BP, Egan JB, 2005. Transcriptional interference—a crash course. Trends Genet 21:339–345.
Snyder M, Gerstein M, 2003. Genomics: Defining genes in the genomics era. Science 300:258–260.
Swinburne IA, Meyer CA, Liu XS, Silver PA, Brodsky AS, 2006. Genomic localization of RNA binding proteins reveals links between pre-mRNA processing and transcription. Genome Res 16:912–921.
Tanaka RJ, Okano H, Kimura H, 2006. Mathematical description of gene regulatory units. Biophys J 91:1235–1247.
The ENCODE Project Consortium, 2007. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447:799–816.
Wagner GP, Pavlicev M, Cheverud JM, 2007. The road to modularity. Nat Rev Genet 8:921–931.
Willingham AT, Gingeras TR, 2006. TUF love for “junk” DNA. Cell 125:1215–1220.
Zufall RA, Robinson T, Katz LA, 2005. Evolution of developmentally regulated genome rearrangements in eukaryotes. J Exp Zool Mol Dev Evol 304B:448–455.