Cogn Process (2005)
DOI 10.1007/s10339-005-0017-7
RESEARCH REPORT
Gerard O’Brien · Jon Opie
How do connectionist networks compute?
Received: 13 July 2005 / Revised: 19 July 2005 / Accepted: 19 July 2005
© Marta Olivetti Belardinelli and Springer-Verlag 2005
Abstract Although connectionism is advocated by its proponents as an alternative to the classical computational theory of mind, doubts persist about its computational credentials. Our aim is to dispel these doubts by explaining how connectionist networks compute. We first develop a generic account of computation, which is no easy task because computation, like almost every other foundational concept in cognitive science, has resisted canonical definition. We opt for a characterisation that does justice to the explanatory role of computation in cognitive science. Next we examine what might be regarded as the “conventional” account of connectionist computation. We show why this account is inadequate and hence fosters the suspicion that connectionist networks are not genuinely computational. Lastly, we turn to the principal task of the paper: the development of a more robust portrait of connectionist computation. The basis of this portrait is an explanation of the representational capacities of connection weights, supported by an analysis of the weight configurations of a series of simulated neural networks.

Keywords Computation · Connectionism · Representation · Resemblance

Communicated by John Sutton

G. O’Brien (✉) · J. Opie
Discipline of Philosophy, University of Adelaide,
5005 Adelaide, SA, Australia
E-mail: [email protected]
URL: http://arts.adelaide.edu.au/humanities/gobrien/
Tel.: +61-8-8303-5298
Fax: +61-8-8303-5241

Introduction

Connectionism was first widely recognised as a potential rival to the classical computational theory of mind nearly 20 years ago.1 Yet, despite all that has since been written about this approach to cognition, we still lack a satisfactory account of how connectionist networks compute. Into this vacuum has crept doubt about connectionism’s computational credentials. This doubt takes three forms. One is the view that connectionism, far from being a rival computational paradigm, is nothing more than a modern version of associationism, and hence suffers from all the well-known vices of this much older position.2 A second is the claim that while connectionists typically interpret the states and activity of connectionist networks in representational terms, closer scrutiny reveals that these putative representations fail to do any explanatory work, and since there is “no computation without representation” (Pylyshyn 1984, p. 62), the connectionist framework is better interpreted non-computationally.3 And a third is the suggestion that connectionist networks are better characterised as dynamical systems rather than computational devices.4

If connectionism is ever to stand as a serious alternative to the classical computational theory of mind, this doubt must be dispelled. And the only way to do this is to explain how connectionist networks compute. That is the task we have set ourselves in this paper. We begin by developing a generic account of computation, which is no easy task since, like almost every other foundational concept in cognitive science, computation has resisted canonical definition. In the face of this problem, we opt for a

1 We use “connectionism” generically to denote the explanatory framework that models human perceptual and cognitive processes in terms of the operations of neuron-like processing units connected together to form neural-like networks. While this explanatory framework has antecedents running back more than 50 years, we take the appearance of Rumelhart and McClelland (1986) and McClelland and Rumelhart (1986) as the moment when connectionism, in the guise of parallel distributed processing, came of age.
2 This claim is most famously associated with Fodor (e.g. Fodor and Pylyshyn 1988; Fodor 2000, Ch. 3) but it pops up in a number of different places (Pinker 1997, pp. 112–131 and 2002, pp. 78–83).
3 See, e.g. Ramsey (1997).
4 See, e.g. the various contributions to Port and Van Gelder (1995).
characterisation that captures the intended role of computation in cognitive science. Next we examine what might be regarded as the “conventional” account of connectionist computation. We show why this account is inadequate and hence fosters the kinds of doubt we have just enumerated. We then turn to the principal task of the paper: the development of a more robust portrait of connectionist computation. The basis of this portrait is an explanation of the representational capacities of connection weights, supported by an analysis of the weight configurations of a series of simulated neural networks. Once this explanation is in place, it will be apparent how connectionist networks compute.

Toward a proper understanding of computation in cognitive science

Computation is a concept so overused and so variously defined that we sometimes despair of it ever being meaningfully deployed. And yet we also believe that computation is the most important concept in all of cognitive science. Indeed, we would argue that without the concept of computation there is no cognitive (as distinct from behavioural, psychological, biological, or just plain physical) science in the first place. So something must be done.

In all that has been written about computation in cognitive science, two extreme characterisations are discernible. At one extreme is a depiction of computation in terms of the symbol manipulations of a digital computer; at the other is the claim that computation is simply a matter of implementing a function. We’ll briefly say what’s wrong with these proposals, before developing a middle ground characterisation that does justice to the explanatory role of computation in cognitive science.

Computation as symbol manipulation

Jerry Fodor is fond of remarking that there is only one important idea about how the mind works that anybody has ever had. This idea he attributes to Alan Turing:

  [G]iven the methodological commitment to materialism, the question arises, how a machine could be rational?...Forty years or so ago, the great logician Alan Turing proposed an answer to this question...Turing noticed that it isn’t strictly true that states of mind are the only semantically evaluable material things. The other kind of material thing that is semantically evaluable is symbols... Having noticed this parallelism between thoughts and symbols, Turing went on to have the following perfectly stunning idea. “I’ll bet”, Turing (more or less) said, “that one could build a symbol manipulating machine whose changes of state are driven by the material properties of the symbols on which they operate (for example, by their weight, or their shape, or their electrical conductivity). And I’ll bet one could so arrange things that these state changes are rational in the sense that, given a true symbol to play with, the machine will reliably convert it into other symbols that are also true”. (Fodor 1992, p. 6)

The rest, as one says, is history. Cognitive science emerged as a discipline (or at least, a “multi-discipline”) in the 1950s. What was novel about cognitive science (as opposed to those already established disciplines that study the mind, including neuroscience, psychology and philosophy) was its commitment to the computational theory of mind: the idea that cognitive processes are the symbol manipulations of a neurally realised digital computer.

At its inception, cognitive science thus embraced a Turing-inspired understanding of computation. Computation is what happens in a digital computer: a causal/mechanical process in which language-like representing vehicles are recognised and transformed in a semantically coherent fashion purely on the basis of their syntactic properties.

The problem with this characterisation is that it pays scant attention to the history of computer science. For more than 2000 years, theorists and practitioners have recognised a distinction between two forms of computation: digital computation, admirably formalised by Turing and others, and analog computation. The latter currently lacks a precise formal definition, but a quick survey of computer science textbooks of the 1950s and 60s reveals an intuitively clear demarcation: while digital computers employ semantically inert symbols (tokens that bear no resemblance to what they represent), analog computers employ internal models that physically or structurally resemble their representanda.5 Analog computation is thus not properly conceived as symbol manipulation, but as a physical process driven by the structural properties of analog representational media.

From this perspective, Turing’s great achievement was not that of conceiving the idea of computation, but of developing one very powerful means of mechanising computational processes. Drawing this distinction is important because once it is clear that the idea of computation is distinct from Turing’s account of how computational processes might be mechanised, it is possible to investigate the former independent of the latter. What we therefore require is a characterisation of computation that captures more (if not all) of those processes that have earned this epithet over the last 2000 years.

Computation as implementing a function

In response to this demand, a different kind of characterisation of computation is now popular in cognitive

5 See Truitt and Rogers (1960) for both a semi-formal account of analog computation along these lines, and for numerous examples of analog computers.
science. For example, in an influential article Churchland et al. (1990) have this to say on the subject:

  In a most general sense, we can consider a physical system as a computational system just in case there is an appropriate (revealing) mapping between some algorithm and associated physical variables. More exactly, a physical system computes a function f(x) when there is (1) a mapping between the system’s physical inputs and x, (2) a mapping between the system’s physical outputs and y, such that (3) f(x)=y. (1990)

In this passage, however, it is not obvious that the reference to an “appropriate (revealing)” mapping is doing any real work. Once this is removed, what remains is the proposal that a computation is performed by some physical system just in case its causal operation can be interpreted as implementing some function. Chalmers (1994) summarises the idea as follows:

  A physical system implements a given computation when there exists a grouping of physical states of the system into state-types and a one-to-one mapping from formal states of the computation to physical state-types, such that formal states related by an abstract state-transition relation are mapped onto physical state-types related by a corresponding causal state-transition relation. (1994, p. 392)

The bottom line here, according to Chalmers, is that computation is simply “an abstract specification of causal organisation” (1994, p. 396, emphasis in original). This characterisation does satisfy the desideratum we mooted above, given that it captures both analog and digital computation in its net. But it does so at a very great cost. Since all law-governed physical systems (and, granting determinism, this equates with all physical systems) are interpretable as implementing some function or other, we arrive at the unwelcome conclusion that all physical systems are computational. And that would appear to render the concept of computation in cognitive science explanatorily vacuous.

Chalmers, for one, resists this conclusion:

  This objection expresses the feeling that if every process, including such things as digestion and oxidation, implements some computation, then there seems to be nothing special about cognition any more, as computation is so pervasive. This objection rests on a misunderstanding. It is true that any given instance of digestion will implement some computation, as any physical system does, but the system’s implementing this computation is in general irrelevant to its being an instance of digestion.... With cognition, by contrast, the claim is that it is in virtue of implementing some computation that a system is cognitive. That is, there is a certain class of computations such that any system implementing that computation is cognitive (1994, p. 397).

Chalmers’ reasoning fails to reassure, however. The concept of computation was originally introduced as a way of distinguishing two classes of causal processes: those characteristic of the vast majority of physical systems (e.g. intestines, microwave ovens, cups of tea, etc.), and those that are the preserve of intelligent systems alone. Computational processes are supposed to be special in some way: in a way, moreover, that provides us with some explanatory purchase with respect to the problem of intelligent behaviour. Since implementing a function is a ubiquitous feature of nature, choosing to characterise computation in this way repudiates the very motivation for introducing the concept into cognitive science in the first place.

Computation as content-shaped causal processing

What we need is a way of characterising computation that limns a middle path between the restrictiveness of digital computation and the promiscuity of abstract causal organisation. One way to do this is to revisit the account of computation we get from digital computers, and consider whether this can be liberalised to some degree without falling prey to the problem of explanatory vacuity.

Digital computation, remember, is symbol manipulation: a causal/mechanical process in which language-like representing vehicles are recognised and transformed in a semantically coherent fashion purely on the basis of their syntactic properties. But as already remarked, the practice of computation has not historically been restricted to processes defined over symbols. Consider, for example, the familiar tactic of representing a physical variable, such as the velocity of a particle, using a curve on the plane. If we plot velocity on one axis, and time on the other, it is possible to compute distance travelled by measuring the area under the curve, or acceleration by constructing tangents to the curve. These are examples of analog computations which employ a non-symbolic representing vehicle.

Viewed from this less-restrictive perspective, there are two distinctive features of computational processes (as opposed to causal processes in general). First, they are associated with representing vehicles of some kind. Second, and more importantly, computational processes are shaped by the contents of the very representations they implicate. We thus arrive at the following characterisation:

  Computations are causal processes that implicate one or more representing vehicles, such that their trajectory is shaped by the representational contents of those vehicles.

Talk of representational content “shaping” the causal trajectory of computation is vague, of course. But this is deliberate. Prima facie, there are different ways of organising physical systems such that representational
content can play this role. In the case of digital systems, while computational operations only ever have access to the syntactic properties of symbols, the rules that govern these syntactic manipulations are nonetheless carefully crafted so as to ensure that they respect the contents of the symbols. In Dennett’s memorable terms: digital computers are syntactic engines that behave as if they were semantic engines (Dennett 1987, p. 61). Analog computers, by contrast, are systems whose behaviour is driven not by content-sensitive rules, but by semantically “active” analog representations that physically or structurally resemble what they represent.6

Although this strategy of characterising computation in terms of operations shaped by representational contents is quite common in the literature,7 it does not find favour everywhere. Chalmers, for instance, has this to say:

  The original account of Turing machines by Turing (1936) certainly had no semantic constraints built in. A Turing machine is defined purely in terms of the mechanisms involved, that is, in terms of syntactic patterns and the way they are transformed.... To implement a Turing machine, we need only ensure that this formal structure is reflected in the causal structure of the implementation.... [W]hen computer designers ensure that their machines implement the programs that they are supposed to, they do this by ensuring that the mechanisms have the right causal organisation; they are not concerned with semantic content. In the words of Haugeland (1985), if you take care of the syntax, the semantics will take care of itself (1994, p. 399).

In our view, this represents a profound misreading of both Turing and Haugeland. Far from eschewing semantic considerations, computer science is in the business of designing and implementing formal operations that satisfy semantic constraints. In the passage of his classic text just prior to articulating his famous “formalists’ motto” (quoted approvingly by Chalmers), Haugeland takes himself to be addressing the following question:

  Interpretation and semantics transcend the strictly formal – because formal systems as such must be self-contained. Hence to regard formal tokens as symbols is to see them in a new light: semantic properties are not and cannot be syntactical properties. To put it dramatically, interpreted formal tokens lead two lives: SYNTACTICAL LIVES, in which they are meaningless markers, moved according to the rules of some self-contained game; and SEMANTIC LIVES, in which they have meanings and symbolic relations to the outside world. The corresponding dramatic question then is this: how do the two lives get together? (1985, p. 100).

The answer that Haugeland goes on to develop is the fundamental basis of digital computation:

  The idea...is to design these formal systems so that they can be interpreted as axiomatic systems in the intuitive sense. That requires two things of the system (as interpreted);
  1. the axioms should be true...; and
  2. the rules should be truth preserving (1985, pp. 103–105).

In this light, the whole point of Haugeland’s formalists’ motto is to reinforce the message that it is only when the syntactically specified rules of the system are so crafted that they satisfy these semantic constraints, that “the semantics will take care of itself”.

This isn’t just an exercise in academic exegesis. Understanding the role of representational content in shaping computational processes is pivotal to understanding why the concept of computation arose in the first place. Intelligence is a rare commodity, and one that provokes a profound question: how is it that some physical systems are capable of intelligent behaviour when the majority of systems in the universe are not? The concept of computation is supposed to provide some leverage here: intelligent systems are special because they alone engage in computation. But this answer won’t suffice unless computational processes are themselves special. The characterisation developed above explains why they are (computational processes are shaped by the representational contents of the vehicles they implicate) and hence explains why the concept of computation is foundational for cognitive science.

With this characterisation of computation in place we can now turn to the principal task of the paper: that of explaining how connectionist systems compute. To satisfy this task we will need to show how representational content plays a role in shaping the trajectory of connectionist computational processes.

Connectionist computation: what’s wrong with the conventional story?

It is possible to identify something of a consensus among proponents of connectionism as to the nature of computation in connectionist networks. The argumentative burden of this section is to establish that this “conventional” account of connectionist computation is unsatisfactory, and to explain why it has nurtured doubts about connectionism’s computational credentials.

The conventional story

The characterisation of computation we developed in the preceding section emphasises the importance of

6 See O’Brien (1999) for further discussion.
7 See, e.g. Cummins and Schwarz (1991), p. 64; Dietrich (1989); Fodor (1975), p. 27; and Von Eckardt (1993), pp. 97–116.
representation for computation. It is not surprising, therefore, that the conventional account of connectionist computation focuses on showing how activity across connectionist networks admits of a representational interpretation.

The story goes like this. A connectionist network is a collection of interconnected processing units (modelled on neurons), each of which has an activation level (modelled on a neuron’s spiking frequency) that is communicated to other units in the network via modifiable, weighted connections (modelled on synapses). From moment to moment, each unit sums the weighted activation it receives, and generates a new activation level that is some threshold function of its current activity and that input. Via this process, a network transforms patterns of activity across its input layer into patterns of activity across its output layer. Altering the network’s connection weights alters the activation patterns the network produces in response to its inputs. Consequently, a network can be taught to generate a range of target patterns in response to a range of inputs. These patterns of activity, because they are produced by a training regime that gradually shapes the network’s responses so that it is successful in negotiating some task domain, are thought to constitute a form of information coding, often termed activation pattern representation. According to this account, therefore, connectionist networks compute by transforming activation pattern representations across their input units into activation pattern representations across their output units.8

But this account is superficial. What we really want to know is how connectionist networks are able to transform their input representations into appropriate output representations. It is at this point that the conventional story gets both more complicated and more interesting. The proffered explanation focuses on the fact that the hidden unit landscape of a trained network is partitioned into linearly separable regions, regions that capture the categorial distinctions necessary for generating a solution to the computational problem(s) posed by the inputs.

To illustrate this idea, consider a three-layer, feedforward network designed by Laakso and Cottrell (2000) to perform colour categorisation (see Fig. 1). The task of this network is to take reflectance spectra (which provide a measure of the relative amounts of light reflected by an object across a range of wavelengths) and produce a colour judgment corresponding to that of a normal human observer. The inputs to the network are 523 reflectance spectra selected from a database produced at the University of Kuopio (anonymous 1995; Parkkinen 1989).9 Each spectrum is measured over the 400–700 nanometre range in 5 nm “bins”, and is thus a 61-dimensional vector of which the first component is the reflectance intensity at a wavelength of 400 nm; the second, the reflectance at 405 nm, and so on, through to the 61st component which is the reflectance at a wavelength of 700 nm. The input layer thus has 61 input units onto which are locked the amplitude values of the spectra. There are three units in the hidden layer, and five binary units in the output layer for encoding the relevant colour categories (red, green, blue, yellow, and purple). After training via backpropagation of errors, the network achieved better than 90% accuracy in its assignment of input spectra to colour categories. (See Laakso and Cottrell 2000, pp. 58–67 for further details.)

We reproduced these results by training a series of networks on the same data set. The activity at the hidden layer of a trained network can be portrayed as a three-dimensional activation space, in which the activity of each hidden unit is represented along one coordinate axis. For each input to the network, one gets a different pattern of activation on the hidden layer, and a corresponding point in activation space. We found that each colour-categorisation network partitions its activation space into linearly separable regions (in three dimensions, these are regions that can be cleanly divided by a plane), such that the activation points corresponding to the various colour categories are located in distinct parts of the space (Fig. 2). This is typical of feedforward neural networks, and it is widely agreed that it is by virtue of organising their activation spaces in this way that such networks are able to correctly categorise their inputs.

This much about hidden unit activation pattern representation is common lore among connectionists. What is not always appreciated about hidden unit activation patterns, however, is that collectively they structurally resemble aspects of the task domain over which the network has been trained. Indeed, it is the existence of this structural resemblance relation that anchors the representational interpretation of activation patterns in the first place (O’Brien and Opie 2001; 2004). Since this structural resemblance theory of representational content will be important to the argument developed in the next section, we will pause here to examine it in some detail.

Resemblance is a fairly unconstrained relationship, because objects or systems of objects can resemble each other in a huge variety of ways, and to various different degrees. The most straightforward kind of resemblance involves the sharing of one or more physical properties. Thus, two objects might have the same colour, or mass, the same density, or electric charge, or be equal along a number of physical dimensions simultaneously. We shall refer to this kind of relationship as first-order resemblance.10 A representing vehicle and its represented

8 See, e.g. Bechtel and Abrahamsen (2002); Clark (1989, 1993); and Tienson (1987).
9 These spectra were generated by measuring the reflectance profile of colour cards in the Munsell Book of Color (anonymous 1976), a set of cards that is used in standard psychometric tests of colour perception.
10 We are here adapting some terminology developed by Shepard and Chipman (1970). They distinguish between first- and second-order isomorphism. Isomorphism is a very restrictive way of characterising resemblance, and hence we prefer to avoid this terminology (see O’Brien and Opie 2004).
Fig. 1 The structure of the colour-categorisation network, showing an example of an input spectrum to be encoded on the input layer

Fig. 2 Hidden unit activation space for one of the colour-categorisation networks
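
To make the architecture described in the text concrete, the following is a minimal sketch of a single forward pass through a network with the stated layer sizes (61 input units, three hidden units, five output units). Everything beyond those sizes is an assumption made for illustration: the logistic squashing function stands in for the unspecified "threshold function", the weights are random and untrained, the bias terms are set to zero, and the input spectrum is synthetic rather than drawn from the Kuopio data. It is not Laakso and Cottrell's implementation, and the outputs are left as graded values rather than thresholded to binary.

```python
import math
import random

# Layer sizes taken from the text: 61 input units, 3 hidden units, 5 output units.
N_INPUT, N_HIDDEN, N_OUTPUT = 61, 3, 5


def logistic(x):
    """Smooth squashing function; an assumed stand-in for the 'threshold function'."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """Each unit sums its weighted input, adds a bias, and squashes the result."""
    return [
        logistic(sum(w * a for w, a in zip(unit_weights, inputs)) + bias)
        for unit_weights, bias in zip(weights, biases)
    ]


def random_matrix(n_rows, n_cols, rng):
    """Hypothetical untrained weights, drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_cols)] for _ in range(n_rows)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
w_input_hidden = random_matrix(N_HIDDEN, N_INPUT, rng)
w_hidden_output = random_matrix(N_OUTPUT, N_HIDDEN, rng)
b_hidden = [0.0] * N_HIDDEN
b_output = [0.0] * N_OUTPUT

# One reflectance value per 5 nm bin from 400 nm to 700 nm (61 bins).
wavelengths_nm = list(range(400, 701, 5))
# Made-up smooth spectrum for illustration, not data from the Kuopio database.
spectrum = [0.5 + 0.4 * math.sin(w / 50.0) for w in wavelengths_nm]

# The input-to-hidden weights fix where this spectrum lands in the
# three-dimensional hidden unit activation space.
hidden = layer(spectrum, w_input_hidden, b_hidden)
output = layer(hidden, w_hidden_output, b_output)

print("point in hidden activation space:", [round(h, 3) for h in hidden])
print("output layer activity:", [round(o, 3) for o in output])
```

Each input spectrum yields one three-component list in `hidden`, which is the "point in activation space" the text refers to; changing `w_input_hidden` moves every such point.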
object resemble each other in this way if they have physical properties in common.

First-order resemblance is clearly unsuitable as a general ground of neural representation, since it is incompatible with what we know about the brain. It is quite obvious that our brains are capable of representing features of the world that are not replicable in neural tissue. There is, however, another kind of resemblance
available, which we shall refer to as second-order resemblance.11 In second-order resemblance, the requirement that representing vehicles share physical properties with their represented objects can be relaxed in favour of one in which the relations among a system of representing vehicles mirror the relations among their objects. For example, a mercury thermometer can be used to represent temperature in virtue of the linear relationship between the length of a column of mercury and ambient temperature: variations in the one correspond systematically with variations in the other.

Although first-order resemblance cannot be the general ground of neural representation, the same is not true of second-order resemblance. Two systems can share a pattern of relations without sharing the physical properties upon which those relations depend. Second-order resemblance is actually a very abstract relationship. Essentially, nothing about the physical form of the relations defined over a system of representing vehicles is implied by the fact that it resembles a set of represented objects at second-order; second-order resemblance is a formal relationship, not a substantial or physical one.

As already foreshadowed, the form of second-order resemblance that is relevant in the present context is structural resemblance. One system structurally resembles another when the physical relations among the objects that comprise the first preserve some aspects of the relational organisation of the objects that comprise the second. Structural resemblance would seem to be the right second-order resemblance relation for explaining the representational content of connectionist representing vehicles. Hidden unit activation space is a mathematical space used by theorists to portray the set of activation patterns a network is capable of producing over its hidden layer. Activation patterns themselves are physical objects (patterns of neural firing, if realised in a brain), and thus distance relations in activation space actually codify physical relations among activation states. What is crucial here is that the set of hidden unit activation patterns generated across any trained-up connectionist network constitutes a system of representing vehicles whose physical relations sustain a second-order resemblance relation with respect to the task domain over which the network has been trained.

Consider, for example, the relationship between the set of hidden layer activation patterns generated by the colour-categorisation network and its task domain. Physical similarities and differences among these patterns of activity (which appear as relative distances in the activation space) correspond to similarities and differences among the reflectance spectra that the network is responding to (see Sect. 4 for a more detailed discussion).

The structural resemblance relation between hidden unit activation patterns and aspects of a connectionist network’s task domain licenses an interpretation of the former as representing vehicles. This in turn appears to support the claim, made by the proponents of connectionism, that these networks are in the computing business. Why then have doubts about connectionism’s computational credentials continued to linger in the cognitive science literature? It is to this issue that we will now turn.

What’s wrong with the conventional story?

The conventional story about connectionist computation is elegant, but incomplete. Recall that a computational interpretation of connectionism must not only show that connectionist networks implement representing vehicles; it must also show how processing in networks is shaped by the representational contents of those vehicles. It is this latter requirement that the conventional story fails to satisfy.

To see this, consider the colour-categorisation network we described above. This network is required to sort spectra into colour categories, a task at which it succeeds because the network’s hidden unit activation space is partitioned into regions corresponding to those categories. And it is a relatively simple exercise to map from regions in activation space to binary representations of colour on the network’s output layer. Notice, however, that given any spectrum as input, it is the configuration of weights between the input and hidden layers that determines the resulting hidden layer activity. Furthermore, since they govern each and every such mapping, it is these weights that are responsible for the global structure of the hidden unit activity space. The representing vehicles on which the conventional story focuses (activation patterns across the hidden layer) are not causally implicated in these transformations. They are the products, not the source, of processing. And as such, their representational contents play no role in shaping the trajectory this processing takes.12

It is precisely this kind of analysis which invites the charge that connectionism is nothing more than a latter-day version of associationism. This interpretation is quite consistent with a representational understanding of the activity across the layers of connectionist networks. It’s just that it restricts connectionist networks to the mere association of “ideas”, rather than the content-driven forms of information processing that are necessary to explain intelligent behaviour.

There is a fairly standard riposte to this charge in

connectionist networks as representing vehicles, doubts will persist about connectionism’s computational credentials unless Ramsey’s challenge can be answered.

11 Bunge (1969), in a useful early discussion of resemblance, draws a distinction between substantial and formal analogy which is close to our distinction between first- and second-order resemblance. Two theorists who have kept the torch of second-order resemblance burning over the years are Palmer (1978) and Shepard (Shepard and Chipman 1970; and Shepard and Metzler 1971). More recently, Blachowicz (1997); Cummins (1996); Gardenfors (1996); Johnson-Laird (1983); O’Brien (1999) and Swoyer (1991), have all sought to apply, though in different ways, the concept of second-order resemblance to representation.
12 This is not to deny that the physical relations among activation patterns on the hidden layer have a bearing on downstream processes, both at the output layer and in other networks. Our point is simply that this (diachronic) relational structure is governed by some other (synchronic) feature of the network, namely, the configuration of its connection weights.
connectionist circles. Connectionist networks implement What is required is a ‘‘level of understanding or
two quite different kinds of representation: in addition explanatory motivation that requires us to view the
to the information coded in activation patterns, which is weights as representations’’. It is time to meet this
transient and hence obliterated whenever the network is challenge.
exposed to new input, information is coded in a long-
term fashion in the network’s connection weights. These
weights, it is often claimed, constitute the network’s Connection weight representation
memory. Since it is connection weights that govern the
transformations of activity from layer to layer in a net- We have seen that activation pattern representation is
work, it thus appears that we do have a representational supported by a relation of structural resemblance be-
story to tell about the structures that shape the trajec- tween the patterns of activity in a connectionist network
tory of connectionist processing. and the task domain in which that network operates.
The trouble with this response, however, is that we The proposal we explore here is that there is a more
currently lack a representational analysis of connection fundamental structural resemblance between the con-
weights comparable to the kind of analysis that is nection weights of such a network and its task domain;
available for activation patterns. Consequently, the one that supports a species of representation we will call
claim that connection weights represent a network’s connection weight representation.13
long-term knowledge is left unanchored, and commen- Although, the relation of structural resemblance be-
tators are justified in expressing doubts about this claim. tween a trained-up network’s patterns of activity and its
Ramsey, for example, highlights what he takes to be a task domain is relatively easy to identify, the same
fundamental difference between connection weights and cannot be said of any such relation between connection
the rules that govern the symbol manipulations of digital weights and task domain. If such a relation exists, it will
computers: require some teasing out. We will approach this problem
by more closely examining the role of connection
As the relevant content for this type of representation
weights in connectionist processing.
is the system’s long-term knowledge...the most obvi-
ous point of comparison should be with the explicit
rules that sometimes govern classical computation
Processing with connection weights
systems and are thought to encode those systems’
knowledge base. Is there an explanatory pay-off in
It is well-known that networks operating in the same
viewing connection weights as representations that is
domain, but trained-up with different initial assignments
similar to the return we get when this is done with
of connection weights, come to occupy different points
rules in classical models? I believe the answer is ‘no’
in ‘‘weight space’’.14 There is no simple relationship
for the following reason. [In] classical models it is
between the position in weight space occupied by a
typically the case that causally distinct structures en-
trained network and the task domain. We demonstrated
code commands for specific stages of the computa-
this by training a group of 20, three-layer feedforward
tion... However, in trained connectionist models, this
networks to perform at close to 100% accuracy on La-
type of specificity is not possible. While it might be
akso and Cottrell’s colour-categorisation task. We then
true that some connection weights contribute to some
measured the pair-wise correlations among the (hidden
episodes of processing more than others, there is no
layer) weight matrices of these networks (for a total of
level of analysis at which we can say a particular
190 comparisons). The set of correlations turned out to
weight encodes a particular command or governs a
be randomly distributed about a mean of zero, con-
specific algorithmic step in the computation. Instead,
firming that there is no simple, first-order relationship
all the system’s know-how is superimposed on all the
weights with no particular mappings between the two.
13
(1997, pp. 48–49) In what follows, we develop this proposal by focusing solely on
the connection weights between the input and hidden layers of
feedforward networks. (We will reinforce this point by occasionally
Further rumination on this issue eventually leads referring to the ‘‘hidden layer’’ connection weights: these are the
Ramsey to conclude that ‘‘there doesn’t appear to be weights that determine the activity across the hidden layer.) It is
our view, however, that this proposal applies to connectionist
any other level of understanding or explanatory moti- systems more generally.
vation that requires us to view the weights as represen- 14
The weight space of a network is a Euclidean vector space in
tations’’ (1997, p. 51), and he recommends that we view which each of the network’s connection strengths is represented as
connectionist explanations of cognition as dynamical the position along a distinct coordinate axis. The dimensionality of
this space corresponds to the number of connections in the net-
rather than computational (1997, p. 61). work. Once can picture training a network as a journey through
The dialectical position, we think, is this. However weight space, and different final positions in the space as alternative
strong our reasons for interpreting activation patterns in ways of dealing with the task demands.
between these networks (see Fig. 3). Since the networks layer of a successful connectionist network structurally
themselves are not related in any straightforward way, it resemble aspects of the network’s task domain.
appears unlikely that each bears some simple relation- To investigate this conjecture we trained a series of
ship to the task domain over which they operate. three-layer feedforward networks to solve the colour-
It remains a live possibility, however, that connec- categorisation problem using a subset of Laakso and
tionist networks embody some (higher-order) internal Cottrell’s original data: about 25 each of the spectra
structure that warrants a representational understanding normally classified as red, green, and blue, respectively.
of their connection weights. To explore this possibility Each network had 61 input units and three hidden units.
we need to take a closer look at how connectionist We represented the fan-ins of the trained networks using
networks process their inputs. weight diagrams and compared these with the means of
The key players in network processing are what we the red, green and blue input data sets.
call fan-ins. A fan-in is the vector of weights modulating A typical example is shown in Fig. 5. The three fan-
the effect of incoming activity on a particular hidden ins are depicted on the right, the mean spectra on the
unit. Within any feedforward network there is one fan-in left. One immediately notices a striking similarity be-
per hidden unit, each corresponding to a row of the tween the fan-ins of this network and the means of the
network’s hidden layer weight matrix (see Fig. 4). Fan- data sets. The shape of the fan-in for hidden unit 2, for
ins effect the transformation of the network’s input space example, corresponds nicely to the shape of the mean
into its hidden unit activation space. More specifically, spectrum of the 25 inputs that normal observers classify
each fan-in determines how one hidden unit responds to as red. Likewise, the fan-in for hidden unit 3 resembles
input, by way of a product of input activation and fan-in the mean of the ‘‘green’’ spectra, and the fan-in for
values. This product is then modified by the hidden unit’s hidden unit 1 resembles the mean of the ‘‘blue’’ spectra.
activation function to produce the value along a single What this indicates is that, for each fan-in, the relative
coordinate in activation space. It is thus a network’s fan- magnitudes of its component weights mirror the relative
ins that interface directly with the structure of the vectors amplitudes of the various wavelengths comprising one of
coded at the input layer, and which ultimately determine the mean spectra. Since this mirroring is a similarity at
the structure of activation space. Accordingly, if we are the level of relations, rather than properties, it is an in-
to discover any structural resemblance between a net- stance of second-order resemblance. And since it is
work’s connection weights and its task domain it is the grounded in the physical relations among the fan-in
fan-ins on which we should focus. weights (i.e. their relative magnitudes), it is a structural
resemblance.
In the previous section we saw that it is a relation of
Connection weights as representing vehicles structural resemblance that anchors a representational
interpretation of hidden unit activation patterns. We’ve
Given the crucial role of fan-ins in network processing, just seen (Fig. 5) that there is a structural resemblance
we offer the following proposal: the fan-ins in the hidden between the fan-ins of the colour-categorisation network
Fig. 3 A plot of cumulative
probability against weight-
matrix correlation. A good fit to
the straight line indicates a
normal distribution
Fig. 4 A simple network with
and without its three fan-ins
(r1, r2, & r3) highlighted
and the task domain over which it operates. That It is the ‘‘shape’’ of these vectors that govern the
resemblance licenses an interpretation of fan-ins (and respective activities of the hidden units they influence, by
their component weights) as representing vehicles. way of the so-called ‘‘dot product’’ of weights and input
activation. Taking a dot product is a well-known way of
measuring the similarity of two vectors.15 Each fan-in is,
Connectionist computation: the real story in effect, a filter looking for input with a particular
shape. The dot product indicates the extent to which a
The characterisation of computation we offered above given input matches a particular fan-in filter, as does the
suggests that connectionist systems must satisfy two activity of the corresponding hidden unit. Input that is
conditions if they are to count as computational devices: presented to the colour-categorisation network, for
(i) they must implicate representing vehicles of some example, is filtered through three fan-in vectors, thereby
kind, and (ii) the contents of those vehicles must shape modifying the activation of the three units in the hidden
the causal processes that occur in connectionist pro- layer. Activity in the hidden layer thus reflects the degree
cessing. We established that connection weights may of similarity between the input spectra and the fan-ins.
legitimately be interpreted as representing vehicles, at Correlatively, hidden unit activation space forms a
least for a significant class of connectionist systems. It three-dimensional map that allows us to compare the
remains to show that the contents of this species of filtered versions of the input spectra, one with the other.
vehicle influence the trajectory of connectionist pro-
cessing. 15
The dot product of two vectors in a Euclidean space is at a
We have noted the crucial role of fan-ins in trans- maximum when the angle between them is zero, and decreases as
forming a network’s inputs into hidden layer activation. the angular separation between them increases.
Fig. 5 On the left are the mean
spectra of the three classes of
inputs; those classified (from top
to bottom) as red, green and
blue. On the right are the three
fan-ins of the network, with
weight value on the y-axis and
input index on the x-axis
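The filtering role of fan-ins described in this section can be made concrete with a small sketch. It is an illustration under stated assumptions, not the networks used in the study: the weight and input values are invented, four-element vectors stand in for the 61-element spectra, and the logistic activation function is an assumed choice (the text does not say which activation function the networks used). The names `dot`, `logistic`, `hidden_activation`, `FAN_INS` and `spectrum` are all hypothetical.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def logistic(x):
    """Logistic squashing function (an assumed activation function)."""
    return 1.0 / (1.0 + math.exp(-x))


def hidden_activation(fan_ins, input_vector):
    """Return one activation value per hidden unit.

    Each fan-in (one row of the hidden layer weight matrix) is matched
    against the input by a dot product, and the result is passed through
    the unit's activation function.
    """
    return [logistic(dot(fan_in, input_vector)) for fan_in in fan_ins]


# Hypothetical fan-ins for a toy network with 4 input units and 3 hidden units.
FAN_INS = [
    [2.0, 1.0, -1.0, -2.0],   # "looks for" inputs weighted towards the first elements
    [-1.0, 2.0, 2.0, -1.0],   # "looks for" inputs weighted towards the middle
    [-2.0, -1.0, 1.0, 2.0],   # "looks for" inputs weighted towards the last elements
]

# A hypothetical input pattern whose largest values come first.
spectrum = [0.9, 0.6, 0.2, 0.1]

# One point in the (three-dimensional) hidden unit activation space.
activation = hidden_activation(FAN_INS, spectrum)

# Index of the hidden unit whose fan-in best matches the input.
best_match = max(range(len(activation)), key=activation.__getitem__)

print(activation, best_match)
```

With these invented values the first fan-in yields the largest dot product, so the first hidden unit is the most active. The three activation values together pick out a single point in a three-dimensional activation space, which is the sense in which the fan-ins fix the structure of that space.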
Now the final piece of the puzzle is in place. We have shown that the fan-ins in the colour-categorisation network structurally resemble aspects of the task domain, namely, the mean spectra of the three classes of input (red, green and blue). That resemblance warrants us in regarding those fan-ins, and their component weights, as representing vehicles. But we have also shown that it is this same resemblance, embodied in the physical structure of the fan-ins, that drives the causal processes within the network. It is by virtue of their resemblance to global features of the input data that the fan-in vectors contrive to transform reflectance spectra into a map of categorial colour, and thereby solve the problem posed to the network. Representational content is in the driver’s seat here, as we require, and it appears that connectionist networks are genuine computational devices.

This is a very satisfying result for proponents of connectionism. It enables us to meet Ramsey’s challenge, because we now have a robust explanatory motivation for viewing connection weights as representations. And this in turn puts to bed the lingering doubts about connectionism’s computational credentials.

Connectionist networks are capable of successfully negotiating their task domains because they structurally resemble them, a resemblance relation they gradually acquire in the course of training. This structural resemblance relation is sustained at two different levels of description. It is sustained (diachronically) by the set of activation patterns that are produced across a network’s hidden units in response to its various inputs, and, more importantly, it is sustained (synchronically) by the higher-order structure of the network’s hidden layer connection weights.

These two kinds of structural resemblance support an interpretation of activation patterns and connection weights as different species of representing vehicle. And these two kinds of representing vehicle shape the trajectory of connectionist processing in different ways. Activation pattern representations shape the impact that one network has on other networks or motor mechanisms to which it is connected. Connection weight representations, by contrast, are responsible for the production of these activation pattern representations in the first place.

This last point is important because it secures a computational understanding of connectionist processing, at least according to the characterisation we have developed in this paper. The causal operations that generate a hidden unit activation pattern implicate one or more representing vehicles (the fan-in connection weights) and the trajectory of this process is shaped by the representational content of these vehicles (since it is the structural resemblance relation that determines the representational content of the fan-ins). Connectionist networks are not merely association engines or dynamical systems; they are full-blooded computational mechanisms. And they compute by exploiting relations of structural resemblance between their connection weights and their target domains.
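One hypothetical way to put a number on the structural resemblance between a fan-in and a class of inputs is sketched below. The paper itself compares weight diagrams with mean spectra by inspection (Fig. 5); using a Pearson correlation here is an assumption made for illustration, chosen because it depends only on the pattern of relative magnitudes and ignores overall scale and offset. All data values are invented, and the names `mean_vector`, `pearson`, `RED_SPECTRA`, `fan_in_a` and `fan_in_b` are hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def pearson(u, v):
    """Pearson correlation between two equal-length vectors.

    The result depends on the pattern of relative magnitudes in u and v,
    not on their absolute values: rescaling or shifting either vector
    leaves it unchanged.
    """
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    du = [a - mean_u for a in u]
    dv = [b - mean_v for b in v]
    numerator = sum(a * b for a, b in zip(du, dv))
    denominator = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return numerator / denominator


# Invented four-element "spectra" standing in for one colour class.
RED_SPECTRA = [
    [0.1, 0.2, 0.6, 0.9],
    [0.2, 0.2, 0.7, 0.8],
    [0.0, 0.2, 0.5, 1.0],
]
red_mean = mean_vector(RED_SPECTRA)

# A hypothetical fan-in with the same shape as the mean spectrum,
# but on a different scale and with a different offset.
fan_in_a = [4.0 * x - 2.0 for x in red_mean]

# A hypothetical fan-in with the reversed shape.
fan_in_b = list(reversed(fan_in_a))

r_a = pearson(fan_in_a, red_mean)
r_b = pearson(fan_in_b, red_mean)

print(r_a, r_b)
```

Here `r_a` comes out at 1 (up to rounding) even though `fan_in_a` shares no individual values with the mean spectrum, while `r_b` is negative. This is the sense in which a resemblance at the level of relations can hold without any resemblance at the level of properties.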
References

Anonymous (1976) Munsell book of color: matte finish collection. Munsell Color Company, Inc
Anonymous (1995) Kuopio color database. http://www.lut.fi/ltkk/tite/research/color/lutcs_database.html
Bechtel W, Abrahamsen A (2002) Connectionism and the mind: parallel processing, dynamics, and evolution in networks. Blackwell, Oxford
Blachowicz J (1997) Analog representation beyond mental imagery. J Philosophy 94:55–84
Bunge M (1969) Analogy, simulation, representation. Revue Internationale de Philosophie 23:16–33
Chalmers DJ (1994) On implementing a computation. Mind Mach 4:391–402
Churchland PS, Koch C, Sejnowski T (1990) What is computational neuroscience? In: Schwartz E (eds) Computational neuroscience. MIT Press, Cambridge
Clark A (1989) Microcognition: philosophy, cognitive science, and parallel distributed processing. MIT Press, Cambridge
Clark A (1993) Associative engines: connectionism, concepts, and representational change. MIT Press, Cambridge
Cummins R (1996) Representations, targets, and attitudes. MIT Press, Cambridge
Cummins R, Schwarz G (1991) Connectionism, computation and cognition. In: Horgan T, Tienson J (eds) Connectionism and the philosophy of mind. Kluwer, Dordrecht
Dennett D (1987) The intentional stance. MIT Press, Cambridge
Dietrich (1989) Semantics and the computational paradigm in cognitive psychology. Synthese 79:119–141
Fodor JA (1975) The language of thought. Harvester Press, London
Fodor JA (1992) The big idea: can there be a science of the mind? Times Literary Supplement July 3: 5–7
Fodor JA (2000) The mind doesn’t work that way: the scope and limits of computational psychology. MIT Press, Cambridge
Fodor JA, Pylyshyn ZW (1988) Connectionism and cognitive architecture: a critical analysis. Cognition 28:3–71
Gardenfors P (1996) Mental representation, conceptual spaces and metaphors. Synthese 106:21–47
Johnson-Laird P (1983) Mental models. Harvard University Press
Laakso A, Cottrell G (2000) Content and cluster analysis: assessing representational similarity in neural systems. Philos Psyc 13:47–76
McClelland JL, Rumelhart DE (eds) (1986) Parallel distributed processing: explorations in the microstructure of cognition, vol. 2. MIT Press, Cambridge
O’Brien G (1999) Connectionism, analogicity and mental content. Acta Analytica 22:111–131
O’Brien G, Opie J (2001) Connectionist vehicles, structural resemblance, and the phenomenal mind. In: Veldeman J (eds) Naturalism and the phenomenal mind, a special issue of Communication and Cognition 34:13–38
O’Brien G, Opie J (2004) Notes towards a structuralist theory of mental representation. In: Clapin H, Staines P, Slezak P (eds) Representation in mind: new approaches to mental representation. Elsevier
Palmer S (1978) Fundamental aspects of cognitive representation. In: Rosch E, Lloyd B (eds) Cognition and categorization. Lawrence Erlbaum
Parkkinen JPS, Hallikainen J, Jaaskelainen T (1989) Characteristic spectra of Munsell colors. J Opt Soc A 6(2):318–322
Pinker S (1997) How the mind works. Norton, New York
Pinker S (2002) The blank slate: the modern denial of human nature. Viking, New York
Port R, van Gelder TJ (1995) Mind as motion: explorations in the dynamics of cognition. MIT Press, Cambridge
Pylyshyn ZW (1984) Computation and cognition: toward a foundation for cognitive science. MIT Press, Cambridge
Ramsey W (1997) Do connectionist representations earn their explanatory keep? Mind Lang 12(1):34–66
Rumelhart DE, McClelland JL (eds) (1986) Parallel distributed processing: explorations in the microstructure of cognition, vol. 1. MIT Press, Cambridge
Shepard R, Chipman S (1970) Second-order isomorphism of internal representations: shapes of states. Cog Psychol 1:1–17
Shepard R, Metzler J (1971) Mental rotation of three-dimensional objects. Science 171:701–703
Swoyer C (1991) Structural representation and surrogative reasoning. Synthese 87:449–508
Tienson J (1987) Introduction to connectionism. South J Philos 26:1–16
Truitt TD, Rogers AE (1960) Basics of analog computers. John F. Rider
Von Eckardt B (1993) What is cognitive science? MIT Press, Cambridge