

How do Connectionist Networks Compute?

https://doi.org/10.1007/S10339-005-0017-7

Abstract

Although connectionism is advocated by its proponents as an alternative to the classical computational theory of mind, doubts persist about its computational credentials. Our aim is to dispel these doubts by explaining how connectionist networks compute. We first develop a generic account of computation, which is no easy task because computation, like almost every other foundational concept in cognitive science, has resisted canonical definition. We opt for a characterisation that does justice to the explanatory role of computation in cognitive science. Next we examine what might be regarded as the “conventional” account of connectionist computation. We show why this account is inadequate and hence fosters the suspicion that connectionist networks are not genuinely computational. Lastly, we turn to the principal task of the paper: the development of a more robust portrait of connectionist computation. The basis of this portrait is an explanation of the representational capacities of connection weights, supported by an analysis of the weight configurations of a series of simulated neural networks.

Cogn Process (2005)
DOI 10.1007/s10339-005-0017-7

RESEARCH REPORT

Gerard O’Brien · Jon Opie

Received: 13 July 2005 / Revised: 19 July 2005 / Accepted: 19 July 2005
© Marta Olivetti Belardinelli and Springer-Verlag 2005

Keywords: Computation · Connectionism · Representation · Resemblance

Communicated by John Sutton

G. O’Brien (✉) · J. Opie
Discipline of Philosophy, University of Adelaide, 5005 Adelaide, SA, Australia
E-mail: [email protected]
URL: http://arts.adelaide.edu.au/humanities/gobrien/
Tel.: +61-8-8303-5298
Fax: +61-8-8303-5241

Introduction

Connectionism was first widely recognised as a potential rival to the classical computational theory of mind nearly 20 years ago.[1] Yet, despite all that has since been written about this approach to cognition, we still lack a satisfactory account of how connectionist networks compute. Into this vacuum has crept doubt about connectionism’s computational credentials. This doubt takes three forms. One is the view that connectionism, far from being a rival computational paradigm, is nothing more than a modern version of associationism, and hence suffers from all the well-known vices of this much older position.[2] A second is the claim that while connectionists typically interpret the states and activity of connectionist networks in representational terms, closer scrutiny reveals that these putative representations fail to do any explanatory work, and since there is “no computation without representation” (Pylyshyn 1984, p. 62), the connectionist framework is better interpreted non-computationally.[3] And a third is the suggestion that connectionist networks are better characterised as dynamical systems rather than computational devices.[4]

[1] We use “connectionism” generically to denote the explanatory framework that models human perceptual and cognitive processes in terms of the operations of neuron-like processing units connected together to form neural-like networks. While this explanatory framework has antecedents running back more than 50 years, we take the appearance of Rumelhart and McClelland (1986) and McClelland and Rumelhart (1986) as the moment when connectionism, in the guise of parallel distributed processing, came of age.

[2] This claim is most famously associated with Fodor (e.g. Fodor and Pylyshyn 1988; Fodor 2000, Ch. 3) but it pops up in a number of different places (Pinker 1997, pp. 112–131 and 2002, pp. 78–83).

[3] See, e.g. Ramsey (1997).

[4] See, e.g. the various contributions to Port and Van Gelder (1995).

If connectionism is ever to stand as a serious alternative to the classical computational theory of mind, this doubt must be dispelled. And the only way to do this is to explain how connectionist networks compute. That is the task we have set ourselves in this paper. We begin by developing a generic account of computation, which is no easy task since, like almost every other foundational concept in cognitive science, computation has resisted canonical definition. In the face of this problem, we opt for a characterisation that captures the intended role of computation in cognitive science. Next we examine what might be regarded as the “conventional” account of connectionist computation. We show why this account is inadequate and hence fosters the kinds of doubt we have just enumerated. We then turn to the principal task of the paper: the development of a more robust portrait of connectionist computation. The basis of this portrait is an explanation of the representational capacities of connection weights, supported by an analysis of the weight configurations of a series of simulated neural networks. Once this explanation is in place, it will be apparent how connectionist networks compute.

Toward a proper understanding of computation in cognitive science

Computation is a concept so overused and so variously defined that we sometimes despair of it ever being meaningfully deployed. And yet we also believe that computation is the most important concept in all of cognitive science. Indeed, we would argue that without the concept of computation there is no cognitive (as distinct from behavioural, psychological, biological, or just plain physical) science in the first place. So something must be done.

In all that has been written about computation in cognitive science, two extreme characterisations are discernible. At one extreme is a depiction of computation in terms of the symbol manipulations of a digital computer; at the other is the claim that computation is simply a matter of implementing a function. We’ll briefly say what’s wrong with these proposals, before developing a middle-ground characterisation that does justice to the explanatory role of computation in cognitive science.

Computation as symbol manipulation

Jerry Fodor is fond of remarking that there is only one important idea about how the mind works that anybody has ever had. This idea he attributes to Alan Turing:

[G]iven the methodological commitment to materialism, the question arises, how a machine could be rational?...Forty years or so ago, the great logician Alan Turing proposed an answer to this question...Turing noticed that it isn’t strictly true that states of mind are the only semantically evaluable material things. The other kind of material thing that is semantically evaluable is symbols... Having noticed this parallelism between thoughts and symbols, Turing went on to have the following perfectly stunning idea. “I’ll bet”, Turing (more or less) said, “that one could build a symbol manipulating machine whose changes of state are driven by the material properties of the symbols on which they operate (for example, by their weight, or their shape, or their electrical conductivity). And I’ll bet one could so arrange things that these state changes are rational in the sense that, given a true symbol to play with, the machine will reliably convert it into other symbols that are also true”. (Fodor 1992, p. 6)

The rest, as one says, is history. Cognitive science emerged as a discipline (or at least, a “multi-discipline”) in the 1950s. What was novel about cognitive science (as opposed to those already established disciplines that study the mind, including neuroscience, psychology and philosophy) was its commitment to the computational theory of mind: the idea that cognitive processes are the symbol manipulations of a neurally realised digital computer.

At its inception, cognitive science thus embraced a Turing-inspired understanding of computation. Computation is what happens in a digital computer: a causal/mechanical process in which language-like representing vehicles are recognised and transformed in a semantically coherent fashion purely on the basis of their syntactic properties.

The problem with this characterisation is that it pays scant attention to the history of computer science. For more than 2000 years, theorists and practitioners have recognised a distinction between two forms of computation: digital computation, admirably formalised by Turing and others, and analog computation. The latter currently lacks a precise formal definition, but a quick survey of computer science textbooks of the 1950s and 60s reveals an intuitively clear demarcation: while digital computers employ semantically inert symbols (tokens that bear no resemblance to what they represent), analog computers employ internal models that physically or structurally resemble their representanda.[5] Analog computation is thus not properly conceived as symbol manipulation, but as a physical process driven by the structural properties of analog representational media.

[5] See Truitt and Rogers (1960) for both a semi-formal account of analog computation along these lines, and for numerous examples of analog computers.

From this perspective, Turing’s great achievement was not that of conceiving the idea of computation, but of developing one very powerful means of mechanising computational processes. Drawing this distinction is important because once it is clear that the idea of computation is distinct from Turing’s account of how computational processes might be mechanised, it is possible to investigate the former independently of the latter. What we therefore require is a characterisation of computation that captures more (if not all) of those processes that have earned this epithet over the last 2000 years.

Computation as implementing a function

In response to this demand, a different kind of characterisation of computation is now popular in cognitive science. For example, in an influential article Churchland et al. (1990) have this to say on the subject:

In a most general sense, we can consider a physical system as a computational system just in case there is an appropriate (revealing) mapping between some algorithm and associated physical variables. More exactly, a physical system computes a function f(x) when there is (1) a mapping between the system’s physical inputs and x, (2) a mapping between the system’s physical outputs and y, such that (3) f(x) = y. (1990)

In this passage, however, it is not obvious that the reference to an “appropriate (revealing)” mapping is doing any real work. Once this is removed, what remains is the proposal that a computation is performed by some physical system just in case its causal operation can be interpreted as implementing some function. Chalmers (1994) summarises the idea as follows:

A physical system implements a given computation when there exists a grouping of physical states of the system into state-types and a one-to-one mapping from formal states of the computation to physical state-types, such that formal states related by an abstract state-transition relation are mapped onto physical state-types related by a corresponding causal state-transition relation. (1994, p. 392)

The bottom line here, according to Chalmers, is that computation is simply “an abstract specification of causal organisation” (1994, p. 396, emphasis in original). This characterisation does satisfy the desideratum we mooted above, given that it captures both analog and digital computation in its net. But it does so at a very great cost. Since all law-governed physical systems (and, granting determinism, this equates with all physical systems) are interpretable as implementing some function or other, we arrive at the unwelcome conclusion that all physical systems are computational. And that would appear to render the concept of computation in cognitive science explanatorily vacuous.

Chalmers, for one, resists this conclusion:

This objection expresses the feeling that if every process, including such things as digestion and oxidation, implements some computation, then there seems to be nothing special about cognition any more, as computation is so pervasive. This objection rests on a misunderstanding. It is true that any given instance of digestion will implement some computation, as any physical system does, but the system’s implementing this computation is in general irrelevant to its being an instance of digestion.... With cognition, by contrast, the claim is that it is in virtue of implementing some computation that a system is cognitive. That is, there is a certain class of computations such that any system implementing that computation is cognitive (1994, p. 397).

Chalmers’ reasoning fails to reassure, however. The concept of computation was originally introduced as a way of distinguishing two classes of causal processes: those characteristic of the vast majority of physical systems (e.g. intestines, microwave ovens, cups of tea, etc.), and those that are the preserve of intelligent systems alone. Computational processes are supposed to be special in some way; in a way, moreover, that provides us with some explanatory purchase with respect to the problem of intelligent behaviour. Since implementing a function is a ubiquitous feature of nature, choosing to characterise computation in this way repudiates the very motivation for introducing the concept into cognitive science in the first place.

Computation as content-shaped causal processing

What we need is a way of characterising computation that limns a middle path between the restrictiveness of digital computation and the promiscuity of abstract causal organisation. One way to do this is to re-visit the account of computation we get from digital computers, and consider whether this can be liberalised to some degree without falling prey to the problem of explanatory vacuity.

Digital computation, remember, is symbol manipulation: a causal/mechanical process in which language-like representing vehicles are recognised and transformed in a semantically coherent fashion purely on the basis of their syntactic properties. But as already remarked, the practice of computation has not historically been restricted to processes defined over symbols. Consider, for example, the familiar tactic of representing a physical variable, such as the velocity of a particle, using a curve on the plane. If we plot velocity on one axis, and time on the other, it is possible to compute distance travelled by measuring the area under the curve, or acceleration by constructing tangents to the curve. These are examples of analog computations which employ a non-symbolic representing vehicle.

Viewed from this less-restrictive perspective, there are two distinctive features of computational processes (as opposed to causal processes in general). First, they are associated with representing vehicles of some kind. Second, and more importantly, computational processes are shaped by the contents of the very representations they implicate. We thus arrive at the following characterisation:

Computations are causal processes that implicate one or more representing vehicles, such that their trajectory is shaped by the representational contents of those vehicles.

Talk of representational content “shaping” the causal trajectory of computation is vague, of course. But this is deliberate. Prima facie, there are different ways of organising physical systems such that representational content can play this role. In the case of digital systems, while computational operations only ever have access to the syntactic properties of symbols, the rules that govern these syntactic manipulations are nonetheless carefully crafted so as to ensure that they respect the contents of the symbols. In Dennett’s memorable terms: digital computers are syntactic engines that behave as if they were semantic engines (Dennett 1987, p. 61). Analog computers, by contrast, are systems whose behaviour is driven not by content-sensitive rules, but by semantically “active” analog representations that physically or structurally resemble what they represent.[6]

[6] See O’Brien (1999) for further discussion.

Although this strategy of characterising computation in terms of operations shaped by representational contents is quite common in the literature,[7] it does not find favour everywhere. Chalmers, for instance, has this to say:

[7] See, e.g. Cummins and Schwarz (1991), p. 64; Dietrich (1989); Fodor (1975), p. 27; and Von Eckardt (1993), pp. 97–116.

The original account of Turing machines by Turing (1936) certainly had no semantic constraints built in. A Turing machine is defined purely in terms of the mechanisms involved, that is, in terms of syntactic patterns and the way they are transformed.... To implement a Turing machine, we need only ensure that this formal structure is reflected in the causal structure of the implementation.... [W]hen computer designers ensure that their machines implement the programs that they are supposed to, they do this by ensuring that the mechanisms have the right causal organisation; they are not concerned with semantic content. In the words of Haugeland (1985), if you take care of the syntax, the semantics will take care of itself (1994, p. 399).

In our view, this represents a profound misreading of both Turing and Haugeland. Far from eschewing semantic considerations, computer science is in the business of designing and implementing formal operations that satisfy semantic constraints. In the passage of his classic text just prior to articulating his famous “formalists’ motto” (quoted approvingly by Chalmers), Haugeland takes himself to be addressing the following question:

Interpretation and semantics transcend the strictly formal, because formal systems as such must be self-contained. Hence to regard formal tokens as symbols is to see them in a new light: semantic properties are not and cannot be syntactical properties. To put it dramatically, interpreted formal tokens lead two lives: SYNTACTICAL LIVES, in which they are meaningless markers, moved according to the rules of some self-contained game; and SEMANTIC LIVES, in which they have meanings and symbolic relations to the outside world. The corresponding dramatic question then is this: how do the two lives get together? (1985, p. 100).

The answer that Haugeland goes on to develop is the fundamental basis of digital computation:

The idea...is to design these formal systems so that they can be interpreted as axiomatic systems in the intuitive sense. That requires two things of the system (as interpreted);
1. the axioms should be true...; and
2. the rules should be truth preserving (1985, pp. 103–105).

In this light, the whole point of Haugeland’s formalists’ motto is to reinforce the message that it is only when the syntactically specified rules of the system are so crafted that they satisfy these semantic constraints, that “the semantics will take care of itself”.

This isn’t just an exercise in academic exegesis. Understanding the role of representational content in shaping computational processes is pivotal to understanding why the concept of computation arose in the first place. Intelligence is a rare commodity, and one that provokes a profound question: how is it that some physical systems are capable of intelligent behaviour when the majority of systems in the universe are not? The concept of computation is supposed to provide some leverage here: intelligent systems are special because they alone engage in computation. But this answer won’t suffice unless computational processes are themselves special. The characterisation developed above explains why they are (computational processes are shaped by the representational contents of the vehicles they implicate) and hence explains why the concept of computation is foundational for cognitive science.

With this characterisation of computation in place we can now turn to the principal task of the paper: that of explaining how connectionist systems compute. To satisfy this task we will need to show how representational content plays a role in shaping the trajectory of connectionist computational processes.

Connectionist computation: what’s wrong with the conventional story?

It is possible to identify something of a consensus among proponents of connectionism as to the nature of computation in connectionist networks. The argumentative burden of this section is to establish that this “conventional” account of connectionist computation is unsatisfactory, and to explain why it has nurtured doubts about connectionism’s computational credentials.

The conventional story

The characterisation of computation we developed in the preceding section emphasises the importance of representation for computation.
It is not surprising, therefore, that the conventional account of connectionist computation focuses on showing how activity across connectionist networks admits of a representational interpretation.

The story goes like this. A connectionist network is a collection of interconnected processing units (modelled on neurons), each of which has an activation level (modelled on a neuron’s spiking frequency) that is communicated to other units in the network via modifiable, weighted connections (modelled on synapses). From moment to moment, each unit sums the weighted activation it receives, and generates a new activation level that is some threshold function of its current activity and that input. Via this process, a network transforms patterns of activity across its input layer into patterns of activity across its output layer. Altering the network’s connection weights alters the activation patterns the network produces in response to its inputs. Consequently, a network can be taught to generate a range of target patterns in response to a range of inputs. These patterns of activity, because they are produced by a training regime that gradually shapes the network’s responses so that it is successful in negotiating some task domain, are thought to constitute a form of information coding, often termed activation pattern representation. According to this account, therefore, connectionist networks compute by transforming activation pattern representations across their input units into activation pattern representations across their output units.[8]

[8] See, e.g. Bechtel and Abrahamsen (2002); Clark (1989, 1993); and Tienson (1987).

But this account is superficial. What we really want to know is how connectionist networks are able to transform their input representations into appropriate output representations. It is at this point that the conventional story gets both more complicated and more interesting. The proffered explanation focuses on the fact that the hidden unit landscape of a trained network is partitioned into linearly separable regions, regions that capture the categorial distinctions necessary for generating a solution to the computational problem(s) posed by the inputs.

To illustrate this idea, consider a three-layer, feedforward network designed by Laakso and Cottrell (2000) to perform colour categorisation (see Fig. 1). The task of this network is to take reflectance spectra (which provide a measure of the relative amounts of light reflected by an object across a range of wavelengths) and produce a colour judgment corresponding to that of a normal human observer. The inputs to the network are 523 reflectance spectra selected from a database produced at the University of Kuopio (anonymous 1995; Parkkinen 1989).[9] Each spectrum is measured over the 400–700 nanometre range in 5 nm “bins”, and is thus a 61-dimensional vector of which the first component is the reflectance intensity at a wavelength of 400 nm; the second, the reflectance at 405 nm, and so on, through to the 61st component which is the reflectance at a wavelength of 700 nm. The input layer thus has 61 input units onto which are locked the amplitude values of the spectra. There are three units in the hidden layer, and five binary units in the output layer for encoding the relevant colour categories (red, green, blue, yellow, and purple). After training via backpropagation of errors, the network achieved better than 90% accuracy in its assignment of input spectra to colour categories. (See Laakso and Cottrell 2000, pp. 58–67 for further details.)

[9] These spectra were generated by measuring the reflectance profile of colour cards in the Munsell Book of Color (anonymous 1976), a set of cards that is used in standard psychometric tests of colour perception.

Fig. 1 The structure of the colour-categorisation network, showing an example of an input spectrum to be encoded on the input layer

We reproduced these results by training a series of networks on the same data set. The activity at the hidden layer of a trained network can be portrayed as a three-dimensional activation space, in which the activity of each hidden unit is represented along one coordinate axis. For each input to the network, one gets a different pattern of activation on the hidden layer, and a corresponding point in activation space. We found that each colour-categorisation network partitions its activation space into linearly separable regions (in three dimensions, these are regions that can be cleanly divided by a plane), such that the activation points corresponding to the various colour categories are located in distinct parts of the space (Fig. 2). This is typical of feedforward neural networks, and it is widely agreed that it is by virtue of organising their activation spaces in this way that such networks are able to correctly categorise their inputs.

Fig. 2 Hidden unit activation space for one of the colour-categorisation networks

This much about hidden unit activation pattern representation is common lore among connectionists. What is not always appreciated about hidden unit activation patterns, however, is that collectively they structurally resemble aspects of the task domain over which the network has been trained. Indeed, it is the existence of this structural resemblance relation that anchors the representational interpretation of activation patterns in the first place (O’Brien and Opie 2001; 2004). Since this structural resemblance theory of representational content will be important to the argument developed in the next section, we will pause here to examine it in some detail.

Resemblance is a fairly unconstrained relationship, because objects or systems of objects can resemble each other in a huge variety of ways, and to various different degrees. The most straightforward kind of resemblance involves the sharing of one or more physical properties. Thus, two objects might have the same colour, or mass, the same density, or electric charge, or be equal along a number of physical dimensions simultaneously. We shall refer to this kind of relationship as first-order resemblance.[10] A representing vehicle and its represented object resemble each other in this way if they have physical properties in common.

[10] We are here adapting some terminology developed by Shepard and Chipman (1970). They distinguish between first- and second-order isomorphism. Isomorphism is a very restrictive way of characterising resemblance, and hence we prefer to avoid this terminology (see O’Brien and Opie 2004).

First-order resemblance is clearly unsuitable as a general ground of neural representation, since it is incompatible with what we know about the brain. It is quite obvious that our brains are capable of representing features of the world that are not replicable in neural tissue.

There is, however, another kind of resemblance available, which we shall refer to as second-order resemblance.[11] In second-order resemblance, the requirement that representing vehicles share physical properties with their represented objects can be relaxed in favour of one in which the relations among a system of representing vehicles mirror the relations among their objects. For example, a mercury thermometer can be used to represent temperature in virtue of the linear relationship between the length of a column of mercury and ambient temperature: variations in the one correspond systematically with variations in the other.

[11] Bunge (1969), in a useful early discussion of resemblance, draws a distinction between substantial and formal analogy which is close to our distinction between first- and second-order resemblance. Two theorists who have kept the torch of second-order resemblance burning over the years are Palmer (1978) and Shepard (Shepard and Chipman 1970; and Shepard and Metzler 1971). More recently, Blachowicz (1997); Cummins (1996); Gardenfors (1996); Johnson-Laird (1983); O’Brien (1999) and Swoyer (1991), have all sought to apply, though in different ways, the concept of second-order resemblance to representation.

Although first-order resemblance cannot be the general ground of neural representation, the same is not true of second-order resemblance. Two systems can share a pattern of relations without sharing the physical properties upon which those relations depend. Second-order resemblance is actually a very abstract relationship. Essentially, nothing about the physical form of the relations defined over a system of representing vehicles is implied by the fact that it resembles a set of represented objects at second-order; second-order resemblance is a formal relationship, not a substantial or physical one.

As already foreshadowed, the form of second-order resemblance that is relevant in the present context is structural resemblance. One system structurally resembles another when the physical relations among the objects that comprise the first preserve some aspects of the relational organisation of the objects that comprise the second. Structural resemblance would seem to be the right second-order resemblance relation for explaining the representational content of connectionist representing vehicles. Hidden unit activation space is a mathematical space used by theorists to portray the set of activation patterns a network is capable of producing over its hidden layer. Activation patterns themselves are physical objects (patterns of neural firing, if realised in a brain), and thus distance relations in activation space actually codify physical relations among activation states. What is crucial here is that the set of hidden unit activation patterns generated across any trained-up connectionist network constitutes a system of representing vehicles whose physical relations sustain a second-order resemblance relation with respect to the task domain over which the network has been trained.

Consider, for example, the relationship between the set of hidden layer activation patterns generated by the colour-categorisation network and its task domain. Physical similarities and differences among these patterns of activity (which appear as relative distances in the activation space) correspond to similarities and differences among the reflectance spectra that the network is responding to (see Sect. 4 for a more detailed discussion).

The structural resemblance relation between hidden unit activation patterns and aspects of a connectionist network’s task domain licenses an interpretation of the former as representing vehicles. This in turn appears to support the claim, made by the proponents of connectionism, that these networks are in the computing business. Why then have doubts about connectionism’s computational credentials continued to linger in the cognitive science literature? It is to this issue that we will now turn.

What’s wrong with the conventional story?

The conventional story about connectionist computation is elegant, but incomplete. Recall that a computational interpretation of connectionism must not only show that connectionist networks implement representing vehicles; it must also show how processing in networks is shaped by the representational contents of those vehicles. It is this latter requirement that the conventional story fails to satisfy.

To see this, consider the colour-categorisation network we described above. This network is required to sort spectra into colour categories, a task at which it succeeds because the network’s hidden unit activation space is partitioned into regions corresponding to those categories. And it is a relatively simple exercise to map from regions in activation space to binary representations of colour on the network’s output layer. Notice, however, that given any spectrum as input, it is the configuration of weights between the input and hidden layers that determines the resulting hidden layer activity. Furthermore, since they govern each and every such mapping, it is these weights that are responsible for the global structure of the hidden unit activity space. The representing vehicles on which the conventional story focuses (activation patterns across the hidden layer) are not causally implicated in these transformations. They are the products, not the source, of processing. And as such, their representational contents play no role in shaping the trajectory this processing takes.[12]

[12] This is not to deny that the physical relations among activation patterns on the hidden layer have a bearing on downstream processes, both at the output layer and in other networks. Our point is simply that this (diachronic) relational structure is governed by some other (synchronic) feature of the network, namely, the configuration of its connection weights.

It is precisely this kind of analysis which invites the charge that connectionism is nothing more than a latter-day version of associationism. This interpretation is quite consistent with a representational understanding of the activity across the layers of connectionist networks. It’s just that it restricts connectionist networks to the mere association of “ideas”, rather than the content-driven forms of information processing that are necessary to explain intelligent behaviour.

There is a fairly standard riposte to this charge in connectionist circles. Connectionist networks implement two quite different kinds of representation: in addition to the information coded in activation patterns, which is transient and hence obliterated whenever the network is exposed to new input, information is coded in a long-term fashion in the network’s connection weights. These weights, it is often claimed, constitute the network’s memory. Since it is connection weights that govern the transformations of activity from layer to layer in a network, it thus appears that we do have a representational story to tell about the structures that shape the trajectory of connectionist processing.

The trouble with this response, however, is that we currently lack a representational analysis of connection weights comparable to the kind of analysis that is available for activation patterns. Consequently, the claim that connection weights represent a network’s long-term knowledge is left unanchored, and commentators are justified in expressing doubts about this claim.

connectionist networks as representing vehicles, doubts will persist about connectionism’s computational credentials unless Ramsey’s challenge can be answered. What is required is a “level of understanding or explanatory motivation that requires us to view the weights as representations”. It is time to meet this challenge.

Connection weight representation

We have seen that activation pattern representation is supported by a relation of structural resemblance between the patterns of activity in a connectionist network and the task domain in which that network operates. The proposal we explore here is that there is a more fundamental structural resemblance between the connection weights of such a network and its task domain; one that supports a species of representation we will call connection weight representation.[13]

Although, the relation of structural resemblance be-
tween a trained-up network’s patterns of activity and its Ramsey, for example, highlights what he takes to be a task domain is relatively easy to identify, the same fundamental difference between connection weights and cannot be said of any such relation between connection the rules that govern the symbol manipulations of digital weights and task domain. If such a relation exists, it will computers: require some teasing out. We will approach this problem by more closely examining the role of connection As the relevant content for this type of representation weights in connectionist processing. is the system’s long-term knowledge...the most obvi- ous point of comparison should be with the explicit rules that sometimes govern classical computation Processing with connection weights systems and are thought to encode those systems’ knowledge base. Is there an explanatory pay-off in It is well-known that networks operating in the same viewing connection weights as representations that is domain, but trained-up with different initial assignments similar to the return we get when this is done with of connection weights, come to occupy different points rules in classical models? I believe the answer is ‘no’ in ‘‘weight space’’.14 There is no simple relationship for the following reason. [In] classical models it is between the position in weight space occupied by a typically the case that causally distinct structures en- trained network and the task domain. We demonstrated code commands for specific stages of the computa- this by training a group of 20, three-layer feedforward tion... However, in trained connectionist models, this networks to perform at close to 100% accuracy on La- type of specificity is not possible. While it might be akso and Cottrell’s colour-categorisation task. 
We then true that some connection weights contribute to some measured the pair-wise correlations among the (hidden episodes of processing more than others, there is no layer) weight matrices of these networks (for a total of level of analysis at which we can say a particular 190 comparisons). The set of correlations turned out to weight encodes a particular command or governs a be randomly distributed about a mean of zero, con- specific algorithmic step in the computation. Instead, firming that there is no simple, first-order relationship all the system’s know-how is superimposed on all the weights with no particular mappings between the two. 13 (1997, pp. 48–49) In what follows, we develop this proposal by focusing solely on the connection weights between the input and hidden layers of feedforward networks. (We will reinforce this point by occasionally Further rumination on this issue eventually leads referring to the ‘‘hidden layer’’ connection weights: these are the Ramsey to conclude that ‘‘there doesn’t appear to be weights that determine the activity across the hidden layer.) It is our view, however, that this proposal applies to connectionist any other level of understanding or explanatory moti- systems more generally. vation that requires us to view the weights as represen- 14 The weight space of a network is a Euclidean vector space in tations’’ (1997, p. 51), and he recommends that we view which each of the network’s connection strengths is represented as connectionist explanations of cognition as dynamical the position along a distinct coordinate axis. The dimensionality of this space corresponds to the number of connections in the net- rather than computational (1997, p. 61). work. Once can picture training a network as a journey through The dialectical position, we think, is this. 
However weight space, and different final positions in the space as alternative strong our reasons for interpreting activation patterns in ways of dealing with the task demands. between these networks (see Fig. 3). Since the networks layer of a successful connectionist network structurally themselves are not related in any straightforward way, it resemble aspects of the network’s task domain. appears unlikely that each bears some simple relation- To investigate this conjecture we trained a series of ship to the task domain over which they operate. three-layer feedforward networks to solve the colour- It remains a live possibility, however, that connec- categorisation problem using a subset of Laakso and tionist networks embody some (higher-order) internal Cottrell’s original data: about 25 each of the spectra structure that warrants a representational understanding normally classified as red, green, and blue, respectively. of their connection weights. To explore this possibility Each network had 61 input units and three hidden units. we need to take a closer look at how connectionist We represented the fan-ins of the trained networks using networks process their inputs. weight diagrams and compared these with the means of The key players in network processing are what we the red, green and blue input data sets. call fan-ins. A fan-in is the vector of weights modulating A typical example is shown in Fig. 5. The three fan- the effect of incoming activity on a particular hidden ins are depicted on the right, the mean spectra on the unit. Within any feedforward network there is one fan-in left. One immediately notices a striking similarity be- per hidden unit, each corresponding to a row of the tween the fan-ins of this network and the means of the network’s hidden layer weight matrix (see Fig. 4). Fan- data sets. 
The shape of the fan-in for hidden unit 2, for ins effect the transformation of the network’s input space example, corresponds nicely to the shape of the mean into its hidden unit activation space. More specifically, spectrum of the 25 inputs that normal observers classify each fan-in determines how one hidden unit responds to as red. Likewise, the fan-in for hidden unit 3 resembles input, by way of a product of input activation and fan-in the mean of the ‘‘green’’ spectra, and the fan-in for values. This product is then modified by the hidden unit’s hidden unit 1 resembles the mean of the ‘‘blue’’ spectra. activation function to produce the value along a single What this indicates is that, for each fan-in, the relative coordinate in activation space. It is thus a network’s fan- magnitudes of its component weights mirror the relative ins that interface directly with the structure of the vectors amplitudes of the various wavelengths comprising one of coded at the input layer, and which ultimately determine the mean spectra. Since this mirroring is a similarity at the structure of activation space. Accordingly, if we are the level of relations, rather than properties, it is an in- to discover any structural resemblance between a net- stance of second-order resemblance. And since it is work’s connection weights and its task domain it is the grounded in the physical relations among the fan-in fan-ins on which we should focus. weights (i.e. their relative magnitudes), it is a structural resemblance. In the previous section we saw that it is a relation of Connection weights as representing vehicles structural resemblance that anchors a representational interpretation of hidden unit activation patterns. We’ve Given the crucial role of fan-ins in network processing, just seen (Fig. 5) that there is a structural resemblance we offer the following proposal: the fan-ins in the hidden between the fan-ins of the colour-categorisation network Fig. 
3 A plot of cumulative probability against weight- matrix correlation. A good fit to the straight line indicates a normal distribution Fig. 4 A simple network with and without its three fan-ins (r1, r2, & r3) highlighted and the task domain over which it operates. That It is the ‘‘shape’’ of these vectors that govern the resemblance licenses an interpretation of fan-ins (and respective activities of the hidden units they influence, by their component weights) as representing vehicles. way of the so-called ‘‘dot product’’ of weights and input activation. Taking a dot product is a well-known way of measuring the similarity of two vectors.15 Each fan-in is, Connectionist computation: the real story in effect, a filter looking for input with a particular shape. The dot product indicates the extent to which a The characterisation of computation we offered above given input matches a particular fan-in filter, as does the suggests that connectionist systems must satisfy two activity of the corresponding hidden unit. Input that is conditions if they are to count as computational devices: presented to the colour-categorisation network, for (i) they must implicate representing vehicles of some example, is filtered through three fan-in vectors, thereby kind, and (ii) the contents of those vehicles must shape modifying the activation of the three units in the hidden the causal processes that occur in connectionist pro- layer. Activity in the hidden layer thus reflects the degree cessing. We established that connection weights may of similarity between the input spectra and the fan-ins. legitimately be interpreted as representing vehicles, at Correlatively, hidden unit activation space forms a least for a significant class of connectionist systems. It three-dimensional map that allows us to compare the remains to show that the contents of this species of filtered versions of the input spectra, one with the other. vehicle influence the trajectory of connectionist pro- cessing. 
15 The dot product of two vectors in a Euclidean space is at a We have noted the crucial role of fan-ins in trans- maximum when the angle between them is zero, and decreases as forming a network’s inputs into hidden layer activation. the angular separation between them increases. Fig. 5 On the left are the mean spectra of the three classes of inputs; those classified (from top to bottom) as red, green and blue. On the right are the three fan-ins of the network, with weight value on the y-axis and input index on the x-axis Now the final piece of the puzzle is in place. We network’s hidden units in response to its various inputs, have shown that the fan-ins in the colour-categorisa- and, more importantly, it is sustained (synchronically) tion network structurally resemble aspects of the task by the higher order structure of the network’s hidden domain, namely, the mean spectra of the three classes layer connection weights. of input (red, green and blue). That resemblance war- These two kinds of structural resemblance support an rants us in regarding those fan-ins, and their compo- interpretation of activation patterns and connection nent weights, as representing vehicles. But, we have weights as different species of representing vehicle. And also shown that it is this same resemblance, embodied these two kinds of representing vehicle shape the tra- in the physical structure of the fan-ins, that drives the jectory of connectionist processing in different ways. causal processes within the network. It is by virtue of Activation pattern representations shape the impact that their resemblance to global features of the input data one network has on other networks or motor mecha- that the fan-in vectors contrive to transform reflectance nisms to which it is connected. Connection weight rep- spectra into a map of categorial colour, and thereby resentations, by contrast, are responsible for the solve the problem posed to the network. 
Representa- production of these activation pattern representations in tional content is in the driver’s seat here, as we require, the first place. and it appears that connectionist networks are genuine This last point is important because it secures a computational devices. computational understanding of connectionist process- This is a very satisfying result for proponents of ing, at least according to the characterisation we have connectionism. It enables us to meet Ramsey’s chal- developed in this paper. The causal operations that lenge, because we now have a robust explanatory generate a hidden unit activation pattern implicate one motivation for viewing connection weights as represen- or more representing vehicles (the fan-in connection tations. And this in turn puts to bed the lingering doubts weights) and the trajectory of this process is shaped by about connectionism’s computational credentials. the representational content of these vehicles (since it is Connectionist networks are capable of successfully the structural resemblance relation that determines the negotiating their task domains because they structurally representational content of the fan-ins). Connectionist resemble them—a resemblance relation they gradually networks are not merely association engines or dynam- acquire in the course of training. This structural ical systems; they are full-blooded computational resemblance relation is sustained at two different levels mechanisms. And they compute by exploiting relations of description. It is sustained (diachronically), by the of structural resemblance between their connection set of activation patterns that are produced across a weights and their target domains. McClelland JL, Rumelhart DE (eds) (1986) Parallel distributed References processing: explorations in the microstructure of cognition, Vol. 2. MIT Press, Cambridge Anonymous (1976) Munsell book of color: matte finish collection. 
O’Brien G (1999) Connectionism, analogicity and mental content. Munsell Color Company, Inc Acta Analytica 22:111–131 Anonymous (1995) Kuopio color database. http://www.lut.fi/ltkk/ O’Brien G, Opie J (2001). Connectionist vehicles, structural tite/research/color/lutcs_database.html resemblance, and the phenomenal mind. In: Veldeman J (eds). Bechtel W, Abrahamsen A (2002) Connectionism and the mind: Naturalism and the phenomenal mind, a special issue of parallel processing, dynamics, and evolution in networks. Communication and Cognition. 34: 13–38 Blackwell, Oxford O’Brien G, Opie J (2004) Notes towards a structuralist theory of Blachowicz J (1997) Analog representation beyond mental imagery. mental representation. In: Clapin H, Staines P, Slezak P (eds) J Philosophy 94:55–84 Representation in mind: new approaches to mental represen- Bunge M (1969) Analogy, simulation, representation. Revue- tation. Elsevier Internationale-de-Philosophie 23:16–33 Palmer S (1978) Fundamental aspects of cognitive representation. Chalmers DJ (1994) On implementing a computation. Mind Mach In: Rosch E, Lloyd B (eds) Cognition and categorization. 4:391–402 Lawrence Erlbaum Churchland PS, Koch C, Sejnowski T (1990) What is compu- Parkkinen JPS, Hallikainen J, Jaaskelainen T (1989) Characteristic tational neuroscience? In: Schwartz E (eds) Computational spectra of Munsell colors. J Opt Soc A 6(2):318–322 neuroscience. MIT Press, Cambridge Pinker S (1997) How the mind works. Norton, New York Clark A (1989) Microcognition: philosophy, cognitive science, and Pinker S (2002) The blank slate: the modern denial of human parallel distributed processing. MIT Press, Cambridge nature. Viking, New York Clark A (1993) Associative engines: connectionism, concepts, and Port R, van Gelder TJ (1995) Mind as motion: explorations in the representational change. MIT Press, Cambridge dynamics of cognition. MIT Press, Cambridge Cummins R (1996) Representations, targets, and attitudes. 
MIT Pylyshyn ZW (1984) Computation and cognition: toward a foun- Press, Cambridge dation for cognitive science. MIT Press, Cambridge Cummins R, Schwarz G (1991). Connectionism, computation and Ramsey W (1997) Do connectionist representations earn their cognition. In: Horgan T, Tienson J (eds). Connectionism and explanatory keep? Mind Lang 12(1):34–66 the philosophy of mind. Kluwer, Dordrecht Rumelhart DE, McClelland JL (eds) (1986) Parallel distributed Dennett D (1987) The intentional stance. MIT Press, Cambridge processing: explorations in the microstructure of cognition, vol. Dietrich (1989) Semantics and the computational paradigm in 1. MIT Press, Cambridge cognitive psychology. Synthese 79:119–141 Shepard R, Chipman S (1970) Second-order isomorphism of Fodor JA (1975) The language of thought. Harvester Press, internal representations: shapes of states. Cog Psychol 1:1–17 London Shepard R, Metzler J (1971) Mental rotation of three-dimensional Fodor JA (1992) The big idea: can there be a science of the mind? objects. Science 171:701–703 Times Literary Supplement July 3: 5–7 Swoyer C (1991) Structural representation and surrogative Fodor JA (2000) The mind doesn’t work that way: the scope and reasoning. Synthese 87:449–508 limits of computational psychology. MIT Press, Cambridge Tienson J (1987) Introduction to connectionism. South J Philos Fodor JA, Pylyshyn ZW (1988) Connectionism and cognitive 26:1–16 architecture: a critical analysis. Cognition 28:3–71 Truitt TD, Rogers AE (1960) Basics of analog computers. John F. Gardenfors P (1996) Mental representation, conceptual spaces and Rider metaphors. Synthese 106:21–47 Von Eckardt B (1993) What is cognitive science? MIT Press, Johnson-Laird P (1983) Mental models. Harvard University Press Cambridge Laakso A, Cottrell G (2000) Content and cluster analysis: assessing representational similarity in neural systems. Philos Psyc 13:47–76
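Two of the procedures described in the paper, the pair-wise comparison of hidden layer weight matrices and the filtering of input through fan-ins, can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' simulation code: the function names are hypothetical, the 20 "networks" are random matrices rather than trained ones, the "spectra" are toy Gaussian bumps, and the logistic activation function is assumed because the text does not say which activation function was used.

```python
import itertools
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def pairwise_weight_correlations(weight_matrices):
    """Correlate every pair of weight matrices after flattening them row by row."""
    flat = [[w for row in matrix for w in row] for matrix in weight_matrices]
    return [pearson(a, b) for a, b in itertools.combinations(flat, 2)]


def hidden_activations(fan_ins, inputs):
    """One activation per fan-in: logistic squashing (assumed) of the dot product."""
    nets = [sum(w * x for w, x in zip(fan_in, inputs)) for fan_in in fan_ins]
    return [1.0 / (1.0 + math.exp(-net)) for net in nets]


def bump(centre, width=6.0, size=61):
    """Hypothetical smooth 'spectrum' over 61 input units, peaking at `centre`."""
    return [math.exp(-0.5 * ((i - centre) / width) ** 2) for i in range(size)]


# 20 random 3 x 61 matrices stand in for trained networks (illustration only).
rng = random.Random(0)
networks = [[[rng.gauss(0.0, 1.0) for _ in range(61)] for _ in range(3)]
            for _ in range(20)]
correlations = pairwise_weight_correlations(networks)
print(len(correlations))  # 190 pairs, i.e. 20 choose 2

# Three toy fan-ins; the probe input has the same shape as the second one.
fan_ins = [bump(10), bump(30), bump(50)]
activations = hidden_activations(fan_ins, bump(30))
best_unit = max(range(3), key=lambda u: activations[u])
print(best_unit)  # 1: the unit whose fan-in matches the probe responds most
```

The count of 190 follows directly from comparing 20 matrices two at a time. The second half shows the sense in which a fan-in acts as a filter: the hidden unit whose weight vector has the same shape as the input receives the largest dot product and so the highest activation.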

References

  1. Anonymous (1976) Munsell book of color: matte finish collection. Munsell Color Company, Inc
  2. Anonymous (1995) Kuopio color database. http://www.lut.fi/ltkk/tite/research/color/lutcs_database.html
  3. Bechtel W, Abrahamsen A (2002) Connectionism and the mind: parallel processing, dynamics, and evolution in networks. Blackwell, Oxford
  4. Blachowicz J (1997) Analog representation beyond mental imagery. J Philosophy 94:55–84
  5. Bunge M (1969) Analogy, simulation, representation. Revue-Internationale-de-Philosophie 23:16–33
  6. Chalmers DJ (1994) On implementing a computation. Mind Mach 4:391–402
  7. Churchland PS, Koch C, Sejnowski T (1990) What is computational neuroscience? In: Schwartz E (eds) Computational neuroscience. MIT Press, Cambridge
  8. Clark A (1989) Microcognition: philosophy, cognitive science, and parallel distributed processing. MIT Press, Cambridge
  9. Clark A (1993) Associative engines: connectionism, concepts, and representational change. MIT Press, Cambridge
  10. Cummins R (1996) Representations, targets, and attitudes. MIT Press, Cambridge
  11. Cummins R, Schwarz G (1991) Connectionism, computation and cognition. In: Horgan T, Tienson J (eds) Connectionism and the philosophy of mind. Kluwer, Dordrecht
  12. Dennett D (1987) The intentional stance. MIT Press, Cambridge
  13. Dietrich (1989) Semantics and the computational paradigm in cognitive psychology. Synthese 79:119–141
  14. Fodor JA (1975) The language of thought. Harvester Press, London
  15. Fodor JA (1992) The big idea: can there be a science of the mind? Times Literary Supplement July 3: 5–7
  16. Fodor JA (2000) The mind doesn’t work that way: the scope and limits of computational psychology. MIT Press, Cambridge
  17. Fodor JA, Pylyshyn ZW (1988) Connectionism and cognitive architecture: a critical analysis. Cognition 28:3–71
  18. Gardenfors P (1996) Mental representation, conceptual spaces and metaphors. Synthese 106:21–47
  19. Johnson-Laird P (1983) Mental models. Harvard University Press
  20. Laakso A, Cottrell G (2000) Content and cluster analysis: assessing representational similarity in neural systems. Philos Psyc 13:47–76
  21. McClelland JL, Rumelhart DE (eds) (1986) Parallel distributed processing: explorations in the microstructure of cognition, vol. 2. MIT Press, Cambridge
  22. O’Brien G (1999) Connectionism, analogicity and mental content. Acta Analytica 22:111–131
  23. O’Brien G, Opie J (2001) Connectionist vehicles, structural resemblance, and the phenomenal mind. In: Veldeman J (eds) Naturalism and the phenomenal mind, a special issue of Communication and Cognition 34:13–38
  24. O’Brien G, Opie J (2004) Notes towards a structuralist theory of mental representation. In: Clapin H, Staines P, Slezak P (eds) Representation in mind: new approaches to mental representation. Elsevier
  25. Palmer S (1978) Fundamental aspects of cognitive representation. In: Rosch E, Lloyd B (eds) Cognition and categorization. Lawrence Erlbaum
  26. Parkkinen JPS, Hallikainen J, Jaaskelainen T (1989) Characteristic spectra of Munsell colors. J Opt Soc A 6(2):318–322
  27. Pinker S (1997) How the mind works. Norton, New York
  28. Pinker S (2002) The blank slate: the modern denial of human nature. Viking, New York
  29. Port R, van Gelder TJ (1995) Mind as motion: explorations in the dynamics of cognition. MIT Press, Cambridge
  30. Pylyshyn ZW (1984) Computation and cognition: toward a foundation for cognitive science. MIT Press, Cambridge
  31. Ramsey W (1997) Do connectionist representations earn their explanatory keep? Mind Lang 12(1):34–66
  32. Rumelhart DE, McClelland JL (eds) (1986) Parallel distributed processing: explorations in the microstructure of cognition, vol. 1. MIT Press, Cambridge
  33. Shepard R, Chipman S (1970) Second-order isomorphism of internal representations: shapes of states. Cog Psychol 1:1–17
  34. Shepard R, Metzler J (1971) Mental rotation of three-dimensional objects. Science 171:701–703
  35. Swoyer C (1991) Structural representation and surrogative reasoning. Synthese 87:449–508
  36. Tienson J (1987) Introduction to connectionism. South J Philos 26:1–16
  37. Truitt TD, Rogers AE (1960) Basics of analog computers. John F. Rider
  38. Von Eckardt B (1993) What is cognitive science? MIT Press, Cambridge
About the authors
University of Adelaide, Faculty Member
University of Adelaide, Faculty Member