Academia.eduAcademia.edu

Outline

Experiments with ‘HP Java’

1997, Concurrency: Practice and Experience

https://doi.org/10.1002/(SICI)1096-9128(199706)9:6<633::AID-CPE311>3.3.CO;2-C
Syracuse University SURFACE Northeast Parallel Architecture Center College of Engineering and Computer Science 1997 Experiments with "HP Java" Bryan Carpenter Syracuse University Yuh-Jye Chang Syracuse University Geoffrey C. Fox Syracuse University Donald Leskiw Syracuse University Follow this and additional works at: https://surface.syr.edu/npac Part of the Programming Languages and Compilers Commons Recommended Citation Carpenter, Bryan; Chang, Yuh-Jye; Fox, Geoffrey C.; and Leskiw, Donald, "Experiments with "HP Java"" (1997). Northeast Parallel Architecture Center. 32. https://surface.syr.edu/npac/32 This Working Paper is brought to you for free and open access by the College of Engineering and Computer Science at SURFACE. It has been accepted for inclusion in Northeast Parallel Architecture Center by an authorized administrator of SURFACE. For more information, please contact [email protected]. Experiments with \HPJava" Bryan Carpenter, Yuh-Jye Chang, Geo rey Fox, Donald Leskiw, Xiaoming Li, Northeast Parallel Ar hite tures Centre, Syra use University, Syra use, New York May 19, 2000 Abstra t We onsider the possible r ole of Java as a language for High Perfor- man e Computing. After dis ussing reasons why Java may be a natural andidate for a portable parallel programming language, we des ribe several ase studies. These over Java so ket programming, message-passing through a Java interfa e to MPI, and gramming in Java. 1 lass libraries for data-parallel pro- Contents 1 Introdu tion 3 2 Issues 4 3 Message-passing 3.1 Java so kets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2 MPI Interfa e . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3 Derived data types and higher-level MPI features . . . . . . . . . 6 10 12 4 Data parallellism in Java 14 5 Dis ussion 19 1.1 Overview of this arti le. . . . . . . . . . . . . . . . . . . . . . . . 2.1 Approa hes to Parallelism in Java . . . . . . . . . . . . . . . . . 2.2 Communi ation in Java . . . . . . . . . . . . . . . . . . . . . . . ase studies 4.1 Parallel arrays and olle tive ommuni ation . . . . . . . . . . . 4.2 \Array syntax" in Java. . . . . . . . . . . . . . . . . . . . . . . . 2 4 4 5 6 15 19 1 Introdu tion The explosion of interest in Java over the last year has been driven largely by its role in bringing a new generation of intera tive pages to the World Wide Web. Undoubtedly various features of the language| ompa tness, byte- ode portability, se urity, and so on|make it parti ularly attra tive as an implementation language for applets embedded in Web pages. But it is lear that the ambitions of the Java development team go well beyond enhan ing the fun tionality of HTML do uments. A ording to [20℄ \Java is designed to meet the hallenges of appli ation development in the ontext of heterogeneous, network-wide distributed environments. Paramount among these hallenges is se ure delivery of appli ations that onsume the minimum of system resour es, an run on any hardware and software platform, and an be extended dynami ally." Several of these on erns are mirrored in developments in the High Performan e Computing world over a number of years. A de ade ago the fo us of interest in the parallel omputing ommunity was on parallel hardware. A parallel omputer was typi ally built from spe ialized pro essors integrated through a proprietary high-performan e ommuni ation swit h. If the ma hine also had to be programmed in a proprietary language, that was an a eptable pri e for the bene ts of using a super omputer. This attitude was not sustainable as one parallel ar hite ture gave way to another, and the ost of porting software beame exorbitant. For several years now, portability a ross platforms had been a entral on ern in parallel omputing [4, 5, 14, 13, 23℄. More fundamentally, the assumption that high performan e omputing will be done primarily on spe ialized super omputers is questioned in reasingly. Rapid strides in performan e and onne tivity of ordinary workstations and PCs make it look equally possible that the future of parallel omputing will be on lo al area networks, or even the Internet [19, 17℄. With Java positioned to be ome a standard programming language on the Internet, and s ienti parallel pro essing edging towards network-based omputation, it is natural to ask how these two te hnologies will intera t. How suitable is Java for s ienti omputing, and do lessons from resear h in parallel omputing have impli ations for the development of Java? Popular a laim aside, there are some reasons to think that Java may be a good language for s ienti and parallel programming.  Java is a des endant of C++. C and C++ are used in reasingly in s i- enti programming; they are already used almost universally by impleompilers. In re ent years numerous variations on the theme of C++ for parallel omputing have appeared. See, for example [7, 28, 11, 2, 12, 21℄. menters of parallel libraries and 3  Java omits various features of C and C++ that are onsidered \diÆ ult"| notably, pointers. Poor ompiler analysis has often been blamed on these features. The inferen e is that Java, like Fortran, may be a suitable sour e language for highly optimizing ompilers (although dire t eviden e for this belief is still la king).  Java omes with builtin multithreading. Independent threads may be s heduled on di erent pro essors by a suitable runtime. In any ase multithreading an be very onvenient in expli it message-passing styles of parallel programming [24℄. We will return to the question of whether parallel omputing may have impliations for the development of Java in se tion 5. The a ronym \HPJava" was oined in a draft white paper produ ed by members of the PCRC onsortium in the rst half of 1996 [10℄. At NPAC we have been experimenting with some of the ideas put forward there. 1.1 Overview of this arti le. Se tion 2 outlines various options for parallel programming in Java|possible ways to express parallelism, and ways to handle inter-pro ess ommuni ation. The main te hni al ontent of the paper is in se tions 3 and 4. Se tion 3 ontains some ase studies in whi h we explore the message-passing style of programming in Java. We over parallel programming using so kets dire tly, and des ribe our Java interfa e to MPI. In se tion 4 we dis uss approa hes to data-parallel programming in Java, and outline one of our demo programs. In this arti le our emphasis is more on language bindings and interfa e issues, and less on performan e. Java ompilers are in an early stage of development, and we assume that urrent performan e gures are not indi ative of future potential. 2 2.1 Issues Approa hes to Parallelism in Java Java already supports on urren y through the thread me hanism and monitor syn hronization built into the language. In this arti le we are interested in truly parallel omputation, involving multiple CPUs. Su h parallelism ould be introdu ed into Java in a number of ways. It ould be a hieved through automati parallelization of sequential ode, but it is un lear why this would be more su essful for Java than for other languages. Alternatively, the Java virtual ma hine for a shared memory multipro essor an s hedule the threads of a multi-threaded Java program on di erent pro essors. Some su ess with these approa hes has already been demonstrated [3℄. For 4 omputation on a network (or distributed memory omputer) realisti options in lude language extensions or dire tives akin to HPF, or provision of libraries| lass libraries|to support task parallelism or data parallelism. A popular approa h in C++ has been to defer language extensions and on entrate on lass library support for parallel programming. The similarities between the two languages suggest this may be a fruitful avenue in Java too. The su ess of this analogy is by no means automati , however. Features su h as templates and user-de ned operator overloading make the C++ language inherently more ustomizable than Java. In C++ library-de ned types an be used on an identi al footing to primitive types|inline methods mean they an be almost as eÆ ient as primitive types. Less importantly, but onveniently, new ontrol onstru ts an often be simulated in C or C++ through use of ma ros1. On grounds, presumably, of simpli ity and transparen y many of these features have been omitted from Java. Su h aveats notwithstanding, this arti le will on entrate on lass libraries rather than language extensions. We will be working with lass libraries implemented in the standard Java development environment. Another open question is how multiple intera ting Java tasks are to be initiated. Conventionally in network omputing, instru tions to exe ute a parti ular user task are sent to a spe ialized daemon or se ure server running on the target host. Alternatively standard operating system daemons (rshd, rexe d, . . . ) an be used to the same e e t. Java brings some options of its own. One is to use standard or enhan ed Web servers to exe ute Java byte- odes. This approa h is predi ated on the existen e of ooperative servers, probably running suitable CGI s ripts. Another option is to exploit Java-enabled browsers. Applets downloaded from a parti ular server an perform omputations and return the results to that server. In any ase the substantive improvement over onventional network omputing is that the byte- odes an be downloaded on the y to target hosts, even in heterogenous networks. Conventionally, it was often ne essary to install and ompile an appli ation on ea h target host before starting the task remotely. 2.2 Communi ation in Java The standard Java API provides a simpli ed interfa e to Internet so kets. This interfa e hides mu h of the ugly detail involved in so ket-programming at the at the traditional C/UNIX level. The java.net interfa e provides less exibility than using the system alls dire tly. On the other hand, Java's built-in support for threads adds some exibility in s heduling ommuni ations that is missing from raw C. We will give an example of so ket programming in se tion 3.1, but traditionally this has not been a popular paradigm in the parallel pro essing world. 1 Of ourse we ould run a prepro essor over Java, but this is not a natural pro edure in existing Java environments 5 S ienti programmers have expe ted to program inter-pro ess ommuni ation at a higher level, if at all. More su essful s hemes in lude  Message-passing through language-level support [24, 15℄ or higher-level library interfa es [14℄.  Data parallelism. We restri t the de nition of \data parallelism" to over algorithms that a hieve parallelism through operations on distributed arrays. Syn hronization is usually limited to bulk syn hronization o uring naturally through olle tive operations.  Communi ation through shared memory or shared obje ts. Some more or less intri ate me hanism for inter-pro ess syn hronization is usually provided. The ase studies in the rest of this arti le restri t themselves to message-passing and data parallelism. As observed in the previous se tion, ommuni ation through true shared memory is already impli it in the Java thread model. Communi ation through remote obje ts is undoubtedly a natural and important paradigm in Java, espe ially for a ess to remote servi es [27, 22, 21℄, but we will not dis uss it further here. An orthogonal issue is whether the high-level libraries used to implement these paradigms (presuming lass-library implementations) should be written in Java on top of the standard API, or whether they should be implemented as dire t interfa es to native ode. We will have more to say about this later. 3 Message-passing ase studies Message-passing remains one of the most e e tive and widely used ommuniation paradigms in parallel omputing. In this se tion we ompare two approa hes to message-passing in Java, in the ontext of a s ienti appli ation. The rst approa h is to use the so ket interfa e in the standard Java API. The se ond is to work through a Java interfa e to the message-passing standard, MPI [14℄. To minimize distra ting details, our appli ation will be elementary: Conway's Life automaton. 3.1 Java so kets The UNIX so ket model is most suitable for programming lient-server appliations. Typi al s ienti parallel programs do not t dire tly into this model. Before a SPMD program an start two onditions must obtain: a pool of symmetri peer pro esses must have been reated, and ea h peer must be able to address a message to any other. This situation is typi ally bootstrapped as follows. The program is invoked on one host. This host reates a listening so ket. 6 It sends instru tions to servers on some other hosts (either expli itly or through a ommand su h as rsh) to start remote invo ations of the program (somehow sending the number of the port on whi h it is listening in the initiation message). Ea h new pro ess reates a listening so ket and sends its port number to the original pro ess. The original pro ess broad asts these address to all its slaves. At this point ea h pro ess knows the port number on whi h ea h of its peers is listening. Either they establish all-to-all onne tions now, before entering the SPMD main program, or the main program starts immediately, and onne tions are established dynami ally when a message needs to be sent. Figures 1, 2 give a s hemati outline of a distributed Life program using java.net. The fairly intri ate ode sket hed above for initialization and establishment of so ket onne tions has been absorbed into the de nition of an auxilliary lass hpj. Its interfa e is lass hpj { hpj(String[℄ args) ; int my_id() ; int num_pro essor() ; } DataInputStream Input(int id) ; DataOutputStream Output(int id) ; ... The members Input and Output return streams asso iated with so kets onne ted to peer pro esses. In the example an N by N Life board is divided blo kwise in one dimension, ea h pro essor holding a lo al blo k of width blo kSize. We note  Initialization is a omplex pro edure and learly it should not be oded anew for ea h appli ation program.  In this example the messages were ontiguous byte ve tors that ould be transmitted eÆ iently through the read and write methods of the Java so ket API. In general the messages will have more omplex types and the data may not be ontiguous in memory. Using the typed primitives of the standard API may then in ur extra osts of opying and type- onversion. For reasons su h as these we suspe t that dire t so ket programming will remain unattra tive to s ienti parallel programmers, even with the simpli ed Java so ket API. 7 lass life_java { stati publi void main(String[℄ args) throws Ex eption { hpj HPJava = new hpj(args); int np = HPJava.num_pro essor(); int id = HPJava.my_id(); ... ompute lo al `blo kSize', `blo kBase' (avoiding empty blo ks). // `blo k' has `blo kSize + 2' olumns. This allows for ghost ells. byte blo k[℄[℄ = new byte[blo kSize+2℄[N℄; ... initialize lo al blo k with some pattern // Main update loop. int next = (id + 1) % np; int prev = (id + np - 1) % np; for(int iter = 0 ; iter < NITER ; iter++) { // Shift this blo k's upper edge into next neighbour's lower ghost edge HPJava.Output(next).write(blo k[blo kSize℄); HPJava.Output(next).flush(); HPJava.Input(prev).read(blo k[0℄); // Shift this blo k's lower edge into prev neighbour's upper ghost edge HPJava.Output(prev).write(blo k[1℄); HPJava.Output(prev).flush(); HPJava.Input(next).read(blo k[blo kSize+1℄); } } } ... Cal ulate a blo k of neighbour sums. ... Update blo k of board values. Figure 1: Skeleton of so ket-based Life program. 8 int sums[℄[℄ = new int[blo kSize℄[N℄; .... // Cal ulate blo k of neighbour sums. for(i = 0, ib = 1 ; i < blo kSize ; i++, ib++) for(y = 0 ; y < N ; y++) { int y_n = (y+N-1) % N ; int y_p = (y+1) % N ; sums[i℄[y℄ = blo k[ib - 1℄[y_n℄ + blo k[ib - 1℄[y℄ + blo k[ib - 1℄[y_p℄ + blo k[ib℄ [y_n℄ + blo k[ib℄ [y_p℄ + blo k[ib + 1℄[y_n℄ + blo k[ib + 1℄[y℄ + blo k[ib + 1℄[y_p℄; } // Update blo k of board values. for(i = 0, ib = 1 ; i < blo kSize ; i++, ib++) for(y = 0 ; y < N ; y++) swit h (sums[i℄[y℄) { ase 2 : break; ase 3 : blo k[ib℄[y℄ = 1; break; default: blo k[ib℄[y℄ = 0; break; } Figure 2: So ket-based Life program (detail). 9 3.2 MPI Interfa e We have produ ed a Java interfa e to an existing MPI implementation [25℄ using Java native methods2 The interfa e has been tested on a luster of UltraSpar workstations running Solaris3 . Our interfa e is modelled on the proposed C++ bindings of MPI. For example, many of the most basi fun tions of the library are members of the ommuni ator lass, Comm: publi lass Comm { publi int Size(); publi int Rank(); void Send(Obje t buf, int offset, int ount, Datatype datatype, int dest, int tag) ; Status Re v(Obje t buf, int offset, int ount, Datatype datatype, int dest, int tag) ; ... } Figure 3 is a straightforward trans ription of the so ket-based program in the last se tion4 . Our MPI intefa e uses a slightly di erent model for a essing global resour es|stati members on an MPI lass: lass MPI { stati Init() ; stati Finalize() ; } publi stati Comm WORLD ; publi publi ... stati stati Datatype BYTE ; Datatype INT ; rather than dynami members of a jpi lass|but this di eren e is not parti ularly signi ant (yet another approa h will be taken in the example of se tion 4.1). Otherwise the orresponden e between this ode and the so ket ode is 2 For earlier work on interfa ing Java to PVM see http://www.isye.gate h.edu/ hmsr/JavaPVM/ and http://www. s.virginia.edu/~ajfzj/jpvm.html 3 Interfa ing Java to MPICH/P4 was not straightforward, due to unpleasant intera tions between the Java run-time and the underlying P4 implementation. For example, standard implementations of both use the UNIX SIGALRM signal. We found it ne essary to pat h the MPICH 1.0.13 release to work round these in ompatibilities. The ne essary pat hes are available from us on request. 4 MPI aÆ ionados will note that the use of standard mode send is \unsafe", and ould deadlo k if the system does not provide enough bu ering. The same remark ould be made about the so ket-based version 10 lass Life { stati publi void main(String[℄ args) { MPI.Init(args); int np = MPI.WORLD.Size(); int id = MPI.WORLD.Rank(); ... ompute lo al `blo kSize', `blo kBase' (avoiding empty blo ks). // `blo k' has `blo kSize + 2' olumns. This allows for ghost ells. byte blo k[℄[℄ = new byte[blo kSize + 2℄[N℄ ; ... initialize lo al blo k with some pattern // Main update loop. int next = (id + 1) % np; int prev = (id + np - 1) % np; for(int iter = 0 ; iter < NITER ; iter++) { // Shift this blo k's upper edge into next neighbour's lower ghost edge MPI.WORLD.Send(blo k[blo kSize℄, N, MPI.BYTE, next, 0); MPI.WORLD.Re v(blo k[0℄, N, MPI.BYTE, prev, 0); // Shift this blo k's lower edge into prev neighbour's upper ghost edge MPI.WORLD.Send(blo k[1℄, N, MPI.BYTE, prev, 0); MPI.WORLD.Re v(blo k[blo kSize + 1℄, N, MPI.BYTE, next, 0); } } } ... Cal ulate blo k of neighbour sums. ... Update blo k of board values. MPI.Finalize(); Figure 3: Simple MPI Life program. 11 dire t. In the next se tion we illustrate some of the added value that an MPI interfa e brings. We have broken with the usual MPI onvention of returning an error status from every fun tion. This pra ti e is in onvenient in Java be ause arguments annot be passed by referen e and dire tly modi ed. This makes the return value pre ious, and using it up on an error value that everybody ignores is a waste. Java has a well integrated ex eption me hanism for ignoring error information. 3.3 Derived data types and higher-level MPI features Des ription of the data bu ers passed to ommuni ation operations presents some spe ial problems in Java. Existing MPI bindings depend on a linear memory model and expli it or impli it use of pointers. Java does not have a linear memory model. Even behind the s enes a Java array has no uniquely de ned address in memory, be ause the garbage olle tor is allowed to relo ate obje ts unpredi atably during program exe ution to avoid fragmentation of its workspa e. Our Java interfa e tries to retain as mu h of the MPI derived datatype me hanism as pra ti al, but some fun tionality has been sa ri ed. The bu er argument passed to a send or re eive operation must be a one-dimensional array of primitive type. Any o set spe i ed in a derived type argument then refers to a displa ement within this one-dimensional array, never a displa ement in memory. All MPI derived types expressible through our interfa e have a uniquely de ned base type|a Java primitive type. Interfa es to MPI TYPE HVECTOR and MPI TYPE HINDEXED are provided, but the strides and displa ements are in units of the base type, not bytes. An interfa e to MPI TYPE STRUCT is provided, but all omponent types in the \stru t" must have the same base type. In the on rete Java binding of the send fun tion, for example, void Send(Obje t buf, int offset, int ount, Datatype datatype, int dest, int tag) ; the formal buf argument is presented as a generi Java Obje t. As explained above, the a tual argument must be a linear array. The se ond argument is the o set in this array of the rst element of the message5 . The remaining arguments orrespond dire tly to arguments of MPI Send. The base type of the datatype argument must be the type of the elements of buf. Figures 4, 5 sket h a version of the Life program illustrating several of these features. As well as derived types, this program uses the Cartesian topologies of MPI. The Cart lass is derived from Comm. In the example, the topology 5 This o set is in units of the buf array element|or the base type of datatype|not of any ompound type. The Obje t + o set presentation is reminis ent of the interfa e of the arrayCopy utility in the standard Java API. 12 lass Life { void main(String args) { MPI.Init(args) ; int dims [℄ = new int [2℄ ; ... Set `dims', et Cart p = new Cart(MPI.WORLD, dims, periods, false) ; int oords = new int [2℄ ; p.Get(dims, periods, oords) ; ... Compute lo al `blo kSizeX', `blo kBaseX', `blo kSizeY', `blo kBaseY'. // Create `blo k', allowing for ghost ells. int sX = blo kSizeX + 2 ; int sY = blo kSizeY + 2 ; blo k = new byte [sX * sY℄ ; ... Define initial state of Life board // Pre ompute parameters of shift ommuni ations. Datatype edgeXType = MPI.BYTE.Contiguous(sY) ; edgeXType.Commit() ; Datatype edgeYType = MPI.BYTE.Ve tor(sX, 1, sY) ; edgeYType.Commit() ; CartShift pX = p.shift(0, 1) ; CartShift nX = p.shift(0, -1) ; CartShift pY = p.shift(1, 1) ; CartShift nY = p.shift(1, -1) ; // Main update loop. for(int iter = 0 ; iter < NITER ; iter++) { ... Exe ute shifts. } } ... Cal ulate blo k of neighbour sums. ... Update blo k of board values. MPI.Finalize(); } ... Figure 4: Life program using full MPI. 13 // Exe ute shifts... // Shift this blo k's upper x edge into next neighbour's lower ghost edge p.Sendre v(blo k, blo kSizeX * sY, 1, edgeXType, pX.dst, 0, blo k, 0, 1, edgeXType, pX.sr , 0) ; // Shift this blo k's lower x edge into prev neighbour's upper ghost edge p.Sendre v(blo k, sY, 1, edgeXType, nX.dst, 0, blo k, (blo kSizeX + 1) * sY, 1, edgeXType, nX.sr , 0) ; // Shift this blo k's upper y edge into next neighbour's lower ghost edge p.Sendre v(blo k, blo kSizeY, 1, edgeYType, pY.dst, 0, blo k, 0, 1, edgeYType, pY.sr , 0) ; // Shift this blo k's lower y edge into prev neighbour's upper ghost edge p.Sendre v(blo k, 1, 1, edgeYType, nY.dst, 0, blo k, blo kSizeY + 1, 1, edgeYType, nY.sr , 0) ; Figure 5: Full MPI Life program (detail). p represents a two dimensional periodi grid of pro esses. The Get member returns the oordinates of the lo al pro ess. From these the parameters of the lo al array blo k are omputed. The values sX, sY represent the sides of the lo ally held array segment, in luding ghost regions. This segment is reated as a one-dimensional Java blo k. The derived type edgeXType des ribes the stru ture of ghost area on the upper or lower x sides: ontiguous regions of the blo k array of extent sY. The type edgeYType des ribes the y-side ghost areas: non- ontiguous regions of ount sX, regular stride sY. The shift member of Cart orresponds to the MPI fun tion MPI CART SHIFT: array, it returns the sour e and destination pro essors for a binding returns these values in an obje t of lass y li shift. The Java CartShift whi h just on- tains two integers. Finally, in the main loop, the shifts are exe uted by using the Comm member Sendre v, whi h orresponds to the standard MPI fun tion MPI Sendre v. This performs a send and a re eive on urrently (avoiding a potential deadlo k in the implementations given in the previous se tions). 4 Data parallellism in Java The most omprehensive statement of the data parallel model of omputation is the High Performan e Fortran standard [13, 23℄. That do ument is supposed to 14 embody mu h of the olle tive experien e of the s ienti parallel programming ommunity. Presumably, then, any attempt to in orporate data parallelism into Java should build on the HPF model wherever possible. The HPF de nition onsists of a large set of dire tives that an be used to annotate a standard Fortran program, a small handful of language extensions, and a library of new fun tions for operating on arrays. An initial data-parallel Java may well be implemented through a lass-library. This library would assume the roles of the dire tives and language extensions in HPF as well as the HPF library. We will loosely distinguish two di erent levels at whi h a library implementation of the HPF semanti s an operate.  The rst is the level of the so- alled run-time libraries [1, 8, 9, 6℄. This kind of library provides fun tions for s heduling and exe uting spe i patterns of olle tive ommuni ation already identi ed by a ompiler (in the HPF ase) or else by an appli ation programmer using the library dire tly. Su h a library may also provide fun tions for translating between global subs ripts and lo al, node-level subs ripts|ie, for omputing the mapping of a distributed array into the address spa es of individual pro essors.  Alternatively, a library an operate at a higher level that on eals all aspe ts of data lo alization and transfer from the user. The only responsibility of the user is to spe ify the distribution format of arrays when they are de lared. Subsequently the user just tells the library to do parti ular operations on parti ular distributed arrays. It is left to the library to work out whether or not a ommuni ation is implied. In e e t the library is operating at the same level as the HPF language. An example of su h a library is A++/P++ [26℄. In either ase a lass library version is likely to in lude lasses to des ribe the elements of the HPF data model, su h as pro essor arrangements and the distributed arrays themselves. 4.1 Parallel arrays and olle tive ommuni ation At the run-time level, a lass library implementation of the HPF model is likely to in lude  Classes to des ribe pro ess arrays and distributed data arrays.  Classes or fun tions to simplify a ess to lo ally held elements of a dis- tributed array (in luding parallel iteration).  Fun tions for olle tive ommuni ation through operations on distributed arrays: regular \ opying" operations in luding shifts and transposes, arithmeti redu tion operations, irregular gather/s atter operations, and so on. 15 Our rst experiments with a Java binding only tou h the surfa e of the full HPF semanti s, but they provide some hints about a general framework. The interfa e given here borrows from the C++ lass library, Adlib, developed by one of us [6℄. A distributed array is parametrized by a member of the Array lass. In C++ Array would naturally be a template for a ontainer lass. In Java, generi ontainer lasses are problemati . Without the template me hanism, the obvious options are that a ontainer holds items of type Obje t, the base lass for all nonprimitive types, or that a separate ontainer lass is provided for ea h allowed type of element. The rst option doesn't allow for array elements of primitive type, and prevents ompile-time type- he king (reminis ent of using void* in C). The se ond approa h presumably involves restri ting elements to the nite set of primitive types (int, oat, . . . )6 . For now we have side-stepped the issue by leaving the data elements out of the Array lass. Array de nes the shape and distribution of an array, but spa e for elements is allo ated in a separately de lared ve tor of the appropriate type7 . The onstru tor for an Array de nes its shape and distribution format. This is expressed through two auxilliary lasses: the Pro s and Range lasses. The Pro s lass orresponds dire tly to the HPF pro essor arrangement. It maps the set of physi al pro esses on whi h the program is exe uting to a multidimensional grid. A Range des ribes a single dimension of an HPF array. It embodies an array extent (the size of the array in the dimension on erned), and a mapping of the subs ript range to a dimension of a Pro s grid. In our pilot implementation any parallel Java appli ation is written as a lass extending the library lass Node. The Node lass maintains some global information and provides various olle tive operations on arrays as member fun tions. The ode for the \main program" goes in the run member of the appli ation lass8 . A simpli ed version of the ode for the \Life" demo is given in gure 6. The obje t p represents a 2 by 2 pro ess grid. The Pro s onstru tor takes the urrent Node obje t as an argument, from whi h it obtains information on the available physi al pro esses. In this simpli ed example we assume that the program exe utes on exa tly four pro essors. The obje ts x and y represent index ranges of size N distributed over the rst and se ond dimensions of the grid p. The default distribution format is blo kwise. Cy li distribution format an also be spe i ed by using a range obje t of lass CRange, whi h is derived from Range (the pilot implementation does not provide any more general distribution or alignment options). 6 Perhaps a good ompromise is to provide one ontainer lass for ea h primitive type and one7 for Obje t. Confusingly enough, this makes our Array more akin to an HPF template than an HPF array. Needless to say, there is no onne tion between C++ templates and HPF templates. 8 This approa h is modelled on the Thread and Applet lasses in the standard Java API. Other approa hes to providing library-wide resour es were illustrated in earlier se tions. 16 publi ... lass Life extends Node implements Runnable { publi void run() { Pro s p = new Pro s(this, 2, 2) ; Range x = new Range(N, p, 0) ; Range y = new Range(N, p, 1) ; Array r = new Array(p, x, y) ; int s = r.seg(); byte[℄ w = new byte[s℄; byte[℄ n_ = new byte[s℄; byte[℄ p_ = new byte[s℄; ... et , reate arrays for 8 neighbours // Initialize the Life board for(r.forall(); r.test(); r.next()) w[r.sub()℄ = fun(r.idx(0), r.idx(1)) ; // Main loop for (int k=0; k<NITER; k++) { // Get neighbours shift( n_, w, r, 0, 1, CYCLIC); shift( p_, w, r, 0, -1, CYCLIC); ... et , opy arrays for 8 neighbours // Life update rule } } } for(r.forall(); r.test(); r.next()) { int i = r.sub() ; swit h ( n_[i℄ + p_[i℄ + _n[i℄ + _p[i℄ + nn[i℄ + np[i℄ + pn[i℄ + pp[i℄) { ase 2 : break; ase 3 : w[i℄ = 1; break; default: w[i℄ = 0; break; } } Figure 6: Simpli ed ode of the Life demo program. 17 The obje t r represents the shape and distribution of a two dimensional array. This \template" is be shared by several distributed arrays|it does not ontain a data ve tor. The data ve tors that hold the lo al segments of arrays are reated separately using the inquiry fun tion seg, whi h returns the number of lo ally held elements. In the example the elements of the main data array are held in w. The extra arrays n , p , ..., nn, ... will be used to hold arrays of neighbour values9 . The \forall loop" initializing w should be read as something like forall(i in range x, j in range y) w(i, j) = fun(i, j) where fun is some fun tion of the global indi es de ning the initial state of the Life board. The members forall, test, next update internal state of r so that r.sub() returns the lo al subs ript for the urrent iteration, and r.idx(0) and r.idx(1) return the global index values for the urrent iteration. We are using r as an iterator lass10 . The main loop uses y li shift operations to obtain neighbours, ommuniating data where ne essary. The shift operation is a member of the Node lass. It should be overloaded to a ept data ve tors of any primitive type|here the array elements are bytes. Finally w is updated in terms of its neighbours. Note some hara teristi features of the data-parallel style of programming:  The distribution format of the arrays an be hanged just by altering a few parameters at the start of the program|the main program is insensitive to these details  low level message-passing is abstra ted into high-level olle tive operations on distributed array stru tures. It may be un lear that this framework has the same power as HPF. If the distributed array model is extended to the full HPF model, as it an be, and the shift operation is augmented by some more powerful olle tive operations in luding gather and s atter operations parametrized by subs ript arrays, we laim that it does. The proof lies in the observation that this is, as advertised, a simpli ed model of the kind of run-time library that various HPF translators target. This is not to say that programming in this style is always as straightforward as the example given here, or as omprehensible as the orresponding HPF program (if it was, there would be no need for HPF). 9 Here we will use whole arrays of neighbours and a shift operation. This is arguably the more onventional approa h in a data-parallel setting, but the the ghost-edge me hanism an also be tted into this framework. 10 Our Array lass is per hed somewhere between STL ontainer and iterator lasses. This is a slightly awkward position, and it may be more satifa tory to separate these fun tions into di erent lasses. 18 4.2 \Array syntax" in Java. The higher level approa h would make Array lasses look like true ontainer lasses (for a restri ted set of types) and all operations on arrays olle tive, something like: ArrayFloat a = new ArrayFloat(p, x, y) ; ArrayFloat b = new ArrayFloat(p, x, y) ; ArrayInt = new ArrayInt(p, x, y) ; a = MATMUL(b, ) ; Communi ation would be handled automati ally inside array operations like MATMUL. Individual array elements would not be a essed in the Java program ex ept, possibly, through getElement, putElement members. This s heme an be implemented on top of an SPMD Java array library of the kind outlined in the previous se tion or by making the Java run as a oordination program ontrolling a parallel ba k end. It an be ompared with [26, 16℄. So far we have not attempted to implement (or spe ify in detail) su h an approa h for Java. The la k of user-de ned operator-overloading may be parti ularly frustrating here. 5 Dis ussion We have explored the pra ti ality of doing parallel omputing in Java, and of providing Java interfa es to High Performan e Computing software. For various reasons, the su ess of this exer ise was not a foregone on lusion. Java sits on a virtual ma hine model that is signi antly di erent to the hardware-oriented model whi h C or Fortran exploit dire tly. Java dis ourages or prevents dire t a ess to the some of the fundamental resour es of the underlying hardware (most extremely, its memory). Our earliest experiments in this dire tion (in luding the work des ribed in se tion 4, whi h predates the MPI work) involved working entirely within Java, building new software on top of the ommuni ation fa ilities of the standard API. The more re ent work in se tions 3.2 and 3.3 involved reating a Java interfa e to an existing HPC pa kage. Whi h is the better strategy? In the long term Java may be ome a major implementation language for large software pa kages like MPI. It ertainly has advantages in respe t of portability that ould simplify implementations dramati ally. In the immediate term re oding these pa kages does not appear so attra tive. Java wrappers to existing software look more sensible. On a autionary note, our experien e with MPI suggests that interfa ing Java to non-trivial ommuni ation pa kages may be less easy than it sounds. Nevertheless, we intend in the future to reate a Java interfa e to an existing run-time library for data parallel omputation. 19 So is Java, as it stands, a good language for High Performan e Computing? It still has to be demonstrated that Java an be ompiled to ode of eÆ ien y omparable with C or Fortran. Many avenues are being followed simultaneously towards a higher performan e Java. Besides the Java hip e ort of Sun, it has been reported at this workshop that IBM is developing an optimizing Java ompiler whi h produ es binary ode dire tly, that Ri e University and Ro hester University are working on optimization and restru turing of byte ode generated by java , and that Indiana University is working on sour e restru turing to parallelize Java. Parallel interpretation of byte ode is also an emerging pra ti e. For example, the IBM JVM, an implementation of JVM on shared memory ar hite tures, was released in spring 1996, and UIUC has re ently started work aimed at parallel interpretation of Java byte ode for distributed memory systems. Another promising approa h under investigation [18℄ is to integrate interpretation and ompilation te hniques for parallel exe ution of Java programs. In su h a system, a partially ordered set of interpretive frames is generated by an II/CVM ompiler. A frame is a des ription of some subtask, whose granularity may range from a single s alar assignment statement to a solver for a system of equations. Under supervision of the virtual ma hine (II/CVM), the a tions spe i ed in a frame may be performed in one of three ways:  Exe uted by an interpretive module dire tly, whi h also in orporates JIT ompilation apability.  Some pre ompiled omputational library fun tion is invoked lo ally to a - omplish the task; this fun tion may be exe uted sequentially or in parallel.  The frame is sent to some registered remote system, whi h will get the work done, on e again either sequentially or in parallel. With this approa h, optimized binary odes for well formed omputation subtasks exist in runtime libraries, supporting a high level interpretive environment. Task parallelism is observed among di erent frames exe uted by the three me hanisms simultaneously, while data parallelism is observed in the exe ution of some of the runtime fun tions. Presuming these e orts satisfa torily address the performan e issue, the se ond aspe t in question on erns expressiveness of the Java language. Our nal interfa e to MPI is quite elegant, and provides mu h of the fun tionality of the standard C and Fortran bindings. But reating this interfa e was a more diÆ ult pro ess than one might hope, both in terms of getting a good spe i ation, and in terms of making the implementation work. In se tion 4 we noted that the la k of features like C++ templates (or any form of parametri polymorphism) and user-de ned operator overloading (available in many modern languages, from fun tional programming languages to Fortran) made it diÆ ult to produ e a 20 ompletely satisfying interfa e to a data parallel library. The Java language as urrently de ned imposes various limits to the reativity of the programmer. In many respe ts Java is undoubtedly a better language than Fortran. It is obje t-oriented to the ore and highly dynami , and there is every reason to suppose that su h features will be as valuable in s ienti omputing as in any other programming dis ipline. But to displa e established s ienti programming languages Java will surely have to a quire some of the fa ilities taken for granted in those languages. Referen es [1℄ A. Agrawal, A. Sussman, and J. Saltz. An integrated runtime and ompiletime approa h for parallelizing stru tured and blo k stru tured appli ations. IEEE Transa tions on Parallel and Distributed Systems, 6, 1995. [2℄ Susan Atlas, Subhankar Banerjee, Julian C. Cummings, Paul J. Hinker, M. Srikant, John V. W. Reynders, and Mary Dell Tholburn. POOMA: A high performan e distributed simulation environment for s ienti appli ations. In Super omputing `95, 1995. [3℄ Aart J.C. Bik and Dennis B. Gannon. Automati ally exploiting impli it parallelism in Java. This workshop. [4℄ J. Boyle, R. Butler, T. Disz, B. Gli kfeld, E. Lusk, R. Overbeek, J. Patterson, and R. Stevens. Portable Programs for Parallel Pro essors. Holt, Rinehart and Winston, 1987. [5℄ Ralph Butler and Ewing Lusk. Monitors, messages, and lusters: The p4 parallel programming system. Parallel Computing, 20:547{564, April 1994. [6℄ D. B. Carpenter. Adlib: A distributed array library to support HPF translation, 1995. Presented at the 5th International Workshop on Compilers for Parallel Computers. URL: http://www.npa .syr.edu/users/db /Adlib. [7℄ K.M. Chandy and C. Kesselman. CC++: A de larative on urrent obje toriented programming notation. In Gul Agha, Peter Wegner, and Akinori Yonezawa, editors, Resear h Dire tions in Con urrent Obje t-Oriented Programming, page 24. MIT Press, 1993. ISBN: 0-262-01139-5. [8℄ A. Choudhary, G. Fox, S. Ranka, S. Hiranandani, K. Kennedy, C. Koelbel, and J. Saltz. Software support for irregular and loosely syn hronous problems. Computing Systems in Engineering, 3:43{52, 1992. [9℄ Parallel Compiler Runtime Consortium. Common runtime support for high-performan e parallel languages. In Super omputing `93. IEEE Computer So iety Press, 1993. 21 [10℄ Parallel Compiler Runtime Consortium. HPCC and Java|a report by the Parallel Compiler Runtime Consortium, 1996. URL: http://www.npa .syr.edu/users/g f/hpjava3.html. [11℄ J.J. Dongarra, R. Pozo, and D.W. Walker. An obje t oriented design for high performan e linear algebra on distributed memory ar hite tures. In Obje t Oriented Numeri s Conferen e, 1993. [12℄ Stephen J. Fink and S ott B. Baden. The KeLP User's Guide. University of California, San Diego, La Jolla, CA, Mar h 1996. URL: http://wwwse.u sd.edu/groups/hp l/s g/kelp.html. [13℄ High Performan e Fortran Forum. High Performan e Fortran language spe i ation. S ienti Programming, spe ial issue, 2, 1993. [14℄ Message Passing Interfa e Forum. MPI: A Message-Passing Interfa e Standard. University of Tenessee, Knoxville, TN, June 1995. URL: http://www.m s.anl.gov/mpi. [15℄ I. Foster and K. M. Chandy. Fortran M: A language for modular parallel programming. Journal of Parallel and Distributed Computing, 26(1):24, 1995. [16℄ G.C. Fox and W. Furmanski. Towards interpreted run-time Fortran90D environment. Te hni al report, NPAC, 1992. URL: http://www.npa .syr.edu/proje ts/hpsin/hp .html. [17℄ Geo rey C. Fox and Wojtek Furmanski. Computing on the Web: new approa hes to parallel pro essing|petaop and exaop performan e in the year 2007, 1997. URL: http://www.npa .syr.edu/users/g f/petastu /petaweb/. [18℄ Geo rey C. Fox, Xiaoming Li, Yuhong Wen, and Guansong Zhang. Studies of integration and optimization of interpreted and ompiled languages. Te hni al Report SCCS-780, NPAC, February 1997. [19℄ A. Geist, A. Beguelin, J. Dongarra, W. Jiang, R. Man hek, and V. Sunderam. PVM: Parallel Virtual Ma hine|A Users' Guide and Tutorial for Networked Parallel Computing. S ienti and Engineering Series. MIT Press, 1994. ISBN: 0-262-57108-0. [20℄ James Gosling and Henry M Gilton. The Java Language Environment: A White Paper. JavaSoft, Sun Mi rosystems, In , 1996. URL: http://www.javasoft. om/do . [21℄ A.S. Grimshaw. An introdu tion to parallel obje t-oriented programming with Mentat. Te hni al Report 91 07, University of Virginia, 1991. 22 [22℄ JavaSoft, Sun Mi rosystems, In . RMI Do umentation, 1996. URL: http://java.sun. om/produ ts/JDK/1.1/. [23℄ C.H. Koelbel, D.B. Loveman, R.S. S hreiber, G.L. Steel, Jr., and M.E. Zosel. The High Performan e Fortran Handbook. MIT Press, 1994. ISBN: 0-262-61094-9. [24℄ Inmos Ltd. o am 2 Referen e Manual. Prenti e-Hall International, 1988. ISBN: 0-13-629312-3. [25℄ MPICH|a portable implementation http://www.m s.anl.gov/mpi/mpi h/. of MPI. URL: [26℄ R. Parsons and D. Quinlan. A++/P++ array lasses for ar hite ture independent nite di eren e al ulations. In Obje t Oriented Numeri s Conferen e, 1994. [27℄ Jon Siegel. CORBA Fundamentals and Programming. Wiley, 1996. ISBN: 0471-12148-7. [28℄ P. Sivilotti and P. Carlin. A tutorial for CC++. Te hni al Report CS-TR94-02, Calte h, 1994. 23
About the author

Over forty years experience in research and systems engineering for air and missile defense, and intelligence. Presently developing methods for tracking very-low observable airborne targets and hypersonic cruise missiles, also passive angle-only ranging and tracking. Co-authored book on Kalman filtering, and several research papers – topics include computational electromagnetics, radar tracking and data fusion, and high-performance computing (see scholar.google.com/citations?hl=en&user=iNkpinMAAAAJ). Patent on “Single-Scan Track Initiation for Radars Having Rotating Electronically-Scanned Antennas” (see patents.google.com/patent/US7508336). Recipient of NASA's Group Achievement Award for contributions to simulation and modeling technologies in high-performance computing (hardware, operating systems, and application software). IEEE member since 1974, also member of the Association of Old Crows and the American Mathematical Society. Founder/Co-founder of several successful small businesses: Applied Research and Engineering, Inc., The Ultra Corporation, and Leskiw Associates, LLC. And founding President of the Lexington Sinfonietta with Hisao Watanabe (now the Lexington Symphony), in Lexington, MA.

Papers
21
View all papers from Donald Leskiwarrow_forward