Experiments with ‘HP Java’
1997, Concurrency: Practice and Experience
https://doi.org/10.1002/(SICI)1096-9128(199706)9:6<633::AID-CPE311>3.3.CO;2-Cvisibility
…
description
24 pages
Sign up for access to the world's latest research
checkGet notified about relevant papers
checkSave papers to use in your research
checkJoin the discussion with peers
checkTrack your impact
Related papers
Multilanguage programming on the JVM
ACM SIGAda Ada Letters, 2000
The latest trend in our industry, "pervasive computing", predicts the proliferation of numerous, often invisible, computing devices embedded in consumer appliances connected to the ubiquitous Internet. Secure, reliable applications combined with simplicity of use will make or break a company's reputation in this market.
Syracuse University
SURFACE
Northeast Parallel Architecture Center
College of Engineering and Computer Science
1997
Experiments with "HP Java"
Bryan Carpenter
Syracuse University
Yuh-Jye Chang
Syracuse University
Geoffrey C. Fox
Syracuse University
Donald Leskiw
Syracuse University
Follow this and additional works at: https://surface.syr.edu/npac
Part of the Programming Languages and Compilers Commons
Recommended Citation
Carpenter, Bryan; Chang, Yuh-Jye; Fox, Geoffrey C.; and Leskiw, Donald, "Experiments with "HP Java""
(1997). Northeast Parallel Architecture Center. 32.
https://surface.syr.edu/npac/32
This Working Paper is brought to you for free and open access by the College of Engineering and Computer
Science at SURFACE. It has been accepted for inclusion in Northeast Parallel Architecture Center by an authorized
administrator of SURFACE. For more information, please contact [email protected].
Experiments with \HPJava"
Bryan Carpenter, Yuh-Jye Chang,
Geo rey Fox, Donald Leskiw, Xiaoming Li,
Northeast Parallel Ar hite tures Centre,
Syra use University,
Syra use, New York
May 19, 2000
Abstra t
We
onsider the possible r
ole of Java as a language for High Perfor-
man e Computing. After dis ussing reasons why Java may be a natural
andidate for a portable parallel programming language, we des ribe several
ase studies. These
over Java so ket programming, message-passing
through a Java interfa e to MPI, and
gramming in Java.
1
lass libraries for data-parallel pro-
Contents
1
Introdu tion
3
2
Issues
4
3
Message-passing
3.1 Java so kets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.2 MPI Interfa e . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.3 Derived data types and higher-level MPI features . . . . . . . . .
6
10
12
4
Data parallellism in Java
14
5
Dis ussion
19
1.1 Overview of this arti le. . . . . . . . . . . . . . . . . . . . . . . .
2.1 Approa hes to Parallelism in Java . . . . . . . . . . . . . . . . .
2.2 Communi ation in Java . . . . . . . . . . . . . . . . . . . . . . .
ase studies
4.1 Parallel arrays and olle tive ommuni ation . . . . . . . . . . .
4.2 \Array syntax" in Java. . . . . . . . . . . . . . . . . . . . . . . .
2
4
4
5
6
15
19
1
Introdu tion
The explosion of interest in Java over the last year has been driven largely by its
role in bringing a new generation of intera tive pages to the World Wide Web.
Undoubtedly various features of the language| ompa tness, byte- ode portability, se urity, and so on|make it parti ularly attra tive as an implementation
language for applets embedded in Web pages. But it is lear that the ambitions
of the Java development team go well beyond enhan ing the fun tionality of
HTML do uments. A ording to [20℄
\Java is designed to meet the hallenges of appli ation development
in the ontext of heterogeneous, network-wide distributed environments. Paramount among these hallenges is se ure delivery of appli ations that onsume the minimum of system resour es, an run
on any hardware and software platform, and an be extended dynami ally."
Several of these on erns are mirrored in developments in the High Performan e Computing world over a number of years. A de ade ago the fo us of
interest in the parallel omputing ommunity was on parallel hardware. A parallel omputer was typi ally built from spe ialized pro essors integrated through
a proprietary high-performan e ommuni ation swit h. If the ma hine also had
to be programmed in a proprietary language, that was an a eptable pri e for
the bene ts of using a super omputer. This attitude was not sustainable as one
parallel ar hite ture gave way to another, and the ost of porting software beame exorbitant. For several years now, portability a ross platforms had been
a entral on ern in parallel omputing [4, 5, 14, 13, 23℄.
More fundamentally, the assumption that high performan e omputing will
be done primarily on spe ialized super omputers is questioned in reasingly.
Rapid strides in performan e and onne tivity of ordinary workstations and
PCs make it look equally possible that the future of parallel omputing will be
on lo al area networks, or even the Internet [19, 17℄.
With Java positioned to be ome a standard programming language on the
Internet, and s ienti parallel pro essing edging towards network-based omputation, it is natural to ask how these two te hnologies will intera t. How
suitable is Java for s ienti
omputing, and do lessons from resear h in parallel
omputing have impli ations for the development of Java?
Popular a laim aside, there are some reasons to think that Java may be a
good language for s ienti and parallel programming.
Java is a des endant of C++. C and C++ are used in reasingly in s i-
enti
programming; they are already used almost universally by impleompilers. In re ent years numerous variations on the theme of C++ for parallel omputing have appeared. See,
for example [7, 28, 11, 2, 12, 21℄.
menters of parallel libraries and
3
Java omits various features of C and C++ that are onsidered \diÆ ult"|
notably, pointers. Poor ompiler analysis has often been blamed on these
features. The inferen e is that Java, like Fortran, may be a suitable sour e
language for highly optimizing ompilers (although dire t eviden e for this
belief is still la king).
Java omes with builtin multithreading. Independent threads may be
s heduled on di erent pro essors by a suitable runtime. In any ase multithreading an be very onvenient in expli it message-passing styles of
parallel programming [24℄.
We will return to the question of whether parallel omputing may have impliations for the development of Java in se tion 5.
The a ronym \HPJava" was oined in a draft white paper produ ed by
members of the PCRC onsortium in the rst half of 1996 [10℄. At NPAC we
have been experimenting with some of the ideas put forward there.
1.1
Overview of this arti le.
Se tion 2 outlines various options for parallel programming in Java|possible
ways to express parallelism, and ways to handle inter-pro ess ommuni ation.
The main te hni al ontent of the paper is in se tions 3 and 4. Se tion
3 ontains some ase studies in whi h we explore the message-passing style of
programming in Java. We over parallel programming using so kets dire tly,
and des ribe our Java interfa e to MPI. In se tion 4 we dis uss approa hes to
data-parallel programming in Java, and outline one of our demo programs.
In this arti le our emphasis is more on language bindings and interfa e issues,
and less on performan e. Java ompilers are in an early stage of development,
and we assume that urrent performan e gures are not indi ative of future
potential.
2
2.1
Issues
Approa hes to Parallelism in Java
Java already supports on urren y through the thread me hanism and monitor
syn hronization built into the language. In this arti le we are interested in
truly parallel omputation, involving multiple CPUs. Su h parallelism ould be
introdu ed into Java in a number of ways.
It ould be a hieved through automati parallelization of sequential ode, but
it is un lear why this would be more su essful for Java than for other languages.
Alternatively, the Java virtual ma hine for a shared memory multipro essor an
s hedule the threads of a multi-threaded Java program on di erent pro essors.
Some su ess with these approa hes has already been demonstrated [3℄. For
4
omputation on a network (or distributed memory omputer) realisti options
in lude language extensions or dire tives akin to HPF, or provision of libraries|
lass libraries|to support task parallelism or data parallelism.
A popular approa h in C++ has been to defer language extensions and
on entrate on lass library support for parallel programming. The similarities
between the two languages suggest this may be a fruitful avenue in Java too.
The su ess of this analogy is by no means automati , however. Features su h
as templates and user-de ned operator overloading make the C++ language
inherently more ustomizable than Java. In C++ library-de ned types an be
used on an identi al footing to primitive types|inline methods mean they an
be almost as eÆ ient as primitive types. Less importantly, but onveniently, new
ontrol onstru ts an often be simulated in C or C++ through use of ma ros1.
On grounds, presumably, of simpli ity and transparen y many of these features
have been omitted from Java.
Su h aveats notwithstanding, this arti le will on entrate on lass libraries
rather than language extensions. We will be working with lass libraries implemented in the standard Java development environment.
Another open question is how multiple intera ting Java tasks are to be initiated. Conventionally in network omputing, instru tions to exe ute a parti ular
user task are sent to a spe ialized daemon or se ure server running on the target
host. Alternatively standard operating system daemons (rshd, rexe d, . . . ) an
be used to the same e e t. Java brings some options of its own. One is to use
standard or enhan ed Web servers to exe ute Java byte- odes. This approa h
is predi ated on the existen e of ooperative servers, probably running suitable
CGI s ripts. Another option is to exploit Java-enabled browsers. Applets downloaded from a parti ular server an perform omputations and return the results
to that server. In any ase the substantive improvement over onventional network omputing is that the byte- odes an be downloaded on the y to target
hosts, even in heterogenous networks. Conventionally, it was often ne essary to
install and ompile an appli ation on ea h target host before starting the task
remotely.
2.2
Communi ation in Java
The standard Java API provides a simpli ed interfa e to Internet so kets. This
interfa e hides mu h of the ugly detail involved in so ket-programming at the
at the traditional C/UNIX level. The java.net interfa e provides less exibility
than using the system alls dire tly. On the other hand, Java's built-in support
for threads adds some exibility in s heduling ommuni ations that is missing
from raw C.
We will give an example of so ket programming in se tion 3.1, but traditionally this has not been a popular paradigm in the parallel pro essing world.
1 Of
ourse we
ould run a prepro essor over Java, but this is not a natural pro edure in
existing Java environments
5
S ienti programmers have expe ted to program inter-pro ess ommuni ation
at a higher level, if at all. More su essful s hemes in lude
Message-passing through language-level support [24, 15℄ or higher-level
library interfa es [14℄.
Data parallelism. We restri t the de nition of \data parallelism" to over
algorithms that a hieve parallelism through operations on distributed arrays. Syn hronization is usually limited to bulk syn hronization o uring
naturally through olle tive operations.
Communi ation through shared memory or shared obje ts. Some more
or less intri ate me hanism for inter-pro ess syn hronization is usually
provided.
The ase studies in the rest of this arti le restri t themselves to message-passing
and data parallelism. As observed in the previous se tion, ommuni ation
through true shared memory is already impli it in the Java thread model. Communi ation through remote obje ts is undoubtedly a natural and important
paradigm in Java, espe ially for a ess to remote servi es [27, 22, 21℄, but we
will not dis uss it further here.
An orthogonal issue is whether the high-level libraries used to implement
these paradigms (presuming lass-library implementations) should be written
in Java on top of the standard API, or whether they should be implemented as
dire t interfa es to native ode. We will have more to say about this later.
3
Message-passing
ase studies
Message-passing remains one of the most e e tive and widely used ommuniation paradigms in parallel omputing. In this se tion we ompare two approa hes to message-passing in Java, in the ontext of a s ienti appli ation.
The rst approa h is to use the so ket interfa e in the standard Java API. The
se ond is to work through a Java interfa e to the message-passing standard,
MPI [14℄.
To minimize distra ting details, our appli ation will be elementary: Conway's Life automaton.
3.1
Java so kets
The UNIX so ket model is most suitable for programming lient-server appliations. Typi al s ienti parallel programs do not t dire tly into this model.
Before a SPMD program an start two onditions must obtain: a pool of symmetri peer pro esses must have been reated, and ea h peer must be able to
address a message to any other. This situation is typi ally bootstrapped as follows. The program is invoked on one host. This host reates a listening so ket.
6
It sends instru tions to servers on some other hosts (either expli itly or through
a ommand su h as rsh) to start remote invo ations of the program (somehow
sending the number of the port on whi h it is listening in the initiation message).
Ea h new pro ess reates a listening so ket and sends its port number to the
original pro ess. The original pro ess broad asts these address to all its slaves.
At this point ea h pro ess knows the port number on whi h ea h of its peers is
listening. Either they establish all-to-all onne tions now, before entering the
SPMD main program, or the main program starts immediately, and onne tions
are established dynami ally when a message needs to be sent.
Figures 1, 2 give a s hemati outline of a distributed Life program using
java.net. The fairly intri ate ode sket hed above for initialization and establishment of so ket onne tions has been absorbed into the de nition of an auxilliary
lass hpj. Its interfa e is
lass hpj {
hpj(String[℄ args) ;
int my_id() ;
int num_pro essor() ;
}
DataInputStream Input(int id) ;
DataOutputStream Output(int id) ;
...
The members Input and Output return streams asso iated with so kets onne ted to peer pro esses. In the example an N by N Life board is divided
blo kwise in one dimension, ea h pro essor holding a lo al blo k of width blo kSize.
We note
Initialization is a omplex pro edure and learly it should not be oded
anew for ea h appli ation program.
In this example the messages were ontiguous byte ve tors that ould be
transmitted eÆ iently through the read and write methods of the Java
so ket API. In general the messages will have more omplex types and the
data may not be ontiguous in memory. Using the typed primitives of the
standard API may then in ur extra osts of opying and type- onversion.
For reasons su h as these we suspe t that dire t so ket programming will remain
unattra tive to s ienti parallel programmers, even with the simpli ed Java
so ket API.
7
lass life_java {
stati publi void main(String[℄ args) throws Ex eption {
hpj HPJava = new hpj(args);
int np = HPJava.num_pro essor();
int id = HPJava.my_id();
...
ompute lo al `blo kSize', `blo kBase' (avoiding empty blo ks).
// `blo k' has `blo kSize + 2'
olumns.
This allows for ghost
ells.
byte blo k[℄[℄ = new byte[blo kSize+2℄[N℄;
... initialize lo al blo k with some pattern
// Main update loop.
int next = (id + 1) % np;
int prev = (id + np - 1) % np;
for(int iter = 0 ; iter < NITER ; iter++) {
// Shift this blo k's upper edge into next neighbour's lower ghost edge
HPJava.Output(next).write(blo k[blo kSize℄);
HPJava.Output(next).flush();
HPJava.Input(prev).read(blo k[0℄);
// Shift this blo k's lower edge into prev neighbour's upper ghost edge
HPJava.Output(prev).write(blo k[1℄);
HPJava.Output(prev).flush();
HPJava.Input(next).read(blo k[blo kSize+1℄);
}
}
}
... Cal ulate a blo k of neighbour sums.
... Update blo k of board values.
Figure 1: Skeleton of so ket-based Life program.
8
int sums[℄[℄ = new int[blo kSize℄[N℄;
....
//
Cal ulate blo k of neighbour sums.
for(i = 0, ib = 1 ; i < blo kSize ; i++, ib++)
for(y = 0 ; y < N ; y++) {
int y_n = (y+N-1) % N ;
int y_p = (y+1) % N ;
sums[i℄[y℄ =
blo k[ib - 1℄[y_n℄ + blo k[ib - 1℄[y℄ + blo k[ib - 1℄[y_p℄ +
blo k[ib℄
[y_n℄ +
blo k[ib℄
[y_p℄ +
blo k[ib + 1℄[y_n℄ + blo k[ib + 1℄[y℄ + blo k[ib + 1℄[y_p℄;
}
// Update blo k of board values.
for(i = 0, ib = 1 ; i < blo kSize ; i++, ib++)
for(y = 0 ; y < N ; y++)
swit h (sums[i℄[y℄) {
ase 2 : break;
ase 3 : blo k[ib℄[y℄ = 1; break;
default: blo k[ib℄[y℄ = 0; break;
}
Figure 2: So ket-based Life program (detail).
9
3.2
MPI Interfa e
We have produ ed a Java interfa e to an existing MPI implementation [25℄ using
Java native methods2
The interfa e has been tested on a luster of UltraSpar workstations running
Solaris3 . Our interfa e is modelled on the proposed C++ bindings of MPI. For
example, many of the most basi fun tions of the library are members of the
ommuni ator lass, Comm:
publi
lass Comm {
publi int Size();
publi int Rank();
void Send(Obje t buf, int offset, int ount,
Datatype datatype, int dest, int tag) ;
Status Re v(Obje t buf, int offset, int ount,
Datatype datatype, int dest, int tag) ;
...
}
Figure 3 is a straightforward trans ription of the so ket-based program in
the last se tion4 . Our MPI intefa e uses a slightly di erent model for a essing
global resour es|stati members on an MPI lass:
lass MPI {
stati Init() ;
stati Finalize() ;
}
publi
stati
Comm WORLD ;
publi
publi
...
stati
stati
Datatype BYTE ;
Datatype INT ;
rather than dynami members of a jpi lass|but this di eren e is not parti ularly signi ant (yet another approa h will be taken in the example of se tion
4.1). Otherwise the orresponden e between this ode and the so ket ode is
2 For
earlier work on interfa ing Java to PVM see http://www.isye.gate h.edu/ hmsr/JavaPVM/
and http://www. s.virginia.edu/~ajfzj/jpvm.html
3 Interfa ing Java to MPICH/P4 was not straightforward, due to unpleasant intera tions
between the Java run-time and the underlying P4 implementation. For example, standard
implementations of both use the UNIX SIGALRM signal. We found it ne essary to pat h
the MPICH 1.0.13 release to work round these in ompatibilities. The ne essary pat hes are
available from us on request.
4 MPI aÆ ionados will note that the use of standard mode send is \unsafe", and ould
deadlo k if the system does not provide enough bu ering. The same remark ould be made
about the so ket-based version
10
lass Life {
stati publi void main(String[℄ args) {
MPI.Init(args);
int np = MPI.WORLD.Size();
int id = MPI.WORLD.Rank();
...
ompute lo al `blo kSize', `blo kBase' (avoiding empty blo ks).
// `blo k' has `blo kSize + 2'
olumns.
This allows for ghost
ells.
byte blo k[℄[℄ = new byte[blo kSize + 2℄[N℄ ;
... initialize lo al blo k with some pattern
// Main update loop.
int next = (id + 1) % np;
int prev = (id + np - 1) % np;
for(int iter = 0 ; iter < NITER ; iter++) {
// Shift this blo k's upper edge into next neighbour's lower ghost edge
MPI.WORLD.Send(blo k[blo kSize℄, N, MPI.BYTE, next, 0);
MPI.WORLD.Re v(blo k[0℄,
N, MPI.BYTE, prev, 0);
// Shift this blo k's lower edge into prev neighbour's upper ghost edge
MPI.WORLD.Send(blo k[1℄,
N, MPI.BYTE, prev, 0);
MPI.WORLD.Re v(blo k[blo kSize + 1℄, N, MPI.BYTE, next, 0);
}
}
}
... Cal ulate blo k of neighbour sums.
... Update blo k of board values.
MPI.Finalize();
Figure 3: Simple MPI Life program.
11
dire t. In the next se tion we illustrate some of the added value that an MPI
interfa e brings.
We have broken with the usual MPI onvention of returning an error status
from every fun tion. This pra ti e is in onvenient in Java be ause arguments
annot be passed by referen e and dire tly modi ed. This makes the return
value pre ious, and using it up on an error value that everybody ignores is
a waste. Java has a well integrated ex eption me hanism for ignoring error
information.
3.3
Derived data types and higher-level MPI features
Des ription of the data bu ers passed to ommuni ation operations presents
some spe ial problems in Java. Existing MPI bindings depend on a linear memory model and expli it or impli it use of pointers. Java does not have a linear
memory model. Even behind the s enes a Java array has no uniquely de ned
address in memory, be ause the garbage olle tor is allowed to relo ate obje ts unpredi atably during program exe ution to avoid fragmentation of its
workspa e. Our Java interfa e tries to retain as mu h of the MPI derived datatype me hanism as pra ti al, but some fun tionality has been sa ri ed. The
bu er argument passed to a send or re eive operation must be a one-dimensional
array of primitive type. Any o set spe i ed in a derived type argument then
refers to a displa ement within this one-dimensional array, never a displa ement
in memory.
All MPI derived types expressible through our interfa e have a uniquely
de ned base type|a Java primitive type. Interfa es to MPI TYPE HVECTOR
and MPI TYPE HINDEXED are provided, but the strides and displa ements
are in units of the base type, not bytes. An interfa e to MPI TYPE STRUCT
is provided, but all omponent types in the \stru t" must have the same base
type.
In the on rete Java binding of the send fun tion, for example,
void Send(Obje t buf, int offset, int ount,
Datatype datatype, int dest, int tag) ;
the formal buf argument is presented as a generi Java Obje t. As explained
above, the a tual argument must be a linear array. The se ond argument is
the o set in this array of the rst element of the message5 . The remaining
arguments orrespond dire tly to arguments of MPI Send. The base type of the
datatype argument must be the type of the elements of buf.
Figures 4, 5 sket h a version of the Life program illustrating several of these
features. As well as derived types, this program uses the Cartesian topologies
of MPI. The Cart lass is derived from Comm. In the example, the topology
5 This o set is in units of the buf array element|or the base type of datatype|not of
any ompound type. The Obje t + o set presentation is reminis ent of the interfa e of the
arrayCopy utility in the standard Java API.
12
lass Life {
void main(String args) {
MPI.Init(args) ;
int dims [℄ = new int [2℄ ;
... Set `dims', et
Cart p = new Cart(MPI.WORLD, dims, periods, false) ;
int oords = new int [2℄ ;
p.Get(dims, periods, oords) ;
... Compute lo al `blo kSizeX', `blo kBaseX', `blo kSizeY', `blo kBaseY'.
// Create `blo k', allowing for ghost
ells.
int sX = blo kSizeX + 2 ;
int sY = blo kSizeY + 2 ;
blo k = new byte [sX * sY℄ ;
... Define initial state of Life board
// Pre ompute parameters of shift
ommuni ations.
Datatype edgeXType = MPI.BYTE.Contiguous(sY) ;
edgeXType.Commit() ;
Datatype edgeYType = MPI.BYTE.Ve tor(sX, 1, sY) ;
edgeYType.Commit() ;
CartShift pX = p.shift(0, 1) ;
CartShift nX = p.shift(0, -1) ;
CartShift pY = p.shift(1, 1) ;
CartShift nY = p.shift(1, -1) ;
// Main update loop.
for(int iter = 0 ; iter < NITER ; iter++) {
... Exe ute shifts.
}
}
... Cal ulate blo k of neighbour sums.
... Update blo k of board values.
MPI.Finalize();
}
...
Figure 4: Life program using full MPI.
13
// Exe ute shifts...
// Shift this blo k's upper x edge into next neighbour's lower ghost edge
p.Sendre v(blo k, blo kSizeX * sY, 1, edgeXType, pX.dst, 0,
blo k, 0,
1, edgeXType, pX.sr , 0) ;
// Shift this blo k's lower x edge into prev neighbour's upper ghost edge
p.Sendre v(blo k, sY,
1, edgeXType, nX.dst, 0,
blo k, (blo kSizeX + 1) * sY, 1, edgeXType, nX.sr , 0) ;
// Shift this blo k's upper y edge into next neighbour's lower ghost edge
p.Sendre v(blo k, blo kSizeY, 1, edgeYType, pY.dst, 0,
blo k, 0,
1, edgeYType, pY.sr , 0) ;
// Shift this blo k's lower y edge into prev neighbour's upper ghost edge
p.Sendre v(blo k, 1,
1, edgeYType, nY.dst, 0,
blo k, blo kSizeY + 1, 1, edgeYType, nY.sr , 0) ;
Figure 5: Full MPI Life program (detail).
p represents a two dimensional periodi grid of pro esses. The Get member
returns the
oordinates of the lo al pro ess. From these the parameters of the
lo al array blo k are omputed.
The values
sX, sY represent the sides of the lo ally held array segment,
in luding ghost regions.
This segment is
reated as a one-dimensional Java
blo k. The derived type edgeXType des ribes the stru ture of ghost area
on the upper or lower x sides: ontiguous regions of the blo k array of extent sY.
The type edgeYType des ribes the y-side ghost areas: non- ontiguous regions of
ount sX, regular stride sY.
The shift member of Cart orresponds to the MPI fun tion MPI CART SHIFT:
array,
it returns the sour e and destination pro essors for a
binding returns these values in an obje t of
lass
y li
shift.
The Java
CartShift whi h just on-
tains two integers. Finally, in the main loop, the shifts are exe uted by using
the
Comm member Sendre v, whi h orresponds to the standard MPI fun tion
MPI Sendre v. This performs a send and a re eive
on urrently (avoiding a
potential deadlo k in the implementations given in the previous se tions).
4
Data parallellism in Java
The most omprehensive statement of the data parallel model of omputation is
the High Performan e Fortran standard [13, 23℄. That do ument is supposed to
14
embody mu h of the olle tive experien e of the s ienti parallel programming
ommunity. Presumably, then, any attempt to in orporate data parallelism into
Java should build on the HPF model wherever possible.
The HPF de nition onsists of a large set of dire tives that an be used to
annotate a standard Fortran program, a small handful of language extensions,
and a library of new fun tions for operating on arrays. An initial data-parallel
Java may well be implemented through a lass-library. This library would assume the roles of the dire tives and language extensions in HPF as well as the
HPF library.
We will loosely distinguish two di erent levels at whi h a library implementation of the HPF semanti s an operate.
The rst is the level of the so- alled run-time libraries [1, 8, 9, 6℄. This kind
of library provides fun tions for s heduling and exe uting spe i patterns
of olle tive ommuni ation already identi ed by a ompiler (in the HPF
ase) or else by an appli ation programmer using the library dire tly.
Su h a library may also provide fun tions for translating between global
subs ripts and lo al, node-level subs ripts|ie, for omputing the mapping
of a distributed array into the address spa es of individual pro essors.
Alternatively, a library an operate at a higher level that on eals all
aspe ts of data lo alization and transfer from the user. The only responsibility of the user is to spe ify the distribution format of arrays when they
are de lared. Subsequently the user just tells the library to do parti ular
operations on parti ular distributed arrays. It is left to the library to work
out whether or not a ommuni ation is implied. In e e t the library is
operating at the same level as the HPF language. An example of su h a
library is A++/P++ [26℄.
In either ase a lass library version is likely to in lude lasses to des ribe the
elements of the HPF data model, su h as pro essor arrangements and the distributed arrays themselves.
4.1
Parallel arrays and
olle tive
ommuni ation
At the run-time level, a lass library implementation of the HPF model is likely
to in lude
Classes to des ribe pro ess arrays and distributed data arrays.
Classes or fun tions to simplify a ess to lo ally held elements of a dis-
tributed array (in luding parallel iteration).
Fun tions for olle tive ommuni ation through operations on distributed
arrays: regular \ opying" operations in luding shifts and transposes, arithmeti redu tion operations, irregular gather/s atter operations, and so on.
15
Our rst experiments with a Java binding only tou h the surfa e of the full
HPF semanti s, but they provide some hints about a general framework. The
interfa e given here borrows from the C++ lass library, Adlib, developed by
one of us [6℄.
A distributed array is parametrized by a member of the Array lass. In C++
Array would naturally be a template for a ontainer lass. In Java, generi ontainer lasses are problemati . Without the template me hanism, the obvious
options are that a ontainer holds items of type Obje t, the base lass for all nonprimitive types, or that a separate ontainer lass is provided for ea h allowed
type of element. The rst option doesn't allow for array elements of primitive
type, and prevents ompile-time type- he king (reminis ent of using void* in
C). The se ond approa h presumably involves restri ting elements to the nite
set of primitive types (int, oat, . . . )6 . For now we have side-stepped the issue
by leaving the data elements out of the Array lass. Array de nes the shape
and distribution of an array, but spa e for elements is allo ated in a separately
de lared ve tor of the appropriate type7 .
The onstru tor for an Array de nes its shape and distribution format. This
is expressed through two auxilliary lasses: the Pro s and Range lasses. The
Pro s lass orresponds dire tly to the HPF pro essor arrangement. It maps
the set of physi al pro esses on whi h the program is exe uting to a multidimensional grid. A Range des ribes a single dimension of an HPF array. It
embodies an array extent (the size of the array in the dimension on erned),
and a mapping of the subs ript range to a dimension of a Pro s grid.
In our pilot implementation any parallel Java appli ation is written as a
lass extending the library lass Node. The Node lass maintains some global
information and provides various olle tive operations on arrays as member
fun tions. The ode for the \main program" goes in the run member of the
appli ation lass8 .
A simpli ed version of the ode for the \Life" demo is given in gure 6.
The obje t p represents a 2 by 2 pro ess grid. The Pro s onstru tor takes
the urrent Node obje t as an argument, from whi h it obtains information on
the available physi al pro esses. In this simpli ed example we assume that the
program exe utes on exa tly four pro essors.
The obje ts x and y represent index ranges of size N distributed over the
rst and se ond dimensions of the grid p. The default distribution format is
blo kwise. Cy li distribution format an also be spe i ed by using a range
obje t of lass CRange, whi h is derived from Range (the pilot implementation
does not provide any more general distribution or alignment options).
6 Perhaps a good ompromise is to provide one ontainer lass for ea h primitive type and
one7 for Obje t.
Confusingly enough, this makes our Array more akin to an HPF template than an HPF
array. Needless to say, there is no onne tion between C++ templates and HPF templates.
8 This approa h is modelled on the Thread and Applet lasses in the standard Java API.
Other approa hes to providing library-wide resour es were illustrated in earlier se tions.
16
publi
...
lass Life extends Node implements Runnable {
publi void run() {
Pro s p = new Pro s(this, 2, 2) ;
Range x = new Range(N, p, 0) ;
Range y = new Range(N, p, 1) ;
Array r = new Array(p, x, y) ;
int s = r.seg();
byte[℄ w = new byte[s℄;
byte[℄ n_ = new byte[s℄;
byte[℄ p_ = new byte[s℄;
... et , reate arrays for 8 neighbours
// Initialize the Life board
for(r.forall(); r.test(); r.next())
w[r.sub()℄ = fun(r.idx(0), r.idx(1)) ;
// Main loop
for (int k=0; k<NITER; k++) {
// Get neighbours
shift( n_, w, r, 0, 1, CYCLIC);
shift( p_, w, r, 0, -1, CYCLIC);
... et , opy arrays for 8 neighbours
// Life update rule
}
}
}
for(r.forall(); r.test(); r.next()) {
int i = r.sub() ;
swit h ( n_[i℄ + p_[i℄ + _n[i℄ + _p[i℄ +
nn[i℄ + np[i℄ + pn[i℄ + pp[i℄) {
ase 2 : break;
ase 3 : w[i℄ = 1; break;
default: w[i℄ = 0; break;
}
}
Figure 6: Simpli ed ode of the Life demo program.
17
The obje t r represents the shape and distribution of a two dimensional
array. This \template" is be shared by several distributed arrays|it does not
ontain a data ve tor. The data ve tors that hold the lo al segments of arrays
are reated separately using the inquiry fun tion seg, whi h returns the number
of lo ally held elements. In the example the elements of the main data array
are held in w. The extra arrays n , p , ..., nn, ... will be used to hold arrays
of neighbour values9 .
The \forall loop" initializing w should be read as something like
forall(i in range x, j in range y)
w(i, j) = fun(i, j)
where fun is some fun tion of the global indi es de ning the initial state of the
Life board. The members forall, test, next update internal state of r so that
r.sub() returns the lo al subs ript for the urrent iteration, and r.idx(0) and
r.idx(1) return the global index values for the urrent iteration. We are using r
as an iterator lass10 .
The main loop uses y li shift operations to obtain neighbours, ommuniating data where ne essary. The shift operation is a member of the Node lass.
It should be overloaded to a ept data ve tors of any primitive type|here the
array elements are bytes.
Finally w is updated in terms of its neighbours.
Note some hara teristi features of the data-parallel style of programming:
The distribution format of the arrays an be hanged just by altering a few
parameters at the start of the program|the main program is insensitive
to these details
low level message-passing is abstra ted into high-level olle tive operations
on distributed array stru tures.
It may be un lear that this framework has the same power as HPF. If the
distributed array model is extended to the full HPF model, as it an be, and
the shift operation is augmented by some more powerful olle tive operations
in luding gather and s atter operations parametrized by subs ript arrays, we
laim that it does. The proof lies in the observation that this is, as advertised,
a simpli ed model of the kind of run-time library that various HPF translators
target. This is not to say that programming in this style is always as straightforward as the example given here, or as omprehensible as the orresponding
HPF program (if it was, there would be no need for HPF).
9 Here we will use whole arrays of neighbours and a shift operation. This is arguably the
more onventional approa h in a data-parallel setting, but the the ghost-edge me hanism an
also be tted into this framework.
10 Our Array lass is per hed somewhere between STL ontainer and iterator lasses. This
is a slightly awkward position, and it may be more satifa tory to separate these fun tions into
di erent lasses.
18
4.2 \Array syntax" in Java.
The higher level approa h would make Array lasses look like true ontainer
lasses (for a restri ted set of types) and all operations on arrays olle tive,
something like:
ArrayFloat a = new ArrayFloat(p, x, y) ;
ArrayFloat b = new ArrayFloat(p, x, y) ;
ArrayInt
= new ArrayInt(p, x, y) ;
a = MATMUL(b,
) ;
Communi ation would be handled automati ally inside array operations like
MATMUL. Individual array elements would not be a essed in the Java program
ex ept, possibly, through getElement, putElement members.
This s heme an be implemented on top of an SPMD Java array library
of the kind outlined in the previous se tion or by making the Java run as a
oordination program ontrolling a parallel ba k end. It an be ompared with
[26, 16℄.
So far we have not attempted to implement (or spe ify in detail) su h an
approa h for Java. The la k of user-de ned operator-overloading may be parti ularly frustrating here.
5
Dis ussion
We have explored the pra ti ality of doing parallel omputing in Java, and of
providing Java interfa es to High Performan e Computing software. For various
reasons, the su ess of this exer ise was not a foregone on lusion. Java sits on
a virtual ma hine model that is signi antly di erent to the hardware-oriented
model whi h C or Fortran exploit dire tly. Java dis ourages or prevents dire t
a ess to the some of the fundamental resour es of the underlying hardware
(most extremely, its memory).
Our earliest experiments in this dire tion (in luding the work des ribed in
se tion 4, whi h predates the MPI work) involved working entirely within Java,
building new software on top of the ommuni ation fa ilities of the standard
API. The more re ent work in se tions 3.2 and 3.3 involved reating a Java
interfa e to an existing HPC pa kage. Whi h is the better strategy? In the long
term Java may be ome a major implementation language for large software
pa kages like MPI. It ertainly has advantages in respe t of portability that
ould simplify implementations dramati ally. In the immediate term re oding
these pa kages does not appear so attra tive. Java wrappers to existing software
look more sensible. On a autionary note, our experien e with MPI suggests
that interfa ing Java to non-trivial ommuni ation pa kages may be less easy
than it sounds. Nevertheless, we intend in the future to reate a Java interfa e
to an existing run-time library for data parallel omputation.
19
So is Java, as it stands, a good language for High Performan e Computing?
It still has to be demonstrated that Java an be ompiled to ode of eÆ ien y
omparable with C or Fortran. Many avenues are being followed simultaneously
towards a higher performan e Java. Besides the Java hip e ort of Sun, it has
been reported at this workshop that IBM is developing an optimizing Java ompiler whi h produ es binary ode dire tly, that Ri e University and Ro hester
University are working on optimization and restru turing of byte ode generated
by java , and that Indiana University is working on sour e restru turing to parallelize Java. Parallel interpretation of byte ode is also an emerging pra ti e.
For example, the IBM JVM, an implementation of JVM on shared memory
ar hite tures, was released in spring 1996, and UIUC has re ently started work
aimed at parallel interpretation of Java byte ode for distributed memory systems.
Another promising approa h under investigation [18℄ is to integrate interpretation and ompilation te hniques for parallel exe ution of Java programs. In
su h a system, a partially ordered set of interpretive frames is generated by an
II/CVM ompiler. A frame is a des ription of some subtask, whose granularity
may range from a single s alar assignment statement to a solver for a system
of equations. Under supervision of the virtual ma hine (II/CVM), the a tions
spe i ed in a frame may be performed in one of three ways:
Exe uted by an interpretive module dire tly, whi h also in orporates JIT
ompilation apability.
Some pre ompiled omputational library fun tion is invoked lo ally to a -
omplish the task; this fun tion may be exe uted sequentially or in parallel.
The frame is sent to some registered remote system, whi h will get the
work done, on e again either sequentially or in parallel.
With this approa h, optimized binary odes for well formed omputation subtasks exist in runtime libraries, supporting a high level interpretive environment.
Task parallelism is observed among di erent frames exe uted by the three me hanisms simultaneously, while data parallelism is observed in the exe ution of
some of the runtime fun tions.
Presuming these e orts satisfa torily address the performan e issue, the se ond aspe t in question on erns expressiveness of the Java language. Our nal
interfa e to MPI is quite elegant, and provides mu h of the fun tionality of the
standard C and Fortran bindings. But reating this interfa e was a more diÆ ult
pro ess than one might hope, both in terms of getting a good spe i ation, and
in terms of making the implementation work. In se tion 4 we noted that the la k
of features like C++ templates (or any form of parametri polymorphism) and
user-de ned operator overloading (available in many modern languages, from
fun tional programming languages to Fortran) made it diÆ ult to produ e a
20
ompletely satisfying interfa e to a data parallel library. The Java language as
urrently de ned imposes various limits to the reativity of the programmer.
In many respe ts Java is undoubtedly a better language than Fortran. It
is obje t-oriented to the ore and highly dynami , and there is every reason to
suppose that su h features will be as valuable in s ienti
omputing as in any
other programming dis ipline. But to displa e established s ienti programming languages Java will surely have to a quire some of the fa ilities taken for
granted in those languages.
Referen es
[1℄ A. Agrawal, A. Sussman, and J. Saltz. An integrated runtime and ompiletime approa h for parallelizing stru tured and blo k stru tured appli ations. IEEE Transa tions on Parallel and Distributed Systems, 6, 1995.
[2℄ Susan Atlas, Subhankar Banerjee, Julian C. Cummings, Paul J. Hinker,
M. Srikant, John V. W. Reynders, and Mary Dell Tholburn. POOMA: A
high performan e distributed simulation environment for s ienti appli ations. In Super omputing `95, 1995.
[3℄ Aart J.C. Bik and Dennis B. Gannon. Automati ally exploiting impli it
parallelism in Java. This workshop.
[4℄ J. Boyle, R. Butler, T. Disz, B. Gli kfeld, E. Lusk, R. Overbeek, J. Patterson, and R. Stevens. Portable Programs for Parallel Pro essors. Holt,
Rinehart and Winston, 1987.
[5℄ Ralph Butler and Ewing Lusk. Monitors, messages, and lusters: The p4
parallel programming system. Parallel Computing, 20:547{564, April 1994.
[6℄ D. B. Carpenter. Adlib: A distributed array library to support HPF translation, 1995. Presented at the 5th International Workshop on Compilers
for Parallel Computers. URL: http://www.npa .syr.edu/users/db /Adlib.
[7℄ K.M. Chandy and C. Kesselman. CC++: A de larative on urrent obje toriented programming notation. In Gul Agha, Peter Wegner, and Akinori
Yonezawa, editors, Resear h Dire tions in Con urrent Obje t-Oriented Programming, page 24. MIT Press, 1993. ISBN: 0-262-01139-5.
[8℄ A. Choudhary, G. Fox, S. Ranka, S. Hiranandani, K. Kennedy, C. Koelbel, and J. Saltz. Software support for irregular and loosely syn hronous
problems. Computing Systems in Engineering, 3:43{52, 1992.
[9℄ Parallel Compiler Runtime Consortium. Common runtime support for
high-performan e parallel languages. In Super omputing `93. IEEE Computer So iety Press, 1993.
21
[10℄ Parallel Compiler Runtime Consortium.
HPCC and Java|a report by the Parallel Compiler Runtime Consortium, 1996. URL:
http://www.npa .syr.edu/users/g f/hpjava3.html.
[11℄ J.J. Dongarra, R. Pozo, and D.W. Walker. An obje t oriented design for
high performan e linear algebra on distributed memory ar hite tures. In
Obje t Oriented Numeri s Conferen e, 1993.
[12℄ Stephen J. Fink and S ott B. Baden. The KeLP User's Guide. University
of California, San Diego, La Jolla, CA, Mar h 1996. URL: http://wwwse.u sd.edu/groups/hp l/s g/kelp.html.
[13℄ High Performan e Fortran Forum. High Performan e Fortran language
spe i ation. S ienti Programming, spe ial issue, 2, 1993.
[14℄ Message Passing Interfa e Forum. MPI: A Message-Passing Interfa e
Standard. University of Tenessee, Knoxville, TN, June 1995. URL:
http://www.m s.anl.gov/mpi.
[15℄ I. Foster and K. M. Chandy. Fortran M: A language for modular parallel
programming. Journal of Parallel and Distributed Computing, 26(1):24,
1995.
[16℄ G.C. Fox and W. Furmanski. Towards interpreted run-time Fortran90D environment.
Te hni al report, NPAC, 1992.
URL:
http://www.npa .syr.edu/proje ts/hpsin/hp .html.
[17℄ Geo rey C. Fox and Wojtek Furmanski. Computing on the Web: new approa hes to parallel pro essing|petaop and exaop performan e in the year
2007, 1997. URL: http://www.npa .syr.edu/users/g f/petastu /petaweb/.
[18℄ Geo rey C. Fox, Xiaoming Li, Yuhong Wen, and Guansong Zhang. Studies of integration and optimization of interpreted and ompiled languages.
Te hni al Report SCCS-780, NPAC, February 1997.
[19℄ A. Geist, A. Beguelin, J. Dongarra, W. Jiang, R. Man hek, and V. Sunderam. PVM: Parallel Virtual Ma hine|A Users' Guide and Tutorial
for Networked Parallel Computing. S ienti and Engineering Series. MIT
Press, 1994. ISBN: 0-262-57108-0.
[20℄ James Gosling and Henry M Gilton. The Java Language Environment: A White Paper. JavaSoft, Sun Mi rosystems, In , 1996. URL:
http://www.javasoft. om/do .
[21℄ A.S. Grimshaw. An introdu tion to parallel obje t-oriented programming
with Mentat. Te hni al Report 91 07, University of Virginia, 1991.
22
[22℄ JavaSoft, Sun Mi rosystems, In . RMI Do umentation, 1996. URL:
http://java.sun. om/produ ts/JDK/1.1/.
[23℄ C.H. Koelbel, D.B. Loveman, R.S. S hreiber, G.L. Steel, Jr., and M.E.
Zosel. The High Performan e Fortran Handbook. MIT Press, 1994. ISBN:
0-262-61094-9.
[24℄ Inmos Ltd. o am 2 Referen e Manual. Prenti e-Hall International, 1988.
ISBN: 0-13-629312-3.
[25℄ MPICH|a portable implementation
http://www.m s.anl.gov/mpi/mpi h/.
of
MPI.
URL:
[26℄ R. Parsons and D. Quinlan. A++/P++ array lasses for ar hite ture independent nite di eren e al ulations. In Obje t Oriented Numeri s Conferen e, 1994.
[27℄ Jon Siegel. CORBA Fundamentals and Programming. Wiley, 1996. ISBN:
0471-12148-7.
[28℄ P. Sivilotti and P. Carlin. A tutorial for CC++. Te hni al Report CS-TR94-02, Calte h, 1994.
23
Donald Leskiw