
Language bindings for a data-parallel runtime

1998, Proceedings Third International Workshop on High-Level Parallel Programming Models and Supportive Environments

https://doi.org/10.1109/HIPS.1998.665142

Abstract

The NPAC kernel runtime, developed in the PCRC (Parallel Compiler Runtime Consortium) project, is a runtime library with special support for the High Performance Fortran data model. It provides array descriptors for a generalized class of HPF-like distributed arrays, support for parallel access to their elements, and a rich library of collective communication and arithmetic operations for manipulating these arrays. The library has been successfully used as a component in experimental HPF translation systems. With prospects for early appearance of fully-featured, efficient HPF compilers looking questionable, we discuss a class of more easily implementable data-parallel language extensions that preserve many of the attractive features of HPF, while providing the programmer with direct access to runtime libraries such as the NPAC PCRC kernel.

Bryan Carpenter, Geoffrey Fox, Donald Leskiw, Xinying Li, Yuhong Wen, Guansong Zhang
NPAC at Syracuse University, Syracuse, NY 13244

Also available from SURFACE (Syracuse University), Northeast Parallel Architecture Center collection, item 54: https://surface.syr.edu/npac/54
1 Introduction

As part of the PCRC [10] project we completed development of a high-level runtime library for data-parallel languages [4]. The motivating goal was to simplify translation of High Performance Fortran (HPF) [6] by providing a coherent interface to the distributed array descriptors and collective operations needed for straightforward, efficient translation of parallel constructs like FORALL and array assignments. This goal was achieved quite successfully, and two experimental subset HPF translators have used the library to manage their communications [13, 7].

Unfortunately it is evident that implementing compilers for a language as complex as full HPF is a formidable task. On the other hand, while the runtimes that have evolved to support HPF and its kin are powerful and have an underlying elegance, using them without a compiler (through direct calls from an SPMD program written in a standard language) is clumsy and error-prone. This is due in part to the large number of parameters needed to describe distributed arrays. We face the possibility that, in the near-term future at least, neither HPF nor conventional languages will permit full exploitation of libraries such as the one developed in PCRC.

In this paper we discuss possibilities for enhancing programming languages like Fortran with relatively simple extensions to support declaration and manipulation of HPF-like distributed arrays. In contrast to HPF, our extensions assume the programmer specifies in full logical detail exactly where computations and communications are performed. This makes compilers a much more straightforward proposition.
In spite of this simplification, we argue that, supplemented by a binding to a suitable collective communication library such as ours, the resulting hybrid data-parallel/SPMD programming models can achieve a level of elegance and expressivity comparable to full HPF.

2 Background: runtime kernel

The kernel of the NPAC library is a C++ class library. It is most directly descended from the run-time library of an earlier research implementation of HPF [7], with influences from the Fortran 90D run-time and the CHAOS/PARTI libraries [1, 11, 5]. The kernel is currently implemented on top of MPI. The library design is solidly object-oriented, but efficiency is maintained as a primary goal.

The overall architecture of the library is illustrated in figure 1. At the top level there are several compiler-specific interfaces to a common run-time kernel. The four interfaces shown in the figure are illustrative. They include two different Fortran interfaces (used by different HPF compilers), a user-level C++ interface called ad++ (footnote 1), and a Java interface under development. The development of several top-level interfaces has produced a robust kernel interface, on which we anticipate other language-specific and compiler-specific interfaces can be constructed relatively straightforwardly.

Footnote 1. ad++ is currently implemented as a set of header files defining distributed arrays as type-secure container class templates, function templates for collective array operations, and macros for distributed control constructs.

Figure 1. NPAC runtime architecture. [Diagram not reproduced. Labels: PCRC F77 interface, SHPF F90 interface, ad++ interface, PCRC Java interface; Kernel run-time (Adlib); Distributed data and control (distributed arrays, distributed ranges, iterators on ranges, process groups, distributed control "on" and "where"); Communication and arithmetic ("remap", "shift", etc.; reductions; "gather"/"scatter", etc.); message schedules, tree schedules, random access schedules; MPI.]

The largest part of the kernel is concerned with global communication and arithmetic operations on distributed arrays. These are represented on the right-hand side of figure 1. The communication operations supported include HPF/F90 array intrinsic operations such as CSHIFT, the function write_halo, which updates ghost areas of a distributed array, the function remap, which is equivalent to a Fortran 90 array assignment between two conforming sections of two arbitrarily distributed HPF arrays, and various gather-type and scatter-type operations allowing irregular patterns of data access. Arithmetic operations supported include all F95 array reduction and matrix arithmetic operations, and HPF combining scatter. A complete set of HPF standard library functions is under development.

All the data movement schedules are dependent on the infrastructure on the left-hand side of figure 1. This provides the distributed array descriptor, and basic support for traversing distributed data ("distributed control"). Important substructures in the array descriptor are the range object, which describes the distribution of an array global index over a process dimension, and the group object, which describes the embedding of an array in the active processor set.

At the time of writing the kernel is fully functional and quite mature, two of the four interfaces illustrated are complete, and others are in progress. Results of preliminary benchmarks reported in [13] suggest that an HPF compiler based on the high-level NPAC runtime can be competitive with commercial compilers.

3 The language model

The next section gives an outline of a Fortran dialect for explicit SPMD programming with distributed arrays. Before attempting a concrete syntax, we will discuss some of the general goals and features of such a language.

We aim to provide a flexible hybrid of the data-parallel and SPMD approaches. To this end HPF-like distributed arrays should appear as language primitives. New distributed control constructs will be added to facilitate access to the local elements of these arrays. In the SPMD mold, the model should allow processors the freedom to independently execute complex procedures on local elements: programs should not be constrained by SIMD-style array syntax.

A design decision is made that all access to non-local array elements should go through library functions, typically collective communication operations. This puts an extra onus on the programmer; but making communication explicit encourages the programmer to write algorithms that exploit locality, and simplifies the task of the compiler writer.

For the newcomer to HPF, a great strength of the language lies in the fact that the semantic effect of a particular operation is generally identical to the effect in the corresponding sequential program. This means that, so long as the programmer understands conventional Fortran, it is very easy for him or her to understand the behaviour of a program at the level of what values are held in program variables, and the final results of procedures and programs. Of course the ease of understanding this "value semantics" of a program is counterbalanced by the difficulty in knowing exactly how the compiler translates the program. Understanding the performance of an HPF program may require the programmer to have quite detailed knowledge of how arrays are distributed over processor memories, and what strategy the compiler adopts for distributing computations across processors.

The language model we discuss has some superficial (and some deeper) similarities to the HPF model, but the HPF-style semantic equivalence between the data-parallel program and a sequential program is abandoned in favour of a more direct equivalence between the data-parallel program and an SPMD program. Because understanding an SPMD program is presumably more difficult than understanding a sequential program, learning naive use of our language will certainly be harder than for HPF.
Our claim is that once a set of related concepts about distributed data and distributed control is mastered, the kind of language we discuss gives the programmer more intricate control over the behaviour of a program, and this may ultimately lead to better performance. On the other hand, by retaining many of the array-level features of HPF as language primitives, we still enable a higher level of programming than is possible in the direct message-passing style.

We will adopt a distributed data model semantically equivalent to the HPF data model. However we describe the distributed arrays in terms of a slightly different set of basic concepts. In general HPF describes the decomposition of an array through alignment to some template, which is in turn distributed over a processor arrangement. A processor arrangement is a multidimensional grid of abstract processors. A template is an abstraction of the index space of a distributed array: it is like a multi-dimensional array mapped to the process grid, but has no data associated with its elements.

The analogous concepts in our parametrization of the distributed array are the distributed range (or simply range) and the process group (or simply group). A distributed range is like a single dimension of an HPF template (or of some triplet-selected subset of that dimension). It defines a map from an integer subscript interval into a single dimension of an HPF-like processor arrangement. A process group is equivalent to an HPF processor arrangement, or to a certain subset of such an arrangement. We emphasize that switching from templates to ranges and groups is a change of parametrization only. In itself it does not change the set of allowed ways to decompose an array. The reason for the change is that groups and ranges seem to be better suited to specifying the distribution of program control.

Our language model differs materially from HPF in its dependence on distributed control constructs.
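To make the range concept concrete, the following minimal Python sketch models a distributed range as a map from a global subscript to a coordinate in one process dimension plus a local index. The class names `BlockRange` and `CyclicRange`, the helper `layout`, the 0-based subscripts, and the ceiling-division block size are assumptions made for illustration; they are not the kernel's actual descriptor layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockRange:
    """Hypothetical block-distributed range: n subscripts over nprocs coordinates."""
    n: int
    nprocs: int

    @property
    def block(self) -> int:
        # Assumed block size: ceiling(n / nprocs).
        return -(-self.n // self.nprocs)

    def owner(self, i: int) -> int:
        """Process coordinate holding global subscript i (0-based)."""
        return i // self.block

    def local_index(self, i: int) -> int:
        """Offset of global subscript i inside its owner's local block."""
        return i % self.block


@dataclass(frozen=True)
class CyclicRange:
    """Hypothetical cyclic range: subscripts dealt out round-robin."""
    n: int
    nprocs: int

    def owner(self, i: int) -> int:
        return i % self.nprocs

    def local_index(self, i: int) -> int:
        return i // self.nprocs


def layout(rng):
    """Group the global subscripts of a range by owning process coordinate."""
    held = {p: [] for p in range(rng.nprocs)}
    for i in range(rng.n):
        held[rng.owner(i)].append(i)
    return held
```

Under these assumptions, an element of a two-dimensional array declared over two orthogonal ranges `x` and `y` would be held by the process whose grid coordinates are `(x.owner(i), y.owner(j))`, which is the sense in which ranges and groups together reproduce an HPF-style decomposition.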
HPF 2.0 provides some similar mechanisms, but only as optional directives. In our language, by contrast, they are required to specify which process performs each action.

The simplest example of a distributed control construct is the ON construct. This is a control construct parametrized by a process group. It specifies that the enclosed code block is only executed on processes in the group.

The second distributed control construct is called AT. This is very similar to ON, except that it is parametrized by an element of a distributed range. The enclosed code block is only executed on processes that hold the range element concerned (because a distributed range corresponds to a single dimension of an HPF template, this is equivalent to restricting the operation to some slice of a processor arrangement).

The last and most important distributed control construct is the OVERALL distributed loop. This construct is parametrized by a range or a subrange. It is a direct abstraction of the loops in low-level SPMD programs that iterate over elements of the local portion of some distributed data structure.

Each of these control constructs restricts the active group of processors to some subset. In the case of ON and AT this restriction is obvious. In the case of OVERALL the group of processors is effectively partitioned along the process dimension associated with the range. With certain restrictions which are easily stated in terms of the current active process group, the distributed control constructs can be nested, and collective operations can be called inside distributed control constructs. In other words, collective library operations need not imply global synchronization: only synchronization among members of the current active process group is assumed.

The fundamental constraint that forces the use of the distributed control constructs is the requirement that all access to array elements must be local.
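Read operationally, ON, AT and OVERALL are guards and loops in an ordinary SPMD program. The Python sketch below simulates this for a one-dimensional, cyclically distributed range: every simulated process runs the same function, `on` and `at` act as membership and ownership tests, and `overall` enumerates only the locally held subscripts. All names (`CyclicRange`, `on`, `at`, `overall`, `spmd_fill`, `run`), the cyclic mapping, and the 0-based subscripts are hypothetical, chosen for illustration rather than taken from the proposed Fortran syntax.

```python
class CyclicRange:
    """Hypothetical cyclic range: subscript i lives on process i % nprocs."""

    def __init__(self, n, nprocs):
        self.n, self.nprocs = n, nprocs

    def owner(self, i):
        return i % self.nprocs


def on(group, rank):
    """ON: true only for processes that belong to the group."""
    return rank in group


def at(rng, i, rank):
    """AT: true only on the process holding range element i."""
    return rng.owner(i) == rank


def overall(rng, rank, lo=0):
    """OVERALL: yield the locally held subscripts of rng, starting from lo.

    This simulation filters the whole range; a real translator would
    compute the local loop bounds directly instead.
    """
    return (i for i in range(lo, rng.n) if rng.owner(i) == rank)


def spmd_fill(rng, rank, group):
    """Body run by every process; returns the {global subscript: value} it wrote."""
    local = {}
    if on(group, rank):
        for i in overall(rng, rank):
            local[i] = 10 * i      # purely local assignment
        if at(rng, 5, rank):
            local[5] += 1          # only the owner of element 5 executes this
    return local


def run(n=8, nprocs=3, group=(0, 1, 2)):
    """Run the same body once per simulated process rank."""
    rng = CyclicRange(n, nprocs)
    return [spmd_fill(rng, rank, group) for rank in range(nprocs)]
```

In this sketch each global subscript is visited by exactly one process, so the union of the per-process results covers the range once, which is the partitioning property attributed to OVERALL above.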
Although arrays are subscripted with global subscripts as in HPF, a subscripting operation must not imply access to an element not held on this processor. Meeting this constraint sounds onerous. We will try to illustrate with concrete examples that the distributed control constructs match the requirements of typical parallel algorithms well, and once the basic ideas are grasped, programming around this constraint becomes quite natural.

A further advantage of this style is that, because the underlying programming model is SPMD, the switch between data-parallel programming and low-level message-passing is merely a change in point of view. Unlike HPF, there is no need to pass through an awkward "extrinsic" interface if a particular subcomputation cannot easily be handled by collective data-parallel operations. We simply need to provide inquiry functions that provide access to the local sequential array component of a distributed array, and to the local physical process id.

4 Outline of an extended Fortran dialect

4.1 Example syntax

The examples in the following sections use certain syntax extensions to Fortran. The syntax is illustrative, and by no means finalized.

A distinguishing property of the proposed system, compared to HPF, is that it includes ordinary Fortran as a strict subset, and ordinary Fortran constructs are unchanged by the translator. Our system will not attempt to exploit parallelism even in "explicitly parallel" constructs such as the array syntax of Fortran 90 or the FORALL statement of Fortran 95. This policy drastically simplifies the translator, and gives the programmer much finer control over the generated code.

Processor arrangements will be declared as in HPF. The explicit TEMPLATE concept of HPF is abandoned in favour of a distributed range concept. A distributed range represents a range of subscript values, and incorporates a mapping of that range into a dimension of a processor arrangement.
The mapping options will be similar to HPF: block, cyclic, block-cyclic, etc. A distributed array declaration is distinguished from a sequential array declaration by using a different kind of brackets:

    PROCESSORS P(4, 4)
    RANGE X(N), Y(M)
    DISTRIBUTE X ONTO P(BLOCK, *)
    DISTRIBUTE Y ONTO P(*, CYCLIC)

    REAL A [X, Y]
    INTEGER B [Y]
    INTEGER C [X, Y, 10]
    REAL D [X(1 : N / 2), Y(::2)]

Ranges X and Y are each distributed over a single dimension of the processor arrangement. The array A is an N by M distributed array, distributed blockwise in its first dimension and cyclically in its second. In general the brackets in a distributed array declaration contain a list of orthogonal ranges (footnote 2). B is a one-dimensional array distributed cyclically in the second dimension of P and implicitly replicated over the first. C is a three-dimensional array with two distributed ranges and one collapsed range. The declaration of D illustrates how HPF-like non-trivial alignment relations can be introduced by using subranges in array declarations. D is an N / 2 by M / 2 distributed array (for even M; footnote 3).

Footnote 2. Two ranges are considered orthogonal if they are distributed over different dimensions of the same processor arrangement. As a special case a collapsed, on-processor array range is orthogonal to any other.

Footnote 3. Triplet subscripting of ranges works in the same way as triplet subscripting of arrays in Fortran. If lower or upper bounds are omitted they default to the bounds of the subscripted object, so Y(::2) is equivalent to Y(1:M:2), a range including every second element of Y.

Provision of distributed arrays as language primitives provides a clean, systematic interface to libraries of collective operations. Examples of standard collective operations are

    CALL DA_CSHIFT(DST, SRC, DIM, SHIFT)
    RES = DA_SUM(SRC)
    CALL DA_REMAP(DST, SRC)

Here DST and SRC are distributed arrays. The subroutine DA_CSHIFT is closely analogous to the Fortran 90 CSHIFT intrinsic. The function DA_SUM sums all elements of the argument and broadcasts the result value. The subroutine DA_REMAP takes two distributed arrays of the same shape, which may have any, unrelated mappings, and copies the elements of one to the other (footnote 4).

Footnote 4. As a rule operations like DA_CSHIFT and DA_REMAP cannot perform in-place updates. Their arguments should be distinct and non-overlapping.

The syntax for declaration of distributed data is complemented by syntax extensions for distributed control. The simplest example is the AT construct. A distributed array is subscripted with a global subscript. Subscripting expressions are only meaningful on the processors that hold the elements selected. To make sure a statement using a distributed array element is only executed on a processor that holds a copy of the element, a stylized form of conditional called the AT construct is provided. The code fragment below illustrates a pair of nested AT constructs.

    AT(X(n))
      AT(Y(17))
        A [n, 17] = B [17] + 23
      ENDAT
    ENDAT

This construct is an abstraction of the if statement in a low-level SPMD program that tests whether the local processor contains a desired array element. The body of the construct only executes on processors that hold the specified range element.

A more important and powerful distributed control construct is the OVERALL distributed loop. This construct superficially resembles the Fortran 95 FORALL construct. It differs from FORALL in several respects:

  * The index ranges are distributed ranges, not triplets.
  * There is no restriction on what kind of statements can appear in the body of the construct. Any executable statements are allowed. This reflects the explicit SPMD emphasis of the language, in contrast to the SIMD heritage of HPF.
  * Each "iteration" of the construct is localized to a well-defined processor (or group of processors) through the use of distributed ranges for the loop indices. An iteration should only access array elements held locally on a processor executing the iteration. This restriction sounds inconvenient: in practice it can usually be accommodated quite painlessly by limiting access to arrays aligned with the loop ranges.

In this simple example an OVERALL construct is used to initialize the array A with some expression involving the global subscripts

    OVERALL(I = X, J = Y)
      A [I, J] = I + J
    ENDOVERALL

In a slightly more complex example we add together elements from two arrays

    OVERALL(I = X, J = Y)
      A [I, J] = B [J] + C [I, J, 17]
    ENDOVERALL

This operation is legal due to the alignment relation between the A, B and C arrays.

4.2 Example 1: Cholesky decomposition

Figure 2 gives a parallel implementation of Cholesky decomposition in the extended language. In the declaration of A the first dimension is specified with an integer range rather than a distributed range. This means that this dimension is on-processor (collapsed, in HPF terminology). In general the collective operation DA_REMAP copies the elements of one distributed array or section to another of the same shape. In the current example, because B has replicated mapping, it implements a broadcast.

In spite of the fact that we have demanded that the programmer explicitly specify which processor performs every operation, and exactly how communications are to be inserted, we claim that this implementation is at least as simple as any that could be achieved in HPF (and clearer than anything that could be written in MPI). Other features illustrated by this example are construction of sections of distributed arrays (directly analogous to construction of sections of Fortran 90 arrays) and use of a subrange to parametrize an OVERALL loop.

    INTEGER, PARAMETER :: N = 100
    PROCESSORS P(NP)
    RANGE, DISTRIBUTE ONTO P(CYCLIC) :: X(N)

    REAL A [N, X]
    REAL B [N]      ! used as a buffer

    ! ... initialize the array

    ON(P)
      DO K = 1, N - 1
        AT(X (K))
          A [K, K] = SQRT(A [K, K])

          DO L = K + 1, N
            A [L, K] = A [L, K] / A [K, K]
          ENDDO
        ENDAT

        CALL DA_REMAP(B [K + 1 :], A [K + 1 : , K])

        OVERALL(I = X (K + 1 :))
          DO J = I, N
            A [J, I] = A [J, I] - B [J] * B [I]
          ENDDO
        ENDOVERALL
      ENDDO

      AT(X (N))
        A [N, N] = SQRT(A [N, N])
      ENDAT
    ENDON

Figure 2. Implementation of Cholesky decomposition in mooted syntax.

4.3 Example 2: Red-Black relaxation

Figure 3 gives a parallel implementation of red-black relaxation in the new language. Following the proposals of the HPF 2.0 standard, ghost regions are allowed for arrays. In the example the width of these regions is defined by specifying the SHADOW attribute (in our language this attribute is specified for ranges rather than arrays). Ghost regions are extensions of the locally held block of a distributed array, used to cache values of elements held on adjacent processors. The ghost regions are explicitly brought up to date using the subroutine DA_WRITE_HALO from the standard library. The arguments following the array itself define the mode of updating ghost regions at the extremes of the array (cyclic wraparound in this slightly unrealistic example) and the actual width of the halo to be written. Subsequently the OVERALL construct can access elements of the array displaced by up to one place from the alignment of the range parametrizing the construct.

Note the ease with which sites of a particular colour are selected by using nested OVERALL constructs, parametrizing the inner one with a subrange. In HPF this could be expressed using array syntax and a WHERE construct, but that would be relatively inefficient. Alternatively it could be expressed using nested INDEPENDENT DO loops or FORALLs, but nesting these constructs makes it difficult for a compiler to analyse them and emit efficient parallel code. In our scheme no special analysis is necessary. A simple efficient scheme is applied universally to translate subscripts in OVERALL constructs.
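A one-dimensional Python sketch of the ghost-region update may help. It assumes a block distribution with equal block sizes, a default halo width of one, and cyclic wraparound at the extremes. The function names `scatter_with_halo`, `write_halo` and `relax_local` are hypothetical stand-ins, not the library's DA_WRITE_HALO interface, and the communication is simulated by copying between Python lists.

```python
def scatter_with_halo(values, nprocs, width=1):
    """Split values into equal blocks, each padded with `width` ghost cells."""
    assert len(values) % nprocs == 0, "sketch assumes equal block sizes"
    b = len(values) // nprocs
    assert width <= b, "sketch assumes the halo is no wider than a block"
    return [[None] * width + values[p * b:(p + 1) * b] + [None] * width
            for p in range(nprocs)]


def write_halo(blocks, width=1):
    """Refresh every ghost region from the neighbouring block (cyclic mode)."""
    nprocs = len(blocks)
    for p, blk in enumerate(blocks):
        left = blocks[(p - 1) % nprocs]
        right = blocks[(p + 1) % nprocs]
        # Low ghost cells cache the last owned elements of the left neighbour.
        blk[:width] = left[-2 * width:-width]
        # High ghost cells cache the first owned elements of the right neighbour.
        blk[-width:] = right[width:2 * width]


def relax_local(blk, width=1):
    """Average the two neighbours of each owned element using only local data."""
    return [(blk[k - 1] + blk[k + 1]) / 2
            for k in range(width, len(blk) - width)]
```

After `write_halo`, each simulated process can read elements displaced by one place from its owned block without touching another process's storage, which is what allows the stencil in the red-black example to remain a purely local computation.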
    PROCESSORS P(NP, NP)
    RANGE X(N), Y(N)
    DISTRIBUTE X ONTO P(BLOCK, *)
    DISTRIBUTE Y ONTO P(*, BLOCK)
    SHADOW (1) X, Y

    REAL A [X, Y]

    ON(P)
      ! Initialize the array...
      OVERALL(I = X, J = Y)
        A [I, J] = ...
      ENDOVERALL

      DO ITER = 1, NITER
        DO ICOLOUR = 0, 1
          CALL DA_WRITE_HALO(A, (/CYCL, CYCL/), (/1, 1/))

          OVERALL(I = X)
            OVERALL(J = Y(MOD(ICOLOUR + I, 2) : : 2))
              A [I, J] = (A [I, J - 1] + A [I, J + 1] + &
              &           A [I - 1, J] + A [I + 1, J]) / 4
            ENDOVERALL
          ENDOVERALL
        ENDDO
      ENDDO
    ENDON

Figure 3. Implementation of red-black relaxation in mooted syntax.

5 Discussion and related work

The language model described in this paper has previously been investigated using C++ class libraries [2]. Currently we are moving similar syntactic ideas to an extended Java dialect. This is seen as an intermediate stage, preliminary to implementing a Fortran translator. For a more complete exposition of the underlying parallel programming model proposed here, the reader is referred to [3], which gives examples in an extended Java syntax. The C++ and Java versions are essentially prototype systems, not offering the immediate expectation of very high performance.

Our hope is that a Fortran translator for a language such as the one described here will be an order of magnitude easier to implement than a compiler for full HPF. We also suspect it may offer higher performance than many existing HPF systems, relying on reasonably sophisticated programmers, rather than on very advanced compilers. Moreover, we hope that for certain applications the explicitly SPMD language model espoused here will be more flexible and convenient than HPF.

An initial implementation of our language will take the form of a translator to ordinary Fortran. The distributed arrays of the extended language will appear in the emitted code as a pair: an ordinary Fortran array of local elements and a handle to a Distributed Array Descriptor (DAD). Details of the distribution format, including non-trivial details of global-to-local translation of the subscripts, are managed in the run-time library. Acceptable performance should nevertheless be achievable, because we expect that in useful parallel algorithms most work on distributed arrays will occur inside OVERALL constructs. In normal usage, the formulae for address translation can then be linearized. The non-trivial aspects of address translation (including array bounds checking) can be absorbed into the startup overheads of the loop. Since distributed arrays are usually large, the loop ranges are typically large, and the startup overheads (including all the run-time calls associated with address translation) can be amortized. This approach to translation of parallel loops is discussed in detail in [4].

Note that if array accesses are genuinely irregular, the necessary subscripting cannot usually be directly expressed in our language, because subscripts cannot be computed randomly in parallel loops without violating the fundamental SPMD restriction that all accesses be local. This is not regarded as a shortcoming: on the contrary it forces explicit use of an appropriate library package for handling irregular accesses (such as CHAOS [5]). Of course a suitable binding of such a package is needed in our language.

A complementary approach to communication in a distributed array environment is the one-sided communication model of Global Arrays (GA) [8]. For task-parallel problems this approach is often more convenient than the schedule-oriented communication of CHAOS (say). Again, the language model we advocate here appears quite compatible with the GA approach: there is no obvious reason why a binding to a version of GA could not be straightforwardly integrated with the distributed array extensions of the language described here.

We mention two projects that have some similarity to the work described here.
ZPL [12] is a new programming language for scientific computations. Like Fortran 90, it is an array language. It has an idea of performing computations over a region, or set of indices. Within a compound statement prefixed by a region specifier, aligned elements of arrays distributed over the same region can be accessed. This idea has certain similarities to our OVERALL construct. In ZPL, parallelism and communication are more implicit than in our proposed language. The connection between ZPL programming and SPMD programming is not explicit. While there are certainly attractions to the more abstract point of view, the language we are proposing deliberately provides lower-level access to the parallel machine.

F-- [9] is an extended Fortran dialect for SPMD programming. The approach is quite different to the one proposed here. In F--, array subscripting is local by default, or involves a combination of local subscripts and explicit process ids. There is no analogue of global subscripts, or HPF-like distribution formats. In F-- the logical model of communication is built into the language: remote memory access with intrinsics for synchronization. In our proposal there are no communication primitives in the language itself. We follow the MPI philosophy of providing communication through separate libraries. While F-- and our approach share an underlying programming model, we believe that our framework offers greater opportunities for exploiting established software technologies, such as the PCRC libraries.
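Returning to the Cholesky example of Figure 2, the control structure can be checked with a small sequential simulation in Python. The sketch assumes 0-based subscripts and a cyclic mapping of columns onto `nprocs` simulated processes; the function names `cholesky_spmd` and `owner` are hypothetical, the simulated processes are visited one after another rather than in parallel, and the broadcast performed by DA_REMAP is modelled as a plain copy into a replicated buffer.

```python
import math


def cholesky_spmd(a, nprocs):
    """Simulate the Figure 2 algorithm; column j is held by process j % nprocs."""
    n = len(a)
    a = [row[:] for row in a]          # work on a copy of the input matrix

    def owner(col):
        return col % nprocs            # assumed CYCLIC mapping of range X

    for k in range(n - 1):
        # AT(X(K)): only the owner of column k scales it.
        for rank in range(nprocs):
            if owner(k) == rank:
                a[k][k] = math.sqrt(a[k][k])
                for l in range(k + 1, n):
                    a[l][k] /= a[k][k]

        # DA_REMAP into the replicated buffer B acts as a broadcast of column k.
        b = [a[l][k] if l > k else 0.0 for l in range(n)]

        # OVERALL(I = X(K+1:)): each process updates only the columns it holds.
        for rank in range(nprocs):
            for i in range(k + 1, n):
                if owner(i) == rank:
                    for j in range(i, n):
                        a[j][i] -= b[j] * b[i]

    a[n - 1][n - 1] = math.sqrt(a[n - 1][n - 1])   # AT(X(N))

    # Return the lower triangle; entries above the diagonal are never updated.
    return [[a[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]
```

Because every update inside the simulated OVERALL loop touches only column `i`, which the executing process holds, plus the replicated buffer, the result does not depend on the number of simulated processes.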

References (13)

  1. A. Agrawal, A. Sussman, and J. Saltz. An integrated runtime and compile-time approach for parallelizing structured and block structured applications. IEEE Transactions on Parallel and Distributed Systems, 6, 1995.
  2. B. Carpenter. Programming in ad++, 1998. http://www.npac.syr.edu/projects/pcrc/doc.
  3. B. Carpenter, G. Zhang, G. Fox, X. Li, and Y. Wen. Introduction to Java-Ad, 1997. http://www.npac.syr.edu/projects/pcrc/doc.
  4. B. Carpenter, G. Zhang, and Y. Wen. NPAC PCRC runtime kernel definition. Technical Report CRPC-TR97726, Center for Research on Parallel Computation, 1997. Up-to-date version maintained at http://www.npac.syr.edu/projects/pcrc/doc.
  5. R. Das, M. Uysal, J. Saltz, and Y.-S. Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3):462-479, Sept. 1994.
  6. High Performance Fortran Forum. High Performance Fortran language specification. Scientific Programming, special issue, 2, 1993.
  7. J. Merlin, B. Carpenter, and T. Hey. shpf: a subset High Performance Fortran compilation system. Fortran Journal, pages 2-6, Mar. 1996.
  8. J. Nieplocha, R. Harrison, and R. Littlefield. The Global Array: Non-uniform-memory-access programming model for high-performance computers. The Journal of Supercomputing, 10:197-220, 1996.
  9. R. Numrich and J. Steidel. F--: A simple parallel extension to Fortran 90. SIAM News, page 30, 1997.
  10. Parallel Compiler Runtime Consortium. Common runtime support for high-performance parallel languages. In Supercomputing '93. IEEE Computer Society Press, 1993.
  11. R. Ponnusamy, Y.-S. Hwang, R. Das, J. H. Saltz, A. Choudhary, and G. Fox. Supporting irregular distributions using data-parallel languages. IEEE Parallel and Distributed Technology, Spring, 1995.
  12. L. Snyder. A ZPL programming guide. Technical report, University of Washington, May 1997. http://www.cs.washington.edu/research/projects/zpl/.
  13. G. Zhang, B. Carpenter, G. Fox, X. Li, X. Li, and Y. Wen. PCRC-based HPF compilation. In 10th International Workshop on Languages and Compilers for Parallel Computing, 1997. To appear in Lecture Notes in Computer Science.
About the author

Over forty years experience in research and systems engineering for air and missile defense, and intelligence. Presently developing methods for tracking very-low observable airborne targets and hypersonic cruise missiles, also passive angle-only ranging and tracking. Co-authored book on Kalman filtering, and several research papers – topics include computational electromagnetics, radar tracking and data fusion, and high-performance computing (see scholar.google.com/citations?hl=en&user=iNkpinMAAAAJ). Patent on “Single-Scan Track Initiation for Radars Having Rotating Electronically-Scanned Antennas” (see patents.google.com/patent/US7508336). Recipient of NASA's Group Achievement Award for contributions to simulation and modeling technologies in high-performance computing (hardware, operating systems, and application software). IEEE member since 1974, also member of the Association of Old Crows and the American Mathematical Society. Founder/Co-founder of several successful small businesses: Applied Research and Engineering, Inc., The Ultra Corporation, and Leskiw Associates, LLC. And founding President of the Lexington Sinfonietta with Hisao Watanabe (now the Lexington Symphony), in Lexington, MA.
