printed 1996-6-10 http://www.informatik.uni-stuttgart.de/ipvr/bv/p3 All Rights Reserved Copies of this report may be ordered from: Dekanat der Fakultät Informatik, Universität Stuttgar t Breitwiesenstr. 20-22, D-70565 Stuttgar t phone: +49 (711) 7816-371 fax: +49 (711) 7816-424 Parallaxis-III User Manual Thomas Bräunl Computer Science Report Bericht Nr. 1996/08 June 1996 Universität Stuttgart Fakultät Informatik Preface The Parallaxis project started as my Ph.D. project in 1987 at USC, Los Angeles, and continued during my time at the Univ. Stuttgart, Germany. It has gone through several stages of improvements and complete new starts from scratch between the major versions listed below. The idea persisted, however, to formulate a machine independent data-parallel programming language: version 0 1988 implemented by Thomas Bräunl version 1 1989 implemented by Ingo Barth (compiler), Frank Sembach (simulator) version 2 1990 implemented by Barth, Sembach, and Stefan Engelhardt (MasPar) version 3 1995 implemented by Eduard Kappel, Harald Lampke, Hartmut Keller (all p3c compiler), Jörg Stippa (debugger) The Parallaxis software, comprising compiler, debugger, numerous parallel application programs, and documentation may be copied over the Internet via "ftp", while some information is available from the p3 web page: http://www.informatik.uni-stuttgart.de/ipvr/p3 ftp://ftp.informatik.uni-stuttgart.de/ipvr/p3 Several conference contributions, research reports, and books have been published on Parallaxis, most prominently: Bräunl: Parallel Programming, Prentice-Hall, 1993 Bräunl et al.: Parallele Bildverarbeitung, Addison-Wesley, 1995 Stuttgart, June 1996 Thomas Bräunl Contents 1 Language Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.2 Specifying Virtual Processors and Connections . . . . . . . . . . . . . . . . . . . . . . 6 1.3 Multiple Configurations and Iterative Connections. . . . . . . . . . . . . . . . . . . 9 1.4 Data Declaration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 1.5 Processor Positions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 1.6 Parallel Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 1.7 Structured Data Exchange . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 1.8 Unstructured Data Exchange . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 1.9 Exchange between Scalar and Vector Data . . . . . . . . . . . . . . . . . . . . . . . . . 21 1.10 Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 1.11 Modules. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 1.12 Input and Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 1.13 Control Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 1.14 Relation to Modula-2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 1.15 Efficiency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 2 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 2.1 Basic Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 2.1.1 Cellular Automata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 2.1.2 Generation of Fractals. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 2.1.3 Sorting. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 2.2 Image Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 2.2.1 Laplace Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 2.2.2 Dithering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 2.3 Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 3 Compiler and Debugger . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 3.1 Compiler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 3.2 Debugger. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 4 Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 4.1 Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 4.2 Built-in Functions and Procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 4.3 Graphics Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 4.4 Parallaxis-III Syntax. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 4.5 Literature. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 4 Contents Chapter 1 Language Definition Parallaxis is a machine-independent language for data-parallel programming. Sequential Modula-2 is used as the base language. Programming in Parallaxis is done on a level of abstraction with virtual processors and virtual connections, which may be defined by the application programmer. 1.1 Introduction Parallaxis [Bräunl 89], [Bräunl 91], [Bräunl 93] is based on Modula-2 [Wirth 83], extended by data-parallel concepts. The language is fully machine-independent across SIMD architectures; therefore programs written in Parallaxis run on different SIMD parallel computer systems. For a large number of (single-processor) workstations and personal computers, there is also a Parallaxis simulation system with source level debugging and tools for visualization and timing. A compiler for data-parallel programming of MIMD computer systems in SPMD style (same program, multiple data) is being developed. Parallel programs with small data sets can be developed, tested and debugged with this simulation system. Then, Parallaxis compilers can be used to generate parallel code for the MasPar MP-1 / MP-2 or the Connection Machine CM-2. The simulation environment allows both the study of data parallel fundamentals on simple computer systems and the development of parallel programs which can later be executed on expensive parallel computer systems. The programming environment for Parallaxis is available as public domain software. We had two major goals in developing a new parallel programming language: 1. We believe that `structured programming' has a number of advantages in developing sequential and parallel software, as well as for learning programming concepts in general. Therefore, we chose Modula-2 as base language and not C like many other approaches. 2. Almost all commercial SIMD programming languages are machine-dependent, since they have been specifically designed for a single hardware platform to achieve maximum performance. Therefore, one may not easily port a program from one SIMD system to another. To avoid this problem, we designed Parallaxis to be completely machine-independent. This refers to SIMD architectures only, for switching from SIMD to MIMD will almost always require to change the algorithm of your program, in order to keep it efficient. The central point of Parallaxis is programming on a level of abstraction with virtual PEs and virtual connections. In addition to the algorithmic description, every program includes a semi-dynamic connection declaration in functional form. This means that the 6 1 Language Definition desired PE network topology is specified in advance for each program (or for each procedure) and can be addressed in the algorithmic section with symbolic names (instead of complicated arithmetic index or pointer expressions). However, for completeness, also full-dynamic data exchange operations may be performed. The following describes the latest version of the language, Parallaxis-III, which is not fully compatible to older versions of the language. 1.2 Specifying Virtual Processors and Connections One or several `virtual machines' consisting of processors and a connection network may be defined for every Parallaxis program. This is done in two simple steps. First, the keyword CONFIGURATION is used to specify the number of PEs and their arrangement in analogy to an array declaration. However, at this point, no specification has been made as to the connection structure between the PEs. This follows by specifying mapping relations, introduced by the keyword CONNECTION (this second step may be omitted if no connections are required). Every connection has a symbolic name and defines a mapping from a PE (any PE) to the corresponding neighbor PE. The specification of this relative neighbor is accomplished by providing an arithmetic expression for the index of the destination PE. Data exchanges can now be carried out using these symbolic connection names in the parallel program. Figure 1.1 shows a PE arrangement as a two-dimensional grid structure in a simple Parallaxis example. The CONFIGURATION declaration provides 4 ? 5 virtual processors, which are virtually connected to one another in the following CONNECTION declaration. In Figure 1.1, PE no. [1,1] is in the upper left corner; however, you may also choose to put the `origin' in the lower left corner ? it does not matter as long as you use it consistently in your program. Since homogeneous connection structures or topologies are easy to declare, four connection declarations are sufficient to construct a grid of any size. One connection is defined for each cardinal direction. The connection to the north, for example, decrements the first index. Some connections from the border PEs lead `nowhere', which means that these connections do not exist, and will not participate in any data exchange operation. Figure 1.1: Two-dimensional grid topology with representative PE 1,1 1,5 4,1 4,5 north south east west CONFIGURATION grid [1..4],[1..5]; CONNECTION north: grid[i,j] fi grid[i-1, j]; south: grid[i,j] fi grid[i+1, j]; east : grid[i,j] fi grid[i, j+1]; west : grid[i,j] fi grid[i, j-1]; 1.2 Specifying Virtual Processors and Connections 7 In case one prefers a wrapped-around grid (torus) instead of an open grid for some application, this can be easily accomplished by using the modulo-operator (remainder). Identifiers h and w denote constants: CONFIGURATION torus [0..h-1],[0..w-1]; CONNECTION north: torus[i,j] fi torus[(i-1) MOD h, j]; south: torus[i,j] fi torus[(i+1) MOD h, j]; east : torus[i,j] fi torus[i, (j+1) MOD w]; west : torus[i,j] fi torus[i, (j-1) MOD w]; There is a list of extensions to this simple process of defining virtual machine structures: Several destination expressions may be specified after the arrow symbol, separated by commata. Connections may be parameterized, as with the hypercube in Figure 1.2. With these, it is possible to perform a data exchange in a computed direction. For the definition of the binary tree network, also in Figure 1.2, bi-directional connections (`?' or `<->' in ASCII notation) have been used instead of uni-directional connections (`fi' or `->'). A bi-directional connection is an abbreviation for two unidirectional connections and, therefore, requires a second connection name on the right hand side of the connection. So-called `compound connections' may be used to have a case distinction inside a connection. The following example connects local pairs of PEs: A case distinction is made for the next connection. If the PE-number is odd, a connection to the right neighbor is established, while if it is even, a connection to the left neighbor is established. Using compound connections, arbitrary connection Figure 1.2: Tree and hypercube topology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 [ 0 ,0, 0,0 ] [ 0,0 ,0, 1] [ 0 ,0,1 ,1] [ 0 ,1,1 ,1] [ 1,1 ,1,1 ] [ 0,0 ,1,0 ] [ 0,1 ,0 ,1] Binary Tree Hypercube CONFIGURATION tree [1..15]; CONNECTION lchild: tree[i] ? tree[2*i] :parent; rchild: tree[i] ? tree[2*i+1] :parent; an alternative specification without distinguishing between left and right children is: CONFIGURATION tree [1..15]; CONNECTION child: tree[i] fi tree[2*i], tree[2*i+1]; CONFIGURATION hyper [0..1],[0..1],[0..1],[0..1]; CONNECTION go[1]: hyper[i,j,k,l] fi hyper[(i+1)MOD 2, j,k,l]; go[2]: hyper[i,j,k,l] fi hyper[i, (j+1)MOD 2, k,l]; go[3]: hyper[i,j,k,l] fi hyper[i,j, (k+1)MOD 2, l]; go[4]: hyper[i,j,k,l] fi hyper[i,j,k, (l+1)MOD 2]; 8 1 Language Definition structures, even with irregularities, may be defined. If several parts of a compound connection evaluate to TRUE for a particular PE, then for this PE all of these connections are defined (one-to-many or 1:n connection). The connection structures as defined by the CONNECTION declaration do not have to be 1:1 (one-to-one) connections. For 1:n connections, an implicit broadcast is executed. In the following example, the first element of each row is connected to all elements in this row. CONFIGURATION grid [1..100],[1..100]; CONNECTION one2many: grid[i,1] fi grid[i,1..100]; If a one-to-many connection is to be established to all PEs of a dimension, the range may be substituted by an asterisk `*'. So the following is equivalent to the previous example. CONFIGURATION grid [1..100],[1..100]; CONNECTION one2many: grid[i,1] fi grid[i,*]; However, for n:1 (many-to-one) connections (and the general m:n connections, many-to-many) one must ensure that only a single value arrives at any PE's entry port. Therefore, each data exchange operation may include a vector reduction, as will be discussed later. In the following example, all elements of a row are connected to the first element in their column. CONFIGURATION grid [1..3],[1..5]; CONNECTION many2one: grid[i,j] fi grid[i,1]; Figure 1.3: Compound connections Figure 1.4: Multiple connections 1 2 3 4 5 6 7 8 CONFIGURATION list [1..8]; CONNECTION next: list[i] fi {ODD (i)} list[i+1], {EVEN(i)} list[i-1]; one-to-many many-to-one 1.3 Multiple Configurations and Iterative Connections 9 1.3 Multiple Configurations and Iterative Connections In addition to the constructs shown so far, the definition of multiple topologies in a program is possible. These may be defined independently of each other on separate groups of PEs ? in which case the topologies may have different vector data structures. Or the topologies can be defined as `different views' of the same set of PEs with identical data structure. Furthermore, local topologies may be defined in procedures, thus allowing semi-dynamic connection structures. Different configuration definitions denote different sets of PEs. For example, the following declaration defines two distinct sets of PEs: CONFIGURATION grid [1..200],[1..50]; tree [1..10000]; On the other hand, configurations may be defined as a different view of the same set of PEs. In this case, the numbers of PEs have to be identical. In that case, each PE a CONFIGURATION grid [1..200],[1..50]; tree [1..10000] = grid; Connections may be specified between multiple configurations, whether they belong to the same or distinct sets of PEs: CONFIGURATION grid [0..199],[0..49]; tree [1..10000]; CONNECTION mix: grid[i,j] -> tree[i*50 + j]; However, a connection may only link PEs within the same or two different configurations. A connection between three or more configurations is not allowed, and a connection linking PEs within one configuration and PEs to another configuration (by the same connection name) is not allowed. Configurations only come to life, when they are used for the declaration of a vector variable, which is later used in a computation. A vector variable based on a configuration with multiple views can make use of all connection structures defined for any of the views. Such a variable is also assignment compatible to a variable of the same type but of a different view. An extension to the definition of connections is the use of iterative connection functions, as exemplified by the following alternative definition for a hypercube network of arbitrary size (the exponentiation function is denoted by `**', n is a constant): CONFIGURATION hyper [0..(2**n-1)]; CONNECTION FOR k := 0 TO n-1 DO dir[k]: hyper[i] ? {EVEN(i DIV 2**k)} hyper[i + 2**k] :dir[k]; END; If n equals 10, there are 1,024 PEs defined together with ten bi-directional connections. Expression EVEN(i DIV 2**k) tests, whether the k-th bit of i equals 0. A large program may be split into several modules, which are compiled separately. So, e.g. for a module containing library functions, it may be desirable not to specify the size of a configuration. When writing routines for image processing, the size of the 10 1 Language Definition grid structure should be left unspecified and will be defined later by the module importing these routines. An open configuration is indicated by using an asterisk `*' instead of a value range. The configuration size may be determined dynamically at run time, e.g. by passing a parameter that is subsequently used as an upper bound in the configuration declaration. DEFINITION MODULE Open; CONFIGURATION grid[*],[*]; CONNECTION left: grid[i,j] <-> grid[i ,j-1] :right; up : grid[i,j] <-> grid[i+1,j ] :down; PROCEDURE sum_3x3(input: grid OF INTEGER): grid OF INTEGER; END Open. MODULE Main; FROM Open IMPORT grid, sum_3x3; CONFIGURATION my_grid = grid[1..10],[1..10]; VAR a,b: my_grid OF INTEGER; BEGIN a := 1; b := sum_3x3(a); WriteInt(b,5); END Main. Open configurations are needed when a procedure is to work on a vector of unspecified size, but has to make use of connections for data exchange or position data (DIM, see section on processor positions). If connections and position data are not required in a procedure, which is to be used for different configurations (different size or arbitrary configuration), then the simpler concept of generic vector parameters may be used (VECTOR, see section on data declaration). 1.4 Data Declaration Parallaxis differentiates between scalar and vector variables in data declarations as well as in procedure parameters and results. Scalar data is placed on the control processor, while vectors are distributed component-wise among the virtual PEs (in Figure 1.5 configurations are not connected). Since a program may contain several PE Figure 1.5: Allocation of scalar and vector data on control processor a distributed on PEs b b b b b b b b b b b b b b b b b b b b distributed on PEs c c c c c c c c c c c c c c c VAR a: INTEGER; (* scalar *) b: grid OF REAL; (* vector *) c: tree OF CHAR; (* vector *) 1.4 Data Declaration 11 configurations with different numbers of PEs, the name of a configuration is used for specifying the vector type of a variable. There is, of course a fundamental difference between declaring an array of vectors and declaring a vector of arrays, as exemplified in Figure 1.6. However, both indexed expressions x[1] and y[1] refer to a data structure of the same type "grid OF INTEGER". Unfortunately, this declaration semantics has some unpleasant effect for procedure arguments. Imagine, e.g. writing a function factorial, for computing the factorial value for an argument of type INTEGER. Now, a factorial function would have to be declared for scalar arguments, and for every configuration defined in a program. Since there is no way of knowing them in advance, it would be impossible to write general library routines. To remedy this situation, parameters and variable declarations inside such a procedure may use the keyword VECTOR instead of a particular configuration name. This indicates that a parameter will be used in a parallel computation, without specifying a particular configuration (this results in a generic procedure). All parameters declared as generic vectors or variables in such a procedure have to belong to the same configuration. Since no particular configuration has been specified, no data exchange may be performed in such a procedure. PROCEDURE s_factorial(a: INTEGER): INTEGER; VAR b: INTEGER; (* scalar *) ... END s_factorial; PROCEDURE v_factorial(a: VECTOR OF INTEGER): VECTOR OF INTEGER; VAR b: VECTOR OF INTEGER; (* any vector *) ... END v_factorial; It is possible that several procedure parameters use the "open vector" or the "open configuration" construct for possibly different configurations. In this case, a local variable declaration with the method used above is not unique. Therefore, the name Figure 1.6: Array of vectors and vector of arrays x [ 1 ] x [ 1 ] x[ 1 ] x[ 1 ] x[ 1 ] x [ 1 ] x [ 1 ] x[ 1 ] x[ 1 ] x[ 1 ] x [ 1 ] x [ 1 ] x[ 1 ] x[ 1 ] x[ 1 ] x [ 1 ] x [ 1 ] x[ 1 ] x[ 1 ] x[ 1 ] x [ 2 ] x[ 2 ] x[ 2 ] x[ 2 ] x [ 2 ] x [ 2 ] x[ 2 ] x[ 2 ] x[ 2 ] x [ 2 ] x [ 2 ] x[ 2 ] x[ 2 ] x[ 2 ] x [ 2 ] x [ 2 ] x[ 2 ] x[ 2 ] x[ 2 ] x [ 2 ] x[ 3 ] x[ 3 ] x[ 3 ] x [ 3 ] x [ 3 ] x[ 3 ] x[ 3 ] x[ 3 ] x [ 3 ] x [ 3 ] x[ 3 ] x[ 3 ] x[ 3 ] x [ 3 ] x [ 3 ] x[ 3 ] x[ 3 ] x[ 3 ] x [ 3 ] x [ 3 ] y [ 1 , 2, 3 ] y [ 1 , 2, 3 ] y [ 1 , 2, 3 ] y [ 1, 2, 3] y [ 1, 2 , 3] y [ 1 , 2, 3 ] y [ 1 , 2, 3 ] y [ 1 , 2, 3 ] y [ 1, 2, 3] y [ 1, 2 , 3] y [ 1 , 2, 3 ] y [ 1 , 2, 3 ] y [ 1 , 2, 3 ] y [ 1, 2, 3] y [ 1, 2 , 3] y [ 1 , 2, 3 ] y [ 1 , 2, 3 ] y [ 1 , 2, 3 ] y [ 1, 2, 3] y [ 1, 2 , 3] VAR x: ARRAY[1..3] OF grid OF INTEGER; y: grid OF ARRAY[1..3] OF INTEGER; array of vector vector of arrays 12 1 Language Definition of the "open vector" or open configuration parameter may follow the keyword VECTOR or the open configuration name in the type specification. PROCEDURE vec(a: VECTOR OF INTEGER; b: VECTOR OF INTEGER); VAR x: VECTOR a OF INTEGER; (* vector type corresponding to a *) y: VECTOR b OF INTEGER; (* vector type corresponding to b *) ... END vec; CONFIGURATION grid[*],[*]; PROCEDURE open(a: grid OF REAL; b: grid OF REAL); VAR x: grid b OF REAL; (* vector type corresponding to b *) ... END open; 1.5 Processor Positions There are two ways to determine a PE's current position. The first is by using the (vector-valued) standard function ID, which returns the virtual processor position as a single number in row-major ordering (or `highest-dimension-major' for more than two dimensions): This results in each component of x being assigned the number of its virtual PE, always starting with `1' (independent of the configuration range and number of dimensions): The second way of determining the position of a virtual PE, now with respect to its configuration declaration, is to use the standard function DIM. This function takes the configuration name and the number of the dimension as arguments and returns the position of a PE within this dimension. Dimensions are numbered from right to left, that is, the highest dimension is at the leftmost position. The position values returned by function DIM depend on the ranges of the configuration definition. The DIM function will return exactly this range of values as position data. In the following, position data for rows and columns of the grid example above are shown: CONFIGURATION grid [1..4],[-2..+2]; ... VAR x: grid OF INTEGER; ... x := ID(grid); 2nd dimension 1st dimension ID grid ( ) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 = 1.5 Processor Positions 13 Functions, ID and DIM, as well as the following functions, do not depend on any PE connection. Further functions working on configuration data are shown below. These functions take either configurations or vector variables as arguments. LEN(grid) = 20 returns the total number of PEs in a configuration LEN(grid,1) = 5 returns the size of a dimension (here dim. 1) LEN(grid,2) = 4 returns the size of a dimension (here dim. 2) RANK(grid) = 2 returns the number of dimensions LOWER(grid,1) = ?2 returns the lower bound of a dim. (here dim. 1) UPPER(grid,1) = 2 returns the upper bound of a dim. (here dim. 1) A function and a procedure are provided to transfer ID values into DIM values and vice versa. An array is used for DIM values. Both routines may be used with scalar or vector parameters. These routines are especially useful for converting ID and DIM values of multiple configurations. VAR dims: ARRAY[1..2] OF INTEGER; s : INTEGER; ... dims[1] := 1; dims[2] := 2; s := DIM2ID(grid,dims); s becomes 9 (referring to grid definition above) ... ID2DIM(grid,12,dims); dims[1] becomes ?1 dims[2] becomes +3 An individual PE may be selected by using an IF-selection involving parallel position data like ID and DIM. However, supplying the position values directly for a vector is also possible and gives a simpler way of selecting PEs. A single PE may be selected using a component expression with scalar arguments after the identifier of a vector variable: VAR x : grid OF INTEGER; (* 2-dim. *) s,t: INTEGER; (* scalar *) ... s := x <<12>>; get the value of the PE with ID 12 x <<12>> := s; set the value of the PE with ID 12 s := x <:3,t+1:>; get value of the PE in row 3 and column t+1, according to CONFIGURATION ranges specified x <:t,1:> := s; set the value of the PE in row t and column 1 DIM grid 2 , ( ) 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4 = DIM grid 1 , ( ) 2 ? 1 ? 0 1 + 2 + 2 ? 1 ? 0 1 + 2 + 2 ? 1 ? 0 1 + 2 + 2 ? 1 ? 0 1 + 2 + = 14 1 Language Definition 1.6 Parallel Execution Parallel execution is implicit in Parallaxis-III, depending on the declaration of variables involved in a statement or expression. PE-selection (determining which PEs will be active during a certain statement) is also implicit. Any selection or iteration instruction (IF, FOR, WHILE, REPEAT, CASE, LOOP) with a vector argument may be used. Figure 1.7 shows the data-parallel execution of a statement on a selected group of PEs. So whenever a selection is performed, e.g. by an IF statement with vector condition, only those PEs are active during execution of the THEN branch, whose local condition evaluates to TRUE. A THEN branch or an ELSE branch will only be executed if the condition (or its negation, respectively) will be satisfied by at least one PE. Primarily, this has an effect on any scalar statements that may be in such a branch, since in this case they will be skipped. In the general case, when the condition evaluates to TRUE for some PEs, but evaluates to FALSE for some other PEs, then both THEN branch and ELSE branch will be executed subsequently (first THEN, afterwards ELSE) with the appropriate group of PEs being active (this also holds for any scalar statements that may be contained in these branches). If vector IF statements are nested, then in one of the inner branches only a subset of the PEs of the corresponding outer branch is active. VAR x: grid OF INTEGER; ... IF x>5 THEN x := x - 3 ELSE x := 2 * x END; PE-ID: 1 2 3 4 5 initial values of x: 10 4 17 1 20 starting then-branch: 10 ? 17 ? 20 (`?' means inactive) after then-branch: 7 ? 14 ? 17 starting else-branch: ? 4 ? 1 ? after else-branch: ? 8 ? 2 ? selection done after if-selection: 7 8 14 2 17 Figure 1.7: Data parallel instruction x :=a+b x :=a+b x :=a+b x :=a+b x :=a+b x :=a+b x :=a+b x :=a+b x :=a+b x :=a+b VAR x,a,b: grid OF REAL; ... IF DIM(grid,2) IN {2,3} THEN x := a+b END; 1.6 Parallel Execution 15 The possibly subsequent execution of THEN- und ELSE-branches may lead to unexpected side effect, which are shown in the following sample program: VAR x: grid OF INTEGER; s: INTEGER, (* scalar *) ... IF x>5 THEN x := x - 3; INC(s); ELSE x := 2 * x; INC(s); END; initial values of s: 1 after then-branch: 2 after else-branch /if: 3 When entering a loop with vector condition (e.g. WHILE loop), only those PEs are active which satisfy the condition. In subsequent iterations of this loop, the number of PEs is always decreasing. The loop iterates until no PE is left to satisfy the loop condition (see the following example). VAR x: grid OF INTEGER; ... WHILE x>5 DO x:= x DIV 2; END; PE-ID: 1 2 3 4 5 initial values of x: 10 4 17 1 20 starting 1st iteration: 10 ? 17 ? 20 (`?' means inactive) after 1st iteration: 5 ? 8 ? 10 starting 2nd iteration: ? ? 8 ? 10 after 2nd iteration: ? ? 4 ? 5 starting 3rd iteration: ? ? ? ? ? loop terminates after loop: 5 4 4 1 5 Other control structures, known from sequential Modula-2 may be used as well in vector context. The CASE-selection can be treated as a nested chain of IF-THEN-ELSIF- selections, while FOR- and REPEAT-loops can be regarded as modifications of a WHILE- loop. The parallel LOOP-EXIT construct differs from its sequential counterpart, for it has to specify the name of the selected configuration, e.g.: LOOP OF grid DO ... IF x>0 THEN EXIT END; ... END; (* loop *) 16 1 Language Definition In case one would like to perform an operation on all PEs inside a nested selection or loop, the ALL block may be used to re-activate all PEs of a configuration. IF x>0 THEN ... (* only grid PEs are active, which satisfy condition *) ALL grid DO ... (* all grid PEs are active, regardless of condition *) END ELSE ... (* only grid PEs are active, which do not satisfy cond. *) END; 1.7 Structured Data Exchange Data exchanges between processors can be accomplished with simple symbolic names, thanks to the network declaration described earlier. Data exchange of a local vector variable between all or just a group of PEs can be invoked by calling system function MOVE with the name of a previously defined connection. Only active PEs participate in a data exchange operation. Figure 1.8 shows an example of a data exchange in the grid structure defined in section 1.2. The expression returns vector variable x shifted one step (one PE) to the east. For the data exchange operation shown above, sender-PE and receiver-PE of a data exchange have to be active. For the operations SEND and RECEIVE shown below, it is sufficient for only the sender (or only the receiver, respectively) to be active. These operations are especially needed for the data exchange between different topologies, since due to the SIMD model only one PE structure can be active at a time. Unlike the other data exchange operations, SEND is a procedure (not returning a value) and therefore takes two arguments, first the expression to be sent, and second the variable to receive the expression. SEND.east(4*x, y); y := RECEIVE.north(x); The following comparison shows the differences in data exchange operations by using a simple list topology. In figures, an arrow represents data transport, while a white space marks an inactive PE. Let us assume for the context of the following data exchange operations that all PEs are active but one (e.g. the data exchange might occur inside an IF-selection, which deactivated one of the PEs). CONFIGURATION list[1..max]; CONNECTION right: list[i] -> list[i+1]; VAR x,y: list OF INTEGER; Figure 1.8: Synchronous data exchange y := MOVE.east(x); 1.7 Structured Data Exchange 17 a) Only the sender has to be active. All active PEs, which have a successor, send data. SEND.right(x,y); active inactive active active Example: before after x: 1 2 3 4 x: 1 2 3 4 y: 0 0 0 0 y: 0 1 0 3 b) Only the receiver has to be active. All active PEs, which have a predecessor, receive data. y := RECEIVE.right(x); Example: before after x: 1 2 3 4 x: 1 2 3 4 y: 0 0 0 0 y: 1 0 2 3 c) Both sender and receiver have to be active. All active PEs, which have an active successor, send data. y := MOVE.right(x); Example: before after x: 1 2 3 4 x: 1 2 3 4 y: 0 0 0 0 y: 1 0 3 3 d) Neither sender nor receiver have to be active. All PEs, which have a successor, send data ? independent of their activation status. This version does not seem to make much sense, so it is not included in Parallaxis (use ALL plus data exchange statement instead). Please note: - SEND has to be a procedure for it writes to inactive PEs, while MOVE and RECEIVE are functions returning vector data. - All data exchange operations shift data in the direction indicated by the connection identifier (here: right). This also holds for the RECEIVE function, so it does not have the semantics "receive from a direction" (e.g. RECEIVE.right shifts data to the right, not to the left). Additional data exchange modifiers may be specified for some of the data exchange operations. One is for multiple data movement steps and one is for an implicit reduction; both are separated by colons. 18 1 Language Definition A data exchange operation may be executed several steps (subsequently) in a specified direction for the SEND statement. After each step, all PEs that received data will be active, sending data in the next step. In case one data element is sent to multiple PEs, then all of these will be active in the next step. This feature is extremely useful when exchanging data over a tree structure. VAR x,y: list OF INTEGER; ... SEND.right:2 (x,y); move data two steps to the east Example: (only PEs 1 and 2 are active, PEs 3 and 4 are inactive) before copy expr. 1st step 2nd step assignment x: 1 2 3 4 ? ? ? 1 2 3 4 intermediate variable: ? 1 2 3 4 1 1 2 4 1 1 1 2 ? y: 0 0 0 0 ? ? ? 0 0 1 2 A problem might occur, if several PEs are connected to a single one (many-to-one connection). There are two possibilities to avoid an undetermined result (any of the incoming data values could be chosen). One can either deactivate unwanted PEs (IF- selection), so they cannot participate in a data exchange, or one can use a reduction function with the data exchange. For example, in the tree network shown before, one might want to send only the left children's data to the parents and discard the right children's data: VAR u,v,w: tree OF INTEGER; ... v := u; initializes all components of v IF EVEN(ID(tree)) THEN moves only the left children's data up the tree SEND.parent (u,v) END; Now assume, one does not want to discard information, but one would like to add the left and right child's data before sending to the parent node: w := MOVE.parent:#SUM (u); moves data from both children to the parent, resolving multiple arriving data by adding There is a number of system-defined operations to do this reduction, and also userdefined operations may be specified (see section on reduction below). Figure 1.9: Move without and with reduction u before 1 5 3 2 4 1 0 v after 5 2 0 2 4 1 0 w after 8 6 1 2 4 1 0 1.8 Unstructured Data Exchange 19 Configuration boundaries often cause trouble in SIMD programming, for they frequently require special treatment to avoid undefined data. This is not the case for Parallaxis. Here, it is allowed to send data off the edge and try to receive data from beyond the edge of a configuration. After initializing the send expression with the vector parameter value supplied, data sent offside a configuration is deleted, while an attempt to read from beyond leaves the particular PE's data unchanged. Therefore, within the same configuration, undefined values cannot occur in a data exchange operation. 1.8 Unstructured Data Exchange Structured data exchange makes application programs easy to write and understand. In some cases it also makes them faster, when better use can be made of the physical connection structure of a particular parallel system. However, it may be desirable to perform an unstructured data exchange. This reflects an arbitrary permutation of the components of a vector variable, which may be difficult to write down using only structured data exchanges with user-defined connections. For example, each component of a two-dimensional vector (a matrix) is to be sent to a destination address, which is being computed at run-time. When only structured data exchange is possible, e.g. via a grid, one has to program a communication procedure, which shifts the matrix elements in several steps over the grid. This approach will work, however, some parallel computer systems (like the MasPar MP-1 and MP-2 [MasPar 91] and the Connection Machine CM-2 [Thinking Machines 89]) have a global connection structure, which allows an arbitrary unstructured data exchange. In this case, specifying direct destination addresses for each component of a vector variable may result in a faster program. Please note, that despite the availability of specialized commands for unstructured data exchange, execution of those may be quite expensive. For example, a grid operation at the MasPar MP-1 requires about the same time as a simple arithmetic operation (addition), but a nongrid data exchange takes about 100 times longer to execute. Please note that the unstructured data exchange is nevertheless a machine-independent operation. If a certain SIMD architecture does not provide a general communication structure, then this data exchange will be routed transparently over the simpler network provided (e.g. a grid or a ring) taking several execution steps. In Parallaxis, the SEND and the RECEIVE operations may take an index expression instead of a connection name. As before, when using SEND, only active PEs send data, and when using RECEIVE, only active PEs receive data. However, these two operations differ in their index semantics, as is shown for an example in Figure 1.10. In order to avoid confusion, operation MOVE may not be used with an index expression. VAR x,y,index: grid OF INTEGER; ... SEND.<>(x,y); sends data from all components of x to a destination, determined by vector index y := RECEIVE.<>(x); receives data from all components of x to a destination, determined by vector index, however, on the receiver's side 20 1 Language Definition Besides using a single index, referring to the ID position of PEs, several indices referring to DIM positions may be used as well. Also, this kind of data exchange does not have to be a one-to-one correspondance. If several indices refer to the same PE position, for RECEIVE (one-to-many) this results in a broadcast, while for SEND (manyto-one) an arbitrary component is selected ? unless a reduction operation (see section on reduction) is specified for resolving collisions. CONFIGURATION grid [1..2],[1..3]; VAR x,y, index,d1,d2: grid OF INTEGER; ... SEND.<:d2,d1:> (x,y); sends data from all components of x to a destination, determined in the first dimension by d1, and in the second dimension by d2 SEND.<>:#SUM (x,y); in case the expression index does not provide a 1:1 permutation, it may be desirable to perform a reduction of multiple values arriving in one port, in order to avoid the assignment of an arbitrary one of these; positions not indexed get the sender's original elements Figure 1.10: Unstructured data exchange PE 1 2 3 4 x index 7 4 9 5 3 1 4 2 y 4 5 7 9 PE 1 2 3 4 index y 3 1 4 2 9 7 5 4 PE 1 2 3 4 PE 1 2 3 4 x 7 4 9 5 SEND.<> (x,y); y := RECEIVE.<> (x); x 7 3 5 4 2 3 = d2 1 2 1 2 1 2 = d1 3 2 1 3 2 1 = then E.g. y 5 2 7 3 3 4 = x 7 3 5 4 2 3 = index 1 3 2 6 6 6 = then E.g. y 7 5 3 4 2 9 = 1.9 Exchange between Scalar and Vector Data 21 Two kinds of abbreviations are possible for data exchanges with index expressions: a) If the positions in one dimension are to remain unchanged, one should use the expression DIM(conf_name, dim_no) as an index. This may be abbreviated with the symbol `*'. SEND.<:DIM(grid,2),d1:> (x,y); is equivalent to: SEND.<:*,d1:> (x,y); b) If a dimension is to be collapsed, a many-to-one data exchange may be used in combination with a reduction. For example, a matrix may be collapsed to a single column by reducing all of its rows. This can be done by sending all row elements to the first element in the same row with a reduction operation. This may be abbreviated by using a reduction operation (e.g. #SUM) instead of an index expression. Only a single reduction may be specified in an index expression. SEND.<:d2,1:> :#SUM (x,y); is equivalent to the general: SEND.<:d2,LOWER(grid,1):> :#SUM (x,y); is equivalent to: SEND.<:d2,#SUM:> (x,y); dimension 1 is reduced by addition (rows accumulate data in their first positions) with permutation in dimension 2 according to d2 1.9 Exchange between Scalar and Vector Data Communication between the control processor and the parallel PEs also requires additional language constructs or in some cases an adapted semantics. Transferring a scalar field into a parallel vector is invoked with procedure LOAD, while transferring data back into a scalar field from a vector is accomplished with STORE (Figure 1.8). Only active PEs participate in this sequential data exchange. STORE with inactive PEs does not result in gaps in the scalar array, but data elements are stored subsequently. LOAD with inactive PEs assigns the next array value to the next active PE, no scalar array elements will be skipped. Surplus elements will not be used, too few elements leave the corresponding array elements (or vector components, respectively) unchanged. The execution of this operation usually requires n time steps for a data array with n elements. A scalar integer variable may be specified as an optional third parameter for LOAD and STORE, which limits the number of data items transferred and also receives the number of data items actually transferred after the operation. SUM x 7 3 5 4 2 3 = d2 1 1 1 2 2 2 = E.g. then y 15 3 5 9 2 3 = 22 1 Language Definition Figure 1.8 (bottom) also shows an assignment in which a (constant or variable) scalar data value is copied into all or a group of PEs. Every component of the vector contains the same value as the scalar. This operation is implemented by an implicit broadcast and therefore requires only a single time step. 1.10 Reduction The reduction of a vector to a scalar is another important operation. The REDUCE operation handles this task in conjunction with a system-defined or user-defined (programmable) reduction operation (see Figure 1.12). System-defined operators are: SUM, PRODUCT, MAX, MIN, AND, OR, FIRST, LAST The operators FIRST and LAST return the value of the first or last currently active PE, respectively, according to its identification number (ID). Example: s := REDUCE.FIRST(x) is identical to: pos := REDUCE.MIN(ID(x)); s := x<>; All other reduction operators' functions can easily be deduced from their names. The execution of a reduction operation requires about log2 n time steps for a vector with n active components. However, this time estimation depends on the physical connection structure of the PEs. Figure 1.11: Data exchanges between PEs and control processor STORE(v, s); (* from vector to scalar *) STORE(v, s, t); (* here, t becomes num. active PEs *) (* require n steps each *) Scalar Array Vector Components 4 2 6 3 1 2 inactive 4 2 6 3 1 surplus LOAD STORE v := t; (* requires 1 step *) Scalar Value Vector Components 7 7 7 7 7 7 inactive CONFIGURATION list[1..n]; VAR s: ARRAY[1..n] OF INTEGER; t: INTEGER; v: list OF INTEGER; ... LOAD (v, s); (* from scalar to vector *) 1.11 Modules 23 The following example shows the use of the REDUCE operation with a user-defined function. Such a function has to have two vector input parameters and has to return a vector value of the same type. Note that the reduction function implemented by the user should be associative and commutative, or unpredictable results may occur, e.g. (1 ? 2) ? 3 ? 1 ? (2 ?3 ). VAR v: grid OF BOOLEAN; s: BOOLEAN; ... PROCEDURE xor (a,b: VECTOR OF BOOLEAN): VECTOR OF BOOLEAN; BEGIN RETURN(a <> b); END xor; ... s := REDUCE.xor(v); There are a few places, where substituting a scalar constant in lieu of a vector variable makes sense, but lacks information about the configuration to be used. Consider the problem of counting the number of active PEs for some configuration. Instead of using a vector variable, the constant 1 can be used for each PE, however, it has to be type cast to the appropriate configuration: s := REDUCE.SUM( grid(1) ); 1.11 Modules In Parallaxis, like in Modula-2, each module consists of two files: a definition module and an implementation module. The only exception is the main module, which starts a program and does not have a definition module. Parallaxis also offers a FOREIGN module, which serves as the definition part for linking routines written in another language. The sample module constellation above shows declaration, export, and import of the user-defined procedure myproc. It is exported in the definition module together with its procedure head, showing number and type of parameters. The actual implementation of myproc is hidden in the corresponding implementation module (this is also used for including routines from different programming languages). If an export list is missing in a definition module, then its whole contents is being exported. Figure 1.12: Vector reduction in Parallaxis + + + 7 6 0 3 13 3 16 VAR s: INTEGER; x: grid OF INTEGER; s := REDUCE.SUM(x); Control Processor PEs 24 1 Language Definition The module importing and using procedure myproc in the example is the main module (it has neither keyword DEFINITION nor IMPLEMENTATION). There are two ways of importing objects. The first imports individual objects from a module, the second includes all objects from a module (in that case, however, an object name always requires the module name as a prefix, in order to avoid name conflicts): FROM sample IMPORT myproc; IMPORT sample; ... ... myproc(k); sample.myproc(k); 1.12 Input and Output I/O operations in Parallaxis are very similar to Modula-2. However, in Parallaxis they are "built-in" and need not be explicitly imported as in Modula-2. This change was required, since read- and write-operations in Parallaxis may take either scalar or vector arguments. For vector output, spaces and line breaks are inserted to make the printout more readable. The major text input and output operations in Parallaxis are shown below (ASCII data). I/O operations do not generate run-time errors (e.g. in case of inappropriate data to be read or in case of insufficient disk space during a write operation). Instead, the scalar boolean variable Done is set according to the success of the I/O operation. This variable can be checked to perform appropriate actions, in case its value is FALSE. When reading strings, integers, cardinals, reals, or booleans, the scalar character variable termCH is set to the next unread character in the input stream, which was responsible for terminating the read operation. EOL is a constant of type character, which stands for the system-dependent end-of-line character. Numbers are printed right-adjusted (add leading blanks), while strings/booleans are printed left-adjusted (add trailing blanks). WriteLn; Start a new line (no arguments) Write(c); Write character c WriteString(s) Write string s WriteInt(i,l); Write integer i using l print spaces min. WriteCard(c,l); Write cardinal c using min. l print spaces min. WriteReal(r,l); Write real r using l print spaces min. DEFINITION MODULE sample; EXPORT myproc; PROCEDURE myproc(VAR i: INTEGER); END sample. IMPLEMENTATION MODULE sample; PROCEDURE myproc(VAR i: INTEGER); BEGIN i := 2*i + 1 END myproc; END sample. MODULE mymain; FROM sample IMPORT myproc; VAR k: INTEGER; BEGIN k:=0; myproc(k); END mymain. 1.13 Control Structures 25 WriteFixPt(r,l,m); Write real r using l print sp. min. and m decimals WriteBool(b,l); Write boolean b using l print spaces min. Read(c); Read character c ReadString(s) Read string s ReadInt(i); Read integer i ReadCard(c); Read cardinal c ReadReal(r); Read real r ReadBool(b); Read boolean b OpenOutput(s); Open file with name s for writing (following write operations write to file) CloseOutput; Close file, redirect output back to stdout OpenInput(s); Open file with name s for reading (following read operations read from file) CloseInput; Close file, redirect input back to stdin The following gives a number of sample write operations; read operations work the same way. VAR i: INTEGER; r: REAL; ... WriteString("Hello"); (* write string on screen *) WriteInt(i,7); (* write integer value using 7 print spaces *) WriteLn; (* start writing in a new line *) OpenOutput("myfile"); (* open file, redirect output to file *) WriteString("Hello");(* write string to file *) WriteFixPt(r,9,2); (* write r to file, 9 print spaces, 2 dec.*) CloseOutput; (* close file, output back to screen *) WriteString("Hello"); (* write string on screen *) 1.13 Control Structures Sequential control structures are also identical to Modula-2. Please note, that in Modula-2 (unlike Pascal) all control structures have to be terminated with keyword END (with the exception of REPEAT, which has a different terminating keyword) and all procedures/functions repeat the procedure name with the final END (see the following examples). All control structures may be used with scalar or vector arguments. IF x=0 THEN y:=1; z:=5 FOR x:=1 TO 10 DO ELSE y:=2 y:=2*y END; END; WHILE x>0 DO REPEAT x:=x-1; y:=2*y x:=x DIV 2 END; UNTIL x<7; 26 1 Language Definition LOOP x:=x-1; CASE x OF IF x<7 THEN EXIT 1,3,7: z:=5; y:=3 | END; 8..15: z:=1 | END; ELSE z:=3; y:=4 END; PROCEDURE abc(VAR c:CHAR); PROCEDURE def(x: INTEGER): INTEGER; (* reference parameter *) (* value param. + return value *) BEGIN (* procedure *) BEGIN (* function *) ... RETURN(x+1) END abc; END def; 1.14 Relation to Modula-2 Parallaxis is a true extension of sequential Modula-2, with one exception. Nesting of local modules (these are nested modules within the same file) is not allowed, as an implementation restriction. However, nested imports of several modules are possible. Besides this limitation, there are also some sequential extensions to the Modula-2 syntax. First, a comparison may have an arbitrary number of expressions. This allows range checks not possible in Modula-2, like the following: IF 7 < x < 12 THEN ... END; Next, the power operator `**' has been included. It is also possible to specify constant records or constant arrays by using the type name as a function identifier (only for record and arrays of simple types). This is a useful feature for initializing record or array variables: TYPE colorR = RECORD red, green, blue: INTEGER; END; colorA = ARRAY [1..3] OF INTEGER; VAR c: colorR; d: colorA; ... c := colorR(255,255,0); d := colorA(0,255,255); Like in the programming language C, the number and string contents of the command line parameters can be read in Parallaxis: VAR i : INTEGER; buf: ARRAY [1..20] OF CHAR; ... FOR i:=1 TO argc DO argv(i,buf); WriteString(buf); WriteLn; END; Mathematical and I/O operations are pre-defined in Parallaxis and do not have to be explicitly imported, as in Modula-2. This is necessary, since all the procedures and functions may take either scalar or vector parameters. 1.15 Efficiency 27 There is some confusion about the MOD operator in Modula-2 and C implementations. In Parallaxis, the result of a modulo operation will always be positive, especially an expression like (-1) MOD 5 equals 4 in Parallaxis (as does 1 MOD (-5)), and not -1 as in C. This feature is heavily used in connection specifications, e.g. to construct a torus: CONFIGURATION ring[0..max-1]; CONNECTION left: ring[i] -> ring[(i-1) MOD max]; 1.15 Efficiency Most of the following comments apply only to the generation of parallel code for a SIMD system and not for the simulation on a single-processor workstation. Naturally, some Parallaxis statements require more computation time than others. This overview summarizes remarks made throughout the text. 1. PE Data Exchange structured data exchange may be faster than unstructured data exchange 1.a especially a structured data exchange on a grid may be much faster (if it corresponds to the hardware, e.g. 100 times on a MasPar MP-2 system) 1.b a structured data exchange on a torus may be as fast as on a grid, only if the torus size equals a possible torus in hardware 1.c using the open configuration construct will cause some run-time overhead, since the connections have to be initialized 2. Reduction 2.a reduction operations require log2 n steps for n PEs, each step comprising a data exchange and an arithmetic operation (in general this data exchange will be unstructured) 2.b using a data exchange with reduction may be very expensive and should therefore be avoided if possible (connection groups may have to be executed sequentially) 3. Front End Data Exchange data exchange to and from the front end (LOAD and STORE), as well as read and write operations with vector data require n steps for n PEs (sequential execution) 4. Broadcasting 4.a broadcasting a single value from the front end to all PEs requires only 1 step 4.b broadcasting a single value from one PE to all other PEs requires 2 steps (1 from PE to front end plus 1 from front end to all PEs) 28 1 Language Definition Chapter 2 Applications The following sections demonstrate a number of sample algorithms programmed in Parallaxis-III. They cover a broad range of application areas and should inspire to get the feeling of data parallel programming. 2.1 Basic Applications 2.1.1 Cellular Automata Cellular automata are an application area well suited to SIMD systems. Every cell can be assigned a processor and carries out the same processing instructions. One of the most prominent cellular automata is Conway's `Game of Life', which is a two-dimensional structure changing in time. The cellular automaton shown here is simpler and only onedimensional; however, it generates a two-dimensional image during execution (one line on every iteration). The processing instructions are conceptually simple: Every cell has only two possible states and computes its successor state from an exclusive-or of the states of its left and right neighbor. The middle cell is initialized with TRUE (printed as `X'), while all of the other cells are initialized with FALSE (printed as empty spaces). MODULE cellular_automaton; CONST n = 79; (* number of elements *) m = (n+1) DIV 2; (* number of loops := middle *) CONFIGURATION list [1..n]; CONNECTION left: list[i] <-> list[i-1] :right; VAR i : INTEGER; val: list OF BOOLEAN; c : list OF ARRAY BOOLEAN OF CHAR; (* = ARRAY[FALSE..TRUE] OF CHAR *) BEGIN val := ID(list) = m; (* Init *) c[FALSE]:= " "; c[TRUE] := "X"; FOR i := 1 TO m DO Write(c[val]); val := MOVE.left(val) <> MOVE.right(val); END; END cellular_automaton. The CONFIGURATION and CONNECTION declarations define a doubly linked list of PEs. For screen output of the current state of all PEs, every boolean state is converted to a 30 2 Applications symbol of type CHAR. The complete character vector is then written to the screen sequentially by a vector valued Write operation. The initialization of the main program assigns the value TRUE only to the middle PE with number (n+1 DIV 2); all other PEs receive the value FALSE. Finally, there is a scalar loop which prints the current state and calculates the next cell state in each pass, using data exchange with the left and right neighbors. Figure 2.1 shows the states of this cellular automaton, in which time progresses from top to bottom. X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X Figure 2.1: Output of cellular automaton 2.1.2 Generation of Fractals The algorithm presented here is a problem that can be solved by a data parallel program with the `divide-and-conquer' method. This is, however, not possible with all divide-and-conquer algorithms, since the different branches usually have to carry out different program parts. The algorithm presented here generates a onedimensional fractal curve using midpoint displacement (see [Peitgen, Saupe 88]). It starts with a straight line, which has its midpoint displaced up or down according to a weighted random value. Two line segments with different slopes arise from this process, and these are handled recursively in parallel in exactly the same way during the following steps (see Figure 2.2). The number of line segments to be processed doubles at every step, until the required resolution is reached. The processor structure used here is a binary tree structure. Beginning with the root, in each step the following tree level is activated until the leaves are reached. After computing the leaf level, the program prints the result values of the whole tree and terminates. Since with the exception of the communication there is only one tree level active at a time, only half as many PEs would be sufficient with a more complicated connection structure. A simple tree structure is declared in the Parallaxis program, through which start and end points of the lines are passed (`**' denotes the exponential operator). Only procedure MidPoint is shown, where the actual processing takes place. The tree levels are activated successively, and the midpoint displacement is carried out for each level 2.1 Basic Applications 31 (using function Gauss not shown here, which returns a gaussian distributed random value). While the leaves have not yet been reached, the data values low and high are passed to the child nodes. In this method, the left child receives the values low and x as the start and end points of its line segment, while the right child receives x and high. CONFIGURATION tree [1..maxnode]; CONNECTION lchild : tree[i] <-> tree[2*i] :parent; rchild : tree[i] <-> tree[2*i+1] :parent; VAR low, high, x: tree OF REAL; PROCEDURE MidPoint(delta: REAL; level: INTEGER); BEGIN (* select level *) IF 2**(level-1) <= ID(tree) <= 2**level-1 THEN x := 0.5 * (low + high) + delta*Gauss(); IF level < maxlevel THEN SEND.lchild (low,low); (* values for children *) SEND.lchild (x,high); SEND.rchild (x,low); SEND.rchild (high,high); END; END; END MidPoint; The computation time required for this program is log2 n steps for n leaf nodes (equal to the tree's height). Figure 2.3 shows a fractal curve generated by this program with 127 PEs, while Figure 2.3 shows the beginning of another fractal number sequence interpreted as fractal music. Figure 2.2: Divide-and-conquer implemented with tree topology Figure 2.3: Fractal curve generated by program starting state after 1st step after 2nd step after nth step PEs Arranged as a Binary Tree 32 2 Applications 2.1.3 Sorting A number of different SIMD algorithms exist for the problem of sorting. The `odd-even transposition sort' (OETS) is presented here as a representative, which can be understood as a parallel version of bubble-sort. OETS sorts n data elements with n PEs in n steps. Figure 2.5 shows the progression of the parallel algorithm. Every PE holds one of the numbers to be sorted. During processing, the algorithm differentiates between odd and even steps. In odd steps, all of the PEs with odd identification numbers compare their element values with that of their right neighbor (PEs: 1?2, 3?4, 5?6, etc.) and carry out a data exchange if the value of their own element is larger than that the one of their neighbor. In the even steps, all of the PEs with even identification numbers carry out an analogous comparison and, if necessary, a data exchange with their right neighbor (PEs: 2?3, 4?5, 6?7, etc.). After n iterations, the list is sorted. The right part of Figure 2.5 visualizes the algorithm, with each line in the image representing the current state of the list of PEs, while the time is progressing from top to bottom. Each pixel in a line is represented by a gray value according to its data value (low values are dark, high values are bright). In the beginning (top line), the gray values (data values) are unsorted, while they become slowly disentangled during progression of the algorithm, until the bottom line contains the gray values in perfect order. The movement of individual cells to the left or right reminds of bubble-sort. In the Parallaxis program, first the variable lhs determines whether a PE is in the role of the left or the right partner of a comparison. This role changes for each loop iteration. During each pass, the PEs get the data values of their left or right neighbor in comp. The partner on the left-hand side uses the right Figure 2.4: Fractal music Figure 2.5: Example of odd-even transposition sorting Numbers distributed among the PEs 4 1 5 2 3 3. odd 4. even Steps: 4 5 2 1 3 4 5 1 2 3 2. even 1. odd 1 4 2 5 3 1 2 4 3 5 1 2 3 4 5 5. odd 2.1 Basic Applications 33 neighbor's value for comparison, while the partner on the right-hand side uses the value of its left neighbor. The complicated comparison `lhs = (comp list [i-1] :right; VAR step : INTEGER; a : ARRAY[1..n] OF INTEGER; val,comp: list OF INTEGER; lhs : list OF BOOLEAN; BEGIN WriteString('Enter 10 values: '); ReadInt(val); lhs := ODD(ID(list)); (* PE is left-hand-side of comparison *) FOR step:=1 TO n DO IF lhs THEN comp := RECEIVE.left(val) ELSE comp := RECEIVE.right(val) END; IF lhs = (comp=val) *) END; WriteInt(val,5); END sort. Figure 2.6 shows the runtime behaviour of the OETS sorting algorithm for sorting of 100 numbers. The graph displays the number of active PEs (y-axis) over time (simulated program steps on x-axis). The PE load curve shows a continuous high processor utilization with repetitive steps at half load, reflecting the IF statement. Figure 2.6: Processor utilization of the sorting algorithm 34 2 Applications 2.2 Image Processing Many image processing operations are ideal for SIMD processing. This holds especially for local operators, which use image data only from a limited neighborhood with fast data exchange. We demonstrated the versatility of our approach for a wide range of image operations in a textbook (Bräunl et al.: Parallele Bildverarbeitung), which is also very well suited for course work in this area. Two of these operators will be discussed in the following. 2.2.1 Laplace Filter The Laplace operator is one possible operator for emphasizing edges in a gray-scale image (edge detection). The operator carries out a simple local difference pattern and is therefore well suited to parallel execution. The Laplace operator is applied in parallel to each pixel with its four neighbors. Figure 2.7 shows the application of the Laplace operator with a subsequent threshold. A procedure for the Laplace operator with threshold in Parallaxis, using an open grid structure is defined in the following: CONFIGURATION grid[*],[*]; (* open grid *) CONNECTION north: grid[i,j] <-> grid[i-1,j] :south; east : grid[i,j] <-> grid[i,j-1] :west; ... PROCEDURE Laplace_thres(x: grid OF INTEGER): grid OF BOOLEAN; VAR temp: grid OF INTEGER; BEGIN temp := 4*x -MOVE.north(x) -MOVE.south(x) -MOVE.east (x) -MOVE.west (x); RETURN temp > 150; END Laplace_thres; The data from neighbor PEs is obtained by local data exchange operations, which are directly used in the arithmetic expression to be returned. This simple procedure does not handle range limits for gray-scale values, e.g. [0..255]. Figure 2.7: Edge detection with Laplace operator ?1 ?1 ?1 4 ?1 2.2 Image Processing 35 2.2.2 Dithering Dithering transforms a gray scale image to a binary image by converting the original gray scale intensities to black and white patterns. The apparent increase of intensity levels created by the patterns is being traded against the lower resolution in the binary image. A simple technique with fixed patterns is ordered dithering or halftoning. Figure 2.8 shows dithering with 2?2 patterns, enabling the use of five different intensities. The following program is a parallel implementation of ordered dithering. The computation is actually performed only on every fourth PE. Three quarters of the PEs remain inactive, while only the gray scale values of the PEs in the upper left corners of each 2?2 pattern are used. The binary result pattern is set according to this gray scale value. A constant threshold (thres) is used to divide the whole gray scale range in five areas. The binary result value for the upper left position (res) is set (black pixel) if the gray scale input value is less than the threshold. The binary result value for the right neighbor is set (black pixel) if the gray scale input value is less than three times the threshold. The remaining two neighbors are determined the same way, according to the patterns in Figure 2.8. The movement of the neighbor pixels to the right position is performed by standard procedure SEND (moving data to inactive PEs). PROCEDURE dither_ordered(img: grid OF gray): grid OF binary; CONST thres = g_white DIV 5; VAR res: grid OF binary; BEGIN IF ODD(DIM(grid,2)) AND ODD(DIM(grid,1)) THEN res := img < thres; (* upper left corner *) SEND.right (img < 3*thres,res); SEND.down (img < 4*thres,res); SEND.down_r(img < 2*thres,res); END; RETURN res; END dither_ordered; Figure 2.8: Dithering with 2x2 pattern Figure 2.9: Ordered dithering Intensity 0 Intensity 1 Intensity 2 Intensity 3 Intensity 4 black white 36 2 Applications The resulting image of ordered dithering for 2?2 patterns can be seen in Figure 2.9. Larger patterns, e.g. 3?3 or 4?4, may also be used. 2.3 Simulation Many simulation models require only SIMD style computation and, furthermore, exhibit a local data exchange pattern. The simulation presented here models a very simplified behavior of cars on a single lane street. If the car concentration exceeds a certain threshold, sudden and unmotivated traffic jams occur. For this simulation in Parallaxis, two disjoint configurations have been used. One configuration for the cars and one for the street segments. Cars may not take over each other, so they keep their linear order. The street is modeled as a closed ring. CONFIGURATION cars[0..max_cars-1]; CONNECTION next: cars[i] <-> cars[(i+1) MOD max_cars] :back; CONFIGURATION street[0..width-1]; VAR pos, dist, speed, accel: cars OF REAL; collision : cars OF BOOLEAN; my_car : street OF BOOLEAN; time, z : INTEGER; At initialization all cars are started at equal distance across the street. pos := FLOAT(DIM(cars,1)) / FLOAT(max_cars); The simulation itself is a large FOR-loop, which generates one graphics line for each iteration. If there is sufficient space in front of a car, it accelerates up to a maximum speed by a constant value plus a small random term. The randomness prevents all cars from maintaining identical distances from each other. Collisions are detected in parallel by measuring the distance of all pairs of subsequent cars. They cause a sudden stop, from which the cars can again accelerate in the subsequent simulation step. The integration required for determining velocity and position from acceleration has been simplified to summation. FOR time := 1 TO steps DO ... (* show "collision" at line "time" *) my_car := DIM(street,1) = TRUNC(pos<:0:> * FLOAT(width)); ... (* show "my_car" at line "time" *) dist := MOVE.back(pos) - pos; IF dist < 0.0 THEN dist := dist + 1.0 END; (* close street to loop *) collision := dist < 0.0; IF collision THEN speed := 0.0; ELSE (* no collision, accelerate *) accel := max_accel + rand_fac * (RandomReal(cars)-0.5); (* brake, if necessary *) IF dist < min_dist THEN accel := - max_accel END; 2.3 Simulation 37 (* update speed, apply speed limit *) speed := min(speed + accel, max_speed); (* do not back up on autobahn ! *) IF speed < 0.0 THEN speed := 0.0 END; (* update position *) pos := pos + speed; (* leaving right, coming back in left *) IF pos >= 1.0 THEN pos := pos - 1.0 END; END; END; A sample simulation run of the traffic program is shown in Figure 2.10. The street is modeled as a closed ring, displayed as a horizontal line, while time flows from top to bottom in the figure. Standing cars are marked as bright spots. Also, the route of one individual car is shown, starting in the upper left corner. It is easy to recognize the acceleration phase of the individual car (parabolic curve), leading to a phase of continuous speed (straight line). Sudden breaks occur due to heavy traffic, simply caused by too many cars on the street. Several spontaneous traffic jams occur in this simulation, all slowly propagating in the direction opposite to the driving direction. Some congestions are increasing, while others are decreasing. Figure 2.10: Simulation of traffic congestion 38 2 Applications Chapter 3 Compiler and Debugger Several Compilers and a source-level Debugger are the tools for the Parallaxis environment. The debugger also contains the features for performance analysis, which used to be a separate tool in earlier versions. Figure 3.1 shows the interaction of the Parallaxis tools (shaded boxes) with standard Unix tools (white boxes) on workstations and the MasPar massively parallel system. 3.1 Compiler We have developed several compilers for Parallaxis-III. Here, the compiler for generating sequential C-code (simulation system) will be discussed. There are further Parallaxis compilers for the MasPar MP-1/MP-2 (SIMD) and the Intel Paragon (MIMD, programmed in SPMD mode) or workstation clusters using PVM (parallel Figure 3.1: Parallaxis Tools Parallaxis source program C program MPL program seq. p3 compiler par. p3 compiler gnu C compiler MPL compiler sequential executable parallel executable run xp3gdb debugger run MPPE debugger 40 3 Compiler and Debugger virtual machine). All compilers generate C code (or MPL code in case of the MasPar), so a subsequent compilation step is necessary to generate object code. The Cocktail compiler construction tools from GMD/Univ. Karlsruhe have been used to build the Parallaxis-III compilers. The compiler option list is shown in Figure 3.2. NAME p3 -- Parallaxis-III Compiler User Interface V0.5 DESCRIPTION Compile some Parallaxis-III programs and call backend compiler. SYNOPSIS p3 [options] [file] ... OPTIONS -C Generate C-code for simulation (default) -casts Generate type casts to make C-programs lint free -cc name Name of the backend compiler to use -g Generate debug code (also passed to backend compiler) -h, -H, -help Print this usage -headers Generate header files for imported modules -Ipath Add path to import/include list (Par. and backend) -indent i Set indent of generated code to i blanks -koption Pass option directly to backend compiler -Lpath Add path to library path (backend only) -m, -mem Print statistics about used memory -MPL, -mpl Generate MPL-code for MasPar -n, -nocompile Don't compile, just show comands (implies -v) -nop3inc Don't use standard include paths -nop3lib Don't use standard library paths -nodefaults Same as -nop3inc -nop3lib -o name Name of the generated executable -p Parallaxis compile only, don't call backend compiler -c Paral. and backend compile only, don't call linker -Ppath Add path to import list (Parallaxis only) -PVM, -pvm Generate PVM-code for Paragon -r, -rchecks Don't generate runtime range checks -s, -small Generate small MPL-only model (max. 128KB) -t, -time Print statistics about used time (Parallaxis only) -tt, -total Like -t, but also for backend compiler -v Print version of p3 and the resulting compiler calls -vv Like -v, passes also -v to backend compiler -w Don't generate warnings -ww Like -w, passes also -w to backend compiler OPTIONS ONLY AVAILABLE DURING DEVELOPMENT -Zw Write code tree -Zs Write symbols tree -Zq Query code tree -Zc Check code tree -Z1 Run parser only, no semantic check -Z2 Run parser and semantic check only, no code generation Every other option is passed unchanged to the backend compiler. ENVIRONMENT P3CC Name of the backend compiler P3INC ":"-seperated list of paths where to find sources P3LIB ":"-seperated list of paths where to find libraries P3OPT Default options always to set Figure 3.2: Compiler options 3.2 Debugger 41 The configurations of Parallaxis, i.e. the PEs, are implemented by linear arrays. Each configuration keeps track about which (virtual) PEs are active and which are not (the "active-set" of the configuration). 3.2 Debugger A compiler just by itself is not sufficient for parallel program development or even for education purposes. Therefore, we decided to develop also a source level debugger for Parallaxis. Despite starting from scratch, we used the gnu debugger gdb and its graphics interface xxgdb as a base. First, this C debugger had to be taught to behave as if being a Parallaxis source level debugger. This affects not only the source line window and the positioning of break points, but also (and more difficult) the presentation of Parallaxis data types, especially vector data. The command names are: p3gdb for the Parallaxis/gnu debugger in text mode xp3gdb for the Parallaxis/gnu debugger in graphics mode Figure 3.3 shows an excerpt of the debugger man page, while Figure 3.4 shows a typical sample debugging session. Second, we added a number of graphics facilities. Especially for large vectors (e.g. two-dimensional image or simulation data), it is not very entertaining to examine large lists of data. Instead we provided the possibility to look at vector data directly in a graphics window. One- or two-dimensional data is displayed in a window with little boxes representing individual PEs (Figure 3.5). Each box is colored (rainbow colors or gray scale) according to its data value, and drawn hollow if inactive. Position numbers may be added and the data range may be fixed. The vector window can display a static state (command print) or adapt dynamically to changing data (command display). The PE usage may also be displayed graphically. Here, the program is executed in single step mode and the number of active PEs is determined at each step. Due to the overhead of stepwise evaluation, execution time slows down when using this feature. The PE usage values produce a tell-tale curve of the application program's parallel characteristics and are a valuable help in localizing critical program regions for optimization of the execution time. Figure 3.6 shows the PE usage curve for the prime sieve sample program. MODULE prime; CONFIGURATION list [2..200]; CONNECTION (* none *); VAR next_prime: INTEGER; removed : list OF BOOLEAN; BEGIN REPEAT next_prime:= REDUCE.FIRST(DIM(list,1)); WriteInt(next_prime,10); WriteLn; removed := DIM(list,1) MOD next_prime =0 UNTIL removed END prime. 42 3 Compiler and Debugger NAME xp3gdb - X window system interface to the p3gdb debugger. SYNOPSIS xp3gdb [ -toolkitoption ... ] [-xp3gdboption ... ] [- gdboption ... ] [objfile [ corefile ]] DESCRIPTION Xp3gdb is a graphical user interface to the gdb debugger under the X Window System. It provides visual feedback and mouse input for the user to control program execution through breakpoints, to examine and traverse the function call stack, to display values of variables and data struc- tures, and to browse source files and functions. ... Special Vector Commands PE USAGE Pop up a window with a graphical representation of the amount of processors in use for a special configura- tion. When clicking in the command widget on top of the window, a menu will pop up. The style of view can be changed through the Settings entry between solid and point, absolute and relative. Scale lines can be activated. Also horizontal zooming can be modified. Vertical zooming is done automatically, depending on the window size. Through the menu entry Save values the current usage values can be saved into a file. The format of the file is: # Maximum: integer # PEs in use follow this line integer integer ... Lines starting with # are comment lines. The first non-comment line specifies the total number of prozes- sors within this configuration, all other lines specify the number of active prozessors through the steps done so far. print VECTOR Pop up a window with a graphical display of the current content for a vector variable. Currently only grids and lists are supported for processor configurations. All other configurations will be mapped to one of these. When clicking in the command widget on top of the win- dow, a menu will pop up with various selections. disp VECTOR Will pop up the same window as for print VECTOR, except that the displayed variable will be refreshed automati- cally each time execution is stopped. Figure 3.3: Debugger commands 3.2 Debugger 43 Figure 3.4: Debugger Control Window Figure 3.5: Vector display 44 3 Compiler and Debugger This tiny program represents the parallel version of the sieve of Eratosthenes. The list of active PEs resembles the candidates for prime numbers not yet removed. In the beginning all PEs are active, which is reflected by the initial peak in Figure 3.6. But in each step of the REPEAT loop, variable removed becomes true for all multiples of the just found prime, whose PEs will no longer be active in the next iteration of the loop. This explains the exponentially-like decrease in the PE usage diagram. Figure 3.6: PE usage Chapter 4 Appendix 4.1 Data Types Parallaxis supports all data types available in Modula-2: INTEGER integer numbers (4 bytes) CARDINAL integer numbers greater or equal to 0 (4 bytes) REAL floating point numbers (8 bytes) CHAR character values (1 byte), ordinal values ranging from 0 to 255 BOOLEAN truth values (1 byte), equal to (FALSE,TRUE) ARRAY .. OF .. array of data values RECORD .. END collecting several entries to a structured type SET OF .. set of a simple type BITSET equivalent to SET OF [0..31] POINTER TO .. pointer to a type, data has to be allocated dynamically PROCEDURE (..) procedure or function type, e.g.TYPE function = PROCEDURE (REAL): REAL; enumeration enumerating all values of a type (1 byte) e.g. TYPE day = (mo, tu, we, th, fr, sa, su); subrange limited range of values (from 1 to 4 bytes) e.g. TYPE workday = [mo..fr]; digit = [0..9]; 4.2 Built-in Functions and Procedures All built-in functions and procedures may be called with either scalar or vector arguments. The return value of such a function is scalar if its argument is scalar, and vector if its argument is vector. Type INTEGER always includes type CARDINAL. In Modula-2, mathematical functions and I/O-procedures are separate modules and their objects have to be imported. In Parallaxis, however, all these functions and procedures can be used with either scalar or vector arguments. Therefore, they had to be included as built-in functions and procedures. General Functions FLOAT(i) return real value of integer i TRUNC(r) return value of real r, truncated to an integer 46 4 Appendix ABS(i) return absolute value of i, type Integer or Real CHR(i) return character with ordinal number i VAL(t,i) return value of type t with ordinal number i ORD(c) return ordinal number of c for character or enumeration type, starting with 0 MAX(t) return maximum value of type t MIN(t) return minimum value of type t SIZE(t) return number of bytes required for a variable of type t HIGH(a) return upper bound of open array a ODD(i) return "i MOD 2 <> 0", type integer or char EVEN(i) return "i MOD 2 = 0", type integer or char CAP(c) return capital character, corresponding to c General Procedures NEW(p) allocate memory for a new data element, set p to address DISPOSE(p) deallocate memory for the element pointed to by p pointer data type is POINTER TO base_type nil element is NIL for NEW and DISPOSE, it is required (like in Modula-2) to import procedures ALLOCATE and DEALLOCATE from module STORAGE DEC(i) i := i ?1, type integer or character DEC(i,n) i := i ?n, type integer or character INC(i) i := i +1, type integer or character INC(i,n) i := i +n, type integer or character EXCL(s,n) exclude element n from set s INCL(s,n) add element n to set s HALT terminate program execution Command Line Arguments PROCEDURE argc: CARDINAL; number of command line parameters PROCEDURE argv(index: CARDINAL; VAR arg: ARRAY OF CHAR); get command line parameters (range 1..argc) Mathematical Functions The following functions have real arguments and return real values (scalar or vector). pi constant p = 3.1415926535897932385 sqrt(r) square root, exp(r) exponent, er ln(r) natural logarithm, ln(r) r 4.2 Built-in Functions and Procedures 47 sin(r) sine arcsin(r) arcus sine, range -1 .. +1 cos(r) cosine arccos(r) arcus cosine, range -1 .. +1 tan(r) tangent arctan(r) arcus tangent, range ?p/2 .. +p/2 arctan2(r1,r2) arcus tangent of r1 / r2 in range ?p .. +p The following functions have real arguments and return integer values. ceiling(r) round r to the next higher integer floor(r) round r to the next lower integer round(r) round r to the closest integer The following functions generate random numbers. For the parallel versions, a vector variable may be used in lieu of a configuration name conf . RandomInt() generate a scalar integer random (MIN(INTEGER) .. MAX(INTEGER)) RandomCard() generate a scalar cardinal random (0..MAX(CARDINAL)) RandomReal() generate a scalar real random number (0.0 .. 1.0) RandomChar() generate a scalar character random value (CHR(0..255)) RandomBool() generate a scalar boolean random value (TRUE or FALSE) RandomInt(conf) generate a vector integer random (0..MAX(INTEGER)) RandomCard(conf) generate a vector cardinal random (0..MAX(CARDINAL)) RandomReal(conf) generate a vector real random number (0.0 .. 1.0) RandomChar(conf) generate a vector character random value (CHR(0..255)) RandomBool(conf) generate a vector boolean random value (TRUE or FALSE) Input/Output Procedures The following procedures may take either scalar or vector arguments. I/O operations do not generate run-time errors, but set the scalar boolean variable Done according to the success of the operation. All read operations besides Read set the scalar character variable termCH. Vector data is printed in ascending ID order for all active PEs. Some rudimentary formatting is performed, when printing vector data. All integer, real, boolean, or string vector data (but not vector characters) will be separated by blanks when printed. A carriage return (new line) is inserted after the end of each dimension (e.g. this will print a two-dimensional vector in matrix format). Numbers are printed right-adjusted (add leading blanks), while strings/booleans are printed left-adjusted (add trailing blanks). WriteLn Start a new line (no arguments) Write(c) Write character c WriteString(s) Write string s WriteInt(i,l) Write integer i using l print spaces WriteCard(c,l) Write cardinal c using l print spaces WriteReal(r,l) Write real r using l print spaces WriteFixPt(r,l,m) Write real r using l print spaces and m decimals WriteBool(b,l) Write boolean b using l print spaces Read(c) Read character c 48 4 Appendix ReadString(s) Read string s ReadInt(i) Read integer i ReadCard(c) Read cardinal c ReadReal(r) Read real r ReadBool(b) Read boolean b The following procedures take scalar arguments only. OpenOutput(s) Open file with name s for writing (if s is empty, then the user is prompted for a file name; following write operations write to file) CloseOutput Close file, redirect output back to stdout OpenInput(s) Open file with name s for reading (if s is empty, then the user is prompted for a file name; following read operations read from file) CloseInput Close file, redirect input back to stdin Vector Functions For the following functions and procedures, who take a configuration name as argument, a vector variable may be used in lieu of a configuration name. Argument num is a positive integer, dims is a vector array of integers. ID(conf) returns a vector of identification numbers from 1 to LEN(conf) for configuration conf DIM(conf,num) returns a vector of position numbers according to the dimension declaration of conf, dimensions are numbered from right to left from 1 to RANK(conf) LEN(conf) returns the total number of PEs of a configuration LEN(conf,num) returns the size of dimension num RANK(conf) returns the number of dimensions UPPER(conf,num) returns the upper bound of dimension num LOWER(conf,num) returns the lower bound of dimension num DIM2ID(conf,dims) transforms the appropriate number of dim-values into the corresponding id-value, may be used with scalar or vector parameters MOVE.dir(val) data transfer in predefined direction dir, sender and receiver have to be active RECEIVE.dir(val) data transfer in predefined direction dir, only receiver has to be active REDUCE.func(val) reduces vector val to a scalar value, by applying func (predefined: SUM, PRODUCT, MAX, MIN, AND, OR, FIRST, LAST, or user function with two input parameters and a return value of identical type) Vector Procedures ID2DIM(conf,num,dims) transforms an id-value into an array of corresponding dim-values, may be used with scalar or vector parameters SEND.dir(val, var) data transfer in predefined direction dir, variable var receives expression val, only sender has to be active 4.3 Graphics Interface 49 4.3 Graphics Interface X Window Graphics Module Graphics provides a convenient interface in Parallaxis for generating graphics, based on the X window system. Type CARDINAL restricts INTEGER to positive numbers (? 0). 1 DEFINITION MODULE Graphics; 2 (* ************************************************ *) 3 (* created: T. Braunl, 12.Apr.94 *) 4 (* ************************************************ *) 5 6 FROM ImageIO IMPORT binary, gray, color; 7 8 TYPE window = CARDINAL; 9 CONST GrOK = 0; 10 GrWrongWindowSize = 1; 11 GrMemoryTrouble = 2; 12 GrWindowCreationFailure = 3; 13 GrNotExistingWindow = 4; 14 GrNoWindowSelected = 5; 15 GrWrongCoordinates = 6; 16 GrWrongColors = 7; 17 18 VAR GraphicsError : CARDINAL; 19 (* will be set to the outcome of each graphics operation, 0 = OK *) 20 21 PROCEDURE OpenWindow(title: ARRAY OF CHAR; width,height: CARDINAL): window; 22 PROCEDURE OpenWindowP(title: ARRAY OF CHAR; width,height: CARDINAL): window; 23 (* open new window, version "P" for private colormap *) 24 PROCEDURE CloseWindow(window_num: window); 25 PROCEDURE SelectWindow(window_num: window); 26 (* OpenWindow: Open a new window of specified title and size in *) 27 (* pixels, return window number, activate new window *) 28 (* OpenWindowP: same a OpenWindow, but use private colormap for *) 29 (* this window *) 30 (* CloseWindow: close window; SelectWindow: activate window *) 31 32 PROCEDURE GetScreenSize(VAR width,height: CARDINAL); 33 PROCEDURE GetWindowSize(VAR width,height: CARDINAL); 34 (* Returns size in pixels of whole screen or active window, resp. *) 35 36 PROCEDURE SetArea (c: VECTOR OF color); 37 PROCEDURE SetgArea(c: VECTOR OF gray); 38 PROCEDURE SetbArea(c: VECTOR OF binary); 39 PROCEDURE SetAreaXYZ (c: VECTOR OF color; x,y,zoom: INTEGER); 40 PROCEDURE SetgAreaXYZ(c: VECTOR OF gray; x,y,zoom: INTEGER); 41 PROCEDURE SetbAreaXYZ(c: VECTOR OF binary; x,y,zoom: INTEGER); 42 (* All active PEs write one pixel each, assuming a 2-dim. grid *) 43 (* configuration of appropriate size, image will be truncated or *) 44 (* left partly unchanged if PE grid size does not match image size *) 50 4 Appendix 45 (* origin is top left corner with coordinates 0, 0 *) 46 (* version "g" for gray data, version "b" for binary data *) 47 (* versions "XYZ" with offset and zoom (both pos. or neg.) *) 48 (* zoom range determines pixel size, negative zoom reduces size *) 49 50 PROCEDURE SetLine (c: VECTOR OF color; line: CARDINAL); 51 PROCEDURE SetgLine(c: VECTOR OF gray; line: CARDINAL); 52 PROCEDURE SetbLine(c: VECTOR OF binary; line: CARDINAL); 53 PROCEDURE SetLineZ (c: VECTOR OF color; line: CARDINAL; offset, zoom: INTEGER); 54 PROCEDURE SetgLineZ(c: VECTOR OF gray; line: CARDINAL; offset, zoom: INTEGER); 55 PROCEDURE SetbLineZ(c: VECTOR OF binary; line: CARDINAL; offset, zoom: INTEGER); 56 (* print PE data along specified line, top line has no. 0 *) 57 (* versions "Z" get offset and zoom according to specification *) 58 (* from SetArea routines. *) 59 60 PROCEDURE SetColumn (c: VECTOR OF color; col: CARDINAL); 61 PROCEDURE SetgColumn(c: VECTOR OF gray; col: CARDINAL); 62 PROCEDURE SetbColumn(c: VECTOR OF binary; col: CARDINAL); 63 PROCEDURE SetColumnZ (c: VECTOR OF color; col: CARDINAL; offset, zoom: INTEGER); 64 PROCEDURE SetgColumnZ(c: VECTOR OF gray; col: CARDINAL; offset, zoom: INTEGER); 65 PROCEDURE SetbColumnZ(c: VECTOR OF binary; col: CARDINAL; offset, zoom: INTEGER); 66 (* print PE data along specified column, leftmost line has no. 0 *) 67 (* versions "Z" get offset and zoom according to specification *) 68 (* from SetArea routines. *) 69 70 PROCEDURE SetColor (c: color); 71 PROCEDURE SetgColor(c: gray); 72 PROCEDURE SetbColor(c: binary); 73 (* Specify color for subsequent pixel/line/draw commands *) 74 (* "g" and "b" version for gray and binary data *) 75 76 PROCEDURE SetPixel(x,y: CARDINAL); 77 PROCEDURE GetPixel(x,y: CARDINAL): color; 78 PROCEDURE Line(x1,y1, x2,y2: CARDINAL); 79 (* scalar functions for writing or reading a single pixel *) 80 (* scalar line: straight line from x1,y1 to x2,y2 *) 81 82 PROCEDURE DrawAt(x,y: CARDINAL); 83 (* Set drawing position for subsequent drawing commands *) 84 (* initialized with lower left character position (1,1) *) 85 (* position will be updated by each drawing command *) 86 87 PROCEDURE Draw (c: CHAR); 88 PROCEDURE DrawString(s: ARRAY OF CHAR); 89 PROCEDURE DrawInt (i,l: INTEGER); 90 PROCEDURE DrawCard (c,l: CARDINAL); 91 PROCEDURE DrawReal (r: REAL; l: CARDINAL); 92 PROCEDURE DrawFixPt (r: REAL; l,m: CARDINAL); 93 PROCEDURE DrawBool (b: BOOLEAN; l:CARDINAL); 94 PROCEDURE DrawLn; 95 (* amount of l print spaces is always minimum *) 96 (* Draw character c in window *) 4.3 Graphics Interface 51 97 (* Draw string s in window *) 98 (* Draw integer/cardinal i in window *) 99 (* Draw real r in window *) 100 (* Draw with fix point, using m decimals *) 101 (* Draw boolean value b in window left adjusted *) 102 (* DrawLn set drawing position to new line *) 103 (* Total line width depends on used font ($PARALLAXIS_FONT), *) 104 (* new line vertical placement also. *) 105 106 END Graphics. Image File Transfer Module ImageIO contains a number of operations for reading images from files and writing images to files (no X window interface is needed). The binary versions of the PPM (portable pixel map) file format are used for color, gray scale, and binary images. 1 DEFINITION MODULE ImageIO; 2 (* created: T. Braunl, Oct.94 *) 3 (* ************************************************* *) 4 (* read_c_image read color image in ppm P6 format *) 5 (* into a vector variable *) 6 (* read_g_image read gray image in ppm P5 format *) 7 (* read_b_image read binary image in ppm P4 format *) 8 (* *) 9 (* write_c_image write color image in P6 format *) 10 (* write_g_image write gray image in P5 format *) 11 (* write_b_image write binary image in P4 format *) 12 (* *) 13 (* PE-ID 1 (origin) is top left *) 14 (* ************************************************* *) 15 16 TYPE binary = BOOLEAN; 17 gray = [0..255]; 18 color = RECORD 19 red, green, blue: gray 20 END; 21 22 CONST b_black = TRUE; 23 b_white = FALSE; 24 g_black = 0; 25 g_white = 255; 26 c_black = color( 0, 0, 0); 27 c_white = color(255,255,255); 28 c_red = color(255, 0, 0); 29 c_green = color( 0,255, 0); 30 c_blue = color( 0, 0,255); 31 32 PROCEDURE read_c_image (VAR im: VECTOR OF color; filename: ARRAY OF CHAR; 33 VAR width,height: CARDINAL); 34 PROCEDURE read_g_image (VAR im: VECTOR OF gray; filename: ARRAY OF CHAR; 35 VAR width,height: CARDINAL); 52 4 Appendix 36 PROCEDURE read_b_image (VAR im: VECTOR OF binary; filename: ARRAY OF CHAR; 37 VAR width,height: CARDINAL); 38 39 PROCEDURE write_c_image (im: VECTOR OF color; filename: ARRAY OF CHAR; 40 width,height: CARDINAL); 41 PROCEDURE write_g_image (im: VECTOR OF gray; filename: ARRAY OF CHAR; 42 width,height: CARDINAL); 43 PROCEDURE write_b_image (im: VECTOR OF binary; filename: ARRAY OF CHAR; 44 width,height: CARDINAL); 45 46 END ImageIO. 4.4 Parallaxis-III Syntax 53 4.4 Parallaxis-III Syntax Parallaxis-III Syntax is specified in EBNF (Extended Backus-Naur-Form). 1 CompilationUnit = ProgramModule | DefinitionModule | ImplementationModule | ForeignModule . 2 ProgramModule = MODULE Ident ';' { Import } Block Ident '.' . 3 DefinitionModule = DEFINITION MODULE Ident ';' { Import } [ Export ] { Definition } END Ident '.' . 4 ImplementationModule= IMPLEMENTATION MODULE Ident ';' { Import } Block Ident '.' . 5 ForeignModule = FOREIGN MODULE Ident ';' { Import } [ Export ] { Definition } END Ident '.' . 6 Definition = CONFIGURATION {ConfigDeclaration ';'}' | CONNECTION { ConnectionDeclaration ';'} | CONST {ConstantDeclaration ';'} | TYPE {Ident [ '=' GeneralType ] ';'} | VAR {VariableDeclaration ';'} | ProcedureHeading ';' . 7 Import = [ FROM Ident ] IMPORT IdentList ';' . 8 Export = EXPORT [ QUALIFIED ] IdentList ';' . 9 Block = {Declaration } [ BEGIN StatementSequence ] END . 10 Declaration = CONFIGURATION {ConfigDeclaration ';'} | CONNECTION {ConnectionDeclaration ';'} | CONST {ConstantDeclaration ';'} | TYPE {TypeDeclaration ';'} | VAR {VariableDeclaration ';'} | ProcedureDeclaration ';' . 11 ConfigDeclarartion = Config {';' Config } . 12 Config = [ Ident ConfigRange | Ident '=' Qualident ConfigRange | Ident ConfigRange '=' Qualident ] . 13 ConfigRange = '[' ConstExpression '..' ConstExpression ']` | '[' '*' ']' . 14 ConnectionDeclaration = TransferFunction | FOR Ident ':=' Expression TO Expression DO TransferFunction {';' TransferFunction } END . 15 TransferFunction = [ Direction ':' Qualident '[' Source {',' Source }']' ( '?>' Dest1 {',' Dest1} | '' Dest2 {',' Dest2} ) ] 16 Direction = Ident [ '[' Source ']' ] . 17 Source = Ident | Integer | String | '*' . 18 Dest1 = [ Discriminant ] Qualident '[' DestExprList ']' . 19 Dest2 = [ Discriminant ] Qualident '[' DestExprList ']' :' Ident [ '[' Expression ']' ] . 54 4 Appendix 20 DestExprList = DestExpr {',' DestExpr } . 21 DestExpr = Expression | Expression '..' Expression | '*' . 22 Discriminant = '{' Expression '}' . 23 ConstantDeclaration = Ident '=' ConstExpression . 24 ConstExpression = Expression . 25 TypeDeclaration = Ident '=' GeneralType . 26 GenaralType = SimpleType | GeneralArrayType | GeneralRecordType | SetType | PointerType | ProcedureType | ( VECTOR | Qualident ) [ Qualident ] OF ScalarType . 27 ScalarType = SimpleType | ScalarArrayType | ScalarRecordType | SetType | PointerType | ProcedureType . 28 GeneralArrayType = ARRAY SimpleType { ',' SimpleType } OF GeneralType . 29 GeneralRecordType = RECORD FieldListSequence1 END . 30 FieldListSequence1 = [ FieldList1 { ';' FieldList1 } ] . 31 FieldList1 = IdentList ':' GeneralType | CASE [ Ident ] ':' Qualident OF Variant { '|' Variant} [ ELSE FieldListSequence2 ] END . 32 Variant = [ CaseLabelList ':' FieldListSequence2 ] . 33 CaseLabelList = CaseLabels { `,' CaseLabels } . 34 CaseLabels = ConstExpression [ '..' ConstExpression ] . 35 ScalarArrayType = ARRAY SimpleType { ',' SimpleType } OF ScalarType . 36 ScalarRecordType = RECORD FieldListSequence2 END . 37 FieldListSequence2 = [ FieldList2 {';' FieldList2 } ] . 38 FieldList2 = IdentList ':' ScalarType | CASE [ Ident ] ':' Qualident OF Variant { '|' Variant } [ ELSE FieldListSequence2 ] END . 39 SimpleType = Qualident | Enumeration | SubrangeType . 40 Enumeration = '(' IdentList ')' . 41 SubrangeType = [ Qualident ] '[` ConstExpression '..' ConstExpression ']' . 42 SetType = SET OF SimpleType . 43 PointerType = POINTER TO GeneralType . 44 ProcedureType = PROCEDURE [ FormalTypeList ] . 45 FormalTypeList = '(' [ [ VAR ] FormalType { ';' [ VAR ] FormalType } ] ')' [ ':' [ ( VECTOR | Qualident ) OF ] Qualident . 4.4 Parallaxis-III Syntax 55 46 FormalType = [ ( VECTOR | Qualident ) OF ] [ ARRAY OF ] Qualident . 47 VariableDeclaration = IdentList ':' GeneralType . 48 ProcedureDeclaration = ProcedureHeading ';' Block Ident . 49 ProcedureHeading = PROCEDURE Ident [ FormalParameters ] . 50 FormalParameters = '(' [ Parameter { ';' Parameter } ] ')' [ ':' Qualident ] . 51 Parameter = [ VAR ] IdentList ':' FormalType . 52 StatementSequence = Statement {';' Statement } . 53 Statement = [ Assignment | ProcedureCall | IfStatement | CaseStatement | WhileStatement | RepeatStatemen | LoopStatement | ForStatement | WithStatement | EXIT | RETURN [ Expression ] | AllStatement | SendStatement | LoadStatement | StoreStatement ] . 54 Assignment = Designator ':=' Expression . 55 ExprList = Expression { ',' Expression } . 56 Expression = SimpleExpression { RelationOperator SimpleExpression }. 57 RelationOperator = '=' | '#' | '<>' | '<' | '<=' | '>' | '>=' | IN . 58 SimpleExpression = [ '+' | '?' ] Term { AddOperator Term } . 59 AddOperator = '+' | '?' | OR . 60 Term = Power { MulOperator Power } . 61 MulOperator = '*' | '/' | DIV | MOD | AND | '&' . 62 Power = Factor { '**' Factor } . 63 Factor = Number | String | Set | Designator [ ActualParameters ] | MoveFunction | ReceiveFunction | ReduceFunction | ArrayInitializer | RecordInitializer | '(' Expression ')' | NOT Factor | '~' Factor . 64 Set = [ Qualident ] '{' [ Element { ',' Element } ] '}' . 65 Element = ConstExpression [ '..' ConstExpression ] . 66 ActualParameters = '(' [ ExprList ] ')' . 67 MoveFunction = MOVE '.' DirSpecifier '(' Expression ')' . 56 4 Appendix 68 DirSpecifier = Ident [ '[' Expression ']' ] [ ':#' ReductIdent ] | '<<' Expression '>>' [ ':#' ReductIdent ] | '<:' Dimension { ',' Dimension } ':>' [ ':#' ReductIdent ]. 69 SendSpecifier = Ident [ '[' Expression ']' ] [ ':' StepSpecifier ] [ ':#' ReductIdent ] | '<<' Expression '>>' [ ':#' ReductIdent ] | '<:' Dimension { ',' Dimension } ':>' [ ':#' ReductIdent ] . 70 StepSpecifier = '(' Expression ')' | Qualident | Integer . 71 ReductIdent = AND | OR | Qualident . 72 ReceiveFunction = RECEIVE '.' DirSpecifier '(' Expression ')' . 73 ReduceFunction = REDUCE '.' ReductIdent '(' Expression ')' . 74 ArrayInitializer = Qualident '(' ExprList ')' . 75 RecordInitializer = Qualident '(' ExprList ')' . 76 ProcedureCall = Designator [ ActualParameters ] . 77 IfStatement = IF Expression THEN StatementSequence { ELSIF Expression THEN StatementSequence} [ ELSE StatementSequence ] END . 78 CaseStatement = CASE Expression OF Case { '|' Case } [ ELSE StatementSequence ] END . 79 Case = [ CaseLabelList ':' StatementSequence ] . 80 WhileStatement = WHILE Expression DO StatementSequence END . 81 RepeatStatement = REPEAT StatementSequence UNTIL Expression . 82 ForStatement = FOR Ident ':=' Expression TO Expression [ BY ConstExpression ] DO StatementSequence END . 83 LoopStatement = LOOP [ OF Qualident DO ] StatementSequence END . 84 WithStatement = WITH Designator DO StatementSequence END . 85 AllStatement = ALL Qualident DO StatementSequence END . 86 SendStatement = SEND '.' SendSpecifier '(' Expression ',' Designator ')' . 87 LoadStatement = LOAD '(' Designator ',' Designator [ ',' Designator ] ')' . 88 StoreStatement = STORE '(' Designator ',' Designator [ ',' Designator ] ')' . 89 Designator = Qualident { << Expression >> | <: ExprList :> | '.' Ident | '[' ExprList ']' | '^' } . 90 Dimension = Expression | '*' | '#' ReductIdent . 91 Qualident = Ident [ '.' Ident ] . 92 IdentList = Ident { ',' Ident } . 93 Ident = Letter { Letter | Digit } . 94 Letter = ( 'A' .. 'Z' | 'a' .. 'z' | '_' ) . 95 String = ' ' ' { Character } ' ' ' | ' " ' { Character } ' " ' . 96 Number = Integer | Real . 97 Integer = Digit { Digit } [ 'D' ] | OctalDigit { OctalDigit } ( 'B' | 'C' ) 4.4 Parallaxis-III Syntax 57 | Digit { HexDigit } 'H' . 98 Real = Digit { Digit } '.` { Digit } [ ScaleFactor ] . 99 ScaleFactor = 'E' [ '+' | '?' ] Digit { Digit } . 100 HexDigit = Digit | 'A' | 'B' | 'C' | 'D' | 'E' | 'F' . 101 Digit = OctalDigit | '8' | '9' . 102 OctalDigit = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' . 58 4 Appendix 4.5 Literature [Bräunl 89] T. Bräunl, Structured SIMD Programming in Parallaxis, Structured Programming, vol. 10, no. 3, July 1989, pp. 121-132 (12) [Bräunl 91] T. Bräunl, Designing Massively Parallel Algorithms with Parallaxis, Proceedings of the 15th Annual International Computer Software & Applications Conference, compsac91, Sep. 1991, pp. 612-617 (6) [Bräunl 93] T. Bräunl, Parallel Programming ? An Introduction, Prentice Hall, Englewood Cliffs NJ, 1993 [Bräunl, Feyrer, Rapf, Reinhardt 95] T. Bräunl, S. Feyrer, W. Rapf, M. Reinhardt, Parallele Bildverarbeitung, Addison-Wesley, Bonn, 1995 [MasPar 91] MasPar Computer Corporation, MasPar Programming Language (ANSI C compatible MPL) User Guide, Software Version 2.2, MasPar System Documentation, DPN 9302-0101, Dec. 1991 [Peitgen, Saupe 88] H.-O. Peitgen, D. Saupe (Eds.), The Science of Fractal Images, Springer-Verlag, Berlin Heidelberg New York, 1988 [Thinking Machines 89] Thinking Machines Corporation, Connection Machine Model CM-2 Technical Summary, version 5.1, Technical Report, May 1989 [Wirth 83] N. Wirth, Programming in Modula-2, Springer-Verlag, Berlin Heidelberg New York, 1983 PostScript-Fehler (rangecheck, readline)