PGPVM2 is an enhancement package for PVM 3.3 that produces trace files for use with standard ParaGraph. PGPVM2 attempts to give an accurate portrayal of applications by minimizing the perturbation inherent in this type of monitoring. Instead of standard PVM tracing, PGPVM2 uses its own buffered tracing techniques to provide more accurate monitoring information. Further, PGPVM2 provides a shell script that performs some postprocessing and produces pgfile.trf, a standard ParaGraph trace event file. Tachyon removal and clock synchronization are performed during postprocessing when necessary.
PGPVM2, including brief documentation and examples, is available as a gzipped tar file, pgpvm2.tar.gz, from ftp://phalanstere.univ-mlv.fr. ParaGraph may be obtained from netlib or by anonymous FTP from netlib2.cs.utk.edu.
You may prefer simply to get the postscript documentation.
Motivation
As computers are increasingly connected via world-wide networks, a new opportunity arises to gain computing power from the net. A collection of heterogeneous, even small, computers can now pool their computing power to exceed that of a sequential computer. However, the full potential of such distributed systems cannot be realized without similar advances in parallel software. Parallel computation in a distributed environment is much harder than running a large sequential scientific calculation on a single powerful processor, which remains common practice.
In other words, software has to be adapted to the new situation. New algorithms have to be developed or sequential ones adapted, and programs must also take into account the distribution of the computation over the different machines. This is why PVM was developed. The PVM software provides an environment to build and use distributed programs efficiently. It provides a framework in which a collection of heterogeneous computers is seen as a single parallel virtual machine.
On the other hand, graphical visualization techniques are a natural way to analyze and understand the behavior of parallel programs in order to improve their performance. This is true for parallel computers with advanced architectures, and it remains true for distributed systems. In both cases, a tool devoted to this purpose helps the user gain insight from large volumes of trace data. Such tools exist. One of them is ParaGraph, which provides a detailed, dynamic, graphical animation of the behavior of message-passing parallel programs.
The PGPVM software was developed to connect PVM and ParaGraph. More precisely, PGPVM is an enhancement package for PVM 3.3 that produces trace files for use with standard ParaGraph. PGPVM attempts to give an accurate portrayal of applications by minimizing the perturbation inherent in this type of monitoring. Instead of standard PVM tracing, PGPVM uses its own buffered tracing techniques to provide more accurate monitoring information. Further, PGPVM provides a shell script that performs some postprocessing and produces a standard ParaGraph trace event file.
The new version, PGPVM2, was developed to be more flexible than PGPVM. It can trace any PVM program and offers new possibilities in the interface between PVM and ParaGraph. PGPVM2 uses a new external task, named pg_administrator, that keeps count of all tasks that want to be traced. In particular, this administrator can simulate a parallel virtual machine with a fixed number of processors. This is useful on the ParaGraph side when, for instance, a PVM program uses 100 tasks while at most 8 are actually running in parallel. A program developed for an 8-processor parallel computer can then be tested with PVM and analyzed with PGPVM2, because ParaGraph will see only 8 processors. For instance, this gives the following ParaGraph visualization windows:
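[Figure: ParaGraph processor utilization and load balance display]
[Figure: ParaGraph process status graph]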
These windows show, respectively, the utilization of individual processors (or, more precisely, processes) together with the overall load balance across processes, and the status of each process (busy, idle, sending, receiving) in the virtual parallel system. The second window draws that system as a graph whose nodes denote processes and whose arcs represent communication events between processes. This is far more convenient than reading raw ParaGraph trace files, even though one can learn to decode them, as the following excerpt suggests:
| Trace record | Meaning |
| -3 -901 0.023579 0 -1 0 | trace_start clock 23578 node 0 |
| -3 -52 0.061272 0 -1 1 2 -1 | recv_blocking clock 61271 node 0 |
| -3 -901 0.182373 1 -1 0 | trace_start clock 182373 node 1 |
| -3 -21 0.189360 1 -1 3 2 432 5 0 | send_entry clock 189359 node 1 to 0 type 5 lth 432 |
| -4 -21 0.189726 1 -1 0 0xb2 | send_exit clock 189725 node 1 |
| -3 -52 0.189773 1 -1 1 2 -1 | recv_blocking clock 189772 node 1 |
| -4 -52 0.190360 0 0 3 2 432 5 1 | recv_waking clock 190359 node 0 from 1 type 5 lth 432 |
| -3 -21 0.193675 0 -1 3 2 136 6 1 | send_entry clock 193674 node 0 to 1 type 6 lth 136 |
| -4 -52 0.236901 1 0 3 2 136 6 0 | recv_waking clock 236901 node 1 from 0 type 6 lth 136 |
| -4 -21 0.240374 0 -1 0 | send_exit clock 240373 node 0 |
| -3 -52 0.240422 0 -1 1 2 -1 | recv_blocking clock 240421 node 0 |
| -3 -21 0.433116 1 -1 3 2 640 5 0 | send_entry clock 433115 node 1 to 0 type 5 lth 640 |
| -4 -52 0.434116 0 0 3 2 640 5 1 | recv_waking clock 434116 node 0 from 1 type 5 lth 640 |
| -3 -21 0.439659 0 -1 3 2 200 6 1 | send_entry clock 439658 node 0 to 1 type 6 lth 200 |
| -4 -21 0.440059 0 -1 0 | send_exit clock 440059 node 0 |
| -3 -52 0.440105 0 -1 1 2 -1 | recv_blocking clock 440104 node 0 |
| -4 -21 0.445879 1 -1 0 | send_exit clock 445879 node 1 |
| -3 -52 0.445931 1 -1 1 2 -1 | recv_blocking clock 445930 node 1 |
| -4 -52 0.446151 1 0 3 2 200 6 0 | recv_waking clock 446150 node 1 from 0 type 6 lth 200 |
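When the two columns are read side by side, each record appears to follow the PICL-style layout that ParaGraph reads. The fields are, in order: the record kind (-3 for the entry into an event, -4 for its exit), the event code (-901 trace start, -21 send, -52 receive), the time stamp in seconds, the node, the task, the number of data values, a data-type code and the values themselves. The following C sketch decodes the record kinds that occur in this excerpt into the annotations of the right-hand column. It prints the time stamp in seconds rather than the clock value. The field layout is inferred from these 19 records only. describe_record is a hypothetical helper and is not part of PGPVM2 or ParaGraph. Other event codes are simply rejected.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Decode one trace record laid out as inferred from the excerpt:
 *   kind event time node task ndata [datatype value...]
 * kind:  -3 = event entry, -4 = event exit
 * event: -901 = trace start, -21 = send, -52 = receive
 * For send entries and receive wakings, the three values are the
 * message length, the message type and the partner node.
 * Writes a description into out and returns 0, or returns -1 if the
 * record is not one of the kinds shown in the excerpt. */
int describe_record(const char *rec, char *out, size_t outlen)
{
    int kind, event, node, task, ndata = 0, dtype = 0;
    long v[3];
    double t;
    int n = sscanf(rec, "%d %d %lf %d %d %d %d %ld %ld %ld",
                   &kind, &event, &t, &node, &task, &ndata, &dtype,
                   &v[0], &v[1], &v[2]);
    int has_msg = (n == 10 && ndata == 3);   /* length, type, partner present */

    if (n < 6) return -1;                    /* not even the fixed fields */
    if (kind == -3 && event == -901)
        snprintf(out, outlen, "trace_start t=%.6f node %d", t, node);
    else if (kind == -3 && event == -21 && has_msg)
        snprintf(out, outlen, "send_entry t=%.6f node %d to %ld type %ld lth %ld",
                 t, node, v[2], v[1], v[0]);
    else if (kind == -4 && event == -21)
        snprintf(out, outlen, "send_exit t=%.6f node %d", t, node);
    else if (kind == -3 && event == -52)
        snprintf(out, outlen, "recv_blocking t=%.6f node %d", t, node);
    else if (kind == -4 && event == -52 && has_msg)
        snprintf(out, outlen, "recv_waking t=%.6f node %d from %ld type %ld lth %ld",
                 t, node, v[2], v[1], v[0]);
    else
        return -1;
    return 0;
}
```

For example, the fourth record of the table decodes to "send_entry t=0.189360 node 1 to 0 type 5 lth 432".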
Installing the PGPVM2 Library
Details about how to install PGPVM2 can be found in the postscript documentation.
Using PGPVM2 with C Programs
Using PGPVM2 requires only three minor modifications to a PVM application. First, the application must include a new header file, "pgpvm2.h", placed directly after the standard "pvm3.h" header file. The new header file provides macros that replace normal PVM routines with calls to the PGPVM2 library, so calls to the PVM library in the application source code need not be modified. The source does, however, need to be recompiled.
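The macro substitution described above is a standard C interposition technique. The sketch below shows the idea on a stand-in routine. lib_send plays the role of a PVM call. traced_send records the event in an in-memory buffer (the kind of buffered tracing mentioned at the top of this page) before calling through. A macro defined afterwards redirects every later call. All names here are hypothetical, and the actual macros in pgpvm2.h and their trace format are not shown in this document.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Stand-in for a library routine; it plays the role of pvm_send(). */
int lib_send(int tid, int tag)
{
    (void)tid; (void)tag;
    return 0;                      /* pretend the send succeeded */
}

/* Buffered tracing: events are appended to an in-memory buffer and
 * written out only on flush, which keeps the per-call cost small. */
#define TRACE_MAX 1024
struct trace_rec { clock_t when; int tid, tag; };
static struct trace_rec trace_buf[TRACE_MAX];
static int trace_len = 0;

/* Wrapper: record the event, then call the real routine. */
int traced_send(int tid, int tag)
{
    if (trace_len < TRACE_MAX) {   /* this sketch drops events when full */
        trace_buf[trace_len].when = clock();  /* processor time; a real tracer
                                                 would use a wall clock */
        trace_buf[trace_len].tid = tid;
        trace_buf[trace_len].tag = tag;
        trace_len++;
    }
    return lib_send(tid, tag);     /* macro not yet defined: real call */
}

/* Write the buffered events and empty the buffer. */
void trace_flush(FILE *out)
{
    for (int i = 0; i < trace_len; i++)
        fprintf(out, "%ld send to %d tag %d\n",
                (long)trace_buf[i].when, trace_buf[i].tid, trace_buf[i].tag);
    trace_len = 0;
}

/* The header trick: from this point on, source code that calls
 * lib_send(...) is compiled as a call to traced_send(...). */
#define lib_send(tid, tag) traced_send(tid, tag)
```

The macro is defined only after the wrapper, so that traced_send still reaches the original routine. This ordering is also why such a header must come after the library header it overrides.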
The second modification is a call to the library routine
pg_startadmin(char *outfile, char *host, int nbt_admin_max)
in the first spawned task. The outfile argument is the name of the output trace file. When it is the empty string "", the default name pgfile is used. The last argument, nbt_admin_max, is the maximal number of tasks that may simultaneously produce trace information.
The third modification is a call to pg_tids("y") in every task that should produce trace data. The first call to pg_tids() declares a new task to the administrator and tells it whether or not the task is traced from now on. If the task wants its tracing to stop, it must call pg_close(). If it later wants to produce trace data again, it must call pg_tids("Y").
In the first release of PGPVM, the pg_tids(int *tid, int nprocs) library routine behaved completely differently. The array of tids (nprocs being the number of elements in the array) told PGPVM which processes should produce tracing information. All processes that wished to participate in tracing had to call pg_tids(), and all had to pass the same tids array. PGPVM2 is more flexible: each task declares itself to the administrator independently.
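To make the idea of a fixed-size virtual machine concrete, here is a minimal sketch of how an administrator could hand out a bounded pool of ParaGraph node numbers. A task that declares itself (as with pg_tids("y")) receives the lowest free node number, and a task that stops tracing (as with pg_close()) releases it. A declaration is refused when all nbt_admin_max numbers are taken. struct admin, admin_declare and admin_release are hypothetical names. This document does not describe the internals of pg_administrator, and the real administrator may handle a full pool differently.

```c
#include <assert.h>

#define ADMIN_MAX_NODES 64   /* hypothetical upper bound for this sketch */

/* node[k] holds the PVM tid currently using ParaGraph node k, or 0
 * when node k is free (this sketch assumes tids are positive). */
struct admin {
    int nbt_admin_max;               /* size of the simulated machine */
    int node[ADMIN_MAX_NODES];
};

void admin_init(struct admin *a, int nbt_admin_max)
{
    assert(nbt_admin_max > 0 && nbt_admin_max <= ADMIN_MAX_NODES);
    a->nbt_admin_max = nbt_admin_max;
    for (int k = 0; k < ADMIN_MAX_NODES; k++) a->node[k] = 0;
}

/* A task declares that it wants to be traced: return its node number
 * (reusing the one it already holds), or -1 if every node is busy. */
int admin_declare(struct admin *a, int tid)
{
    int free_slot = -1;
    for (int k = 0; k < a->nbt_admin_max; k++) {
        if (a->node[k] == tid) return k;
        if (a->node[k] == 0 && free_slot < 0) free_slot = k;
    }
    if (free_slot >= 0) a->node[free_slot] = tid;
    return free_slot;
}

/* A task stops tracing (pg_close() or pvm_exit()): free its node. */
void admin_release(struct admin *a, int tid)
{
    for (int k = 0; k < a->nbt_admin_max; k++)
        if (a->node[k] == tid) a->node[k] = 0;
}
```

Under this model, a program that spawns 100 tasks with nbt_admin_max set to 8 appears to ParaGraph as an 8-node machine, as long as no more than 8 tasks trace at the same time.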
To illustrate the use of PGPVM2, let us consider a simple example. The following program corresponds to the source father.c in the EXAMPLE/C subdirectory of pgpvm2. It starts with the first modification discussed previously, that is the header file pgpvm2.h included just after pvm3.h:
#include <stdio.h>
#include <stdlib.h>   /* malloc, free, exit */
#include <assert.h>   /* assert */
#include <unistd.h>   /* sleep */
#include "pvm3.h"     /* PVM header */
#include "pgpvm2.h"   /* PGPVM2 header, included after pvm3.h */
Several macros follow. Because father.c is the first spawned task, it calls pg_startadmin, specifying that the maximal number of tasks simultaneously producing trace information will be NBSONS1+NBSONS2+1. The program then declares that it also wants to produce trace information by calling pg_tids("y"):
#define SONS1 "son1"
#define SONS2 "son2"
#define NBSONS1 5 /* nb son1 tasks */
#define NBSONS2 3 /* nb son2 tasks */
int main(void)
{ int i; /* variable for loop */
int mytid; /* my task id */
int nbs=NBSONS1+NBSONS2; /* total number of sons */
int *tids; /* tids array */
int info; /* pvm info */
pg_startadmin("", "", NBSONS1+NBSONS2+1);
pg_tids("y");
The PGPVM2 library provides two extra functions, pg_beglab(int label) and pg_endlab(int label), that allow a program to associate an integer label (visible in ParaGraph) with a section of its source. pg_beglab begins the section and pg_endlab ends it. This is done in the next lines of father.c:
pg_beglab(1);
tids=(int *)malloc(sizeof(int)*(nbs+1));
assert(tids!=NULL);
tids[0]=mytid=pvm_mytid();
info=pvm_spawn(SONS1, (char **)0,
PvmTaskDefault, "", NBSONS1, tids+1);
info=pvm_spawn(SONS2, (char **)0,
PvmTaskDefault, "", NBSONS2, tids+1+NBSONS1);
pvm_initsend(PvmDataDefault);
pvm_pkint(&nbs, 1, 1);
pvm_pkint(tids, nbs+1, 1);
for (i=1;i<nbs+1;i++) pvm_send(tids[i],1);
for (i=1;i<nbs+1;i++) pvm_recv(-1,-1);
sleep(1);
pg_endlab(1);
Labels are useful for counting the number of tasks simultaneously executing a given part of their code, as shown by the Count submenu of the Tasks menu in ParaGraph. Coming back to father.c, the program ends by spawning several tasks again:
info=pvm_spawn(SONS1, (char **)0,
PvmTaskDefault, "", NBSONS1, tids+1);
info=pvm_spawn(SONS2, (char **)0,
PvmTaskDefault, "", NBSONS2, tids+1+NBSONS1);
pvm_initsend(PvmDataDefault);
pvm_pkint(&nbs, 1, 1);
pvm_pkint(tids, nbs+1, 1);
for (i=1;i<nbs+1;i++) pvm_send(tids[i],1);
for (i=1;i<nbs+1;i++) pvm_recv(-1,-1);
free(tids);
pvm_exit();
exit(0);
}
Now we provide the two other PVM programs, which are spawned by the pvm_spawn calls of father.c. Here is the first one, named son1.c:
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include "pvm3.h"
#include "pgpvm2.h"
int main(void)
{ int mytid; /* task identifier */
int *tids; /* tids array */
int nbs; /* number total of sons */
pg_tids("y");
pg_beglab(2);
pvm_recv(-1, 1);
pvm_upkint(&nbs, 1, 1);
tids=(int*)malloc(sizeof(int)*(nbs+1));
assert(tids!=NULL);
pvm_upkint(tids, nbs+1, 1);
pvm_initsend(PvmDataDefault);
pvm_send(tids[0], 1);
pg_endlab(2);
free(tids);
pvm_exit();
exit(0);
}
Note that it involves a section labelled 2. The second task is called son2.c, and its source involves the two labels 3 and 4:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>   /* sleep */
#include <assert.h>
#include "pvm3.h"
#include "pgpvm2.h"
int main(void)
{ int *tids; /* tids array */
int nbs; /* number total of sons */
pg_tids("y");
pg_beglab(3);
pvm_recv(-1, 1);
pvm_upkint(&nbs, 1, 1);
tids=(int*)malloc(sizeof(int)*(nbs+1));
assert(tids!=NULL);
pvm_upkint(tids, nbs+1, 1);
pg_endlab(3);
sleep(3);
pg_beglab(4);
pvm_initsend(PvmDataDefault);
pvm_send(tids[0], 1);
pg_endlab(4);
free(tids);
pvm_exit();
exit(0);
}
Once the application has been modified as described, it is recompiled with the following command (in the EXAMPLE/C subdirectory):
$ aimk father son1 son2
making in HPPA/ for HPPA
cc -I/usr/local/lib/pvm3/include -DUSE_PGTRACE -o
/users/pvm/bin/HPPA/father ../father.c -L/usr/local/lib/pvm3/lib/HPPA
-lpvm3 -lpgpvm2
strip /users/pvm/bin/HPPA/father
cc -I/usr/local/lib/pvm3/include -DUSE_PGTRACE -o
/users/pvm/bin/HPPA/son1 ../son1.c -L/usr/local/lib/pvm3/lib/HPPA
-lpvm3 -lpgpvm2
strip /users/pvm/bin/HPPA/son1
cc -I/usr/local/lib/pvm3/include -DUSE_PGTRACE -o
/users/pvm/bin/HPPA/son2 ../son2.c -L/usr/local/lib/pvm3/lib/HPPA
-lpvm3 -lpgpvm2
strip /users/pvm/bin/HPPA/son2
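Upon execution, PGPVM2 produces a single trace file whose name starts with pgfile (here /tmp/pgfile.516). This file is then postprocessed by the PGSORT shell script: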
$ PGSORT /tmp/pgfile.516
Working... initial sort
Working... clocksync
Working... 2nd sort
Working... converter
Working... final sort
This shell script converts the file to the standard ParaGraph trace file format. The converted trace file has the .trf suffix (here pgfile.516.trf).
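The clocksync step exists because each host time-stamps events with its own clock. Clock skew can make a message appear to be received before it was sent, which is called a tachyon. This document does not describe the algorithm PGSORT uses. As a simplified illustration only, the sketch below counts tachyons among matched send/receive pairs. For a reference node a and another node b, it also computes a constant offset for b's clock that removes them, under the assumption that the two clocks differ by a fixed offset and do not drift. struct msg, count_tachyons and pair_offset are hypothetical names. Real clock synchronization, and PGSORT in particular, may work quite differently.

```c
#include <assert.h>
#include <float.h>

/* One matched message, with time stamps (seconds) as read from the
 * raw trace: each is measured on the clock of the node recording it. */
struct msg {
    int src, dst;
    double send_time;   /* on src's clock */
    double recv_time;   /* on dst's clock */
};

/* A tachyon is a message that appears to be received before it is sent. */
int count_tachyons(const struct msg *m, int n)
{
    int c = 0;
    for (int i = 0; i < n; i++)
        if (m[i].recv_time < m[i].send_time) c++;
    return c;
}

/* Constant-offset model for a reference node a and another node b:
 * find d, to be added to every time stamp of b, such that no message
 * between a and b is a tachyon.
 *   a -> b needs recv + d >= send, i.e. d >= send - recv
 *   b -> a needs recv >= send + d, i.e. d <= recv - send
 * Stores a feasible d (the midpoint when both bounds exist) and
 * returns 0, or returns -1 if no constant offset works. */
int pair_offset(const struct msg *m, int n, int a, int b, double *d)
{
    double lo = -DBL_MAX, hi = DBL_MAX;
    for (int i = 0; i < n; i++) {
        double s = m[i].send_time, r = m[i].recv_time;
        if (m[i].src == a && m[i].dst == b && s - r > lo) lo = s - r;
        if (m[i].src == b && m[i].dst == a && r - s < hi) hi = r - s;
    }
    if (lo > hi) return -1;          /* e.g. the clocks drift apart */
    if (lo > -DBL_MAX && hi < DBL_MAX)
        *d = (lo + hi) / 2.0;
    else                             /* one direction only: move 0 into [lo, hi] */
        *d = (0.0 < lo) ? lo : (0.0 > hi) ? hi : 0.0;
    return 0;
}
```

Taking the midpoint of the feasible interval splits the unknown message latency evenly between the two directions. Any value inside the interval removes the tachyons between the two nodes.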
Once pgfile.516.trf has been produced, the user may run ParaGraph with the command PG pgfile.516.trf to visualize the behavior of the program:
[Figure: ParaGraph visualization of the father/son1/son2 execution]
Using PGPVM2 with Fortran Programs
Using PGPVM2 with PVM Fortran programs is only slightly more tedious. The user does not add an extra header file or a #define USE_PGTRACE, as is the case with C programs, but must instead rename certain PVM routines. For example, all calls to pvmfmytid() must be changed to pgfmytid(). The following table lists the mandatory name changes for PVM Fortran applications:
| Routine Name | New Routine Name |
| pvmfmytid() | pgfmytid() |
| pvmfexit() | pgfexit() |
| pvmfrecv() | pgfrecv() |
| pvmfprecv() | pgfprecv() |
| pvmfnrecv() | pgfnrecv() |
| pvmftrecv() | pgftrecv() |
| pvmfmcast() | pgfmcast() |
| pvmfsend() | pgfsend() |
| pvmfpsend() | pgfpsend() |
| pvmfspawn() | pgfspawn() |
The only other modifications are the additions of the pgfstartadmin(), pgftids(), pgfbeglab(), pgfendlab() and pgfclose() library routines to the source code. Please refer to the section on using PGPVM2 with C programs for more details about these functions. In C programs they are called pg_startadmin(), pg_tids(), pg_beglab(), pg_endlab() and pg_close() respectively.
In particular, labels can identify specific parts of a program through the pgfbeglab and pgfendlab functions. For instance, to start a section labelled 1, insert the following line:
call pgfbeglab(1, info)
where info is an integer, which is set to a value different
from PvmOk if the label is already in use.
In the same way, use:
call pgfendlab(1, info)
to end the labelled section.
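The pairing rule above can be pictured as a small table of open labels kept by each task. In the sketch below, sketch_beglab and sketch_endlab are hypothetical stand-ins, and LABEL_OK plays the role of PvmOk. The other error codes, the range of label values and the per-task meaning of "in use" are all assumptions. The document only states that info differs from PvmOk when the label is already in use.

```c
#include <assert.h>

/* Hypothetical per-task bookkeeping of open label sections. */
#define MAX_LABELS     64    /* assumed range of label values: 0..63 */
#define LABEL_OK        0    /* plays the role of PvmOk */
#define LABEL_IN_USE   (-1)  /* label already open in this task */
#define LABEL_NOT_OPEN (-2)  /* ending a label that was never begun */
#define LABEL_BAD      (-3)  /* label value out of range */

static int label_open[MAX_LABELS];   /* nonzero while a section is open */

/* Begin section `label`; fails if it is already open. */
int sketch_beglab(int label)
{
    if (label < 0 || label >= MAX_LABELS) return LABEL_BAD;
    if (label_open[label]) return LABEL_IN_USE;
    label_open[label] = 1;
    return LABEL_OK;
}

/* End section `label`; fails if it was not open. */
int sketch_endlab(int label)
{
    if (label < 0 || label >= MAX_LABELS) return LABEL_BAD;
    if (!label_open[label]) return LABEL_NOT_OPEN;
    label_open[label] = 0;
    return LABEL_OK;
}
```

In this model, calling pgfbeglab(1, info) twice without an intervening pgfendlab(1, info) is the situation that yields a non-PvmOk info.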
Furthermore, the internal C function pg_veriflab() is replaced by pgfveriflab() in Fortran programs. Refer to the previous C example or to the Fortran examples in the subdirectory EXAMPLE/FORTRAN.
Note that all strings in the Fortran routines must be explicitly null (\0) terminated. Where one uses the empty string "" as a parameter in C PGPVM2 functions, one should use a "*" with a null termination in the associated Fortran routines. For example, from master1.f, we use:
call pgfstartadmin('BRAD\0', '*\0', nproc+1)
or:
call pgfstartadmin('*\0', '*\0', nproc+1)
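On the receiving side, this convention amounts to two steps. The argument is read up to its explicit null terminator, and a lone "*" is then treated as the empty string that the C routine expects. The helper below is a hypothetical sketch of such a conversion. fortran_arg is not a PGPVM2 function. The sketch assumes the wrapper receives a pointer to the characters plus a known maximum length, which in practice depends on the Fortran compiler's calling convention.

```c
#include <assert.h>
#include <string.h>

/* Convert a Fortran-side string argument for a C routine: the string
 * must carry an explicit null terminator, and a lone "*" stands for
 * the empty string "".  maxlen bounds the search for the terminator
 * (Fortran CHARACTER variables are blank-padded, not null-terminated,
 * by default).  Copies the result into buf and returns buf, or returns
 * NULL if no terminator is found or the result does not fit. */
char *fortran_arg(const char *f, size_t maxlen, char *buf, size_t buflen)
{
    const char *end = memchr(f, '\0', maxlen);
    size_t len;
    if (end == NULL) return NULL;            /* missing \0 terminator */
    len = (size_t)(end - f);
    if (len == 1 && f[0] == '*') len = 0;    /* "*" means "" */
    if (len >= buflen) return NULL;          /* no room for result + \0 */
    memcpy(buf, f, len);
    buf[len] = '\0';
    return buf;
}
```

The last of the three assertions in the verifier corresponds to a string without \0: it is rejected rather than read past its end, which is why the explicit terminator matters.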
Once the application has been modified as described and recompiled, PGPVM2 produces a single trace file named pgfile upon execution.
Installing the PGPVM2 Library
Details about PGPVM2 Fortran compilation can be found in the postscript documentation.
PGPVM2 -- Advanced Features
The first version of PGPVM had three advanced features: pg_outfile() (pgfoutfile() for Fortran), pg_close() (pgfclose() for Fortran) and pg_chprefix() (pgfchprefix() for Fortran).
In the new release, PGPVM2, pg_chprefix() and pg_outfile() have been removed. However, the output file name can still be specified through the first argument of pg_startadmin() (pgfstartadmin() for Fortran).
The only remaining advanced feature is the routine pg_close() (pgfclose() for Fortran). pg_close() allows a process to stop producing trace events before pvm_exit() is called. If it is not used, the process produces trace events until pvm_exit(). Consequently, every node that participates in tracing must eventually call pg_close() or pvm_exit(). After stopping the production of trace events with pg_close(), a node can produce trace information again by calling pg_tids("Y"), as explained previously.
PGPVM2 -- Another C Example
Now that you know how to use PGPVM2, let us turn to a "real" example in which we distribute a quick-sort over a parallel virtual machine. This example, also included in the subdirectory EXAMPLE/C, shows how visualizing an execution can help analyze a parallel program.
We consider a distributed sort in which each slave is in charge of sorting one part of an array of integers. The master, which spawns the slaves, merges the sorted subarrays at the end of the computation. Here is first the source of the slave, dis_qsort_slave.c:
#include <stdlib.h>
#include <assert.h>
#include "pvm3.h"
#include "pgpvm2.h"
#define TAG_UNSORTED 1
#define TAG_SORTED 2
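/* qsort comparator on the char values pointed to */
int increase(const void *a, const void *b)
{ char x=*(const char *)a, y=*(const char *)b;
  if (x==y) return 0;
  if (x>y) return 1;
  return -1;
}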
int main(void)
{ char *data;
int mytid, buffer_id, bytes, typetag, nb, myrank, tid;
mytid=pvm_mytid();
pg_tids("y");
pg_beglab(4);
buffer_id = pvm_recv(-1, -1);
pvm_bufinfo(buffer_id, &bytes, &typetag, &tid);
pvm_upkint(&myrank, 1, 1);
pvm_upkint(&nb, 1, 1);
data=(char *)malloc(sizeof(char)*nb);
assert(data!=NULL);
pvm_upkbyte(data, nb, 1);
pg_endlab(4);
pg_beglab(5);
qsort((void *)data, nb, sizeof(char),
(int (*)(const void *, const void *))increase);
pg_endlab(5);
pvm_initsend(PvmDataRaw);
pvm_pkint(&myrank, 1, 1);
pvm_pkbyte(data, nb, 1);
pvm_send(tid, TAG_SORTED);
free(data);
pvm_exit();
exit(0);
}
Slaves use label 4 when they are receiving data, and label 5 when they sort integers. The master (dis_qsort_master.c) initiates the computation by creating the slaves as illustrated by the following lines:
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include "pvm3.h"
#include "pgpvm2.h"
#define TAG_UNSORTED 1
#define TAG_SORTED 2
#define NB_TASKS_PER_PROCESSOR 1
#define MAX_NB_TASKS 1024
#define QSORT_TASK_NAME "dis_qsort_slave"
#define INPUT_FACTOR 1000
int main(int argc, char *argv[])
{ int i, j, rank, nb, nbdata, minipos;
char *data, *sorteddata, minival;
int mytid, tids[MAX_NB_TASKS], nbtids, route;
int startsubarray[MAX_NB_TASKS], indicessubarray[MAX_NB_TASKS],
maxpossubarray[MAX_NB_TASKS];
int nhost, narch, pvm_info;
struct pvmhostinfo *hostp;
assert(argc==2 || argc==3);
nb=atoi(argv[1])*INPUT_FACTOR;
data=(char *)malloc(sizeof(char)*nb);
assert(data!=NULL);
sorteddata=(char *)malloc(sizeof(char)*nb);
assert(sorteddata!=NULL);
mytid=pvm_mytid();
route=pvm_setopt(PvmRoute, PvmRouteDirect);
pvm_info=pvm_config(&nhost, &narch, &hostp);
assert(pvm_info>=0);
if (argc==3) nbtids=atoi(argv[2]);
else nbtids=NB_TASKS_PER_PROCESSOR*nhost;
assert(nbtids<=MAX_NB_TASKS);
/* random initialization of the array to sort */
for (i=0;i<nb;i++) data[i]=(char)rand();
pg_startadmin("", "", nbtids+1);
pg_tids("y");
for (i=0;i<nhost-1;++i)
{ if (nbtids/nhost)
pvm_spawn(QSORT_TASK_NAME, (char **)NULL, PvmTaskHost,
hostp[i].hi_name, nbtids/nhost,
tids+i*(nbtids/nhost));
}
pvm_spawn(QSORT_TASK_NAME, (char **)NULL, PvmTaskHost,
hostp[i].hi_name,
(nbtids/nhost)+((nbtids%nhost)?(nbtids%nhost):0),
tids+i*(nbtids/nhost));
pg_beglab(1);
nbdata=nb/nbtids;
for (i=0;i<nb%nbtids;++i)
{ startsubarray[i]=indicessubarray[i]=i*(nbdata+1);
maxpossubarray[i]=startsubarray[i]+nbdata;
}
for (j=0;j<nbtids-nb%nbtids;++j)
{ startsubarray[i+j]=(nb%nbtids)*(nbdata+1)+j*(nbdata);
indicessubarray[i+j]=startsubarray[i+j];
maxpossubarray[i+j]=startsubarray[i+j]+nbdata-1;
}
for (i=0;i<nbtids;++i)
{ pvm_initsend(PvmDataRaw);
pvm_pkint(&i, 1, 1);
nbdata=maxpossubarray[i]-startsubarray[i]+1;
pvm_pkint(&nbdata, 1, 1);
printf("[%d]", nbdata);
pvm_pkbyte(data+startsubarray[i], nbdata, 1);
pvm_send(tids[i], TAG_UNSORTED);
}
printf(" distributed...\n"); fflush(stdout);
pg_endlab(1);
Then, the master waits for sorted arrays and ends the computation by merging all these arrays in order to produce a global sorted array:
/* receiving sorted arrays */
pg_beglab(2);
for (i=0;i<nbtids;++i)
{ pvm_recv(-1, TAG_SORTED);
pvm_upkint(&rank, 1, 1);
pvm_upkbyte(data+startsubarray[rank],
maxpossubarray[rank]-startsubarray[rank]+1, 1);
}
pg_endlab(2);
/* merging sorted arrays */
pg_beglab(3);
for (i=0;i<nb;++i)
{ for (minipos=0;minipos<nbtids;++minipos)
if (indicessubarray[minipos]<=maxpossubarray[minipos]) break;
minival=data[indicessubarray[minipos]];
for (j=0;j<nbtids;++j)
if (indicessubarray[j] <= maxpossubarray[j] &&
data[indicessubarray[j]]<minival)
{ minival=data[indicessubarray[j]];
minipos=j;
}
sorteddata[i]=minival;
indicessubarray[minipos]++;
}
pg_endlab(3);
free(data);
free(sorteddata);
pvm_exit();
exit(0);
}
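The partitioning and merging steps of the master do not depend on PVM, so they can be checked in isolation. The sketch below extracts them into two plain C functions and keeps the master's conventions: the first nb%nbtids subarrays receive one extra element, and the merge repeatedly takes the smallest remaining head. The function names partition and merge are hypothetical, and the PVM sends and receives are simply omitted.

```c
#include <assert.h>

/* Split nb elements among nbtids subarrays as the master does:
 * the first nb % nbtids subarrays get nb/nbtids + 1 elements, the
 * others nb/nbtids.  start[k] and last[k] are inclusive bounds. */
void partition(int nb, int nbtids, int *start, int *last)
{
    int nbdata = nb / nbtids, extra = nb % nbtids, pos = 0;
    for (int k = 0; k < nbtids; k++) {
        int size = nbdata + (k < extra ? 1 : 0);
        start[k] = pos;
        last[k] = pos + size - 1;    /* last < start for an empty part */
        pos += size;
    }
}

/* Merge the sorted subarrays data[start[k]..last[k]] into out, taking
 * at each step the smallest remaining head (a linear scan over the
 * nbtids heads, as in the master).  idx must hold nbtids ints. */
void merge(const char *data, int nb, int nbtids,
           const int *start, const int *last, int *idx, char *out)
{
    for (int k = 0; k < nbtids; k++) idx[k] = start[k];
    for (int i = 0; i < nb; i++) {
        int best = -1;
        for (int k = 0; k < nbtids; k++)
            if (idx[k] <= last[k] &&
                (best < 0 || data[idx[k]] < data[idx[best]]))
                best = k;
        assert(best >= 0);           /* nb elements in total remain */
        out[i] = data[idx[best]++];
    }
}
```

The linear scan makes the merge cost proportional to nb·nbtids. A heap over the nbtids heads would reduce it to nb·log(nbtids), which only matters when many slaves are used.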
Suppose, for instance, that we want to know whether the partial qsort() calls are really performed in parallel on the virtual machine. We compile these programs and link them with the PGPVM2 library using the aimk command. We then run the master program with the following command to sort 5000000 integers (5000 times INPUT_FACTOR) using 10 tasks:
$ dis_qsort_master 5000 10
pg_tids: initialization completed
[500000][500000][500000][500000][500000][500000][500000][500000][500000][500000] distributed...
pg_exit: calling pvm_exit
We then call PGSORT:
$ PGSORT /tmp/pgfile.516
Working... initial sort
Working... clocksync
Working... 2nd sort
Working... converter
Working... final sort
Now we use ParaGraph to answer the previous question about the number of concurrent qsort() calls performed on blocks of 500000 integers:
[Figure: ParaGraph count of tasks executing the labelled sections of the distributed quick-sort]
This figure confirms that almost all slaves are performing their quick sort in parallel, because it shows that 8 processes are simultaneously executing their section labelled 5, i.e. the qsort() part of the program.
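The quantity read from this figure can be computed from label events with a sweep over time. The begin/end events of a given label are sorted by time stamp, and a running count of open sections is kept. The sketch below does this for a hypothetical in-memory event list (struct label_event is not a PGPVM2 or ParaGraph type). It illustrates the quantity being displayed, not how ParaGraph computes it.

```c
#include <assert.h>
#include <stdlib.h>

/* One pg_beglab/pg_endlab occurrence, as a hypothetical record. */
struct label_event {
    double time;    /* seconds */
    int label;      /* value passed to pg_beglab/pg_endlab */
    int begin;      /* 1 for pg_beglab, 0 for pg_endlab */
};

/* Order by time; at equal times, ends come before begins, so that a
 * section ending exactly when another starts is not double-counted. */
static int by_time(const void *x, const void *y)
{
    const struct label_event *a = x, *b = y;
    if (a->time < b->time) return -1;
    if (a->time > b->time) return 1;
    return a->begin - b->begin;
}

/* Return the maximum number of tasks simultaneously inside `label`.
 * Sorts ev in place. */
int max_concurrent(struct label_event *ev, int n, int label)
{
    int cur = 0, best = 0;
    qsort(ev, (size_t)n, sizeof ev[0], by_time);
    for (int i = 0; i < n; i++) {
        if (ev[i].label != label) continue;
        cur += ev[i].begin ? 1 : -1;
        if (cur > best) best = cur;
    }
    return best;
}
```

Applied to label 5 of the quick-sort trace, this maximum corresponds to the 8 simultaneous qsort() sections reported above.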
Obtaining Paragraph and PGPVM2
ParaGraph, a performance visualization tool created by Michael T. Heath and Jennifer E. Finger, is available free over the internet. For instructions on how to obtain ParaGraph, send an electronic mail message to netlib@ornl.gov containing the text "send index from paragraph". PGPVM2 can be obtained by anonymous FTP from phalanstere.univ-mlv.fr.
Comments or Questions
Please send comments and questions regarding the first or the second release of PGPVM to any of the following:
Sébastien Veigneau Institut Gaspard Monge Université de Marne-la-Vallée Sebastien.Veigneau@univ-mlv.fr
Brad Topol Graphics, Visualization and Usability Center Georgia Institute of Technology topol@cc.gatech.edu
Vaidy Sunderam Department of Math and Computer Science Emory University vss@mathcs.emory.edu
Anders Alund ITM - Swedish Institute of Applied Mathematics anders@itm.se

That's all folks !