Newsgroups: comp.parallel.mpi
From: cmanry@haiphong.eecs.wsu.edu (C. Manry)
Subject: Help wanted with FFT's on a Paragon
Organization: Washington State University
Date: Wed, 28 Sep 1994 14:35:50 GMT
Message-ID: <CwuGJq.Gz2@serval.net.wsu.edu>

(Note: this may have been posted twice, but my local newsreader says nope!)

Help,

I need to know how FFTs (fast Fourier transforms) are done on a Paragon
message-passing parallel computer.  I am most interested in 2-D FFTs.

What I am looking for is a library call and how it works in a parallel
environment.  Or, if there are several calls, how do they work?

Right now I have code on the Cray (San Diego Supercomputer Center, SDSC)
that is running quite well; however, it is currently working on a problem
that is half size.  When I go to full size I will need 1/2 Gbyte of memory.
The Cray can do this, but the job will sit in NQS for some time before it
runs and will take on the order of 1 hour of CPU time.

The problem I am doing is parallelizable: it can be spread out among the
nodes of a Paragon partition, so each node would require about 11 Mbytes.
The Paragon at SDSC can handle the load.

However, I need to decide whether to apply for more time on the Cray
or switch to the Paragon at SDSC.  The only thing keeping me from jumping
on the parallel wagon is that I'm not sure how to approach the FFTs I do.
On the Cray the FFTs (and I do a lot of 'em) take 94% of all CPU time,
so it is important to me to know how FFTs are done on the Paragon.

Here is what I am trying to do:

On the Cray, a lot of my code looks like:

do pp = 1,64
  do j = 0,63
   do i = 0,63
     perform operations on x(i,j,pp) and/or y(i,j,pp)
   enddo
  enddo
enddo

where x and y have the dimensions given by the loop bounds.  The i,j refer
to a rotation or separate "problem", with pp used to keep track of which
"problem" is being operated on.

On the Paragon the outer pp loop would be removed and a separate node
would handle each "problem".  When results from separate "problems"
need to be combined, this could be *easily* done with message passing.
BTW, I have the Paragon Users Guide from SDSC.

The problem or difficulty I have is with the following:

On the Cray:

do pp = 1,64
   take X(:,:,pp) (a 64x64 matrix) and zero pad it to a 128x128 matrix XP
   take 2-D FFT of XP in place
   do a simple multiplication, XP = XP * FK
   take inverse 2-D FFT of XP in place
   place result in Y(:,:,pp) (a 64x64 matrix)
enddo
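
To make the steps in the loop body concrete, here is a minimal sketch of
one "problem" in plain Python, using only the standard library.  It is an
illustration, not the Cray code: the names fft, fft2 and filter_problem are
hypothetical, the multiplication by FK is assumed to be element by element,
and the result is assumed to be the leading 64x64 block of the padded
array (the original does not say which block is kept).

```python
import cmath

def fft(a, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def fft2(m, inverse=False):
    """2-D FFT by the row-column method: 1-D FFTs on rows, then on columns."""
    rows = [fft(r, inverse) for r in m]
    cols = [fft(list(c), inverse) for c in zip(*rows)]
    out = [list(r) for r in zip(*cols)]  # transpose back to row order
    if inverse:
        scale = len(m) * len(m[0])       # normalize the inverse transform
        out = [[v / scale for v in r] for r in out]
    return out

def filter_problem(x, fk, n=64, npad=128):
    """Zero pad x (n x n) to npad x npad, FFT, multiply by fk, inverse FFT."""
    xp = [[0j] * npad for _ in range(npad)]
    for i in range(n):
        for j in range(n):
            xp[i][j] = complex(x[i][j])
    xp = fft2(xp)
    # Assumption: "XP = XP * FK" is an element-by-element product.
    xp = [[xp[i][j] * fk[i][j] for j in range(npad)] for i in range(npad)]
    xp = fft2(xp, inverse=True)
    # Assumption: the result kept in Y is the leading n x n block.
    return [row[:n] for row in xp[:n]]
```

With the defaults, y = filter_problem(x, fk) handles a single value of pp;
on the Paragon each node would make one such call for its own "problem".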

On the Paragon I would like to remove the outer pp loop, as I would
do in the rest of my code.  I could have each processor perform its
own FFT without the help of the other nodes.  However, I would like to
know whether there is a savings in the number of floating-point
operations if I *do* use a parallel FFT algorithm.  I think that would
require some more control using message passing.
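
On the operation-count question, one point can be checked directly for
the row-column method: a 2-D FFT split across nodes performs the same set
of 1-D FFTs as the serial version.  The total arithmetic is therefore
unchanged, and what is added is a transpose, which on a message-passing
machine becomes an exchange between the nodes.  The sketch below simulates
this in a single Python process.  No real message passing is done, the
names fft1d, fft2_serial and fft2_slabs are hypothetical, and other
parallel FFT algorithms may behave differently.

```python
import cmath

calls = {"n": 0}  # counts top-level 1-D FFTs

def _fft(a):
    """Recursive radix-2 forward FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = _fft(a[0::2]), _fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def fft1d(a):
    calls["n"] += 1
    return _fft(list(a))

def fft2_serial(m):
    """Row-column 2-D FFT on one processor."""
    rows = [fft1d(r) for r in m]
    cols = [fft1d(c) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def fft2_slabs(m, nodes):
    """Simulated slab decomposition of a square array over `nodes` nodes."""
    per = len(m) // nodes  # rows owned by each simulated node
    # Step 1: each node transforms its own rows; no communication needed.
    slabs = [[fft1d(r) for r in m[p * per:(p + 1) * per]] for p in range(nodes)]
    # Step 2: global transpose.  On a real machine this is the message
    # exchange between nodes; here it is plain list indexing.
    full = [r for s in slabs for r in s]
    tr = [list(c) for c in zip(*full)]
    # Step 3: each node transforms its rows of the transposed array.
    slabs = [[fft1d(r) for r in tr[p * per:(p + 1) * per]] for p in range(nodes)]
    # Step 4: transpose back so the layout matches the serial result.
    full = [r for s in slabs for r in s]
    return [list(c) for c in zip(*full)]
```

Both routines call fft1d twice per row of the array (32 calls for an
8x8 input), so splitting one transform across nodes spreads the work out
but does not reduce it.  With 64 independent "problems" and one per node,
each node can run its own serial 2-D FFT with no exchange at all.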

Anything you could tell me would be helpful.  If you have a man page
on the FFT calls on a Paragon, would you please mail it to me?

Thanks all....
----
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Charles (skip) Manry  / School of EECS / WSU, Pullman WA, 99164-2752        %
% cmanry@eecs.wsu.edu   / My opinions are my own and no one else!             %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

