This directory contains 5 source files: 
(a) fp_4rrm_fu.id 
(b) fp_4irm_fu.id
(c) fp_4irm_is.id 
(d) fp_4irm_ms.id 
(e) fp_sms.id 
all implementing the NAS benchmark FP, which solves a 3-dimensional
heat equation using 3D forward and inverse Fourier transforms. The 3D
Fourier transform is built from a 1-D Fourier transform whose base
case has size 4. Consequently, inputs of size less than 4 in any
dimension will not work.

The file names use the following abbreviations:

fp = FFT-PDE
4i = iterative 1-D FFT with base case of size 4
4r = recursive 1-D FFT with base case of size 4
rm = resource managed

fu = purely functional
is = I-structure implementation
ms = M-structure implementation
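
The overview above says the 3D transform is built from a 1-D
transform. A minimal Python sketch of that decomposition follows; it
is not taken from the .id sources. The names dft_1d and transform_3d
are hypothetical, and a naive O(n^2) DFT stands in for the benchmark's
1-D FFT so that the example stays short.

```python
import cmath

def dft_1d(xs, inverse=False):
    """Naive 1-D DFT; a stand-in (assumption) for the real 1-D FFT routine."""
    n = len(xs)
    sign = 1 if inverse else -1
    out = [
        sum(x * cmath.exp(sign * 2j * cmath.pi * j * k / n)
            for j, x in enumerate(xs))
        for k in range(n)
    ]
    # The inverse transform is scaled by 1/n along each axis.
    return [v / n for v in out] if inverse else out

def transform_3d(grid, inverse=False):
    """Apply the 1-D transform along each of the three axes in turn."""
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    g = [[list(row) for row in plane] for plane in grid]  # work on a copy
    # Innermost axis: every row g[i][j][:]
    for i in range(nx):
        for j in range(ny):
            g[i][j] = dft_1d(g[i][j], inverse)
    # Middle axis: every column g[i][:][k]
    for i in range(nx):
        for k in range(nz):
            col = dft_1d([g[i][j][k] for j in range(ny)], inverse)
            for j in range(ny):
                g[i][j][k] = col[j]
    # Outermost axis: every pencil g[:][j][k]
    for j in range(ny):
        for k in range(nz):
            col = dft_1d([g[i][j][k] for i in range(nx)], inverse)
            for i in range(nx):
                g[i][j][k] = col[i]
    return g
```

Calling transform_3d(grid) and then transform_3d(result, inverse=True)
should recover the original grid up to rounding error.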

(a) fp_4rrm_fu.id : This is a functional implementation of the
benchmark, i.e. it contains no I- or M-structures. The 1-D FFT routine
in this implementation is recursive. This implementation has
a low critical path length, but runs out of heap memory quickly.
On a 1PE1IS Monsoon machine the largest input size that runs is
16x16x16.
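
As an illustration of a recursive 1-D FFT that bottoms out at size 4,
here is a small Python sketch. It is not a translation of
fp_4rrm_fu.id. It assumes a radix-2 decimation-in-time split and a
power-of-two input length, neither of which is stated above; fft4 and
fft_recursive are hypothetical names.

```python
import cmath

def fft4(x):
    """Direct 4-point DFT, used as the base case."""
    a, b, c, d = x
    s0, s1 = a + c, a - c
    t0, t1 = b + d, b - d
    return [s0 + t0, s1 - 1j * t1, s0 - t0, s1 + 1j * t1]

def fft_recursive(x):
    """Recursive radix-2 FFT (assumed scheme) with a size-4 base case."""
    n = len(x)
    # Assumption: length is a power of two; sizes below 4 are rejected
    # because the recursion never reaches a smaller base case.
    if n < 4 or n & (n - 1):
        raise ValueError("length must be a power of two and at least 4")
    if n == 4:
        return fft4(x)
    even = fft_recursive(x[0::2])
    odd = fft_recursive(x[1::2])
    half = n // 2
    out = [0j] * n
    for k in range(half):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor
        out[k] = even[k] + t
        out[k + half] = even[k] - t
    return out
```

Each level of recursion allocates fresh lists for the even and odd
halves, which may hint at why a purely functional recursive version is
heavy on heap memory, though the sketch makes no claim about the
actual Id code.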

(b) fp_4irm_fu.id : This implementation is identical to the previous
one except that here the 1-D FFT routine is iterative, with a base
case of size 4.
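
One way an iterative 1-D FFT with a size-4 base case could look is
sketched below in Python. This is an assumption-laden illustration,
not the algorithm in fp_4irm_fu.id: it uses a bit-reversal permutation,
4-point transforms on each block of four, and then radix-2 butterfly
passes for sizes 8, 16, and so on. The names fft4 and fft_iterative are
hypothetical, and a power-of-two length is assumed.

```python
import cmath

def fft4(x):
    """Direct 4-point DFT, used as the base case."""
    a, b, c, d = x
    s0, s1 = a + c, a - c
    t0, t1 = b + d, b - d
    return [s0 + t0, s1 - 1j * t1, s0 - t0, s1 + 1j * t1]

def fft_iterative(x):
    """Iterative radix-2 FFT (assumed scheme) starting from 4-point blocks."""
    n = len(x)
    if n < 4 or n & (n - 1):
        raise ValueError("length must be a power of two and at least 4")
    bits = n.bit_length() - 1
    # Bit-reversal permutation of the input.
    a = [x[int(format(i, f"0{bits}b")[::-1], 2)] for i in range(n)]
    # Base case: each block of four holds a stride-n/4 subsequence in
    # 2-bit-reversed order (0, 2, 1, 3), so swap the middle pair first.
    for s in range(0, n, 4):
        a[s:s + 4] = fft4([a[s], a[s + 2], a[s + 1], a[s + 3]])
    # Combine blocks pairwise with radix-2 butterflies.
    size = 8
    while size <= n:
        half = size // 2
        for s in range(0, n, size):
            for k in range(half):
                t = cmath.exp(-2j * cmath.pi * k / size) * a[s + half + k]
                u = a[s + k]
                a[s + k] = u + t
                a[s + half + k] = u - t
        size *= 2
    return a
```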

(c) fp_4irm_is.id : This is an implementation with only I-structures.
This method has the lowest instruction count and the second-lowest
critical path length. Use of efficient releases makes this
implementation more space efficient than the purely functional ones.
The maximum input size that can run on a 1PE1IS Monsoon machine is
32x32x32.
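
For readers unfamiliar with I-structures, the following Python sketch
models their usual semantics with threads: each slot may be written
once, and a read of an empty slot waits until the write arrives. The
class IStructure and its method names are hypothetical, and the sketch
does not model the explicit releases mentioned above.

```python
import threading

class IStructure:
    """Write-once array whose reads block until the slot is filled."""

    def __init__(self, size):
        self._values = [None] * size
        self._full = [threading.Event() for _ in range(size)]
        self._lock = threading.Lock()

    def store(self, i, value):
        # A second write to the same slot is treated as an error.
        with self._lock:
            if self._full[i].is_set():
                raise RuntimeError(f"slot {i} written twice")
            self._values[i] = value
            self._full[i].set()

    def fetch(self, i, timeout=None):
        # Deferred read: wait until some producer has stored the value.
        if not self._full[i].wait(timeout):
            raise TimeoutError(f"slot {i} still empty")
        return self._values[i]
```

A consumer thread can call fetch(i) before the producer has called
store(i, v); the read simply completes once the value is present.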

(d) fp_4irm_ms.id : This implementation replaces almost all the
data structures of the previous implementations with M-structures.
This method relies heavily on the "replace" (get + barrier + put)
operation on M-structures and includes quite a number of explicit
barriers. Consequently, it runs with the least parallelism (the
longest critical path), but it handles our target input size of
64x64x64 on a 1PE1IS Monsoon machine.
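
The "replace" pattern (get + barrier + put) can be sketched in Python
as follows. This is an illustrative model using threads, not the Id
code: MCell and replace are hypothetical names, get empties the cell
and waits if it is already empty, and put on a full cell is treated as
an error. The "barrier" is modelled simply by finishing the update
before put is called.

```python
import threading

class MCell:
    """One mutable cell with full/empty state, in the spirit of an M-structure."""

    def __init__(self):
        self._cond = threading.Condition()
        self._full = False
        self._value = None

    def put(self, value):
        with self._cond:
            if self._full:
                raise RuntimeError("put on a full cell")
            self._value, self._full = value, True
            self._cond.notify_all()

    def get(self):
        # Wait until the cell is full, then take the value and empty it.
        with self._cond:
            while not self._full:
                self._cond.wait()
            self._full = False
            return self._value

def replace(cell, update):
    """get, compute the new value completely, then put it back."""
    old = cell.get()
    new = update(old)  # must finish before the put (the "barrier")
    cell.put(new)
    return new
```

Because the cell stays empty between get and put, concurrent replace
calls on the same cell are serialized, which is consistent with the
reduced parallelism noted above.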

(e) fp_sms.id : This is a more time-efficient version of the previous
implementation. This implementation substitutes the "replace"
operation with "get" or "put" and relies heavily on producer-consumer
parallelism. This implementation also runs the 64x64x64 input size on
a 1PE1IS Monsoon machine.
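
A rough Python sketch of producer-consumer parallelism with separate
"put" and "get" operations is shown below. It is only an analogy: each
queue.Queue(maxsize=1) stands in for one M-structure element, and
run_pipeline is a hypothetical name. The point is that the consumer can
start working on early elements while the producer is still filling
later ones.

```python
import queue
import threading

def run_pipeline(values):
    """Producer puts one result per cell; consumer gets them as they arrive."""
    # One single-slot queue per element (stand-in for an M-structure cell).
    cells = [queue.Queue(maxsize=1) for _ in values]
    results = []

    def producer():
        for cell, v in zip(cells, values):
            cell.put(v * v)  # "put" only, no preceding "get"

    def consumer():
        for cell in cells:
            results.append(cell.get())  # "get" waits until the cell is full

    threads = [threading.Thread(target=consumer),
               threading.Thread(target=producer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```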



