file: 	inmos.txt
author: alex stuebinger
date:   09/04/98 23:50


GNU MP 2.0.2/ECM for the Inmos Transputer
=========================================

This is a port of Torbjoern Granlund's <tege@matematik.su.se>
                  ^^^^^^^^^^^^^^^^^^
GNU Multiple Precision Arithmetic Library, Edition 2.0.2 of June 1996
to the Inmos Transputer.
The routines can be applied in parallel.

The port was done by Alexander Stuebinger <stuebi@mail.uni-mainz.de>
                     ^^^^^^^^^^^^^^^^^^^^
in April 1998. Thanks to Torbjoern for making GMP available, and for his
continuing friendly cooperation.

GMP is distributed under the GNU Library General Public License. See
"copying.lib".

The manual is in PostScript(TM) format in the \doc directory. It is a
must-read.

Be sure to check out the GMP home page <http://www.matematik.su.se/~tege/gmp>
for the latest information.

Included is my new port of the latest ECM (Elliptic Curve Method) executable
for integer factoring by Paul Zimmermann of INRIA Lorraine, France
                         ^^^^^^^^^^^^^^^
<Paul.Zimmermann@loria.fr>.

ECM is an application that uses GMP.
Thanks to Paul for making it free. Paul, we love it!

For more information about ECM visit
<http://www.loria.fr/~zimmerma/records/ecmnet.html>

The core routines that dominate GMP performance are coded in assembly.
This gives a significant speed improvement; see the timings under "Speed".
The speed gains for popcount and Hamming distance are the most dramatic
(popcount is about 10 times as fast at size = 10).
Maybe we will hear of some applications in coding theory.
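
To make clear what these two routines compute, here is a minimal portable
C sketch. It is not the assembly code of this port and not the GMP source;
the names limb_t, limb_popcount, sketch_popcount and sketch_hamdist are
made up for illustration, and a limb is assumed to be 32 bits wide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: one limb is 32 bits, as on the transputer builds. */
typedef uint32_t limb_t;

/* Count the set bits of one limb by clearing the lowest set bit
   once per loop pass. */
static unsigned limb_popcount(limb_t x)
{
    unsigned count = 0;
    while (x != 0) {
        x &= x - 1;
        count++;
    }
    return count;
}

/* Hypothetical name: number of set bits in the n-limb number at p. */
unsigned long sketch_popcount(const limb_t *p, size_t n)
{
    unsigned long total = 0;
    size_t i;
    assert(p != NULL || n == 0);
    for (i = 0; i < n; i++)
        total += limb_popcount(p[i]);
    return total;
}

/* Hypothetical name: number of bit positions in which the n-limb
   numbers a and b differ (popcount of a XOR b). */
unsigned long sketch_hamdist(const limb_t *a, const limb_t *b, size_t n)
{
    unsigned long total = 0;
    size_t i;
    assert((a != NULL && b != NULL) || n == 0);
    for (i = 0; i < n; i++)
        total += limb_popcount(a[i] ^ b[i]);
    return total;
}
```

A loop like this costs a number of steps per set bit, which is probably
why a single bit-count instruction pays off so well for these routines.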

This distribution contains the binary libraries for the generic 32-bit
transputer (/ta) and for the t805. Also included is the transputer-specific
source code needed to rebuild them. This source code is a supplement to the
standard GMP 2.0.2 distribution. The makefile follows Watcom conventions;
the only difference from the Unix standard is the line continuation
character.

Notes for rebuilding it:
Unpack the standard distribution of GMP 2.0.2 and truncate the filenames
to the 8.3 convention. The 8.3 restriction of the Inmos ANSI C Toolset is
a major annoyance.
Copy the transputer source over the standard distribution.
You must do this manually. Most of it goes into the /mpn directory.
Read the makefile "inmos.mak". You have to copy the headers into each
directory. Then go from directory to directory and make the libraries,
e.g. "wmake /f inmos.mak mpcore.lib". Combine the libraries into "gmp.lib".
If you have any questions, please consult the source code first.

The bootable files of ECM are in generic and t805 formats.
The t805 executable is faster but does not run on a t4, since it contains
inline FPU instructions. ECM on the Inmos Transputer is about 70 times
slower than on a 300 MHz Pentium II. On processors of the 80's it is a toy
and is not meant for serious factoring. Well, problems of the 90's
and the hardware of the 80's do not go together. ;-)

Contact me if you need the libraries in a special format, for example
as T425 files. I will do my best to build them.

I plan to port the forthcoming GMP 2.1 as well.

If you build any interesting applications with the library, we would like
to hear about them.

If you discover any error in the routines, please contact Torbjoern and me.
The library is well tested, as is the assembly code.


Notes for optimum performance of applications:
==============================================
Wherever possible, use a stack size <= 4k. If you really need the routines
from the mpn section, allocate the numbers on the heap rather than on the
stack.
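
The following is a minimal sketch of that advice in plain C, not code from
this port. A 300-limb number takes 1200 bytes, so three such arrays as
local variables would already use most of a 4k stack; the sketch takes them
from the heap instead. The names limb_t, sketch_add_n and sketch_heap_add
are made up for illustration. sketch_add_n is a portable stand-in so that
the example compiles without the library; in a real application the
library's mpn addition routine would take its place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: one limb is 32 bits. */
typedef uint32_t limb_t;

/* Plain C stand-in for an mpn-style addition: r = a + b over n limbs,
   least significant limb first. Returns the carry out (0 or 1). */
static limb_t sketch_add_n(limb_t *r, const limb_t *a, const limb_t *b,
                           size_t n)
{
    limb_t carry = 0;
    size_t i;
    for (i = 0; i < n; i++) {
        limb_t s = a[i] + carry;   /* may wrap around */
        limb_t c1 = (s < carry);   /* 1 if adding the carry wrapped */
        limb_t t = s + b[i];       /* may wrap around */
        limb_t c2 = (t < s);       /* 1 if adding b[i] wrapped */
        r[i] = t;
        carry = c1 | c2;           /* at most one of the two can be set */
    }
    return carry;
}

/* Hypothetical demo: builds two n-limb operands on the heap, every limb
   set to fill_a and fill_b respectively, and adds them. Stores the lowest
   limb of the sum in *low_limb. Returns the carry out (0 or 1), or -1 if
   an allocation failed. Only a few pointers live on the stack. */
int sketch_heap_add(size_t n, limb_t fill_a, limb_t fill_b,
                    limb_t *low_limb)
{
    limb_t *a, *b, *sum;
    size_t i;
    int carry = -1;

    assert(n > 0 && low_limb != NULL);
    a = malloc(n * sizeof *a);
    b = malloc(n * sizeof *b);
    sum = malloc(n * sizeof *sum);
    if (a != NULL && b != NULL && sum != NULL) {
        for (i = 0; i < n; i++) {
            a[i] = fill_a;
            b[i] = fill_b;
        }
        carry = (int)sketch_add_n(sum, a, b, n);
        *low_limb = sum[0];
    }
    free(a);    /* free(NULL) is allowed, so no extra checks are needed */
    free(b);
    free(sum);
    return carry;
}
```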


Caveats:
========
The population count and Hamming distance routines do not run on a t414, as
they use the "bitcnt" instruction. Who still has a t414?  Solution:
recompile the original hamdist.c and popcount.c from source.


Speed:
======

Machine: Inmos t805/30MHz transputer
The routines whose names begin with "ref" are the standard C source code
from GMP 2.0.2. The others are coded in assembly language.
"Size" is the number of 32-bit limbs in each operand.
Times are given in CPU clock cycles per limb.
The library gmpa.lib was used for the timings.
Please read Torbjoern's "speed.gmp" for the performance on other processors.
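
To turn a figure from the tables below into a running time or a speed-up
factor, the arithmetic can be sketched as follows. The helper names
sketch_seconds and sketch_speedup are made up for illustration; the only
inputs are the cycles-per-limb figures, the operand size and the 30 MHz
clock stated above.

```c
#include <assert.h>

/* Hypothetical helper: seconds for one call on `limbs` limbs, given a
   timing in cycles per limb and the CPU clock in Hz. */
double sketch_seconds(double cycles_per_limb, unsigned long limbs,
                      double clock_hz)
{
    assert(clock_hz > 0.0);
    return cycles_per_limb * (double)limbs / clock_hz;
}

/* Hypothetical helper: how many times faster the assembly routine is
   than the "ref" routine, from their cycles-per-limb figures. */
double sketch_speedup(double ref_cycles, double asm_cycles)
{
    assert(asm_cycles > 0.0);
    return ref_cycles / asm_cycles;
}
```

With the size = 100 figure for mpn_add_n, sketch_seconds(67.64, 100, 30e6)
comes to about 225 microseconds per call. With the size = 10 figures for
popcount, sketch_speedup(186.24, 18.82) comes to a factor of about 9.9.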

=======================================================
size = 10
=======================================================
refmpn_popcount: 	    186.24 cycles/limb
mpn_popcount: 		     18.82 cycles/limb
refmpn_lshift: 		     65.86 cycles/limb
mpn_lshift: 		     35.90 cycles/limb
refmpn_rshift: 		     63.55 cycles/limb
mpn_rshift: 		     42.82 cycles/limb
refmpn_add_n: 		     66.05 cycles/limb
mpn_add_n: 		     37.25 cycles/limb
refmpn_sub_n: 		     66.24 cycles/limb
mpn_sub_n: 		     37.63 cycles/limb
refmpn_mul_1: 		    129.22 cycles/limb
mpn_mul_1: 		     54.14 cycles/limb
refmpn_addmul_1: 	    168.00 cycles/limb
mpn_addmul_1: 		     75.26 cycles/limb
refmpn_submul_1: 	    167.04 cycles/limb
mpn_submul_1: 		     75.26 cycles/limb
=======================================================


=======================================================
size = 30
=======================================================
refmpn_popcount: 	    193.98 cycles/limb
mpn_popcount: 		     44.48 cycles/limb
refmpn_lshift: 		     89.22 cycles/limb
mpn_lshift: 		     57.28 cycles/limb
refmpn_rshift: 		     87.17 cycles/limb
mpn_rshift: 		     64.96 cycles/limb
refmpn_add_n: 		     87.36 cycles/limb
mpn_add_n: 		     59.71 cycles/limb
refmpn_sub_n: 		     87.36 cycles/limb
mpn_sub_n: 		     59.78 cycles/limb
refmpn_mul_1: 		    134.98 cycles/limb
mpn_mul_1: 		     76.67 cycles/limb
refmpn_addmul_1: 	    173.95 cycles/limb
mpn_addmul_1: 		     97.66 cycles/limb
refmpn_submul_1: 	    172.93 cycles/limb
mpn_submul_1: 		     97.73 cycles/limb
=======================================================


=======================================================
size = 100
=======================================================
refmpn_popcount: 	    196.84 cycles/limb
mpn_popcount: 		     56.66 cycles/limb
refmpn_lshift: 		     97.50 cycles/limb
mpn_lshift: 		     64.78 cycles/limb
refmpn_rshift: 		     95.48 cycles/limb
mpn_rshift: 		     72.71 cycles/limb
refmpn_add_n: 		     94.81 cycles/limb
mpn_add_n: 		     67.64 cycles/limb
refmpn_sub_n: 		     94.85 cycles/limb
mpn_sub_n: 		     67.66 cycles/limb
refmpn_mul_1: 		    137.15 cycles/limb
mpn_mul_1: 		     84.61 cycles/limb
refmpn_addmul_1: 	    176.12 cycles/limb
mpn_addmul_1: 		    105.64 cycles/limb
refmpn_submul_1: 	    175.14 cycles/limb
mpn_submul_1: 		    105.64 cycles/limb
=======================================================


=======================================================
size = 300
=======================================================
refmpn_popcount: 	    197.66 cycles/limb
mpn_popcount: 		     57.74 cycles/limb
refmpn_lshift: 		     99.85 cycles/limb
mpn_lshift: 		     66.95 cycles/limb
refmpn_rshift: 		     97.84 cycles/limb
mpn_rshift: 		     74.92 cycles/limb
refmpn_add_n: 		     96.96 cycles/limb
mpn_add_n: 		     69.89 cycles/limb
refmpn_sub_n: 		     96.97 cycles/limb
mpn_sub_n: 		     69.90 cycles/limb
refmpn_mul_1: 		    137.75 cycles/limb
mpn_mul_1: 		     86.89 cycles/limb
refmpn_addmul_1: 	    176.75 cycles/limb
mpn_addmul_1: 		    107.90 cycles/limb
refmpn_submul_1: 	    175.75 cycles/limb
mpn_submul_1: 		    107.90 cycles/limb
=======================================================


