file: 	gmpinmos.txt
author: 	alex stuebinger
date:    	30 may 1999
version:    GNU MP 2.0.2 with patches until May 99 and ECM


GNU MP 2.0.2/ECM for the Inmos Transputer
=========================================

This is a port of Torbjoern Granlund's <tege@swox.com>
                  ^^^^^^^^^^^^^^^^^^
GNU Multiple Precision Arithmetic Library, Edition 2.0.2 of June 1996
to the Inmos Transputer.
The routines can be called from parallel processes.

The GNU MP source code for this release has all patches applied that
were released up to May 1999.
This version supersedes the release of 29 April 1998.

The port was done by Alexander Stuebinger <stuebi@acm.org>.
                     ^^^^^^^^^^^^^^^^^^^^
Thanks to Torbjoern for making GMP available and for his continuing
friendly cooperation.

GMP is licensed under the GNU Library General Public License; see "copying.lib".

The manual is in PostScript(TM) format in the /doc directory. It is a must-read.

Be sure to check out the GMP home page <http://www.swox.com/gmp>
for the latest information.

Included is my new port of the latest ECM (Elliptic Curve Method) executable
for integer factoring by Paul Zimmermann of INRIA Lorraine, France
                         ^^^^^^^^^^^^^^^
<Paul.Zimmermann@loria.fr>.

ECM is an application that uses GMP.

For more information about ECM visit
<http://www.loria.fr/~zimmerma/records/ecmnet.html>

The core routines that dominate GMP performance are coded in assembly.
This gives a significant speed improvement; see the "Speed" section below.
The speed gains for popcount and Hamming distance are the most dramatic.
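
For readers unfamiliar with these two routines: popcount counts the set bits
of a number, and the Hamming distance counts the bit positions in which two
numbers differ. The following is a minimal portable C sketch of what they
compute over an array of 32-bit limbs. It is neither the GMP source nor the
transputer assembly; the names limb_t, sketch_popcount and sketch_hamdist are
made up for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* hypothetical stand-in for a 32-bit limb */

/* Count the set bits of one limb by clearing the lowest set bit per pass. */
unsigned long sketch_popcount_limb(limb_t x)
{
    unsigned long n = 0;
    while (x != 0) {
        x &= x - 1;        /* clears the lowest set bit */
        n++;
    }
    return n;
}

/* Number of set bits in the size-limb number at p. */
unsigned long sketch_popcount(const limb_t *p, size_t size)
{
    unsigned long total = 0;
    size_t i;
    for (i = 0; i < size; i++)
        total += sketch_popcount_limb(p[i]);
    return total;
}

/* Number of bit positions in which the size-limb numbers a and b differ. */
unsigned long sketch_hamdist(const limb_t *a, const limb_t *b, size_t size)
{
    unsigned long total = 0;
    size_t i;
    for (i = 0; i < size; i++)
        total += sketch_popcount_limb(a[i] ^ b[i]);
    return total;
}
```

The inner loop is the part that a single bit-count instruction can replace,
which would explain why these two routines gain the most from assembly.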

This distribution contains the binary libraries for the generic 32-bit
transputer (/ta) and for the t400, t425, t800, and t805.
The necessary header files are in the /include directory.
 
Also included is the transputer-specific source code needed to rebuild the
libraries; see the /inmos directory.
This source code is a supplement to the standard GMP 2.0.2 distribution. 


Notes for rebuilding:
=====================
This release contains working sources that conform to the 8.3 file-naming
restriction of the Transputer Development Kits. These sources are zipped as
"gmp202patched.zip" in the /source directory.
They are based on GMP 2.0.2, with the patches applied.
The transputer-specific files have been copied into the respective directories.
The syntax of the makefile "inmos.mak" follows Watcom conventions.
The only difference from standard Unix make syntax is the line continuation
character.

If you have questions, please consult the source code first.

The bootable file of ECM is in t805 format.
The t805 executable is faster but does not run on a t4, since it contains
inline FPU instructions.
ECM on the Inmos Transputer t805/30MHz is about 67 times slower than on
a Pentium2/300MHz.
On processors from the 1980s it is a toy.
It is not meant for serious factoring. Well, problems of the 90's
and the hardware of the 80's do not come together. ;-)

Contact me if you need the libraries in a different format, for example
as T801 files. I will do my best to build them.

I plan to port the forthcoming GMP 2.1 as well.

If you build any interesting applications with the library, we would like to
hear about them.

If you discover any error in the routines, please contact Torbjoern and me.
The library is well tested, as is the assembly code.


Notes for optimum performance of applications:
==============================================
Wherever possible, use a stack size <= 4k. If you call the low-level routines
from the /mpn section directly, allocate the numbers from the heap rather than
on the stack.
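
A minimal sketch of the heap advice, assuming a C99 compiler. To stay
self-contained it uses a made-up limb type and a plain C adder, sketch_add_n,
as a stand-in for a low-level routine; in a real program the limb type would
come from gmp.h and the call would go to the corresponding library routine.
The helper sketch_heap_add is likewise hypothetical. The point is only where
the operands live: malloc'ed blocks instead of large automatic arrays.

```c
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t limb_t;   /* hypothetical stand-in for a 32-bit limb */

/* Stand-in adder: r = a + b over size limbs; returns the carry out (0 or 1). */
limb_t sketch_add_n(limb_t *r, const limb_t *a, const limb_t *b, size_t size)
{
    limb_t carry = 0;
    for (size_t i = 0; i < size; i++) {
        limb_t s = a[i] + b[i];      /* may wrap around */
        limb_t c1 = (s < a[i]);      /* carry from a[i] + b[i] */
        r[i] = s + carry;
        carry = c1 | (r[i] < s);     /* carry from adding the old carry */
    }
    return carry;
}

/*
 * Add two size-limb numbers whose limbs all equal fill_a and fill_b.
 * All three operands are taken from the heap, so the stack frame stays small
 * no matter how large size is. size must be at least 1.
 * Returns the carry out, or -1 if an allocation failed.
 * The lowest result limb is stored in *low_out on success.
 */
int sketch_heap_add(size_t size, limb_t fill_a, limb_t fill_b, limb_t *low_out)
{
    limb_t *a = malloc(size * sizeof *a);
    limb_t *b = malloc(size * sizeof *b);
    limb_t *r = malloc(size * sizeof *r);
    int carry = -1;

    if (a != NULL && b != NULL && r != NULL) {
        for (size_t i = 0; i < size; i++) {
            a[i] = fill_a;
            b[i] = fill_b;
        }
        carry = (int)sketch_add_n(r, a, b, size);
        *low_out = r[0];
    }
    free(a);   /* free(NULL) is a no-op, so this is safe on failure too */
    free(b);
    free(r);
    return carry;
}
```

By contrast, declaring the same operands as local arrays, for example three
arrays of 300 limbs each, would by itself need 3600 bytes of stack and leave
almost nothing of a 4k stack for the rest of the program.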


Caveats:
========
The population count and Hamming distance routines do not run on a t414,
because they use the "bitcnt" instruction. Who still has a t414? Workaround:
recompile the original hamdist.c and popcount.c from source.


Speed:
======

Machine: Inmos t805/20MHz transputer
The routines whose names begin with "ref" are the standard C source code from
GMP 2.0.2; the others are coded in assembly language.
"Size" is the number of 32-bit limbs in each operand.
All figures are CPU clock cycles per limb.
The gmp805.lib was used for the timings.
Please read Torbjoern's "speed.gmp" for the performance of other processors.
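
As a reading aid for the unit used here, this small C sketch converts a
cycles/limb figure into an approximate time per call and computes the ratio
between a reference routine and its assembly counterpart. The function names
are hypothetical. The conversion is simply cycles/limb times the number of
limbs, divided by the clock frequency, and it assumes the per-limb figure
already includes the call overhead.

```c
/* Approximate time in microseconds for one call on `size` limbs. */
double sketch_call_time_us(double cycles_per_limb, unsigned long size,
                           double clock_mhz)
{
    /* cycles divided by (cycles per microsecond) gives microseconds */
    return cycles_per_limb * (double)size / clock_mhz;
}

/* How many times faster the assembly routine is than the reference one. */
double sketch_speedup(double ref_cycles_per_limb, double asm_cycles_per_limb)
{
    return ref_cycles_per_limb / asm_cycles_per_limb;
}
```

For example, mpn_add_n at size = 100 is listed at 72.41 cycles/limb, which at
20 MHz corresponds to roughly 72.41 * 100 / 20 = 362 microseconds per call.
For popcount at size = 10, the ratio 169.20 / 48.82 is a factor of about 3.5.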

=======================================================
size = 10
=======================================================
refmpn_popcount:    169.20 cycles/limb
mpn_popcount:        48.82 cycles/limb
refmpn_lshift:       92.63 cycles/limb
mpn_lshift:          73.87 cycles/limb
refmpn_rshift:       90.78 cycles/limb
mpn_rshift:          81.17 cycles/limb
refmpn_add_n:        89.93 cycles/limb
mpn_add_n:           76.13 cycles/limb
refmpn_sub_n:        90.04 cycles/limb
mpn_sub_n:           76.13 cycles/limb
refmpn_mul_1:       120.12 cycles/limb
mpn_mul_1:           92.88 cycles/limb
refmpn_addmul_1:    153.49 cycles/limb
mpn_addmul_1:       113.88 cycles/limb
refmpn_submul_1:    153.17 cycles/limb
mpn_submul_1:       114.14 cycles/limb
=======================================================


=======================================================
size = 30
=======================================================
refmpn_popcount:    168.39 cycles/limb
mpn_popcount:        56.48 cycles/limb
refmpn_lshift:       91.52 cycles/limb
mpn_lshift:          70.24 cycles/limb
refmpn_rshift:       90.26 cycles/limb
mpn_rshift:          78.08 cycles/limb
refmpn_add_n:        86.65 cycles/limb
mpn_add_n:           73.36 cycles/limb
refmpn_sub_n:        86.72 cycles/limb
mpn_sub_n:           73.36 cycles/limb
refmpn_mul_1:       117.41 cycles/limb
mpn_mul_1:           90.08 cycles/limb
refmpn_addmul_1:    150.56 cycles/limb
mpn_addmul_1:       111.07 cycles/limb
refmpn_submul_1:    150.44 cycles/limb
mpn_submul_1:       111.20 cycles/limb
=======================================================


=======================================================
size = 100
=======================================================
refmpn_popcount:    168.12 cycles/limb
mpn_popcount:        59.28 cycles/limb
refmpn_lshift:       91.17 cycles/limb
mpn_lshift:          69.00 cycles/limb
refmpn_rshift:       90.08 cycles/limb
mpn_rshift:          76.99 cycles/limb
refmpn_add_n:        85.57 cycles/limb
mpn_add_n:           72.41 cycles/limb
refmpn_sub_n:        85.58 cycles/limb
mpn_sub_n:           72.41 cycles/limb
refmpn_mul_1:       116.46 cycles/limb
mpn_mul_1:           88.97 cycles/limb
refmpn_addmul_1:    149.55 cycles/limb
mpn_addmul_1:       110.11 cycles/limb
refmpn_submul_1:    149.54 cycles/limb
mpn_submul_1:       110.15 cycles/limb
=======================================================


=======================================================
size = 300
=======================================================
refmpn_popcount:    168.04 cycles/limb
mpn_popcount:        61.09 cycles/limb
refmpn_lshift:       91.07 cycles/limb
mpn_lshift:          68.66 cycles/limb
refmpn_rshift:       90.03 cycles/limb
mpn_rshift:          76.69 cycles/limb
refmpn_add_n:        85.25 cycles/limb
mpn_add_n:           72.14 cycles/limb
refmpn_sub_n:        85.25 cycles/limb
mpn_sub_n:           72.14 cycles/limb
refmpn_mul_1:       116.19 cycles/limb
mpn_mul_1:           88.74 cycles/limb
refmpn_addmul_1:    149.28 cycles/limb
mpn_addmul_1:       109.85 cycles/limb
refmpn_submul_1:    149.26 cycles/limb
mpn_submul_1:       109.86 cycles/limb
=======================================================










