Application Center - Maplesoft

Contact Maplesoft Request Quote

Submit your work

Application Center Applications Writing DLL in Assembly Language for External Calling in Maple Preview

App Preview:

Writing DLL in Assembly Language for External Calling in Maple

You can switch back to the summary page by clicking here.

Learn about Maple

Download Application

assembly.mws

Introduction
Section I: Writing a DLL in Assembly Language
Section II: External Calling of Facmodp in Maple
Section III: Some Improvements
Section IV: Using Floating Point Registers
Conclusion
References

Writing DLL in Assembly Language for External Calling in Maple

by Alec Mihailovs, Ph.D.

Shepherd College, USA

alec@mihailovs.com

http://webpages.shepherd.edu/amihailo/

NOTE: The article shows how to write assembler code using x86 and FPU registers. An example of calculating large factorials mod m is discussed in detail.

Introduction

This article started from a posting [1] in comp.soft-sys.maple newsgroup where I wrote:

In general, such calculations as huge factorials mod p should be programmed directly in assembly.

It is fairly easy and doesn't require much assembly knowledge - just a few instructions.

It can be compiled as a library (dll) and accessed from Maple through external calling.

Since quite a few people expressed an interest, I wrote the assembly code for that particular problem. I hope that it will be useful as a jump-start for writing your own DLL for the innermost loops repeating a billion times or more. For loops repeating "only" a few million times, the DLL could be written in C and other high level languages, but for billions of operations high level languages are too slow.

To produce a DLL, one needs an assembler. I used MASM32 v.8 [2] which can be downloaded from http://www.masm32.com . Many people consider MASM (Microsoft Assembler) language as "standard". I used it here for creating a DLL in Windows. It produces object files in COFF (Common Object File Format), so they can be used for building a library in Linux as well.

>	restart;

Section I: Writing a DLL in Assembly Language

To produce a dll, put 3 files, Facmod.asm, Facmod.def, and Makeit.bat in one directory, and run Makeit.bat (by clicking on it in Windows Explorer, or
opening Facmod.asm in qeditor (which is a part of the MASM32 distribution) and running Makeit.bat from the Project menu. Here are the files:

Facmod.asm

.586                                     ; for 586 processor or better
.model flat, stdcall                     ; 32-bit memory and standard call

.code                                    ; the beginning of the code section
LibMain proc h:DWORD, r:DWORD, u:DWORD   ; the dll entry point
       mov eax, 1                       ; if eax is 0, the dll won't start
       ret                              ; return
LibMain Endp                             ; end of the dll entry

Facmodp proc n:DWORD, p:DWORD            ; a function with dword parameters n and m
       push ebx                         ; save the ebx value
       mov ecx, n                       ; put n in the counter register ecx
       mov ebx, p                       ; put p in ebx
       mov eax, 1                       ; put 1 in eax
   L:                                   ; a loop label
       mul ecx                          ; multiply eax by ecx
       div ebx                          ; divide the product by p
       mov eax, edx                     ; move the remainder from edx to eax
       dec ecx                          ; decrease the counter by 1
       jnz L                            ; repeat the loop if the counter is not 0
                                        ;
       pop ebx                          ; restore the ebx value
       ret                              ; return eax
Facmodp endp                             ; end of the function
End LibMain                              ; end of the dll

Facmod.def

LIBRARY Facmod
EXPORTS Facmodp

Makeit.bat

@echo off

if exist Facmod.obj del Facmod.obj
if exist Facmod.dll del Facmod.dll

\masm32\bin\ml /c /coff Facmod.asm

\masm32\bin\Link /SUBSYSTEM:WINDOWS /DLL /DEF:Facmod.def Facmod.obj

pause

That will produce 4 files, Facmod.dll, Facmod.lib, Facmod.exp, and Facmod.obj.

Files Facmodp.def and Makeit.bat are self-explanatory. I'll add detailed comments here on Facmod.asm.

Facmod.asm commented

.586
.model flat, stdcall

This is the standard beginning of the asm files for MASM32. They tell the assembler that the program will be used in computers with 586 processor or better, with 32-bit memory addressing, and use stdcall calling convention.

.code

The beginning of the code section.

LibMain proc h:DWORD, r:DWORD, u:DWORD

       mov eax, 1
       ret
LibMain Endp

Creating the dll entry point. If eax is 0, the dll won't start, so we put 1 there.

Facmod proc n:DWORD, p:DWORD

The beginning of our function. It has two dword = 4 byte = 32 bit parameters, n and p.

push ebx

In Windows and Linux, registers eax, ecx, and edx can be arbitrarily modified by programs, but other registers including ebx should be preserved,

so we are pushing the ebx value on the stack at the beginning of the program and will restore it back before the end of the program.

mov ecx, n

Put n in the counter register ecx.

mov ebx, p

Put p in the ebx register.

mov eax, 1

Put 1 in the eax register.

A loop label.

mul ecx

Multiply eax*ecx and put result in edx:eax, i.e. the beginning of it in edx and the last 32 bits in eax.

div ebx

Divide edx:eax (the result of multiplication) by ebx = p and put the quotient in eax and the remainder in edx.

mov eax, edx

Move the remainder from edx to eax.

dec ecx

Decrease the counter (ecx) by 1. We start from n there. It becomes n - 1 after the first loop, n - 2 after the second etc.

jnz L

If counter is not 0, return to label L, otherwise continue below.

pop ebx

Restore the ebx value from the stack.

ret

Return.

Facmodp endp

The end of our function.

End LibMain

The end of the DLL.

The integer return value of the function is located in the eax register (that is a standard thing in Windows).

For the sake of simplicity, I didn't do any optimization of the calculating procedure, or adding at the beginning that if n > p - 1, then the result is 0, or
checking whether p is 0. I'll do that in Section 3.

Section II: External Calling of Facmodp in Maple

In Maple,

>	Facmodp:=define_external( 'Facmodp', 'n'::integer[4], 'p'::integer[4], 'RETURN'::integer[4], 'LIB'="F:/MyProjects/Assembly/Facmod/Facmod.dll" ):

with changing the LIB to the location of Facmodp.dll creates a function calculating n! mod p. For example,

>	Facmodp(10^9,10^9+7);

that took just a few seconds on my computer instead of the many hours that a procedure written in Maple without external calling would.

Note that in the definition of external call we used the integer[4] data descriptor for n and p. It takes 4 bytes, the same as DWORD in the assembly code. The sign in the integer[4] uses 1 bit, so we can use positive values less than

2^31;

for n and p. However, keeping in mind that negative integers have the highest bit 1 (the minus sign) in their 32-bit representation, we can use negative integers for representing unsigned integers up to

>	2^32-1;

Conversion formulas

The conversion formulas are very simple. Denoting s as the negative signed integer and u the unsigned integer having the same 32-bit representation in the memory (and registers),

For example, let

>	u:=prevprime(2^32);

>	s:=u-2^32;

>	Facmodp(1000,-5);

Maple calculates 1000! pretty fast, so we can check the answer by a direct calculation,

>	1000! mod u;

The returned values also should be converted if they are negative. For example,

>	Facmodp(1001,-5);

>	1001! mod u;

>	%%+2^32;

Also, for prime p, Wilson's theorem allows us to do less calculations for n > p/2.

Wilson's Theorem

If p is a prime number, then

Corollary

If a positive integer n is less than a prime number p, then

factorial(n) = `mod`((-1)^(p-n)/factorial(p-n-1), p) .

Proof. Using equality p - k = -k mod p and substituting p - 1 with -1, p - 2 with -2, and so on, ..., n + 1 with n + 1 - p, in (p - 1)!, we can rewrite the Wilson's formula as

Dividing both parts by (-1)^(p - n - 1) * (p - n - 1)!, we get the corollary.

Note also that Facmodp cannot be used for n or p equal to 0. Entering p = 0 would crash Maple's kernel (so Maple would have to be restarted after that).

Using n = 0 would take a long time and give the answer 0. Combining all that, I wrote the following procedure that should be used instead of the direct use of Facmodp.

facmodp := proc (n::nonnegint, p::posint)
local r;
   if p <= n then return 0
   elif n = 0 then return 1
   elif p>4294967295 then error "2nd argument %1 should be less than 4294967296", p
   elif 1/2*p < n and isprime(p) then
       if n = p-1 then return p-1
       elif p < 2147483648 then r := Facmodp(p-n-1,p)
       else r := Facmodp(p-n-1,p-4294967296)
       end if;
       return `mod`((-1)^(p-n)/`if`(r < 0,4294967296+r,r),p)
   else r := Facmodp(`if`(n < 2147483648,n,n-4294967296),`if`(p < 2147483648,p,p-4294967296))
   end if;
   `if`(r < 0,4294967296+r,r)
end proc:

Here are some examples,

>	facmodp(0,1);

>	tt:=time():

>	facmodp(10^9,10^9+7);

>	time()-tt;

>	facmodp(2^31,nextprime(2^31));

>	facmodp(17,2^32-5);

Even better, both Facmodp and facmodp can be included in a module, with exporting only facmodp to prevent the possibility of crashing Maple's kernel by entering p = 0. That is a typical example of using external calls in Maple. Instead of direct call, it is much better in many cases to add a Maple "wrapper" for it. In this example, facmodp is extending the range of using Facmodp, making calculations much faster in cases with n close to a prime p, and preventing Maple crashing.

Section III: Some Improvements

Returning 0 for n > p could be done at the beginning of the assembly code for the Facmodp function. To be able to jump to the end, one can add a label E before the popping of ebx at the end. Returning 0 if n > p can be done by adding the following lines before putting 1 in eax,

       mov eax, 0                       ; put 0 in eax for returning 0 if n>=p
       cmp ecx, ebx                     ; compare n and p
       jae E                            ; if n>=p, exit returning 0

Also, we can add the code returning 1 if n = 0 before the loop, when 1 is in eax,

test ecx, ecx ; it is faster than jecxz E
jz E ; if n=0, exit returning 1

If we are going to use it for cases with not necessarily prime p, the factorial is 0 for many cases, and in these cases we can return 0 right after it first appears in eax, because the rest of the calculations won't change that.

       mov eax, edx                     ; move the remainder from edx to eax
       test eax, eax                    ; if the remainder is 0,
       jz E                             ; then exit returning 0

As usual, when we have a working function, Facmodp, it is better not to change it, but add a renamed copy of it in the dll (before the End LibMain line) and make changes there. Here is the improved function Facmodm.

Facmodm

Facmodm proc n:DWORD, m:DWORD            ; a function with dword parameters n and m
       push ebx                         ; save the ebx value
       mov ecx, n                       ; put n in the counter register ecx
       mov ebx, m                       ; put p in the base register ebx
       mov eax, 0                       ; put 0 in eax for returning 0 if n>=p
       cmp ecx, ebx                     ; compare n and p
       jae E                            ; if n>=p, exit returning 0
       mov eax, 1                       ; put 1 in eax
       test ecx, ecx                    ; it is faster than jecxz E
       jz E                             ; if n=0, exit returning 1
   L:                                   ; a loop label
       mul ecx                          ; multiply eax by ecx
       div ebx                          ; divide the product by p
       mov eax, edx                     ; move the remainder from edx to eax
       test eax, eax                    ; if the remainder is 0,
       jz E                             ; then exit returning 0
       dec ecx                          ; decrease the counter by 1
       jnz L                            ; repeat the loop if the counter is not 0
   E:                                   ; an exit label
       pop ebx                          ; restore the ebx value
       ret                              ; return eax
Facmodm endp                             ; end of the function

To make a dll, we can use the same Makeit.bat file, but Facmod.def should be modified by adding the new export,

Facmod.def

LIBRARY   Facmod
EXPORTS   Facmodp
EXPORTS   Facmodm

In Maple, we can define the external call to Facmodm similarly to Facmodp,

>	Facmodm:=define_external( 'Facmodm', 'n'::integer[4], 'm'::integer[4], 'RETURN'::integer[4], 'LIB'="F:/MyProjects/Assembly/Facmod/Facmod.dll" ):

Test its speed,

>	time(Facmodp(10^8,10^8+7));

>	time(Facmodm(10^8,10^8+7));

So, Facmodm is about 10% slower, because of the additional operations inside the loop. However, it compensates by faster calculations for composite p,

>	time(Facmodm(10^8,10^8+9));

Facmodm can be extended to larger values of n and p similarly to facmodp. I included the corresponding function facmodm in the module below.

Now, when we have fast function facmodp for prime p and fast function facmodm for composite p, we can write a module, including them as well as the function facmod working in both cases, which chooses facmodp if p is prime and facmodm otherwise. Since we suppose that facmodp will be used only for prime p, we can delete checking whether p is prime from its definition. . Here is the module.

Facmod module

Facmod:=module()
local Facmodp, Facmodm;
export facmod, facmodp, facmodm;
description "The functions in this module calculate n! mod p. If p is prime, facmodp(n,p) should be used. If p is composite, facmodm(n,p) should be used. If p can be prime or composite, facmod(n,p) should be used. The functions work for n and p less than 2^32.";
Facmodp:=define_external(
       'Facmodp',
       'n'::integer[4],
       'p'::integer[4],
       'RETURN'::integer[4],
       'LIB'="F:/MyProjects/Assembly/Facmod/Facmod.dll"
);
facmodp := proc (n::nonnegint, p::posint)
local r;
   if p <= n then return 0
   elif n = 0 then return 1
   elif p>4294967295 then error "2nd argument %1 should be less than 4294967296", p
   elif 1/2*p < n then
       if n = p-1 then return p-1
       elif p < 2147483648 then r := Facmodp(p-n-1,p)
       else r := Facmodp(p-n-1,p-4294967296)
       end if;
       return `mod`((-1)^(p-n)/`if`(r < 0,4294967296+r,r),p)
   else r:=Facmodp(`if`(n<2147483648,n,n-4294967296),`if`(p<2147483648,p,p-4294967296))
   end if;
   `if`(r < 0,4294967296+r,r)
end proc;
Facmodm:=define_external(
       'Facmodm',
       'n'::integer[4],
       'm'::integer[4],
       'RETURN'::integer[4],
       'LIB'="F:/MyProjects/Assembly/Facmod/Facmod.dll"
);
facmodm := proc (n::nonnegint, p::posint)
local r;
   if p <= n then return 0
   elif p>4294967295 then error "2nd argument %1 should be less than 4294967296", p
   else
   r:=Facmodm(`if`(n<2147483648,n,n-4294967296),`if`(p<2147483648,p,p-4294967296));
   end if;
   `if`(r < 0,4294967296+r,r)
end proc;
facmod := proc (n::nonnegint, p::posint)
   if isprime(p) then facmodp(n,p)
   else facmodm(n,p)
   end if
end proc
end module:

Section IV: Using Floating Point Registers

There are 2 ways of extending our calculations of n! mod p for p > 2^32-1. First - using the gmp library for operations with multiple precision integers, and second - using the floating point stack. In this section we'll cover the second case. It doesn't extend values of p too much, but still it can be useful for some calculations. The following assembler function is analogous to Facmodp, just operates in floating point registers and returns a floating point number instead of an integer.

Fmodp

Fmodp   proc n:QWORD, p:QWORD            ; a function with qword parameters n and p
       fninit                           ; initialize the floating point stack
       fild qword ptr [n]               ; put n in the FPU
       fild qword ptr [p]               ; st=p, st(1)=n
       fld1                             ; st=1, st(1)=p, st(2)=n
   L:                                   ; a loop label
       fmul st, st(2)                   ; st=st*st(2)
       fprem                            ; st=st mod p
       fld1                             ; st=1, st(3)=counter
       fsub st(3), st                   ; st=1, st(3)=counter-1
       fcomp st(3)                      ; compare 1 and st(3) and pop the stack
       fnstsw ax                        ; store the status word in ax
       shr ah, 1                        ; it is faster than sahf
       jc L                             ; repeat the loop if the counter is > 1
                                        ;
       fstp st(2)                       ; put the return value at the bottom
       fstp st                          ; clear the floating point stack
       ret                              ; return st
Fmodp   endp                             ; end of the function

To see how it is working, we can add it to the Facmod.asp before the End LibMain, add the line EXPORT Fmodp to the Facmod.def and run Makeit.bat. If Maple was using external calls to functions from Facmod.dll, it should be shut down and then started again. The restart command is not enough for replacing the dll.

The external call in this case looks slightly different than for Facmodp. First, notice that we used QWORD = 64 bit parameters. The corresponding Maple data descriptor is the integer[8]. Second, if we want to get the returned value from the FPU, it should be declared in the external call as a floating point number, float[8] to get higher precision.

>	Fmodp:=define_external( 'Fmodp', 'n'::integer[8], 'p'::integer[8], 'RETURN'::float[8], 'LIB'="F:/MyProjects/Assembly/Facmod/Facmod.dll" ):

It does calculations more slowly than Facmodp though. For example,

>	tt:=time():

>	Fmodp(10^8,10^8+7);

>	time()-tt;

However, it doesn't crash Maple for p = 0.

>	Fmodp(3,0);

It would give a wrong answer if n is nonpositive though. That's why it is a good idea again to use it not directly, but through a Maple procedure excluding such bad input cases.

Also, it can be optimized. Agner Fog wrote a great optimization manual [3] for different Pentium processors. The optimizations steps are different for different processors, so I'll do just one optimization here - replacing the slow fprem opcode with a calculation of N mod p using formula

Using truncation instead of rounding would always give a non-negative remainder. However, Pentium processors don't have an operation for truncation - only for rounding.

Fmodp1

Fmodp1 proc n:QWORD, p:QWORD            ; a function with qword parameters n and p
       fninit                           ; initialize the floating point stack
       fild qword ptr [n]               ; put n in the FPU
       fild qword ptr [p]               ; st=p, st(1)=n
       fld1                             ; st=1, st(1)=p, st(2)=n
       fdiv st, st(1)                   ; st=1/p, st(1)=p, st(2)=n
       fld1                             ; st=1, st(1)=1/p, st(2)=p, st(3)=n
   L:                                   ; a loop label
       fmul st, st(3)                   ; st=st*st(3)
       fld st                           ; save the product, st(2)=1/p, st(3)=p
       fmul st, st(2)                   ; st=product/p
       frndint                          ; st=round(product/p)
       fmul st, st(3)                   ; st=round(product/p)*p
       fsubp st(1), st                  ; st=product-round([product/p])*p
       fld1                             ; st=1, st(4)=counter
       fsub st(4), st                   ; st=1, st(4)=counter-1
       fcomp st(4)                      ; compare 1 and st(4) and pop the stack
       fnstsw ax                        ; store the status word in ax
       shr ah, 1                        ; it is faster than sahf
       jc L                             ; repeat the loop if the counter is > 1
                                        ;
       fstp st(3)                       ; put the return value at the bottom
       fcompp                           ; clear the floating point stack
       ret                              ; return st
Fmodp1 endp                             ; end of the function

Again, before using it, we have to add the export of Fmodp1 in Facmod.def, rebuild the dll, shut down Maple, and start it again. Here is the external call for it in Maple,

>	Fmodp1:=define_external( 'Fmodp1', 'n'::integer[8], 'p'::integer[8], 'RETURN'::float[8], 'LIB'="F:/MyProjects/Assembly/Facmod/Facmod.dll" ):

Compare the speed for the same example,

>	time(Fmodp1(10^8,10^8+7));

Faster than Fmodp and about twice as slowly as Facmodp.

Because we used the rounding instead of truncation, the answer can be negative. For example,

>	Fmodp1(1001,10^8+7);

>	1001! mod (10^8+7);

>	%-(10^8+7);

Few examples that I did, gave the correct answers for p up to about

2^49;

The larger numbers can give a wrong answer. For example,

>	*a:=nextprime(710^14);**

>	Fmodp1(50000,a);

>	50000! mod a;

Correct!

>	*a:=nextprime(7510^13);**

>	Fmodp1(50000,a);

>	50000! mod a;

Wrong!

It would be interesting to find a precise bound M such that all Fmodp1(n, p) give the correct answers for positive integers n and p less than M.

As we did in Section III for Facmodp, we can improve Fmodp1 by giving the answer 1 for non-positive n and the fast answer 0 for n > p - 1. The following assembler code for Fmodm does that. It also adds checking if the answer is 0 inside the loop, the same as in Facmodm.

Fmodm

Fmodm   proc n:QWORD, p:QWORD            ; a function with qword parameters n and p
       fninit                           ; initialize the floating point stack
       fld1                             ; put 1 in the FPU for returning 1 if n<=1
       fild qword ptr [n]               ; st=n, st(1)=1
       fcomp st(1)                      ; compare n and 1 and pop the stack
       fnstsw ax                        ; store the status word in ax
       and ah, 41h                      ; it is faster than sahf
       jnz E                            ; if n<=1, exit returning 1
       fstp st                          ; clear the stack
       fldz                             ; put 0 in the FPU for returning 0 if n>=p
       fild qword ptr [n]               ; st=n, st(1)=0
       fild qword ptr [p]               ; st=p, st(1)=n, st(2)=0
       fcompp                           ; compare p and n and pop them off the stack
       fnstsw ax                        ; store the status word in ax
       and ah, 41h                      ; it is faster than sahf
       jnz E                            ; if p<=n, exit returning 0
       fstp st                          ; clear the stack
       fild qword ptr [n]               ; put n in the FPU
       fild qword ptr [p]               ; st=p, st(1)=n
       fld1                             ; st=1, st(1)=p, st(2)=n
       fdiv st, st(1)                   ; st=1/p, st(1)=p, st(2)=n
       fld1                             ; st=1, st(1)=1/p, st(2)=p, st(3)=n
   L:                                   ; a loop label
       fmul st, st(3)                   ; st=st*st(3)
       fld st                           ; save the product, st(2)=1/p, st(3)=p
       fmul st, st(2)                   ; st=product/p
       frndint                          ; st=round(product/p)
       fmul st, st(3)                   ; st=round(product/p)*p
       fsubp st(1), st                  ; st=product-round([product/p])*p
       ftst                             ; check if st=0
       fnstsw ax                        ; store the status word in ax
       and ah, 40h                      ; it is faster than sahf
       jnz S                            ; if st=0, exit returning 0
       fld1                             ; st=1, st(4)=counter
       fsub st(4), st                   ; st=1, st(4)=counter-1
       fcomp st(4)                      ; compare 1 and st(4) and pop the stack
       fnstsw ax                        ; store the status word in ax
       shr ah, 1                        ; it is faster than sahf
       jc L                             ; repeat the loop if the counter is > 1
   S:                                   ; exit with clearing the stack
       fstp st(3)                       ; put the return value at the bottom
       fcompp                           ; clear the floating point stack
   E:                                   ; the exit label
       ret                              ; return st
Fmodm   endp                             ; end of the function

To make a dll, we can use the same Makeit.bat file, but Facmod.def should be modified by adding new exports,

Facmod.def

LIBRARY   Facmod
EXPORTS   Facmodp
EXPORTS   Facmodm
EXPORTS   Fmodp
EXPORTS   Fmodp1
EXPORTS   Fmodm

In Maple, we can define the external call to Fmodm similarly to the external call to Fmodp,

>	Fmodm:=define_external( 'Fmodm', 'n'::integer[8], 'm'::integer[8], 'RETURN'::float[8], 'LIB'="F:/MyProjects/Assembly/Facmod/Facmod.dll" ):

Test its speed on the same example,

>	time(Fmodm(10^8,10^8+7));

Certainly, Fmodm is faster for composite p,

>	time(Fmodm(10^8,10^8+9));

Finally, we can write a module FactorialMod similar to the module Facmod and including its functions modified by using external calls to Fmodp1 and Fmodm for n and/or p greater than 2^32-1.

FactorialMod module

FactorialMod:=module()
local Facmodp, Facmodm, Fmodp1, Fmodm;
export facmod, facmodp, facmodm;
description "The functions in this module calculate n! mod p. If p is prime, facmodp(n,p) should be used. If p is composite, facmodm(n,p) should be used. If p can be prime or composite, facmod(n,p) should be used. If n and/or p are greater than 2^49, result may be wrong.";
Digits:=max(15,Digits);
Facmodp:=define_external(
       'Facmodp',
       'n'::integer[4],
       'p'::integer[4],
       'RETURN'::integer[4],
       'LIB'="F:/MyProjects/Assembly/Facmod/Facmod.dll"
);
Facmodm:=define_external(
       'Facmodm',
       'n'::integer[4],
       'm'::integer[4],
       'RETURN'::integer[4],
       'LIB'="F:/MyProjects/Assembly/Facmod/Facmod.dll"
);
Fmodp1:=define_external(
       'Fmodp1',
       'n'::integer[8],
       'p'::integer[8],
       'RETURN'::float[8],
       'LIB'="F:/MyProjects/Assembly/Facmod/Facmod.dll"
):
Fmodm:=define_external(
       'Fmodm',
       'n'::integer[8],
       'm'::integer[8],
       'RETURN'::float[8],
       'LIB'="F:/MyProjects/Assembly/Facmod/Facmod.dll"
):
facmodp := proc (n::nonnegint, p::posint)
local r;
   if p <= n then return 0
   elif n = 0 then return 1
   elif p > 4294967295 then
       if p > 562949953421312 then WARNING("result may be wrong") end if;
       if 1/2*p < n then
           if n = p-1 then return p-1
           else return `mod`((-1)^(p-n)/round(Fmodp1(p-n-1,p)),p)
           end if;
       else return `mod`(round(Fmodp1(n,p)),p)
       end if
   elif 1/2*p < n then
       if n = p-1 then return p-1
       elif p < 2147483648 then r := Facmodp(p-n-1,p)
       else r := Facmodp(p-n-1,p-4294967296)
       end if;
       return `mod`((-1)^(p-n)/`if`(r < 0,4294967296+r,r),p)
   else r:=Facmodp(`if`(n<2147483648,n,n-4294967296),`if`(p<2147483648,p,p-4294967296))
   end if;
   `if`(r < 0,4294967296+r,r)
end proc;
facmodm := proc (n::nonnegint, p::posint)
local r;
   if p <= n then return 0
   elif p > 4294967295 then
       if p > 562949953421312 then WARNING("result may be wrong") end if;
       return `mod`(round(Fmodm(n,p)),p)
   else
   r:=Facmodm(`if`(n<2147483648,n,n-4294967296),`if`(p<2147483648,p,p-4294967296));
   end if;
   `if`(r < 0,4294967296+r,r)
end proc;
facmod := proc (n::nonnegint, p::posint)
   if isprime(p) then facmodp(n,p)
   else facmodm(n,p)
   end if
end proc
end module:

Conclusion

As we saw here, the interaction between Maple and assembly language can be very fruitful. A DLL written in assembler provides speed unavailable in Maple or DLL written in high level languages. Maple, from another point of view, adds its symbolic capabilities making the calculations much faster by using division modulo p and isprime function in this example. The future improvements could be made by replacing the line "if p <= n then return 0" in the facmod procedure by including more sophisticated conditions guaranteed that n! = 0 mod p for composite p. For a specific processor, the assembler code could be further optimized. Other improvements could be achieved by using the gmp library supplied with Maple 9. It would be interesting to find a precise bound M such that all Fmodp1(n, p) give the correct answers for positive integers n and p less than M.

This article started from my posting [1]. The MASM32 author's website [2] contains useful assembler references. Dr. Agner Fog wrote a great optimization manual [3] and periodically updates it. The Intel developer's manual [4] contains a description of the assembler instructions.

I would like to thank my beautiful and wonderful wife, Bette for her support.

References

1. Alec Mihailovs, Huge factorials, comp.soft-sys.math.maple, 12/4/2003, http://groups.google.com/groups?threadm=oHNzb.7942%24Zu5.1208563%40news3.news.adelphia.net

2. S.L.Hutchesson, MASM32, http://www.movsd.com/

3. Agner Fog, How to optimize for the Pentium family of microprocessors, http://www.agner.org/assem/#optimize

4. IA-32 Intel Architecture Software Developers' Manual, v.2: Instruction Set Reference, http://developer.intel.com/design/Pentium4/manuals/

Disclaimer: While every effort has been made to validate the solutions in this worksheet, Waterloo Maple Inc. and Dr. Alec Mihailovs are not responsible for any errors contained and are not liable for any damages resulting from the use of this material.

Maple

Maple Add-Ons

Student Success Platform

Improving Retention Rates

MapleSim

MapleSim Add-Ons

Systems Engineering

Consulting Services

Maple T.A. and Möbius

Education

Industries

Automotive and Aerospace

Robotics

Machine Design & Industrial Automation

Other

Application Areas

Product Pricing

Purchasing

Institutional Student Licensing

Maplesoft Elite Maintenance (EMP)

Support

Product Training

Online Product Help

Webinars & Events

Publications

Content Hubs

Examples & Applications

Community

About Maplesoft

Media Center

User Community

Contact

App Preview:

Writing DLL in Assembly Language for External Calling in Maple

Maple

Powerful math software that is easy to use

Maple Add-Ons

Student Success Platform

Improving Retention Rates

MapleSim

Advanced System Level Modeling

MapleSim Add-Ons

Systems Engineering

Consulting Services

Maple T.A. and Möbius

Education

Industries

Automotive and Aerospace

Robotics

Machine Design & Industrial Automation

Other

Application Areas

Product Pricing

Purchasing

Institutional Student Licensing

Maplesoft Elite Maintenance (EMP)

Support

Product Training

Online Product Help

Webinars & Events

Publications

Content Hubs

Examples & Applications

Community

About Maplesoft

Media Center

User Community

Contact

App Preview:

Writing DLL in Assembly Language for External Calling in Maple