Application Center - Maplesoft

App Preview:

First Digit Simulation

You can switch back to the summary page by clicking here.

Learn about Maple
Download Application


 

Image 

FIRST DIGIT SIMULATION 

Author:  Roland Engdahl
University of Kalmar
Sweden
mr.engdahl@telia.com 

Background 

'First Digit' is the first digit, not equal zero,  in a number. i e the first significant digit. e g in 0.0003457  the first digit is 3. 

HistoryNewcomb noticed in 1881  that in logarithm tables the first pages, containing numbers starting with 1, were more dirty than later pages.Benford rediscovered the properties in 1938. He formulated a law for the occurence of first digits:Benfords Law:  The probability that the first digit is k is     log(1+1/k)         ( log(a) =  Typesetting:-mrow(Typesetting:-msub(Typesetting:-mi(There are two aspects of Benfords law:1. Application to  numerical data2. Numerical calculation 

Application to numerical data 

On the WEB we can get many applications to numerical data e g physical constants,
stock market, rivers area,  population, Am. league, addresses, squareroots, death rate, newspapers.
 

A more interesting application of Benfords principle is fraud detection in business accounts. 

Numerical calculation  

Before 1600 it was hard to do the product of several factors, even worse to divide. 

John Napier (1550-1617) invented logarithms. In tables you could find the logarithm for a number.  

To multiply  numbers you had to add the logarithms and to divide you had to subtract the logarithms.  

At last you had to go the other way in the tables, finding the 'antilogarithm'. 

This method was common in schools and applications until the seventies. 

An important facility was the slide rule. The scale is logarithmic, so the distance between 1 and 2 is the same as the distance between 2 and 4 etc 

 

Image 

To multiply is to add distances on the slide rule, to divide is to subtract distances on the slide rule. 

After some multiplications or divisions of numbers taken at random the probability to end betwwen 1 and 2 is about 30 percent (log(2)). 

If you are the owner of a slide rule compare the distance between 1 and 2 with the distance  between 1 and 10. 

 

Simulation with random numbers 

Quotation from Donald E. Knuth, The Art of Computer Programming vol 2 p 173 

The most prudent policy for a person to follow is to run each Monte Carlo program at least twice
using quite different sources of random numbers, before taking the answers of the program seriously;
this not only will give an indication of the stability of the results,
 

it also will guard against the danger of trusting in a generator with hidden deficiencies.  

(Every random number generator will fail in at least one application.) 

 

Donald E. Knuth. The Art of Computer Programming, vol 2. Ch 3 Random Numbers Ch 4.2.4  
Theory about leading digits p 238-
 

 

You can select different random number generators. 

OBSERVE:  The result of calling rand() depends of the version of Maple: 

 

Early version of Maple is here Maple versions up to and including 9.5          Eversion 

Later version of Maple is here Maple from version 10                                Lversion 

 

Simulation with Built-in generators 

In RandomTools[Linear Congruence] the generator is of the linear congruence type  

x[k+1] = m*x[k]  mod p 

p = 999999999989 is the greatest prime with 12 digits,  m = 427419669081 multiplier  

For short we call this generator for single 

 

In Eversion when calling  rand() you  get version RandomTools[Linear Congruence]  i e  single generator  

 

In Lversion when calling rand() you get  version  RandomTools[MersenneTwister]{GenerateInteger] 

This generator is not of LinearCongruence type 

RandomTools[MersenneTwister][NewGenerator] is applicable in Lversion and gives the same result as rand() 

 

Simulation with triple-generator 

The generator triple consists of three generators of linear congruence type in parallel for 'smoothing out' 

 

In Eversion  

Simulation in Built-in generator:    single and rand are the same procedure:  NewGenerator not applicable 

Simulation in triple-generator:       same result as Lversion (LinearCongruence) 

Simulatiom:Another approach...     not applicable 

 

In  Lversion 

Simulation in Built-in generator:    single and rand are quite different procedures:   

Simulation in triple-generator:       same result as Eversion (LinearCongruence) 

Simulatiom:Another approach:      applicable       RandomTools[MersenneTwister][NewGenerator] very fast  

 

 

We can compare the results from the simulations with different generators. See the results at Comments and results.  

If you take only one factor (nrf = 1) in the  product the relative frequency of first digits will be (almost) equal for all digits 1..9.  

This is an indication of that the random number generator passes the frequency test..   

 

Let the random number generator produce  a great many numbers. The computer calculates the product of these random numbers.  

At last we look at the first  digit (first nonzero digit) in the product.  We study the distribution of these first digits.  

Choose the number of factors in  nrf                 

Choose the number of trials in trial  

 

Simulation with Built-in generators 

The single generator        

> Typesetting:-mrow(Typesetting:-mi(
 

     parameters for random number generator single        
     single is  of  the same  as the
built-in generator in LinearCongruence  

> m:=427419669081: p:=999999999989: x:=32564:
 

> single:=proc() local s: global x,m,p:
x:=x*m mod p: s:=x/p: evalf(frac(s));
end proc;
 

proc () local s; global x, m, p; `:=`(x, `mod`(`*`(x, `*`(m)), p)); `:=`(s, `/`(`*`(x), `*`(p))); evalf(frac(s)) end proc (5.1.1)
 

     procedure  prodrand gives the first digit with nrf factors in the product and the generator singel 

> prodrand:=proc( ) local i,pr;
pr:=1; for i from 1 to nrf do pr:=pr*single(); end do;
while pr < 1 do  pr:=pr*10  end do; trunc(pr)
end proc;
 

proc () local i, pr; `:=`(pr, 1); for i to nrf do `:=`(pr, `*`(pr, `*`(single()))) end do; while `<`(pr, 1) do `:=`(pr, `+`(`*`(10, `*`(pr)))) end do; trunc(pr) end proc (5.1.2)
 

     Choose nrf :  number of factors            

> nrf:=2:                
 

    Choose trial    number of trials        

> trial := 10000:
 

> for k from 1 to 9 do u[k]:=0 end:
 

> for k from 1 to trial do s:=prodrand();
u[s]:=u[s]+1 end do:
 

> print('i', relfreq); Digits:=5:
for i from 1 to 9 do i,evalf(u[i]/trial) end do;
 

 

 

 

 

 

 

 

 

 

i, relfreq
1, .24740
2, .18110
3, .14610
4, .12030
5, 0.92200e-1
6, 0.73900e-1
7, 0.58600e-1
8, 0.47900e-1
9, 0.32500e-1 (5.1.3)
 

The rand generator        

> Typesetting:-mrow(Typesetting:-mi(
 

      pp module for built-in generator rand 

> pp:=10^12:
 

      procedure prodrand  gives the first digit with  nrf factors in the product and
      the
built-in random number generator rand() 

> prodrand:=proc( ) local i,pr;
pr:=1; for i from 1 to nrf do pr:=pr*rand()/pp; end do;
while pr < 1 do  pr:=pr*10  end do; trunc(pr)
end proc;
 

proc () local i, pr; `:=`(pr, 1); for i to nrf do `:=`(pr, `/`(`*`(pr, `*`(rand())), `*`(pp))) end do; while `<`(pr, 1) do `:=`(pr, `+`(`*`(10, `*`(pr)))) end do; trunc(pr) end proc (5.2.1)
 

       Start new experiment from here 

      Choose nrf :  number of factors          

> nrf:=2:                
 

      Choose trial :  number of trials          

> trial := 10000:
 

> for k from 1 to 9 do u[k]:=0 end:
 

> for k from 1 to trial do s:=prodrand();
u[s]:=u[s]+1 end do:
 

      Record of distribution of first digit of the product 

> print('i',relfreq);Digits:=5:
for i from 1 to 9 do i, evalf(u[i]/trial) end do;
 

 

 

 

 

 

 

 

 

 

i, relfreq
1, .24190
2, .18160
3, .14880
4, .11670
5, 0.90100e-1
6, 0.79700e-1
7, 0.62200e-1
8, 0.48200e-1
9, 0.30800e-1 (5.2.2)
 

NewGenerator               

> restart: with(RandomTools[MersenneTwister]);
 

[GenerateFloat, GenerateFloat64, GenerateInteger, GenerateInteger32, GenerateUnsignedInt32, GetState, NewGenerator, SetState] (5.3.1)
 

> M:=NewGenerator(range=10^12);
 

proc () (proc () option builtin = RandNumberInterface; end proc)(6, 1000000000000, 40) end proc (5.3.2)
 

> prodrand:=proc( ) local i,pr;
pr:=1; for i from 1 to nrf do pr:=pr*Float(M(),-12); end do;
while pr < 1 do  pr:=pr*10  end do; trunc(pr)
end proc;
 

proc () local i, pr; `:=`(pr, 1); for i to nrf do `:=`(pr, `*`(pr, `*`(Float(M(), -12)))) end do; while `<`(pr, 1) do `:=`(pr, `+`(`*`(10, `*`(pr)))) end do; trunc(pr) end proc (5.3.3)
 

     Start new experiment from here 

    Choose nrf :  number of factors  

> nrf:=2:                
 

      Choose trial :  number of trials    

> trial := 10000:
 

> for k from 1 to 9 do u[k]:=0 end:
 

> for k from 1 to trial do s:=prodrand();
u[s]:=u[s]+1 end do:
 

      Record of distribution of first digit of the product 

> print('i',relfreq);Digits:=5:
for i from 1 to 9 do i, evalf(u[i]/trial) end do;
 

 

 

 

 

 

 

 

 

 

i, relfreq
1, .24190
2, .18160
3, .14880
4, .11670
5, 0.90100e-1
6, 0.79700e-1
7, 0.62200e-1
8, 0.48200e-1
9, 0.30800e-1 (5.3.4)
 

Simulation with triple generator 

> Typesetting:-mrow(Typesetting:-mi(
 

     Parameters for random number generator triple.  
     Produced with numtheory :  safeprime (p)  and primroot (m)        
triple is  of  the same type as the built-in generator singel  in LinearCongruence  
     Triple has three generators in parallel for
smooting out for better performance in serial test 

> m[1]:=413654159661: p[1]:=876543220739: x[1]:=32564:
m[2]:=319894150324: p[2]:=676543219907: x[2]:=536766:
m[3]:=274222336502: p[3]:=555985626419: x[3]:=2321:
 

> triple:=proc() local s,i: global x,m,p;
s:=0: for i from 1 to 3 do x[i]:=x[i]*m[i] mod p[i]: s:=s+x[i]/p[i]: end do:
evalf(frac(s)); end proc;
 

proc () local s, i; global x, m, p; `:=`(s, 0); for i to 3 do `:=`(x[i], `mod`(`*`(x[i], `*`(m[i])), p[i])); `:=`(s, `+`(s, `/`(`*`(x[i]), `*`(p[i])))) end do; evalf(frac(s)) end proc (6.1)
 

     procedure  trust gives the first digit with generator triple and nrf factors    

> trust:=proc( ) local i,pr;
pr:=1; for i from 1 to nrf do pr:=pr*triple(); end do;
while pr < 1 do  pr:=pr*10  end do; trunc(pr)
end proc;
 

proc () local i, pr; `:=`(pr, 1); for i to nrf do `:=`(pr, `*`(pr, `*`(triple()))) end do; while `<`(pr, 1) do `:=`(pr, `+`(`*`(10, `*`(pr)))) end do; trunc(pr) end proc (6.2)
 

     Choose nrf :  number of factors         

> nrf:=2:                
 

    Choose trial    number of trials        

> trial := 10000:   
 

> for k from 1 to 9 do u[k]:=0 end:
 

> for k from 1 to trial do s:=trust();
u[s]:=u[s]+1 end do:
 

> print('i', relfreq); Digits:=5:
for i from 1 to 9 do i,evalf(u[i]/trial) end do;
 

 

 

 

 

 

 

 

 

 

i, relfreq
1, .23910
2, .18580
3, .14800
4, .11560
5, 0.93400e-1
6, 0.77000e-1
7, 0.60700e-1
8, 0.48200e-1
9, 0.32200e-1 (6.3)
 

Comments and results from simulations 

 

Comments on Random Number Generators 

In what extent can we trust a pseudo-random generator? 

In Birger Jansson, Random Number Generators (doctor's dissertation 1966)
and at Donald Knuth we find different kinds of test:
 

Frequency tests, Serial correlation, Poker tests, Gap tests, Run tests,
Coupon collector's test, The Typesetting:-mrow(Typesetting:-msup(Typesetting:-mi(- test, Visual tets.
 

 

Summary of results
number of
trials : 10 000 000 

From the results we can expect that the error is at most 1 promille 

The generators can be considered  suitable for the purpose! 

FD is First Digit    

BIN is built-in generator rand() in Lversion    triple is triple-generator    

 

                                        relative  frequency 

nrf          FD                   BIN                  triple           

 

 2              FD =1                      0.24142                     0.24125 

                 FD=2                       0.18309                     0.18313 

 

 3               FD=1                       0.30057                     0.30062 

                  FD=2                       0.18839                     0.18817 

 

 4               FD=1                       0.30731                     0.30752 

                  FD=2                       0.17814                     0.17839 

 

 5               FD=1                       0.30276                     0.30287 

                  FD=2                       0.17525                     0.17518 

 

 6               FD=1                       0.30081                     0.30063 

                  FD=2                       0.17557                     0.17556 

 

Theory for nrf = 2 

 

nrf2FD1 is probability for FD = 1 

nrf2FD2 is probability for FD = 2 

 

We leave to Maple do the work 

> restart:
 

nrf=2 Typesetting:-mrow(Typesetting:-mo(  

Typesetting:-mrow(Typesetting:-mi(1-2 

> a1:=int(2/x-1,x=1..2);
 

`+`(`-`(1), `*`(2, `*`(ln(2)))) (7.1)
 

10-20 

> b1:=int(10-10/x,x=1..2)+int(20/x-10/x,x=2..10);
 

`+`(10, `-`(`*`(10, `*`(ln(2)))), `*`(10, `*`(ln(5)))) (7.2)
 

> (nrf2FD1):=evalf((a1+b1)/81);
 

.2413481688 (7.3)
 

Typesetting:-mrow(Typesetting:-mi( 

 

Typesetting:-mrow(Typesetting:-mi( 

> a2:=int(3/x-2/x,x=1..2)+int(3/x-1,x=2..3);
 

`+`(`-`(`*`(2, `*`(ln(2)))), `-`(1), `*`(3, `*`(ln(3)))) (7.4)
 

Typesetting:-mrow(Typesetting:-mn( 

> b2:=int(10-20/x,x=2..3)+int(30/x-20/x,x=3..10);
 

`+`(10, `*`(30, `*`(ln(2))), `-`(`*`(30, `*`(ln(3)))), `*`(10, `*`(ln(5)))) (7.5)
 

> (nrf2FD2):=evalf((a2+b2)/81);
 

.1832094861 (7.6)
 

 

The correspondence with simulated and theoretical values are very good. 

Typesetting:-mrow(Typesetting:-mi( 

     

There are good reasons to trust the simulated values even for nrf  > 2 

It seems that P(FD=1) has a maximum for nrf = 4 and that P(FD=1) for nrf>2 lies near log(2) 

Simulation: Principle Slide Rule  

We study the numbers  k * r  where r is a given irrational number and k passes the natural numbers from 1 to trial              c   to convert    from   Typesetting:-mrow(Typesetting:-mi(  = exp(x)  to   Typesetting:-mrow(Typesetting:-msup(Typesetting:-mn( 

> restart: r:=evalf(-sqrt(2)/2): s:=0: c:=evalf(ln(10)):
 

> trial:=10000:
 

     In sumlog we add the irrational number r to s.  s is transformed to (0,1) . Let log(u) = r.
     The effect of add r is the same as multiply with u.      On the slide rule we add distances when doing multiplication.
 

> sumlog:=proc() global s,r,c;
s:=s+r;  while s<0 do s:=s+1 end do;
trunc(exp(s*c))
end proc;
 

proc () global s, r, c; `:=`(s, `+`(s, r)); while `<`(s, 0) do `:=`(s, `+`(s, 1)) end do; trunc(exp(`*`(s, `*`(c)))) end proc (8.1)
 

> for i from 1 to 9 do u[i]:=0 end do:
 

> for k from 1 to trial do t:=sumlog():
u[t]:=u[t]+1 end do:
 

> Digits:=5: print(digit,relfreq);
for k from 1 to 9 do k, evalf(u[k]/trial) end do;
 

 

 

 

 

 

 

 

 

 

digit, relfreq
1, .30110
2, .17610
3, .12500
4, 0.96800e-1
5, 0.79200e-1
6, 0.66900e-1
7, 0.58000e-1
8, 0.51300e-1
9, 0.45600e-1 (8.2)
 

Benfords law 

table of log(i+1)-log(i) for i = 1..9 

> Digits:=5:print('i','log10(1+1/i)');
for i from 1 to 9 do i,evalf(log10(1+1/i)) end do;
 

 

 

 

 

 

 

 

 

 

i, log10(`+`(1, `/`(1, `*`(i))))
1, .30103
2, .17609
3, .12493
4, 0.96908e-1
5, 0.79180e-1
6, 0.66959e-1
7, 0.58008e-1
8, 0.51151e-1
9, 0.45753e-1 (9.1)
 

Results and comments 

 

                              slide rule principle         Benfords law 

FD = 1                                 0.30103                                0.30103 

FD = 2                                 0.17609                                0.17609 

 

 

The agreement between simulation and Benfords law is not surprising. 

 

We study the numbers   k*r  where r is an irrational number and k passes the the natural numbers  

It is a known fact the  numbers frac(k*r) lie densely and the distribution is uniform  on the interval (0,1) 

Let  log (u) = r. In the following we assume that  Typesetting:-mrow(Typesetting:-msup(Typesetting:-mi(  is reduced to the interval (0,1) 

Let F(x) be the probability that   Typesetting:-mrow(Typesetting:-msup(Typesetting:-mi(< x.      F(x) = P (Typesetting:-mrow(Typesetting:-mo() 

This implies    

log(Typesetting:-mrow(Typesetting:-msup(Typesetting:-mi() < log(x)       and this is  equivalent with  frac(k*r) < log(x) 

So we have F(x) = log(x) 

 

First digit in frac(k*r)) ≤ j implies frac(k*r) < j+1. This occurs with  the probability log(j+1) 

 

P(First Digit = h) = log(h+1)-log(h) = log(1+1/h)     

 

Slide rule : 

To multiply with u repeatedly  is the same as  add distance r repeatedly. If necessary go 'back to' interval (1,10). 

We note that the distribution is dense and uniform 

How often do we end in the interval (1,2) on the slide rule?. 

The relative frequency must be the ratio on the slide rule between distance (1-2) and distance (1-10)   i e  log(2). 

So we may realize that the relative frequency for first digit is equal to 1 is log(2) 

 

Simulation: Another approach  

 

When multiply with a random number the products are  accumulated. In this program we only study  FD=1 

 

> restart: with(RandomTools[MersenneTwister]);
 

[GenerateFloat, GenerateFloat64, GenerateInteger, GenerateInteger32, GenerateUnsignedInt32, GetState, NewGenerator, SetState] (11.1)
 

> M:=NewGenerator(range=10^10);
 

proc () (proc () option builtin = RandNumberInterface; end proc)(6, 10000000000, 34) end proc (11.2)
 

       

     In maple we multiply pr with a random number in the interval (0.01 - 0.1) (In Float(M(),-11)
     On the slide rule we start with 1 and multiply repeatedly with random numbers (uniformly distributed)      and read the first digit for every new random number.
     For every new serie we start with pr=1
 

> maple:=proc() global pr;
pr:=pr*Float(M(),-11);  
while pr<1 do pr:=pr*10 end do;
trunc(pr)
end proc;
 

proc () global pr; `:=`(pr, `*`(pr, `*`(Float(M(), -11)))); while `<`(pr, 1) do `:=`(pr, `+`(`*`(10, `*`(pr)))) end do; trunc(pr) end proc (11.3)
 

      Start new experiment  from here 

     Choose number of trials for each serie     

> trial:=10000: serie :=3:
 

> for i from 1 to serie  do u[i]:=0:
pr:=1:for k from 1 to trial do t:=maple():
if t=1 then u[i]:=u[i]+1 end if end do: end do:
Digits:=5: print(relfreq);
for k from 1 to serie do  evalf(u[k]/trial) end do;
print(average);
su:=0: for k from 1 to serie do su :=su + u[k]: end do:
evalf(su/trial/serie);
 

 

 

 

 

 

relfreq
.30040
.30280
.29540
average
.29953 (11.4)
 

         COMMENTS         From the results we can see the variations in the series of simulations.     It seems that the probability for FD=1 is near log(2) as in the other simulations.     Theory may be rather complex.     Do your own experiments with several values for trial and serie         This simulation resembles SIMULATION Principle slide Rule There we added a constant term.     Here we perform the  simulation by  multiplying with a new factor at haphazard every time on the slide rule:     How often will the result end up between 1 and 2 on the slide rule?          If you wish to write comments to me by e-mail,
    you must specify the subject to   'First Digit'   (e-mail with other subjects are deleted)
 


Legal Notice: The copyright for this application is owned by the author(s). Neither Maplesoft nor the author are responsible for any errors contained within and are not liable for any damages resulting from the use of this material. This application is intended for non-commercial, non-profit use only. Contact the author for permission if you wish to use this application in for-profit activities.
 

Image