An interesting phenomenon of naturally occurring numbers is that the leading digit ‘1’ occurs with surprising frequency: about 30% of the time. This is known as Benford’s Law and is discussed in a number of places (Wikipedia, Wolfram, Cut-the-Knot, NY Times). Statisticians can use Benford’s Law to try to detect fake data, which people often generate with a simple Uniform(0,1) function such as the `rand()` found in many programming languages.

What I wanted to do was generate random numbers that complied with Benford’s Law. Impatient? Generate some random Benford numbers now.

## Why

You mean, “Why am I trying to cheat more effectively?” No, but if I am trying to generate sample datasets for pedagogical purposes, I would like to use the most realistic fake numbers that I can.

## How

My script generates one digit at a time, and the likelihood of a particular digit 0..9 occurring depends on its place in the number. For example, when generating a four-digit integer, the first digit will be a ‘1’ about 30% of the time, but the second digit will be a ‘1’ only about 11% of the time. After the second digit, the digits occur (in the script) with equal probability.

I use the following table as the basis for my calculations:

Digit | First Place | Second Place |
---|---|---|
0 | 0 | 0.1197 |
1 | 0.3010 | 0.1139 |
2 | 0.1761 | 0.1088 |
3 | 0.1249 | 0.1043 |
4 | 0.0969 | 0.1003 |
5 | 0.0792 | 0.0967 |
6 | 0.0669 | 0.0934 |
7 | 0.0580 | 0.0904 |
8 | 0.0512 | 0.0876 |
9 | 0.0458 | 0.0850 |
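The values in this table match the standard Benford formulas: the first digit d has probability log10(1 + 1/d), and the second-digit probability sums log10(1 + 1/(10k + d)) over the possible first digits k = 1..9. As a quick check, here is a small Python sketch that recomputes the table. The function names are mine, not taken from the PHP script.

```python
import math

def first_digit_prob(d):
    """Benford probability that the leading digit is d (zero never leads)."""
    return math.log10(1 + 1 / d) if d else 0.0

def second_digit_prob(d):
    """Benford probability that the second digit is d, summed over first digits 1..9."""
    return sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))

# Print rows in the same layout as the table above.
for d in range(10):
    print(f"{d} | {first_digit_prob(d):.4f} | {second_digit_prob(d):.4f} |")
```

Each column should sum to 1, up to floating-point rounding.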

The simplest case is when I am generating a fixed number of digits. I know that the first digit is never a zero, so I can use the tables exclusively.
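The fixed-length case can be sketched in a few lines of Python. This is my own illustration rather than a port of the PHP script: the name `benford_format` is hypothetical, and I am assuming that the first `X` in the format uses the first-place column, the second `X` uses the second-place column, and every later `X` is uniform.

```python
import random

# Digit probabilities by place, copied from the table above (index = digit).
FIRST_PLACE = [0.0, 0.3010, 0.1761, 0.1249, 0.0969, 0.0792, 0.0669, 0.0580, 0.0512, 0.0458]
SECOND_PLACE = [0.1197, 0.1139, 0.1088, 0.1043, 0.1003, 0.0967, 0.0934, 0.0904, 0.0876, 0.0850]
UNIFORM = [0.1] * 10

def benford_format(fmt, rng=random):
    """Replace each 'X' in fmt with a random digit; echo every other character."""
    out = []
    place = 0  # how many digits have been generated so far
    for ch in fmt:
        if ch != "X":
            out.append(ch)
            continue
        if place == 0:
            weights = FIRST_PLACE
        elif place == 1:
            weights = SECOND_PLACE
        else:
            weights = UNIFORM
        out.append(str(rng.choices(range(10), weights=weights)[0]))
        place += 1
    return "".join(out)

print(benford_format("X.XXX"))
```

Passing a seeded `random.Random` instance as `rng` makes the output repeatable, which is handy when building sample datasets.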

In the case where I want to generate all integers *up to* a certain point, I have to be a bit more sneaky. Suppose I want to generate integers in [1..35]. I will begin by generating a digit, say, 4. I check whether 4 is the largest number ≤ 35 that I can generate that starts with a 4. Sure enough, the smallest two-digit candidate, 4×10=40, is greater than 35, so I stop there. Voilà: a single-digit number.

Suppose that in generating integers in [1..35], I first generate a 2. I could still generate a second digit and end up with, say, 27, so the above test will not suffice. Instead, I compute the probability that a uniformly distributed integer in [1..35] is a single digit (9 out of 35) and draw a uniform random number. If the draw falls below that probability, I simply return the value 2 and leave it at that; otherwise I go on to generate a second digit.
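Here is a rough Python sketch of the variable-length idea. It is not the PHP script's code, and it fills in details the description leaves open, so treat those as my assumptions: a digit that would push the number past `upto` is simply redrawn, and for longer ranges the stopping probability is the chance that a uniform integer has exactly the current number of digits given that it has at least that many (which reduces to 9 out of 35 in the example). The names `draw_digit` and `benford_upto` are hypothetical.

```python
import random

# Digit probabilities by place, copied from the table above (index = digit).
FIRST_PLACE = [0.0, 0.3010, 0.1761, 0.1249, 0.0969, 0.0792, 0.0669, 0.0580, 0.0512, 0.0458]
SECOND_PLACE = [0.1197, 0.1139, 0.1088, 0.1043, 0.1003, 0.0967, 0.0934, 0.0904, 0.0876, 0.0850]
UNIFORM = [0.1] * 10

def draw_digit(place, limit, rng):
    """Draw a digit for the given place (0 = first), redrawing until digit <= limit.

    Redrawing is an assumption of this sketch, not something the post specifies.
    """
    weights = (FIRST_PLACE, SECOND_PLACE)[place] if place < 2 else UNIFORM
    while True:
        digit = rng.choices(range(10), weights=weights)[0]
        if digit <= limit:
            return digit

def benford_upto(upto, rng=random):
    """Return a random integer in [1..upto], built one digit at a time."""
    if upto < 1:
        raise ValueError("upto must be at least 1")
    n = draw_digit(0, upto, rng)
    digits = 1
    # Keep going only while a longer number with this prefix could still fit.
    while n * 10 <= upto:
        exactly = 9 * 10 ** (digits - 1)            # integers with exactly `digits` digits
        at_least = upto - (10 ** (digits - 1) - 1)  # integers in range with >= `digits` digits
        if rng.random() < exactly / at_least:       # 9/35 for upto=35 and one digit
            break
        n = n * 10 + draw_digit(digits, upto - n * 10, rng)
        digits += 1
    return n

print([benford_upto(35) for _ in range(10)])
```

With `upto=35`, a first digit of 4 through 9 always comes back as a single-digit number, matching the first test described above.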

## The Script

I am hosting the script on my SourceForge pages here: http://iharder.sourceforge.net/benford.php. I had started with a JavaScript version, but I thought a PHP-based script would be more useful.

PURPOSE: Generates random numbers that comply with Benford's Law.

PARAMETERS:

- `help`: Display this help message (default behavior).
- `source`: Echoes the source code for this script.
- `count`: The number of numbers to generate (default is 100). ex: `.../benford.php?count=200`

FIXED LENGTH:

- `format`: Instead of `upto`, generate numbers with the given format, where X signifies a digit and any other character is simply echoed back. ex: `.../benford.php?format=X.XXX`

VARIABLE LENGTH:

- `upto`: Generate numbers from 1 to this value [1..upto] instead of fixed length numbers, as with `format`. ex: `.../benford.php?upto=150`
- `includeZero`: When used with `upto`, the number zero will be included in the random numbers [0..upto].

LICENSE: This code is released as Public Domain.

AUTHOR: Robert Harder, rob _ iharder.net

## Examples

To generate random house numbers for fake addresses, try http://iharder.sourceforge.net/benford.php?upto=9999 to generate numbers from 1 to 9999 (1-, 2-, 3-, and 4-digit house numbers).

To generate random car prices, try http://iharder.sourceforge.net/benford.php?format=XXXXX.

Enjoy!

I was about to start a similar project to help me learn python. (I like mathy programs, what can I say.) I wasn’t going to go to the depth you have, HOWEVER, I’d like to port your php code to python. That is, if you care to share your code.

Being the manager of this site I’m guessing you get to see my email address. You can find my email address at my website.

If you add the “source” parameter, it will give the source code. http://iharder.sourceforge.net/benford.php?source

In other words, RTFM. Doh! Now I see it. Thanks!