
How To Generate Your Own Benford’s Law Numbers
November 10, 2010

Posted by Robert Harder in Utility.

An interesting phenomenon of naturally occurring numbers is that the leading digit ‘1’ occurs with surprising frequency: about 30% of the time, rather than the 1 in 9 (about 11%) you might naively expect. This is known as Benford’s Law and is discussed in a number of places (Wikipedia, Wolfram, Cut-the-Knot, NY Times). Statisticians can use Benford’s Law to try to detect fabricated data, which people probably generate with a simple Uniform(0,1) function such as the rand() found in so many programming languages.
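For reference, here is the standard form of Benford’s Law. The first line gives the probability that the leading digit D₁ equals d. Setting d = 1 gives the 30% figure above. Summing over every possible first digit k gives the probability that the second digit D₂ equals d.

```latex
\begin{aligned}
P(D_1 = d) &= \log_{10}\!\left(1 + \frac{1}{d}\right), && d = 1, \dots, 9,\\
P(D_1 = 1) &= \log_{10} 2 \approx 0.301,\\
P(D_2 = d) &= \sum_{k=1}^{9} \log_{10}\!\left(1 + \frac{1}{10k + d}\right), && d = 0, \dots, 9.
\end{aligned}
```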

What I wanted to do was generate random numbers that complied with Benford’s Law. Impatient? Generate some random Benford numbers now.

Why

You mean, “Why am I trying to cheat more effectively?” No. I want to generate sample datasets for teaching, and for that I would like the most realistic fake numbers I can get.

How

My script generates one digit at a time, and the likelihood of a particular digit 0..9 occurring depends on its place in the number. For example, when generating a four-digit integer, the first digit will be a ‘1’ about 30% of the time, but the second digit will be a ‘1’ only about 11% of the time. After the second digit, the script treats all ten digits as equally likely. This is a close approximation, since under Benford’s Law the later digits are already nearly uniform.

I use the following table as the basis for my calculations:

Digit   First place   Second place
0       0             0.1197
1       0.3010        0.1139
2       0.1761        0.1088
3       0.1249        0.1043
4       0.0969        0.1003
5       0.0792        0.0967
6       0.0669        0.0934
7       0.0580        0.0904
8       0.0512        0.0876
9       0.0458        0.0850
Benford’s Law probabilities (source: Simon Newcomb)
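The table can be rebuilt from Benford’s Law itself. The chance that a number’s leading digits form the integer N is log10(1 + 1/N). The probability of a particular digit in a given place is therefore that quantity summed over every possible prefix.

Below is a short Python sketch that recomputes both columns, plus a third place, to show why treating later digits as uniform is a reasonable shortcut. The helper name digit_probability is hypothetical.

```python
import math


def digit_probability(digit: int, place: int) -> float:
    """Benford probability that the digit at `place` (1 = leading) equals `digit`."""
    if place == 1:
        # The leading digit is never zero.
        return 0.0 if digit == 0 else math.log10(1 + 1 / digit)
    total = 0.0
    # Sum over every possible prefix made of the (place - 1) earlier digits,
    # e.g. prefixes 1..9 for the second place, 10..99 for the third.
    for prefix in range(10 ** (place - 2), 10 ** (place - 1)):
        total += math.log10(1 + 1 / (10 * prefix + digit))
    return total


if __name__ == "__main__":
    print("Digit  First   Second  Third")
    for d in range(10):
        row = [digit_probability(d, p) for p in (1, 2, 3)]
        print(f"{d}      {row[0]:.4f}  {row[1]:.4f}  {row[2]:.4f}")
```

Rounded to four places, the first two columns should match Newcomb’s table. The third-place values stay within roughly 0.098 to 0.102, which is close to the uniform 0.1.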

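The simplest case is generating a fixed number of digits. Because the first digit is never zero, I can draw each digit straight from the table with no further checks. The first digit comes from the first-place column, the second from the second-place column, and any later digits are drawn uniformly.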
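Here is a minimal Python sketch of the fixed-length case. It assumes the behavior described above:

- The first ‘X’ is drawn from the first-place column.
- The second ‘X’ is drawn from the second-place column.
- Any later ‘X’ is drawn uniformly.
- Any other character is echoed back, as the help text for the format parameter describes.

This is not the author’s PHP. The names benford_format and draw_digit are hypothetical.

```python
import random

# Benford first- and second-place probabilities (Newcomb's table).
FIRST_PLACE = [0.0, 0.3010, 0.1761, 0.1249, 0.0969, 0.0792, 0.0669, 0.0580, 0.0512, 0.0458]
SECOND_PLACE = [0.1197, 0.1139, 0.1088, 0.1043, 0.1003, 0.0967, 0.0934, 0.0904, 0.0876, 0.0850]
DIGITS = list(range(10))


def draw_digit(place: int, rng: random.Random) -> int:
    """Draw one digit for the given 1-based place in the number."""
    if place == 1:
        weights = FIRST_PLACE  # weight 0.0 means '0' is never the leading digit
    elif place == 2:
        weights = SECOND_PLACE
    else:
        weights = [1.0] * 10  # later places: treated as uniform
    return rng.choices(DIGITS, weights=weights)[0]


def benford_format(fmt: str, rng: random.Random) -> str:
    """Replace each 'X' in fmt with a digit; copy any other character through."""
    out = []
    place = 0
    for ch in fmt:
        if ch == "X":
            place += 1
            out.append(str(draw_digit(place, rng)))
        else:
            out.append(ch)
    return "".join(out)


rng = random.Random(2010)
print([benford_format("X.XXX", rng) for _ in range(5)])
```

Calling benford_format("XXXXX", rng) would be the analogue of the ?format=XXXXX example further down. Passing an explicit random.Random makes runs reproducible when you want a fixed teaching dataset.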

When I want to generate all integers up to a certain point, I have to be a bit sneakier. Suppose I want to generate integers in [1..35]. I begin by generating a digit, say, 4. I then check whether any longer number starting with 4 could still be ≤ 35. None can, because the smallest candidate, 4×10 = 40, is already greater than 35, so I stop there. Voila: a single-digit number.

Now suppose that, while generating integers in [1..35], I first draw a 2. I could still append a second digit and end up with, say, 27, so the test above does not settle things. Instead I take the probability that a uniformly distributed integer from [1..35] has a single digit (9 out of 35) and make a random draw against it. If the draw succeeds, I simply return 2 and leave it at that; otherwise a second digit follows.
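Below is a Python sketch of the [1..upto] case. The post only describes the first step, so two details here are assumptions rather than the author’s method:

1. At each later place, the stop probability is generalized to “the share of integers in [1..upto] with at least this many digits that have exactly this many.” For the first digit with upto = 35, this reduces to 9/35.
2. Digits that would push the number past upto are excluded, and the remaining weights are renormalized.

The names benford_upto, count_with_length and place_weights are hypothetical, and the includeZero option is not modeled.

```python
import random

FIRST_PLACE = [0.0, 0.3010, 0.1761, 0.1249, 0.0969, 0.0792, 0.0669, 0.0580, 0.0512, 0.0458]
SECOND_PLACE = [0.1197, 0.1139, 0.1088, 0.1043, 0.1003, 0.0967, 0.0934, 0.0904, 0.0876, 0.0850]


def place_weights(place: int) -> list:
    """Benford weights for a 1-based digit place; uniform after the second place."""
    if place == 1:
        return FIRST_PLACE
    if place == 2:
        return SECOND_PLACE
    return [1.0] * 10


def count_with_length(k: int, upto: int) -> int:
    """How many integers in [1..upto] have exactly k digits."""
    lo, hi = 10 ** (k - 1), min(10 ** k - 1, upto)
    return max(0, hi - lo + 1)


def benford_upto(upto: int, rng: random.Random) -> int:
    """Draw an integer in [1..upto] one digit at a time (assumes upto >= 1)."""
    value, place = 0, 0
    while True:
        place += 1
        # Assumption: only digits that keep the number <= upto are allowed,
        # and random.choices renormalizes their weights.
        allowed = [d for d in range(10) if value * 10 + d <= upto]
        weights = [place_weights(place)[d] for d in allowed]
        value = value * 10 + rng.choices(allowed, weights=weights)[0]
        if value * 10 > upto:
            # No longer number with this prefix fits, e.g. 4 when upto = 35.
            return value
        # Assumption: stop with the chance that a uniform integer, among those
        # with at least `place` digits, has exactly `place` digits (9/35 at first).
        at_least = sum(count_with_length(k, upto) for k in range(place, len(str(upto)) + 1))
        if rng.random() < count_with_length(place, upto) / at_least:
            return value


rng = random.Random()
print([benford_upto(9999, rng) for _ in range(10)])
```

The last line mirrors the ?upto=9999 house-number example. The leading digit is always drawn from the first-place column when upto ≥ 9, so about 30% of the results start with ‘1’.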

The Script

I am hosting the script on my SourceForge pages at http://iharder.sourceforge.net/benford.php. I started with a JavaScript version, but I thought a PHP-based script would be more useful.

PURPOSE: Generates random numbers that comply with Benford's Law.

PARAMETERS:
 help         Display this help message (default behavior).

 source       Echoes the source code for this script.

 count        The number of numbers to generate (default is 100)
              ex: .../benford.php?count=200

 FIXED LENGTH:
 format       Instead of 'upto', generate numbers with the given
              format, where X signifies a digit and any other
              character is simply echoed back.
              ex: .../benford.php?format=X.XXX

 VARIABLE LENGTH:
 upto         Generate numbers from 1 to this value [1..upto]
              instead of fixed length numbers, as with 'format'.
              ex: .../benford.php?upto=150

 includeZero  When used with upto, the number zero will be
              included in the random numbers [0..upto].

LICENSE: This code is released as Public Domain.
AUTHOR: Robert Harder, rob _ iharder.net

Examples

To generate random house numbers for fake addresses, try http://iharder.sourceforge.net/benford.php?upto=9999, which returns numbers from 1 to 9999 (1-, 2-, 3-, and 4-digit house numbers).

To generate random car prices, try http://iharder.sourceforge.net/benford.php?format=XXXXX.

Enjoy!

Comments»

1. Dax Mickelson - May 18, 2014

I was about to start a similar project to help me learn Python. (I like mathy programs, what can I say.) I wasn’t going to go to the depth you have, HOWEVER, I’d like to port your php code to python. That is, if you care to share your code. 🙂

Being the manager of this site I’m guessing you get to see my email address. You can find my email address at my website.

2. Robert Harder - May 18, 2014

If you add the “source” parameter, it will give the source code. http://iharder.sourceforge.net/benford.php?source

3. Dax Mickelson - May 19, 2014

In other words, RTFM. Doh! Now I see it. Thanks!


