Tag Archives: statistics

Benford’s Law and corporate lies

Benford’s Law is one of those things that has always made me scratch my head.  It just doesn’t make sense!  Here’s what the law boils down to:

A second earth-shattering fact is that there are more numbers in the universe that begin with the digit 1 than 2, or 3, or 4, or 5, or 6, or 7, or 8, or 9.  And more numbers that begin with 2 than 3, or 4, and so on.  This relationship holds for the lengths of rivers, the populations of cities, molecular weights of chemicals, and any number of other categories.  What a blow to any of us who purport to have mastered the basic facts of the world around us!
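
The quote above describes the pattern without giving numbers. The usual statement of Benford's law is that a leading digit d shows up with probability log10(1 + 1/d). Here is a small sketch that prints those expected shares; the function name is just something I made up for illustration.

```python
import math

def benford_expected():
    """Expected share of numbers whose leading digit is d, for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

if __name__ == "__main__":
    # Print one line per digit, e.g. the share for 1 is roughly 30%.
    for digit, share in benford_expected().items():
        print(f"{digit}: {share:.1%}")
```

So about 30% of numbers should start with a 1 and fewer than 5% with a 9, which is exactly the lopsidedness the quote is talking about.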

One of the cool things this law allows us to do is to detect inaccurate corporate accounting!

In fact, Benford’s law has been used in legal cases to detect corporate fraud, because deviations from the law can indicate that a company’s books have been manipulated.
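
To make "deviation from the law" concrete, here is a rough sketch of one way to score a set of figures. It sums the absolute differences between the observed first-digit shares and the Benford shares. That scoring rule is my own assumption for illustration, not necessarily what is used in court cases or in the analysis discussed below, and the function names are hypothetical.

```python
import math
from collections import Counter

def leading_digit(x):
    """First significant digit of a nonzero number (sign is ignored)."""
    # Scientific notation always puts the first significant digit up front.
    return int(f"{abs(x):.15e}"[0])

def benford_deviation(values):
    """Sum over digits 1..9 of |observed share - Benford share|."""
    digits = [leading_digit(v) for v in values if v != 0]
    if not digits:
        raise ValueError("need at least one nonzero value")
    counts = Counter(digits)
    n = len(digits)
    return sum(abs(counts[d] / n - math.log10(1 + 1 / d)) for d in range(1, 10))

if __name__ == "__main__":
    # Powers of 2 are a classic Benford-like sequence; 100..999 is not.
    print(benford_deviation([2 ** k for k in range(1, 201)]))
    print(benford_deviation(range(100, 1000)))
```

A score near zero means the figures look Benford-like. A high score does not prove anything by itself, it just flags a set of numbers worth a closer look.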

Jialan Wang wanted to find out if corporate accounting deviated from Benford’s law and how that changed over time.

So according to Benford’s law, accounting statements are getting less and less representative of what’s really going on inside of companies.  The major reform that was passed after Enron and other major accounting scandals barely made a dent.

Next, I looked at Benford’s law for three industries: finance, information technology, and manufacturing.  The finance industry showed a huge surge in the deviation from Benford’s from 1981-82, coincident with two major deregulatory acts that sparked the beginnings of that other big mortgage debacle, the Savings and Loan Crisis.  The deviation from Benford’s in the finance industry reached a peak in 1988 and then decreased starting in 1993 at the tail end of the S&L fraud wave, not matching its 1988 level until … 2008.
Read the post with more data here.

The base rate fallacy

Amongst my favorite fallacies lies the base rate fallacy.

Here’s a great introduction to this fallacy on the BBC’s website.

Imagine you’ve invented a machine to detect terrorists. It’s good, about 90% accurate. You sit back with pride and think of the terrorists trembling.

You’re in the Houses of Parliament demonstrating the device to MPs when you receive urgent information from MI5 that a potential attacker is in the building. Security teams seal every exit and all 3,000 people inside are rounded up to be tested.

The first 30 pass. Then, dramatically, a man in a mac fails. Police pounce, guns point.

How sure are you that this person is a terrorist?
A. 90%
B. 10%
C. 0.3%
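
If you want to check your answer, here is the arithmetic as a small sketch. It rests on two assumptions that the puzzle only implies: exactly one of the 3,000 people is an attacker, and "90% accurate" means the machine flags 90% of attackers and clears 90% of innocent people. The function name is made up for illustration.

```python
def posterior_given_fail(population, attackers, accuracy):
    """Chance that someone who fails the test really is an attacker."""
    # Attackers the machine correctly flags.
    true_positives = attackers * accuracy
    # Innocent people the machine wrongly flags.
    false_positives = (population - attackers) * (1 - accuracy)
    return true_positives / (true_positives + false_positives)

if __name__ == "__main__":
    p = posterior_given_fail(population=3000, attackers=1, accuracy=0.90)
    print(f"{p:.2%}")
```

Under those assumptions roughly 300 innocent people fail the test alongside the one real attacker, so a single failure means very little by itself. That is the base rate doing the work.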

JSON feed for Source servers

I’m kind of a statistics whore.

To further my pursuit of more statistics, I’ve developed a Python CGI which provides a JSON feed of all sorts of information about servers based on Valve’s Source engine.

This script has three dependencies outside of what is in the standard Python library:

Once you have those installed (worst-case scenario, just drop them in the same directory as sourcejson.py), you need to edit sourcejson.ini.  This file has just three options: rcon_pass, ip, and port.  If you don’t have the rcon password for the server you wish to monitor, just leave it blank.

To access the JSON feed, just use the URL to sourcejson.py.  Something like this:

http://YOUR_DOMAIN/sourcejson.py

To access the extended information that having an rcon password allows, use:

http://YOUR_DOMAIN/sourcejson.py?info=extended

If you use the extended format, but haven’t provided an rcon password in sourcejson.ini, it will just default to the basic feed.
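
For anyone who wants to consume the feed from another script, here is a minimal client sketch using only the standard library. The helper names are hypothetical, YOUR_DOMAIN is the same placeholder as above, and I make no assumptions here about which keys the JSON contains.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def feed_url(base_url, extended=False):
    """Build the feed URL, adding ?info=extended when requested."""
    if extended:
        return base_url + "?" + urlencode({"info": "extended"})
    return base_url

def fetch_feed(base_url, extended=False, timeout=10):
    """Download the feed and decode it into Python objects."""
    with urlopen(feed_url(base_url, extended), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling fetch_feed("http://YOUR_DOMAIN/sourcejson.py", extended=True) should hand back whatever the script serves, which will be the basic feed if no rcon password is configured.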

Included in the zip is an .htaccess file that prevents directory listings, which will help protect the contents of sourcejson.ini.

Download sourcejson here!