Tag Archives: statistics

Science with a capital S is better than you.

So, yesterday I shared this post on Google+:

This boulder on the moon was set a-rollin’ by whatever process. The interesting thing to me is that you can see some craters overlapping the track it created as it rolled.

From this, scientists estimate this track was created 50-100 million years ago.

Notice the impact craters overlapping the track created by the rolling boulder.

This got me to thinking about how they determined the age.  While I haven’t talked to the scientists who came up with this age figure, I imagine it went something like this:

  1. Have a model for the frequency of asteroid impacts per unit area of the Moon’s surface over time.
  2. Determine the area of the tracks.
  3. Count the impact craters overlapping the tracks.
  4. Using the impact frequency model, determine how much time would have to pass before you would expect to see that many overlapping craters.
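Here is that guess at the four steps as a minimal Python sketch. It assumes craters accumulate at a constant average rate per unit area, and the rate, track area and crater count are invented for illustration; they are not the figures the researchers used.

```python
# Invented inputs for illustration only; not the researchers' numbers.
RATE_PER_KM2_PER_MYR = 100.0  # step 1: modelled craters per km^2 per million years
TRACK_AREA_KM2 = 0.002        # step 2: area covered by the boulder tracks
CRATERS_OBSERVED = 15         # step 3: craters overlapping the tracks


def point_estimate_age_myr(rate, area_km2, craters):
    """Step 4: the age (in millions of years) at which the model's
    *expected* number of craters equals the number actually counted."""
    expected_per_myr = rate * area_km2  # craters expected on the track per Myr
    return craters / expected_per_myr


if __name__ == "__main__":
    age = point_estimate_age_myr(RATE_PER_KM2_PER_MYR, TRACK_AREA_KM2, CRATERS_OBSERVED)
    print(f"Estimated track age: {age:.0f} million years")
```

With these made-up numbers it prints 75 million years. Note that it produces a single number, with no hint of how confident we should be in it.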

The interesting thing here is that, going by a layperson’s definition of “wrong”, the number you come up with in this scenario could be completely wrong.  I think a lot of reporting on science, and even the statements scientists make to the public, are “wrong” in the same manner.

You see, the 50-100 million year figure doesn’t make a lot of sense in isolation.  It should have probabilities assigned to it.  The real answer isn’t “50-100 million years”; it’s something like this (rough and dirty) graph:

Impact Probabilities
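A graph like that could come out of something like the following minimal sketch. It is not necessarily how the researchers worked: it assumes the crater count on a track of a given age is Poisson-distributed with a mean proportional to that age, puts a flat prior on ages up to a hypothetical 300 million years, and uses invented inputs (100 craters per km² per million years, a 0.002 km² track, 15 craters). Instead of one age, it returns a probability for every candidate age.

```python
import math

# Invented inputs for illustration only; not the researchers' numbers.
RATE_PER_KM2_PER_MYR = 100.0  # modelled craters per km^2 per million years
TRACK_AREA_KM2 = 0.002        # area of the boulder tracks
CRATERS_OBSERVED = 15         # craters overlapping the tracks


def poisson_pmf(k, mean):
    """P(N = k) when N is Poisson-distributed with the given mean."""
    if mean == 0:
        return 1.0 if k == 0 else 0.0
    # Work in log space so large k or mean doesn't overflow.
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def age_distribution(rate, area_km2, craters, max_age_myr=300.0, step_myr=0.5):
    """Probability of each candidate age on a grid, assuming a flat prior
    over 0..max_age_myr and Poisson crater counts that grow with age."""
    per_myr = rate * area_km2
    ages = [i * step_myr for i in range(int(max_age_myr / step_myr) + 1)]
    likelihoods = [poisson_pmf(craters, per_myr * age) for age in ages]
    total = sum(likelihoods)
    return ages, [w / total for w in likelihoods]


def central_interval(ages, probs, mass=0.90):
    """Approximate central interval: the grid ages where the running total
    first reaches the lower and upper tail cut-offs."""
    tail = (1.0 - mass) / 2.0
    cumulative, lower, upper = 0.0, None, None
    for age, p in zip(ages, probs):
        cumulative += p
        if lower is None and cumulative >= tail:
            lower = age
        if upper is None and cumulative >= 1.0 - tail:
            upper = age
    return lower, upper


if __name__ == "__main__":
    ages, probs = age_distribution(RATE_PER_KM2_PER_MYR, TRACK_AREA_KM2, CRATERS_OBSERVED)
    lo, hi = central_interval(ages, probs)
    print(f"90% of the probability lies between {lo:.0f} and {hi:.0f} million years")
```

Plotting `probs` against `ages` gives a curve of the rough-and-dirty kind described above; a range like “50-100 million years” is just a summary of it.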

You see, it’s possible that the asteroid impacts all happened yesterday.  It’s unlikely, but it’s possible.
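To put a number on “unlikely, but possible”, here is a minimal sketch under the same kind of assumed Poisson model with invented inputs: how probable is it that a track only one day old would already carry 15 craters?

```python
import math

# Invented inputs for illustration only; not the researchers' numbers.
RATE_PER_KM2_PER_MYR = 100.0  # modelled craters per km^2 per million years
TRACK_AREA_KM2 = 0.002        # area of the boulder tracks
CRATERS_OBSERVED = 15         # craters overlapping the tracks
DAYS_PER_MYR = 365.25 * 1_000_000


def poisson_pmf(k, mean):
    """P(N = k) when N is Poisson-distributed with the given mean."""
    if mean == 0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def prob_at_least(k, mean, extra_terms=200):
    """P(N >= k) as a truncated sum of terms (fine when mean is far below k).
    Computing 1 - P(N < k) instead would round to exactly 0 here."""
    return sum(poisson_pmf(j, mean) for j in range(k, k + extra_terms))


if __name__ == "__main__":
    mean_in_one_day = RATE_PER_KM2_PER_MYR * TRACK_AREA_KM2 / DAYS_PER_MYR
    p = prob_at_least(CRATERS_OBSERVED, mean_in_one_day)
    print(f"P(at least {CRATERS_OBSERVED} craters in a single day) = {p:.1e}")
```

The result is astronomically small (far below 1e-100), but it is not zero.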

So anyway, this is usually acknowledged when actually doing Science-with-a-capital-S; it’s just often lost when communicating with the public.  What I find interesting is that this view, where things have probabilities attached to them, is the way the world actually works, and yet most people’s general attitude doesn’t acknowledge it.

GTFO Naked Girl. I'm doing science!

Most people operate as if things either happened or didn’t, as if they’re either real or not real.  Even things you would say you’re 100% sure of (like the color of the sky) have a probability assigned to them.  You may feel 100% sure, but that 100% is a measure of your overconfidence, not of reality.  For example, there’s a non-zero chance you’re living in a dream or a hallucination.

What about your values, your religion, your politics?  Are your values self-consistent?  Is there a God?  Do your political leanings actually lead to the type of world you want?  There are probabilities assigned to all of ’em, and they’re a lot lower than in the previous example about the color of the sky.

Benford’s Law and corporate lies

Benford’s Law is one of those things that has always made me scratch my head.  It just doesn’t make sense!  Here’s what the law boils down to:

A second earth-shattering fact is that there are more numbers in the universe that begin with the digit 1 than 2, or 3, or 4, or 5, or 6, or 7, or 8, or 9.  And more numbers that begin with 2 than 3, or 4, and so on.  This relationship holds for the lengths of rivers, the populations of cities, molecular weights of chemicals, and any number of other categories.  What a blow to any of us who purport to have mastered the basic facts of the world around us!
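The quoted claim has a precise form: under Benford’s law the leading digit d (1 through 9) appears with probability log10(1 + 1/d), so about 30.1% of numbers start with 1 and only about 4.6% start with 9. Below is a minimal Python sketch that computes those expected shares and compares them with the leading digits of a data set; the first 1,000 powers of two are used as a standard example, and the helper names are my own.

```python
import math
from collections import Counter
from decimal import Decimal


def benford_expected():
    """Benford's law: P(first digit = d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x):
    """First significant digit of a finite, non-zero number (int or float)."""
    # Decimal stores the coefficient without leading zeros, so digits[0]
    # is the first significant digit, e.g. 0.0042 -> 4.
    return Decimal(str(x)).as_tuple().digits[0]


def observed_shares(values):
    """Fraction of the non-zero values whose first digit is 1, 2, ..., 9."""
    digits = [leading_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}


if __name__ == "__main__":
    expected = benford_expected()
    # Powers of two are a classic data set that follows Benford's law.
    observed = observed_shares([2 ** n for n in range(1000)])
    for d in range(1, 10):
        print(f"{d}: Benford {expected[d]:.3f}  powers of 2 {observed[d]:.3f}")
```

The expected shares sum to exactly 1, because the product of (d + 1)/d for d = 1..9 is 10.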

One of the cool things this law allows us to do is to detect inaccurate corporate accounting!

In fact, Benford’s law has been used in legal cases to detect corporate fraud, because deviations from the law can indicate that a company’s books have been manipulated.
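As a rough illustration of measuring “deviations from the law”, the sketch below scores a list of figures by the mean absolute deviation (MAD) between its first-digit shares and Benford’s expected shares. MAD is one common conformity measure; treat it as an assumption here, not the statistic used in any particular legal case or in Wang’s analysis. The two sample data sets are only illustrations.

```python
import math
from collections import Counter
from decimal import Decimal

# Expected first-digit shares under Benford's law: log10(1 + 1/d).
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit_shares(values):
    """Share of the non-zero values starting with each digit 1..9."""
    digits = [Decimal(str(v)).as_tuple().digits[0] for v in values if v != 0]
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}


def benford_mad(values):
    """Mean absolute deviation from Benford's law; 0 means a perfect fit."""
    shares = first_digit_shares(values)
    return sum(abs(shares[d] - BENFORD[d]) for d in range(1, 10)) / 9


if __name__ == "__main__":
    natural = [2 ** n for n in range(1000)]  # follows Benford closely
    uniform = list(range(100, 1000))         # every first digit equally common
    print(f"powers of two: MAD = {benford_mad(natural):.4f}")
    print(f"100..999:      MAD = {benford_mad(uniform):.4f}")
```

A larger score means a bigger departure from Benford; deciding how large is suspicious would need a threshold or significance test, which this sketch leaves out.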

Jialan Wang wanted to find out if corporate accounting deviated from Benford’s law and how that changed over time.

So according to Benford’s law, accounting statements are getting less and less representative of what’s really going on inside of companies.  The major reform that was passed after Enron and other major accounting scandals barely made a dent.
Next, I looked at Benford’s law for three industries: finance, information technology, and manufacturing.  The finance industry showed a huge surge in the deviation from Benford’s from 1981-82, coincident with two major deregulatory acts that sparked the beginnings of that other big mortgage debacle, the Savings and Loan Crisis.  The deviation from Benford’s in the finance industry reached a peak in 1988 and then decreased starting in 1993 at the tail end of the S&L fraud wave, not matching its 1988 level until … 2008.
Read the post with more data here.

The base rate fallacy

Amongst my favorite fallacies lies the base rate fallacy.

Here’s a great introduction to this fallacy on the BBC’s website.

Imagine you’ve invented a machine to detect terrorists. It’s good, about 90% accurate. You sit back with pride and think of the terrorists trembling.

You’re in the Houses of Parliament demonstrating the device to MPs when you receive urgent information from MI5 that a potential attacker is in the building. Security teams seal every exit and all 3,000 people inside are rounded up to be tested.

The first 30 pass. Then, dramatically, a man in a mac fails. Police pounce, guns point.

How sure are you that this person is a terrorist?
A. 90%
B. 10%
C. 0.3%
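The question can be worked through with Bayes’ rule. This minimal sketch assumes there is exactly one attacker among the 3,000 people and reads “90% accurate” as: the machine flags 90% of attackers and wrongly flags 10% of innocent people. Both readings are assumptions about the puzzle, and the function name is my own.

```python
def prob_attacker_given_fail(population, attackers, sensitivity, false_positive_rate):
    """Bayes' rule: P(attacker | machine says 'terrorist') for someone
    drawn at random from the crowd."""
    prior = attackers / population  # the base rate
    p_fail_and_attacker = prior * sensitivity
    p_fail_and_innocent = (1 - prior) * false_positive_rate
    return p_fail_and_attacker / (p_fail_and_attacker + p_fail_and_innocent)


if __name__ == "__main__":
    p = prob_attacker_given_fail(3000, 1, 0.90, 0.10)
    print(f"P(terrorist | failed test) = {p:.1%}")
```

It prints 0.3%, i.e. answer C: about 300 of the 2,999 innocent people would also fail the test, swamping the single real attacker.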

JSON feed for Source servers

I’m kind of a statistics whore.

To further my pursuit of more statistics, I’ve developed a Python CGI script that provides a JSON feed of all sorts of information about servers based on Valve’s Source engine.

This script has three dependencies outside of what is in the standard Python library:

Once you have those installed (worst-case scenario, just drop them in the same directory as sourcejson.py), you need to edit sourcejson.ini.  This file has just three options: rcon_pass, ip, and port.  If you don’t have the rcon password for the server you wish to monitor, just leave it blank.

To access the JSON feed just use the url to sourcejson.py.  Something like this:


To access the extended information that having an rcon password allows use:


If you use the extended format, but haven’t provided an rcon password in sourcejson.ini, it will just default to the basic feed.
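For anyone adapting the idea, here is a hypothetical sketch (not the code in the download) of the two behaviours described above: reading the three options from sourcejson.ini, and falling back to the basic feed when the extended one is requested without an rcon password. The `[server]` section name and the function names are assumptions; configparser needs some section header, and the real file may use a different one.

```python
import configparser

# Hypothetical sourcejson.ini contents; the [server] section name is assumed.
# 192.0.2.10 is a documentation-only address; 27015 is the usual Source port.
EXAMPLE_INI = """
[server]
rcon_pass =
ip = 192.0.2.10
port = 27015
"""


def load_settings(ini_text, section="server"):
    """Read the three documented options; a blank rcon_pass becomes None."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    cfg = parser[section]
    return {
        "rcon_pass": cfg.get("rcon_pass", "").strip() or None,
        "ip": cfg.get("ip"),
        "port": cfg.getint("port"),
    }


def choose_feed(extended_requested, settings):
    """Serve the extended feed only if it was asked for AND rcon is configured."""
    if extended_requested and settings["rcon_pass"]:
        return "extended"
    return "basic"


if __name__ == "__main__":
    settings = load_settings(EXAMPLE_INI)
    print(choose_feed(True, settings))  # no rcon password configured -> "basic"
```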

Included in the zip is a .htaccess to prevent directory listings, which will help protect the contents of sourcejson.ini.

Download sourcejson here!