Krista's Coding Corner


Know what you are calculating: mean values

Coding isn’t only about code; it’s also about knowing what we code and why. You need to know whether the code is doing the right things, and doing them correctly. One place where we often go wrong is handling data.

The first and most important rule of processing data is to know what your data is. It’s not enough to know that it contains statistics on dairy product consumption rates. You have to know the facts (for example, that these are consumption rates from New Zealand from the year 2012, and that the numbers have been collected from certain food chains) but also how the data behaves. The easiest way to learn at least something about your data is to draw a couple of raw plots of it. Plots will tell you whether the data has a strange distribution or other remarkable features.
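If you don’t have a plotting library at hand, even a crude text histogram gives a first impression. Here is a minimal sketch using only the Python standard library. The function name `text_histogram`, the `bin_width` parameter and the sample numbers are all made up for illustration; they are not from any real data set.

```python
from collections import Counter


def text_histogram(values, bin_width=1.0):
    """Return one text line per bin (empty bins included) between the
    smallest and largest occupied bin."""
    # Map each value to the index of the bin it falls into.
    counts = Counter(int(v // bin_width) for v in values)
    lines = []
    for b in range(min(counts), max(counts) + 1):
        # Label each bin by its lower edge and draw one '#' per value.
        lines.append(f"{b * bin_width:6.1f} | {'#' * counts[b]}")
    return lines


# Hypothetical sample, invented for this example.
sample = [1, 2, 2, 3, 9, 10, 10, 11]
histogram_lines = text_histogram(sample, bin_width=2.0)
print("\n".join(histogram_lines))
```

With this sample the printout shows two separate piles of `#` marks with empty bins between them, which is exactly the kind of feature you want to notice before calculating anything.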

Even the most common and simplest ways of getting information out of data may fail if they are applied to the wrong sort of data. And you need to notice when this happens with your data.

The easiest example of this is the average value of data (calculated as the arithmetic mean). I guess everybody has used this at least to calculate their school grades. And it works just fine for that purpose. But what if your data looks like this:

Or this:

Situations like the ones above just don’t go well with mean values: the mean won’t give any extra information, and it can even be misleading! Then you just need to try something else. I can’t give any good tips, but your data will definitely lead the way. :)
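To make the problem concrete, here is a small sketch with two invented samples: one with two separate clusters and one with a single huge outlier. The numbers and the helper `points_near` are assumptions made up for this example, and the median is shown only as one thing you might try, not as a general fix.

```python
import statistics


def points_near(data, center, radius):
    """Return the values that lie within `radius` of `center`."""
    return [x for x in data if abs(x - center) <= radius]


# Hypothetical two-cluster sample: values near 2 and values near 10.
bimodal = [1, 2, 2, 3, 9, 10, 10, 11]
mean_value = statistics.mean(bimodal)
median_value = statistics.median(bimodal)
# Both land in the gap between the clusters, where there is no data at all.
neighbours = points_near(bimodal, mean_value, 2)

# Hypothetical skewed sample: mostly small values and one large outlier.
skewed = [1, 1, 2, 2, 3, 100]
skewed_mean = statistics.mean(skewed)      # pulled far up by the outlier
skewed_median = statistics.median(skewed)  # stays with the bulk of the data

print(mean_value, median_value, neighbours)
print(skewed_mean, skewed_median)
```

In the two-cluster sample neither the mean nor the median describes a typical value, so it makes more sense to look at the clusters separately. In the skewed sample the median is at least close to where most of the values are.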

PS. Average and mean values aren’t really THAT simple. If you think they are, read this:

PPS. Btw, did you know that the usual formula for calculating the mean value is actually the maximum likelihood estimator of the mean of a Gaussian distribution?
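For the curious, here is a short sketch of why that is so. It assumes the observations $x_1, \dots, x_n$ are independent draws from the same Gaussian with mean $\mu$ and variance $\sigma^2$; the symbols are my own notation for this example.

```latex
\begin{align}
\ell(\mu, \sigma^2)
  &= -\frac{n}{2}\ln\!\left(2\pi\sigma^2\right)
     - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2 \\
\frac{\partial \ell}{\partial \mu}
  &= \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) = 0
  \quad\Longrightarrow\quad
  \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i
\end{align}
```

The first line is the log-likelihood of the sample. Setting its derivative with respect to $\mu$ to zero gives the familiar arithmetic mean, whatever the value of $\sigma^2$ is.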
