As a retired mathematician, I was pretty excited to have the chance to do some (rudimentary) mathematical programming recently. In our new and improved search algorithm on HireSpace.com (see my blog on this here!) we take measures of quality, e.g. customer feedback, into account when ranking search results, so positive feedback should boost a venue’s search ranking and negative feedback should demote it. To decide how much to boost rankings by, we wanted to know how far a venue deviates from the mean for a given measure of quality.

Keeping things simple, we decided to model all measures of quality with a normal distribution (see definition on Wikipedia). I created a static class called ProbabilityUtils with everything you need to work out probabilities for normally distributed variables. This class is up on the Hire Space GitHub account here, but here’s a walkthrough of what it contains and how it’s used.

Getting the mean of a list of values is easy: you just use LINQ’s built-in `.Average()` method. The first custom method we need, then, is one to work out the standard deviation:

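    // Requires: using System; using System.Collections.Generic; using System.Linq;
    public static double StandardDeviation(IEnumerable<double> values)
    {
        double sd = 0;
        var enumerable = values as double[] ?? values.ToArray();

        // The n - 1 denominator needs at least two values; otherwise return 0
        // (a single value would give 0 / 0 = NaN).
        if (enumerable.Length > 1)
        {
            double avg = enumerable.Average();
            double sum = enumerable.Sum(d => Math.Pow(d - avg, 2));
            sd = Math.Sqrt(sum / (enumerable.Length - 1));
        }

        return sd;
    }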
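For reference, dividing by the count minus one means this method computes the sample standard deviation rather than the population one. Written out, with $n$ values $x_1, \dots, x_n$ and mean $\bar{x}$ (notation introduced here, not taken from the code):

$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
$$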

Now to convert our normal distribution to a standard normal distribution we need a method to work out the Z value.

    public static double Z(double score, double average, double standardDeviation)
    {
        // With no spread, every score sits at the mean, so treat z as 0.
        if (standardDeviation == 0) return 0;
        return (score - average) / standardDeviation;
    }
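The reason this conversion is useful: if a score $X$ is normally distributed with mean $\mu$ and standard deviation $\sigma$, then the z-value follows the standard normal distribution, so one set of standard normal methods covers every measure of quality. In symbols (with $Z$ denoting a standard normal variable):

$$
z = \frac{x - \mu}{\sigma},
\qquad
P(X < x) = P\!\left(Z < \frac{x - \mu}{\sigma}\right)
$$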

The actual probability density function, for a standard normal distribution:

    public static double StandardNormalPdf(double x)
    {
        var exponent = -1 * (0.5 * Math.Pow(x, 2));
        var numerator = Math.Pow(Math.E, exponent);
        var denominator = Math.Sqrt(2 * Math.PI);
        return numerator / denominator;
    }
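This is the usual density of the standard normal distribution, written here as $\varphi$. The `exponent`, `numerator` and `denominator` variables correspond to $-x^2/2$, $e^{-x^2/2}$ and $\sqrt{2\pi}$ respectively:

$$
\varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^{2}/2}
$$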

We need a method to work out the definite integral of a unary function between values a and b. To do this we apply Simpson’s 3/8 approximation rule (see definition on Wikipedia) once across the whole interval.

    public delegate double Function(double x);

    public static double Integral(Function f, double a, double b)
    {
        // Single application of Simpson's 3/8 rule over [a, b].
        double multiplier = (b - a) / 8;
        double sum = multiplier * (f(a)
                                   + (3 * f(((2 * a) + b) / 3))
                                   + (3 * f((a + (2 * b)) / 3))
                                   + f(b));
        return sum;
    }
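One caveat: a single application of the 3/8 rule is accurate over a short interval but drifts as the interval widens. A possible refinement, which is not part of the original ProbabilityUtils class, is the composite form of the same rule: split the interval into equal panels and apply the rule to each. The sketch below assumes a hypothetical class name `CompositeIntegration` and a hypothetical `panels` parameter, and uses `Func<double, double>` in place of the `Function` delegate so it compiles on its own.

```csharp
using System;

public static class CompositeIntegration
{
    // Composite Simpson's 3/8 rule: split [a, b] into `panels` equal panels
    // and apply the single-panel 3/8 rule to each one, summing the results.
    // `panels` is a hypothetical tuning parameter, not from the original class.
    public static double Integral(Func<double, double> f, double a, double b, int panels = 16)
    {
        if (panels < 1) throw new ArgumentOutOfRangeException(nameof(panels));

        // width is negative when b < a, which flips the sign of the result.
        double width = (b - a) / panels;
        double total = 0;

        for (int i = 0; i < panels; i++)
        {
            double left = a + (i * width);
            double right = left + width;
            total += (right - left) / 8
                     * (f(left)
                        + (3 * f(((2 * left) + right) / 3))
                        + (3 * f((left + (2 * right)) / 3))
                        + f(right));
        }

        return total;
    }

    // Same density as StandardNormalPdf above, repeated so this sketch is self-contained.
    public static double StandardNormalPdf(double x) =>
        Math.Exp(-0.5 * x * x) / Math.Sqrt(2 * Math.PI);
}
```

To give a sense of the difference: integrating the standard normal density from 0 to 3 with a single panel gives about 0.484, while the true value is about 0.4987. Calling `CompositeIntegration.Integral(CompositeIntegration.StandardNormalPdf, 0, 3)` with the default 16 panels should land much closer to the true value.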

Finally, a method to calculate the probability of getting a value less than x, given a standard normal distribution. Since we’re dealing with a normal distribution, exactly half of the values fall below the mean. So this method simply takes the integral between a value and the mean and adds 0.5.

    public static double ProbabilityLessThanX(double x)
    {
        // Integrate from the mean (0) to x; the result is negative when x < 0.
        var integral = Integral(StandardNormalPdf, 0, x);
        return integral + 0.5;
    }
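In symbols, writing $\varphi$ for the standard normal density and $\Phi$ for the probability of a value less than $x$ (notation introduced here), the method relies on the identity below. When $x$ is negative the integral is negative, so the result drops below 0.5 as it should. Substituting $a = 0$ and $b = x$ into the 3/8 rule gives the approximation the code actually evaluates:

$$
\Phi(x) = \int_{-\infty}^{x} \varphi(t)\,dt
        = \frac{1}{2} + \int_{0}^{x} \varphi(t)\,dt
        \approx \frac{1}{2} + \frac{x}{8}\left[\varphi(0) + 3\,\varphi\!\left(\tfrac{x}{3}\right) + 3\,\varphi\!\left(\tfrac{2x}{3}\right) + \varphi(x)\right]
$$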

OK, now we have all the tools we need to calculate some probabilities! Here’s a step-by-step example of how to use them:

1. Given a list of values (call this variable `values`), work out the mean and standard deviation using the above methods.

    var mean = values.Average();
    var sd = ProbabilityUtils.StandardDeviation(values);

2. Say this results in a mean of 4 and a standard deviation of 2, and we want to know the probability of getting a score below 5. We then call the method to work out the z-value:

    var z = ProbabilityUtils.Z(5, mean, sd);

3. Now plug this into our probability method to get p, the probability of getting a value less than 5.

    var p = ProbabilityUtils.ProbabilityLessThanX(z);

In this example, this probability is approximately 0.69. In other words, given our assumptions are correct, around 69% of venues will get a score less than 5.
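For anyone who wants to check the arithmetic by hand, here is the calculation the three steps perform, with $\varphi$ standing for the standard normal density computed by `StandardNormalPdf` (values rounded to five decimal places):

$$
\begin{aligned}
z &= \frac{5 - 4}{2} = 0.5 \\
\int_{0}^{0.5} \varphi(t)\,dt
  &\approx \frac{0.5}{8}\Bigl[\varphi(0) + 3\,\varphi\bigl(\tfrac{1}{6}\bigr) + 3\,\varphi\bigl(\tfrac{1}{3}\bigr) + \varphi\bigl(\tfrac{1}{2}\bigr)\Bigr] \\
  &\approx 0.0625 \times \bigl[0.39894 + 3(0.39344) + 3(0.37738) + 0.35207\bigr] \\
  &\approx 0.0625 \times 3.06347 \approx 0.1915 \\
p &\approx 0.5 + 0.1915 = 0.6915
\end{aligned}
$$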

For our search rankings we can use this data however we want, boosting any venues for which p is greater than 0.5 and demoting those for which p is less than 0.5. I don’t pretend to know the optimum way to factor this into search rankings (we’re still playing around with this on HireSpace.com) but knowing p is a good first step!
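Purely as an illustration of the boost/demote idea, and not what we actually run in production, one simple option is a linear multiplier that is neutral at p = 0.5. Everything in this sketch is hypothetical: the class name `RankingBoostSketch`, the method `BoostMultiplier` and the `maxEffect` tuning parameter are made up for the example.

```csharp
using System;

public static class RankingBoostSketch
{
    // Hypothetical mapping from p (the probability of scoring below this venue)
    // to a multiplier applied to the venue's ranking score.
    // maxEffect is an assumed tuning knob: 0.2 means at most +/- 20%.
    public static double BoostMultiplier(double p, double maxEffect = 0.2)
    {
        if (p < 0 || p > 1) throw new ArgumentOutOfRangeException(nameof(p));

        // p = 0.5 gives 1.0 (neutral), p = 1 gives 1 + maxEffect,
        // and p = 0 gives 1 - maxEffect.
        return 1 + (maxEffect * ((2 * p) - 1));
    }
}
```

With the example above, `RankingBoostSketch.BoostMultiplier(0.69)` comes out slightly above 1, so that venue would get a modest boost. A linear mapping is only one choice; anything monotonic in p would respect the same ordering.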

Hello! I’m a C# programmer and I am wondering if there is a way to return the probability that a given array of numbers is normally distributed. I found your blog while looking for an answer. I feel like the tools are staring me in the face, but I just can’t make the connection. Perhaps you could help? Thank you!

Hi Patrick, there are a few ways to judge whether a series is normally distributed. This is a helpful overview: https://statsthewayilikeit.com/about/is-my-data-normally-distributed/

A good formal test is the Shapiro-Wilk test: https://en.wikipedia.org/wiki/Shapiro%E2%80%93Wilk_test

The easiest way to test this programmatically would be to use a statistical programming language like R, which has these kinds of functions in its core!