# pouët.net

Go to bottom

## A math problem (I'm puzzled with this)

category: general [glöplog]

Suppose we have an array of floats. I want one float m that minimize the sum of absolute differences with the array. In code:

float array[NUMBER_OF_ELEMENTS];
float sum=0.0f;

for (i=0; i<NUMBER_OF_ELEMENTS; i++)
{
sum+=abs(array[i]-m);
}

So, how to calc the float m to make sum the minimum as possible?

I'm puzzled because I always intuitively thought m should be the average of the array... but it is not. One counter example is the array {100, 0, 0, 0}. In this case, the best m is 0, and not 25 that would be the average.

So... how to calc m? I've tried to search in statistics theory websites, but maybe because of my lack of knowlegde of the terms I've not been able to find the solution for the problem. This is why I'm asking here.

At first sight, it doesn't even look that there is a first-derivate-continuous function for this...

Then, what about the same, but instead the sum of absolute differences, with the sum of the square differences?

Because abs(x) is not diffentiable, problems like this are most times expressed as:

for (i=0; i<NUMBER_OF_ELEMENTS; i++)
{
sum+=(array[i]-m)* (array[i]-m);
}

This is called the "The method of least squares".

http://en.wikipedia.org/wiki/Least_squares
TGGC, he doesn't need least sum of square distances, he needs least sum of absolute differences.
absolute difference = distance. Isn't there a sqrt missing somewhere?
Anyway, a quick search reveals this:
http://en.wikipedia.org/wiki/Least_absolute_deviations
More than this thread could offer, I guess. Should have tried google first.
bruteforce is the solution
imbusy: See last sentence of original question.

rmeth: At least do it iterative.
From wikipedia:
The median is also the central point which minimizes the average of the absolute deviations
For squared deviations I'm positive that the mean is optimal, I didn't find a reference, though.
TGGC:

Yes, I know, but I want the m1 that minimizes the least squares, and by other hand, the m2 that minimizes the least absolutes.
So, from http://en.wikipedia.org/wiki/Least_absolute_deviations:

Quote:
Though the idea of least absolute deviations regression is just as straightforward as that of least squares regression, the least absolute deviations line is not as simple to compute efficiently. Unlike least squares regression, least absolute deviations regression does not have an analytical solving method. Therefore, an iterative approach is required.
Quote:
From wikipedia:
The median is also the central point which minimizes the average of the absolute deviations

Thanks
There may be more than one solution (median) to your problem. With {0;1} all m in [0;1] will be ok. I think that's why we use least squares instead of least absolute deviation in general.
Quote:
Pouet is also the central point which maximizes the average of the absolute deviations
Quote:
There may be more than one solution (median) to your problem. With {0;1} all m in [0;1] will be ok. I think that's why we use least squares instead of least absolute deviation in general.

The reason that everyone loves least squares is that it boils down to solving a linear system of equations. Also, the 2-norm is invariant under orthogonal transformations (rotations and reflections).

In short, it's enormously convenient. Penalizing larger deviations more makes sense sometimes, yes, but people use least squares even when the actual minimization problem is a 1-norm (sum of absolute differences) or infinity-norm (minimize maximum difference) problem, because it's so much simpler to deal with.
Quote:
One counter example is the array {100, 0, 0, 0}. In this case, the best m is 0, and not 25 that would be the average.

Hmm.. (100-25) + (0-25) + (0-25) + (0-25) = 0. Well yeah my few cents into the conversation.
you're missing some abses there harism
haha. how do i remove my comment :D
at pouet nobody deletes ANYTHING, it'll haunt you forever
You can see the problem recursively.
1) Any value between min and max won't affect abs(min-v)+abs(max-v)=v-min+max-v=max-min.
2) So we can safely drop the 2 values from the array.
3) goto 1
the loop stops when there are 2 or 1 items in the array.
4) if 2 items left then any value between these 2 is good
5) if 1 item left then it is the only possible value

So if original array is sorted, the solution is to take the item in the middle (index rounded to upper or lower).

Hope this helps
Quote:
haha. how do i remove my comment :D
I think it might take a roboknight to defeat a boboknight. And I think your comment is etched in stone here on pouet sadly.