A math problem (I'm puzzled with this)

category: general [glöplog]

Suppose we have an array of floats. I want one float m that minimize the sum of absolute differences with the array. In code:

float array[NUMBER_OF_ELEMENTS];
float sum=0.0f;

for (i=0; i<NUMBER_OF_ELEMENTS; i++)
{
sum+=abs(array[i]-m);
}

So, how to calc the float m to make sum the minimum as possible?

I'm puzzled because I always intuitively thought m should be the average of the array... but it is not. One counter example is the array {100, 0, 0, 0}. In this case, the best m is 0, and not 25 that would be the average.

So... how to calc m? I've tried to search in statistics theory websites, but maybe because of my lack of knowlegde of the terms I've not been able to find the solution for the problem. This is why I'm asking here.

At first sight, it doesn't even look that there is a first-derivate-continuous function for this...

Then, what about the same, but instead the sum of absolute differences, with the sum of the square differences?

Thanks in advance

added on the 2009-09-26 17:43:42 by texel

Because abs(x) is not diffentiable, problems like this are most times expressed as:

for (i=0; i<NUMBER_OF_ELEMENTS; i++)
{
sum+=(array[i]-m)* (array[i]-m);
}

This is called the "The method of least squares".

http://en.wikipedia.org/wiki/Least_squares

added on the 2009-09-26 17:52:25 by TGGC

pwned

added on the 2009-09-26 17:59:41 by ferris

TGGC, he doesn't need least sum of square distances, he needs least sum of absolute differences.

added on the 2009-09-26 18:10:10 by imbusy

absolute difference = distance. Isn't there a sqrt missing somewhere?

added on the 2009-09-26 18:13:32 by xTr1m

Anyway, a quick search reveals this:
http://en.wikipedia.org/wiki/Least_absolute_deviations
More than this thread could offer, I guess. Should have tried google first.

added on the 2009-09-26 18:16:58 by imbusy

bruteforce is the solution

added on the 2009-09-26 18:19:18 by rmeht

imbusy: See last sentence of original question.

rmeth: At least do it iterative.

added on the 2009-09-26 18:30:17 by TGGC

From wikipedia:
The median is also the central point which minimizes the average of the absolute deviations
For squared deviations I'm positive that the mean is optimal, I didn't find a reference, though.

added on the 2009-09-26 18:37:47 by chock

TGGC:

Yes, I know, but I want the m1 that minimizes the least squares, and by other hand, the m2 that minimizes the least absolutes.

added on the 2009-09-26 19:14:14 by texel

So, from http://en.wikipedia.org/wiki/Least_absolute_deviations:

Quote:

Though the idea of least absolute deviations regression is just as straightforward as that of least squares regression, the least absolute deviations line is not as simple to compute efficiently. Unlike least squares regression, least absolute deviations regression does not have an analytical solving method. Therefore, an iterative approach is required.

added on the 2009-09-26 19:15:33 by texel

Quote:

From wikipedia:
The median is also the central point which minimizes the average of the absolute deviations

Thanks

added on the 2009-09-26 19:17:37 by texel

There may be more than one solution (median) to your problem. With {0;1} all m in [0;1] will be ok. I think that's why we use least squares instead of least absolute deviation in general.

added on the 2009-09-28 10:44:43 by baah

Quote:

Pouet is also the central point which maximizes the average of the absolute deviations

added on the 2009-09-28 10:51:39 by gloom

added on the 2009-09-28 11:25:39 by Wade

Quote:

There may be more than one solution (median) to your problem. With {0;1} all m in [0;1] will be ok. I think that's why we use least squares instead of least absolute deviation in general.

The reason that everyone loves least squares is that it boils down to solving a linear system of equations. Also, the 2-norm is invariant under orthogonal transformations (rotations and reflections).

In short, it's enormously convenient. Penalizing larger deviations more makes sense sometimes, yes, but people use least squares even when the actual minimization problem is a 1-norm (sum of absolute differences) or infinity-norm (minimize maximum difference) problem, because it's so much simpler to deal with.

added on the 2009-09-28 11:39:55 by ryg

Quote:

One counter example is the array {100, 0, 0, 0}. In this case, the best m is 0, and not 25 that would be the average.

Hmm.. (100-25) + (0-25) + (0-25) + (0-25) = 0. Well yeah my few cents into the conversation.

added on the 2009-09-28 12:04:27 by harism

you're missing some abses there harism

added on the 2009-09-28 12:23:16 by jix

haha. how do i remove my comment :D

added on the 2009-09-28 12:42:23 by harism

at pouet nobody deletes ANYTHING, it'll haunt you forever

added on the 2009-09-28 17:28:04 by superplek

You can see the problem recursively.
1) Any value between min and max won't affect abs(min-v)+abs(max-v)=v-min+max-v=max-min.
2) So we can safely drop the 2 values from the array.
3) goto 1
the loop stops when there are 2 or 1 items in the array.
4) if 2 items left then any value between these 2 is good
5) if 1 item left then it is the only possible value

So if original array is sorted, the solution is to take the item in the middle (index rounded to upper or lower).

Hope this helps

added on the 2009-09-28 21:05:27 by riben

Quote:

haha. how do i remove my comment :D

I think it might take a roboknight to defeat a boboknight. And I think your comment is etched in stone here on pouet sadly.

added on the 2009-10-06 18:53:14 by bob^da almighty

pouët.net

A math problem (I'm puzzled with this)

login