#6876 - Alunze - Wed Jun 04, 2003 1:56 am
I guess this has been asked to death, but I still have to do it, what would be a fast algorithm for calculating the distance between two 2D points?
Thanks in *advance* >_6
#6883 - jd - Wed Jun 04, 2003 5:20 am
This is the only algo I know of:
Code: |
dx = x1 - x2;
dy = y1 - y2;
dist = sqrt((dx*dx) + (dy*dy));
|
The built-in square root function is very slow, but there are faster ones available such as this one which takes 51 cycles - 11 times faster than the standard code:
http://www.gbadev.org/download.php?section=demos&filename=isqr.zip
Alternatively, you might find that finding the square of the distance is acceptable for your purposes - in which case you can remove the square root altogether.
#6906 - peteh - Wed Jun 04, 2003 1:30 pm
There's currently a nice ongoing thread at gamedev covering this topic :
http://www.gamedev.net/community/forums/topic.asp?topic_id=160085
Basically distilled down to a decent approximation algorithm which is within 7% of the real answer :
#define FIVE_OVER_SIXTEEN 0.3125
int finddist(int x,int y)
{
return ( x < y ? ( x + y * FIVE_OVER_SIXTEEN ) : ( y + x * FIVE_OVER_SIXTEEN ) );
}
As well as a discussion on the Taylor Series.
Hope that helps
#6915 - Jason Wilkins - Wed Jun 04, 2003 2:49 pm
Try using fsqrt instead of sqrt, it will use single precision instead of double and thus be a bit faster. It would be best to find an integer sqrt function however.
In the same line, it would be better for FIVE_OVER_SIXTEEN to be 0.3125f instead of 0.3125 because that will not force the multiplication to be double precision.
In fact, if we want it to be integer, it would almost certainly be faster to change the code to this:
Code: |
return (x < y) ? (x + ((y * 5) >> 4)) : (y + ((x * 5) >> 4));
|
Actually, I am amazed that anyone would propose the code above to use doubles, even on a Pentium processor. On the ARM processor (y * 5) >> 4 can become a single instruction.
ADDENDUM:
I wrote the following program to test out this approximation. I originally wrote it to test all dx and dy between 0 and 300 because I wanted to see how good it would be for testing the distance between sprites in pixels. The result was that sometimes the percentage error was much greater than 7 percent. The version below multiplies 300 by 1000 in order to prevent lose of precision. Because it never tests values less than 1000 (except 0), the max error is indeed only around 7 percent, and that is when dx == dy.
If you graph out the approximation and the actual distance equation, then you will see that they are very similar until dx ~= dy, then the error gets bigger and bigger. With a lot of precision, the error never gets more than 7.19 percent.
I would recommend that if you want to use this to find distances that are less than 1000, such as pixel distances, then you should use fixed point numbers.
Code: |
#include <math.h>
#include <stdio.h>
int idist(int dx, int dy)
{
return (dy < dx) ? (dx + ((dy * 5) >> 4)) : (dy + ((dx * 5) >> 4));
}
float fdist (float dx, float dy)
{
return (float)sqrt((dx*dx) + (dy*dy));
}
#define ABS(x) ((x > 0) ? (x) : -(x))
float percent_error(float actual, float approx)
{
return (actual != 0) ? ABS((approx / actual) - 1) * 100.0f : -1;
}
int main(void)
{
int dx, dy;
int id;
float fd;
float absolute_error;
float error;
float max_error = 0;
float abs_for_perc = 0;
int dx_for_abs;
int dy_for_abs;
float max_absolute_error = 0;
float perc_for_abs = 0;
int dx_for_perc;
int dy_for_perc;
for (dx = 0; dx < 300000; dx += 10000) {
for (dy = 0; dy < 300000; dy += 10000) {
id = idist(dx, dy);
fd = fdist(dx, dy);
absolute_error = ABS(((float)id) - fd);
error = percent_error(fd, id);
if (error > max_error) {
max_error = error;
abs_for_perc = absolute_error;
dx_for_perc = dx;
dy_for_perc = dy;
}
if (absolute_error > max_absolute_error) {
max_absolute_error = absolute_error;
perc_for_abs = error;
dx_for_abs = dx;
dy_for_abs = dy;
}
printf("[%d, %d]\t idist = %d\t fdist = %.2f\t absolute_error = %.2f\t error = %%%.2f\n", dx, dy, id, fd, absolute_error, error);
}
}
printf("max absolute error = %f (%%%f) [%d, %d]\n", max_absolute_error, perc_for_abs, dx_for_abs, dy_for_abs);
printf("max error = %%%f (%f) [%d, %d]\n", max_error, abs_for_perc, dx_for_perc, dy_for_perc);
return 0;
}
|
_________________
http://devkitadv.sourceforge.net
#26774 - Mephisto - Sat Sep 25, 2004 8:29 pm
Quote: |
#define FIVE_OVER_SIXTEEN 0.3125
int finddist(int x,int y)
{
return ( x < y ? ( x + y * FIVE_OVER_SIXTEEN ) : ( y + x * FIVE_OVER_SIXTEEN ) );
} |
wtf are x and y ?
#26775 - tepples - Sat Sep 25, 2004 8:58 pm
the horizontal and vertical components of distance
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.
#26776 - DiscoStew - Sat Sep 25, 2004 8:58 pm
Mephisto wrote: |
wtf are x and y ? |
Probably a point extending from the origin.
A line from (2, 3)-(4, 8) would translate into (0, 0)-(2, 5).
Same distance.
_________________
DS - It's all about DiscoStew
#26777 - poslundc - Sat Sep 25, 2004 9:32 pm
Ugh... using floating-point totally defeats the purpose of the distance-approximation algorithm. Here's the version I use:
Code: |
// distApprox adds 8 to the precision of the result.
// Therefore two integers will generate a 24.8 result.
// Expected error around 7%.
#define DA_FOS(n) ((n << 6) + (n << 4))
inline s32 distApprox(s32 x, s32 y)
{
if (x < y)
return ((x << 8) + DA_FOS(y));
else
return ((y << 8) + DA_FOS(x));
} |
It is worth mentioning that if you don't need to find the actual distance, but just compare two distances to see which is bigger, you can just compare a1*a1 + b1*b1 to a2*a2 + b2*b2.
Dan.
#26778 - poslundc - Sat Sep 25, 2004 9:34 pm
Jason Wilkins wrote: |
On the ARM processor (y * 5) >> 4 can become a single instruction. |
How d'you figure?
Dan.
#26779 - DekuTree64 - Sat Sep 25, 2004 9:59 pm
poslundc wrote: |
Jason Wilkins wrote: | On the ARM processor (y * 5) >> 4 can become a single instruction. |
How d'you figure?
Dan. |
add reg, reg, reg, asr #2 would do the trick. Personally I haven't done anything involving lots of distance checks, but this one looks pretty nice. And if the accuracy drop when the values get close together is too much, you could probably get away with checking for it and using a more accurate algorithm if needed, since it's so fast.
EDIT: Silly me, putting an lsr on something that will obviously be negative half the time...
_________________
___________
The best optimization is to do nothing at all.
Therefore a fully optimized program doesn't exist.
-Deku
#26780 - poslundc - Sun Sep 26, 2004 12:04 am
DekuTree64 wrote: |
poslundc wrote: | Jason Wilkins wrote: | On the ARM processor (y * 5) >> 4 can become a single instruction. |
How d'you figure?
Dan. |
add reg, reg, reg, asr #2 would do the trick. |
...
Uh, it don't add up...
Let y = 32
Then (y * 5) >> 4 == 10
add y, y, y, asr #2 == 40
I'm interested in knowing if there's a clever mathematical way of doing this, but as far as I can tell there's no way to do it in one instruction.
Dan.
#26781 - ampz - Sun Sep 26, 2004 12:48 am
How about a simple lookup table?
The GBA screen is not incredible high-resolution, so the number "X*X+Y*Y" can never be larger than 83200 for on-screen objects.
Each table entry needs to be 9bit to represent the longest possible on-screen distance (288), this means 16bit table entrys. If you don't want to waste 80kB extra ROM, just rightshift every table entry one step. You vill only loose 1 bit precision. Or simply round the largest entrys down to 255. Results in poor precision only for the most distant objects, and in most games, we only care about the objects nearby anyway, and we rarely need to calculate the distance between objects in oposite corners of the screen.
So, Two 8bit multiplications, one addition and one ROM table lookup is all it takes for a perfect precision distance algorithm. Should be about 10 cycles or so?
Of course, if you need to be able to calculate the distance between various off-screen objects, then a lookup table is not a very realistic solution.
#26782 - Krakken - Sun Sep 26, 2004 1:05 am
That would be ~83kb of memory. I expect that would only be useful if you don't care about your final ROM size or you're going to be using it constantly and you desperately need the speed.
#26784 - ecurtz - Sun Sep 26, 2004 2:28 am
Here's a good approximate function (with some helpful explanation) that I've got bookmarked.
flipcode distance article
#26788 - DekuTree64 - Sun Sep 26, 2004 3:12 am
poslundc wrote: |
DekuTree64 wrote: | poslundc wrote: | Jason Wilkins wrote: | On the ARM processor (y * 5) >> 4 can become a single instruction. |
How d'you figure?
Dan. |
add reg, reg, reg, asr #2 would do the trick. |
...
Uh, it don't add up...
Let y = 32
Then (y * 5) >> 4 == 10
add y, y, y, asr #2 == 40
I'm interested in knowing if there's a clever mathematical way of doing this, but as far as I can tell there's no way to do it in one instruction.
Dan. |
Indeed you're right. I was looking at is as (y * 5) / 4, not >> 4. Yeah, that will take 2 instructions, but the second can be absorbed into the adding of x:
Code: |
add ry, ry, ry, lsl #2 // y * 5
add rDest, rx, ry, asr #4 // x + (y * 5 >> 4) |
As for using lookup tables, eeh, I don't like it. Maybe for something like cloth simulation where you have to do crazy numbers of square roots between all the vertices, but they're all nice and close together, but for a real game, I doubt you'd ever have enough sprites on-screen for even a regular square root to kill you.
_________________
___________
The best optimization is to do nothing at all.
Therefore a fully optimized program doesn't exist.
-Deku
#26801 - ampz - Sun Sep 26, 2004 1:23 pm
Krakken wrote: |
That would be ~83kb of memory. I expect that would only be useful if you don't care about your final ROM size or you're going to be using it constantly and you desperately need the speed. |
Well, is that not what this thread is all about?
jd posted a link to a 51 cycle square root function, if that is not fast enough, then a approximation or a lookup table are the only options.
83kByte is not too high a price if you need both speed and accuracy.
DekuTree64 wrote: |
As for using lookup tables, eeh, I don't like it. Maybe for something like cloth simulation where you have to do crazy numbers of square roots between all the vertices, but they're all nice and close together, but for a real game, I doubt you'd ever have enough sprites on-screen for even a regular square root to kill you. |
Then what is this thread all about? The 51 cycle square root function should be plenty fast if you don't need to do too many square roots per frame.
I just realized that a square root lookup table should of course be 2-dimensional.
char sqrt[240][160];
This is only 38400 Bytes and takes only a few cycles.
#26846 - FluBBa - Mon Sep 27, 2004 11:33 am
Not to be picky about it but the longest distance between both corners is 288 "pixels".
I don't really see the problem of a 76800 bytes big table if you are after speed though.
_________________
I probably suck, my not is a programmer.
#26851 - poslundc - Mon Sep 27, 2004 2:19 pm
FluBBa wrote: |
I don't really see the problem of a 76800 bytes big table if you are after speed though. |
For the record, it's not going to be THAT speedy, unless you let it consume a huge chunk of your EWRAM. If you keep it in ROM, you're going to have huge access time delays, especially if your code is running from ROM as well. I'd take the fast distance function I provided earlier in the thread: you can put it in IWRAM in ARM mode if you want and optimize it using conditional execution and probably end up with comparable performance, without the restrictions of the table size.
Dan.
#26862 - ampz - Mon Sep 27, 2004 10:11 pm
FluBBa wrote: |
Not to be picky about it but the longest distance between both corners is 288 "pixels".
I don't really see the problem of a 76800 bytes big table if you are after speed though. |
I know. But there are ways around that.
If a +-1 error is acceptable, or you don't need to know the exact distance between objects in opposite corners, then you can get away with only 8bit table entrys.
You could also use a simple trick like: distance=x+sqrt_table[x][y]. This will keep the table entrys down to 8 bit, give you perfect precision, and only cost you a single extra instruction.
poslundc: A ROM table lookup should be 6 cycles if the code is executed from IWRAM. Add to this the X+Y addition and we get 7 cycles. You save another cycle if you can afford to put the table in EWRAM.
A 6-7cycle precision sqrt function is as fast as it gets.