#80652 - hello world - Mon Apr 24, 2006 3:50 pm
Hi, I was wondering if anybody could tell me how to find the square root of a number. Please don't suggest the bios call thing because I don't want to use any bios calls (is that the right terminology - "bios calls").
#80657 - Legolas - Mon Apr 24, 2006 4:23 pm
hello world wrote: |
Hi, I was wondering if anybody could tell me how to find the square root of a number. Please don't suggest the bios call thing because I don't want to use any bios calls (is that the right terminology - "bios calls"). |
Here :)
_________________
My homepage!
#80772 - Schultz - Tue Apr 25, 2006 1:33 pm
Often enough have I known people whose limitations in game development lie in mathematical methods or modelling. Since my sole function while alive is to develop mathematics, I must enhance this forum with a square root algorithm.
At first, one would say one needs a square root algorithm to detect collision with a circle. To that particular one, I shall extend an invitation to hell. This one must have thought a collision happens when:
sqrt( x*x + y*y ) <= radius
Nevertheless, squaring both sides results in:
x*x + y*y <= radius*radius,
which is far better, since you can define radius*radius as a constant, and
mainly because IT DOES NOT USE SQRT!
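The squared-distance test above can be shown as runnable C. This is my own sketch (the name `inside_circle` is not from the thread): the point (x, y) is taken relative to the circle's centre, and no square root is ever computed.

```c
#include <stdint.h>

/* Collision test against a circle: compare the squared distance with
   the squared radius. inside_circle is an illustrative name. */
static int inside_circle(int32_t x, int32_t y, int32_t radius)
{
    return x * x + y * y <= radius * radius;
}
```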
However, if you really wish to use sqrt, there is a way:
Let us reason a bit. The square root of a non-negative x (taking x >= 1, so the bounds below hold) satisfies:
0 <= sqrt(x) <= x
Therefore, we can, for instance, calculate sqrt(2) as follows:
0 <= sqrt(2) <= 2.
Then, we take the mean of 0 and 2, which is 1, and assume it is the square root of 2. Then, to check precision, we multiply 1*1 and conclude it is less than 2. So, it is to be deduced:
1 <= sqrt(2) <= 2
Next: we get the mean between 1 and 2 and assume it to be the root.
1.5*1.5 = 2.25, which is greater than 2. Therefore:
1 <= sqrt(2) <= 1.5
And then it follows....
CODING:
Let us first define the main structure:
Code: |
word sqrt(word x) {
    word left, right, mean;
    left = 0;
    right = x;
    do {
        mean = (left + right) >> 1;
        if ((mean*mean >> 8) < x)   /* the >> 8 is for fixed point */
            left = mean;
        else
            right = mean;
    } while (condition);
    return (left + right) >> 1;
} |
Now, you may be wondering what condition is.
There are 2 limiting factors when calculating the square root:
Factor 0: time.
Factor 1: precision.
Let us deal with precision first. We define an acceptable error and loop until the result fits within it. So, in the header, you shall define 3 things:
Code: |
#define ERROR (/* the error you think is acceptable */)
#define abs(x) ((x) > 0 ? (x) : -(x))  /* absolute value */
#define condition (abs(error) > ERROR) /* keep looping while the error is still too large */ |
Finally, we must actually compute error, by rewriting the code above as follows:
Code: |
#define ERROR (/* the error you think is acceptable (perhaps 16) */)
#define abs(x) ((x) > 0 ? (x) : -(x))  /* absolute value */
#define condition (abs(error) > ERROR) /* keep looping while the error is still too large */

word sqrt(word x) {
    word left, right, mean;
    int error;                         /* signed, so the error < 0 test works */
    left = 0;
    right = x;
    do {
        mean = (left + right) >> 1;
        error = ((mean*mean) >> 8) - x;
        if (error < 0)
            left = mean;
        else
            right = mean;
    } while (condition);
    return (left + right) >> 1;
} |
Another approach is to limit the time the processor has to calculate the root. That is done by fixing the number of steps it may take, so the code must be rewritten as follows:
Code: |
#define condition (--i)

word sqrt(word x) {
    word left, right, mean, i;
    i = /* number of steps acceptable (perhaps 256) */;
    left = 0;
    right = x;
    do {
        mean = (left + right) >> 1;
        if ((mean*mean >> 8) < x)   /* the >> 8 is for fixed point */
            left = mean;
        else
            right = mean;
    } while (condition);
    return (left + right) >> 1;
} |
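For readers who want something compilable, here is one way to turn the bisection idea above into real C for plain (non fixed-point) unsigned integers. This is my own sketch, not code from the thread: it always terminates, returns the largest r with r*r <= x, and the `mean <= x / mean` test sidesteps the overflow that `mean * mean` would hit for large 32-bit inputs.

```c
#include <stdint.h>

/* Bisection (binary search) integer square root: returns the largest
   r such that r*r <= x. isqrt is an illustrative name. */
static uint32_t isqrt(uint32_t x)
{
    uint32_t left = 0;
    uint32_t right = x / 2 + 1;          /* sqrt(x) <= x/2 + 1 for all x */
    while (left < right) {
        /* Round the midpoint up so the loop cannot stall on left. */
        uint32_t mean = left + (right - left + 1) / 2;
        if (mean <= x / mean)            /* mean*mean <= x, overflow-free */
            left = mean;
        else
            right = mean - 1;
    }
    return left;
}
```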
Last, but not least, you shall try translating this code to assembly by hand. Code written in assembly is pure, beautiful,
BUT IT DOES NOT SUBSTITUTE FOR THE VERY OWN ----LOGIC---- IN THE DEVELOPMENT OF THE ALGORITHM.
STUDY MATHEMATICS, DEDICATE YOUR LIFE AND YOUR FAMILY LIFE AND THE LIFE OF EVERY ONE TO MATH TO FIND THE PATH FOR SOMETHING INTERESTING....
Mathematics is perfect.
#80781 - hello world - Tue Apr 25, 2006 4:04 pm
Thanks, I was wondering how to do that for many a night :-).
#80788 - SevenString - Tue Apr 25, 2006 4:50 pm
http://forum.gbadev.org/viewtopic.php?t=7129&highlight=sqrt
the topic that wouldn't die...
...edit: oops, Legolas beat me to it.
_________________
"Artificial Intelligence is no match for natural stupidity."
#80888 - keldon - Wed Apr 26, 2006 12:39 am
Shultz, what you are describing is a binary search for the square root. The time for this is a*log2(b), where a = time for one iteration of the loop, and b = the input number. One iteration of the loop will be [about] 9 or so cycles. I think the other method may actually be faster.
#80958 - keldon - Wed Apr 26, 2006 10:13 am
Also, with your binary search you can greatly reduce the search space by storing the minimum and maximum numbers based on the MSB. I think you can get the MSB in about 16 cycles using a 2-step binary search and an 8-bit lookup. Since the range is greatly reduced, this may speed up the binary search method. Also note that your MSB search will automatically return the range.
Doing this your maximum range is 46341 - 32768 (13573). Also your maximum number in a 32 bit environment is 46341, so you can speed up a binary search greatly by having that as your maximum value.
However you would still have a worst case scenario of 14 steps. With each step being about 9 or so cycles, this is quite slow compared to Dijkstra's method - which is 51 cycles (man, he's everywhere).
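keldon's MSB trick might look like this in C; the table name and exact shape are my guesses at the idea, not code from the thread. Two conditional shifts narrow a 32-bit value to one byte, then an 8-bit table finishes the job. The resulting bit index bounds the root: if the MSB is bit b, sqrt lies between 1 << (b/2) and 1 << (b/2 + 1).

```c
#include <stdint.h>

static int8_t msb8[256];  /* msb8[v] = index of highest set bit in byte v, -1 for 0 */

static void init_msb8(void)
{
    msb8[0] = -1;
    for (int i = 1; i < 256; i++)
        msb8[i] = (int8_t)(msb8[i / 2] + 1);
}

/* Two-step binary search down to one byte, then one table lookup.
   Returns the index (0..31) of the most significant set bit, -1 for 0. */
static int msb_index(uint32_t x)
{
    int shift = 0;
    if (x >> 16) { shift += 16; x >>= 16; }
    if (x >> 8)  { shift += 8;  x >>= 8;  }
    return shift + msb8[x];
}
```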
#81047 - Schultz - Wed Apr 26, 2006 10:10 pm
Keldon, I must thank you for the description posted. Nevertheless two things must be analysed.
First, my name is not Shultz, it is Schultz.
Second, I have not described a computational method for taking square roots but a mathematical one; therefore implementation is not actually my function.
However I must congratulate you for worrying so much about speed, and I shall also grant that you are right to say the complexity of my algorithm is O(log n).
I did not quite understand the procedure you described, but what I can say is that the code I posted is actually not real C code but pseudo-code. The main purpose of my algorithm is to reach the smallest complexity, since a mathematician I am, and not to reduce the cycles the algorithm shall contain. That is for programmers to do.
In short, I can think of no algorithm to calculate a square root whose COMPLEXITY is LESS THAN LOG N. So, my algorithm is as nice as yours. What differs is the implementation. Yours is better, but it is based on "BRUTE FORCE" optimization and not on logical optimization.
I thank your effort and your consideration in posting your method, reinforcing that COMPLEXITY IS ALL THAT MATTERS for mathematics.
Thank you.
#81063 - tepples - Wed Apr 26, 2006 11:21 pm
Computational complexity in practice is more complicated than a Big O indicates. Big O indicates a limit as the data size increases without bound. But in practice, data sizes used in video game programming have definite bounds, and an algorithm that runs slowly on large data sets may run faster than another algorithm on smaller data sets. For instance, a given array may never be larger than 128 elements, or a given integer may never be larger than 32 bits.
Some algorithms for sorting an array are O(n^2); others are O(n*log(n)). However, the constant factor represented by O may differ between the algorithms. For some array lengths, an O(n^2) algorithm may run faster than the O(n*log(n)) algorithm because the constant factor is so much smaller.
Newton's method for square roots uses the iteration x[n+1] = (x[n] + xsquared / x[n]) / 2, which converges to n bits of precision after O(log n) divisions. However, on some machines (such as Game Boy Advance), division is very slow. In these cases, computing the square root of a 32-bit number using Wilco Dijkstra's O(n) algorithm finishes in fewer cycles than doing so using Newton's method because of the dramatically smaller constant factor.
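The iteration tepples describes can be sketched as integer C (my own code, not his). Each step costs one division, which is exactly why it loses on a machine like the GBA with no hardware divide:

```c
#include <stdint.h>

/* Newton's method for floor(sqrt(v)): iterate x' = (x + v/x) / 2
   until the estimate stops decreasing. One division per step. */
static uint32_t newton_isqrt(uint32_t v)
{
    if (v < 2)
        return v;                    /* sqrt(0) = 0, sqrt(1) = 1 */
    uint32_t x = v / 2;              /* simple initial guess */
    uint32_t next = (x + v / x) / 2;
    while (next < x) {               /* strictly decreases until it converges */
        x = next;
        next = (x + v / x) / 2;
    }
    return x;
}
```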
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.
#81065 - poslundc - Wed Apr 26, 2006 11:37 pm
Schultz wrote: |
In short, I can think or remind of no algorithms to calculte a square root whose COMPLEXITY is LESS THAN LOG N . Then, my algorithm is as nice as yours. What is different is the implementation. Yours is better, but it is based on "BRUTE FORCE" optimization and not on logical optimization. |
A lookup table is O(1). For many coding applications with either a limited domain of input or the ability to interpolate between entries (also O(1)) this is more than sufficient.
Dan.
#81113 - sajiimori - Thu Apr 27, 2006 1:56 am
Schultz, I'm glad you were wise enough to add the caveat that complexity is all that matters for mathematics. For game development, there are many other concerns.
#81125 - kusma - Thu Apr 27, 2006 2:28 am
sajiimori wrote: |
Schultz, I'm glad you were wise enough to add the caveat that complexity is all that matters for mathematics. For game development, there are many other concerns. |
or any other developer for that matter.
as tepples points out, the O-notation only tells how an algorithm scales. for pure theory purposes, this is all fine. however, i (as well as 99% of the other people here on this forum) intend to actually do something with this. so it all really comes down to clock cycles.
Skultz0r: no need for all those capital letters though. they don't make you seem smart.
#81162 - Schultz - Thu Apr 27, 2006 2:13 pm
I have got to be quick. Nevertheless, I have got to say two things.
First, if I wanted an O(1) function, as I do with the trigonometric ones by consulting tables, it would not make sense to develop a function to root a number, which is the purpose of the thread.
Second, cycles do matter. However, I think an O(log n) function starts being faster than an O(n) function soon enough. I must also say that I have written pseudo-code, not assembly code, so I have not yet implemented optimizations.
I have described only a method that scales better than any other as the input grows. I think my method is nice enough, since if perfect precision were required, it would do much better. If you still think cycles are important, I say that more important than a fast game is a mathematically beautiful and harmonic game.
Mathematics is perfect.
#81165 - tepples - Thu Apr 27, 2006 3:17 pm
Schultz wrote: |
However, I think an O(log n) function starts being faster than an O(n) function soon enough. I must also say that I have written pseudo-code, not assembly code, so I have not yet implemented optimizations. |
I'll wait until you implement the function in at least C to see whether "soon enough" is likely to happen within the data types commonly used in real-time video games on handheld systems.
Quote: |
I have described only a method that scales better than any other as the input grows. I think my method is nice enough, since if perfect precision were required, it would do much better. If you still think cycles are important, I say that more important than a fast game is a mathematically beautiful and harmonic game. |
Turn-based games may be "mathematically beautiful and harmonic" but not everybody prefers them.
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.
#81174 - SevenString - Thu Apr 27, 2006 5:14 pm
can't... resist... posting...
Quote: |
I say that more important than a fast game is a mathematically beautiful and harmonic game. |
I've been reading some of these posts, and I can't figure out if Schultz is serious, or just having a piss-take on all of us. I keep looking around for the hidden camera and Ashton Kutcher jumping out to say, "You just got punked!"
So, I don't often do this, but I'm going to figuratively "whip it out and lay it on the table".
When I see a programming lead or technical director start getting obsessive and insistent about mathematical rigor, with little regard for fun and performance, I run screaming in the opposite direction.
In my 20+ years of professional game development experience (there it is... "whump"), I have witnessed abject failure as the consequence every single time that this paradigm is blindly embraced.
Usually, it is someone who is either fresh out of university or has not fully severed their ties to academia and the "academic" way of looking at problems.
The usage of the word "academic", as a pejorative to describe a purely theoretical and impractical approach, is in many cases quite accurate, especially in the world of game coding, where cheats and tricks are so often critical to the process.
I'd like to mention Cearn's tonc tutorials as striking an excellent balance between rigor in development style and applicability in the real world. That's the way it's done!
_________________
"Artificial Intelligence is no match for natural stupidity."
#81175 - Ultima2876 - Thu Apr 27, 2006 5:34 pm
SevenString wrote: |
can't... resist... posting...
Quote: | I say that more important than a fast game is a mathematically beautiful and harmonic game. |
I've been reading some of these posts, and I can't figure out if Schultz is serious, or just having a piss-take on all of us. I keep looking around for the hidden camera and Ashton Kutcher jumping out to say, "You just got punked!"
So, I don't often do this, but I'm going to figuratively "whip it out and lay it on the table".
When I see a programming lead or technical director start getting obsessive and insistent about mathematical rigor, with little regard for fun and performance, I run screaming in the opposite direction.
In my 20+ years of professional game development experience (there it is... "whump"), I have witnessed abject failure as the consequence every single time that this paradigm is blindly embraced.
Usually, it is someone who is either fresh out of university or has not fully severed their ties to academia and the "academic" way of looking at problems.
The usage of the word "academic", as a pejorative to describe a purely theoretical and impractical approach, is in many cases quite accurate, especially in the world of game coding, where cheats and tricks are so often critical to the process.
I'd like to mention Cearn's tonc tutorials as striking an excellent balance between rigor in development style and applicability in the real world. That's the way it's done! |
This is all very good news for me, because I'm much better at implementation on a practical level than at theorising. Not that I can't work stuff out on paper first if I have to (in fact, I do it a fair amount), but maybe my way of thinking just favours the "get in there and do it" approach.
#81201 - Schultz - Thu Apr 27, 2006 9:18 pm
Sincerely, SevenString's last comment made me wonder if you think I am a fresh-out-of-university mathematician. Well, I am not.
I am new at university (my second month), and I think my method is once again nice enough, for two details:
0. The word on the ARM processor the GBA has is 32 bits. Just try to take the square root of a very large number indeed, such as 2^30, with an O(n) method. The GBA would stop.
1. If the input is small, as you have argued, I had better use tables. Sincerely, I would not argue that an O(n) can be better most of the time than an O(log n), because I would be wrong.
Furthermore, as video games get better, complexity overrides cycles. (Of course both of them would be nice.)
Finally, do not ask me to implement a thing, because I am overloaded with stuff to do for mathematics.
Mathematics is perfect.
#81206 - keldon - Thu Apr 27, 2006 9:34 pm
In theory O(1) is faster than O(n). But in practice O(1) can be 1000 cycles, and O(n) can be 100 cycles, if you catch my drift.
#81208 - Schultz - Thu Apr 27, 2006 9:42 pm
O(1) cannot be slower than O(n). To claim the contrary is absurd.
Look at the code:
Code: |
ldr r0, [r1, r2, lsr #p]
|
r0 is the return value for sqrt.
r1 is the table address.
r2 is the parameter (x).
p is a constant expressing the precision given up by using a table.
That is O(1) literally. ONE INSTRUCTION. Can another method be faster than that?
Also, for MOST of the numbers in a 32-bit range, an O(log n) beats an O(n). That is reason enough for you to use my algorithm.
There is, also, a solution. Let us make a comparison to determine which of them is better (which range x is in) before using our algorithms. That is, we shall have both of them in IWRAM and, for each parameter, check which is better by a range comparison (also O(1)).
There is always a solution, since:
MATHEMATICS IS PERFECT
#81213 - sajiimori - Thu Apr 27, 2006 10:03 pm
Schultz, the only thing that's absurd is that anyone here bothers to take you seriously.
O(1) cannot be slower than O(n) for all values of n, but it can obviously be slower for small values. For all the big O says, "small" could be a hundred billion.
It's absolutely laughable that you present yourself as an authority in mathematics when you don't even understand what O(1) means. Hint: it has nothing to do with the number of instructions.
#81215 - SevenString - Thu Apr 27, 2006 10:13 pm
Quote: |
Finally, do not ask me to implement a thing because I am overloaded with stuff do do for mathematics. |
Hahahahahaha! Now that's funny!
_________________
"Artificial Intelligence is no match for natural stupidity."
#81216 - tepples - Thu Apr 27, 2006 10:16 pm
Schultz wrote: |
That is O(1) literally. ONE INSTRUCTION. Can another method be faster than that? |
Yes, a lookup can be slower than a computation if reading from a large lookup table causes a cache miss, requiring the processor to wait until data comes back from main RAM or, worse yet, disk or CF. If more computation helps the program avoid thrashing, then more computation is a good thing.
Quote: |
There is, also, a solution. Let us make a comparison to determine which of them is better (x is in which range) before using our algorithms. |
Now you're catching on.
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.
#81222 - keldon - Thu Apr 27, 2006 10:34 pm
sajiimori wrote: |
O(1) cannot be slower than O(n) for all values of n, but it can obviously be slower for small values. For all the big O says, "small" could be a hundred billion. |
What I meant is that the complexity of an algorithm does not always tell you its exact speed. For example, if multiplies were 1 cycle, then the O(log n) algorithm would beat Dijkstra's algorithm, which I believe is O(n) (without looking it up), simply because one iteration of Dijkstra's algo is 'x' cycles, whereas the binary search's would be 'y' (I'm not really checking the specifics here, as they would not hold much relevance anyway). I'm just showing an example where practice could go against theory.
#81229 - sajiimori - Thu Apr 27, 2006 10:49 pm
Right, keldon, I was agreeing with you. :)
#81230 - poslundc - Thu Apr 27, 2006 10:50 pm
keldon wrote: |
I'm just showing an example where practice could go against theory. |
It's not so much that practice goes against theory. It's just that complexity and timing, while related to each other, are not the same thing.
Dan.
#84376 - Ant6n - Mon May 22, 2006 6:51 am
I find it somewhat interesting (amusing?) how people jumped on that big Oh discussion. Considering that every possible input is constant (or bounded by a constant) in size anyway, every algorithm will be in O(1).
Isn't it strange that people prefer quicksort, O(n^2), to merge sort, O(n log n)?
I propose another way to calculate the distance between two points in R^2 - that is where the sqrt is probably used the most (no?).
Take the absolute value of both components (of the distance vector) and take the bigger value.
This approximation could probably be done in something like 8 cycles, and the biggest error you can get (when both components have the same absolute value) is around 30% (below the actual value). If you want it more accurate, you can add half of the smaller component... So yeah, it depends on what is desired to be achieved.
:o
PS: Mr math student: take some interesting algorithms course, and a numerical methods course, and stop smelling your ow... i mean, have fun in college, it's a good time ;-)
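Ant6n's approximation, written out as C (again my sketch, with an invented name). max(|dx|, |dy|) alone underestimates by up to about 29%; adding half the smaller component pulls the worst case in considerably, at the price of overestimating on the diagonal:

```c
#include <stdlib.h>

/* Approximate length of (dx, dy): the larger absolute component plus
   half the smaller one. No sqrt, no multiply. */
static int approx_dist(int dx, int dy)
{
    int ax = abs(dx);
    int ay = abs(dy);
    int hi = ax > ay ? ax : ay;
    int lo = ax > ay ? ay : ax;
    return hi + lo / 2;
}
```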
#84449 - poslundc - Mon May 22, 2006 5:21 pm
Ant6n wrote: |
I propose another way to calculate the distance between two points in R^2 - that is where the sqrt is probably used the most (no?).
Take the absolute value of both components (of the distance vector) and take the bigger value.
This approximation could probably be done in something like 8 cycles, and the biggest error you can get (when both components have the same absolute value) is around 30% (below the actual value). If you want it more accurate, you can add half of the smaller component... So yeah, it depends on what is desired to be achieved.
:o |
Actually, there is a formula/function/"gem" in this thread that gives you the distance between two points with an expected error of around 7%.
It requires only one "if"-check and a couple multiplies/adds.
Dan.
#84477 - Ant6n - Mon May 22, 2006 7:51 pm
Oh yeah, that works too - it's basically the same, except that it uses a better constant for the multiplier: 5/16 instead of 1/2 - of course there is then a mul, which makes it slower - so again, it depends on what is needed.
somebody should stick up all those neat math thingies
#84483 - tepples - Mon May 22, 2006 9:42 pm
Ant6n wrote: |
it's basically the same, except that it uses a better constant for the multiplier: 5/16 instead of 1/2 - of course there is then a mul, which makes it slower |
MUL is not slow on ARM7TDMI by any means. In fact, because MUL is one instruction, an algorithm that uses MUL wisely might waste fewer cycles reading instructions from ROM, especially if ROM instruction prefetch is turned on so that the memory controller can spit out instructions rapid-fire after MUL finishes its "internal operation".
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.
#84508 - poslundc - Tue May 23, 2006 12:24 am
Also, that particular formula in the example I gave was expressed as a sum of shifts, such that the entire formula (x and y) can be performed in three arithmetic instructions (one MOV and two ADDs).
Hard to beat. (Even MLA would require two instructions to set up the terms.)
Dan.
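Putting the thread's numbers together: a 5/16 multiplier can be written as lo/4 + lo/16, which is the kind of shift-and-add form Dan describes (on ARM, the shifted operands fold into the ADDs). This sketch and its function name are mine, not the exact "gem" referenced above:

```c
#include <stdlib.h>

/* Approximate length of (dx, dy) as hi + lo*5/16, with the multiply
   replaced by two shifts: lo/4 + lo/16 = lo*5/16. */
static int approx_dist_5_16(int dx, int dy)
{
    int ax = abs(dx);
    int ay = abs(dy);
    int hi = ax > ay ? ax : ay;
    int lo = ax > ay ? ay : ax;
    return hi + (lo >> 2) + (lo >> 4);
}
```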