gbadev.org forum archive

How could I work with 64 bit numbers? I don't understand how they work, why do I need an orr after multiplication for example? Some explanation about the system would be great :)

Since the ARM7 is 32 bit, 64 bit values have to be simulated. A simple way of doing that would be to stick two 32 bit values together. Perhaps you'd make a structure such as this:

Code:

typedef struct
{
u32 hi;
u32 lo;
} int64;

Then, since you couldn't use the stock C operators (+ - * / etc) on int64's, you would probably want to implement those operations yourself.

But what if you implemented addition this way:

Code:

int64 add64(int64 left, int64 right)
{
int64 tmp;
tmp.lo = left.lo + right.lo;
tmp.hi = left.hi + right.hi;
return tmp;
}

That is clearly not correct. The addition of the low parts could have easily overflowed, requiring a carry to the high part (just like doing math by hand).

You could figure out the correct math for each operator on your own (good if you want to know how the math works), or find some code and copy it.

Some more info on the underlying mechanisms of binary math:
http://computer.howstuffworks.com/boolean.htm

GNU C has a built-in 64-bit integer type: long long.
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.

Generally you only need 64-bit numbers for multiplication/division with 16-bit fixed point. The orr is to combine part of the lower 32 bits with part of the upper 32. So say you multiply 0x20000 by 0x30000 (2 << 16 * 3 << 16), you get 0x600000000, which overflows 32 bits. What you want is 6 << 16, so you want to shift the whole 64-bit number to the right 16 bits, but since you have to do it one reg at a time, you shift the lower 16 bits off of the first register (in this example those 8 0's after the 6 would be the lower register, and the 6 is the upper reg), and then shift the uppwer register LEFT 16, so it fills the empty 16 bits from the top of the lower reg, and orr them together. Since GBA's BIOS div is only 32 bit, I usually just shift the denominator right 8 spaces, divide, and shift the result left 8. Basically an 8-bit fp division. I know you can do a 32x32=64-bit multiply by splitting it up in to a bunch of 16x16=32 ones and shift/add them together, so you might be able to split up a divide to get full precision too. Not sure.
And adding is dead easy, just adds the lower halves, and adc the uppers.
_________________
___________
The best optimization is to do nothing at all.
Therefore a fully optimized program doesn't exist.
-Deku

If you want a full 64x64=64 multiply, it's just cross-multiplication. GCC generates the code given below with r0=op1 low, r1=op1 high, r2=op2 low, r3=op2 high, r4=result low, r5=result high:

Code:

umull r4, r5, r0, r2
mla r5, r0, r3, r5
mla r5, r1, r2, r5

This works for signed or unsigned arithmetic and no shifts are needed as you can just stick the high results directly into the top word of the output. You can't do division by a similar method as far as I know; division is pretty crappy and unpleasant at the best of times let alone when you're operating above your word size.

Yeah, I probably should have mentioned that GCC will automatically generate 64 bit math for you, but it sounded like Lupin wanted to do the stuff by hand. ;-)

That's why I gave the asm code. =)

gcc -S is very helpful (though I had to use -O to get it to generate mla instructions instead of a seperate mul and add).

hm, i'm still confused... so, how does the code for an 32x32=32 multiply look like? I know, you'll say that has nothing to do with 64 bit numbers, but the problem is that when i multiply my both 32bit numbers the result won't fit in 32bit, but I want 32 bit output -> so I have to shift down 32 bits in order to get an 32 bit output.

I'm feeling like I'm asking an dumb question now :)

A 32x32=32 multiply is just mul r0, r1, r2. If the resulting number is more than 32 bits it will overflow, yes, but that's unavoidable; if you want a 32-bit output, you simply can't multiply two numbers whose product is more than 2^32. The alternative is to use a 32x32=64 multiply: the umull or smull instructions.

I may have misunderstood your question..

I want to do 32x32=64>>32 this is because for my reciprocal table (hmm, this is like my 100th try to create an good working reciprocal lut...lol :()

Ah. That's just a 32x32=64 multiply, followed by a shift. The multiply is done by umull or smull (unsigned and signed respectively), then just shift it normally.

Yeah, just think of it as regular decimal multiplication on paper. If you multiply a number with 2 digits after the decimal point by a number with 3 digits after the point, you get a number with 5 digits after the point. Same with fixed-point math, so if you use a 32 bit RCP table, and 16 bit numbers, then you multiply and get 48 fractional bits. You want to get back to 16, so shift 32 of those off. But since shifting by 32, you'd just shift off the entire low reg, and then orr on the hi reg without shifting at all, it's easier to just use a
umull r2, r0, r1, r0
Assuming you put that in a function so r0 and r1 are the args, r2 is free, and r0 is the return.

But actually a 32 bit RCP table will make dividing by 1 a problem, since 1 << 32 doesn't fit in a register. You could probably get away with just setting it to 0xffffffff instead though.
_________________
___________
The best optimization is to do nothing at all.
Therefore a fully optimized program doesn't exist.
-Deku

hm, how could I shift the whole 64 bit number right by lets say 16 bits?

Thx for all your answers!

Lupin wrote:

hm, how could I shift the whole 64 bit number right by lets say 16 bits?

Thx for all your answers!

warning - untested code follows :)

r0 is input low
r1 is input high

mov r2, r1, lsl#16
add r0, r2, r0, lsr#16
mov r1, r1, asr#16

r0 is output low
r1 is output high

cheers

Col

I`m trying to use the 64 bit type long long but am having problems with the following calculations:-

c=((65536*65536)+(65536*65536))
print c/65536.0 = 0.0 WRONG!!!

c=4294967296+4294967296
print c/65536.0 = 131072.0 CORRECT

Both of these equations should give the same result all i`ve done in the second equation is worked the multiplication out in advance. All of these values should fit into 64bits.

The difference is that 65536 is interpreted as an int, while 4294967296, because it doesn't fit in an int, is interpreted as a long long. So the first multiplication is between int with an int result, while the second is between long long with a long long result. You can cause a 64-bit operation by casting one of the values to long long or appending LL to the number, as with "65536LL".

Ok now I see, thanks

By the way, the max value for an unsigned :
- 16unsigned bit int is :65535
- 32unsigned bit int is : 4294967295

I don't understand how your second test can work cause I tryed it in ASM, and loading a value superior to (2^32 - 1) in a register puts in it all zeros. So the second result should be zero...

GCC automatically detects that 2^32 is too large to fit, and uses 64-bit arithmatic.

gbadev.org forum archive

ASM > Working with 64 bit numbers

#10652 - Lupin - Thu Sep 11, 2003 5:14 pm

#10657 - sajiimori - Thu Sep 11, 2003 5:59 pm

#10659 - tepples - Thu Sep 11, 2003 6:03 pm

#10661 - DekuTree64 - Thu Sep 11, 2003 6:45 pm

#10665 - torne - Thu Sep 11, 2003 8:06 pm

#10671 - sajiimori - Fri Sep 12, 2003 12:33 am

#10681 - torne - Fri Sep 12, 2003 11:01 am

#10687 - Lupin - Fri Sep 12, 2003 4:54 pm

#10688 - torne - Fri Sep 12, 2003 5:01 pm

#10689 - Lupin - Fri Sep 12, 2003 6:30 pm

#10690 - torne - Fri Sep 12, 2003 6:35 pm

#10696 - DekuTree64 - Fri Sep 12, 2003 9:01 pm

#10698 - Lupin - Fri Sep 12, 2003 11:37 pm

#10700 - col - Sat Sep 13, 2003 12:13 am

#10725 - Roidy - Sat Sep 13, 2003 1:02 pm

#10727 - Burton Radons - Sat Sep 13, 2003 2:23 pm

#10732 - Roidy - Sat Sep 13, 2003 3:41 pm

#10838 - funkeejeffou - Wed Sep 17, 2003 7:07 pm

#10843 - torne - Wed Sep 17, 2003 8:32 pm