gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback machine copies. A new forum can be found here.

ASM > CLZ (count leading zeros)

#28848 - pyros - Sun Nov 07, 2004 5:02 pm

According to the ARM instruction reference sheet, ARM5+ support the CLZ instruction. However I cannot get this to work (using combinations of goldroad, armasm, visualboy advance and mappy). Apparently no$gba does support it.

My questions are:

does the GBA support CLZ?

which assemblers support CLZ?

which emulators are known to support CLZ?

Thankyou :)

#28850 - ecurtz - Sun Nov 07, 2004 5:49 pm

Not supported on the gba.

The ARM7 in the gba is actually a generation "3" chip. I think you need an ARM9, which actually is generation "5" to support CLZ.

#28896 - tepples - Mon Nov 08, 2004 4:23 pm

Get a GP32 emulator or wait for 2006 when Nintendo DS emulators begin to show up.

What were you planning to use the 'CLZ' instruction for? Perhaps we could suggest a better algorithm to do what you ultimately wanted your program to do.
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.

#28904 - keldon - Mon Nov 08, 2004 4:49 pm

CLZ and CTZ are sexy opcodes that would divide the time of a lot of algorithms, such as range checking etc. On the x86 we can (if I remember correctly) count zeros from a specific position; but if not it's still a sexy opcode.

#28906 - pyros - Mon Nov 08, 2004 5:12 pm

tepples wrote:
What were you planning to use the 'CLZ' instruction for? Perhaps we could suggest a better algorithm to do what you ultimately wanted your program to do.


Better algorithms already have been suggested :)

I was going to use it for writing my own fast divide function. Not a particularly necessary thing to do given several very fast ones have already been suggested to me. Just an interesting thing to do :)

Anyway, i 'would have' CLZed the numerator and denominator, then left-shifted the denominator along by the difference. Thus the algorithm 'would have' only required one loop, with only 0-32 iterations plus a few more instructions. Without CLZ, another loop is required to left-shift the denominator.

#29043 - bakery2k - Wed Nov 10, 2004 6:42 pm

keldon wrote:
CLZ and CTZ are sexy opcodes that would divide the time of a lot of algorithms...


There is no CTZ instruction. However, there is a cool way of counting trailing zeros (in 5 instructions if I remember correctly), if CLZ is available.

#29045 - poslundc - Wed Nov 10, 2004 6:52 pm

... It isn't, though. :P

Dan.

#29078 - tepples - Thu Nov 11, 2004 1:00 am

I can think of a way, using conditional execution, to count leading zeroes in about a dozen ARM cycles without CLZ.
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.

#34558 - metis - Fri Jan 21, 2005 12:35 pm

From the book "Hacker's Delight":
Counting leading zeros
Counting trailing zeros

#34566 - Arjan - Fri Jan 21, 2005 6:29 pm

This is the version I made a couple of years ago. It's similar to nlz2 from Hacker's Delight.
Code:

      movs   r1,r0,lsr #16
                                                   
      moveq   r1,r0      @if r1 is zero, there are at least 16 leading zero's
      moveq   r0,#16          @and we'll have to search in lowest 16bits of input
      movne   r0,#0       @otherwise, no leading zero's found yet

      @divide reg in 2 parts of 8 bits
      movs   r2,r1,lsr #8

      addeq   r0,r0,#8   @if r2 is zero, we've found another 8 leading zero's
      moveq   r2,r1      @search in lowest 8 bits

      @devide reg in 2 parts of 4 bits
      movs   r1,r2,lsr #4

      addeq   r0,r0,#4   @if r1 is zero, we've found another 4 leading zero's
      moveq   r1,r2      @search in lowest 4 bits

      @devide reg in 2 parts of 2 bits
      movs   r2,r1,lsr #2

      addeq   r0,r0,#2   @if r2 is zero, we've found another 2 leading zero's
      moveq   r2,r1      @and we'll search in lowest 2 bits
      
      @devide reg in 2 parts of 1 bits
      subs   r2,r2,#1

      addeq   r0,r0,#1    @if r2 is zero, we've found another leading zero
      movmi   r0,#32          @if r2 was zero before substracting, there are 32 leading zero's
[/code]
_________________
dus.... http://www.bombaman.net

#34573 - Miked0801 - Fri Jan 21, 2005 8:31 pm

BSearch of leading 0's :) Same idea we use.

#34596 - FluBBa - Sat Jan 22, 2005 1:50 am

Arjan: can this be put at http://www.geocities.com/v_d_d/gba/?
Full credits is ofcourse given.
_________________
I probably suck, my not is a programmer.

#34627 - Arjan - Sat Jan 22, 2005 10:04 pm

FluBBa: I have no problems with that.
_________________
dus.... http://www.bombaman.net

#34811 - isildur - Wed Jan 26, 2005 5:28 pm

Thanks!

CLZ was added to the
GBA ARM Code Repository.

We look forward for more contributions!


Last edited by isildur on Wed Nov 16, 2005 5:03 pm; edited 1 time in total

#60984 - kusma - Wed Nov 16, 2005 2:47 pm

two iterations of binary search + a 256-entry lut ain't too bad either.
should be possible in around 10 cycles.

#61000 - Miked0801 - Wed Nov 16, 2005 5:47 pm

Behold as threads arise from the dead ;)

#65422 - keldon - Tue Jan 03, 2006 2:48 am

I think I have found a way to speed up CLZ and CTZ. If you do a simple search to find which 8bytes the MSB and LSB is within then you can use a 256 entry look table.

Code:
@ IN:
@   r0 = num
@ USE:
@   r1 = Trailing Zeros
@   r2 = num
CTZ:
   movs r2, r0, lsr #16

   moveq r2, r0
   moveq r1, #16
   movne r1, #0

   movs r0, r2, lsr #8

   addeq r1, r1, #8
   moveq r0, r2

You could start with that; and after use r0 as an index to an array where arr[n] = log(n). You then add the result in that array to r1 as the array only stores the amount of trailing for one byte and r1 stores how much zeroes trail before.

All that is missing from this code is using r0 to index the array; but I do not know any arm assembler.

#65423 - kusma - Tue Jan 03, 2006 3:07 am

keldon: that was exactly what i suggested three posts ago ;)

#65424 - keldon - Tue Jan 03, 2006 3:11 am

Oh yes. I did not notice that post. So how many cycles does it end up being after the memory accesses?

#65425 - kusma - Tue Jan 03, 2006 3:15 am

i've never tested [edit: i use a c-version myself], i don't consider this kind of code a candidate for assembly, as the compiler usually generate damn good results, and as i can choose to have it inline, the compiler can do optimizations with the code around. but i assume that 10 cycles is about right.

#71886 - nmain - Wed Feb 15, 2006 2:48 am

ecurtz wrote:
Not supported on the gba.

The ARM7 in the gba is actually a generation "3" chip. I think you need an ARM9, which actually is generation "5" to support CLZ.


I believe the ARM7TDMI is a v4T... a v3 doesn't have BX?