gbadev.org forum archive

Hi !

This is my triangle sort routine in ASM , i've maded all routine in asm like rotation and fill but i think the sort is not very fast, i'm very happy if someone can help me on make it faster...

Code:

@-----------------------------------------------------------------------
@
@ Triangle sort , this routine make the sort of triangle faces
@ to be filled
@
@extern void TriangleSort(PTriangolo,void*,u32 ntriangoli);
@
@PTriangolo is a structure
@-----------------------------------------------------------------------

.global TriangleSort
.arm
.align 2
.section .iwram @, "ax", %progbits

@ for (i=0;i<ntriangoli;i++)
@ {
@ z[i]=(((PPunto3D) &Punti_rotati[(&triang[i])->p1])->z+
@ ((PPunto3D) &Punti_rotati[(&triang[i])->p2])->z+
@ ((PPunto3D) &Punti_rotati[(&triang[i])->p3])->z);
@ }

TriangleSort:
stmfd sp!, {r4-r11, lr}

@Start of the routine who create the Z array, the same of the "C" version showed above

mov r5,#0
mov r10,#8
mov r11,#12
mov r14,r1
srtloop:
mov r12,r5,lsl #4
ldr r6,[r0,r12]
add r12,r12,#4
ldr r7,[r0,r12]
add r12,r12,#4
ldr r8,[r0,r12]
mla r9,r6,r11,r10
mla r6,r7,r11,r10
mla r7,r8,r11,r10
ldr r9,[r3,r9]
ldr r6,[r3,r6]
ldr r7,[r3,r7]
add r8,r9,r6
add r8,r8,r7
str r8,[r14],#4
add r5,r5,#1
cmp r5,r2
bmi srtloop
@End of routine who make Z Array

mov r4,#1 @i=1
sortloop:
mov r5,r2,lsl #2 @numtrinagoli in r5
sortloop2:
sub r5,r5,#4 @numtriangoli will be -1 this is j
cmp r5,r4 @compare j with i
bmi out @if is < i will go out else >= go on

ldr r6,[r1,r5]
sub r5,r5,#4 @j-1
ldr r7,[r1,r5]
cmp r6,r7 @if z[j]>z[j-1]
strgt r6,[r1,r5]
addgt r5,r5,#4
strgt r7,[r1,r5]
movgt r5,r5,lsl #2
ldrgt r6,[r0,r5]
addgt r5,r5,#4
ldrgt r7,[r0,r5]
addgt r5,r5,#4
ldrgt r8,[r0,r5]
addgt r5,r5,#4
ldrgt r9,[r0,r5]
subgt r5,r5,#12+16
ldrgt r10,[r0,r5]
addgt r5,r5,#4
ldrgt r11,[r0,r5]
addgt r5,r5,#4
ldrgt r12,[r0,r5]
addgt r5,r5,#4
ldrgt r14,[r0,r5]

strgt r9,[r0,r5]
subgt r5,r5,#4
strgt r8,[r0,r5]
subgt r5,r5,#4
strgt r7,[r0,r5]
subgt r5,r5,#4
strgt r6,[r0,r5]
addgt r5,r5,#16
strgt r10,[r0,r5]
addgt r5,r5,#4
strgt r11,[r0,r5]
addgt r5,r5,#4
strgt r12,[r0,r5]
addgt r5,r5,#4
strgt r14,[r0,r5]
subgt r5,r5,#12
movgt r5,r5,asr #2
b sortloop2
out:
add r4,r4,#1
cmp r4,r2
bmi sortloop

ldmfd sp!, {r4-r11, lr}
bx lr

THIS IS THE "C" Version of my sort, this is only the 2nd step the 1st one was showed above

for (i=1;i<ntriangoli;i++)
{
for (j=ntriangoli-1;j>=i;j--)
{
if (z[j]>z[j-1])
{
z1=z[j];
z[j]=z[j-1];
z[j-1]=z1;
t1=triang[j-1];
triang[j-1]=triang[j];
triang[j]=t1;
}
}
}

The ASM version is faster then the "C" version but i think not to much, i've tried to make some changes like this:

Code:

mov r4,#1 @i=1
sortloop:
mov r5,r2,lsl #2 @numtrinagoli in r5
sortloop2:
sub r5,r5,#4 @numtriangoli will be -1 tutto this is j
sortloop3:
cmp r5,r4 @compare j with i
bmi out @if is < i will go out else >= go on

ldr r6,[r1,r5]
sub r5,r5,#4 @j-1
ldr r7,[r1,r5]
cmp r6,r7 @if z[j]>z[j-1]
ble sortloop3

rest of code some of before

But i don't know is slowly then the other...

Any idea on how to improve ???

To sort,rotate,fill a 86 vertex and 168 triange i take 1 entire cycle...

I know is possible to make it better but really i don't know...

Thanks a lot to DekuTree64 to help me and thanks to anyone who make all the docs i found on the arm asm intructions...

Thanks to anyone in advance...

P.S. Sorry for nomuch comments and sorry for some comment in a very bad english...
_________________
Batblaster / 7 Raven Studios Co. Ltd
------------------------------------------

http://ciips.ee.uwa.edu.au/~morris/Year2/PLDS210/qsort.html
_________________
Team Pokeme
My blog and PM ASM tutorials

Sorry I never got a chance to reply to your mail about this, I've been a little busy lately.
Quicksort would be best, but here's a litle improvement to bubble sort that's easy to do. It should still work looping backward, but I'll stick with the one I know works

Code:

for(i = 0; i < count; i++)
{
int idx = i;
for(j = i+1; j < count; j++)
{
if(z[j] > z[idx])
idx=j;
}
if(idx != i)
//swap
}

_________________
___________
The best optimization is to do nothing at all.
Therefore a fully optimized program doesn't exist.
-Deku

You should consider Radix sort instead. Simple to code, and much more efficient for the purpose of triangle sorting.
_________________
GBA,GC,NGPC,GP32,FPGA,DS stuff at http://torlus.com/

Thank you to anyone... I will try to make it in asm and hope will be faster hehehe...

If someone doit before me please post here...

Thanks.....
_________________
Batblaster / 7 Raven Studios Co. Ltd
------------------------------------------

Radix sort requires the use of a supplementary sorting algorithm. It is probably best when used in conjunction with counting sort, although the effectiveness of the algorithm will be determined by your available memory.

If you intend to use radix sort, though, make sure your sort-digits are a power of two, and not a power of ten.

Dan (uses insertion sort for almost everything).

Shell sort is another nice alternative that uses less RAM/Stack the QUick and for smaller data sets, is comperable or faster. It's what I use for sorting our actors.

Shell sort: O(n^1.25). I have used Shell sort (decreasing gap insertion sort) for an NES game, and it worked fine. I was even able to implement it in 6502 assembly language in fewer than a half dozen tries.

Heapsort: O(n*log(n)). Approaches the average case efficiency of Quicksort for large data sets.

Both Shell sort and Heapsort can be implemented either iteratively or with only tail recursion. Quicksort requires non-tail recursion and thus requires extra stack space. Quicksort can also degrade to O(n^2) in the worst case if your pivot choices work against you (common on sorted data unless you use middle-of-domain pivot); that won't happen with Shell sort or Heapsort.

See Wikipedia's Quicksort article for details.
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.

If this is for the purpose of drawing inconvex objects, have you thought about turning the object into a BSP-tree instead? No sorting, perfect rendering with no glitches...

BSP trees are a good idea, but you will still have to use sorting for models and such... or just use a backbuffer (will be way slower though).
_________________
Team Pokeme
My blog and PM ASM tutorials

tepples wrote:

Both Shell sort and Heapsort can be implemented either iteratively or with only tail recursion. Quicksort requires non-tail recursion and thus requires extra stack space.

Are you even sure that GCC supports tail recursion? I'm not. Anyway, it is not always possible for it to implement tail recursion dependent of how you use the local stack frame, e.g. passing on pointers to local variables etc. And since C is (very) weakly typed GCC cannot be sure with any pointers; for instance, passing on an integer in a cal in tail position might just be a pointer into the local stack frame and thus this cannot be destroyed, but there is no way for GCC to tell that.

True, it can be argued that passing on pointers to local variables is a bad habbit, but how does GCC care? ;)

the passing of integers is simply pushing the value of that integer onto the stack .. tail recursion - or any recursion for that matter is fine

Pointers are another issue - it's fine so long as you remember that they're all pointing to the same piece of data

gbadev.org forum archive

ASM > Triangle Sort in ASM how to improve ???

#18815 - batblaster - Tue Apr 06, 2004 12:48 am

#18825 - Lupin - Tue Apr 06, 2004 12:23 pm

#18826 - DekuTree64 - Tue Apr 06, 2004 3:54 pm

#18863 - Torlus - Wed Apr 07, 2004 9:30 am

#18867 - batblaster - Wed Apr 07, 2004 12:45 pm

#18877 - poslundc - Wed Apr 07, 2004 6:42 pm

#18910 - Miked0801 - Thu Apr 08, 2004 2:01 am

#18973 - tepples - Thu Apr 08, 2004 10:32 pm

#20504 - sasq - Wed May 12, 2004 8:47 am

#20513 - Lupin - Wed May 12, 2004 12:08 pm

#20521 - Daniel Andersen - Wed May 12, 2004 2:07 pm

#20872 - keldon - Tue May 18, 2004 9:49 pm