#18815 - batblaster - Tue Apr 06, 2004 12:48 am
Hi !
This is my triangle sort routine in ASM. I've written all the routines in ASM, like rotation and fill, but I think the sort is not very fast. I'd be very happy if someone could help me make it faster...
Code: |
@-----------------------------------------------------------------------
@
@ Triangle sort: this routine sorts the triangle faces
@ to be filled
@
@extern void TriangleSort(PTriangolo,void*,u32 ntriangoli);
@
@PTriangolo is a structure
@-----------------------------------------------------------------------
    .global TriangleSort
    .arm
    .align 2
    .section .iwram @, "ax", %progbits

@ for (i=0;i<ntriangoli;i++)
@ {
@   z[i]=(((PPunto3D) &Punti_rotati[(&triang[i])->p1])->z+
@         ((PPunto3D) &Punti_rotati[(&triang[i])->p2])->z+
@         ((PPunto3D) &Punti_rotati[(&triang[i])->p3])->z);
@ }

TriangleSort:
    stmfd sp!, {r4-r11, lr}

@Start of the routine that creates the Z array, same as the "C" version shown above
    mov r5,#0
    mov r10,#8
    mov r11,#12
    mov r14,r1
srtloop:
    mov r12,r5,lsl #4
    ldr r6,[r0,r12]
    add r12,r12,#4
    ldr r7,[r0,r12]
    add r12,r12,#4
    ldr r8,[r0,r12]
    mla r9,r6,r11,r10
    mla r6,r7,r11,r10
    mla r7,r8,r11,r10
    ldr r9,[r3,r9]
    ldr r6,[r3,r6]
    ldr r7,[r3,r7]
    add r8,r9,r6
    add r8,r8,r7
    str r8,[r14],#4
    add r5,r5,#1
    cmp r5,r2
    bmi srtloop
@End of the routine that makes the Z array

    mov r4,#1               @i=1
sortloop:
    mov r5,r2,lsl #2        @numtriangoli in r5
sortloop2:
    sub r5,r5,#4            @numtriangoli-1, this is j
    cmp r5,r4               @compare j with i
    bmi out                 @if j < i go out, else (>=) go on
    ldr r6,[r1,r5]
    sub r5,r5,#4            @j-1
    ldr r7,[r1,r5]
    cmp r6,r7               @if z[j]>z[j-1]
    strgt r6,[r1,r5]
    addgt r5,r5,#4
    strgt r7,[r1,r5]
    movgt r5,r5,lsl #2
    ldrgt r6,[r0,r5]
    addgt r5,r5,#4
    ldrgt r7,[r0,r5]
    addgt r5,r5,#4
    ldrgt r8,[r0,r5]
    addgt r5,r5,#4
    ldrgt r9,[r0,r5]
    subgt r5,r5,#12+16
    ldrgt r10,[r0,r5]
    addgt r5,r5,#4
    ldrgt r11,[r0,r5]
    addgt r5,r5,#4
    ldrgt r12,[r0,r5]
    addgt r5,r5,#4
    ldrgt r14,[r0,r5]
    strgt r9,[r0,r5]
    subgt r5,r5,#4
    strgt r8,[r0,r5]
    subgt r5,r5,#4
    strgt r7,[r0,r5]
    subgt r5,r5,#4
    strgt r6,[r0,r5]
    addgt r5,r5,#16
    strgt r10,[r0,r5]
    addgt r5,r5,#4
    strgt r11,[r0,r5]
    addgt r5,r5,#4
    strgt r12,[r0,r5]
    addgt r5,r5,#4
    strgt r14,[r0,r5]
    subgt r5,r5,#12
    movgt r5,r5,asr #2
    b sortloop2
out:
    add r4,r4,#1
    cmp r4,r2
    bmi sortloop
    ldmfd sp!, {r4-r11, lr}
    bx lr

/* This is the "C" version of my sort. This is only the 2nd step; the 1st one was shown above. */
for (i=1;i<ntriangoli;i++)
{
    for (j=ntriangoli-1;j>=i;j--)
    {
        if (z[j]>z[j-1])
        {
            z1=z[j];
            z[j]=z[j-1];
            z[j-1]=z1;
            t1=triang[j-1];
            triang[j-1]=triang[j];
            triang[j]=t1;
        }
    }
}
|
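For anyone who wants to try the "C" version on its own, here is a minimal sketch of the two steps together. The struct layouts are assumptions inferred from the offsets used in the ASM (12 bytes per point with z at offset 8, 16 bytes per triangle). The field name extra and the function names build_z and sort_triangles are made up for this sketch and are not from the original code.

```c
/* Assumed layouts, inferred from the ASM offsets (index*12+8 and index<<4). */
typedef struct { int x, y, z; } Punto3D;
typedef struct { int p1, p2, p3, extra; } Triangolo; /* 'extra' is a placeholder for the 4th word */

/* Step 1: z[i] is the sum of the z of the three rotated vertices of triangle i. */
static void build_z(const Triangolo *triang, const Punto3D *punti, int *z, int n)
{
    for (int i = 0; i < n; i++)
        z[i] = punti[triang[i].p1].z
             + punti[triang[i].p2].z
             + punti[triang[i].p3].z;
}

/* Step 2: bubble sort, largest z first, moving each triangle together with its key. */
static void sort_triangles(Triangolo *triang, int *z, int n)
{
    for (int i = 1; i < n; i++) {
        for (int j = n - 1; j >= i; j--) {
            if (z[j] > z[j - 1]) {
                int z1 = z[j];
                z[j] = z[j - 1];
                z[j - 1] = z1;

                Triangolo t1 = triang[j - 1];
                triang[j - 1] = triang[j];
                triang[j] = t1;
            }
        }
    }
}
```

Call build_z first and then sort_triangles on the same z array; afterwards triang[0] is the triangle with the largest summed z.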
The ASM version is faster than the "C" version, but I think not by much. I've tried making some changes like this:
Code: |
    mov r4,#1               @i=1
sortloop:
    mov r5,r2,lsl #2        @numtriangoli in r5
sortloop2:
    sub r5,r5,#4            @numtriangoli-1, this is j
sortloop3:
    cmp r5,r4               @compare j with i
    bmi out                 @if j < i go out, else (>=) go on
    ldr r6,[r1,r5]
    sub r5,r5,#4            @j-1
    ldr r7,[r1,r5]
    cmp r6,r7               @if z[j]>z[j-1]
    ble sortloop3
    @ ... the rest of the code is the same as before
|
But I don't know if it is slower than the other one...
Any ideas on how to improve it?
To sort, rotate and fill 86 vertices and 168 triangles, it takes me 1 entire cycle...
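For scale, here is a small sketch that counts the compares done by the two nested loops of the "C" version (the helper name count_compares is made up for this post). The count does not depend on the data: it is always n*(n-1)/2, so with 168 triangles that is 168*167/2 = 14028 compares per sort, plus a 16-byte triangle swap every time the test is true.

```c
/* Counts how many z[j] > z[j-1] tests the two nested sort loops perform. */
static int count_compares(int n)
{
    int count = 0;
    for (int i = 1; i < n; i++)
        for (int j = n - 1; j >= i; j--)
            count++;
    return count;
}
```

With n = 168 this returns 14028.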
I know it's possible to make it better, but I really don't know how...
Thanks a lot to DekuTree64 for helping me, and thanks to everyone who wrote the docs I found on the ARM ASM instructions...
Thanks to everyone in advance...
P.S. Sorry for the lack of comments, and sorry that some comments are in very bad English...
_________________
Batblaster / 7 Raven Studios Co. Ltd
------------------------------------------