gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback Machine copies. A new forum can be found here.

Beginners > No difference in speed when code put in iwram

#19529 - alek - Wed Apr 21, 2004 3:05 pm

I have the sqrt routine written by KevinW, and I've changed it so I can assemble it with GAS. My question: when I write .section .iwram, is the code actually put in IWRAM? And if it is, why doesn't it run faster on hardware?

Here is how I have written it:

Code:

.code 32
.section .iwram
.align
.Global isqr      
@ Entry
isqr:
the function...

If I remove .code 32 and .section .iwram, the program doesn't execute any slower on hardware. This function is called a lot, so why is that?

Second question:
Code:

@ Rounding
   IF :DEF:USE_ROUNDING
   CMP     r0,r2
   ADC     r2,r2,#1
   ENDIF

This code is in the same sqrt function. If I don't remove it, I get the following errors when I build my project:

isqr.s:101: Error: bad instruction `use_rounding'
isqr.s:104: Error: bad instruction `endif'

What should I do?

Thankful for any response,

#19531 - poslundc - Wed Apr 21, 2004 4:08 pm

alek wrote:
If I remove .code 32 and .section .iwram, the program doesn't execute any slower on hardware. This function is called a lot, so why is that?


Just because a function is called a lot doesn't mean you're necessarily going to tap out the GBA's resources during the VDraw/VBlank period. I could divide using GCC's built-in divide routine if I wanted to instead of a faster, custom method and it wouldn't make any visible difference so long as my program can still keep up with 60 FPS. Do you know for a fact that the routine isn't executing any faster on hardware?

Quote:
This code is in the same sqrt function. If I don't remove it, I get the following errors when I build my project:

isqr.s:101: Error: bad instruction `use_rounding'
isqr.s:104: Error: bad instruction `endif'

What should I do?


Either convert the Goldroad directives to their GAS equivalents, or take the whole block out and don't round your results, or keep the two instructions and delete only the IF and ENDIF lines (the first and fourth lines after the comment) so that your results are always rounded.

Dan.

#19534 - tepples - Wed Apr 21, 2004 5:12 pm

poslundc wrote:
I could divide using GCC's built-in divide routine if I wanted to instead of a faster, custom method and it wouldn't make any visible difference so long as my program can still keep up with 60 FPS.

More efficient code is nicer on the battery. Code run from RAM is also nicer on the battery.

Quote:
alek wrote:
isqr.s:101: Error: bad instruction `use_rounding'
isqr.s:104: Error: bad instruction `endif'

What should I do?

Either convert the Goldroad directives to their GAS equivalents

Specifically, you have to rename isqr.s to isqr.S (the capital triggers use of the C preprocessor, which handles #ifdef) and change the conditional lines:
Code:
@ Rounding
#ifdef USE_ROUNDING
   CMP     r0,r2
   ADC     r2,r2,#1
#endif
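
For readers who have not used the preprocessor this way, the same #ifdef mechanism can be sketched in C. This is only an illustration under stated assumptions: isqrt_sketch is a hypothetical stand-in, not KevinW's isqr, and the rounding rule shown (round up when the remainder exceeds the root) is the usual round-to-nearest test rather than a transcription of the CMP/ADC pair above. USE_ROUNDING would normally come from the command line (-DUSE_ROUNDING); it is defined in the file here only to keep the sketch self-contained.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: normally passed as -DUSE_ROUNDING on the compiler command
   line. Remove this line to get the truncating (floor) result instead. */
#define USE_ROUNDING

/* Hypothetical stand-in for illustration only, NOT KevinW's isqr.
   Bit-by-bit integer square root of a 32-bit unsigned value. */
uint32_t isqrt_sketch(uint32_t n)
{
    uint32_t root = 0;
    uint32_t rem = n;
    uint32_t bit = 1u << 30;      /* highest power of four in 32 bits */

    /* Start at the highest power of four that is not above n. */
    while (bit > rem)
        bit >>= 2;

    while (bit != 0) {
        if (rem >= root + bit) {
            rem -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;
    }

    /* Here root == floor(sqrt(n)) and rem == n - root*root. */
    assert(rem <= 2 * root);

#ifdef USE_ROUNDING
    /* n is closer to (root+1)^2 than to root^2 when rem > root. */
    if (rem > root)
        root += 1;
#endif
    return root;
}
```

With USE_ROUNDING defined, isqrt_sketch(15) gives 4; without it the rounding step is never compiled in and the result is 3. The preprocessor removes the block before the compiler (or, for a .S file, the assembler) ever sees it.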

_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.

#19553 - alek - Wed Apr 21, 2004 9:09 pm

poslundc wrote:
Just because a function is called a lot doesn't mean you're necessarily going to tap out the GBA's resources during the VDraw/VBlank period. I could divide using GCC's built-in divide routine if I wanted to instead of a faster, custom method and it wouldn't make any visible difference so long as my program can still keep up with 60 FPS. Do you know for a fact that the routine isn't executing any faster on hardware?


Yeah, you are right, the program was running at 60 fps... I'm doing a three-body problem solver, and it was the step size in my Runge-Kutta 4 integrator that fooled me...

tepples wrote:

Specifically, you have to rename isqr.s to isqr.S (the capital triggers use of the C preprocessor, which handles #ifdef) and change the conditional lines:
Code:
 
@ Rounding
#ifdef USE_ROUNDING
   CMP     r0,r2
   ADC     r2,r2,#1
#endif


The project compiles now.

Thanks for the help, guys. I really appreciate it.

#19580 - alek - Thu Apr 22, 2004 10:29 am

alek wrote:
poslundc wrote:
Just because a function is called a lot doesn't mean you're necessarily going to tap out the GBA's resources during the VDraw/VBlank period. I could divide using GCC's built-in divide routine if I wanted to instead of a faster, custom method and it wouldn't make any visible difference so long as my program can still keep up with 60 FPS. Do you know for a fact that the routine isn't executing any faster on hardware?


Yeah, you are right, the program was running at 60 fps... I'm doing a three-body problem solver, and it was the step size in my Runge-Kutta 4 integrator that fooled me...



I tried it on hardware without the wait-for-VBlank function, and it runs at about 63 fps regardless of whether I remove .section .iwram and .code 32 or leave them in... The sqrt function gets called 12 times per loop. Shouldn't that make a difference?

isqr readme wrote:

It goes to show you how much you want to put key algorithms, and important loop code as ARM in IWRAM as much as possible!
If you try to run isqr() from ARM ROM it will be very slow for obvious technical reasons (yet still faster than the BIOS function).

#19587 - poslundc - Thu Apr 22, 2004 2:26 pm

tepples wrote:
poslundc wrote:
I could divide using GCC's built-in divide routine if I wanted to instead of a faster, custom method and it wouldn't make any visible difference so long as my program can still keep up with 60 FPS.

More efficient code is nicer on the battery. Code run from RAM is also nicer on the battery.


Oh, tepples, only you would visibly notice the battery draining slightly faster. :)

Quote:
Specifically, you have to rename isqr.s to isqr.S (the capital triggers use of the C preprocessor, which handles #ifdef) and change the conditional lines:
Code:
@ Rounding
#ifdef USE_ROUNDING
   CMP     r0,r2
   ADC     r2,r2,#1
#endif


If you prefer to use the GAS directives instead of the C pre-processor, it's just .ifdef and .endif instead.

Quote:
I tried it on hardware without the wait-for-VBlank function, and it runs at about 63 fps regardless of whether I remove .section .iwram and .code 32 or leave them in... The sqrt function gets called 12 times per loop. Shouldn't that make a difference?


Let's say your optimized square-root function takes 100 cycles, or 1,200 cycles per frame. Then for the sake of argument say that your unoptimized square-root function takes 300 cycles, or 3,600 cycles per frame.

The GBA has 280,896 cycles per frame. So unless you were within 2,400 cycles of this limit (the difference between 3,600 and 1,200, i.e. you were already consuming 99.1% of the CPU's resources before optimizing the square-root routine) you will not notice any visible speed change.

(Note that I've simplified these numbers for the sake of argument, and there are obviously VDraw/VBlank issues to consider as well.)

The point is that it may be making a difference by reducing the workload on the CPU, but it won't make your game's loop run any faster so long as you still cycle on the pattern of VDraw/VBlank every 1/60th of a second.
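
The arithmetic above can be checked with a few lines of C. This is a rough sketch, not a measurement: the 100 and 300 cycle costs are the made-up figures from this post, the 63 fps figure is alek's free-running observation, the function names are hypothetical, and the only added fact is the GBA CPU clock of 16,777,216 Hz (2^24).

```c
#include <assert.h>
#include <stdint.h>

enum {
    CYCLES_PER_FRAME = 280896,    /* figure quoted in the post */
    CPU_HZ           = 16777216   /* GBA CPU clock, 2^24 Hz */
};

/* Cycles per frame spent in a routine called calls_per_frame times. */
uint32_t cycles_spent(uint32_t cycles_per_call, uint32_t calls_per_frame)
{
    return cycles_per_call * calls_per_frame;
}

/* Share of one frame's cycle budget, in hundredths of a percent. */
uint32_t share_basis_points(uint32_t cycles)
{
    return (uint32_t)(((uint64_t)cycles * 10000u) / CYCLES_PER_FRAME);
}

/* For a loop that is NOT synced to VBlank and currently runs at
   fps_x100 / 100 iterations per second, estimate the new rate (again in
   hundredths of fps) after saving `saved` cycles per iteration. */
uint32_t fps_x100_after_saving(uint32_t fps_x100, uint32_t saved)
{
    uint64_t loop_cycles = ((uint64_t)CPU_HZ * 100u) / fps_x100;
    assert(loop_cycles > saved);  /* cannot save more than the loop costs */
    return (uint32_t)(((uint64_t)CPU_HZ * 100u) / (loop_cycles - saved));
}
```

With the hypothetical numbers, cycles_spent(300, 12) - cycles_spent(100, 12) is 2,400 cycles, and share_basis_points(2400) is 85, i.e. about 0.85% of a frame. For a free-running loop at 63 fps, fps_x100_after_saving(6300, 2400) gives 6357, so the same saving would only move the rate from 63 to roughly 63.6 fps, which is easy to miss when reading an fps counter.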

Dan.

#19594 - tepples - Thu Apr 22, 2004 4:30 pm

poslundc wrote:
Oh, tepples, only you would visibly notice the battery draining slightly faster. :)

It's a LOT easier to notice battery drain on a GBA than on a GBC, especially if your GBA's batteries are at that point where the power light flashes green and red. But if you're really doing only about 12 square roots per frame, and you don't plan to add additional game objects that would need their own square roots, then you probably don't need to pay that much attention to optimizing out every last cycle.

#19611 - alek - Thu Apr 22, 2004 9:41 pm

Ok, I think I understand now.

Thanks for the help... This forum is really a life saver =)