gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback Machine copies.

Coding > Efficiency

#15116 - punchy - Sat Jan 17, 2004 3:59 pm

Hi,
I'm trying to squeeze some more speed out of the GBA and could do with a few pointers. I've already got all the most important code running from IWRAM, so I'm looking at using a few lookup tables to replace some calculations I'm doing. I'll be putting these in either EWRAM or ROM, so I want to know which will be faster.

How do I know if accessing a value from a LUT in memory will be faster than using a few shifts and adds in code? I'm a bit flaky on the number of cycles that instructions and memory accesses take, and also on how to change the waitstates of memory on the GBA.

Any help would be most appreciated.

Thanks.

#15123 - jma - Sat Jan 17, 2004 5:05 pm

If the calculation really only takes a couple of shifts and an add, then doing the math will be much faster than loading a lookup table value (provided there are no branches in the calculation).

This is because pipelining, etc. will allow the math to occur at (about) 1 cycle per instruction. Doing the math stays the better choice as long as your expression is about 5 instructions or fewer.

On the other hand, using the lookup table you do have to deal with waitstates and the added timing of just loading the value.

Jeff
_________________
massung@gmail.com
http://www.retrobyte.org
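
To make the trade-off in the post above concrete, here is a minimal sketch in C. The function and table names (times10_shift, times10_lut, times10_table) are made up for illustration and do not come from the thread. The first version is two shifts and an add that stay in registers; the second needs a memory load, which is where waitstates come in when the table lives in EWRAM or ROM. On a real GBA build the table would more likely be a const array, and the actual cycle counts depend on where code and data are placed.

```c
#include <stdint.h>

/* Hypothetical example: multiply by 10 using two shifts and one add.
   No memory access is needed once x is in a register. */
static inline uint32_t times10_shift(uint32_t x)
{
    return (x << 3) + (x << 1);   /* 8x + 2x = 10x */
}

/* The same result from a 256-entry lookup table.
   Every call costs one load from wherever the table is stored. */
static uint16_t times10_table[256];

/* Fill the table once at startup (a const table in ROM would avoid this). */
static void times10_table_init(void)
{
    for (uint32_t i = 0; i < 256u; i++)
        times10_table[i] = (uint16_t)(i * 10u);
}

static inline uint32_t times10_lut(uint8_t x)
{
    return times10_table[x];
}
```

For an expression this small the table buys nothing, which is the point being made; a table starts to pay off only when the calculation it replaces is much longer than the load.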

#15126 - poslundc - Sat Jan 17, 2004 5:28 pm

LUTs should be used for complicated mathematical operations: division, trigonometric functions, square root, etc. None of these are accomplishable with a few shifts and adds!

Have you done any profiling on your code to find out where the bottlenecks are? There have been several recent threads on this subject that you can find easily by searching the forum. Most game applications spend 95% of the time in 5% of the code; if you can figure out what that 5% is then you know where to optimize.

Also, make sure that anything you put into IWRAM is being compiled as ARM code (or better still, written in ARM assembly). This is the most efficient way to use code that's in IWRAM.

Dan.

#15129 - ampz - Sat Jan 17, 2004 5:50 pm

...And make sure the code you put in ROM or EWRAM is THUMB.

#15130 - punchy - Sat Jan 17, 2004 5:58 pm

Thanks for the info.

I have THUMB running from ROM/EWRAM and ARM from IWRAM. How do I change the waitstate of the memory accesses from different areas?

#15132 - torne - Sat Jan 17, 2004 6:10 pm

You can *change* the wait states by writing to the system control registers; however, not all speeds work on all carts. If you just want to know how long a read will take from a specific area of memory, there is a table at the beginning of the GBATEK document.
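
As a rough sketch of what writing that register can look like in C: GBATEK documents a 16-bit waitstate control register, WAITCNT, at address 0x04000204. The helper names (make_waitcnt, set_waitcnt) and the TARGET_GBA macro below are assumptions made for this example, not part of any library. Only the SRAM, WS0 and prefetch fields are filled in; the WS1 and WS2 fields are left at zero. As noted above, faster settings are not guaranteed to work on every cart.

```c
#include <stdint.h>

#define REG_WAITCNT_ADDR 0x04000204u   /* WAITCNT address per GBATEK */

/* Build a WAITCNT value from field codes (codes, not cycle counts):
   sram       bits 0-1, codes 0..3 select 4, 3, 2, 8 cycles
   ws0_first  bits 2-3, codes 0..3 select 4, 3, 2, 8 cycles
   ws0_second bit  4,   codes 0..1 select 2, 1 cycles
   prefetch   bit  14,  1 enables the game pak prefetch buffer */
static uint16_t make_waitcnt(unsigned sram, unsigned ws0_first,
                             unsigned ws0_second, unsigned prefetch)
{
    return (uint16_t)((sram & 3u)
                    | ((ws0_first & 3u) << 2)
                    | ((ws0_second & 1u) << 4)
                    | ((prefetch & 1u) << 14));
}

/* Write the value to the hardware register.
   TARGET_GBA is a hypothetical macro: define it only when building for
   the console, since this address means nothing on a desktop machine. */
static void set_waitcnt(uint16_t value)
{
#ifdef TARGET_GBA
    *(volatile uint16_t *)REG_WAITCNT_ADDR = value;
#else
    (void)value;   /* no-op off target */
#endif
}
```

A call such as set_waitcnt(make_waitcnt(3, 1, 1, 1)) would request 3-cycle first access and 1-cycle sequential access for WS0 with prefetch enabled.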

#15296 - iq - Tue Jan 20, 2004 4:34 pm

poslundc wrote:
LUTs should be used for complicated mathematical operations: division, trigonometric functions, square root, etc. None of these are accomplishable with a few shifts and adds!


Umm... trig and divisions can be done with shifts and adds... look up CORDIC... and how do you think hardware does a division anyway?
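
For illustration, division really can be built from shifts, compares and subtracts. The sketch below is a plain restoring (shift-and-subtract) divider in C, one quotient bit per iteration; the function name div_shift_sub is made up for this example, and it is not meant to describe how any particular hardware divider is built. Note that it takes 32 loop iterations, which is why it is not "a few" shifts and adds.

```c
#include <stdint.h>

/* Restoring division: returns n / d and stores n % d in *rem.
   The divisor d must be nonzero. */
static uint32_t div_shift_sub(uint32_t n, uint32_t d, uint32_t *rem)
{
    uint32_t q = 0;
    uint64_t r = 0;   /* 64-bit so the shift below cannot overflow */

    for (int i = 31; i >= 0; i--) {
        /* Bring down the next bit of the dividend. */
        r = (r << 1) | ((n >> i) & 1u);
        /* If the divisor fits, subtract it and set this quotient bit. */
        if (r >= d) {
            r -= d;
            q |= (uint32_t)1u << i;
        }
    }
    *rem = (uint32_t)r;
    return q;
}
```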

#15298 - tepples - Tue Jan 20, 2004 4:42 pm

I think poslundc was trying to imply that trig functions implemented with linear interpolation on LUTs will run faster than trig functions implemented with CORDIC.
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.
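
A minimal sketch of the LUT-plus-linear-interpolation idea mentioned above, in C. Everything here is an assumption chosen for the example: the name sin_q14_quadrant, the Q14 fixed-point output (16384 = 1.0), the angle format (4096 units per quarter turn), and the 17-entry table, whose values are sin(k*pi/32) rounded to Q14. It covers only the first quadrant to stay short; a full version would mirror the table for the other three.

```c
#include <stdint.h>

/* sin(k * pi/32) for k = 0..16, scaled by 16384 (approximate, rounded). */
static const int16_t sin_tab[17] = {
        0,  1606,  3196,  4756,  6270,  7723,  9102, 10394, 11585,
    12665, 13623, 14449, 15137, 15679, 16069, 16305, 16384
};

/* First-quadrant sine. angle runs from 0 to 4096 (0 to 90 degrees).
   The top bits pick a table entry, the low 8 bits blend toward the next. */
static int32_t sin_q14_quadrant(uint32_t angle)
{
    if (angle >= 4096u)
        return sin_tab[16];              /* clamp at 90 degrees */

    uint32_t i    = angle >> 8;          /* table index, 0..15 */
    int32_t  frac = (int32_t)(angle & 0xFFu);   /* 0..255 between entries */
    int32_t  a    = sin_tab[i];
    int32_t  b    = sin_tab[i + 1];

    /* One load pair, one subtract, one multiply, one shift, one add. */
    return a + (((b - a) * frac) >> 8);
}
```

The per-call cost is fixed and small, whereas a CORDIC routine iterates once per bit of precision, which is the speed argument being made in the post above.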

#15301 - poslundc - Tue Jan 20, 2004 5:33 pm

tepples wrote:
I think poslundc was trying to imply that trig functions implemented with linear interpolation on LUTs will run faster than trig functions implemented with CORDIC.


Yes, he was. :)

Dan.

#15318 - punchy - Tue Jan 20, 2004 8:32 pm

Thanks to all.