#71648 - pure_ev1l - Mon Feb 13, 2006 9:37 pm
OK, basically this is kind of a question about how the compiler compiles things, so I can optimize for it. I really need very optimized code, as I'm making a sort of particle simulator with thousands of particles. I'm using the newest devkitadv, in case it makes any odds.
Basically I have a 2-dimensional array, but since all the reads will be relative (e.g. one block up, down, left or right), should I make it a 1D array and just use + and -? (So instead of map[x+1][y], do map[location+1].)
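To show what I mean, the flattened layout would be roughly like this (the dimensions and the u8 typedef are just for illustration, and I've left out any edge checks):
Code: |
typedef unsigned char u8;        // as in the usual GBA type headers

#define MAP_W 64                 // made-up dimensions, just for illustration
#define MAP_H 64

u8 map[MAP_W * MAP_H];           // 1D storage, indexed as y * MAP_W + x

u8 sumNeighbours(int location)   // location = y * MAP_W + x, not on an edge
{
    return map[location - 1]         // left
         + map[location + 1]         // right
         + map[location - MAP_W]     // up
         + map[location + MAP_W];    // down
} |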
Also, I have an array of structs with two members (is that the right word?), so does it do a multiply to access it? Should I split it into two arrays (e.g. pixel[3].r and pixel[3].g etc. into pixelr[3], pixelg[3]) to stop it multiplying to find entries?
Also, if it's an unsigned 16-bit integer, for example, does it use a multiply to access it? (E.g. for test[13] = 4, does it do 13*2 to get the location to modify?) So should I only use u8?
Any other tips for getting rid of multiplies?
Thanks a lot in advance, this forum is so damn helpful!
_________________
-Rory
"Planning makes for a boring life, but a good game"
#71651 - sajiimori - Mon Feb 13, 2006 9:50 pm
There's only one way to know the answers to these questions: Look at the compiler's output using the -S command line option. It's too hard to predict what the optimizer will do.
In general, multiplies and divides are converted to shifts if they are known at compile time to be by a power of two.
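If you want to see it for yourself, a quick test is to compile a few one-liners with -O2 -S and read the assembly (the function names here are just made up):
Code: |
int mul16(int n) { return n * 16; }   // power of two: expect a left shift, not a mul
int div16(int n) { return n / 16; }   // power of two: expect shifts (plus a small fixup for signed values)
int mul13(int n) { return n * 13; }   // not a power of two: expect a real multiply or a shift/add sequence |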
#71653 - tepples - Mon Feb 13, 2006 9:53 pm
Multiplies are fast on ARM7. Divides aren't, unless you use some hack such as multiplying by a reciprocal from a lookup table.
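A minimal sketch of the lookup-table version, assuming 16-bit fixed-point reciprocals and a small range of divisors; the names are just illustrative, and you should check its accuracy against real division over the values you actually use:
Code: |
#define RECIP_ENTRIES 256                 // covers divisors 1..255

static unsigned int recip[RECIP_ENTRIES];

void initRecip(void)
{
    int d;
    recip[0] = 0;                         // divide by zero is your problem, not the table's
    for (d = 1; d < RECIP_ENTRIES; d++)
        recip[d] = (65536 + d - 1) / d;   // ceil(2^16 / d)
}

// approximates x / d; exact while x * d stays comfortably below 2^16,
// so verify it over your own range before relying on it
unsigned int fastDiv(unsigned int x, unsigned int d)
{
    return (x * recip[d]) >> 16;
} |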
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.
#71654 - DekuTree64 - Mon Feb 13, 2006 9:55 pm
Yeah, the compiler will do a multiply for 2-dimensional arrays. One way you can optimize there, if you're looping over one row, is to use a pointer to the start of that row and then increment the pointer each time instead of doing array offsets:
Code: |
int array[MAX_Y][MAX_X];
int x, y;

for (y = 0; y < MAX_Y; y++)
{
    int *arrayPtr = array[y];    // one row-offset calculation per row
    for (x = MAX_X - 1; x >= 0; x--)
    {
        *arrayPtr = 1;           // no index math per element...
        arrayPtr++;              // ...just a store and a pointer increment
    }
} |
Note the backward loop for x: it's never actually used as an offset, so it doesn't matter which way it goes, and counting down to 0 is a tiny bit faster than incrementing and then comparing with an end value (pretty insignificant when you're comparing with a constant like this, though).
And yes to your other question too, the compiler will multiply the array index by the size of the data type. This is just a shift for built-in types, so it's free in ARM, or one extra cycle in THUMB. Structures will usually have to do a real multiply though.
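For your pixel example, the two usual ways around the struct multiply look something like this (the field names and array size are just guesses from your post, and whether GCC actually turns the padded power-of-two case into a shift is worth checking with -S):
Code: |
typedef unsigned char u8;

// option 1: split into separate arrays -- each access is a plain byte index
u8 pixelR[1024];
u8 pixelG[1024];
u8 pixelB[1024];

// option 2: keep the struct, but pad it to a power-of-two size so the
// offset calculation can be a shift instead of a multiply
typedef struct
{
    u8 r, g, b;
    u8 pad;          // 3 bytes -> 4 bytes
} Pixel;

Pixel pixel[1024]; |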
_________________
___________
The best optimization is to do nothing at all.
Therefore a fully optimized program doesn't exist.
-Deku
#71661 - sajiimori - Mon Feb 13, 2006 10:43 pm
Hey Deku, will GCC not optimize a 2D array lookup if it knows the width of it (in bytes) is a power of 2? I would expect the same to go for structs that are powers of 2 in size.
#71685 - DekuTree64 - Tue Feb 14, 2006 12:26 am
sajiimori wrote: |
Hey Deku, will GCC not optimize a 2D array lookup if it knows the width of it (in bytes) is a power of 2? |
I'd assume it would, but I've never actually checked it.
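If anyone feels like checking, something like this compiled with -O2 -S should settle it (the sizes are arbitrary); compare the code generated for the two lookups:
Code: |
int mapPow2[64][64];    // row size 64 * 4 = 256 bytes, a power of two
int mapOdd[64][100];    // row size 400 bytes, not a power of two

int readPow2(int x, int y) { return mapPow2[y][x]; }
int readOdd(int x, int y)  { return mapOdd[y][x]; } |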
_________________
___________
The best optimization is to do nothing at all.
Therefore a fully optimized program doesn't exist.
-Deku
#71708 - Joat - Tue Feb 14, 2006 1:59 am
1) After algorithmic optimizations, the biggest win by far on the GBA is the move to ARM code in IWRAM (check the FAQ, I think there is a topic on how to do this; if not, Google for it, and failing that ask and I'll explain). See the sketch after this list.
2) In a similar vein to (1), memory speed is a big issue. If your particle tables also fit into IWRAM, do it, but if they have to be in EWRAM, make sure the elements are halfwords (16 bits wide); don't use words if you can help it (16-bit external bus, so word reads/writes cost basically twice as much). Bytes versus halfwords: ldrb/strb share the nice addressing modes of ldr/str, while ldrh/strh have restricted addressing. However, it's *never* a win to read two bytes and reassemble them; read a halfword if your data needs a bigger range.
3) devkitadv isn't much (probably not any) slower than devkitARM, but it is well worth the hour or two it will take to switch over to devkitARM. It's using a much newer version of GCC, is still being maintained, etc...
4) Profile! Set up a pair of timers in cascade mode to make cycle-accurate measurements of how fast your code is. Don't second-guess what a code change does to the timing, measure it. Once you have a baseline for your current code, you can try different things and see what is a winner. As sajiimori suggested, it's also a good idea to look at the compiler output to see what is going on. See the sketch after this list.
5) Keep a working, unoptimized (or only sanely optimized) copy of your code before you start trying weird stuff. It will help you find bugs introduced during optimization, provides a path back to working code if you botch it, etc...
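To make 1) and 4) concrete, here's roughly what they look like in code. The register addresses are from the standard GBA documentation, but the macro and function names are just illustrative (libgba and the FAQ have their own equivalents), and update_particles stands in for whatever your hot loop is:
Code: |
typedef volatile unsigned short vu16;

// --- 1) putting the hot function in IWRAM as ARM code ------------------
// name the file *.iwram.c with devkitARM's standard makefiles, or tag the
// function with an attribute like this and build that file with -marm
#define IWRAM_CODE __attribute__((section(".iwram"), long_call))

// --- 4) cycle counting with timers 2 and 3 in cascade ------------------
#define REG_TM2CNT_L (*(vu16 *)0x04000108)   // timer 2 count / reload
#define REG_TM2CNT_H (*(vu16 *)0x0400010A)   // timer 2 control
#define REG_TM3CNT_L (*(vu16 *)0x0400010C)   // timer 3 count / reload
#define REG_TM3CNT_H (*(vu16 *)0x0400010E)   // timer 3 control
#define TIMER_ENABLE  0x0080
#define TIMER_CASCADE 0x0004                 // tick when the timer below overflows

static void profileStart(void)
{
    REG_TM2CNT_H = 0;                        // stop both timers
    REG_TM3CNT_H = 0;
    REG_TM2CNT_L = 0;                        // reload values of 0
    REG_TM3CNT_L = 0;
    REG_TM3CNT_H = TIMER_ENABLE | TIMER_CASCADE;   // timer 3 counts timer 2 overflows
    REG_TM2CNT_H = TIMER_ENABLE;             // timer 2 ticks once per CPU cycle
}

static unsigned int profileStop(void)
{
    REG_TM2CNT_H = 0;                        // freeze the count
    return REG_TM2CNT_L | ((unsigned int)REG_TM3CNT_L << 16);   // cycles elapsed
}

IWRAM_CODE void update_particles(void);      // your hot loop, built as ARM in IWRAM

void profileParticles(void)
{
    unsigned int cycles;
    profileStart();
    update_particles();
    cycles = profileStop();
    (void)cycles;                            // print or display it however you like
} |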
_________________
Joat
http://www.bottledlight.com
#71711 - tepples - Tue Feb 14, 2006 2:10 am
A generalization of 5) is to learn to set up a CVS repository on your computer and check files in and out of it (e.g. with the TortoiseCVS client and the CVSNT Open Source server). This will let you set up a "reference" branch and an "optimization" branch.
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.
#71738 - gauauu - Tue Feb 14, 2006 4:36 am
Well said. Using CVS for all my projects made a huge difference in my productivity.
And I've found that if I'm ever working on multiple computers, Freepository is a great free way to host my CVS repository, so I can access it from anywhere.
It's a little slower than putting your repository on your own computer, but it also provides the benefit of always having a backup of your code on a different machine (not to mention all the other basic benefits of CVS).