#16258 - rapso - Wed Feb 11, 2004 8:47 am
I have some strange problems with the release-mode *.gba.
I'm using (Visual)HAM, and in debug mode all the code seems to work just fine, but in release mode it works for some frames and then freezes.
I've read on this forum that the EWRAM is left untouched, but when I call some functions, I can see in VisualBoyAdvance and BoycottAdvance that some bytes at the beginning of the EWRAM are changed.
The really strange thing is, all I have to do is call a function looking like this
Code: |
void foo(u8* pDst,const u8* pStr1,const u8* pStr2)
{
return;
}
|
to get the freeze.
The main function:
Code: |
u8 Test[32];
void main()
{
.
.
.
while(1)
{
//working version
foo(NULL,NULL,"foo");
//not working version
// foo(Test,Test,"foo");
Render();
}
}
|
I can get the commented-out call to work just by disabling the "Render();" call, but the two do not work together.
At first I removed CODE_IN_IWRAM from all functions, because I thought it might be a stack overflow, but it still does not work.
So please tell me the difference between debug and release mode: is it just the -O3 switch?
Another small question:
Code: |
u32 xm,x1,x2;
.
.
.
xm = (x1+x2)>>1;
.
.
.
|
is compiled to something like that
Code: |
add r1,r2,r3
mov r0, r1, asr #1
|
Is there a reason not to write "add r0, r2, r3, asr #1"?
greets
rapso
#16260 - FluBBa - Wed Feb 11, 2004 11:24 am
rapso wrote: |
Code: |
u32 xm,x1,x2;
.
xm = (x1+x2)>>1;
.
|
is compiled to something like that
Code: |
add r1,r2,r3
mov r0, r1, asr #1
|
Is there a reason not to write "add r0, r2, r3, asr #1"? |
Yes, that would mean something like xm = x1 + (x2>>1), since the shift is applied only to the last operand.
But it's still strange that it uses ASR instead of LSR, as it's a u32 and not an s32.
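The difference between the two instruction sequences can be sketched in C. This is only an illustration: the function names are made up for this example, s32/u32 are written as int32_t/uint32_t, and the signed case relies on GCC shifting negative values arithmetically (the C standard leaves that implementation-defined).

```c
#include <stdint.h>

/* What the C source asks for: add first, then shift the whole sum.
 * Roughly: add r1, r2, r3  followed by  mov r0, r1, asr #1 */
static int32_t midpoint_signed(int32_t x1, int32_t x2)
{
    return (x1 + x2) >> 1;
}

/* What "add r0, r2, r3, asr #1" would compute instead:
 * only the last operand is shifted before the addition. */
static int32_t add_shifted_operand(int32_t x1, int32_t x2)
{
    return x1 + (x2 >> 1);
}

/* With unsigned operands the shift is logical, so a compiler
 * would be expected to pick LSR rather than ASR here. */
static uint32_t midpoint_unsigned(uint32_t x1, uint32_t x2)
{
    return (x1 + x2) >> 1;
}
```

For x1 = 10 and x2 = 20, the first function gives 15 while the second gives 20, so the single-instruction form is not a valid replacement.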
_________________
I probably suck, my not is a programmer.
#16261 - rapso - Wed Feb 11, 2004 11:35 am
FluBBa wrote: |
rapso wrote: |
Code: |
u32 xm,x1,x2;
.
xm = (x1+x2)>>1;
.
|
is compiled to something like that
Code: |
add r1,r2,r3
mov r0, r1, asr #1
|
Is there a reason not to write "add r0, r2, r3, asr #1"? |
Yes, that would mean something like xm = x1 + (x2>>1), since the shift is applied only to the last operand.
But it's still strange that it uses ASR instead of LSR, as it's a u32 and not an s32. |
Sorry, yes, it's s32...
rapso
#16264 - batblaster - Wed Feb 11, 2004 12:44 pm
Try asking in the HAM forum on www.ngine.de; I think you can find all the solutions you need there...
_________________
Batblaster / 7 Raven Studios Co. Ltd
------------------------------------------
#16270 - torne - Wed Feb 11, 2004 1:20 pm
Try using different compiler optimisation flags until you find exactly which is the culprit; then maybe you can narrow it down.
#16380 - rapso - Sat Feb 14, 2004 2:20 am
OK,
while I was trying to make a minimal version for download, I threw a lot of functions and variables out of the code, and then I found one global
Code: |
register u16 *g_P asm("r9");
|
and this was the bug. I thought the compiler would reserve the register across all source files, but it does so only for the file in which I declared it. That's why it was overwritten in the release version (because parameters are passed in registers, not on the stack) while it worked in the debug version.
I hope my bug can help someone else to avoid it.
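A minimal sketch of one way to avoid this, assuming GCC: put the declaration in a header (here a hypothetical screen_ptr.h) that every source file includes, so that no translation unit treats r9 as a free register. GCC also documents the -ffixed-r9 option for keeping a register out of allocation in files that cannot include the header. The put_pixel helper and the non-ARM fallback branch are assumptions added only so the example also builds on a host machine.

```c
#include <stdint.h>

/* screen_ptr.h (hypothetical): include this in EVERY source file of the
 * project, not only in the one that uses g_P. */
#if defined(__GNUC__) && !defined(__clang__) && defined(__arm__)
/* ARM GCC build: pin the pointer to r9 in every translation unit. */
register uint16_t *g_P asm("r9");
#else
/* Host fallback (assumption, for illustration only): an ordinary global. */
static uint16_t *g_P;
#endif

/* Hypothetical helper: write one pixel through the pinned pointer and
 * advance it to the next position. */
static void put_pixel(uint16_t colour)
{
    *g_P++ = colour;
}
```

Code that was compiled without seeing the declaration (for example a prebuilt library) can still overwrite r9, so the register has to be chosen with that in mind.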
thanks for the help torne,batblaster && FluBBa
greets
rapso
#16382 - torne - Sat Feb 14, 2004 2:49 am
You should rarely, if ever, force things into registers; the compiler usually knows much better than you (and definitely does if you use profiling and feedback). If the compiler isn't being smart enough about register allocation, by the way, try using -fnew-ra to turn on the graph-colouring register allocator in recent GCC versions (but beware that it's not fully tested and *might* generate broken code).
#16390 - Paul Shirley - Sat Feb 14, 2004 3:13 pm
removed
Last edited by Paul Shirley on Sun Mar 28, 2004 9:13 pm; edited 1 time in total
#16400 - poslundc - Sat Feb 14, 2004 10:28 pm
By the time you get to that level of optimization, you might want to start thinking about recoding your routine in assembly...
Dan.
#16401 - torne - Sat Feb 14, 2004 10:39 pm
Not necessarily, poslundc; you can get some good performance boosts by doing careful register allocation for select globals, as Paul says; but, well, it's one of those things where if you don't already know how to do it, you shouldn't be doing it. =)
#16435 - rapso - Mon Feb 16, 2004 12:51 pm
torne wrote: |
Not necessarily, poslundc; you can get some good performance boosts by doing careful register allocation for select globals, as Paul says; but, well, it's one of those things where if you don't already know how to do it, you shouldn't be doing it. =) |
Yes, but someday everybody has to start doing these things. You know, these crazy things: you're on unknown terrain, and only the stupid trial-and-error game gives you the kick you need, because you've tried everything else :o)...
I'm not that familiar with GCC and had never used manual register allocation, but my pointer to the current screen is used very often. I had already got the last bit of performance out of the code (I took a look at the assembly; there is nothing more to do than inline the code to save the call and the return), so I tried to optimise it with some compiler "specials".
Maybe someone knows some maximum polygon throughput values of other 3D engines for the GBA (just to be able to compare)?
greets
rapso
#16444 - torne - Mon Feb 16, 2004 6:23 pm
rapso wrote: |
Yes, but someday everybody has to start doing these things. You know, these crazy things: you're on unknown terrain, and only the stupid trial-and-error game gives you the kick you need, because you've tried everything else :o)... |
I know; it was sort of a joke. =) Paul's post told you what changes you need to make to get it to work.