#147028 - bpoint - Thu Dec 13, 2007 11:57 am
Hello all,
I have run into a problem where gcc seems to be generating invalid prolog code for a function in my audio engine at optimization levels -O2 and higher (the code works fine at -O0 and -O1). The actual problem is that the generated code tries to access invalid/unaligned memory addresses on the GBA. From what I can tell, gcc is mistakenly using r3 (a register passed as a parameter) as a base address when it should be using the stack pointer.
Here is a disassembly of the problem code at -O3, with the relevant C++ code and my comments/notes mixed in:
Code:
(89): return snddev->playSample(this, volume, frequency, panning, offset, priority, createPaused, autoFree, cbFunc, cbArg);
0801966C: 9B12  ldr r3, [sp, #0x048]   ; [sp+0x048]=00000000 (offset)
0801966E: 464A  mov r2, r9             ; r9=00000002 (priority)
08019670: 9301  str r3, [sp, #0x004]
08019672: 9202  str r2, [sp, #0x008]
08019674: 4653  mov r3, r10            ; r10=00000000 (createPaused)
08019676: 4642  mov r2, r8             ; r8=00000000 (autoFree)
08019678: 9303  str r3, [sp, #0x00C]
0801967A: 9204  str r2, [sp, #0x010]
0801967C: 9B16  ldr r3, [sp, #0x058]   ; [sp+0x058]=00000000 (cbfunc)
0801967E: 9A17  ldr r2, [sp, #0x05C]   ; [sp+0x05C]=00000000 (cbarg)
08019680: 6830  ldr r0, [r6]           ; r6=this, [r6]=02006E2C (device ptr)
08019682: 9305  str r3, [sp, #0x014]
08019684: 9206  str r2, [sp, #0x018]
08019686: 1C31  add r1, r6, #0x0       ; r6=this (0201B80C)
08019688: 465A  mov r2, r11            ; r11=00000000 (volume)
0801968A: 1C3B  add r3, r7, #0x0       ; r7=000001E0 (freq)
0801968C: 9500  str r5, [sp]           ; r5=FFFFFF81 (panning)
0801968E: F7FF  bl                     ; r0=02006E2C, r1=0201B80C, r2=00000000, r3=000001E0, sp=03007B5C
08019690: FEA3  bl ::playSample        ; [sp]=FFFFFF81, [sp+4]=00000000, [sp+8]=00000002, [sp+12]=00000000, [sp+16]=00000000
(90): }                                ; [sp+20]=00000000, [sp+24]=00000000
(92): Channel *Device::playSample(Sample *sample, u8 volume, u32 frequency, s8 panning, uint offset, Priority priority, bool createPaused, bool autoFree, SampleCallback cbFunc, void *cbArg)
080193D8: B5F0  push {r4, r5, r6, r7, lr}
080193DA: 465F  mov r7, r11
080193DC: 4656  mov r6, r10
080193DE: 464D  mov r5, r9
080193E0: 4644  mov r4, r8
080193E2: B4F0  push {r4, r5, r6, r7}
080193E4: B085  add sp, #-0x014        ; sp -> 0x03007B24
080193E6: 4693  mov r11, r2            ; r2=00000000 (volume)
080193E8: 9A0F  ldr r2, [sp, #0x03C]   ; [sp+0x03C] -> [p.sp+4] = 00000000 (offset)
080193EA: 1C1F  add r7, r3, #0x0       ; r3=000001E0 (frequency)
080193EC: 605A  str r2, [r3, #0x04]    ; [r3+0x04]=!?!?
080193EE: 9B13  ldr r3, [sp, #0x04C]
080193F0: 9A14  ldr r2, [sp, #0x050]
080193F2: 615B  str r3, [r3, #0x14]
080193F4: 619A  str r2, [r3, #0x18]
080193F6: AB0E  add r3, sp, #0x038
080193F8: 781B  ldrb r3, this
080193FA: 469A  mov r10, r3
080193FC: AB10  add r3, sp, #0x040
080193FE: 781B  ldrb r3, this
08019400: 4699  mov r9, r3
08019402: AB11  add r3, sp, #0x044
08019404: 781B  ldrb r3, this
(93): {
The call site for Device::playSample() is correct: r0 contains the Device's "this" pointer, r1-r3 contain the first three parameters, and the rest are stored on the stack (I've also noted the actual values on the stack at the bl instruction). However, once inside playSample(), the volume (r2) and frequency (r3) are properly saved to r11 and r7, but the instruction at 0x080193EC then attempts to store r2 (by that point reloaded with the offset argument) into memory using the frequency in r3 as a base address! Since the frequency is 0x000001E0, this is obviously incorrect.
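To rule out a disassembler glitch, the raw opcode at 0x080193EC (605A) can be decoded by hand. The following is only a minimal sketch, assuming the ARM7TDMI Thumb "load/store with immediate offset" encoding (top three bits 011, then the B and L bits, a 5-bit offset, Rb and Rd). The names ThumbLdrStrImm and decodeLdrStrImm are made up for illustration and are not part of my engine or any toolchain.

```cpp
#include <cstdint>

// Decoded fields of a Thumb "load/store with immediate offset" instruction
// (bits 15..13 == 0b011). Hypothetical helper type, for illustration only.
struct ThumbLdrStrImm {
    bool valid;       // true if the halfword matches this instruction format
    bool isLoad;      // L bit (bit 11): true = ldr/ldrb, false = str/strb
    bool isByte;      // B bit (bit 12): true = byte transfer, false = word
    unsigned rd;      // register being stored or loaded (bits 2..0)
    unsigned rb;      // base register (bits 5..3)
    unsigned offset;  // byte offset added to the base register
};

// Split a 16-bit Thumb opcode into its fields. Word transfers scale the
// 5-bit immediate by 4; byte transfers use it unscaled.
constexpr ThumbLdrStrImm decodeLdrStrImm(std::uint16_t op) {
    return ThumbLdrStrImm{
        (op >> 13) == 0x3u,
        ((op >> 11) & 1u) != 0,
        ((op >> 12) & 1u) != 0,
        op & 7u,
        (op >> 3) & 7u,
        ((op >> 12) & 1u) ? ((op >> 6) & 0x1Fu)
                          : (((op >> 6) & 0x1Fu) << 2)
    };
}

// 0x605A from the -O3 listing: a word store of r2 to [r3 + 4].
static_assert(!decodeLdrStrImm(0x605A).isLoad, "605A is a store");
static_assert(decodeLdrStrImm(0x605A).rb == 3, "base register is r3");
static_assert(decodeLdrStrImm(0x605A).offset == 4, "offset is #0x04");
```

Under that encoding the opcode really is a word store of r2 through r3, so the listing reflects what the compiler emitted rather than a decoding error.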
For comparison, here is the disassembly of just the prolog at -O1:
Code:
(92): Channel *Device::playSample(Sample *sample, u8 volume, u32 frequency, s8 panning, uint offset, Priority priority, bool createPaused, bool autoFree, SampleCallback cbFunc, void *cbArg)
08019410: B5F0  push {r4, r5, r6, r7, lr}
08019412: 465F  mov r7, r11
08019414: 4656  mov r6, r10
08019416: 464D  mov r5, r9
08019418: 4644  mov r4, r8
0801941A: B4F0  push {r4, r5, r6, r7}
0801941C: B083  add sp, #-0x00C
0801941E: 1C05  add r5, r0, #0x0
08019420: 4688  mov r8, r1
08019422: 4693  mov r11, r2
08019424: 1C1E  add r6, r3, #0x0
08019426: AB0C  add r3, sp, #0x030
08019428: 781B  ldrb r3, this
0801942A: 469A  mov r10, r3
0801942C: AB0E  add r3, sp, #0x038
0801942E: 781F  ldrb r7, this
08019430: AB0F  add r3, sp, #0x03C
08019432: 781B  ldrb r3, this
08019434: 4699  mov r9, r3
08019436: AB10  add r3, sp, #0x040
08019438: 781B  ldrb r3, this
0801943A: 9301  str r3, [sp, #0x004]
(93): {
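The sp-relative offsets in both prologs can be cross-checked with a bit of arithmetic. Each prolog pushes nine registers (r4-r7 and lr, then r8-r11 via r4-r7) and then reserves local space (0x14 bytes at -O3, 0x0C bytes at -O1). The sketch below is only an illustration of that calculation; frameBytes and calleeArgOffset are hypothetical names, and it assumes 4 bytes per pushed register.

```cpp
#include <cstdint>

// Bytes the prolog moves sp down by before it touches the arguments:
// 4 bytes per pushed register plus the space reserved for locals.
constexpr unsigned frameBytes(unsigned pushedRegs, unsigned localBytes) {
    return pushedRegs * 4u + localBytes;
}

// Offset from the callee's sp (after the prolog) to a stack argument that
// the caller stored at [sp + callerOffset] just before the bl.
constexpr unsigned calleeArgOffset(unsigned pushedRegs, unsigned localBytes,
                                   unsigned callerOffset) {
    return frameBytes(pushedRegs, localBytes) + callerOffset;
}

// -O3 prolog: 9 registers + 0x14 locals, so panning ([p.sp+0]) sits at
// [sp+0x38] and offset ([p.sp+4]) at [sp+0x3C], matching the listing.
static_assert(calleeArgOffset(9, 0x14, 0x00) == 0x38, "panning at -O3");
static_assert(calleeArgOffset(9, 0x14, 0x04) == 0x3C, "offset at -O3");

// -O1 prolog: 9 registers + 0x0C locals, so panning sits at [sp+0x30].
static_assert(calleeArgOffset(9, 0x0C, 0x00) == 0x30, "panning at -O1");
```

So the stack-relative loads in both versions line up with where the caller placed the arguments; only the three stores through r3 in the -O3 prolog look out of place.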
I really don't like blaming the compiler when code breaks at higher optimization levels (since it usually turns out to be my fault anyway), but I don't understand what I could possibly be doing wrong here -- especially since it's in the prolog before any of my code executes.
If it matters, the CFLAGS I am using are: -DGBA -mcpu=arm7tdmi -mtune=arm7tdmi -mthumb -mthumb-interwork -ffunction-sections -fdata-sections -g -O3 -Wall -fomit-frame-pointer -ffast-math
Does anyone have any suggestions? Could this just be a compiler bug?