#52604 - Peter - Thu Sep 01, 2005 8:23 am
Hi,
I wrote the following routine to interpolate between two 16-color palettes and store the result in Vram. Unfortunately it's pretty slow, and I would like to know if you have any advice on how to make it faster.
Code:
    .code 32
    @.section .iwram, "ax", %progbits
    .global hel_PalInterpolate16
    .type hel_PalInterpolate16, %function
    .align

@ r0 = pPaletteMemory, pointer to target palette (BG or OBJ palette in Vram)
@ r1 = pPaletteA, source palette
@ r2 = pPaletteB, source palette
@ r3 = Step, between 0..31
@
@ r4 = inner_loop Counter
@
@ extern void hel_PalInterpolate16(u16 *pPaletteMemory, const void *pPaletteA, const void *pPaletteB, u32 Step);

hel_PalInterpolate16:
    @mov r11, r11

    @ save registers on stack
    stmfd sp!, {r0-r10}

    @ set the loop counter. run from last entry down to 0.
    @ decrement by 2 each iteration
    mov r4, #30

    @ here starts the inner loop
.inner_loop:
    ldrh r5, [r1, r4]        @ get u16 color value from pPaletteA, let's call it colorA
    ldrh r8, [r2, r4]        @ get u16 color value from pPaletteB, let's call it colorB

    @ we have both color values at this point.
    @ now we must extract their rgb components

    @ start with colorA
    mov r7, r5, lsr #10      @ get blue component from colorA
    mov r6, r5, lsr #5       @ get green component from colorA
    and r6, r6, #31
    and r5, r5, #31          @ get red component from colorA

    @ extract components from colorB now
    mov r10, r8, lsr #10     @ get blue component from colorB
    mov r9, r8, lsr #5       @ get green component from colorB
    and r9, r9, #31
    and r8, r8, #31          @ get red component from colorB

    @ now subtract components of colorA from colorB
    @ to get the difference between each component.
    @ we store the result in the registers used by components
    @ from colorB, since we don't need them after this anymore
    sub r8, r8, r5           @ red
    sub r9, r9, r6           @ green
    sub r10, r10, r7         @ blue

    @ multiply difference with Step and shift 5 bits to the right again
    @ Step must be between 0..31. if Step equals 0, colors are entirely
    @ used from pPaletteA, if it is 31, colors are entirely used from pPaletteB
    mul r8, r8, r3           @ red
    mul r9, r9, r3           @ green
    mul r10, r10, r3         @ blue

    @ now we add them to the components from pPaletteA
    @ and shift them back to have them in the right unitsize (0..31)
    add r5, r5, r8, lsr #5   @ red
    add r6, r6, r9, lsr #5   @ green
    add r7, r7, r10, lsr #5  @ blue

    @ now create a new 15bit bgr555 value
    @ r5 contains the red component, we basically logical OR the
    @ green and blue components and shift them to their required positions
    orr r5, r5, r6, lsl #5   @ green
    orr r5, r5, r7, lsl #10  @ blue

    @ now store the interpolated
    @ color to pPaletteMemory
    strh r5, [r0, r4]

    @ dec counter and loop until r4 equals zero
    subs r4, r4, #2
    bne .inner_loop

    @ load registers from stack
    ldmfd sp!, {r0-r10}
    bx lr

    .size hel_PalInterpolate16, .-hel_PalInterpolate16
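For comparison, here is a plain C sketch of the same per-channel blend, which may be handy as a reference to check an optimized version against. It is not the routine from the post: the names `blend5`, `blend_bgr555` and `pal_interpolate16_ref` are made up for illustration, and it departs from the assembly as posted in three places. First, it uses the weighted-sum form `(a * (32 - step) + b * step) >> 5`, which equals `a + ((b - a) * step) / 32` rounded down but never goes negative; in the assembly a negative product is shifted with `lsr #5`, a logical shift, where `asr #5` would be the signed counterpart. Second, the loop covers all 16 entries; as far as I can tell the `subs`/`bne` pair leaves the loop as soon as r4 reaches 0, so the entry at offset 0 is not written. Third, blue is masked to 5 bits so bit 15 is ignored.

```c
#include <stdint.h>

#define PAL16_ENTRIES 16u   /* colors in one 16-color palette */

/* Blend one 5-bit channel, step in 0..31.
   Same value as a + floor((b - a) * step / 32), but every
   intermediate stays non-negative, so no signed shift is needed. */
static uint32_t blend5(uint32_t a, uint32_t b, uint32_t step)
{
    return (a * (32u - step) + b * step) >> 5;
}

/* Blend two BGR555 colors (red in bits 0-4, green 5-9, blue 10-14). */
static uint16_t blend_bgr555(uint16_t ca, uint16_t cb, uint32_t step)
{
    uint32_t r = blend5(ca & 31u,         cb & 31u,         step);
    uint32_t g = blend5((ca >> 5) & 31u,  (cb >> 5) & 31u,  step);
    uint32_t b = blend5((ca >> 10) & 31u, (cb >> 10) & 31u, step);
    return (uint16_t)(r | (g << 5) | (b << 10));
}

/* Hypothetical reference version: writes all 16 blended entries to dst. */
void pal_interpolate16_ref(uint16_t *dst, const uint16_t *pal_a,
                           const uint16_t *pal_b, uint32_t step)
{
    for (uint32_t i = 0; i < PAL16_ENTRIES; ++i)
        dst[i] = blend_bgr555(pal_a[i], pal_b[i], step);
}
```

With this sketch, blending 0x0000 and 0x7FFF at step 16 gives 0x3DEF in either direction. At step 31 it gives 0x7BDE, so with a shift by 5 the result is 31/32 of the way to pPaletteB and does not quite reach it.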
I measured in 'ticks', where 4399 ticks equal one frame. The routine takes 60 ticks when it is not located in iwram and 15 ticks in iwram. In the worst case I need it to flash 32 palettes; at 60 ticks each that is ~2000 ticks, which is almost half a frame. When it's located in iwram it's of course a lot faster, but I'm pretty sure it could be even faster if written with better asm skills than mine :)
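The arithmetic behind those figures, as a small sketch (the function names are invented for illustration; the only inputs are the numbers quoted above): 32 palettes at 60 ticks come to 1920 ticks, roughly 44% of a 4399-tick frame, and at 15 ticks to 480 ticks, roughly 11%.

```c
#define TICKS_PER_FRAME 4399u   /* from the measurement in the post */

/* Total ticks for flashing `palettes` palettes in one frame. */
static unsigned worst_case_ticks(unsigned palettes, unsigned ticks_per_call)
{
    return palettes * ticks_per_call;
}

/* Share of one frame in permille (1000 = the whole frame), rounded down. */
static unsigned frame_permille(unsigned ticks)
{
    return ticks * 1000u / TICKS_PER_FRAME;
}
```

Here `frame_permille(worst_case_ticks(32, 60))` comes out as 436 and `frame_permille(worst_case_ticks(32, 15))` as 109.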
Thanks for your help.