#97474 - ProblemBaby - Thu Aug 10, 2006 1:38 am
Hi
My program needs a big speedup! I think I need to make the current code about 8 times faster or more, because some things aren't added yet. Currently all code and data are placed in main memory. How much can I gain by putting everything in the fastest memory the DS has?
I also have some general questions about things I think could be done better:
1. I often have to check whether a register is between two values. Is there a better way to do that than two compares and two conditional branches? I was trying to think of a logic operation I could use if the two bounds were powers of two, but I couldn't figure anything out. Any ideas?
2. I often want to load/store a word, but I have to shift a register to get the address, so I first do one shift and then another shift inside the LDR/STR to make it word-aligned. Is there a way to force the address to be aligned, or something similar?
Example:
Code: |
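MOV r2, r10, LSR#11       @ r2 = table number (index / 2048)
BIC r0, r10, #0xF800      @ r0 = offset in the 2 kB table (bits 0-10; assumes r10 < 0x10000)
ADD r10, r10, #1          @ advance the index for the next call
LDR r2, [r12, r2, LSL#2]  @ r2 = pointer to that table (word-sized entries)
LDRB r1, [r2, r0]         @ r1 = byte at table + offset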
|
Is there anything I can remove here?
All small improvements you can think of are very welcome, even ones I haven't mentioned here!
Thanks
#97477 - Mighty Max - Thu Aug 10, 2006 2:19 am
ProblemBaby wrote: |
Hi
My program needs a big speedup! I think I need to make the current code about 8 times faster or more, because some things aren't added yet.
|
Then you should rethink your logic and the algorithms you use instead of nitpicking with assembler.
Quote: |
Code: |
MOV r2, r10, LSR#11
BIC r0, r10, #0xF800
ADD r10, r10, #1
LDR r2, [r12, r2, LSL#2]
LDRB r1, [r2, r0]
|
|
That's a good example: the code itself isn't very tunable, but a slight reorganization helps.
Put the 32 separate tables into one contiguous block (1x 64 kB instead of 32x 2 kB); the access then becomes:
Code: |
LDR r2, [r12]          @ r2 = base of the single 64 kB table
LDRB r1, [r2, r10]     @ r1 = table[r10]
ADD r10, r10, #1       @ advance the index
|
The next step would be to check whether r2 can be kept somewhere other than [r12]. It is now constant for every byte access, so maybe you can move that LDR out of the loop, or at least store/load it in DTCM (the stack).
_________________
GBAMP Multiboot
#97478 - ProblemBaby - Thu Aug 10, 2006 2:26 am
The problem is that these pointers may change over time. I could copy the data instead of using pointers, but I'm not sure that would speed it up.
I'll probably try it, though. Thanks!
#97488 - sajiimori - Thu Aug 10, 2006 3:37 am
You need to find out where your program is spending its time. Otherwise, you're most likely wasting your effort.
#97499 - tepples - Thu Aug 10, 2006 4:43 am
What you want is a profiler. For each stage in the frame computation, check VCOUNT to see how much time has elapsed. Then repeat the process for each stage within that stage.
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.
#97541 - ProblemBaby - Thu Aug 10, 2006 11:10 am
But I know exactly where the time is spent: it's in my "render screen" routine. It is called about 28000 times a frame, and even fully optimized code in that function takes a lot of time, so I need faster memory.
I've simply set up a timer that counts cycles each frame in steps of 64.
With rendering turned off, the program takes about 6200 ticks. If I add one LDRH and three STRH to VRAM to the rendering routine, I'm up at 11900, and much more complicated work still has to be done to calculate the pixels.
I've thought of another approach that could work fairly well, but it requires that I can control the hardware rendering: I need it to wait before drawing each scanline. I've tried something like
if (myScanline >= 0 && myScanline < 192)
while (VCOUNT != myScanline);
in an HBlank interrupt, but I couldn't get it to work at all.
#97689 - sajiimori - Fri Aug 11, 2006 1:09 am
Cool, timing your code is a good thing. ^_^
As for optimization, especially of the 8x sort, discussions should start with algorithms. The very very last thing to discuss is low-level cycle counting.
What does "render screen" do, specifically?
#97743 - Mighty Max - Fri Aug 11, 2006 11:40 am
ProblemBaby wrote: |
if (myScanline >= 0 && myScanline < 192)
while (VCOUNT != myScanline);
in an HBlank interrupt, but I couldn't get it to work at all |
You shouldn't stall the CPU inside the IRQ handler with the while(). If the current HBlank is not the one you need, just leave it alone, and only act when it is the one you need:
Code: |
if (myScanline >= 0 && myScanline < 192)
if (VCOUNT == myScanline) {
...
}
|
#98500 - DynamicStability - Tue Aug 15, 2006 5:27 pm
I'd like to see that 'renderScreen' code that runs 28000 times per frame, or hear what it's doing.
Find ways to reduce the number of compares. You probably have a lot of redundant calculations too.
_________________
Framebuffer is dead.
DynaStab.DrunkenCoders.com