gbadev.org forum archive

Does anyone have a fast routine for loading a linear framebuffer into OBJ/BG VRAM?
This naive C implementation takes far too many cycles :(

Hope to hear from you soon!

Code:

void CopyFromLinearBuffer( u16* a_Dest, u16* a_Source, u32 a_WidthInBlocks, u32 a_HeightInBlocks )
{
// Dummy counters;
u32 x, y;

// Source data pointers to the 8x8 block
u16 *s0, *s1, *s2, *s3, *s4, *s5, *s6, *s7;

// Number of 16-bit halfwords per image row (8 bytes per block, 2 bytes per halfword)
u32 widthInHalfwords = a_WidthInBlocks << 3 >> 1;

// Let's start from the first upper-left block
for ( y = 0; y < a_HeightInBlocks; y++ )
{
   s0 = &a_Source[ widthInHalfwords * y * 8 ];
   s1 = s0 + widthInHalfwords;
   s2 = s1 + widthInHalfwords;
   s3 = s2 + widthInHalfwords;
   s4 = s3 + widthInHalfwords;
   s5 = s4 + widthInHalfwords;
   s6 = s5 + widthInHalfwords;
   s7 = s6 + widthInHalfwords;

   // Process every 8x8 block in this row of blocks
   for ( x = 0; x < a_WidthInBlocks; x++ )
   {
      *a_Dest++ = *s0++;
      *a_Dest++ = *s0++;
      *a_Dest++ = *s0++;
      *a_Dest++ = *s0++;

      *a_Dest++ = *s1++;
      *a_Dest++ = *s1++;
      *a_Dest++ = *s1++;
      *a_Dest++ = *s1++;

      *a_Dest++ = *s2++;
      *a_Dest++ = *s2++;
      *a_Dest++ = *s2++;
      *a_Dest++ = *s2++;

      *a_Dest++ = *s3++;
      *a_Dest++ = *s3++;
      *a_Dest++ = *s3++;
      *a_Dest++ = *s3++;

      *a_Dest++ = *s4++;
      *a_Dest++ = *s4++;
      *a_Dest++ = *s4++;
      *a_Dest++ = *s4++;

      *a_Dest++ = *s5++;
      *a_Dest++ = *s5++;
      *a_Dest++ = *s5++;
      *a_Dest++ = *s5++;

      *a_Dest++ = *s6++;
      *a_Dest++ = *s6++;
      *a_Dest++ = *s6++;
      *a_Dest++ = *s6++;

      *a_Dest++ = *s7++;
      *a_Dest++ = *s7++;
      *a_Dest++ = *s7++;
      *a_Dest++ = *s7++;
   }
}
}

_________________
http://www.geocities.com/gabriele_scibilia/

Use DMA. There are plenty of examples of it on this site; just search for it.

DMA, or a cpuCopy swi call, for 8 bytes?
I thought setting up a DMA copy was too expensive for that.

Many thanks
Ga
_________________
http://www.geocities.com/gabriele_scibilia/

There are at least 8x8 bytes to copy for a tile.

niltsair wrote:

There are at least 8x8 bytes to copy for a tile.

Yeah! But since I'm copying from a linear framebuffer, there are 8 non-consecutive rows (8 bytes per row) to consider. Am I wrong? :-?
Setting up 8 DMA copies could be slower than a simple 4-halfword copy repeated 8 times (once per row), couldn't it?
_________________
http://www.geocities.com/gabriele_scibilia/

They all seem to be consecutive.

niltsair wrote:

They all seem to be consecutive.

Ehm, the routine above does work...
It copies 4 halfwords from each of 8 different rows of a linear framebuffer into one consecutive 64-byte tile/block in VRAM.
The source isn't consecutive memory.

I'm certainly missing something, excuse me;)

Ga
_________________
http://www.geocities.com/gabriele_scibilia/

OP is trying to copy from a linear framebuffer to a tile-based framebuffer. Such a copy does not take place in "consecutive memory".
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.
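
The difference between the two layouts can be made concrete with a little address arithmetic. The sketch below is only an illustration and assumes an 8bpp image whose width is a multiple of 8 pixels; the function names `linear_offset` and `tiled_offset` are hypothetical and do not come from the thread.

```c
#include <stdint.h>

/* Byte offset of pixel (x, y) in a linear 8bpp buffer that is
   width_px pixels wide: rows follow each other in memory. */
static uint32_t linear_offset(uint32_t x, uint32_t y, uint32_t width_px)
{
    return y * width_px + x;
}

/* Byte offset of the same pixel in a tiled layout (assumed here):
   8x8 tiles of 64 bytes each, tiles stored left to right and top to
   bottom, and the 8 rows of one tile stored back to back. */
static uint32_t tiled_offset(uint32_t x, uint32_t y, uint32_t width_px)
{
    uint32_t tiles_per_row = width_px / 8;
    uint32_t tile_index = (y / 8) * tiles_per_row + (x / 8);
    return tile_index * 64 + (y % 8) * 8 + (x % 8);
}
```

For a 240-pixel-wide image, the second row of the first tile starts at byte 240 of the linear buffer but at byte 8 of the tiled one, which is why a single block copy per tile cannot work on the source side.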

removed

Last edited by Paul Shirley on Sun Mar 28, 2004 10:04 pm; edited 1 time in total

Quote:

1: use u32's instead of u16's in the copy, less instructions -> faster code

Can I use 32-bit accesses to VRAM?

Quote:

This is a classic piece of code that will be dramatically easier to write well in assembler.

Well, my asm skills are about zero :)
Many thanks for your valuable suggestions.
_________________
http://www.geocities.com/gabriele_scibilia/
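
On the 32-bit question: GBA VRAM accepts 16-bit and 32-bit writes (byte-sized writes are the ones that do not behave as expected), so the quoted suggestion can be tried in plain C before moving to assembler. The following is a minimal sketch, not the routine from the removed post. It assumes an 8bpp image, word-aligned pointers, and a width that is a multiple of 4 pixels; `copy_tile_u32` is a hypothetical helper name and `uint32_t` stands in for the thread's `u32`.

```c
#include <stdint.h>

/* Hypothetical helper: copy one 8x8, 8bpp tile out of a linear buffer.
   src points at the tile's top-left pixel; stride_words is the image
   width in 32-bit words (width in pixels / 4). Both pointers are
   assumed to be word-aligned. */
static void copy_tile_u32(uint32_t *dst, const uint32_t *src,
                          uint32_t stride_words)
{
    for (uint32_t row = 0; row < 8; row++) {
        dst[0] = src[0];       /* pixels 0-3 of this row */
        dst[1] = src[1];       /* pixels 4-7 of this row */
        dst += 2;              /* tile rows are consecutive in the destination */
        src += stride_words;   /* next image row in the linear source */
    }
}
```

Each tile row now takes two word copies instead of four halfword copies, which halves the number of loads and stores the compiler has to emit.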

removed

Last edited by Paul Shirley on Sun Mar 28, 2004 10:04 pm; edited 1 time in total

I'm glad to read your tips.
I received a fast ldmia/stmia implementation that works 8 bytes at a time (8 ldmia/stmia pairs), and I edited it a little to work 16 bytes at a time (four 32-bit registers, 4 loads and stores). I'll try to figure out how to use 8 registers and do it with 2 sets of loads and stores as suggested. Thank you again!
_________________
http://www.geocities.com/gabriele_scibilia/

Excuse me again, I hope I'm working in the right direction.

I set up a 64-byte block copy (rows 0-3 first, then rows 4-7). Is this what you suggested?
It takes about 14% of CPU time for a full-screen 240x160 image.
The code below is only illustrative; repeat it twice for a full 8*8 block:

Code:

@ source: r14 = the start of the linear buffer
@ dest: r1 = the start of the tileset in vram
@ width: r2 = the width of the buffer in pixels

ldmia r14,{r5, r6} @ load 8 bytes from iwram
add r14,r14, r2 @ increase the source pointer with "width" bytes
ldmia r14,{r7, r8} @ load 8 bytes from iwram
add r14,r14, r2 @ increase the source pointer with "width" bytes
ldmia r14,{r9, r10} @ load 8 bytes from iwram
add r14,r14, r2 @ increase the source pointer with "width" bytes
ldmia r14,{r11,r12} @ load 8 bytes from iwram
stmia r1!,{r5-r12} @ store 32 bytes in vram, and writeback the pointer
add r14,r14, r2 @ increase the source pointer with "width" bytes

I have to thank all the readers!
_________________
http://www.geocities.com/gabriele_scibilia/
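
To put the 14% figure in perspective, it can be converted into cycles per tile. This is a back-of-the-envelope sketch that assumes "14% of CPU time" means 14% of one video frame and that the image is 8bpp; the function names are hypothetical. The frame length used is 1232 CPU cycles per scanline times 228 scanlines.

```c
#include <stdint.h>

/* GBA frame length: 1232 CPU cycles per scanline, 228 scanlines per frame. */
enum { CYCLES_PER_FRAME = 1232 * 228 };

/* Number of 8x8 tiles covering a width_px x height_px image. */
static uint32_t tile_count(uint32_t width_px, uint32_t height_px)
{
    return (width_px / 8) * (height_px / 8);
}

/* Cycles used by a routine that takes `percent` of one frame (rounded down). */
static uint32_t cycles_for_percent(uint32_t percent)
{
    return (uint32_t)CYCLES_PER_FRAME * percent / 100;
}

/* Average cycles spent per tile (rounded down). */
static uint32_t cycles_per_tile(uint32_t percent,
                                uint32_t width_px, uint32_t height_px)
{
    return cycles_for_percent(percent) / tile_count(width_px, height_px);
}
```

Under these assumptions, 14% of a frame is a little over 39,000 cycles, or roughly 65 cycles for each of the 600 tiles in a 240x160 image.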

removed

Last edited by Paul Shirley on Sun Mar 28, 2004 10:04 pm; edited 1 time in total

Quote:

Getting there, you can improve it substantially by allocating 4 source pointers and using writeback mode in the ldmia. Remember: you've got a lot of registers to play with so use them.

Yeah!! That could really be a speed-up. I never imagined how much asm could speed things up.

Today I tried to modify my routine using your new hint... without success so far :(

First, I don't have 3 registers left, and I'm running out of registers for the loop counters (across width and height).
Well, I don't know asm very well (actually, I know almost nothing about it), excuse me.
Anyhow, I'm trying to write just one "row" of tiles, but I get weird results; the source pointers point somewhere other than where they should.
The loop itself works (I verified it by assigning made-up 32-bit values myself instead of using the ldmia instructions, just some ldr xx, =((150)+(150<<8)+(150<<16)+(150<<24))), but with ldmia it loads (and then stores) garbage.

Assuming a prototype like this:

Code:

void Copy( u32* source, u32* dest, u32 width, u32 height );
@ source: r0 = the start of the linear buffer in iwram
@ dest: r1 = the start of the sprite in vram
@ width: r2 = the width of the iwram buffer in pixels
@ height: r3 = the height of the iwram buffer in pixels

I'm trying this:

Code:

@ r11 source pointer
@ r12 destination pointer

mov r11,r0
mov r12,r1

@ r14 indicates how many pixels to copy per line
@ r3 indicates how many lines of pixels to copy

mov r14,r2

CopyNextRow:

@ r8,r9,r10,r11 source row pointers for row,row+1,row+2,row+3

mov r8, r11
add r9, r8, r14
add r10,r9, r14
add r11,r10,r14

Copy8x8Block:

@ Copy an 8*8 block from the linear buffer to vram
@ r0-r7 working store for half a char

ldmia r8!, {r0-r1} @ load 8 bytes from iwram, and writeback the pointer
ldmia r9!, {r2-r3} @ load 8 bytes from iwram, and writeback the pointer
ldmia r10!,{r4-r5} @ load 8 bytes from iwram, and writeback the pointer
ldmia r11!,{r6-r7} @ load 8 bytes from iwram, and writeback the pointer
stmia r12!,{r0-r7} @ store 32 bytes in vram, and writeback the pointer

ldmia r8!, {r0-r1} @ repeat this 2 times for a full 8*8 block
ldmia r9!, {r2-r3}
ldmia r10!,{r4-r5}
ldmia r11!,{r6-r7}
stmia r12!,{r0-r7}

@ I need r14 from "row" to "row" so don't trash it!
@ I run out of registers

subs r14, r14, #8 @ Subtract 8 from the total number of pixels to copy
bne Copy8x8Block @ Zero left? Then go to the next row, else copy the next block

@ subs ??, ??, #8 @ Decrease the number of rows with 8
@ beq CopyEnd @ Zero left?? then branch to end

@ b CopyNextRow @ Start the next row

@ we're finished

CopyEnd:

_________________
http://www.geocities.com/gabriele_scibilia/
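
Reading the listing above, three things look like possible sources of the garbage. After write-back, the second group of ldmia instructions reads the next 8 bytes of rows 0-3 rather than rows 4-7. r14 serves as both the row stride and the pixel counter, so the stride is destroyed by the subs. And using r0-r7 as working registers overwrites the height in r3. The C sketch below only models the intended control flow with those roles kept apart; it is not the poster's final routine, the name `copy_linear_to_tiles` is hypothetical, and it assumes an 8bpp image whose width and height are multiples of 8.

```c
#include <stdint.h>

/* Model of the intended loop: width_px and height_px are in pixels,
   src is the linear 8bpp buffer, dst receives 64-byte tiles. */
void copy_linear_to_tiles(const uint32_t *src, uint32_t *dst,
                          uint32_t width_px, uint32_t height_px)
{
    const uint32_t stride = width_px / 4;   /* words per image row; never modified */

    for (uint32_t y = 0; y < height_px; y += 8) {
        /* One pointer per image row of this tile row (rows 0-7),
           recomputed from the untouched base pointer and stride. */
        const uint32_t *row[8];
        for (uint32_t r = 0; r < 8; r++)
            row[r] = src + (y + r) * stride;

        /* Separate counter, so the stride survives the inner loop. */
        for (uint32_t remaining = width_px; remaining != 0; remaining -= 8) {
            for (uint32_t r = 0; r < 8; r++) {
                *dst++ = *row[r]++;   /* pixels 0-3, pointer written back */
                *dst++ = *row[r]++;   /* pixels 4-7, pointer written back */
            }
        }
    }
}
```

In an assembler version the same separation would mean saving the registers that hold the stride and the row count (for example on the stack) before reusing them as working registers.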

Coding > Loading a linear framebuffer into Vram

#7715 - wizardgsz - Mon Jun 23, 2003 2:18 pm

#7716 - niltsair - Mon Jun 23, 2003 2:23 pm

#7717 - wizardgsz - Mon Jun 23, 2003 2:27 pm

#7718 - niltsair - Mon Jun 23, 2003 2:32 pm

#7721 - wizardgsz - Mon Jun 23, 2003 3:45 pm

#7722 - niltsair - Mon Jun 23, 2003 3:53 pm

#7723 - wizardgsz - Mon Jun 23, 2003 3:58 pm

#7731 - tepples - Mon Jun 23, 2003 6:15 pm

#7752 - Paul Shirley - Tue Jun 24, 2003 12:06 am

#7759 - wizardgsz - Tue Jun 24, 2003 7:20 am

#7761 - Paul Shirley - Tue Jun 24, 2003 8:59 am

#7957 - wizardgsz - Sun Jun 29, 2003 1:17 pm

#7972 - wizardgsz - Sun Jun 29, 2003 6:32 pm

#7975 - Paul Shirley - Sun Jun 29, 2003 7:15 pm

#8247 - wizardgsz - Sat Jul 05, 2003 10:40 am