gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback machine copies. A new forum can be found here.

Beginners > plotting large groups of pixels [solved]

#168697 - wallacoloo - Sat May 16, 2009 1:32 am

So I have several cases in which I simply need to display a rectangle of a solid color. Rather than individually plotting each pixel, there's got to be another way. But how? I've tried using memcpy to copy 1 row at a time. I'm in mode 3, 16bit colors. When I tried implementing memcpy to redraw the entire screen gray, it worked quickly, but it didn't draw over the entire screen. Portions of the screen were solid gray, and other portions were interlaced gray and black. I have no idea why. Any suggestions?

Here's the code:
Code:

#include <string.h> //for memcpy

int main()
{
    *(unsigned long*)0x4000000 = 0x403; // mode3, bg2 on
    unsigned short* Screen = (unsigned short*)0x6000000;
    unsigned char toCopy[480];
    unsigned short spot = 0;
    while(spot < 480) { //fill an array so it represents 1 row of solid grey
        toCopy[spot] = 148; //byte 1
        spot++;
        toCopy[spot] = 82; //byte 2
        spot++;
    }
    unsigned int row = 0;
    while(row < 76800) { //thats 240 (width) * 160 (height) * 2 (bytes per pixel)
        memcpy(Screen+row,toCopy,480); //copy it onto the screen
        row += 480;
    }
     while(1) {} //wait forever...
}


thanks for the help!
-Colin


Last edited by wallacoloo on Thu Jun 25, 2009 11:54 pm; edited 3 times in total

#168699 - Miked0801 - Sat May 16, 2009 1:45 am

Depending on how memcpy is implemented, it may be doing u8 copies to VRAM which is bad. To check, use a for loop to do your copy, u16s or u32s at a time.
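
The check suggested above could be sketched roughly as follows: a plain loop that only ever performs 16-bit stores. The function name fill_rect16 and the fb parameter are hypothetical; on hardware fb would be the mode 3 frame buffer at 0x6000000, and the sketch assumes the rectangle lies entirely on screen.

```c
#include <stdint.h>

#define SCREEN_WIDTH 240  /* mode 3: 240x160, one 16-bit value per pixel */

/* Fill a w-by-h rectangle whose top-left corner is (x, y) with one color,
   using halfword stores only. `fb` is assumed to point at the start of the
   frame buffer ((volatile uint16_t *)0x06000000 on hardware).
   No clipping is done: the caller must keep the rectangle on screen. */
static void fill_rect16(volatile uint16_t *fb, int x, int y,
                        int w, int h, uint16_t color)
{
    for (int row = 0; row < h; row++) {
        /* Pointer arithmetic is in pixels (halfwords), not bytes. */
        volatile uint16_t *dst = fb + (y + row) * SCREEN_WIDTH + x;
        for (int col = 0; col < w; col++) {
            dst[col] = color;
        }
    }
}
```

Filling the whole screen gray would then be fill_rect16(fb, 0, 0, 240, 160, 0x5294), where 0x5294 is the same color as the bytes 148 and 82 in the first post.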

#168700 - wallacoloo - Sat May 16, 2009 2:09 am

I fixed it...

OK, I think I get what's happening. Because Screen is declared as a short, I only need to add 240 to "row" every loop. But thinking about it, since I'm adding "row" to the "Screen" pointer, it shouldn't work that way. So now I'm just confused as to why changing row += 480 to row += 240 fixed it...

#168701 - elhobbs - Sat May 16, 2009 2:31 am

wallacoloo wrote:
I fixed it...

OK, I think I get what's happening. Because Screen is declared as a short, I only need to add 240 to "row" every loop. But thinking about it, since I'm adding "row" to the "Screen" pointer, it shouldn't work that way. So now I'm just confused as to why changing row += 480 to row += 240 fixed it...

Because you are incrementing a pointer to short. The compiler knows that a short is 2 bytes; in other words, the compiler takes care of turning your offset of 240 shorts into 480 bytes.
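
A small sketch of that scaling, meant only as an illustration: the helper names byte_offset_u16 and row_start are made up for this example and do not come from the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Distance in BYTES between base and base + n when base points to
   16-bit elements. The compiler multiplies n by sizeof(uint16_t). */
static size_t byte_offset_u16(const uint16_t *base, size_t n)
{
    return (size_t)((const char *)(base + n) - (const char *)base);
}

/* Start of scanline y in a 240-pixel-wide 16-bit frame buffer.
   The offset is counted in pixels, so one row is 240, not 480. */
static uint16_t *row_start(uint16_t *fb, int y)
{
    return fb + y * 240;
}
```

With a 16-bit pointer, adding 480 per row therefore skips every other scanline, which would match the interlaced pattern described in the first post.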

#168702 - Dwedit - Sat May 16, 2009 2:42 am

The implementations of memset and memcpy in DevkitARM aren't the best. But ideally, you'd want to use memset to set a bunch of pixels, along with a GOOD implementation of memset.

Here's an ARM ASM 32-bit memset if you want it... On the GBA, this is better than using DMA to fill memory.
Code:

@void memset32(u32 *dest, u32 word, u32 size);

memset32:
   @r0 = dest
   @r1 = word to fill
   @r2 = number of BYTES to write
   @all parameters MUST be word aligned, and size must be a multiple of 4
   bic r2,r2,#3  @forces size to be a multiple of 4, remove this line if you don't need it

   @pre-subtract, jump ahead if not enough remaining
   subs r2,r2,#32
   bmi 2f
   stmfd sp!,{r3-r7,lr}
   mov r3,r1
   mov r4,r1
   mov r5,r1
   mov r6,r1
   mov r7,r1
   mov r12,r1
   mov lr,r1
1:
   stmia r0!,{r1,r3-r7,r12,lr} @32 bytes
   subs r2,r2,#32
   bmi 3f
   stmia r0!,{r1,r3-r7,r12,lr}
   subs r2,r2,#32
   bpl 1b
3:
   ldmfd sp!,{r3-r7,lr}
2:
   adds r2,r2,#32
   bxle lr
1:   
   str r1,[r0],#4
   subs r2,r2,#4
   bxle lr
   str r1,[r0],#4
   subs r2,r2,#4
   bgt 1b
   bx lr


edit: edited to fix dumb bugs
_________________
"We are merely sprites that dance at the beck and call of our button pressing overlord."


Last edited by Dwedit on Fri May 29, 2009 1:22 pm; edited 1 time in total

#168703 - wallacoloo - Sat May 16, 2009 2:58 am

I would've gone with memset, but as I understand, memset can only handle chars, not shorts. And I needed to use 16bit colors. Anyone know a workaround? I'm not in need of the extra speed anymore, but I might be in the future :D

Also, Dwedit, can I use that code along with c++? (if so, how?)
thanks for the help again.
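
One possible workaround, sketched under assumptions: a 16-bit color can be duplicated into both halves of a 32-bit word and then written with any 32-bit fill. The names pack_color32 and fill32 are hypothetical, and fill32 is only a portable C stand-in with the same argument order as the assembly memset32 above, not a replacement for it.

```c
#include <stdint.h>

/* Put the same 16-bit color in the low and high halves of a word, so one
   32-bit store paints two neighbouring pixels. */
static uint32_t pack_color32(uint16_t color)
{
    return (uint32_t)color | ((uint32_t)color << 16);
}

/* Plain C stand-in for a 32-bit fill. `bytes` is a size in bytes and is
   rounded down to whole words; `dest` must be word aligned. */
static void fill32(volatile uint32_t *dest, uint32_t word, uint32_t bytes)
{
    for (uint32_t i = 0; i < bytes / 4; i++) {
        dest[i] = word;
    }
}
```

Because each word covers two pixels, this only suits runs that start on an even pixel and have an even length; an odd leading or trailing pixel would still need a single 16-bit store.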

#168732 - sverx - Mon May 18, 2009 10:51 am

wallacoloo wrote:
I needed to use 16bit colors. Anyone know a workaround?


swiCopy() maybe?

#168842 - wallacoloo - Fri May 29, 2009 2:35 am

I didn't know whether to start a new thread or not, but I have another similar problem. I have been trying to learn mode 4. That's where you have the 256 color palette. I'm trying to do something real simple, draw a "0" onto the screen. What happens is that I get something that looks somewhat like a 0. But it appears that every 2 pixels are being treated as one. So sometimes, where there is supposed to be 1 pixel, 2 pixels are drawn. Or where there's supposed to be 1 pixel, 0 pixels are drawn. I'm not getting any single pixels. It's either 2 or 0. I think this has to do with memcpy. I've tried recreating the image data as a short array, but with no success. Here's the code with plenty of comments. Thanks for the help again!

Code:

#include <string.h> //for memcpy and memset

#define RGB16(r,g,b)  (r+(g<<5)+(b<<10))
unsigned char* Screen = (unsigned char*)0x6000000;
unsigned short* pallete = (unsigned short*)0x5000000; //pointer to color pallete. (I'm using mode 4)
       
const unsigned char txt_0[64] =  //this is the number 0 (drawn with a slash through it so it doesn't look like an o)
{
    0,0,0,1,1,1,0,0,
    0,0,1,0,0,0,1,0,
    0,1,0,0,0,1,0,1,
    0,1,0,0,1,0,0,1,
    0,1,0,0,1,0,0,1,
    0,1,0,1,0,0,0,1,
    0,0,1,0,0,0,1,0,
    0,0,0,1,1,1,0,0
};
       
void draw_txt_0(unsigned short pos) { //take a position on screen and draw a number 0 there
     memcpy(Screen + pos,txt_0,8); //row 1
     memcpy(Screen+240 + pos,txt_0+8,8); //row 2
     memcpy(Screen+480 + pos,txt_0+16,8); //row 3
     memcpy(Screen+720 + pos,txt_0+24,8); //and so on...
     memcpy(Screen+960 + pos,txt_0+32,8);
     memcpy(Screen+1200 + pos,txt_0+40,8);
     memcpy(Screen+1440 + pos,txt_0+48,8);
     memcpy(Screen+1680 + pos,txt_0+56,8);
}
int main()
{
    *(unsigned long*)0x4000000 = 0x404; // mode4, bg2 on
    pallete[0] = RGB16(24,24,24); //background color
    pallete[1] = RGB16(4,4,4); //text color
    memset(Screen,0,38400); //fill the entire screen with the background color
    draw_txt_0(2430); //draw a 0 at pixel 2430.
    while(1) {} //wait forever...
}

#168844 - dantheman - Fri May 29, 2009 7:11 am

You can't write to the VRAM in 8-bit chunks, only 16 bits at a time. If you write to only the first 8 bits, it will mirror the change in the last 8 bits. TONC has a better explanation and some code to work around the issue. Basically you read in two bytes, figure out which byte you're changing, modify it, then write both bytes back to VRAM. Mode 4 isn't nearly as efficient as it could be because of this hardware limitation.
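
The read-modify-write approach described above might look roughly like this. It is a minimal sketch, not TONC's code: the name m4_plot and the vram parameter are assumptions, and on hardware vram would be (volatile uint16_t *)0x6000000.

```c
#include <stdint.h>

#define M4_WIDTH 240  /* mode 4: 240x160, one palette index (byte) per pixel */

/* Set one mode 4 pixel without ever doing an 8-bit store to VRAM.
   Two pixels share each halfword; the GBA is little-endian, so the pixel
   with the even x coordinate lives in the low byte. */
static void m4_plot(volatile uint16_t *vram, int x, int y, uint8_t index)
{
    volatile uint16_t *dst = &vram[(y * M4_WIDTH + x) / 2];
    uint16_t pair = *dst;                 /* read both pixels */

    if (x & 1) {
        pair = (uint16_t)((pair & 0x00FF) | ((uint16_t)index << 8));
    } else {
        pair = (uint16_t)((pair & 0xFF00) | index);
    }
    *dst = pair;                          /* write both pixels back */
}
```

Drawing the 8x8 glyph from the previous post would then be a double loop calling m4_plot(vram, x0 + col, y0 + row, txt_0[row * 8 + col]). That is slower than memcpy, but it never issues a byte write.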

Two other suggestions:
1. In your RGB16 macro, put parentheses around the r, g, and b letters. Your current macro can break when an argument is an expression whose operator binds more loosely than <<, for instance "RGB16(r, g & 31, b)".
2. Might be easier to make an offset macro for calculating the right memory location to copy to. Something like
#define OFFSET(r, c) ((r)*240 + (c))
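
A minimal sketch of both suggestions with every macro argument parenthesized. The RGB16 body keeps the formula from the code above, and the "g & 31" call in the comment is only an illustrative assumption about how the macro might be used.

```c
/* Each argument is wrapped in parentheses so that an expression passed in
   is evaluated as a unit before it is shifted or multiplied. */
#define RGB16(r, g, b)  ((r) + ((g) << 5) + ((b) << 10))
#define OFFSET(r, c)    ((r) * 240 + (c))

/* Without the inner parentheses, RGB16(0, g & 31, 0) would expand to
   (0+(g & 31<<5)+(0<<10)), which masks g with (31 << 5) instead of
   shifting (g & 31) left by 5. */
```

OFFSET(10, 30) gives 2430, the position passed to draw_txt_0 in the earlier post.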

Regarding the original question though, I too am curious what the fastest way to plot rectangles in Mode 3 would be. I've been developing an application that could greatly benefit from a plotting speed increase, and I'm currently using DMA for that purpose. If there's a faster way, then I'm all ears. Dwedit, could your code be used in mode 3 for plotting single pixels? Or would it be restricted to widths of even sizes (which isn't too big of a constraint)?

#168848 - Dwedit - Fri May 29, 2009 1:04 pm

Here's the code again, wrapped up as a GCC function this time.
Edit: Now tested, stupid bugs fixed.
Code:

IWRAM_CODE __attribute__ ((naked)) __attribute__ ((noinline))   //replace with ITCM_CODE for nintendo ds arm9
void memset32(u32 *dest, u32 word, int size)
{
    //dest = destination address (must be word aligned)
    //word = word to fill
    //size = size in BYTES  (but smallest unit used by this function is words)
    __asm__ volatile (
    "   adr r3,0f"                      "\n\t"  //switch to ARM mode
    "   bx r3"                          "\n\t"
    "   .balign 4"                      "\n\t"
    "   .arm"                           "\n\t"
    "0: bic r2,r2,#3"                   "\n\t"
    "   subs r2,r2,#32"                 "\n\t"  //pre-subtract, jump ahead if not enough remaining
    "   bmi 2f"                         "\n\t"
    "   stmfd sp!,{r3-r7,lr}"           "\n\t"
    "   mov r3,r1"                      "\n\t"
    "   mov r4,r1"                      "\n\t"
    "   mov r5,r1"                      "\n\t"
    "   mov r6,r1"                      "\n\t"
    "   mov r7,r1"                      "\n\t"
    "   mov r12,r1"                     "\n\t"
    "   mov lr,r1"                      "\n\t"
    "1: stmia r0!,{r1,r3-r7,r12,lr}"    "\n\t" //32 bytes
    "   subs r2,r2,#32"                 "\n\t"
    "   bmi 3f"                         "\n\t"
    "   stmia r0!,{r1,r3-r7,r12,lr}"    "\n\t"
    "   subs r2,r2,#32"                 "\n\t"
    "   bpl 1b"                         "\n\t"
    "3: ldmfd sp!,{r3-r7,lr}"           "\n\t"
    "2: adds r2,r2,#32"                 "\n\t"
    "   bxle lr"                        "\n\t"
    "1: str r1,[r0],#4"                 "\n\t"
    "   subs r2,r2,#4"                  "\n\t"
    "   bxle lr"                        "\n\t"
    "   str r1,[r0],#4"                 "\n\t"
    "   subs r2,r2,#4"                  "\n\t"
    "   bgt 1b"                         "\n\t"
    "   bx lr "                         "\n\t"
    "   .thumb"                         "\n\t"
    );
}
