gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback machine copies. A new forum can be found here.

Coding > Mode 4 PlotPixel and DrawLine Needed...

#17282 - batblaster - Fri Mar 05, 2004 6:32 pm

Hello ,
I'd like to know if someone has a good PlotPixel and/or DrawLine for mode 4, especially one that checks for odd and even pixel addresses.

Thanks a lot...

P.S. "C" or ASM no probs...
_________________
Batblaster / 7 Raven Studios Co. Ltd
------------------------------------------

#17288 - Lupin - Fri Mar 05, 2004 9:15 pm

#define TIMES_240(x) (((x) << 8) - ((x) << 4))
#define TIMES_120(x) (((x) << 7) - ((x) << 3))

void vplotixel16(int x, int y, u16 c) {
register u16* p = VideoBuffer + TIMES_240(y) + x;
register u16 rc = c;
*p = rc;
}

#17296 - poslundc - Fri Mar 05, 2004 9:41 pm

Ugh... sorry Lupin, but you've gotta take a second look...

1. Mode 4 is 8-bit paletted, not 16-bit direct.

2. Even with optimizations turned off, gcc will automatically optimize multiplication by a constant to shifts and adds where appropriate.

3. Your intermediate variable rc doesn't do anything.

4. Please use BBCode tags when posting code.

In addition to those more important points, the register directives are unlikely to do anything helpful (in fact, declaring the second one as a register is a very bad suggestion to the compiler), and using the u16 datatype as a parameter is just going to cause additional overhead in the function call.

Dan.
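
Dan's second point is easy to check on a host compiler. A minimal sketch, assuming Lupin's two macros exactly as posted above; the helper names row_offset_bytes and row_offset_halfwords are made up for illustration. It only shows that the macros are y * 240 and y * 120 spelled as shifts, so the plain multiplication can be written instead and the choice left to the compiler.

```c
#include <stdint.h>

/* Lupin's macros from the post above: 240 = 256 - 16, 120 = 128 - 8. */
#define TIMES_240(x) (((x) << 8) - ((x) << 4))
#define TIMES_120(x) (((x) << 7) - ((x) << 3))

/* Hypothetical helper: byte offset of the start of row y in mode 4
   (one byte per pixel, 240 pixels per row). */
static uint32_t row_offset_bytes(uint32_t y)
{
    return y * 240u;
}

/* Hypothetical helper: the same row start counted in 16-bit halfwords
   (two pixels per halfword, 120 halfwords per row). */
static uint32_t row_offset_halfwords(uint32_t y)
{
    return y * 120u;
}
```

For every on-screen row (0 to 159) the macro and the plain multiplication give the same offset.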

#17301 - Miked0801 - Fri Mar 05, 2004 11:03 pm

Ok, blind coding here

This would need a C header like
void PlotPixel(u32 xPos, u32 yPos, u8 color, u16 *videoBuffer);

Code:

@ Assume 1st param is X and 2nd is Y and both are on screen (should be asserted)

    .GLOBAL     PlotPixel
    .CODE 32

@ r0 = x
@ r1 = y
@ r2 = 8-bit value
@ r3 = Video buffer (probably should be a constant, but no biggie)

PlotPixel:
   add   r3,r1,r1,#lsl 8      @ Get x240 by 256 - 16
   sub   r3,r3,r1,#lsr 4      @ ...
   movs  r0,r0,#0,#lsr 1      @ /2 and set carry if we want low byte saved
   adc   r0,r0,#0         @ Add carry back in
   eor   r0,r0,#1         @ Reverse low byte to read opposite of the address
   ldrb  r0,[r3,r0]         @ Read the byte
   orrcs r0,r0,r2,#lsl 8      @ if Carry, move color value high
   orrcc r0,r2,r0,#lsl 8      @ if not, color low, video value high
   strh  r0,[r3]            @ store the result
   
   bx    r14               @ return


For real speed, get rid of the call/return (bx) and just make it a macro. The bx takes more time than the inside code. Also, I'm pretty sure 1 more cycle can be shaved by combining the mov, eor, adc in a more clever manner, but I can't see it right now. Anyone else care to compile and test? :)

Mike

#17307 - batblaster - Sat Mar 06, 2004 2:32 am

Many thanks, I will try to compile it and check that it works correctly, but in r3 I need to pass the video memory location, like 0x06000000 or 0x0600a000 for the backbuffer...

Would it be a good idea to use a variant like this?

ADD r3, r3, #0x6000000 @ r3 = r3 + VRAM_ADDR

@ START - F_DISCNT_BUFSEL_GET
MOV r7, #0x04000000 @ r7 = 0x04000000
LDR r7, [r7] @ r7 = val from addr 0x04000000
AND r7, r7, #0x10 @ r7 = r7 & 0x10
MOV r7, r7, LSR #4 @ r7 = r7 >> 4
@ End - F_DISCNT_BUFSEL_GET

CMP r7, #1 @ r7 == 1 ?
ADDNE r3, r3, #0xA000 @ if not, r3 = r3 + 0xA000 (Backbuffer Addr)


I'm not very good at ARM asm and I don't know if this is faster or not. Otherwise, a check of which memory buffer you want to fill is needed in "C" before calling the routine...

Thanks thanks...
_________________
Batblaster / 7 Raven Studios Co. Ltd
------------------------------------------
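
The check described above can also be done on the C side before calling the routine. A minimal sketch, assuming (as in the assembly above) that bit 4 of the register at 0x04000000 tells which frame is being displayed; mode4_back_buffer and the constant names are hypothetical, and the register value is passed in as a parameter so the logic can be tried off-target.

```c
#include <stdint.h>

#define VRAM_FRAME0          0x06000000u  /* first mode 4 frame */
#define VRAM_FRAME1          0x0600A000u  /* second mode 4 frame */
#define DISPCNT_FRAME_SELECT 0x0010u      /* bit 4: which frame is displayed */

/* Returns the address of the frame that is NOT currently displayed,
   mirroring the CMP/ADDNE pair in the assembly above. */
static uint32_t mode4_back_buffer(uint16_t dispcnt)
{
    return (dispcnt & DISPCNT_FRAME_SELECT) ? VRAM_FRAME0 : VRAM_FRAME1;
}
```

On hardware the argument would be read from 0x04000000 through a volatile pointer, and the result cast to a u16 pointer and handed to PlotPixel as its fourth parameter.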

#17308 - Miked0801 - Sat Mar 06, 2004 2:45 am

Just pass in your buffer as the 4th param (r3) and it will work as is. That's why I made it a register to begin with. You're supposed to pass the base address in that register for it to work right.

Make a 100 pixel vertical line down the screen.

for(i=0; i<100; i++)
{
PlotPixel(0, i, 0x80, (u16 *)(0x06000000));
}

or

make it to the back buffer

for(i=0; i<100; i++)
{
PlotPixel(0, i, 0x80, (u16 *)(0x0600a000));
}

or

to EWRAM if you want

for(i=0; i<100; i++)
{
PlotPixel(0, i, 0x80, (u16 *)(0x02000000));
}


The code doesn't care.

#17320 - Miked0801 - Sat Mar 06, 2004 9:43 am

Was trying to go to sleep when my mind chimed in and told me there was a bug in my pixel code:
Code:

PlotPixel:
   add   r3,r1,r1,#lsl 8      @ Get x240 by 256 - 16
   sub   r3,r3,r1,#lsr 4      @ ...
   movs  r0,r0,#0, #lsr 1     @ x/2 + address and use carry for low/high
   add   r3,r3,r0             @ Add x to address
   rsc   r0,r0,#1             @ Use reverse subtract carry to reverse the carry for address read. 
   ldrb  r0,[r3,r0]           @ Read the byte (r0 is 0 or 1 for offset)
   orrcs r0,r0,r2,#lsl 8      @ if Carry, move color value high
   orrcc r0,r2,r0,#lsl 8      @ if not, color low, video value high
   strh  r0,[r3]              @ store the result with base address only.
   
   bx    r14                  @ return


The previous version wasn't storing the value in the right memory location, because halfword accesses can't use a register offset. BTW, if I'm wrong about the halfword addressing mode (I don't have docs here, and the reverse-subtract-with-carry mnemonic may also be incorrect), we can shave 1 more cycle off this by throwing away the add of X to the address and just offsetting by r3 below. Not bad: 16 cycles (not including the bx, which I think would be 8! cycles here if jumping back to ROM code) for a mode 4 pixel write. I'd love to be able to combine the movs and the following add, as they aren't pulling their own weight in this code. I know there's 1 more cycle to be had here. Also, correct me if I'm wrong on the timing. I believe the read is 4 cycles and the write is 5 here (assuming VRAM access); at least that is what no$gba is telling me.

#17325 - FluBBa - Sat Mar 06, 2004 12:19 pm

I think it should be something like this...
Code:
PlotPixel:
  rsb  r1,r1,r1,#lsl 4   @ Get x15
  add  r3,r3,r0       @ Add x to address
  tst  r0,r0,#1       @ test odd address.
  ldrb r0,[r3,r1,#lsl 4]      @ Read the byte (r1=y*15*16=y*240)
  orrcs r0,r0,r2,#lsl 8   @ if Carry, move color value high
  orrcc r0,r2,r0,#lsl 8   @ if not, color low, video value high
  strh r0,[r3,r1,#lsl4]       @ store the result with base address only.
 
  bx  r14         @ return


Not tested yet though.
_________________
I probably suck, my not is a programmer.

#17328 - torne - Sat Mar 06, 2004 1:36 pm

FluBBa: that code does not work. Halfword loads and stores use ARM addressing mode 3, which does not allow use of the shifter to construct addresses. Only offsets by a constant or a register are allowed, no scaled register offsets.
Mike: that's the answer to your question too. Do you not have a copy of the ARM ARM? If you google about, many ARM partners have accidentally made it available for download on their website (which makes it their fault, not yours, if you download it) - I got mine from Altera but that link's been removed since.

#17345 - Miked0801 - Sat Mar 06, 2004 6:40 pm

It was 1:00am at home so I didn't have any docs available :)

That said, is it true that it is broken? If so, I'll look at it some more and figure it out.

BTW, no need to use test. The mov instruction earlier should save the flag. If it's lost on the rsc, then set the flag on that instruction and check against the zero flag instead. Again, just blind coding, but the results should be really close to working if not already.

#17348 - Lupin - Sat Mar 06, 2004 7:09 pm

poslundc: the function shows how to plot two 8-bit pixels at once, and that's why it is important for the color data type to be 16-bit. Plotting single 8-bit pixels in mode 4 makes no sense to me because it is a slowdown.
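
A sketch of what plotting two pixels with one store could look like, assuming a 240x160 mode 4 frame addressed as 120 halfwords per row. plot_pixel_pair is a hypothetical name, not Lupin's function; the row offset here uses 120 rather than 240 because the buffer is indexed in halfwords, not bytes.

```c
#include <stdint.h>

/* Writes two adjacent palette indices with a single halfword store.
   x is rounded down to the even pixel. The low byte of `pair` is the
   left (even) pixel and the high byte is the right (odd) pixel. */
static void plot_pixel_pair(uint16_t *frame, uint32_t x, uint32_t y, uint16_t pair)
{
    frame[y * 120u + (x >> 1)] = pair;
}
```

No read-back is needed here, which is where the speed advantage over single-pixel plotting comes from; the cost is that both pixels of the pair are always overwritten.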

#17354 - Miked0801 - Sat Mar 06, 2004 8:42 pm

Ok, at work - have a compiler and imagine that, my syntax was absolute crap. Here's a version that is tested to work.

Code:

PlotPixel:
   add    r3,r3,r1, lsl #8      @ Get x240 by 256 - 16
   sub    r3,r3,r1, lsl #4      @ ...
   movs   r0,r0, lsr #1         @ Place low bit on Carry (shifting X off by 1)   
   add    r3,r3,r0, lsl #1      @ Add x (without low bit) back into base address
   ldrcsb r0,[r3]               @ Read the low byte if Carry set
   ldrccb r0,[r3,#1]            @ Read the high byte if Carry Clear
   orrcs  r0,r0,r2, lsl #8      @ if Carry Set, move color value high
   orrcc  r0,r2,r0, lsl #8      @ else, color low, video value high
   strh   r0,[r3]               @ store the result with base address only.
   
   bx    r14                    @ return



There is 1 more cycle to be had here - I just know it. I'm using a mov, I've got 2 cycles wasted on instructions that end up as nops, and I'm not using a register for offsetting on my loads/stores. Anyways, enjoy.
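
For anyone who wants to check the routine off-target, here is a rough C model of the same steps. It is only a sketch under stated assumptions: plot_pixel_model is a hypothetical name, the frame is treated as an array of halfwords so it can run against ordinary memory, and the whole halfword is read and masked where the assembly reads only the byte it wants to keep.

```c
#include <stdint.h>

/* C model of the tested PlotPixel above: same parameter order,
   same result in the frame buffer. */
static void plot_pixel_model(uint32_t x, uint32_t y, uint8_t color, uint16_t *frame)
{
    /* base + y*240 bytes + (x with its low bit cleared), in halfword units */
    uint16_t *p = frame + y * 120u + (x >> 1);
    uint16_t old = *p;

    if (x & 1u) {
        /* odd x: keep the low byte (even pixel), color goes in the high byte */
        *p = (uint16_t)((old & 0x00FFu) | ((uint16_t)color << 8));
    } else {
        /* even x: keep the high byte (odd pixel), color goes in the low byte */
        *p = (uint16_t)((old & 0xFF00u) | color);
    }
}
```

Plotting an odd pixel and then its even neighbour should leave both colors in the same halfword.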

#17370 - FluBBa - Sat Mar 06, 2004 11:56 pm

Ok, thought that one over and came up with this...
Code:
PlotPixel:
  rsb  r1,r1,r1,#lsl 4   @ Get Yx15
  eor r0,r0,#1       @read the other byte
  add  r3,r3,r0       @ Add X to address
  tst  r0,r0,#1       @ test odd address.
  ldrb r0,[r3,r1,#lsl 4]!      @ Read the byte (Y*15*16=Y*240), writeback adr to r3
  orrne r0,r0,r2,#lsl 8   @ if Carry, move color value high
  orreq r0,r2,r0,#lsl 8   @ if not, color low, video value high
  strh r0,[r3]       @ store the result with base address only.
 
  bx  r14         @ return

Not much different in timing from Miked0801's though.
_________________
I probably suck, my not is a programmer.

#17371 - LOst? - Sun Mar 07, 2004 12:02 am

I did this for my GDI driver in my Windows XP port. It's only a Put Pixel and it is used for drawing lines and circles. I haven't found a way to move a whole buffered picture that isn't ^2 though.

Code:

   void Display::PutPixel (int x, int y, u8 PalEntry)
   {
      /** Delta check start **/
   
      int x1;
      int y1;
      int x2;
      int y2;
   
      int deltax;
      int deltay;
   
      x1 = ClippingRect.left;
      y1 = ClippingRect.top;
      x2 = ClippingRect.right;
      y2 = ClippingRect.bottom;
   
      // First source check
      if (x2 - x1 <= 0 || y2 - y1 <= 0)
         return;
   
      // Start and stop source correction
      if (x1 < 0)
         x1 = 0;
      if (y1 < 0)
         y1 = 0;
      if (240 < x2)
         x2 = 240;
      if (160 < y2)
         y2 = 160;
   
      // Calculate source deltas
      deltax = x2 - x1;
      deltay = y2 - y1;
      
      // Final check
      if (deltax <= 0 || deltay <= 0)
         return;
   
      /** Delta check end **/
      
      u16 color;
   
      if (x1 <= x && y1 <= y && x < x2 && y < y2)
      {
         if ((x & 1) == 0)
         {
            color = (PalEntry & 0xFF);
            VirtualScreen [y * 120 + (x >> 1)] = (VirtualScreen [y * 120 + (x >> 1)] & 0xFF00) | color;
         }
         else
         {
            color = (PalEntry & 0xFF) << 8;
            VirtualScreen [y * 120 + (x >> 1)] = (VirtualScreen [y * 120 + (x >> 1)] & 0xFF) | color;
         }
      }
   }


And it worked pretty good after all, which made me happy :)

/LOst

#17389 - Miked0801 - Sun Mar 07, 2004 5:24 am

Ok, I knew there was an extra cycle to be had :) Yours is 11% faster with the extra cycle gone, FluBBa. Well done. :)

#17396 - Miked0801 - Sun Mar 07, 2004 9:18 am

Ok, found a problem. When X is even (0,2,4...), you eor it to odd, then write that address back to r3. Later you're doing a halfword store to that odd address, which is (according to my documentation and experience) illegal. We'll have to try again - see the same thread in the asm section :)
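
The issue shows up with plain address arithmetic. A minimal sketch, assuming a byte address computed as base + Y*240 + X as in the routine above and an even base address; both helper names are hypothetical.

```c
#include <stdint.h>

/* Byte address of the other pixel in the same halfword, i.e. the byte
   that has to be read back before the store. */
static uint32_t neighbour_byte_addr(uint32_t base, uint32_t x, uint32_t y)
{
    return base + y * 240u + (x ^ 1u);
}

/* Halfword-aligned address that the 16-bit store has to use. */
static uint32_t halfword_addr(uint32_t base, uint32_t x, uint32_t y)
{
    return (base + y * 240u + x) & ~1u;
}
```

For even X the neighbour address is odd, so it is fine for the byte load but cannot be reused for the halfword store that follows; the store needs the aligned address instead.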

#17402 - batblaster - Sun Mar 07, 2004 12:18 pm

This is the fastest way to write a "C" version:

Code:

inline void PutPixel(u16 *pScreen, u32 X, u32 Y, u16 Color)
{
   u16 *StoreHere = &pScreen[((Y*240)+X)>>1];

   if(X&1)
      *StoreHere = (Color<<8) + (*StoreHere & 0x00FF);
   else
      *StoreHere = (*StoreHere & 0xFF00) + Color;
}

_________________
Batblaster / 7 Raven Studios Co. Ltd
------------------------------------------
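
The thread title also asks for a DrawLine, which nobody posted. A minimal sketch of one, assuming a standard integer Bresenham loop built on a pixel writer equivalent to the PutPixel above. put_pixel and draw_line are hypothetical names, no clipping is done, and nothing here has been tuned for speed.

```c
#include <stdint.h>
#include <stdlib.h>

/* Same read-modify-write as the PutPixel above, on a halfword frame buffer. */
static void put_pixel(uint16_t *frame, int x, int y, uint8_t color)
{
    uint16_t *p = &frame[(y * 240 + x) >> 1];

    if (x & 1)
        *p = (uint16_t)((color << 8) | (*p & 0x00FF));
    else
        *p = (uint16_t)((*p & 0xFF00) | color);
}

/* Integer Bresenham line covering all octants.
   The caller must keep both endpoints on screen. */
static void draw_line(uint16_t *frame, int x0, int y0, int x1, int y1, uint8_t color)
{
    int dx = abs(x1 - x0), sx = (x0 < x1) ? 1 : -1;
    int dy = -abs(y1 - y0), sy = (y0 < y1) ? 1 : -1;
    int err = dx + dy;

    for (;;) {
        put_pixel(frame, x0, y0, color);
        if (x0 == x1 && y0 == y1)
            break;
        int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }  /* step in x */
        if (e2 <= dx) { err += dx; y0 += sy; }  /* step in y */
    }
}
```

A horizontal run is the obvious candidate for optimisation, since whole halfwords can be written without the read-back once the two end pixels are handled.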