#17282 - batblaster - Fri Mar 05, 2004 6:32 pm
Hello,
I'd like to know if someone has a good plotpixel and/or drawLine for mode 4, especially one that checks for odd and even pixel addresses.
Thanks a lot...
P.S. "C" or ASM, no problem...
_________________
Batblaster / 7 Raven Studios Co. Ltd
------------------------------------------
#17288 - Lupin - Fri Mar 05, 2004 9:15 pm
#define TIMES_240(x) (((x) << 8) - ((x) << 4))
#define TIMES_120(x) (((x) << 7) - ((x) << 3))
void vplotixel16(int x, int y, u16 c) {
register u16* p = VideoBuffer + TIMES_240(y) + x;
register u16 rc = c;
*p = rc;
}
#17296 - poslundc - Fri Mar 05, 2004 9:41 pm
Ugh... sorry Lupin, but you've gotta take a second look...
1. Mode 4 is 8-bit paletted, not 16-bit direct.
2. Even with optimizations turned off, gcc will automatically optimize multiplication by a constant to shifts and adds where appropriate.
3. Your intermediate variable rc doesn't do anything.
4. Please use BBCode tags when posting code.
In addition to those more important points, the register directives are unlikely to do anything helpful (in fact, declaring the second one as a register is a very bad suggestion to the compiler), and using the u16 datatype as a parameter is just going to cause additional overhead in the function call.
Dan.
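As a small illustration of point 2, here is a sketch that compares the hand-written TIMES_240 macro with a plain multiply. The helper names (row_offset_mul, row_offset_shift, check_row_offsets) are made up for this example and are not from anyone's code in the thread.

```c
#include <assert.h>
#include <stdint.h>

#define TIMES_240(x) (((x) << 8) - ((x) << 4))

/* Row offset written the plain way; the compiler is free to pick the shifts. */
static uint32_t row_offset_mul(uint32_t y)   { return y * 240u; }

/* Row offset written with the hand-rolled macro from the earlier post. */
static uint32_t row_offset_shift(uint32_t y) { return TIMES_240(y); }

/* Walks every Mode 4 scanline (0..159) and checks that both forms agree. */
static void check_row_offsets(void)
{
    for (uint32_t y = 0; y < 160; y++)
        assert(row_offset_mul(y) == row_offset_shift(y));
}
```

Both forms give the same offset, so the macro mainly costs readability; whether it saves any cycles depends on the compiler and is best checked in the generated assembly.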
#17301 - Miked0801 - Fri Mar 05, 2004 11:03 pm
Ok, blind coding here
This would need a C header like
void PlotPixel(u32 xPos, u32 yPos, u8 color, u16 *videoBuffer);
Code: |
@ Assume 1st param is X and 2nd is Y and both are on screen (should be asserted)
.GLOBAL PlotPixel
.CODE 32
@ r0 = x
@ r1 = y
@ r2 = 8-bit value
@ r3 = Video buffer (probably should be a constant, but no biggie)
PlotPixel:
add r3,r1,r1,#lsl 8 @ Get x240 by 256 - 16
sub r3,r3,r1,#lsr 4 @ ...
movs r0,r0,#0,#lsr 1 @ /2 and set carry if we want low byte saved
adc r0,r0,#0 @ Add carry back in
eor r0,r0,#1 @ Reverse low byte to read opposite of the address
ldrb r0,[r3,r0] @ Read the byte
orrcs r0,r0,r2,#lsl 8 @ if Carry, move color value high
orrcc r0,r2,r0,#lsl 8 @ if not, color low, video value high
strh r0,[r3] @ store the result
bx r14 @ return
|
For real speed, get rid of the call/return (bx) and just use a macro. The bx takes more time than the code inside. Also, I'm pretty sure 1 more cycle can be shaved by combining the mov, eor, adc in a more clever manner, but I can't see it right now. Anyone else care to compile and test :)
Mike
#17307 - batblaster - Sat Mar 06, 2004 2:32 am
Many thanks, I will try to compile it and check whether it works correctly, but in r3 I need to pass the video memory location, like 0x06000000, or 0x0600a000 for the backbuffer...
Would it not be good to use a variant like this?
ADD r3, r3, #0x6000000 @ r3 = r3 + VRAM_ADDR
@ START - F_DISCNT_BUFSEL_GET
MOV r7, #0x04000000 @ r7 = 0x04000000
LDR r7, [r7] @ r7 = val from addr 0x04000000
AND r7, r7, #0x10 @ r7 = r7 & 0x10
MOV r7, r7, LSR #4 @ r7 = r7 >> 4
@ End - F_DISCNT_BUFSEL_GET
CMP r7, #1 @ r7 == 1 ?
ADDNE r3, r3, #0xA000 @ r3 = r3 + 0xA000 (Backbuffer Addr)
I'm not very good at ARM asm and I don't know whether this is faster or not. Otherwise, a check of which memory buffer you want to fill is needed in "C" before calling the routine...
Thanks thanks...
#17308 - Miked0801 - Sat Mar 06, 2004 2:45 am
Just pass in your buffer as the 4th param (r3) and it will work as is. That's why I made it a register to begin with. You're supposed to pass the base address in that register for it to work right.
Make a 100 pixel vertical line down the screen:
for(i=0; i<100; i++)
{
PlotPixel(0, i, 0x80, (u16 *)(0x06000000));
}
or
make it to the back buffer
for(i=0; i<100; i++)
{
PlotPixel(0, i, 0x80, (u16 *)(0x0600a000));
}
or
to EWRAM if you want
for(i=0; i<100; i++)
{
PlotPixel(0, i, 0x80, (u16 *)(0x02000000));
}
The code doesn't care.
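If the buffer choice is made in C before the call, it might look like the sketch below. The names M4_FRAME0, M4_FRAME1, DISPCNT_FRAME1 and m4_back_buffer are made up for this example; the only facts relied on are the two frame addresses quoted in the thread and the 0x10 bit of the register at 0x04000000 that the assembly variant above tests.

```c
#include <assert.h>
#include <stdint.h>

#define M4_FRAME0      0x06000000u  /* first Mode 4 frame */
#define M4_FRAME1      0x0600A000u  /* second Mode 4 frame */
#define DISPCNT_FRAME1 0x0010u      /* bit 4: set while the second frame is displayed */

/* Hypothetical helper: address of the frame that is NOT being displayed. */
static uint32_t m4_back_buffer(uint16_t dispcnt)
{
    /* A 240x160 byte frame (38400 bytes) has to fit below the second frame. */
    assert(M4_FRAME1 - M4_FRAME0 >= 240u * 160u);
    return (dispcnt & DISPCNT_FRAME1) ? M4_FRAME0 : M4_FRAME1;
}
```

On hardware the argument would be the value read from 0x04000000, and the result, cast to (u16 *), would go in as the fourth parameter of PlotPixel. Doing the test once per frame rather than once per pixel keeps it out of the inner loop.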
#17320 - Miked0801 - Sat Mar 06, 2004 9:43 am
Was trying to go to sleep when my mind chimed in and told me there was a bug in my pixel code:
Code: |
PlotPixel:
add r3,r1,r1,#lsl 8 @ Get x240 by 256 - 16
sub r3,r3,r1,#lsr 4 @ ...
movs r0,r0,#0, #lsr 1 @ x/2 + address and use carry for low/high
add r3,r3,r0 @ Add x to address
rsc r0,r0,#1 @ Use reverse subtract carry to reverse the carry for address read.
ldrb r0,[r3,r0] @ Read the byte (r0 is 0 or 1 for offset)
orrcs r0,r0,r2,#lsl 8 @ if Carry, move color value high
orrcc r0,r2,r0,#lsl 8 @ if not, color low, video value high
strh r0,[r3] @ store the result with base address only.
bx r14 @ return
|
The previous version wasn't storing the value in the right memory location, because halfword accesses can't use a register offset. BTW, if I'm wrong about the halfword access mode (I don't have docs here, and the reverse subtract carry mnemonic may be incorrect too), we can shave 1 more cycle off this by throwing away the add of X to the address and just offsetting by r3 below. Not bad: 16 cycles (not including the bx, which I think would be 8! cycles here if jumping back to ROM code) for a mode 4 pixel write. I'd love to be able to combine the movs and the following add instruction, as they aren't pulling their own weight in this code. I know there's 1 more cycle to be had here. Also, correct me if I'm wrong on the timing. I believe the read is 4 cycles and the write is 5 here (assuming VRAM access); at least that is what no$gba is telling me.
#17325 - FluBBa - Sat Mar 06, 2004 12:19 pm
I think it should be something like this...
Code: |
PlotPixel:
rsb r1,r1,r1,#lsl 4 @ Get x15
add r3,r3,r0 @ Add x to address
tst r0,r0,#1 @ test odd address.
ldrb r0,[r3,r1,#lsl 4] @ Read the byte (r1=y*15*16=y*240)
orrcs r0,r0,r2,#lsl 8 @ if Carry, move color value high
orrcc r0,r2,r0,#lsl 8 @ if not, color low, video value high
strh r0,[r3,r1,#lsl4] @ store the result with base address only.
bx r14 @ return
|
Not tested yet though.
_________________
I probably suck, my not is a programmer.
#17328 - torne - Sat Mar 06, 2004 1:36 pm
FluBBa: that code does not work. Halfword loads and stores use ARM addressing mode 3, which does not allow use of the shifter to construct addresses. Only offsets by a constant or a register are allowed, no scaled register offsets.
Mike: that's the answer to your question too. Do you not have a copy of the ARM ARM? If you google about, many ARM partners have accidentally made it available for download on their website (which makes it their fault, not yours, if you download it) - I got mine from Altera but that link's been removed since.
#17345 - Miked0801 - Sat Mar 06, 2004 6:40 pm
It was 1:00am at home so I didn't have any docs available :)
That said, is it true that it is broken? If so, I'll look at it some more and figure it out.
BTW, no need to use tst. The mov instruction earlier should save the flag. If it's lost on the rsc, then set the flag on that instruction and check against the zero flag instead. Again, just blind coding, but the results should be really close to working if not already.
#17348 - Lupin - Sat Mar 06, 2004 7:09 pm
poslundc: the function shows how to plot two 8-bit pixels at once, and that's why it is important for the color data type to be 16-bit. Plotting single 8-bit pixels in mode 4 makes no sense to me because it is a slowdown.
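A minimal sketch of that two-pixels-per-write idea, assuming a Mode 4 frame addressed as halfwords (120 per row). The name m4_plot_pair and its parameters are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: writes the pixels at (x, y) and (x + 1, y); x must be even. */
static void m4_plot_pair(uint16_t *buf, unsigned x, unsigned y,
                         unsigned left, unsigned right)
{
    assert((x & 1u) == 0 && x < 240 && y < 160);
    /* The left pixel sits in the low byte, the right pixel in the high byte. */
    buf[(y * 240u + x) >> 1] = (uint16_t)((left & 0xFFu) | ((right & 0xFFu) << 8));
}
```

Because both bytes are supplied, no read of the old halfword is needed; that is where the saving over a single-pixel plot comes from.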
#17354 - Miked0801 - Sat Mar 06, 2004 8:42 pm
Ok, at work - have a compiler and imagine that, my syntax was absolute crap. Here's a version that is tested to work.
Code: |
PlotPixel:
add r3,r3,r1, lsl #8 @ Get x240 by 256 - 16
sub r3,r3,r1, lsl #4 @ ...
movs r0,r0, lsr #1 @ Place low bit on Carry (shifting X off by 1)
add r3,r3,r0, lsl #1 @ Add x (without low bit) back into base address
ldrcsb r0,[r3] @ Read the low byte if Carry set
ldrccb r0,[r3,#1] @ Read the high byte if Carry Clear
orrcs r0,r0,r2, lsl #8 @ if Carry Set, move color value high
orrcc r0,r2,r0, lsl #8 @ else, color low, video value high
strh r0,[r3] @ store the result with base address only.
bx r14 @ return
|
There is 1 more cycle to be had here - I just know it. I'm using mov, I've got 2 wasted cycles on nops, and I'm not using a register for offsetting on my loads/stores. Anyways, enjoy.
#17370 - FluBBa - Sat Mar 06, 2004 11:56 pm
Ok, thought that one over and came up with this...
Code: |
PlotPixel:
rsb r1,r1,r1,#lsl 4 @ Get Yx15
eor r0,r0,#1 @read the other byte
add r3,r3,r0 @ Add X to address
tst r0,r0,#1 @ test odd address.
ldrb r0,[r3,r1,#lsl 4]! @ Read the byte (Y*15*16=Y*240), writeback adr to r3
orrne r0,r0,r2,#lsl 8 @ if Carry, move color value high
orreq r0,r2,r0,#lsl 8 @ if not, color low, video value high
strh r0,[r3] @ store the result with base address only.
bx r14 @ return
|
Not much different in timing from Miked0801's though.
#17371 - LOst? - Sun Mar 07, 2004 12:02 am
I did this for my GDI driver in my Windows XP port. It's only a Put Pixel and it is used for drawing lines and circles. I haven't found a way to move a whole buffered picture that isn't ^2 though.
Code: |
void Display::PutPixel (int x, int y, u8 PalEntry)
{
/** Delta check start **/
int x1;
int y1;
int x2;
int y2;
int deltax;
int deltay;
x1 = ClippingRect.left;
y1 = ClippingRect.top;
x2 = ClippingRect.right;
y2 = ClippingRect.bottom;
// First source check
if (x2 - x1 <= 0 || y2 - y1 <= 0)
return;
// Start and stop source correction
if (x1 < 0)
x1 = 0;
if (y1 < 0)
y1 = 0;
if (240 < x2)
x2 = 240;
if (160 < y2)
y2 = 160;
// Calculate source deltas
deltax = x2 - x1;
deltay = y2 - y1;
// Final check
if (deltax <= 0 || deltay <= 0)
return;
/** Delta check end **/
u16 color;
if (x1 <= x && y1 <= y && x < x2 && y < y2)
{
if ((x & 1) == 0)
{
color = (PalEntry & 0xFF);
VirtualScreen [y * 120 + (x >> 1)] = (VirtualScreen [y * 120 + (x >> 1)] & 0xFF00) | color;
}
else
{
color = (PalEntry & 0xFF) << 8;
VirtualScreen [y * 120 + (x >> 1)] = (VirtualScreen [y * 120 + (x >> 1)] & 0xFF) | color;
}
}
}
|
And it worked pretty good after all, which made me happy :)
/LOst
#17389 - Miked0801 - Sun Mar 07, 2004 5:24 am
Ok, I knew there was an extra cycle to be had :) Yours is 11% faster with the extra cycle gone (FluBBa). Well done. :)
#17396 - Miked0801 - Sun Mar 07, 2004 9:18 am
Ok, found a problem. When X is even (0,2,4...), you eor it to odd, then write that address back to r3. Later you're halfword storing to that odd address, which is (according to my documentation and experience) illegal. We'll have to try again - see the same thread in the asm section :)
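The three byte offsets involved can be written out in C to make the problem visible. This is only a sketch of the address arithmetic; the function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of the pixel itself within a Mode 4 frame. */
static uint32_t m4_pixel_offset(uint32_t x, uint32_t y)
{
    assert(x < 240 && y < 160);
    return y * 240u + x;
}

/* Byte offset of the neighbour sharing the halfword (the byte that is read and kept). */
static uint32_t m4_neighbour_offset(uint32_t x, uint32_t y)
{
    return m4_pixel_offset(x, y) ^ 1u;
}

/* Even byte offset of the halfword that has to be stored. */
static uint32_t m4_store_offset(uint32_t x, uint32_t y)
{
    return m4_pixel_offset(x, y) & ~1u;
}
```

For an even X the neighbour offset is odd, so it is fine for the byte load but cannot be reused as the address of the halfword store; the store needs the offset with the low bit cleared.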
#17402 - batblaster - Sun Mar 07, 2004 12:18 pm
This is a faster way to write a "C" version:
Code: |
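static inline void PutPixel(u16 *pScreen, u32 X, u32 Y, u16 Color)
{
/* Find the halfword that holds the pixel (two pixels per u16). */
u16 *StoreHere = &pScreen[((Y*240)+X)>>1];
if(X&1) /* odd X: replace the high byte, keep the low byte */
*StoreHere = (u16)(((Color & 0x00FF)<<8) | (*StoreHere & 0x00FF));
else /* even X: replace the low byte, keep the high byte */
*StoreHere = (u16)((*StoreHere & 0xFF00) | (Color & 0x00FF));
}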
|
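Since the original question also asked about drawLine, here is a sketch of a horizontal line built on the same read-modify-write idea. HLine is a made-up helper, the PutPixel copy is repeated only so the block stands on its own, and the u16/u32 typedefs are assumed to match the usual GBA headers.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t u16;
typedef uint32_t u32;

/* Single-pixel plot with the same signature as the routine above. */
static void PutPixel(u16 *pScreen, u32 X, u32 Y, u16 Color)
{
    u16 *StoreHere = &pScreen[((Y * 240) + X) >> 1];
    if (X & 1)
        *StoreHere = (u16)(((Color & 0xFF) << 8) | (*StoreHere & 0x00FF));
    else
        *StoreHere = (u16)((*StoreHere & 0xFF00) | (Color & 0xFF));
}

/* Hypothetical helper: horizontal line covering x0..x1 inclusive on row Y. */
static void HLine(u16 *pScreen, u32 x0, u32 x1, u32 Y, u16 Color)
{
    assert(x0 <= x1 && x1 < 240 && Y < 160);
    u16 pair = (u16)((Color & 0xFF) | ((Color & 0xFF) << 8));
    u32 x = x0;
    if (x & 1)                      /* odd start: one read-modify-write */
        PutPixel(pScreen, x++, Y, Color);
    for (; x + 1 <= x1; x += 2)     /* aligned middle: two pixels per store */
        pScreen[((Y * 240) + x) >> 1] = pair;
    if (x <= x1)                    /* even end: one read-modify-write */
        PutPixel(pScreen, x, Y, Color);
}
```

Only the two ends of the span can need the odd/even check; everything in between is written as whole halfwords, which avoids most of the per-pixel overhead discussed in this thread.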