#161492 - Jeremysr - Tue Aug 05, 2008 1:41 am
I'm making an Image class and need a way to draw an image zoomed out. Zooming in is easy, you just take each pixel and make a 2x2 (or whatever) square filled with the colour of that pixel. But to zoom out I guess you have to take a 2x2 (or larger) square of pixels from the image, and convert those 4 pixels into 1 pixel. So I guess my question is how would I take the four colours of the 2x2 square of pixels and change it into 1 pixel?
If the way I explained it was confusing, here's an example. This is an image of an "X" with 0 representing black and 1 representing white
1100000011
1100000011
0011001100
0011001100
0000110000
0000110000
0011001100
0011001100
1100000011
1100000011
I want to change it to:
10001
01010
00100
01010
10001
See how the 4 highlighted 1's are changed into a single 1 in the lower image, and the highlighted 0's are changed into a single 0? What if the 4 pixels were all different colours? Then what would the single pixel in the bottom image be?
_________________
viewsourcecode
#161493 - Dwedit - Tue Aug 05, 2008 1:48 am
The two most common ways to shrink an image by 2x are:
* Take one pixel from the four and use that ("Nearest Neighbor")
* Take the average of all four pixels (linear interpolaton)
Remember to average the R G and B components separately of course.
There are other methods of interpolation as well.
_________________
"We are merely sprites that dance at the beck and call of our button pressing overlord."
#161494 - Jeremysr - Tue Aug 05, 2008 2:01 am
Ok, I kind of thought of both those methods actually, I just thought the first method wouldn't look good and the second one I couldn't think of how to "put 4 colours together". I guess I'll try both methods and see which one works best for the image I want to be scaled.
Thanks!
_________________
viewsourcecode
#161498 - gauauu - Tue Aug 05, 2008 3:26 am
And to state the obvious: it also depends on how much time/processor power you have available and how much image data you are working with. The linear interpolation would be significantly slower than the nearest neighbor. Of course, depending on what you're doing, that "significantly slower" could be pretty insignificant ;-)
#161500 - Jeremysr - Tue Aug 05, 2008 3:55 am
gauauu wrote: |
And to state the obvious: it also depends on how much time/processor power you have available and how much image data you are working with. The linear interpolation would be significantly slower than the nearest neighbor. Of course, depending on what you're doing, that "significantly slower" could be pretty insignificant ;-) |
Yes, I think I'll have to go with the Nearest Neighbour method. My image is 4 times as big as the DS screen, and my RGB values are packed into 2 bytes so a lot of time would be spent unpacking them.
_________________
viewsourcecode
#161517 - Dwedit - Tue Aug 05, 2008 7:34 pm
Not that hard to unpack and average 15 bit color...
Code: |
static __inline u16 average_color(u16 a, u16 b, u16 c, u16 d)
{
int redsum,greensum,bluesum;
redsum=(a&0x31) + (b&0x31) + (c&0x31) + (d&0x31);
greensum=(a&0x3E0) + (b&0x3E0) + (c&0x3E0) + (d&0x3E0);
bluesum=(a&0x7C00) + (b&0x7C00) + (c&0x7C00) + (d&0x7C00);
return ((redsum/4)&0x31) + ((greensum/4)&0x3E0) + ((bluesum/4)&0x7C00);
}
|
No, dividing by 4 isn't slow, any decent compiler turns a division by a power of two into a bitshift.
_________________
"We are merely sprites that dance at the beck and call of our button pressing overlord."
#161524 - DekuTree64 - Tue Aug 05, 2008 10:43 pm
If you don't mind a little darkening of the image, you can even do all the channels at once:
Code: |
static __inline u16 average_color(u16 a, u16 b, u16 c, u16 d)
{
const u16 mask = RGB(7,7,7);
return (a/4 & mask) + (b/4 & mask) + (c/4 & mask) + (d/4 & mask);
} |
Normally each channel ranges 0-31, so after div by 4, they're 0-7. But because of the divide, the low 2 bits of each channel gets shifted into the upper 2 bits of the one below it, so those bits have to be masked off.
You can make it even faster by putting a and b in one u32, and c and d in another u32 so you can add 2 pixels at once, and then add those sums together.
Code: |
static __inline u16 average_color(u32 ab, u32 cd)
{
const u32 mask = RGB(7,7,7) | (RGB(7,7,7) << 16);
u32 temp = (ab/4 & mask) + (cd/4 & mask);
return (u16)(temp + (temp >> 16));
} |
There's an even better trick though, that won't cause any darkening, and can be used with alpha blending too. I call it the poor man's MMX. It's not much faster than Dwedit's version unless you do 2 pixels at once, but basically the idea is to separate every other channel so there is an unused space between each value. So if you have a pair of pixels in a u32 as bgrbgr, you separate them like 0g0b0r and b0r0g0. Then you can do some multiplies and shifts and adds that affect all 3 values in each u32, and then OR them back together.
_________________
___________
The best optimization is to do nothing at all.
Therefore a fully optimized program doesn't exist.
-Deku
#161529 - simonjhall - Wed Aug 06, 2008 12:00 am
Fantastic stuff! I'll have to remember that! However, (it's late here, and I'm tired) how is there darkening in the first version of yours?
_________________
Big thanks to everyone who donated for Quake2
#161530 - Dwedit - Wed Aug 06, 2008 12:09 am
It's darkening because you divide by 4 before adding them together.
_________________
"We are merely sprites that dance at the beck and call of our button pressing overlord."
#161531 - DekuTree64 - Wed Aug 06, 2008 12:23 am
Yeah, if all pixels were white (31), you still end up with 28 because you're effectively setting the low 2 bits to 0 before adding.
Another trick to sort of fix darkening problems like that, is to OR the upper bits of each channel into the lower bits, so 0-7 scales somewhat linearly to 0-31 (with a bit of randomness because those lower bits may not be 0 before the OR):
Code: |
static __inline u16 average_color(u32 ab, u32 cd)
{
const u32 mask = RGB(7,7,7) | (RGB(7,7,7) << 16);
u32 temp = (ab/4 & mask) + (cd/4 & mask);
temp += temp >> 16;
temp |= (temp >> 3) & RGB(3,3,3);
return (u16)temp;
} |
_________________
___________
The best optimization is to do nothing at all.
Therefore a fully optimized program doesn't exist.
-Deku
#161533 - TwentySeven - Wed Aug 06, 2008 12:54 am
Neat discussion!
Assuming the obvious, you can't do this offline and just have an asset stored at the correct resolution?
#161554 - keldon - Wed Aug 06, 2008 7:57 am
You can also use the classic ((coloura & RGB(30,30,30))/2)+((colourb & RGB(30,30,30))/2)) in two steps for the four pixels. You can also use (colour & RGB(31,0,31)) | ((colour & RGB (0,31,0)) << 16) to add and shift freely in full comfort of integer precision!
p.s. that's a pretty neat trick to scale Deku ... oh and I see you already mentioned seperating (poor-man MMX), only your way is better as it deals with two colours at once.
#161559 - simonjhall - Wed Aug 06, 2008 8:54 am
Dwedit wrote: |
It's darkening because you divide by 4 before adding them together. |
Ah, that's what I thought. Though, isn't that more like it reduces the colour resolution rather than the brightness?
_________________
Big thanks to everyone who donated for Quake2
#161560 - keldon - Wed Aug 06, 2008 9:17 am
(31+31) / 2 = 31
(31/2) + (31/2) = 30
#161562 - kusma - Wed Aug 06, 2008 10:21 am
Here's my attempt:
Two pixels in parallel, accurate result (minus rounding). Not tested.
Code: |
u32 avg4(u32 a, u32 b, u32 c, u32 d)
{
u32 e = a & (RGB(31,0,31) | (RGB( 0,31,0) << 16));
e += b & (RGB(31,0,31) | (RGB( 0,31,0) << 16));
e += c & (RGB(31,0,31) | (RGB( 0,31,0) << 16));
e += d & (RGB(31,0,31) | (RGB( 0,31,0) << 16));
u32 f = (a & (RGB( 0,31,0) | (RGB(31,0,31) << 16))) >> 2;
f += (b & (RGB( 0,31,0) | (RGB(31,0,31) << 16))) >> 2;
f += (c & (RGB( 0,31,0) | (RGB(31,0,31) << 16))) >> 2;
f += (d & (RGB( 0,31,0) | (RGB(31,0,31) << 16))) >> 2;
return (e >> 2) | f;
}
|
The idea is to mask out every second color channel, and doing the additions in two passes to get all channels done.
#161563 - DekuTree64 - Wed Aug 06, 2008 10:48 am
Kusma: I'm confused. Shouldn't there only be 2 inputs (4 pixels as 2 pairs), and add the upper and lower 16 bits together at some point?
And Keldon, thanks for the single-pixel variation :)
For some reason I hadn't thought of putting the pixel's own green value in the upper bits. Will be handy if I ever need to do per-pixel alpha in software.
_________________
___________
The best optimization is to do nothing at all.
Therefore a fully optimized program doesn't exist.
-Deku
#161564 - Cearn - Wed Aug 06, 2008 11:32 am
DekuTree64 wrote: |
Yeah, if all pixels were white (31), you still end up with 28 because you're effectively setting the low 2 bits to 0 before adding.
Another trick to sort of fix darkening problems like that, is to OR the upper bits of each channel into the lower bits, so 0-7 scales somewhat linearly to 0-31 (with a bit of randomness because those lower bits may not be 0 before the OR):
Code: | static __inline u16 average_color(u32 ab, u32 cd)
{
const u32 mask = RGB(7,7,7) | (RGB(7,7,7) << 16);
u32 temp = (ab/4 & mask) + (cd/4 & mask);
temp += temp >> 16;
temp |= (temp >> 3) & RGB(3,3,3);
return (u16)temp;
} |
|
Joining in the choir: that is really very clever.
kusma wrote: |
Code: | u32 avg4(u32 a, u32 b, u32 c, u32 d)
{
u32 e = a & (RGB(31,0,31) | (RGB( 0,31,0) << 16));
e += b & (RGB(31,0,31) | (RGB( 0,31,0) << 16));
e += c & (RGB(31,0,31) | (RGB( 0,31,0) << 16));
e += d & (RGB(31,0,31) | (RGB( 0,31,0) << 16));
u32 f = (a & (RGB( 0,31,0) | (RGB(31,0,31) << 16))) >> 2;
f += (b & (RGB( 0,31,0) | (RGB(31,0,31) << 16))) >> 2;
f += (c & (RGB( 0,31,0) | (RGB(31,0,31) << 16))) >> 2;
f += (d & (RGB( 0,31,0) | (RGB(31,0,31) << 16))) >> 2;
return (e >> 2) | f;
} |
|
That doesn't quite work. Using an R_B_G_ mask (0x7C1F03E0), blue is in bits 26-30. The result of the four additions would span bits 26-32; it would overflow. Second, you still need to apply a mask after the division by 4.
This should work:
Code: |
//! Rotate bits right. Yes, this does lead to a ror instruction.
static inline u32 ROR(u32 x, uint ror)
{ return (x<<(32-ror)) | (x>>ror); }
//! Averages 4 colors.
u16 color_average(u16 a, u16 b, u16 c, u16 d)
{
u32 ab, cd, avg;
const u32 mask= 0x03E07C1F;
// Combine 2 colors into one u32
ab= a | b<<16;
cd= c | d<<16;
// Split into _G_R_B and add
ab= (mask & ab) + (mask & ROR(ab, 16));
cd= (mask & cd) + (mask & ROR(cd, 16));
// Average _G_R_B
avg= mask & ((ab+cd)>>2);
// Combine to RGB___ and cast to u16
return (avg | avg<<16)>>16;
}
|
I think it's possible to round as well by adding mask/2 somewhere, but I haven't tested that yet. Oh, and what about the alpha-bit.
About alpha blending, I did a routine for that a few years ago here.
On a final note: sometimes GCC produces very silly ARM code for ANDing large constants. Instead of loading the constant once and using a single instruction for the AND (or AND NOT), it'll split it over multiple byte-wise ANDs. YMMV.
#161565 - kusma - Wed Aug 06, 2008 11:59 am
DekuTree64 wrote: |
Kusma: I'm confused. Shouldn't there only be 2 inputs (4 pixels as 2 pairs), and add the upper and lower 16 bits together at some point?
|
I was actually thinking that you'd do two complete sets in parallel, but I guess that complicates the fetching-logic a bit.
Cearn wrote: |
That doesn't quite work. Using an R_B_G_ mask (0x7C1F03E0), blue is in bits 26-30. The result of the four additions would span bits 26-32; it would overflow. |
Notice the right-shift by two before summing for the value f.
Cearn wrote: |
Second, you still need to apply a mask after the division by 4.
|
Good point, that's a bug. Should be pretty easily solvable ;)
#161567 - keldon - Wed Aug 06, 2008 3:01 pm
Hmm; I only just cottoned on that those extra 5 bits allow for a 5 bit (positive) multiplication on each colour component.
#161568 - kusma - Wed Aug 06, 2008 3:15 pm
Code: |
u32 avg4(u32 a, u32 b, u32 c, u32 d)
{
const u32 e_mask = RGB(31,0,31) | (RGB( 0,31,0) << 16);
u32 e = a & e_mask;
e += b & e_mask;
e += c & e_mask;
e += d & e_mask;
const u32 f_mask = RGB( 0,31,0) | (RGB(31,0,31) << 16);
u32 f = (a & f_mask) >> 2;
f += (b & f_mask) >> 2;
f += (c & f_mask) >> 2;
f += (d & f_mask) >> 2;
return ((e >> 2) & e_mask) | (f & f_mask);
}
|
There. Now it's even tested! Here's the test-code (Yes yes, a bit dirty.. but who cares?):
Code: |
u32 avg4_ref(u32 a, u32 b, u32 c, u32 d)
{
u32 r_low = (GET_R(a) + GET_R(b) + GET_R(c) + GET_R(d)) / 4;
u32 g_low = (GET_G(a) + GET_G(b) + GET_G(c) + GET_G(d)) / 4;
u32 b_low = (GET_B(a) + GET_B(b) + GET_B(c) + GET_B(d)) / 4;
u32 r_hi = (GET_R(a >> 16) + GET_R(b >> 16) + GET_R(c >> 16) + GET_R(d >> 16)) / 4;
u32 g_hi = (GET_G(a >> 16) + GET_G(b >> 16) + GET_G(c >> 16) + GET_G(d >> 16)) / 4;
u32 b_hi = (GET_B(a >> 16) + GET_B(b >> 16) + GET_B(c >> 16) + GET_B(d >> 16)) / 4;
return RGB(r_low, g_low, b_low) | (RGB(r_hi, g_hi, b_hi) << 16);
}
#include <stdlib.h>
int main(int argc, char* argv[])
{
for (int i = 0; i < 10000000; ++i)
{
u32 a = RGB(rand() & 31, rand() & 31, rand() & 31) | RGB(rand() & 31, rand() & 31, rand() & 31) << 16;
u32 b = RGB(rand() & 31, rand() & 31, rand() & 31) | RGB(rand() & 31, rand() & 31, rand() & 31) << 16;
u32 c = RGB(rand() & 31, rand() & 31, rand() & 31) | RGB(rand() & 31, rand() & 31, rand() & 31) << 16;
u32 d = RGB(rand() & 31, rand() & 31, rand() & 31) | RGB(rand() & 31, rand() & 31, rand() & 31) << 16;
u32 res = avg4(a,b,c,d);
u32 ref = avg4_ref(a,b,c,d);
if (res != ref)
{
printf("err - got <%02x %02x %02x>, expected <%02x %02x %02x>\n",
GET_R(res), GET_G(res), GET_B(res),
GET_R(ref), GET_G(ref), GET_B(ref)
);
}
}
return 0;
}
|
#161578 - tepples - Wed Aug 06, 2008 7:33 pm
You'll have to mask e and f before returning.
Here's tested code to perform palette fading, copied straight from the source code of luminesweeper. You should be able to adapt the general algorithm for this 2x2 linear shrink.
Code: |
void set_palette_dimmed(void *in_dst,
const void *in_src,
unsigned int dim_to,
unsigned int dim_level)
{
u32 *dst = in_dst;
const u32 *src = in_src;
unsigned int n;
unsigned int to_expanded = (dim_to & 0xFFFF) | (dim_to << 16);
unsigned int to_0 = ((to_expanded & 0x7C1F03E0) >> 4) * dim_level;
unsigned int to_1 = ((to_expanded & 0x03E07C1F)) * dim_level;
/* optimization for fade all the way out */
if(dim_level == 0)
{
dma_memcpy(in_dst, in_src, 32);
return;
}
dim_level = 16 - dim_level;
for(n = 8; n > 0; n--)
{
unsigned int from = *src++;
unsigned int from_0 = ((from & 0x7C1F03E0) >> 4) * dim_level;
unsigned int from_1 = ((from & 0x03E07C1F)) * dim_level;
from_0 = ((from_0 + to_0)) & 0x7C1F03E0;
from_1 = ((from_1 + to_1) >> 4) & 0x03E07C1F;
*dst++ = from_0 | from_1;
}
}
|
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.
#161582 - kusma - Wed Aug 06, 2008 8:02 pm
tepples wrote: |
You'll have to mask e and f before returning. |
...like the last version I posted does...