gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback machine copies. A new forum can be found here.

Graphics > 2D Bitmap Drawing Optimization

#157667 - thegamefreak0134 - Tue May 27, 2008 7:42 pm

I have the following code in an image editing program that I'm working on for DS:

Code:

for (i = 0; i < maxDraw; i++)
      {
         for (j = 0; j < maxDraw; j+=2)
         {
            if (
            i + aOffY < height &&
            j + aOffX < width
            )
            {
               
               BG_GFX[i * 128 + (j/2)] = (picture->getIndex(j + 1 + aOffX, i + aOffY) << 8) + picture->getIndex(j + aOffX, i + aOffY);
               
            } else {
               
               BG_GFX[i * 128 + (j/2)] = 0;
               
            }
         }
      }


picture is a pcx_file object, from a class I wrote myself to handle pcx files. The getIndex() member of that object looks like this:

Code:

int pcx_file::getIndex(int x, int y)
{
   if (x >= 0 && y >= 0 && x < width && y < height )
   {
   //The file position depends heavily on the color depth.
   if (colorDepth == p24bit)
   {
      return -1; //BAD
   }
   if (colorDepth == p256)
   {
      return imageData[y * header.bytesPerLine * header.colorPlanes + x];
   }
   if (colorDepth == p16)
   {
      return
      ( ( ( imageData[y * header.bytesPerLine * header.colorPlanes + header.bytesPerLine * 0 + (x / 8)] & (0x80 >> (x % 8) ) ) >> (7 - (x % 8)) ) << 0 ) +
      ( ( ( imageData[y * header.bytesPerLine * header.colorPlanes + header.bytesPerLine * 1 + (x / 8)] & (0x80 >> (x % 8) ) ) >> (7 - (x % 8)) ) << 1 ) +
      ( ( ( imageData[y * header.bytesPerLine * header.colorPlanes + header.bytesPerLine * 2 + (x / 8)] & (0x80 >> (x % 8) ) ) >> (7 - (x % 8)) ) << 2 ) +
      ( ( ( imageData[y * header.bytesPerLine * header.colorPlanes + header.bytesPerLine * 3 + (x / 8)] & (0x80 >> (x % 8) ) ) >> (7 - (x % 8)) ) << 3 ) ;
      
      
      
   }
   
   }
   //If we get here, something has gone wrong.
   //Return index 0 for all out of bounds things, or undefined things.
   return 0;
}


PCX files are weird, so the part of getIndex() that handles 16 color palette indexes is reeeally strange looking. That's not the part I'm worried about though, namely because I'm trying not to think about it yet. I'm using a 256 color image, and when the image data takes up the entire screen, it can draw the image in about 4 frames.

I can't really code this in a way that would eliminate the for loop, as it would break my PCX class and make it not portable anymore. (The edited image still gets saved properly, with 8 bits per color, even though the DS only displays it with 5.)

Is there an obvious optimization that I'm missing here? I've restructured the code quite a bit so that as little is calculated inside the loop as possible, sacrificing code size in a couple of places elsewhere. I'd just like to see if I can get it running faster than that if possible.

I can clarify any parts of the code if necessary, and I guarantee that it does, in fact, work without any bugs.

Thanks!

-thegamefreak
_________________
What if the hokey-pokey really is what it's all about?

[url=http:/www.darknovagames.com/index.php?action=recruit&clanid=1]Support Zeta on DarkNova![/url]

#157671 - silent_code - Tue May 27, 2008 8:00 pm

i know, this is "slightly" what you *didn't* ask for, but well, here i go:

try not to recalculate stuff you already have the answer to... ;^D

i see some multiplications that are redundant, like "y * header.bytesPerLine * header.colorPlanes", where you don't need to calculate it every time.

although it is possible that the compiler already does it for you (i'm not a compiler expert), i wouldn't bet on it.

try to think in that direction as you step through your code. look at what is being calculated when and if that is really necessary.
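a rough sketch of what hoisting that row calculation could look like. `Image8`, `copyRowNaive` and `copyRowHoisted` are made-up names for illustration, not part of the pcx_file class, and `pitch` stands in for `header.bytesPerLine * header.colorPlanes`:

```cpp
#include <cstdint>

// Hypothetical stand-in for a decoded 8-bit image (not the real pcx_file class).
struct Image8 {
    const std::uint8_t* data;  // one palette index per byte
    int pitch;                 // bytes per scanline
};

// Recomputes y * pitch for every pixel.
inline void copyRowNaive(const Image8& img, int y, std::uint8_t* dst, int count)
{
    for (int x = 0; x < count; ++x)
        dst[x] = img.data[y * img.pitch + x];
}

// Computes the row start once; the loop body is just a load and a store.
inline void copyRowHoisted(const Image8& img, int y, std::uint8_t* dst, int count)
{
    const std::uint8_t* row = img.data + y * img.pitch;  // loop-invariant
    for (int x = 0; x < count; ++x)
        dst[x] = row[x];
}
```

both functions produce the same output; whether the second one is actually faster depends on what the compiler already did with the first, so it is worth measuring.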

i haven't looked very closely, but it appears to me that you can eliminate the inner for loop and maybe you could even use a pointer instead of the array indexing.

have you tried profiling your code? simonjhall has released his profiler (search the forum) and others may have as well. it's worth checking out where you're spending most of your cycles.

happy optimizing! :^)

ps: modulo is slooooooooooooooooooooooooooooooooooooooooooooooooooow! ;^D
_________________
July 5th 08: "Volumetric Shadow Demo" 1.6.0 (final) source released
June 5th 08: "Zombie NDS" WIP released!
It's all on my page, just click WWW below.


Last edited by silent_code on Tue May 27, 2008 8:09 pm; edited 1 time in total

#157673 - DensitY - Tue May 27, 2008 8:08 pm

Quote:
x / 8


^^ I'd avoid divides as well, esp like that as it isn't using the DS's hardware divider

(x >> 3) will be faster :)

and as silent_code said, pointer indexing would be faster

say for example

uint16 *screen = VRAM_A;

screen += Y*width + X;

(note this is just an example :p)
for (y = 0; y < something; y++)
{
   for (x = 0; x < something; x++)
   {
      (*screen++) = pixel;
   }
   screen += width - something;
}

that example isn't really that optimal but you get the idea :)


Last edited by DensitY on Tue May 27, 2008 8:11 pm; edited 1 time in total

#157675 - silent_code - Tue May 27, 2008 8:09 pm

the compiler already takes care of that... (division by power of 2's) ;^)

EDIT: i admit, i also use it explicitly, though. :^D
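
a small sketch of the equivalence being discussed, assuming unsigned coordinates. the helper names are made up for illustration. for unsigned values the shift and mask forms give exactly the same results as the divide and modulo; for a signed `int` that may be negative they can differ (division rounds toward zero, while a shift on typical compilers rounds down), so the compiler has to add a small fix-up unless the type is unsigned:

```cpp
#include <cstdint>

// Byte that holds pixel x in a 1-bit-per-pixel row: same value as x / 8.
constexpr std::uint32_t byteIndex(std::uint32_t x) { return x >> 3; }

// Position of pixel x inside that byte: same value as x % 8.
constexpr std::uint32_t bitInByte(std::uint32_t x) { return x & 7u; }

// Right-shift that brings pixel x (most significant bit first) down to bit 0.
constexpr std::uint32_t shiftFor(std::uint32_t x) { return 7u - (x & 7u); }

static_assert(byteIndex(13) == 13 / 8, "shift matches divide");
static_assert(bitInByte(13) == 13 % 8, "mask matches modulo");
static_assert(shiftFor(13) == 2, "pixel 13 sits in bit 2 of byte 1");
```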


Last edited by silent_code on Tue May 27, 2008 8:19 pm; edited 2 times in total

#157676 - DensitY - Tue May 27, 2008 8:11 pm

silent_code wrote:
the compiler already takes care of that... ;^)


my experience with GCC has led me never to trust it lol :p

#157677 - thegamefreak0134 - Tue May 27, 2008 8:11 pm

I know of this issue, I've been working to fix areas like that in my code. I haven't messed with the 16 color mode quite yet (the code you referenced doesn't get run when running a 256 color or a 24bit image) so I'm not worried about it quite yet. I'll update that code anyway if I can though.

I don't think the compiler can quite optimize something like that, as it has to assume that in-between calls to that function, those values can change. The fact that I use the multiply is irrelevant, it would still have to do it at least once per pixel, although I'll grant that it's better than 4. What I can do there is calculate that value once and only once when I load/create the image. That should speed that section up a bit.

Hang on a second and I'll get an approximate speed calc for that code, it should be a lot slower than the 4 frames I'm getting with 256 colors

*goes to test*

...interesting...

Using the code that you mentioned (running a 16 color image) still seems to draw at about the same speed. So I can assume that the 16 color code is... well... fast enough I guess. Odd... That's the code above, I didn't change it at all.

-gamefreak

*EDIT* EEP, you guys are fast! I was aiming this at silent_code's first post.

OOhh.... Pointer indexing... I never really thought of that. I feel so silly. ^_^ I'll go try that now.

#157681 - silent_code - Tue May 27, 2008 8:22 pm

this indicates that some other section might be slowing it down!
but not knowing your code, it's hard to tell (i'm not an expert, just a programmer.) :^)

#157683 - DensitY - Tue May 27, 2008 8:27 pm

Quote:
*EDIT* EEP, you guys are fast! I was aiming this at silent_code's first post.


nothing else better to do before I head off to work *rumbles* ;)

#157684 - thegamefreak0134 - Tue May 27, 2008 8:35 pm

OK, I changed the code to use a pointer index instead of calculating on the fly, no effect. The tearing on the screen is still in the same spot. There's something else going on here.

It's entirely possible that the DS simply can't handle it, but I'd like to believe there's a way I can make it think otherwise. It's a bit of an odd situation to tell the truth.

This is the edited code:

Code:

//New code:
      u16* screen = BG_GFX;
      for (i = 0; i < maxDraw; i++)
      {
         for (j = 0; j < maxDraw; j+=2)
         {
            if (
            i + aOffY < height &&
            j + aOffX < width
            )
            {
               
               (*screen++) = (picture->getIndex(j + 1 + aOffX, i + aOffY) << 8) + picture->getIndex(j + aOffX, i + aOffY);
               
            } else {
               
               (*screen++) = 0;
               
            }
         }
         screen += 128  - (j>>1);
      }


I will note that some of the code causes the drawing to become optimized when the display is zoomed. (That's the maxDraw thing.) Since I'm designing this to be a pixel perfect editor (the model is Usenti, actually) and it works fine when zoomed, only slowing down when viewing the image at normal size, I'm tempted to just leave it be. However, if there's a way, I still want to try to get it working faster. ^_^ It's going to haunt me forever.

-thegamefreak

*EDIT* That's not to say it hasn't been effective... I counted again, and it appears to be drawing in 3 frames instead of 4. Neat.

#157687 - silent_code - Tue May 27, 2008 9:10 pm

it's definitely time to profile. what else are you doing each frame? (yes, i know you said it was ok when zoomed in, but how "ok" is it?)

#157691 - thegamefreak0134 - Tue May 27, 2008 9:20 pm

The only other things I do every frame involve checking for things like menus that need to pop out. Those have been going just fine, full framerate on all of them. I can perform this same drawing operation on other, smaller images of the same type, and I can even drag a large image offscreen almost and it will draw in a single frame. The issue was when a lot of pixels actually had to be drawn, which is how I knew it was this piece of code.

The nifty part? Large images now draw in a single frame. I decided to outsmart my class a little bit. The class is very nifty on a PC, and very fast for a class, but the following code works without using the class call. Losing that overhead let me put a pointer in place of a calculation, which was the speed boost I needed.

Code:

//New code:
      u16* screen = BG_GFX;
      int newHeight = height - aOffY;
      int newWidth = width - aOffX;
      unsigned char* pImage = &picture->imageData[aOffY * picture->totalBytesPerLine + aOffX];
      
      for (i = 0; i < maxDraw; i++)
      {
         pImage = &picture->imageData[(aOffY + i) * picture->totalBytesPerLine + aOffX];
         for (j = 0; j < maxDraw; j+=2)
         {
            if (
            i < newHeight &&
            j < newWidth
            )
            {
               
               //(*screen++) = (picture->getIndex(j + 1 + aOffX, i + aOffY) << 8) + picture->getIndex(j + aOffX, i + aOffY);
               
               (*screen++) = ((*(pImage+1))<<8) + (*pImage);
               pImage+=2;
               
            } else {
               
               (*screen++) = 0;
               
            }
         }
         screen += 128  - (j>>1);
         
      }


The class is still there, I'm just grabbing a pointer to the bulk of image data (decoded from the file) and doing the conversion inside this loop, with more optimizations than I can do using a separate function call. That's how I want the class to work anyway, the functions are just there to make things easier. In the end, if you need to play with the data yourself, the class needs to allow that, especially for an image type that's designed to be extended.
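
One further variation on the same idea, as a sketch only: the bounds test can be moved out of the inner loop by working out once per row how many source pixels are visible. `blitRow8`, `visible` and `total` are hypothetical names, and the sketch assumes `total` is even, as `maxDraw` appears to be in the loop above:

```cpp
#include <cstdint>

// Copies one row of 8-bit indices into 16-bit words, two pixels per word
// (even pixel in the low byte), then zero-fills the rest of the row.
// visible = source pixels available on this row, total = pixels to draw (even).
inline void blitRow8(std::uint16_t* dst, const std::uint8_t* src,
                     int visible, int total)
{
    if (visible < 0) visible = 0;
    if (visible > total) visible = total;

    int j = 0;
    for (; j + 1 < visible; j += 2)          // full pairs, no per-pixel test
        *dst++ = static_cast<std::uint16_t>(src[j] | (src[j + 1] << 8));

    if (j < visible) {                        // odd trailing pixel, high byte 0
        *dst++ = src[j];
        j += 2;
    }

    for (; j < total; j += 2)                 // outside the image: index 0
        *dst++ = 0;
}
```

Called once per row with `visible = width - aOffX` (or 0 for rows past the bottom of the image), this should produce the same words as the loop above, except that it never reads the byte just past the last visible pixel.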

I still need to re-write this for 16 color images, as they're still really slow and, at the moment, not drawing at all because I was using this same code for both types. However, I think it's going quite well at the moment.

Heck, I might even have 16 color editing occur on a 256 color image, and do the conversion between the two when it's time to save and load. That's a quick fix though... ^_^

-gamefreal

#157694 - silent_code - Tue May 27, 2008 9:29 pm

yeah, *gamefreal* <lol> ;^D

it's good to see you finally made it. calling a function (twice) for every pixel that gets drawn is pretty "much", you're right. but i honestly didn't expect it to be *that* much! :^)

thanks for sharing!

i've learned something today: if you try really hard and ...
that's crap. i'm stopping this right now. :^D

#157695 - DensitY - Tue May 27, 2008 9:37 pm

awesome work there gamefreak!

I might try removing the array entry after start of the first for loop

pImage = &picture->imageData[(aOffY + i) * picture->totalBytesPerLine + aOffX];

replace it with
pImage = (picture->imageData + ((aOffY + i) * picture->totalBytesPerLine + aOffX));

#157699 - silent_code - Tue May 27, 2008 9:55 pm

what you might also want to try is: "(a + b) * c = ac + bc" ... only, if you could effectively turn it into "d = ac" and "d += c"... get what i mean? (i bet i'm too nerdy and you don't)... i mean in stuff like "(aOffY + i) * picture->totalBytesPerLine". be warned, i'm being unpredictably random today, so i didn't give it the thought it probably deserves! ;^D
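
spelled out as a sketch (made-up function name, plain byte buffer instead of the pcx class): the multiply happens once before the loop, and each row after that is reached with a single addition.

```cpp
#include <cstdint>

// Visits rows firstRow .. firstRow + rows - 1 and sums the first byte of each.
// The summing is only there to give the loop something observable to do.
inline unsigned firstColumnSum(const std::uint8_t* data, int pitch,
                               int firstRow, int rows)
{
    const std::uint8_t* row = data + firstRow * pitch;  // d = a * c, once
    unsigned sum = 0;
    for (int i = 0; i < rows; ++i) {
        sum += row[0];
        row += pitch;                                    // d += c, per row
    }
    return sum;
}
```

in the drawing loop the same pattern would replace the per-row `(aOffY + i) * picture->totalBytesPerLine` with a pointer that is bumped by `totalBytesPerLine` at the end of each row.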

#157706 - thegamefreak0134 - Tue May 27, 2008 10:25 pm

WAAAAAAAHHHH!!!

Code:

(*screen++) =
                  (
                  ( ( ( imageData[yValue + picture->header.bytesPerLine * 0 + ((j+aOffX) >> 3)] >> (7 - ((j+aOffX) & 7)) ) & 0x01) << 0) +
                  ( ( ( imageData[yValue + picture->header.bytesPerLine * 1 + ((j+aOffX) >> 3)] >> (7 - ((j+aOffX) & 7)) ) & 0x01) << 1) +
                  ( ( ( imageData[yValue + picture->header.bytesPerLine * 2 + ((j+aOffX) >> 3)] >> (7 - ((j+aOffX) & 7)) ) & 0x01) << 2) +
                  ( ( ( imageData[yValue + picture->header.bytesPerLine * 3 + ((j+aOffX) >> 3)] >> (7 - ((j+aOffX) & 7)) ) & 0x01) << 3)
                  ) +
                  ( (
                  ( ( ( imageData[yValue + picture->header.bytesPerLine * 0 + ((j+1+aOffX) >> 3)] >> (7 - ((j+1+aOffX) & 7)) ) & 0x01) << 0) +
                  ( ( ( imageData[yValue + picture->header.bytesPerLine * 1 + ((j+1+aOffX) >> 3)] >> (7 - ((j+1+aOffX) & 7)) ) & 0x01) << 1) +
                  ( ( ( imageData[yValue + picture->header.bytesPerLine * 2 + ((j+1+aOffX) >> 3)] >> (7 - ((j+1+aOffX) & 7)) ) & 0x01) << 2) +
                  ( ( ( imageData[yValue + picture->header.bytesPerLine * 3 + ((j+1+aOffX) >> 3)] >> (7 - ((j+1+aOffX) & 7)) ) & 0x01) << 3)
                  ) << 8)
                  ;


WAAAAAAAHHH!!!

OK, here's what this is supposed to do. First, this is how PCX files work with 16 color images. PCX files store everything in planes of color. This means that with a 24bit image, you have 3 planes, red, green, and blue. It stores the data for these planes in chunks per scanline. So, for 8 pixels in PCX 24bit, you get:

RRRRRRRRGGGGGGGGBBBBBBBB

in that order in the file. This is fine and dandy for 24bit images. For 16 color images, PCX files assume you are using 4 planes, similar to how an old 16 color monitor worked. So, and these are bits mind you, we have:

RRRRGGGGBBBBIIII...

etc. The trouble is, we don't use this method at all on the GBA or DS, we use a custom 16 color palette. PCX has space for this custom palette, but you still basically have to scan through the image data and do weird funky math and masking and shifting to get all 4 bits that represent the index for the color at the pixel you're looking at standing in a row with all of their ducks in order.

That being said, this code is complicated in its nature. I sense that somewhere, off in the distance, there is a magical faerie who knows how to snag these bits using fancy pointer stuff. I do not know that faerie in person, so perhaps someone here does? ^_^

-thegamefreak

PS: There are a lot of additions here and there, I'm having errors with the implementation as it stands, it's yelling at me when I try to drag it offscreen, which has nothing to do with the issue at hand, I'll fix it as we go.

*EDIT* I should note that header.bytesPerLine is per PLANE, not per scanline.
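
Stripped down to a single pixel, the lookup described above can be sketched as a small loop over the four planes. `planarIndex4` and `bytesPerPlane` are names made up for this sketch; `bytesPerPlane` plays the role of `header.bytesPerLine`, and `line` is assumed to point at plane 0 of the scanline:

```cpp
#include <cstdint>

// Reads the 4-bit palette index of pixel x from one planar scanline.
// Each plane holds 1 bit per pixel, most significant bit first;
// plane 0 supplies bit 0 of the index, plane 3 supplies bit 3.
inline unsigned planarIndex4(const std::uint8_t* line, int bytesPerPlane, int x)
{
    const int byte  = x >> 3;        // which byte within each plane
    const int shift = 7 - (x & 7);   // which bit within that byte
    unsigned index = 0;
    for (int plane = 0; plane < 4; ++plane)
        index |= ((line[plane * bytesPerPlane + byte] >> shift) & 1u) << plane;
    return index;
}
```

This computes the same value as one half of the big expression above, just with the byte offset and shift worked out once instead of four times.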

#157708 - silent_code - Tue May 27, 2008 10:34 pm

sorry if i'm wrong, but can't you just write that all into a 24bit color image buffer? i don't see the point in handling the image data so close to the file format. converting stuff when saving and loading will make your life so much easier. :^)

#157710 - thegamefreak0134 - Tue May 27, 2008 10:42 pm

This is true, and like I said before, I considered this.

Technically, it's faster to write into a 256 color buffer, since I can go between that format natively. I do write into a 24bit buffer when I'm working with that format. 16 color is the only one that really has issues. I'm seriously considering working with it in 256 color mode within the class, as it will make things a lot easier and faster all around.

The reason I try to keep it close to the file format is that keeping many copies of a 16 color image in RAM is far far easier than keeping many copies of the same image in 24bit color. I'm trying to keep this working in such a way that an undo system will be simple to implement, and thus memory is a concern for me. Now, I realize that this will likely be used for DS size sprite images in most cases, but that's beside the point, I still don't want it to crash if I can manage it.

-thegamefreak

#157712 - Cearn - Wed May 28, 2008 12:12 am

thegamefreak0134 wrote:
WAAAAAAAHHHH!!!

Code:

(*screen++) =
                  (
                  ( ( ( imageData[yValue + picture->header.bytesPerLine * 0 + ((j+aOffX) >> 3)] >> (7 - ((j+aOffX) & 7)) ) & 0x01) << 0) +
                  ( ( ( imageData[yValue + picture->header.bytesPerLine * 1 + ((j+aOffX) >> 3)] >> (7 - ((j+aOffX) & 7)) ) & 0x01) << 1) +
                  ( ( ( imageData[yValue + picture->header.bytesPerLine * 2 + ((j+aOffX) >> 3)] >> (7 - ((j+aOffX) & 7)) ) & 0x01) << 2) +
                  ( ( ( imageData[yValue + picture->header.bytesPerLine * 3 + ((j+aOffX) >> 3)] >> (7 - ((j+aOffX) & 7)) ) & 0x01) << 3)
                  ) +
                  ( (
                  ( ( ( imageData[yValue + picture->header.bytesPerLine * 0 + ((j+1+aOffX) >> 3)] >> (7 - ((j+1+aOffX) & 7)) ) & 0x01) << 0) +
                  ( ( ( imageData[yValue + picture->header.bytesPerLine * 1 + ((j+1+aOffX) >> 3)] >> (7 - ((j+1+aOffX) & 7)) ) & 0x01) << 1) +
                  ( ( ( imageData[yValue + picture->header.bytesPerLine * 2 + ((j+1+aOffX) >> 3)] >> (7 - ((j+1+aOffX) & 7)) ) & 0x01) << 2) +
                  ( ( ( imageData[yValue + picture->header.bytesPerLine * 3 + ((j+1+aOffX) >> 3)] >> (7 - ((j+1+aOffX) & 7)) ) & 0x01) << 3)
                  ) << 8)
                  ;


Not to be sarcastic, but have you ever heard of the term 'temporary variable'? As a rule, whenever you have lots of repeated terms, it's better to create local variables for these and use those instead:

Code:

// NOTE: untested
PCXHEADER *hdr= &picture->header;
u32 pitch= hdr->bytesPerLine;
u32 ofsA= j+aOffX, ofsB=ofsA+1, shiftA= ~ofsA&7, shiftB= ~ofsB&7;
u8 *imgLine= &imageData[yValue];


(*screen++) =
   (
      ( ( imgLine[pitch * 0 + ofsA/8] >> shiftA & 0x01) << 0) +
      ( ( imgLine[pitch * 1 + ofsA/8] >> shiftA & 0x01) << 1) +
      ( ( imgLine[pitch * 2 + ofsA/8] >> shiftA & 0x01) << 2) +
      ( ( imgLine[pitch * 3 + ofsA/8] >> shiftA & 0x01) << 3)
   ) +
   // extra parentheses: + binds tighter than <<, so only the second pixel is shifted
   ( (
      ( ( imgLine[pitch * 0 + ofsB/8] >> shiftB & 0x01) << 0) +
      ( ( imgLine[pitch * 1 + ofsB/8] >> shiftB & 0x01) << 1) +
      ( ( imgLine[pitch * 2 + ofsB/8] >> shiftB & 0x01) << 2) +
      ( ( imgLine[pitch * 3 + ofsB/8] >> shiftB & 0x01) << 3)
   ) << 8 );

Okay, it may not be faster, but at least you can read what it says now. And actually, it may in fact be faster as well. While the compiler can move some loop-invariant calculations/memory accesses outside the loop, it doesn't always do that. I have come across examples where loading things into local variables manually saved me half the calculation time.

I'd have to look into it more closely, but there's probably a way to speed this up by a large factor. Instead of doing things byte for byte, try grabbing a number of words of pixels, taking them apart and then reassembling them in a non-planar format.
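
A sketch of that idea, untested on hardware and using made-up names (`planarToChunkyRow`, `bytesPerPlane`). It works on one byte per plane at a time rather than whole words, so it is the simplest variant; a word-at-a-time version would follow the same pattern with wider loads:

```cpp
#include <cstdint>

// Converts one planar 4-bit scanline into one byte per pixel.
// For each byte column the four plane bytes are loaded once and then
// peeled apart into up to 8 indices, instead of being re-read per pixel.
inline void planarToChunkyRow(const std::uint8_t* line, int bytesPerPlane,
                              std::uint8_t* out, int width)
{
    for (int col = 0; col * 8 < width; ++col) {
        const unsigned p0 = line[col];
        const unsigned p1 = line[bytesPerPlane + col];
        const unsigned p2 = line[2 * bytesPerPlane + col];
        const unsigned p3 = line[3 * bytesPerPlane + col];

        int n = width - col * 8;          // pixels left in this row
        if (n > 8) n = 8;

        for (int bit = 0; bit < n; ++bit) {
            const int s = 7 - bit;        // leftmost pixel is the top bit
            out[col * 8 + bit] = static_cast<std::uint8_t>(
                ((p0 >> s) & 1u) |
                (((p1 >> s) & 1u) << 1) |
                (((p2 >> s) & 1u) << 2) |
                (((p3 >> s) & 1u) << 3));
        }
    }
}
```

Run once per scanline at load time, this leaves an ordinary 8-bit buffer that the fast 256-color drawing loop can use directly.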

Quote:
The reason I try to keep it close to the file format is that keeping many copies of a 16 color image in RAM is far far easier than keeping many copies of the same image in 24bit color

Like silent_code says, it's much easier to use a generalized format for editing, and only use the complicated formats for loading/saving. It doesn't even have to be a 24bit image (in fact, that's probably not a good idea on the DS anyway), just using 4bit/8bit/16bit linear will be a good deal easier than what you have now.
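
For comparison, this is roughly all the pixel access a 4bit linear buffer needs. The function names are made up for this sketch, and the nibble order is a choice: here the even pixel goes in the low nibble, which is how GBA/DS 4bpp tiles are laid out.

```cpp
#include <cstdint>

// 4-bit linear: two pixels per byte, even pixel in the low nibble.
inline unsigned getPixel4(const std::uint8_t* row, int x)
{
    const unsigned b = row[x >> 1];
    return (x & 1) ? (b >> 4) : (b & 0x0Fu);
}

// Writes one 4-bit index without disturbing the neighbouring pixel.
inline void setPixel4(std::uint8_t* row, int x, unsigned index)
{
    std::uint8_t& b = row[x >> 1];
    if (x & 1)
        b = static_cast<std::uint8_t>((b & 0x0Fu) | ((index & 0x0Fu) << 4));
    else
        b = static_cast<std::uint8_t>((b & 0xF0u) | (index & 0x0Fu));
}
```

A buffer like this takes the same memory as the planar 16 color data, so it would not cost anything extra for undo copies.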

When loading files with multiple layers of complexity (like 4bit PCX) work in stages! I cannot stress this enough. If you just try to go from image-format to the format you prefer in one go, you'll not only go insane, but you'll end up with a routine that works for that one thing only. But if you work in stages, you can create filters for each stage (bit-planing, bit-unpacking, tiling) that you can reuse. Yes, it'll probably be slower; but this is an image editing program, you can probably take a few frames for the thing to load.

#157713 - thegamefreak0134 - Wed May 28, 2008 12:26 am

OK, I have my solution. I'm going to accept slowness if the program is running in 24bit mode, that's just going to have to be normal. When I load a 16 color image, I convert it to a 256 color image, and work with that for all of my display routines. (Just the DS version does this, the routines work fine on faster platforms as they are.)

I considered taking the calculations out, and I considered a number of other ways to work the equation, but in the end I decided to just scrap it and do things the way I knew worked.

Oh, and I am doing these things in stages, but in a slightly different sense. The first thing I made sure I was doing was working the file format in the most standard way possible. I then started layering functionality on top of that, with each new piece of functionality designed to work with all of the formats I planned to support. In that way, I keep everything separate (there are different internal routines that I can add to or change completely) and adding new functionality is a breeze.

My main goal with the project is to have a pixel editor on the DS, preferably with some paint like tools and good undo support. My goal with the class itself is to provide a standard way of loading and saving PCX files, and some basic functionality for dealing with the most common types. Because PCX actually allows more types than I would know how to handle, I also let the class leave a lot up to the end programmer. I tried to avoid it, but the end solution for everything that the DS project needed speed wise was to ignore the class functionality and write in my own, custom functionality. It didn't harm the portability of the class itself, and that's exactly what I was aiming for.

Oh and yes, I agree, I'm willing to wait any number of frames to load the image. That was the goal in the first place. And at the end of the day, I still think I learned how to better optimize the routines, so I'm happy.

On a side note, it's almost ready for a release. The keyboard routine is done and in, and it can technically load and save anything, but I want to polish up my file routines first.

-thegamefreak

#157717 - silent_code - Wed May 28, 2008 2:11 am

you know that code size *could* be an issue, if you repeated a lot of highly specialized code? it surely matters if your code is 180kb or 1.8mb when supporting ten (or whatever) file formats, doesn't it?

doing things in a more general (i'll settle with 8bit image, as i have to admit, that i more or less have "spit" out 24bit previously) way most of the time and then specializing only those *parts* that need to be, is likely to save you some memory... i bet you don't want to implement overlays or plugins or whatever (virtual memory, yay!) ;^)
and as you already load the file into memory for manipulation, what's wrong with dealing with the data in a more comfortable way?

i know this all sounds a bit ... i don't know, i'm very tired and sleepy... provocative, but i don't mean it that way. i'm quite cool about it and i'm not saying you're doing anything wrong (it's your job to know what's good for you). i'm just trying to help. :^)


good night europe and africa and parts of asia!

#157773 - tepples - Thu May 29, 2008 1:47 am

At this point, I'd say drop 4-bit PCX and use 4-bit BMP, unless you have end users who demand PCX.
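
To show why that is simpler, here is a sketch of reading one pixel from an uncompressed 4-bit BMP pixel array. The function names are made up, and it assumes the usual bottom-up layout (positive height in the header): rows hold two pixels per byte with the leftmost pixel in the high nibble, and each row is padded to a multiple of 4 bytes.

```cpp
#include <cstdint>

// Bytes per stored row of a 4-bit BMP: two pixels per byte,
// rounded up to a multiple of 4 bytes.
constexpr int bmp4Stride(int width) { return (((width + 1) / 2) + 3) & ~3; }

// Reads pixel (x, y), with y counted from the top of the image, from the
// pixel array of a bottom-up, uncompressed 4-bit BMP.
inline unsigned bmp4Pixel(const std::uint8_t* pixels, int width, int height,
                          int x, int y)
{
    const std::uint8_t* row = pixels + (height - 1 - y) * bmp4Stride(width);
    const unsigned b = row[x >> 1];
    return (x & 1) ? (b & 0x0Fu) : (b >> 4);  // leftmost pixel is the high nibble
}
```

There is no plane juggling at all: one byte load and one shift or mask per pixel.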
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.