#11846 - Burton Radons - Tue Oct 21, 2003 3:43 pm
Does anyone know of any research on optimised tile-based graphics quantisation? For my map exporter I presently just find the palette for each tile, search for the palette page which best matches it, and stick the tile in there. Depending upon tile order, this can be perfectly optimal or as bad as it gets. Plus, it doesn't address the case of tiles with more than fifteen colours, so it won't work for a general image.
#11847 - Burton Radons - Tue Oct 21, 2003 4:21 pm
Trying to think here... how about:
Get the palettes for all tiles. Calculate their differences (the summed difference of each colour in each palette). Yikes.
Split the palettes into two teams that have balanced similarities. Start with the two most dissimilar palettes, then add palettes to each team based on their similarities to the palettes already in the team.
Do this again to each team. And again. And again. And again! So that there are sixteen teams.
Quantise each team individually to fifteen colours, with dithering.
This would seem to work, but it has the same problem as anything else I can think of, which is that the teams with the most populous palettes get the least detail. Perhaps it would be best to choose initial seeds that are both very different and as close to equal in member count as possible, although that would require yet more computation.
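For the split step itself, something along these lines would do. This is only a rough C sketch: the palette difference is one reading of "summed difference" (each colour against its nearest match in the other palette, both ways), and for brevity it assigns by similarity to the two seeds rather than to the growing teams.
Code:
typedef struct { int n; unsigned char rgb[16][3]; } Pal;

static int cdist(const unsigned char *a, const unsigned char *b)
{
    int dr = a[0] - b[0], dg = a[1] - b[1], db = a[2] - b[2];
    return dr * dr + dg * dg + db * db;
}

/* Summed distance from every colour in a to its nearest colour in b, and back. */
static int paldiff(const Pal *a, const Pal *b)
{
    int sum = 0;
    for (int i = 0; i < a->n; i++) {
        int best = cdist(a->rgb[i], b->rgb[0]);
        for (int j = 1; j < b->n; j++) {
            int d = cdist(a->rgb[i], b->rgb[j]);
            if (d < best) best = d;
        }
        sum += best;
    }
    for (int j = 0; j < b->n; j++) {
        int best = cdist(b->rgb[j], a->rgb[0]);
        for (int i = 1; i < a->n; i++) {
            int d = cdist(b->rgb[j], a->rgb[i]);
            if (d < best) best = d;
        }
        sum += best;
    }
    return sum;
}

/* One two-way split: seed with the most dissimilar pair, then assign each
 * palette to the nearer seed.  Applying this recursively four times over
 * gives the sixteen teams. */
static void split(const Pal *pal, int n, int *team)
{
    int s0 = 0, s1 = 1, worst = -1;
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++) {
            int d = paldiff(&pal[i], &pal[j]);
            if (d > worst) { worst = d; s0 = i; s1 = j; }
        }
    for (int i = 0; i < n; i++)
        team[i] = (paldiff(&pal[i], &pal[s0]) <= paldiff(&pal[i], &pal[s1])) ? 0 : 1;
}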
#11849 - tepples - Tue Oct 21, 2003 5:10 pm
One technique I've thought of when trying to figure out S3 texture compression (roughly sketched after the list):
- Average out all the colors in each tile to produce a mean color for that tile.
- Use traditional color-quantization methods to find sixteen representative mean colors.
- Break the image down into separate images for each tile, and find a 15-color palette for each one.
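A rough C sketch of the first two steps, with a plain k-means standing in for "traditional color-quantization methods"; the seeding and names are only illustrative.
Code:
typedef struct { double r, g, b; } Color;

/* Mean colour of one 8x8 tile (64 pixels, 3 bytes each, RGB). */
static Color tile_mean(const unsigned char *px)
{
    Color m = { 0, 0, 0 };
    for (int i = 0; i < 64; i++) {
        m.r += px[i * 3 + 0];
        m.g += px[i * 3 + 1];
        m.b += px[i * 3 + 2];
    }
    m.r /= 64; m.g /= 64; m.b /= 64;
    return m;
}

static double dist2(Color a, Color b)
{
    double dr = a.r - b.r, dg = a.g - b.g, db = a.b - b.b;
    return dr * dr + dg * dg + db * db;
}

/* Cluster the tile means into 16 pages with a few k-means passes;
 * page[i] receives the page index assigned to tile i. */
static void assign_pages(const Color *means, int ntiles, int *page)
{
    Color rep[16];
    for (int k = 0; k < 16; k++)          /* crude seeding: spread over the input */
        rep[k] = means[k * (ntiles - 1) / 15];

    for (int pass = 0; pass < 8; pass++) {
        Color sum[16] = { 0 };
        int cnt[16] = { 0 };
        for (int i = 0; i < ntiles; i++) {
            int best = 0;
            for (int k = 1; k < 16; k++)
                if (dist2(means[i], rep[k]) < dist2(means[i], rep[best]))
                    best = k;
            page[i] = best;
            sum[best].r += means[i].r;
            sum[best].g += means[i].g;
            sum[best].b += means[i].b;
            cnt[best]++;
        }
        for (int k = 0; k < 16; k++)
            if (cnt[k]) {
                rep[k].r = sum[k].r / cnt[k];
                rep[k].g = sum[k].g / cnt[k];
                rep[k].b = sum[k].b / cnt[k];
            }
    }
}
The third step is then just a separate 15-colour quantisation of each page's tiles on their own.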
(1001 w00t!)
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.
#11864 - Burton Radons - Wed Oct 22, 2003 7:33 am
This was some evil code to write for some reason. I've implemented tepples' idea with an octree quantizer, no dithering. Here's a Van Gogh in 8-bit and in 4-bit paged, using the same quantizer.
Quantized to 8-bit
http://members.shaw.ca/burton-radons/tiled-4bit-paged.gif
Quantized to 16 4-bit palettes in 8x8 tiles
If it's hard to tell the difference: black random pixels in the sky, occasional 8x8 blocking (not bad when it's just luminance, but note the blue noise on the cottage), a total loss of road detail, and a lot of dampening of highlights. It looks better than the straight 4-bit quantization anyway. ;-)
Next is to try to make it group palettes better.
Last edited by Burton Radons on Tue Nov 04, 2003 4:10 pm; edited 1 time in total
#11872 - tepples - Wed Oct 22, 2003 5:10 pm
Wow! It's nice to see some results.
Now some suggestions for improvement: To get the road detail back, try using some sort of dithering. To help get rid of the blue blocks on the cottage, try "stretching" each 15-color palette's chrominance slightly before dithering the image, or try using a color reduction algorithm that better preserves outliers. Your random black sky pixels may just be implementation bugs.
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.
#11886 - Burton Radons - Thu Oct 23, 2003 3:43 am
Indeed, the black pixels were a bug in the quantiser. I was trying to stretch an octree quantiser far beyond its limits; octree gets relatively worse the fewer colours you ask for, and there's no control over how many colours are killed off. I tried to make it work, but it just wasn't capable of it.
So I flailed around for a bit and ported over Xiaolin Wu's modified Median Cut code. The results are quite astonishing.
http://members.shaw.ca/burton-radons/tiled-8bit.gif
255-colour, as above (it's the same copy, so that comparison is no longer entirely analogous)
http://members.shaw.ca/burton-radons/tiled-4bit-paged-new.gif
16 palette, 15 colours per palette
Now that's the kind of result where you feel compelled to check your code over to make sure you didn't accidentally cause it to create the same image. Wow!
This is making me paranoid; it's too good. I need to test it with other images.
Last edited by Burton Radons on Sun Nov 16, 2003 3:41 pm; edited 1 time in total
#11889 - tepples - Thu Oct 23, 2003 4:23 am
You're trying to color-reduce images that are much larger than 240x160 pixels, and the blocking on such large images might not be as noticeable as it would on GBA-sized images. Try it on something Nintendo might think to try it on, a partial box shot from Super Smash Bros. Melee for GCN scaled down to GBA screen size:
Original image
Any possibility of you publishing your source code?
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.
Last edited by tepples on Thu Oct 23, 2003 7:08 pm; edited 1 time in total
#11899 - FluBBa - Thu Oct 23, 2003 10:48 am
Wooow, that's impressive!
Could really be useful for a lot of things.
_________________
I probably suck, my not is a programmer.
#11921 - Burton Radons - Fri Oct 24, 2003 2:23 am
Gradients will naturally be the worst case, since they both aggravate the low bit depth and show the underlying 15-color palette. I've put up a page with the comparison for the SSB box here.
This uses different page selection and random dithering. Your page-selection method did relatively poorly on this image. The one I'm experimenting with sends all the tiles through the quantiser, reduces to 16 colours, and then for each tile selects the page that has the most matching colours. I've got to try more metrics.
The dithering, uh... for the first time, Google has failed me. You'd think I was searching for lung-headed dinglewhoppers. I have three ditherers raring to go but no information on how best to dither to an arbitrary low-colour palette. My technique is horribly slow.
Source-wise this is being put in as part of my converter utility tool chain, so source will be included there with Armistice. However, it's not in C/C++.
#11924 - tepples - Fri Oct 24, 2003 5:33 am
Burton Radons wrote:
The dithering, uh... for the first time, Google has failed me. You'd think I was searching for lung-headed dinglewhoppers. I have three ditherers raring to go but no information on how best to dither to an arbitrary low-colour palette.
Try Floyd-Steinberg dithering.
http://www.visgraf.impa.br/Courses/ip00/proj/Dithering1/floyd_steinberg_dithering.html
Essence of error diffusion dithering: Take each pixel's quantization error and add fractions of it to the surrounding pixels.
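A bare-bones version in C, quantising each pixel to the nearest entry of a small palette and spreading the error with the usual 7/16, 3/16, 5/16, 1/16 weights; the colour metric here is plain squared RGB distance, purely for illustration.
Code:
static int clamp255(int v) { return v < 0 ? 0 : v > 255 ? 255 : v; }

/* img is a w*h RGB image (modified in place); out receives palette indices. */
void fs_dither(unsigned char *img, int w, int h,
               const unsigned char pal[][3], int ncolors,
               unsigned char *out)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            unsigned char *p = &img[(y * w + x) * 3];
            /* Find the nearest palette entry. */
            int best = 0, bestd = 1 << 30;
            for (int i = 0; i < ncolors; i++) {
                int dr = p[0] - pal[i][0], dg = p[1] - pal[i][1], db = p[2] - pal[i][2];
                int d = dr * dr + dg * dg + db * db;
                if (d < bestd) { bestd = d; best = i; }
            }
            out[y * w + x] = (unsigned char)best;
            /* Diffuse the quantization error onto the unvisited neighbours. */
            for (int c = 0; c < 3; c++) {
                int err = p[c] - pal[best][c];
                if (x + 1 < w)
                    img[(y * w + x + 1) * 3 + c] =
                        clamp255(img[(y * w + x + 1) * 3 + c] + err * 7 / 16);
                if (y + 1 < h) {
                    if (x > 0)
                        img[((y + 1) * w + x - 1) * 3 + c] =
                            clamp255(img[((y + 1) * w + x - 1) * 3 + c] + err * 3 / 16);
                    img[((y + 1) * w + x) * 3 + c] =
                        clamp255(img[((y + 1) * w + x) * 3 + c] + err * 5 / 16);
                    if (x + 1 < w)
                        img[((y + 1) * w + x + 1) * 3 + c] =
                            clamp255(img[((y + 1) * w + x + 1) * 3 + c] + err * 1 / 16);
                }
            }
        }
}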
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.
#11954 - Burton Radons - Fri Oct 24, 2003 10:02 pm
I knew how to dither for monochrome, but the method used for dithering colour would seem to degrade when working with minimal palettes.
I've updated the page. There are a few changes. First, I dither when doing the initial reduction from 24-bit to 15-bit; since Xiaolin Wu's quantiser reduces to those colours anyway, I'm not losing anything. Then I dither again for the reduction to the pages. An additional twist is that I don't use Wu's quantiser to search for optimal output colours; instead I use a proper colour-distance calculation here. This makes a huge difference in quality, which is why I think I'm justified in worrying that colour dithering is non-perceptual.
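For reference, one widely used weighted RGB distance (the "redmean" approximation) looks like the following in integer form. It's only an example of such a metric, not necessarily the calculation linked above.
Code:
/* Weighted squared RGB distance; the weights shift with the average red
 * level to roughly track perception. */
static long colour_dist(int r1, int g1, int b1, int r2, int g2, int b2)
{
    long rmean = (r1 + r2) / 2;
    long dr = r1 - r2, dg = g1 - g2, db = b1 - b2;
    return (((512 + rmean) * dr * dr) >> 8)
         + 4 * dg * dg
         + (((767 - rmean) * db * db) >> 8);
}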
I'd appreciate it if someone could test these images on the GBA.
#12018 - DekuTree64 - Mon Oct 27, 2003 12:39 am
Man, that's impressive. Are you planning to release it for other people to use, or should I get started on writing my own quantizer? ^_^ I'm planning to have a lot of full-screen pictures in my RPG, but I'd planned on making them 256 colors because I didn't think it was possible to get results that good with 16 colors per tile. 16-color tiles would save a lot of ROM space, and save me some trouble writing code to display them and swap tiles in and out when scrolling across a big picture that wouldn't fit in VRAM, so seeing how good this looks I'm pretty sure I'll go with 16. It's just a matter of whether or not I'll have to work for it. ^^
_________________
___________
The best optimization is to do nothing at all.
Therefore a fully optimized program doesn't exist.
-Deku
#12021 - DekuTree64 - Mon Oct 27, 2003 2:20 am
Hmm, I went to test those images on the GBA, and it seems that the palettes aren't arranged properly or something. You should be able to open one of the 16x16-color images in Photoshop or something, blank out one set of 16 consecutive colors, and see only full tiles go blank, not scattered pixels like in a regular paletted image, if that makes any sense.
_________________
___________
The best optimization is to do nothing at all.
Therefore a fully optimized program doesn't exist.
-Deku
#12023 - Burton Radons - Mon Oct 27, 2003 3:36 am
To show a paged image, it converts an animation to a tiled map and then back to an animation before exporting as a BMP. The map-to-animation stage uses an octree to generate the minimal palette, so if the same colour is in different pages, the entries get merged. It should be treated as a regular 256-colour image for the purpose of testing. I have comments on your previous post, but I want to try something first.
#12024 - Burton Radons - Mon Oct 27, 2003 4:45 am
I'm using off-the-shelf components (Xiaolin Wu's quantiser source can be found on the web, and I use a standard Floyd-Steinberg). As I said, I'm going to be releasing the source as part of my converter tool, but it's not in C/C++. I'll see what I can do to make the tool more generic.
I did some more fooling around and am satisfied that I can't make it significantly better using this quantiser and ditherer. They're not really built for such low-colour imagery; I particularly point out Figure 8 of Dithered Color Quantization. If I could spend two minutes quantising each image by using this algorithm, I would - it's a great algorithm.
Anyway, here are the steps I've finalised on (the page-selection step is sketched after the list).
- Do the initial import and dither from 24-bit to 15-bit.
- Feed the whole image into a quantiser, reduce to 16 colours. Each colour is a page.
- Split the image into 8x8 tiles.
- For each tile, send it into the quantiser and choose the page which matches the most colours.
- Create a quantiser for each page. Feed the tiles using this page into it and reduce to 15 colours.
- Quantise and dither the tiles. You must use a real colour metric, and not the quantiser's built-in colour finder.
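A sketch of how the page-selection step might look, with each of the tile's reduced colours voting for the nearest of the 16 page-seed colours. This is one reading of "matches the most colours", not the converter's actual code.
Code:
/* seed[16]: the 16 colours from the whole-image reduction, one per page.
 * tile_cols: the tile's own reduced colours (up to 16 of them). */
static int pick_page(const unsigned char seed[16][3],
                     const unsigned char tile_cols[][3], int ntile_cols)
{
    int votes[16] = { 0 };
    for (int i = 0; i < ntile_cols; i++) {
        int best = 0, bestd = 1 << 30;
        for (int p = 0; p < 16; p++) {
            int dr = tile_cols[i][0] - seed[p][0];
            int dg = tile_cols[i][1] - seed[p][1];
            int db = tile_cols[i][2] - seed[p][2];
            int d = dr * dr + dg * dg + db * db;
            if (d < bestd) { bestd = d; best = p; }
        }
        votes[best]++;
    }
    int page = 0;
    for (int p = 1; p < 16; p++)
        if (votes[p] > votes[page]) page = p;
    return page;
}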
#12172 - Burton Radons - Sun Nov 02, 2003 5:45 am
Although it's off-topic for this thread, I thought some might be interested in what I've been working on.
I've written a JPEG decompressor from scratch. I've always wanted to do one, so now I have. It only handles baseline DCT Huffman, but there's no reason to do anything else anyway, since progressive doesn't make sense, arithmetic coding can't even be triggered in Photoshop, and the rest is truly obsolete stuff like lossless. Ignoring all this fat and making the whole thing a single, condensed stream resulted in it taking 763 lines of C code. That's with the headers; 599 without.
I've been inspired by this Gamasutra article. I take it that they used jpeglib, as there's no mention that JFIF (as normally used) doesn't store RGB. Potentially, my code will be faster, since I don't have an input stream, which would require certain checks at the worst parts of the code. Moreover, for palettised output I can just use a YCbCr palette for searching, with improved visual results and only 255 YCbCr->RGB conversions per image, rather than 46080 and up, as well as folding in some bit-shifts. Finally, while coefficient decoding doesn't lend itself very well to assembler conversion, the AA&N IDCT and the colour conversion definitely do.
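For the colour conversion, the standard JFIF full-range equations in fixed point, folded straight down to a 15-bit mode 3 pixel, look roughly like this. It's a generic sketch, not the decoder's routine.
Code:
/* y, cb, cr in 0..255; returns a GBA BGR555 pixel. */
static unsigned short ycbcr_to_bgr555(int y, int cb, int cr)
{
    cb -= 128; cr -= 128;
    int r = y + ((91881 * cr) >> 16);                 /* 1.402 * Cr            */
    int g = y - ((22554 * cb + 46802 * cr) >> 16);    /* 0.344 * Cb + 0.714 * Cr */
    int b = y + ((116130 * cb) >> 16);                /* 1.772 * Cb            */
    if (r < 0) r = 0; else if (r > 255) r = 255;
    if (g < 0) g = 0; else if (g > 255) g = 255;
    if (b < 0) b = 0; else if (b > 255) b = 255;
    return (unsigned short)((r >> 3) | ((g >> 3) << 5) | ((b >> 3) << 10));
}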
At the current time it's terribly slow. 4.6 seconds decompressing to Mode 3. I'm optimistic since there's one huge optimisation I've not yet implemented and many little opportunities that someone working with a hacked jpeglib would never see.
This will be written in C/ARM assembler, so people apart from me will be able to use it. If you're wondering why I'm doing this, it's for a code sample for an application; I'm not a professional programmer yet.
#12173 - tepples - Sun Nov 02, 2003 6:42 am
JPEG decoder looks interesting. From your description, your code seems to take about 500,000 cycles per 16x16 pixel macroblock, or about 2000 cycles per pixel. Have you moved critical code and data structures to IWRAM?
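(That figure assumes the GBA's 16.78 MHz clock and a full 240x160 screen: 4.6 seconds is about 77 million cycles, and 77 million / 38400 pixels comes to roughly 2000 cycles per pixel, or about 500,000 per 16x16 macroblock.)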
And once you have it optimized, under what sort of license will you release the source code?
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.
#12176 - Burton Radons - Sun Nov 02, 2003 8:00 am
This image I'm testing with is actually using 8x8 macroblocks with very high-quality settings (it's the first example in the Gamasutra article, 288x160), so coefficient decoding and IDCT are proportionately taking more time than they would otherwise (particularly coefficient decoding). I want that, since I'm concentrating on optimising Huffman decoding right now. It's also running as ARM code from ROM. So the current time is not representative; even as it stands it would be at least twice as fast with real images.
The code will be in the public domain.
#12185 - Burton Radons - Mon Nov 03, 2003 8:08 am
Okay, a real image (288x160, the same as before compressed to 10kb) takes 518 milliseconds to decompress. IDCT takes 2145 cycles per 8x8 block (26% of total time), coefficient decoding and dequantisation takes 1577 cycles per 8x8 block (19% of total time), colour conversion and storage takes 97 cycles per pixel (51% of total time). This is coming from VBA, and while I take it that it has accurate timers I'm not entirely certain. It takes 188 cycles per pixel altogether.
The colour conversion has no specialised logic, generically handling any component factors; adding that will certainly beat that down nicely. So I'm targeting about 300 milliseconds to decompress a JPEG.
Other people working on the IDCT on the ARM7TDMI (such as in this dispiritingly generic paper) have gotten remarkable results. I tried poking around in it in assembler and didn't see many opportunities, although I'm not adept at ARM machine code by any stretch.
#12199 - Burton Radons - Mon Nov 03, 2003 6:41 pm
I got it down to 350 milliseconds. I know it'll be slower on hardware because VBA doesn't emulate the speed differences between reading data from IWRAM versus ROM or EWRAM, only between executing code from one or the other, and there are a few places where I use EWRAM. Here's a breakdown of usage, with the defaults:
- 2684 bytes of IWRAM
- 3948 bytes of ROM
- 68 bytes of EWRAM, somehow
- 676 bytes of stack space at maximum
- 6628 bytes of allocations for a representative image
Here's a usage sample:
Code:
REG_DISPCNT = 3 | (1 << (8 + 2)); /* Enable mode 3 and turn on background 2. */
REG_BG2CNT = 0; /* Clear all BG2 parameters. */
JPEG_DecompressImage (data, VideoBuffer, 240, 160);
And here's the link: http://members.shaw.ca/burton-radons/gba-jpeg.zip. It's self-sufficient outside of a need for malloc, free, and memset.
A note: Photoshop doesn't appear to generate optimal images unless you use Save for Web. Otherwise it will not subsample the colour channels, even if you save at terrible quality, which is silly since it results in even less visual quality than normal for that many bits. Also, while my decoder handles grayscale images, it still does colour conversion for them; size won out over speed.
(Edit: got rid of C99 usage and realised I could easily save 2016 bytes of allocations by exploiting the lack of streams further. Zip-file has been updated.)
(Edit 2: rebalanced ROM and IWRAM usage to drop 800 bytes of the latter. The code's also set up more legibly, and I shaved off a couple hundred lines of it.)
#12251 - Burton Radons - Wed Nov 05, 2003 6:52 pm
I'm either on a roll or in a rut. I'm having fun in any case. This morning I've written an implementation of inflate and a PNG decompressor.
Inflate - which is also used in PKZIP - is an easy 350 bloated lines of code. Its main problem is that its state usage is horrible: over 35104 bytes of state, and I can do nothing about it. If you're decompressing to flat data you can get rid of the window (32768 bytes) and speed things up, but PNG doesn't encourage that. Oh well. Inflate is similar to LZSS in principle, but it's more sophisticated.
PNG isn't nearly as cycle-aware as JPEG. The encoder can cut the compressed stream up arbitrarily, so for every byte you read you have to check whether you need to skip over another chunk header. Hard to swallow as it is, they should have just gone the JPEG route: made the image data a stream without a length, and restricted post-data blocks so that a dumb decoder that doesn't care about the extras can just stop at the first IDAT without missing anything. Anyway, this means that if I port it to the GBA, the inflater and the PNG reader will have to be tightly integrated. Does anyone want a separate LZSS decompressor?
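To make the chunk problem concrete, a byte reader over the IDAT stream ends up looking something like this; illustrative only, with no error handling, and it assumes it starts out positioned inside the first IDAT chunk.
Code:
#include <stdint.h>
#include <string.h>

/* Initialise with p at the start of the first IDAT's data and left set to
 * that chunk's length; idat_read_byte then hides the chunk boundaries. */
typedef struct {
    const uint8_t *p;     /* current position in the PNG file          */
    uint32_t left;        /* bytes remaining in the current IDAT chunk */
} IdatReader;

static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static int idat_read_byte(IdatReader *r)
{
    while (r->left == 0) {
        r->p += 4;                    /* skip the CRC of the finished chunk      */
        uint32_t len = be32(r->p);    /* next chunk: 4-byte length, 4-byte type  */
        if (memcmp(r->p + 4, "IDAT", 4) != 0)
            return -1;                /* compressed stream is over               */
        r->left = len;
        r->p += 8;                    /* step past length and type               */
    }
    r->left--;
    return *r->p++;
}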
#12287 - Paul Shirley - Thu Nov 06, 2003 4:22 pm
removed
Last edited by Paul Shirley on Sun Mar 28, 2004 8:51 pm; edited 1 time in total
#12379 - Burton Radons - Sun Nov 09, 2003 7:18 pm
I know what it is supposedly for, but it makes no sense. If I'm streaming data, I want to load it into a fixed buffer; I won't make allocations that I toss out later. So, that requires that I record how much of the current chunk I've downloaded so that I know when to stop and read the next chunk header. (Edited for language.)
It does provide one thing, which is the knowledge of when to stop (or at least of places where we might stop). But this information is transmitted better, and automatically, through the file or network system. When PNG is embedded in true don't-know-when-to-stop formats (such as a Flash animation stream), you'd want to use the input buffer for all data types, so you'd just keep going after finishing the PNG - just as happens, for example, when JPEG is embedded in a Flash stream.
This was a principled choice, not an engineering one. They wanted a chunk format, they realised that that would compromise streaming, so they sacrificed portability.
I wouldn't say that PNG is flexible just because it can be extended. Under that criterion, all data formats are equally flexible, except for raw ones. In any case it has nothing to do with PNG's overapplication of chunking.
Last edited by Burton Radons on Mon Nov 10, 2003 1:15 am; edited 1 time in total
#12380 - Burton Radons - Sun Nov 09, 2003 7:29 pm
I've implemented Inflate on the GBA side and put a function in the converter that sends chosen data through a GZIP compressor (7-Zip gives the best I've seen); in this case, the tile data, palette, and tile palettes. It looks to be fairly stable at around 15000 bytes; so 30% is taken off through compression.
The current speed of decompression is 102 cycles per byte, so with two 4-bit pixels per byte it's about 51 cycles per pixel, compared to 130 per pixel for JPEG. That's not too great an improvement until you consider that I would also need to quantise the JPEG before being able to truly use it, and the Broken Sword guys said that was the highest cost involved. Also, that would require attaching a palette and the necessary search tables to keep the speed up.
So, with a better palette choosing metric and possibly a better colour quantiser it appears that paged is the way to go.
(Edit: Forgot to mention an extra step before compression; I sort the pixels in the tiles so that the most commonly used colours are first, thus improving compression amongst tiles in different palettes.)
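One way to read that sort: within each page, reorder the palette entries by usage and remap the tile indices, so the most common colour in every page lands on the same low index. Index 0 is kept fixed on the assumption that it's the transparent entry, as usual for 16-colour GBA tiles. Naming and details here are illustrative, not the converter's code.
Code:
/* indices: npixels worth of 4-bit palette indices for one page's tiles;
 * pal: that page's 16 BGR555 entries, reordered in place. */
static void sort_page_by_usage(unsigned char *indices, int npixels,
                               unsigned short pal[16])
{
    int count[16] = { 0 };
    for (int i = 0; i < npixels; i++)
        count[indices[i]]++;

    /* order[k] = old index that becomes new index k (simple selection sort,
     * entry 0 left alone). */
    int order[16];
    for (int k = 0; k < 16; k++) order[k] = k;
    for (int a = 1; a < 16; a++)
        for (int b = a + 1; b < 16; b++)
            if (count[order[b]] > count[order[a]]) {
                int t = order[a]; order[a] = order[b]; order[b] = t;
            }

    int remap[16];
    unsigned short newpal[16];
    for (int k = 0; k < 16; k++) {
        remap[order[k]] = k;
        newpal[k] = pal[order[k]];
    }
    for (int k = 0; k < 16; k++) pal[k] = newpal[k];
    for (int i = 0; i < npixels; i++)
        indices[i] = (unsigned char)remap[indices[i]];
}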