gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback machine copies. A new forum can be found here.

Coding > Huffman Compression

#83470 - ProblemBaby - Tue May 16, 2006 3:38 pm

Hello
Iam trying to write a huffman decompression routine.
I dont exactly understand the layout of the data.

It would be nice if someone who understands it could write some easy psuedo code for me?

thanks

#84209 - Ant6n - Sun May 21, 2006 6:38 am

i dont know how well you understand the algorithm behind huffman compression, but as far as data structures i could suggest this, for the final storage of the huffmann tree:
assuming you have a huffmann tree ...
emmm
...this is going to be long, are you sure you really want Huffmann tree and not attempt lempel ziv, its a little bit more straightforward (imho) and the data structres are simpler

anton

#84236 - tepples - Sun May 21, 2006 2:11 pm

Another option to consider if you have a lot of small values in your data set is Rice coding or ternary coding, depending on the distribution.

ProblemBaby: What kind of data do you need Huffman coding for?
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.

#84267 - Ant6n - Sun May 21, 2006 6:39 pm

http://nocash.emubase.de/gbatek.htm#biosdecompressionfunctions
gba has decompression

#84287 - ProblemBaby - Sun May 21, 2006 8:21 pm

I know how it works, the idea. But I dont understand the tree.
I'll use exactly the same layout of the file as the BIOS decompression routine does, I want to write my own to get some more decompression options.

I dont understand the tree, In which order the nodes are stored.

#84293 - Ant6n - Sun May 21, 2006 8:48 pm

ok, lets see. We'll just use the data structure of the huffman tree that the gba uses for decompression. then we have, from the specs (see above link):

Code:
SWI 13h (GBA/NDS7/NDS9) - HuffUnComp (NDS: with Callback)
Expands Huffman-compressed data and writes in units of 32bits.
If the size of the compressed data is not a multiple of 4, please adjust it as much as possible by padding with 0.
Align the source address to a 4Byte boundary.

  r0  Source Address, aligned by 4, pointing to:
       Data Header (32bit)
         Bit 0-3   Data size in bit units (normally 4 or 8)
         Bit 4-7   Compressed type (must be 2 for Huffman)
         Bit 8-31  24bit size of decompressed data in bytes
       Tree Table
        u8      tree table size/2-1
        Each of the nodes below defined as:
         u8
          6bit  offset to next node -1 (2 byte units)
          1bit  right node end flag (if set, data is in next node)
          1bit  left node end flag
        1 node  Root node
        2 nodes Left, and Right node
        4 nodes LeftLeft, LeftRight, RightLeft, and RightRight node
        ...
       Compressed data
        ...
  r1  Destination Address
  r2  Callback parameter (NDS SWI 13h only, see Callback notes below)
  r3  Callback structure (NDS SWI 13h only, see Callback notes below)

Return: No return value, Data written to destination address.


so the datastructre is like that
-Header
-Tree
-compressed data

the tree is kind of like this (it is actually a trie), example:

Code:

              root
           0/      \1
         0/ \1    [a]
        [b] [c]



This trie would encode a data set with the possible symbols a,b,c. so now when you have a string of bits you just always walk along a path until you hit a symbol. Notice that the edges in the tree represent the 0s and 1s of the compressed data, the leaves are the symbols. So now, if we had a bitstring, i.e.
001011

it would decode it into the sequence of symbols
b (00), a (1), c (01), a (1)

usually we would store the tree as nodes, saving internal nodes as
- pointer to right child (1)
- pointer to left child (0)
and leaves as
- symbol
Of course we need a way to differentiate between internal nodes and leaves

Since we dont change the tree during runtime, it is convenient to store it as a an array.
The specs indicate that every node has 8 bit.
- the first 6 bit indicate the offset to next nodes (instead of using pointers)
I would assume that they just sit next to each other (the left and right child)
- the other two bit indicate whether the childeren are data nodes
- it they are data nodes i would assume they are just data

so the above tree would look like this (correct me, if i am wrong)
[0,0,1],[0,1,1],['a'],['b'],['c']

right?

anton