Optimized inner loop of bit-at-a-time Huffman decoder is indistinguishable from magic:

106: 22 0f add r18, r18

108: 33 1f adc r19, r19

10a: 37 1b sub r19, r23

10c: 11 97 sbiw r26, 0x01 ; 1

10e: fd 01 movw r30, r26

110: 54 91 lpm r21, Z

112: 75 0f add r23, r21

114: 4f 5f subi r20, 0xFF ; 255

116: 48 30 cpi r20, 0x08 ; 8

118: 58 f4 brcc .+22 ; 0x130

11a: 37 17 cp r19, r23

11c: a0 f7 brcc .-24 ; 0x106

##
Tree format

A header of 8-bit codeword counts per bit length in reverse order followed by a table of fixed-length values in codebook order. The header specifies the canonical huffman code; explicit codewords are not stored. The file header points at the beginning of the value table; codeword counts are found by decrementing that pointer (hence reverse order).

Example: Given this code:

0 = 4

10 = 5

110 = 1

111 = 2

The tree would be stored as {2,1,1,4,5,1,2} with its file pointer pointing at the first value byte (4).

##
Decoding the tree

The bit-at-time decoder works by reconstructing the last code in the codebook for each bit length and comparing it to the decoded codeword so far. It stops when the decoded codeword is less than the last codeword for that bit length. The result is an index into the value table computed from the codeword and the number of codewords one shorter than the resulting codeword.

This can be accomplished with the following psuedocode:

codeword = 0, max_codeword = 0, num_codewords_so_far = 0, n = 0

while True:

max_codeword = max_codeword << 1

codeword = (codeword << 1) + getbit(n)

num_codewords_this_length = codewords_of_length(n)

if codeword < max_codeword + num_codewords_this_length:

return num_codewords_so_far + codeword - max_codeword

num_codewords_so_far += num_codewords_this_length

max_codeword += num_codewords_this_length

n = n + 1

This can be simplified algebraically by subtracting num_codewords_so_far from both max_codeword and codeword just before each comparison in the loop. This doesn't affect the outcome of the comparison.

adj_codeword = 0, adj_max_codeword = 0, num_codewords_so_far = 0, n = 0

while True:

num_codewords_this_length = codewords_of_length(n)

adj_max_codeword = (adj_max_codeword << 1) - num_codewords_so_far

adj_codeword = (adj_codeword << 1) + getbit(n) - num_codewords_so_far

if adj_codeword < adj_max_codeword + num_codewords_this_length:

return num_codewords_so_far + adj_codeword - adj_max_codeword

num_codewords_so_far += num_codewords_this_length

adj_max_codeword += num_codewords_this_length

n = n + 1

However after this transformation adj_max_codeword always equals num_codewords_so_far, so we can eliminate the redundant variable and greatly simplify the loop.

adj_codeword = 0, num_codewords_so_far = 0, n = 0

while True:

num_codewords_this_length = codewords_of_length(n)

adj_codeword = (adj_codeword << 1) + getbit(n) - num_codewords_this_length

num_codewords_so_far += num_codewords_this_length

if adj_codeword < num_codewords_so_far:

return adj_codeword

n = n + 1

##
Assembly code

r18 = bits from bitstream

r19 = adjusted codeword (index into the codebook)

r20 = number of bits shifted from r18

r21 = number of codewords at this length

r23 = sum number of codewords up to this length
r26:r27 = pointer to table of codewords per length (reverse order)

It works out to 16 cycles per bit (15 when it terminates; and some more when r18 runs out if bits).

The add/adc consumes one bit from the bitstream by shifting: r18 bit 7 is shifted into r19 bit 0. The subtraction at 10a subtract the number of total codewords in the codebook previous so that r19 is an index into the codebook. If it is less than the total number of codewords at the length, the complete codeword is decoded. Otherwise the offset points at a longer codeword and more bits are needed.

The branch at 118 is a subroutine that gets more bits from the bitstream loading them into r18 and setting r20=0. That routine jumps to 11a when it's done.

##
Examples

Lengths = 0, 2, 0, 3 (not reversed)

Codebook:

00

01

1000

1001

1010

Say the input bits are 1001. After each loop:

r19 = 0 * 2 + 1 - 0 = 1 not less than 0

r19 = 1 * 2 + 0 - 0 = 2 not less than 2

r19 = 2 * 2 + 0 - 2 = 2 not less than 2

r19 = 2 * 2 + 1 - 2 = 3 is less than 5

The loop terminates and r19 = 3 is the number of codewords in the codebook before 1001. This is the offset into the table of values indexed by codeword position in the codebook.

Same but for input 01:

r19 = 0 * 2 + 0 - 0 = 0 not less than 0

r19 = 0 * 2 + 1 - 0 = 1 is less than 1

After the loop exits r19 = 1 is the number of codewords in the codebook before 01.