Sunday, March 25, 2018

Announcing ubaboot

ubaboot is a 512 byte open-source USB bootloader for atmega32u4.

100% size-optimized assembly packs many features into a small footprint:
  • Flash write and read (verify)
  • EEPROM write and read (verify)
  • Signature, lock, and fuse read

Its tiny size allows up to the maximum hardware-supported 31.5 KiB for user programs.

Check it out here:

How does it work?

ubaboot attaches to the system as a custom USB device. An included sample user-mode pyusb driver works on Linux and programs and verifies the chip memories by sending control transfers. A full description of the protocol is available if you wish to write your own driver or work with other platforms.

How is it so small?

  • No interrupts. The ISR table itself takes a lot of space, as do the stack pushes and pops needed to enter and exit interrupts. Instead the main loop polls for events.
  • All data is held in registers. There are no variables in RAM (and no stack). Absolute/indirect accesses to RAM/stack require too many instructions.
  • Logic optimized to fall through. Instead of branching to the end, the logic falls through to the next comparison and branches over each of the remaining cases.
  • Event handling uses bit twiddling to match both state and event in a single compare. State numbers are picked so that masking one interrupt bit produces a unique value for every combination of event and state. The dispatch then uses a series of single 8-bit comparisons.
  • Jumps and subroutines are used to reuse code. The setup logic is laid out specifically to optimize for this. Flash and descriptor reads use the same code path.
  • Setup logic eagerly loads registers used for most transfers. This avoids repeating those loads for each command.
  • Hardware registers are accessed through Y+offset loads/stores. Normal absolute register load/stores use 2 words each while indirect loads/stores use 1 word.
  • USB setup headers are read directly into %r2..%r9 aliases in data space ($2..$9) via indirect Z pointer access.
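
For reference, the 8-byte setup header mentioned in the last bullet has a fixed layout defined by the USB specification: two single-byte fields followed by three little-endian 16-bit words. The Python sketch below shows how a host-side driver might pack one. The request code and address value are hypothetical placeholders rather than ubaboot's actual protocol, and the register mapping simply assumes the bytes land in r2..r9 in order.

```python
import struct

def pack_setup(bm_request_type, b_request, w_value, w_index, w_length):
    """Pack a USB setup header: two bytes, then three little-endian 16-bit words."""
    return struct.pack("<BBHHH", bm_request_type, b_request, w_value, w_index, w_length)

# 0xC0 = device-to-host (0x80) | vendor request type (0x40) | device recipient (0x00).
HYPOTHETICAL_READ_REQUEST = 0x01  # placeholder, not taken from ubaboot
setup = pack_setup(0xC0, HYPOTHETICAL_READ_REQUEST, 0x1234, 0, 64)

# Assumed mapping: byte i of the header lands in register r(2+i).
registers = {f"r{2 + i}": byte for i, byte in enumerate(setup)}
for name, value in registers.items():
    print(f"{name} = 0x{value:02x}")
```

With this layout the low byte of wValue would sit in r4 and the high byte in r5, so firmware can use a 16-bit field without any copying.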

This visualization shows how it all fits together:


Wednesday, March 22, 2017

Coming soon!

ubaboot: an open source 512 byte USB bootloader for atmega32u4

Stay tuned...

Sunday, January 22, 2017

Headphone attenuator

A miniature surface-mount voltage divider for sensitive in-ear headphones. Without attenuation these devices can easily produce 130+ dB SPL at full volume, loud enough to damage hearing in seconds.

Wiring this circuit into a pair of headphones (or an adapter) will reliably limit the maximum level produced.

It's tiny at 0.2 x 0.27 inches and mounts four 0805 (imperial) resistors, two per channel. The circuit below uses 20 ohms for R1 and R4 and 0.2 ohms for R2 and R3, which results in about 40 dB of attenuation.
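
The attenuation figure follows from the divider ratio. Here is a quick Python check, assuming each channel is a simple series/shunt pair (20 ohms in series, 0.2 ohms across the driver). The 16 ohm driver impedance is an illustrative assumption and not a value from this design.

```python
import math

def attenuation_db(r_series, r_shunt, r_load=None):
    """Voltage attenuation in dB of a series/shunt divider, optionally loaded."""
    if r_load is not None:
        # The load sits in parallel with the shunt resistor.
        r_shunt = r_shunt * r_load / (r_shunt + r_load)
    return 20 * math.log10(r_shunt / (r_series + r_shunt))

unloaded = attenuation_db(20, 0.2)
loaded = attenuation_db(20, 0.2, r_load=16)  # assumed 16 ohm driver
print(f"unloaded: {unloaded:.2f} dB, loaded: {loaded:.2f} dB")
```

Because the shunt resistor is so much smaller than any realistic driver impedance, the load barely changes the result.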

Friday, December 9, 2016

Bit-at-a-time Huffman decoding on an AVR

The optimized inner loop of a bit-at-a-time Huffman decoder is indistinguishable from magic:

     106:       22 0f           add     r18, r18
     108:       33 1f           adc     r19, r19
     10a:       37 1b           sub     r19, r23
     10c:       11 97           sbiw    r26, 0x01       ; 1
     10e:       fd 01           movw    r30, r26
     110:       54 91           lpm     r21, Z
     112:       75 0f           add     r23, r21
     114:       4f 5f           subi    r20, 0xFF       ; 255
     116:       48 30           cpi     r20, 0x08       ; 8
     118:       58 f4           brcc    .+22            ; 0x130
     11a:       37 17           cp      r19, r23
     11c:       a0 f7           brcc    .-24            ; 0x106

Tree format

The tree is stored as a header of 8-bit codeword counts per bit length, in reverse order, followed by a table of fixed-length values in codebook order. The header specifies the canonical Huffman code; explicit codewords are not stored. The file header points at the beginning of the value table; codeword counts are found by decrementing that pointer (hence the reverse order).

Example: Given this code:
0 = 4
10 = 5
110 = 1
111 = 2

The tree would be stored as {2,1,1,4,5,1,2} with its file pointer pointing at the first value byte (4).
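
As a sketch of how such a tree could be produced, the Python below serializes a codebook into this layout. The function name and the (codeword string, value) input format are illustrative choices and not part of the original tooling.

```python
def pack_tree(codebook):
    """Serialize a canonical Huffman codebook given as (codeword_string, value) pairs.

    Returns (data, value_offset): the reversed per-length counts followed by the
    values in codebook order, plus the index of the first value byte.
    """
    max_len = max(len(code) for code, _ in codebook)
    counts = [0] * max_len
    for code, _ in codebook:
        counts[len(code) - 1] += 1
    # Codebook order: shorter codewords first, then numeric order within a length.
    ordered = sorted(codebook, key=lambda cv: (len(cv[0]), cv[0]))
    values = [value for _, value in ordered]
    return counts[::-1] + values, max_len

data, value_offset = pack_tree([("0", 4), ("10", 5), ("110", 1), ("111", 2)])
print(data, value_offset)
```

For the example code this yields the same seven bytes, with value_offset indexing the first value byte.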

Decoding the tree

The bit-at-a-time decoder works by reconstructing the last code in the codebook for each bit length and comparing it to the codeword decoded so far. It stops when the decoded codeword does not exceed the last codeword for that bit length. The result is an index into the value table, computed from the codeword and the number of codewords shorter than the resulting codeword.

This can be accomplished with the following pseudocode:

codeword = 0, max_codeword = 0, num_codewords_so_far = 0, n = 0
while True:
  max_codeword = max_codeword << 1
  codeword = (codeword << 1) + getbit(n)
  num_codewords_this_length = codewords_of_length(n)
  if codeword < max_codeword + num_codewords_this_length:
    return num_codewords_so_far + codeword - max_codeword
  num_codewords_so_far += num_codewords_this_length
  max_codeword += num_codewords_this_length
  n = n + 1

This can be simplified algebraically by subtracting num_codewords_so_far from both max_codeword and codeword just before each comparison in the loop. This doesn't affect the outcome of the comparison.

adj_codeword = 0, adj_max_codeword = 0, num_codewords_so_far = 0, n = 0
while True:
  num_codewords_this_length = codewords_of_length(n)
  adj_max_codeword = (adj_max_codeword << 1) - num_codewords_so_far
  adj_codeword = (adj_codeword << 1) + getbit(n) - num_codewords_so_far
  if adj_codeword < adj_max_codeword + num_codewords_this_length:
    return num_codewords_so_far + adj_codeword - adj_max_codeword
  num_codewords_so_far += num_codewords_this_length
  adj_max_codeword += num_codewords_this_length
  n = n + 1

However, after this transformation adj_max_codeword always equals num_codewords_so_far, so we can eliminate the redundant variable and greatly simplify the loop.

adj_codeword = 0, num_codewords_so_far = 0, n = 0
while True:
  num_codewords_this_length = codewords_of_length(n)
  adj_codeword = (adj_codeword << 1) + getbit(n) - num_codewords_so_far
  num_codewords_so_far += num_codewords_this_length
  if adj_codeword < num_codewords_so_far:
    return adj_codeword
  n = n + 1
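
A runnable Python version of the simplified loop, written as a sketch against the tree layout described under "Tree format". On each bit it subtracts the running total from before the current length's count is added, which is what the worked examples further down do. The function name and the bit-iterator interface are assumptions made for illustration.

```python
def decode_symbol(bits, data, value_offset):
    """Decode one symbol from an iterator of bits using the stored tree layout."""
    adj_codeword = 0
    num_codewords_so_far = 0
    count_index = value_offset  # counts are read by walking backwards from the values
    while count_index > 0:
        count_index -= 1
        adj_codeword = (adj_codeword << 1) + next(bits) - num_codewords_so_far
        num_codewords_so_far += data[count_index]
        if adj_codeword < num_codewords_so_far:
            return data[value_offset + adj_codeword]
    raise ValueError("bit sequence is not a codeword")

tree = [2, 1, 1, 4, 5, 1, 2]                # example tree from "Tree format"
stream = iter([1, 1, 0, 0, 1, 0, 1, 1, 1])  # codewords 110, 0, 10, 111
decoded = [decode_symbol(stream, tree, 3) for _ in range(4)]
print(decoded)
```

Walking count_index backwards from the value table mirrors the pointer decrement in the assembly.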

Assembly code

r18 = bits from bitstream
r19 = adjusted codeword (index into the codebook)
r20 = number of bits shifted from r18
r21 = number of codewords at this length
r23 = sum number of codewords up to this length
r26:r27 = pointer to table of codewords per length (reverse order)

It works out to 16 cycles per bit (15 on the bit that terminates the loop, and some more when r18 runs out of bits).

The add/adc pair consumes one bit from the bitstream by shifting: bit 7 of r18 is shifted into bit 0 of r19. The subtraction at 10a subtracts the total number of codewords at all shorter lengths so that r19 is an index into the codebook. If r19 is less than the total number of codewords up to and including this length, the complete codeword is decoded. Otherwise the offset points at a longer codeword and more bits are needed.

The branch at 118 goes to a routine that gets more bits from the bitstream, loads them into r18, and sets r20=0. That routine jumps back to 11a when it's done.


Example: codeword counts per length = 0, 2, 0, 3 (not reversed)

Say the input bits are 1001. After each loop:
r19 = 0 * 2 + 1 - 0 = 1 not less than 0
r19 = 1 * 2 + 0 - 0 = 2 not less than 2
r19 = 2 * 2 + 0 - 2 = 2 not less than 2
r19 = 2 * 2 + 1 - 2 = 3 is less than 5

The loop terminates and r19 = 3 is the number of codewords in the codebook before 1001. This is the offset into the table of values indexed by codeword position in the codebook.

Same but for input 01:
r19 = 0 * 2 + 0 - 0 = 0 not less than 0
r19 = 0 * 2 + 1 - 0 = 1 is less than 2

After the loop exits r19 = 1 is the number of codewords in the codebook before 01.
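
To cross-check these indexes, the canonical codebook can be rebuilt from the per-length counts alone. This is a small Python sketch with an illustrative function name; it assumes the usual canonical ordering (shorter codewords first, then ascending within a length).

```python
def canonical_codebook(counts):
    """List canonical codewords (as bit strings) from per-length counts, shortest first."""
    codewords = []
    code = 0
    for length, count in enumerate(counts, start=1):
        for _ in range(count):
            codewords.append(format(code, f"0{length}b"))
            code += 1
        code <<= 1  # the first codeword of the next length gets one more bit
    return codewords

book = canonical_codebook([0, 2, 0, 3])
print(book)
print(book.index("1001"), book.index("01"))
```

The position of each codeword in this list is the value r19 holds when the loop exits.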

Friday, June 19, 2015

attiny48 avrdude config

avrdude doesn't support the attiny48 out of the box. This means editing your avrdude.conf if you want to program one (e.g. the SMD trainer).

Scott Shaw published a config here, and it works for me:

SMD trainer demo firmware

Blink all the LEDs!

Released: simple firmware for the SMD trainer.

It blinks! It PWMs! It includes source code and a binary .hex! CC0 license!

Sunday, June 7, 2015

SMD trainer v1.0

The smdtrainer is a bare-bones, minimal-cost training exercise for assembling SMD boards by hand. Build one or two to hone your tweezer and fine-pitch soldering skills.

The trainer is based around an Atmel AVR ATtiny48 connected to 8 LEDs via a 74HC595 shift register. The MCU can blink the LEDs in a timed pattern or interface with additional hardware (buttons, sensors, etc.). All of the MCU pins are broken out to headers, and all of the '595 pins can be controlled by software, including PWM of the '595 output enable for dimming.

The board includes several surface mount footprints:

  • SO-16 (1.27mm pitch)
  • SOT-23
  • TQFP-32 (0.8mm pitch)
  • 0805 (80x50 mil)
  • 1206 (120x60 mil)

Also included are a DC power jack for 5V in and breakout headers for power and ICSP.


    The MCP1700 is wired incorrectly.  IN and OUT are reversed. This error exists in both the schematic and PCB.

    Workarounds: The regulator isn't needed if the input supply is suitably regulated (e.g. 5V).

    Choose one:
    1. Do nothing. The regulator will be reverse biased and pass current through a diode drop.
    2. Bridge pins 1 and 2 of P4. This bypasses the regulator and the reverse protection diode.
    3. Bridge pins 2 and 3 of U2. This bypasses the regulator but keeps the diode.
    Alternatively, to keep the regulator: wire it up swapping pads 2 and 3 or connect directly to P4.



      Bill of materials

      • (1) 2mm DC power jack
      • (1) Atmel ATtiny48-A
      • (1) Microchip MCP1700T3302E/TT 3.3V regulator
      • (1) NXP 74HC595D-Q100 shift register
      • (1) 1206 0.5A schottky diode (AVX SD1206S040S0R5 or similar)
      • (2) 0.1uF 0805 capacitors
      • (2) 1.0uF 0805 capacitors
      • (8) 1kOhm 0805 resistors
      • (8) LED 0805 SMT (OSRAM LG R971-KN-1 or similar)
      • (1) 3x2 ICSP male header
      • Optional: (1) 1x4 header for power breakout
      • Optional: (4) 1x8 header for ATtiny48 breakouts