λbje - Advent of Z80

The 12 components of Christmas? Original design.

This post is the first in a series of articles implementing solutions for the 2022 Advent of Code in Z80 assembly. The series aims to show some examples of programming in Z80 assembly and help newcomers to assembly language begin to think about programs from the assembly perspective, and will focus less on finding clever AoC solutions and more on producing understandable assembly code.

        JP      DAY01

This post is a “literate assembly” document. The code fragments on the page are assembled and executed in your browser. To the right of each statement you can see the bytes that the statement assembled into.

The first puzzle in the AoC 2022 series provides you with a nested list of numbers and requires you to sum the nested lists and find the maximum. In a high level language this might look something like maximum (map sum input), which the compiler will helpfully translate into reasonably efficient code.

What I would like to do is iterate over the input summing numbers as I encounter them and keeping track of the current maximum sum at the end of each nested list. In principle this is not complicated, as even an ancient processor like the Z80 can add and compare. In practice there’s a small complication: the problem’s number space is greater than what a 16-bit number can store.

A processor like the Z80 adds up much the same way I learned to in my early schooling years: add up the 1s column, noting any carry, add up the 10s column including the carry from the 1s column, and so on until I’d added up all my columns. The Z80 does the same thing bit by bit: the least significant bit of both inputs is summed and the carry is pushed along to the next bit, and so on. The Z80 has a 4-bit ALU and does half an addition in one cycle then the other half in the next. For a 16-bit add the Z80 does the low byte then the high byte.

The carry result is exposed to applications through the carry flag: if the most significant bits of the addends plus the carry from the previous bit results in another carry, the carry flag is set. To make chaining additions easier there are also ADC (add-with-carry) instructions that use the value in the carry flag as the carry-in for the least significant bit addition. Adding two 32-bit numbers is more or less as simple as adding two 16-bit numbers, then adding two more 16-bit numbers with carry.

Another wrinkle will emerge about now, however - the Z80 doesn’t have a large number of registers. The other addend for ADC HL, … must be one of BC, DE, HL, or SP. There’s no point adding HL to itself, and the stack pointer (SP) is needed to hold the current stack value, which only leaves 48 bits of register for 64 bits of value. There are also the IX and IY index registers that can stand in for HL for many, but not all, operations. Importantly they can be used for ADD but not for ADC.

I will choose to store the addends for a 32-bit addition in HL:IY and DE:BC, so a SUM32 function would look like this - more comment than code.

; IN    HL:IY   the 32-bit accumulator
;       DE:BC   the 32-bit addend
; OUT:  HL:IY   HL:IY + DE:BC
;       S       set if the result is negative
;       Z       set if the top 16 bits are zero
;       H       set if carry from bit 27
;       P/V     set if signed overflow
;       N       reset
;       C       set if carry from bit 31
SUM32:  add     iy, bc
        adc     hl, de
        ret

Some of the flags are a bit nonsensical. The half-carry bit is the result of the penultimate 4-bit ALU operation, so for an 8-bit addition it’s the carry out of bit 3. For a 16-bit operation it’s the carry out of bit 3 in the top 8 bits, or the carry out of bit 11. The zero flag will be set by the adc discarding any information from the add. The other flags will all make sense, however, as they’re based only on the most significant bits.

With a 32-bit addition in hand I’ll also want a comparison of unsigned 32-bit numbers. This will happen using subtraction, because comparison on the Z80 is a subtraction that discards the result and there’s no 16-bit comparison instruction. The routine compares the high words first, only comparing the low words if the high words are equal.

There is no SUB instruction for 16-bit registers. There is, however, SBC for the HL register, which is “subtract with carry”. On the Z80, as with many other processors, this is implemented as “subtract with borrow” - the carry flag becomes a borrow flag. SBC HL, DE is computed as $HL - DE - C$ . Before using this instruction the carry flag must be put into a known state. This can be done using the SCF instruction to set the carry flag, but there is no dedicated instruction to clear the carry flag. Fortunately, OR A, A will clear the carry flag in the same number of cycles as SCF sets it, and with a similar set of side effects on flags.

After the subtraction the carry flag will be set if there was a borrow required for the final bit of the operation. This happens only when the minuend is less than the subtrahend. The carry flag distinguishes between a < b when set and a >= b when reset. The zero flag is set only if the two operands are equal.

The 32-bit comparison works by comparing the high words and checking the zero flag. If it’s not set then the carry flag will already show which is greater, and the routine is done. If the zero flag is set then the carry flag must be clear, another subtraction is done, and the flags from that indicate the overall comparison result.

; IN    HL:IY   the 32-bit minuend
;       DE:BC   the 32-bit subtrahend
; OUT:  Z       set if minuend == subtrahend
;       C       set if minuend < subtrahend
CMP32:  push    hl
        or      a, a    ; reset carry
        sbc     hl, de
        jr      nz, CMP32DONE
        push    iy
        pop     hl
        sbc     hl, bc
CMP32DONE:
        pop     hl
        ret

With 32-bit addition and comparison ready to go the main challenge left to solve is processing the input. For simplicity’s sake the input will be pre-processed to be 32-bit words with zeros separating the sub-lists:

INPUT:
        defw    1000, 0
        defw    2000, 0
        defw    3000, 0

        defw    0, 0

        defw    4000, 0

        defw    0, 0

        defw    5000, 0
        defw    6000, 0

        defw    0, 0

        defw    7000, 0
        defw    8000, 0
        defw    9000, 0

        defw    0, 0

        defw    10000, 0

        defw    0, 0
        defw    0, 0

My solution algorithm is going to look like this:

Initialise the maximum to zero
While there are lists of numbers left:
1. Initialise the sum to zero
2. Add each number in the list to the sum
3. If the sum is greater than the maximum save it as the new maximum
Print out the result

Translating this to Z80 assembly means figuring out what variables I need and where I’ll store them. I’ll use four bytes of memory to hold the maximum, because I will only need that once per outer loop. I’ll keep the current sum in HL:IY where it’s ready to use in my addition and comparison subroutines. BC and DE will be used for addends and subtrahends as needed, and that leaves A for general purpose arithmetic and IX to point to my input.

I will want to load DE:BC from (IX) multiple times, so I’ll pop that in its own routine. It’s important to remember that the Z80 is little endian: when loading or storing a multi-byte value the least significant byte is at the first address. The DEFW 10000, 0 above assembles to the hex bytes $10 $27 $00 $00, or 16, 39, 0, 0 in decimal. $39\times256 = 9984 + 16 = 10000$ . This means I want to load C from (IX+0) up to D from (IX+3). Every time I load the value I also need to test whether it’s zero, so I’ll set the zero flag appropriately as I load.

; IN    IX      address of next input to read
; OUT   DE:BC   32-bit value loaded from (IX)
;       IX      incremented past read input
;       A       clobbered
;       Z       set if the loaded value was zero
READ:   ld      c, (ix+0)
        ld      b, (ix+1)
        ld      e, (ix+2)
        ld      d, (ix+3)
        inc     ix
        inc     ix
        inc     ix
        inc     ix

Testing whether DE:BC is zero is nice and easy: load one of the 8-bit registers into A then OR each of the others in. Only if all of them are zero will A be zero afterwards, with the zero flag conveniently holding that information.

        ld      a, b
        or      a, c
        or      a, d
        or      a, e
        ret

It’s time to set up the main program and initialise the input and maximum variables, as well as of course setting a valid stack pointer value so I can call subroutines.

MAX:    DEFW    0, 0

DAY01:  ld      sp, $8000
        ld      ix, INPUT

For my outer loop I will clear HL:IY to zero then load DE:BC from (IX). The outer loop is done if that load sets the zero flag.

OUTER:  ld      hl, 0   ; init sum
        ld      iy, 0
        call    READ
        jr      z, DONE

I’m ready to enter the inner loop now, where I’ll add DE:BC to the current sum in HL:IY, load the next input number, and test whether that’s zero: if it is, I repeat the inner loop.

Because SUM32 is so trivial I’m not even going to call it - I will inline the two instructions right here.

INNER:  add     iy, bc
        adc     hl, de
        call    READ
        jr      nz, INNER

Having completed a sum I need to compare it to the maximum and update as needed. I can DE:BC directly from fixed memory, being careful with byte order, then use my CMP32 routine.

        ld      bc, (MAX)
        ld      de, (MAX+2)
        call    CMP32

The carry flag is set if DE:BC is greater than HL:IY -if so I can head straight back into the outer loop.

        jr      c, OUTER
        ld      (MAX), iy
        ld      (MAX+2), hl
        jr      OUTER

There’s nothing left to do now but print out the maximum, using the as-yet undescribed PRINT32 routine, then HALT.

DONE:   ld      iy, (MAX)
        ld      hl, (MAX+2)
        call    PRINT32
        ld      a, '\n'
        out     (1), a
        HALT

Printing a decimal number is a bit tricky on the Z80. The range of an unsigned 32-bit number is 0 to 4,294,967,295, so one common approach is to subtract 1,000,000,000 from the number to print until borrow is set and print that as the billions digit, then subtract 100,000,000 until borrow is set, and so on down to subtracting 1. A nice little extra feature is to also keep track of whether there’s been a non-zero digit yet and skip the print until either the first non-zero digit or the last digit.

For the in-browser emulator, writing a value to I/O port 1 will print it to a console HTML element.

I am also out of registers to store the flag indicating whether I’ve seen a non-zero digit or not. This is a good time to introduce a stack frame. When the Z80 CPU sees a CALL instruction it stores PC onto the stack and jumps to the target address. When the subroutine is done, it executes a RET and the PC is popped off the stack. Around that return address useful things can be stored. Before the CALL a bit of code might PUSH arguments that don’t fit into registers, and after the call the subroutine may need to allocate a bit of space on the stack for local variables. The structure of data around the return address is called a stack frame.

To set up a stack frame the first thing that happens on entry to a subroutine is to save IX and then set it to SP. There is no instruction to do this directly so IX is set to zero then SP is added to it. After that space for local variables is reserved by subtracting the required amount from SP. Once again there is no handy instruction to add or subtract values to SP. Open option is to load the fixed offset into HL, add SP to it, and transfer the result back to SP - but this clobbers HL. Another option is to push a register, which decrements SP twice but this does unnecessary memory writes. It’s also possible to just DEC SP repeatedly but this is 6 cycles per decrement and doesn’t scale well to multiple local variables.

CPU support is essential to working with stack frames effectively. The Z80 has indexed addressing with the IX and IY registers, allowing a local variable to be accessed as, eg, LD A, (IX-1), but it doesn’t have good support for setting up or tearing down a stack frame. The 8086 allowed directly moving the stack pointer to the base pointer register, as well as adding constant values to the stack pointer. The 80186 added ENTER and LEAVE instructions that do all the work for you - however, ENTER wound up overspecified and difficult to implement in processors, and as a result neither instruction is typically used by compilers. On the other end of the scale the 6502 lacks the required indexed addressing mode, and usually a different solution will be required.

For PRINT32 only a single byte of storage is needed, so DEC SP will work nicely.

; IN    HL:IY   A 32-bit number to print
; OUT   A       clobbered
PRINT32:
        push    ix
        ld      ix, 0
        add     ix, sp
        dec     sp

        push    bc
        push    de
        push    hl
        push    iy

The flag for whether a non-zero character has been printed will be stored at (IX-1). I will use ‘0’ as the flag indicating that only zeros have been seen so far so I can do an easy comparison between the digit to print and the flag: if they’re equal, don’t print. Any non-digit value will serve to indicate printing should happen.

        ld      (ix-1), '0'

For each of the 10 digit locations DE:BC will be loaded with the constant to subtract and a subroutine will be called. For the last digit location the non-zero character flag will be forced true, so at least one digit is printed each time.

One more limitation of the Z80 instruction set now rears its head: there is an ADD IY, DE instruction but no corresponding subtraction. Fortunately subtracing a number is the same as adding its negation and inverting the carry flag’s meaning, so instead of loading positive constants for subtraction I will load negative constants for addition.

        ld      bc, -1_000_000_000
        ld      de, -1_000_000_000 >> 16
        call    DIGIT

        ld      bc, -100_000_000
        ld      de, -100_000_000 >> 16
        call    DIGIT

        ld      bc, -10_000_000
        ld      de, -10_000_000 >> 16
        call    DIGIT

        ld      bc, -1_000_000
        ld      de, -1_000_000 >> 16
        call    DIGIT

        ld      bc, -100_000
        ld      de, -100_000 >> 16
        call    DIGIT

        ld      bc, -10_000
        ld      de, -10_000 >> 16
        call    DIGIT

        ld      bc, -1_000
        ld      de, -1_000 >> 16
        call    DIGIT

        ld      bc, -100
        ld      de, -100 >> 16
        call    DIGIT

        ld      bc, -10
        ld      de, -10 >> 16
        call    DIGIT

        ; ensure last digit always prints
        ld      (ix-1), 1

        ld      bc, -1
        ld      de, -1 >> 16
        call    DIGIT

To clean up the stack frame SP must be restored to where it was after saving IX. The Z80 does have an instruction to set SP to the value of IX, so this part is easy. It would also be possible here to simply increment SP to balance the decrement done on entry.

        pop     iy
        pop     hl
        pop     de
        pop     bc

        ld      sp, ix
        pop     ix
        ret

DIGIT will add DE:BC to HL:IY until carry is reset, incrementing A each time.

DIGIT:  ld      a, '0' - 1
DLOOP:  inc     a
        add     iy, bc
        adc     hl, de
        jr      c, DLOOP

Once the carry flag is reset DE:BC needs to be subtracted from HL:IY, which will be a bit of a shuffle through the stack due to the limited instructions for subtraction.

        push    hl
        push    iy
        pop     hl
        sbc     hl, bc ; "iy"-bc
        push    hl
        pop     iy
        pop     hl
        sbc     hl, de ; hl-de

And last of all, print A if it or the non-zero flag are non-zero.

        cp      (ix-1)
        jr      z, NOPRINT
        out     (1), a
        ld      (ix-1), 1
NOPRINT:
        ret

Let’s fire it up and see what it does.

        jp      DAY01

That looks great. I will ask you to take my word for it that it also succeeds with my puzzle input.

Star 2

The second star for the day simply expands the problem to tracking the top three values rather than the top one. This is easily accomplished by storing the top three sorted: when a new value comes in, compare to the lowest top three. If it’s greater then discard the 3rd place value and store the new one. Then, see if you need to swap 3rd place and 2nd place. If so, also then check if you need to swap 1st and 2nd place.

The main loop is practically unchanged.

TOP3:   defw    0, 0
        defw    0, 0
        defw    0, 0

STAR2:  ld      sp, $8000
        ld      ix, INPUT

OUTER2: ld      hl, 0   ; init sum
        ld      iy, 0
        call    READ
        jr      z, DONE2

INNER2: add     iy, bc
        adc     hl, de
        call    READ
        jr      nz, INNER2

Here I now do a set of comparisons instead of just one.

        ; check if the new sum is in the top 3
        ld      bc, (TOP3)
        ld      de, (TOP3+2)
        call    CMP32
        jr      c, OUTER2
        ld      (TOP3), iy
        ld      (TOP3+2), hl

        ; need to swap 3rd and 2nd?
        ld      bc, (TOP3+4)
        ld      de, (TOP3+6)
        call    CMP32
        jr      c, OUTER2
        ld      (TOP3), bc
        ld      (TOP3+2), de
        ld      (TOP3+4), iy
        ld      (TOP3+6), hl

        ; need to swap 2nd and 1st?
        ld      bc, (TOP3+8)
        ld      de, (TOP3+10)
        call    CMP32
        jr      c, OUTER2
        ld      (TOP3+4), bc
        ld      (TOP3+6), de
        ld      (TOP3+8), iy
        ld      (TOP3+10), hl

        jr      OUTER2

When done there’s also the job of summing the top 3 to take care of.


DONE2:  ld      iy, (TOP3)
        ld      hl, (TOP3+2)
        ld      bc, (TOP3+4)
        ld      de, (TOP3+6)
        add     iy, bc
        adc     hl, de
        ld      bc, (TOP3+8)
        ld      de, (TOP3+10)
        add     iy, bc
        adc     hl, de
        call    PRINT32
        ld      a, '\n'
        out     (1), a
        HALT

And let’s run this one.

        jp      STAR2

There is the expected value, all done and dusted.

The series

Day 1 | Day 2 | Day 3 | Day 4 | Day 5 | Day 6 | Day 7 | Day 8 | Day 9 | Day 10 | Day 11 | Day 12 | Day 13 | Day 14 | Day 15 | Day 16 | Day 17 | Day 18 | Day 19 | Day 20 | Day 21 | Day 22 | Day 23 | Day 24 | Day 25 |