
Advent of Z80 - Day 1, 2022

This post is the first in a series of articles implementing solutions for the 2022 Advent of Code in Z80 assembly. The series aims to show some examples of programming in Z80 assembly and help newcomers to assembly language begin to think about programs from the assembly perspective, and will focus less on finding clever AoC solutions and more on producing understandable assembly code.
JP DAY01
This post is a “literate assembly” document. The code fragments on the page are assembled and executed in your browser. To the right of each statement you can see the bytes that the statement assembled into.
The first puzzle in the AoC 2022 series provides you with a nested list of numbers and requires you to sum the nested lists and find the maximum. In a high level language this might look something like maximum (map sum input)
, which the compiler will helpfully translate into reasonably efficient code.
What I would like to do is iterate over the input summing numbers as I encounter them and keeping track of the current maximum sum at the end of each nested list. In principle this is not complicated, as even an ancient processor like the Z80 can add and compare. In practice there’s a small complication: the problem’s number space is greater than what a 16-bit number can store.
A processor like the Z80 adds up much the same way I learned to in my early schooling years: add up the 1s column, noting any carry, add up the 10s column including the carry from the 1s column, and so on until I’d added up all my columns. The Z80 does the same thing bit by bit: the least significant bit of both inputs is summed and the carry is pushed along to the next bit, and so on. The Z80 has a 4-bit ALU and does half an addition in one cycle then the other half in the next. For a 16-bit add the Z80 does the low byte then the high byte.
The carry result is exposed to applications through the carry flag: if the most significant bits of the addends plus the carry from the previous bit results in another carry, the carry flag is set. To make chaining additions easier there are also ADC
(add-with-carry) instructions that use the value in the carry flag as the carry-in for the least significant bit addition. Adding two 32-bit numbers is more or less as simple as adding two 16-bit numbers, then adding two more 16-bit numbers with carry.
Another wrinkle will emerge about now, however - the Z80 doesn’t have a large number of registers. The other addend for ADC HL, …
must be one of BC
, DE
, HL
, or SP
. There’s no point adding HL
to itself, and the stack pointer (SP
) is needed to hold the current stack value, which only leaves 48 bits of register for 64 bits of value. There are also the IX
and IY
index registers that can stand in for HL
for many, but not all, operations. Importantly they can be used for ADD
but not for ADC
.
I will choose to store the addends for a 32-bit addition in HL:IY
and DE:BC
, so a SUM32
function would look like this - more comment than code.
; IN HL:IY the 32-bit accumulator
; DE:BC the 32-bit addend
; OUT: HL:IY HL:IY + DE:BC
; S set if the result is negative
; Z set if the top 16 bits are zero
; H set if carry from bit 27
; P/V set if signed overflow
; N reset
; C set if carry from bit 31
SUM32: add iy, bc
adc hl, de
ret
Some of the flags are a bit nonsensical. The half-carry bit is the result of the penultimate 4-bit ALU operation, so for an 8-bit addition it’s the carry out of bit 3. For a 16-bit operation it’s the carry out of bit 3 in the top 8 bits, or the carry out of bit 11. The zero flag will be set by the adc
discarding any information from the add
. The other flags will all make sense, however, as they’re based only on the most significant bits.
With a 32-bit addition in hand I’ll also want a comparison of unsigned 32-bit numbers. This will happen using subtraction, because comparison on the Z80 is a subtraction that discards the result and there’s no 16-bit comparison instruction. The routine compares the high words first, only comparing the low words if the high words are equal.
There is no SUB
instruction for 16-bit registers. There is, however, SBC
for the HL
register, which is “subtract with carry”. On the Z80, as with many other processors, this is implemented as “subtract with borrow” - the carry flag becomes a borrow flag. SBC HL, DE
is computed as . Before using this instruction the carry flag must be put into a known state. This can be done using the SCF
instruction to set the carry flag, but there is no dedicated instruction to clear the carry flag. Fortunately, OR A, A
will clear the carry flag in the same number of cycles as SCF
sets it, and with a similar set of side effects on flags.
After the subtraction the carry flag will be set if there was a borrow required for the final bit of the operation. This happens only when the minuend is less than the subtrahend. The carry flag distinguishes between a < b
when set and a >= b
when reset. The zero flag is set only if the two operands are equal.
The 32-bit comparison works by comparing the high words and checking the zero flag. If it’s not set then the carry flag will already show which is greater, and the routine is done. If the zero flag is set then the carry flag must be clear, another subtraction is done, and the flags from that indicate the overall comparison result.
; IN HL:IY the 32-bit minuend
; DE:BC the 32-bit subtrahend
; OUT: Z set if minuend == subtrahend
; C set if minuend < subtrahend
CMP32: push hl
or a, a ; reset carry
sbc hl, de
jr nz, CMP32DONE
push iy
pop hl
sbc hl, bc
CMP32DONE:
pop hl
ret
With 32-bit addition and comparison ready to go the main challenge left to solve is processing the input. For simplicity’s sake the input will be pre-processed to be 32-bit words with zeros separating the sub-lists:
INPUT:
defw 1000, 0
defw 2000, 0
defw 3000, 0
defw 0, 0
defw 4000, 0
defw 0, 0
defw 5000, 0
defw 6000, 0
defw 0, 0
defw 7000, 0
defw 8000, 0
defw 9000, 0
defw 0, 0
defw 10000, 0
defw 0, 0
defw 0, 0
My solution algorithm is going to look like this:
- Initialise the maximum to zero
- While there are lists of numbers left:
- Initialise the sum to zero
- Add each number in the list to the sum
- If the sum is greater than the maximum save it as the new maximum
- Print out the result
Translating this to Z80 assembly means figuring out what variables I need and where I’ll store them. I’ll use four bytes of memory to hold the maximum, because I will only need that once per outer loop. I’ll keep the current sum in HL:IY
where it’s ready to use in my addition and comparison subroutines. BC
and DE
will be used for addends and subtrahends as needed, and that leaves A
for general purpose arithmetic and IX
to point to my input.
I will want to load DE:BC
from (IX)
multiple times, so I’ll pop that in its own routine. It’s important to remember that the Z80 is little endian: when loading or storing a multi-byte value the least significant byte is at the first address. The DEFW 10000, 0
above assembles to the hex bytes $10 $27 $00 $00
, or 16, 39, 0, 0
in decimal. . This means I want to load C
from (IX+0)
up to D
from (IX+3)
. Every time I load the value I also need to test whether it’s zero, so I’ll set the zero flag appropriately as I load.
; IN IX address of next input to read
; OUT DE:BC 32-bit value loaded from (IX)
; IX incremented past read input
; A clobbered
; Z set if the loaded value was zero
READ: ld c, (ix+0)
ld b, (ix+1)
ld e, (ix+2)
ld d, (ix+3)
inc ix
inc ix
inc ix
inc ix
Testing whether DE:BC
is zero is nice and easy: load one of the 8-bit registers into A
then OR
each of the others in. Only if all of them are zero will A
be zero afterwards, with the zero flag conveniently holding that information.
ld a, b
or a, c
or a, d
or a, e
ret
It’s time to set up the main program and initialise the input and maximum variables, as well as of course setting a valid stack pointer value so I can call subroutines.
MAX: DEFW 0, 0
DAY01: ld sp, $8000
ld ix, INPUT
For my outer loop I will clear HL:IY
to zero then load DE:BC
from (IX)
. The outer loop is done if that load sets the zero flag.
OUTER: ld hl, 0 ; init sum
ld iy, 0
call READ
jr z, DONE
I’m ready to enter the inner loop now, where I’ll add DE:BC
to the current sum in HL:IY
, load the next input number, and test whether that’s zero: if it is, I repeat the inner loop.
Because SUM32
is so trivial I’m not even going to call it - I will inline the two instructions right here.
INNER: add iy, bc
adc hl, de
call READ
jr nz, INNER
Having completed a sum I need to compare it to the maximum and update as needed. I can DE:BC
directly from fixed memory, being careful with byte order, then use my CMP32
routine.
ld bc, (MAX)
ld de, (MAX+2)
call CMP32
The carry flag is set if DE:BC
is greater than HL:IY
-if so I can head straight back into the outer loop.
jr c, OUTER
ld (MAX), iy
ld (MAX+2), hl
jr OUTER
There’s nothing left to do now but print out the maximum, using the as-yet undescribed PRINT32
routine, then HALT
.
DONE: ld iy, (MAX)
ld hl, (MAX+2)
call PRINT32
ld a, '\n'
out (1), a
HALT
Printing a decimal number is a bit tricky on the Z80. The range of an unsigned 32-bit number is 0 to 4,294,967,295, so one common approach is to subtract 1,000,000,000 from the number to print until borrow is set and print that as the billions digit, then subtract 100,000,000 until borrow is set, and so on down to subtracting 1. A nice little extra feature is to also keep track of whether there’s been a non-zero digit yet and skip the print until either the first non-zero digit or the last digit.
For the in-browser emulator, writing a value to I/O port 1 will print it to a console HTML element.
I am also out of registers to store the flag indicating whether I’ve seen a non-zero digit or not. This is a good time to introduce a stack frame. When the Z80 CPU sees a CALL
instruction it stores PC
onto the stack and jumps to the target address. When the subroutine is done, it executes a RET
and the PC
is popped off the stack. Around that return address useful things can be stored. Before the CALL
a bit of code might PUSH
arguments that don’t fit into registers, and after the call the subroutine may need to allocate a bit of space on the stack for local variables. The structure of data around the return address is called a stack frame.
To set up a stack frame the first thing that happens on entry to a subroutine is to save IX
and then set it to SP
. There is no instruction to do this directly so IX
is set to zero then SP
is added to it. After that space for local variables is reserved by subtracting the required amount from SP
. Once again there is no handy instruction to add or subtract values to SP
. Open option is to load the fixed offset into HL
, add SP
to it, and transfer the result back to SP
- but this clobbers HL
. Another option is to push a register, which decrements SP
twice but this does unnecessary memory writes. It’s also possible to just DEC SP
repeatedly but this is 6 cycles per decrement and doesn’t scale well to multiple local variables.
CPU support is essential to working with stack frames effectively. The Z80 has indexed addressing with the IX
and IY
registers, allowing a local variable to be accessed as, eg, LD A, (IX-1)
, but it doesn’t have good support for setting up or tearing down a stack frame. The 8086 allowed directly moving the stack pointer to the base pointer register, as well as adding constant values to the stack pointer. The 80186 added ENTER
and LEAVE
instructions that do all the work for you - however, ENTER
wound up overspecified and difficult to implement in processors, and as a result neither instruction is typically used by compilers. On the other end of the scale the 6502 lacks the required indexed addressing mode, and usually a different solution will be required.
For PRINT32
only a single byte of storage is needed, so DEC SP
will work nicely.
; IN HL:IY A 32-bit number to print
; OUT A clobbered
PRINT32:
push ix
ld ix, 0
add ix, sp
dec sp
push bc
push de
push hl
push iy
The flag for whether a non-zero character has been printed will be stored at (IX-1)
. I will use ‘0’ as the flag indicating that only zeros have been seen so far so I can do an easy comparison between the digit to print and the flag: if they’re equal, don’t print. Any non-digit value will serve to indicate printing should happen.
ld (ix-1), '0'
For each of the 10 digit locations DE:BC
will be loaded with the constant to subtract and a subroutine will be called. For the last digit location the non-zero character flag will be forced true, so at least one digit is printed each time.
One more limitation of the Z80 instruction set now rears its head: there is an ADD IY, DE
instruction but no corresponding subtraction. Fortunately subtracing a number is the same as adding its negation and inverting the carry flag’s meaning, so instead of loading positive constants for subtraction I will load negative constants for addition.
ld bc, -1_000_000_000
ld de, -1_000_000_000 >> 16
call DIGIT
ld bc, -100_000_000
ld de, -100_000_000 >> 16
call DIGIT
ld bc, -10_000_000
ld de, -10_000_000 >> 16
call DIGIT
ld bc, -1_000_000
ld de, -1_000_000 >> 16
call DIGIT
ld bc, -100_000
ld de, -100_000 >> 16
call DIGIT
ld bc, -10_000
ld de, -10_000 >> 16
call DIGIT
ld bc, -1_000
ld de, -1_000 >> 16
call DIGIT
ld bc, -100
ld de, -100 >> 16
call DIGIT
ld bc, -10
ld de, -10 >> 16
call DIGIT
; ensure last digit always prints
ld (ix-1), 1
ld bc, -1
ld de, -1 >> 16
call DIGIT
To clean up the stack frame SP
must be restored to where it was after saving IX
. The Z80 does have an instruction to set SP
to the value of IX
, so this part is easy. It would also be possible here to simply increment SP
to balance the decrement done on entry.
pop iy
pop hl
pop de
pop bc
ld sp, ix
pop ix
ret
DIGIT
will add DE:BC
to HL:IY
until carry is reset, incrementing A
each time.
DIGIT: ld a, '0' - 1
DLOOP: inc a
add iy, bc
adc hl, de
jr c, DLOOP
Once the carry flag is reset DE:BC
needs to be subtracted from HL:IY
, which will be a bit of a shuffle through the stack due to the limited instructions for subtraction.
push hl
push iy
pop hl
sbc hl, bc ; "iy"-bc
push hl
pop iy
pop hl
sbc hl, de ; hl-de
And last of all, print A
if it or the non-zero flag are non-zero.
cp (ix-1)
jr z, NOPRINT
out (1), a
ld (ix-1), 1
NOPRINT:
ret
Let’s fire it up and see what it does.
jp DAY01
That looks great. I will ask you to take my word for it that it also succeeds with my puzzle input.
Star 2
The second star for the day simply expands the problem to tracking the top three values rather than the top one. This is easily accomplished by storing the top three sorted: when a new value comes in, compare to the lowest top three. If it’s greater then discard the 3rd place value and store the new one. Then, see if you need to swap 3rd place and 2nd place. If so, also then check if you need to swap 1st and 2nd place.
The main loop is practically unchanged.
TOP3: defw 0, 0
defw 0, 0
defw 0, 0
STAR2: ld sp, $8000
ld ix, INPUT
OUTER2: ld hl, 0 ; init sum
ld iy, 0
call READ
jr z, DONE2
INNER2: add iy, bc
adc hl, de
call READ
jr nz, INNER2
Here I now do a set of comparisons instead of just one.
; check if the new sum is in the top 3
ld bc, (TOP3)
ld de, (TOP3+2)
call CMP32
jr c, OUTER2
ld (TOP3), iy
ld (TOP3+2), hl
; need to swap 3rd and 2nd?
ld bc, (TOP3+4)
ld de, (TOP3+6)
call CMP32
jr c, OUTER2
ld (TOP3), bc
ld (TOP3+2), de
ld (TOP3+4), iy
ld (TOP3+6), hl
; need to swap 2nd and 1st?
ld bc, (TOP3+8)
ld de, (TOP3+10)
call CMP32
jr c, OUTER2
ld (TOP3+4), bc
ld (TOP3+6), de
ld (TOP3+8), iy
ld (TOP3+10), hl
jr OUTER2
When done there’s also the job of summing the top 3 to take care of.
DONE2: ld iy, (TOP3)
ld hl, (TOP3+2)
ld bc, (TOP3+4)
ld de, (TOP3+6)
add iy, bc
adc hl, de
ld bc, (TOP3+8)
ld de, (TOP3+10)
add iy, bc
adc hl, de
call PRINT32
ld a, '\n'
out (1), a
HALT
And let’s run this one.
jp STAR2
There is the expected value, all done and dusted.
The series
Day 1 | Day 2 | Day 3 | Day 4 | Day 5 | Day 6 | Day 7 | Day 8 | Day 9 | Day 10 | Day 11 | Day 12 | Day 13 | Day 14 | Day 15 | Day 16 | Day 17 | Day 18 | Day 19 | Day 20 | Day 21 | Day 22 | Day 23 | Day 24 | Day 25 |