
Advent of Z80 - Day 4, 2022

The Advent of Z80 continues with the Elves cleaning up their campsite and me using a different AI image generator for my banner.
JP DAY04
Today I am going to be a little bit cheeky and write the Star 1 solution already knowing how Star 2 will differ such that I can re-use the vast bulk of the code. Don’t be too outraged though - when solving these puzzles I’ll often tweak a Star 1 solution to support the added Star 2 requirements, and no-one is reading these posts in the belief that they’re the transcript of a live coding session. In fact, as far as I am aware no-one is reading these posts at all.
And that’s okay: blogging like this is not about you, dear hypothetical reader. It’s about me and my desires to marshal my thoughts more clearly.
The problem for today is filtering lists of pairs of number ranges; the star 1 predicate will match only pairs where one range fully contains the other, while the star 2 predicate will match only pairs where one range overlaps the other. The result in both cases is a count of matching pairs.
A high level solution might read something like:
.stream()
input.filter(predicate)
.count()
A decent compiler should be able to perform stream fusion on a bit of code like this, recognising that the intermediate list isn’t needed for the final output and effectively only maintaining a count of how many elements matched the filter predicate. That’s exactly what I will write in assembler: a subroutine taking an input list and a predicate and returning the count of matching elements in the list.
; IN: IX the input range pairs
; HL a predicate function to apply to pairs
; OUT: DE the number of matching pairs
.block
@COUNT_FILTER:
push af
push bc
push hl
push ix
ld de, 0
ld bc, 4 ; for ADD IX, BC
The COUNT_FILTER
subroutine will be the workhorse of day 4. Its input data consists of four bytes per pair, with all input numbers ranging between 1 and 99 inclusive. As with other days the end of the input will be marked by a zero byte. Its predicate function will take an input address in IX
and set the zero flag if the predicate matches.
Why does the predicate set the zero flag on a match? Returning a boolean value from a subroutine will either set some register to a “truthy” value such as zero for false and non-zero for true, or set a flag to indicate the answer. If it’s a register the caller is almost certainly going to want to test the register to set a flag anyway, because the Z80’s conditional execution flows depend on flags. I may as well use a flag directly. I still need to figure out which flag to use, and which value of the flag means “true”. In this case I chose to use the zero flag and have it set when the predicate matches, to make the code I write for the predicates a bit simpler.
LOOP: ld a, (ix+0)
or a
jr z, DONE
The Z80 has nine ways to call a subroutine: eight conditional calls and one unconditional call. All nine of them take an immediate address as the subroutine destination. Calling a dynamic address isn’t possible. What is possible, though, is jumping to a dynamic address. There are twelve ways to jump to another location, the same nine as for calling another location plus JP (HL)
, JP (IX)
and JP (IY)
, each of which jump to the given register’s location. The only difference between a CALL
and a JP
is that the processor pushes the program counter onto the stack before jumping for a CALL
- so to emulate a CALL
with a JP
I only have to push the desired return address first. A quick and easy way to do that is to CALL
something, which pushes the address of the following instruction. If the CALL
destination is a single instruction to JP (HL)
the processor will wind up with the program counter set to HL
and the stack containing the desired return address. An additional 17 cycles are used for the indirection, so another option is to set some unused 16-bit register to the desired return address and just PUSH
that. The downside to this is you need to reserve one of the few 16-bit registers available just for that purpose, and the Z80 does not have many to use. A PUSH IY
is 15 cycles, so it would be fractionally faster to load the return address into IY
outside the loop. It’s more common, however, to do an indirect CALL
and not need the additional register, so that’s what I’m doing.
call BOUNCE
add ix, bc
; if the predicate sets ZF, it matched
jr nz, LOOP
inc de
jr LOOP
BOUNCE: jp (hl)
One useful fact about the Z80’s 16-bit addition is that it doesn’t set the zero, sign, or overflow flags. It makes use of the same 4-bit adder as the 8-bit addition instructions, so it’s not clear to me why those flags aren’t also set for 16-bit operands: it may be because the hardware implementation would be difficult, or it may be by design. Either way I can throw in the “+4” operation without resetting the zero flag that should be set or reset by the predicate subroutine.
DONE: call PRINT16
pop ix
pop hl
pop bc
pop af
ret
.endblock
With that done all that remains is to throw together the two predicates required. The first is a “contains” test, which is true if the start and end of one range is greater than or equal and less than or equal the start and end of the other range, respectively.
; IN: IX the range pair to test
; OUT: ZF set if one range contains the other
; A clobbered
.block
@CONTAINS:
; compare the two starts
ld a, (ix+0)
cp (ix+2)
Having done a CP
on the two start values one of three conditions will be true. The left range might start before, at the same point as, or after the right range. A shortcut here is that if the two ranges start at the same point one must contain the other: at least one of the two will be smaller than or equal to the other. Using RET Z
immediately returns if the zero flag is set.
If the two start values are not equal the carry flag will be set if the right range’s start value is larger than the left range’s start value. This determines whether I want to check if the left range’s end value is larger than or smaller than the right range’s end value, which is managed by figuring out which one to compare to the other.
; if the starts are equal, one range definitely contains the other
ret z
; CF is set if B.start > A.start
jr c, TEST_B
; test B.end - A.end
ld a, (ix+3)
cp (ix+1)
At this point the carry flag will be set if the end of the second range is past the end of the first range. What I want, though, is to set the zero flag if the end of the second range is less than or equal to the end of the first range. I could do some fussing about conditionally jumping and doing things to set or reset the zero flag, but instead what I will do is clear A
to zero without touching the carry flag, then rotate the carry flag into A
using RL A
- not RLA
, which doesn’t change the zero flag. This sets bit zero of A
equal to the carry flag, setting the zero flag to the inverse of the carry flag.
ENDS: ld a, 0 ; don't touch carry
rl a ; A=CF
ret
If the right range started after the left range, just invert the order of the operands for the comparison.
TEST_B: ; test A.end - B.end
ld a, (ix+1)
cp (ix+3)
jr ENDS
.endblock
I would like to satisfy myself that this is running more or less to plan, so here’s an execution trace calling the CONTAINS
subroutine for the first input range 2-4,6-8
.
.block
ld sp, 8000h
ld ix, DATA
call CONTAINS
halt
DATA: db 2, 4, 6, 8
.endblock
The trace shows the first comparison () resets ZF and sets CF, because . The RET Z
shortcut does not apply, and the set carry flag indicates A.start < B.start
. The code jumps to TEST_B
which computes the flags for A.end - B.end
(in this case, ). The carry flag is set because B.end
is greater than A.end
, and the code jumps to inverting the carry flag into the zero flag with a rotate. The zero flag is cleared, and the subroutine returns with the right answer delivered.
With that looking good, I can produce the answer for star 1:
STAR1: ld sp, 8000h
ld ix, SAMPLE
ld hl, CONTAINS
call COUNT_FILTER
halt
The predicate for star 2 checks whether the ranges overlap. The difference between a “contains” check and an “overlaps” check is only the order in which the pieces are compared. For “contains” I want the start of one range to be before the start of the other and the end of the range to be after the end other, while for “overlaps” I want the start of one range to be before the end of the other and the end of the range to be after the start of the other. Consider a high level expression of this convoluted sentence:
= (a.start <= b.start and a.end >= b.end)
contains or (b.start <= a.start and b.end >= a.end)
= (a.start <= b.end and a.end >= b.start)
overlaps or (b.start <= a.end and b.end >= a.start)
Translating this to Z80 assembly just means taking a copy of CONTAINS
and changing the displacement added to IX
for the loads and compares:
; IN: IX the range pair to test
; OUT: ZF set if one range overlaps the other
; A clobbered
.block
@OVERLAPS:
; A.start <= B.end ?
ld a, (ix+0)
cp (ix+3)
; If equal the ranges overlap
ret z
; CF is set if B.end > A.start
jr c, TEST_B
; test A.end >= B.start
ld a, (ix+2)
cp (ix+1)
; cf set if B.start > A.end
ENDS: ld a, 0 ; don't touch carry
rl a ; A=CF
ret
TEST_B: ; test B.start >= A.end
ld a, (ix+1)
cp (ix+2)
jr ENDS
.endblock
The second star just calls the new predicate:
STAR2: ld sp, 8000h
ld ix, SAMPLE
ld hl, OVERLAPS
call COUNT_FILTER
halt
Today I will also run both parts on my full puzzle input, which is tucked away in an include file.
DAY04: ld sp, 8000h
ld ix, INPUT
ld hl, CONTAINS
call COUNT_FILTER
ld a, '\n'
out (1), a
ld hl, OVERLAPS
call COUNT_FILTER
ld a, '\n'
out (1), a
halt
include "day04-input.z80" .
You will have to take my word for it that these two answers were correct.
And to wrap up, here’s the include of PRINT16
and the sample input for the day.
include "print16.z80"
.
SAMPLE: db 2, 4, 6, 8
2, 3, 4, 5
db 5, 7, 7, 9
db 2, 8, 3, 7
db 6, 6, 4, 6
db 2, 6, 4, 8
db 0 db
The series
Day 1 | Day 2 | Day 3 | Day 4 | Day 5 | Day 6 | Day 7 | Day 8 | Day 9 | Day 10 | Day 11 | Day 12 | Day 13 | Day 14 | Day 15 | Day 16 | Day 17 | Day 18 | Day 19 | Day 20 | Day 21 | Day 22 | Day 23 | Day 24 | Day 25 |