λbje - Advent of Z80

"An oil painting of a messy campsite with crates everywhere.", Craiyon.

The Advent of Z80 continues with the Elves cleaning up their campsite and me using a different AI image generator for my banner.

        JP      DAY04

Today I am going to be a little bit cheeky and write the Star 1 solution already knowing how Star 2 will differ such that I can re-use the vast bulk of the code. Don’t be too outraged though - when solving these puzzles I’ll often tweak a Star 1 solution to support the added Star 2 requirements, and no-one is reading these posts in the belief that they’re the transcript of a live coding session. In fact, as far as I am aware no-one is reading these posts at all.

And that’s okay: blogging like this is not about you, dear hypothetical reader. It’s about me and my desires to marshal my thoughts more clearly.

The problem for today is filtering lists of pairs of number ranges; the star 1 predicate will match only pairs where one range fully contains the other, while the star 2 predicate will match only pairs where one range overlaps the other. The result in both cases is a count of matching pairs.

A high level solution might read something like:

    input.stream()
         .filter(predicate)
         .count()

A decent compiler should be able to perform stream fusion on a bit of code like this, recognising that the intermediate list isn’t needed for the final output and effectively only maintaining a count of how many elements matched the filter predicate. That’s exactly what I will write in assembler: a subroutine taking an input list and a predicate and returning the count of matching elements in the list.

; IN:   IX      the input range pairs
;       HL      a predicate function to apply to pairs
; OUT:  DE      the number of matching pairs
        .block
@COUNT_FILTER:
        push    af
        push    bc
        push    hl
        push    ix
        ld      de, 0
        ld      bc, 4   ; for ADD IX, BC

The COUNT_FILTER subroutine will be the workhorse of day 4. Its input data consists of four bytes per pair, with all input numbers ranging between 1 and 99 inclusive. As with other days the end of the input will be marked by a zero byte. Its predicate function will take an input address in IX and set the zero flag if the predicate matches.

Why does the predicate set the zero flag on a match? Returning a boolean value from a subroutine will either set some register to a “truthy” value such as zero for false and non-zero for true, or set a flag to indicate the answer. If it’s a register the caller is almost certainly going to want to test the register to set a flag anyway, because the Z80’s conditional execution flows depend on flags. I may as well use a flag directly. I still need to figure out which flag to use, and which value of the flag means “true”. In this case I chose to use the zero flag and have it set when the predicate matches, to make the code I write for the predicates a bit simpler.

LOOP:   ld      a, (ix+0)
        or      a
        jr      z, DONE

The Z80 has nine ways to call a subroutine: eight conditional calls and one unconditional call. All nine of them take an immediate address as the subroutine destination. Calling a dynamic address isn’t possible. What is possible, though, is jumping to a dynamic address. There are twelve ways to jump to another location, the same nine as for calling another location plus JP (HL), JP (IX) and JP (IY), each of which jump to the given register’s location. The only difference between a CALL and a JP is that the processor pushes the program counter onto the stack before jumping for a CALL - so to emulate a CALL with a JP I only have to push the desired return address first. A quick and easy way to do that is to CALL something, which pushes the address of the following instruction. If the CALL destination is a single instruction to JP (HL) the processor will wind up with the program counter set to HL and the stack containing the desired return address. An additional 17 cycles are used for the indirection, so another option is to set some unused 16-bit register to the desired return address and just PUSH that. The downside to this is you need to reserve one of the few 16-bit registers available just for that purpose, and the Z80 does not have many to use. A PUSH IY is 15 cycles, so it would be fractionally faster to load the return address into IY outside the loop. It’s more common, however, to do an indirect CALL and not need the additional register, so that’s what I’m doing.

        call    BOUNCE
        add     ix, bc

        ; if the predicate sets ZF, it matched
        jr      nz, LOOP
        inc     de
        jr      LOOP

BOUNCE: jp      (hl)

One useful fact about the Z80’s 16-bit addition is that it doesn’t set the zero, sign, or overflow flags. It makes use of the same 4-bit adder as the 8-bit addition instructions, so it’s not clear to me why those flags aren’t also set for 16-bit operands: it may be because the hardware implementation would be difficult, or it may be by design. Either way I can throw in the “+4” operation without resetting the zero flag that should be set or reset by the predicate subroutine.

DONE:   call    PRINT16

        pop     ix
        pop     hl
        pop     bc
        pop     af
        ret
        .endblock

With that done all that remains is to throw together the two predicates required. The first is a “contains” test, which is true if the start and end of one range is greater than or equal and less than or equal the start and end of the other range, respectively.

; IN:   IX      the range pair to test
; OUT:  ZF      set if one range contains the other
;       A       clobbered
        .block
@CONTAINS:
        ; compare the two starts
        ld      a, (ix+0)
        cp      (ix+2)

Having done a CP on the two start values one of three conditions will be true. The left range might start before, at the same point as, or after the right range. A shortcut here is that if the two ranges start at the same point one must contain the other: at least one of the two will be smaller than or equal to the other. Using RET Z immediately returns if the zero flag is set.

If the two start values are not equal the carry flag will be set if the right range’s start value is larger than the left range’s start value. This determines whether I want to check if the left range’s end value is larger than or smaller than the right range’s end value, which is managed by figuring out which one to compare to the other.


        ; if the starts are equal, one range definitely contains the other
        ret     z

        ; CF is set if B.start > A.start
        jr      c, TEST_B

        ; test B.end - A.end
        ld      a, (ix+3)
        cp      (ix+1)

At this point the carry flag will be set if the end of the second range is past the end of the first range. What I want, though, is to set the zero flag if the end of the second range is less than or equal to the end of the first range. I could do some fussing about conditionally jumping and doing things to set or reset the zero flag, but instead what I will do is clear A to zero without touching the carry flag, then rotate the carry flag into A using RL A - not RLA, which doesn’t change the zero flag. This sets bit zero of A equal to the carry flag, setting the zero flag to the inverse of the carry flag.

ENDS:   ld      a, 0    ; don't touch carry
        rl      a       ; A=CF
        ret

If the right range started after the left range, just invert the order of the operands for the comparison.

TEST_B: ; test A.end - B.end
        ld      a, (ix+1)
        cp      (ix+3)
        jr      ENDS

        .endblock

I would like to satisfy myself that this is running more or less to plan, so here’s an execution trace calling the CONTAINS subroutine for the first input range 2-4,6-8.

        .block
        ld      sp, 8000h
        ld      ix, DATA
        call    CONTAINS
        halt
DATA:   db      2, 4, 6, 8
        .endblock

The trace shows the first comparison ( $2 - 6$ ) resets ZF and sets CF, because $6 > 2$ . The RET Z shortcut does not apply, and the set carry flag indicates A.start < B.start. The code jumps to TEST_B which computes the flags for A.end - B.end (in this case, $4 - 8$ ). The carry flag is set because B.end is greater than A.end, and the code jumps to inverting the carry flag into the zero flag with a rotate. The zero flag is cleared, and the subroutine returns with the right answer delivered.

With that looking good, I can produce the answer for star 1:

STAR1:  ld      sp, 8000h
        ld      ix, SAMPLE
        ld      hl, CONTAINS
        call    COUNT_FILTER
        halt

The predicate for star 2 checks whether the ranges overlap. The difference between a “contains” check and an “overlaps” check is only the order in which the pieces are compared. For “contains” I want the start of one range to be before the start of the other and the end of the range to be after the end other, while for “overlaps” I want the start of one range to be before the end of the other and the end of the range to be after the start of the other. Consider a high level expression of this convoluted sentence:

contains = (a.start <= b.start and a.end >= b.end)
        or (b.start <= a.start and b.end >= a.end)
overlaps = (a.start <= b.end and a.end >= b.start)
        or (b.start <= a.end and b.end >= a.start)

Translating this to Z80 assembly just means taking a copy of CONTAINS and changing the displacement added to IX for the loads and compares:

; IN:   IX      the range pair to test
; OUT:  ZF      set if one range overlaps the other
;       A       clobbered
        .block
@OVERLAPS:
        ; A.start <= B.end ?
        ld      a, (ix+0)
        cp      (ix+3)

        ; If equal the ranges overlap
        ret     z

        ; CF is set if B.end > A.start
        jr      c, TEST_B

        ; test A.end >= B.start
        ld      a, (ix+2)
        cp      (ix+1)
        ; cf set if B.start > A.end

ENDS:   ld      a, 0    ; don't touch carry
        rl      a       ; A=CF
        ret

TEST_B: ; test B.start >= A.end
        ld      a, (ix+1)
        cp      (ix+2)
        jr      ENDS

        .endblock

The second star just calls the new predicate:

STAR2:  ld      sp, 8000h
        ld      ix, SAMPLE
        ld      hl, OVERLAPS
        call    COUNT_FILTER
        halt

Today I will also run both parts on my full puzzle input, which is tucked away in an include file.

DAY04:  ld      sp, 8000h
        ld      ix, INPUT
        ld      hl, CONTAINS
        call    COUNT_FILTER
        ld      a, '\n'
        out     (1), a
        ld      hl, OVERLAPS
        call    COUNT_FILTER
        ld      a, '\n'
        out     (1), a
        halt

.include        "day04-input.z80"

You will have to take my word for it that these two answers were correct.

And to wrap up, here’s the include of PRINT16 and the sample input for the day.

.include        "print16.z80"

SAMPLE: db      2, 4, 6, 8
        db      2, 3, 4, 5
        db      5, 7, 7, 9
        db      2, 8, 3, 7
        db      6, 6, 4, 6
        db      2, 6, 4, 8
        db      0

The series

Day 1 | Day 2 | Day 3 | Day 4 | Day 5 | Day 6 | Day 7 | Day 8 | Day 9 | Day 10 | Day 11 | Day 12 | Day 13 | Day 14 | Day 15 | Day 16 | Day 17 | Day 18 | Day 19 | Day 20 | Day 21 | Day 22 | Day 23 | Day 24 | Day 25 |