Stable Diffusion prompted with "christmas knapsacks piled up on a beach, cartoon style".

It’s day three of the Advent of Z80. As with day 1 and day 2 the post will focus more on the Z80 assembly than on the puzzle, and is a “literate assembly” document that will execute in your browser.

        JP      DAY03

Today’s puzzle introduces some additional solution elements to play with. There’s a bit of string processing to do: input strings need to be split into two equal halves and checked for characters in common.

Here’s the sample input, newline separated, and zero terminated.

SAMPLE:
        db      'vJrwpWtwJgWrhcsFMMfFFhFp\n'
        db      'jqHRNqRjqzjGDLGLrsFMfFZSrLrFZsSL\n'
        db      'PmmdzqPrVvPwwTWBwg\n'
        db      'wMqvLMZHhHMvwLHjbvcjnnSBnvTQFn\n'
        db      'ttgJtRGJQctTZtZT\n'
        db      'CrZsJsPPZsGzwwsLwLmpwMDw\n'
        db      0

Star 1’s challenge is to find the character in common between the first and second halves of each string, convert them to a number, and sum the numbers.

I will start by pre-processing the input from characters to priority numbers, 1-52 for a-z and A-Z. This code will convert all other characters to zero, such that those newlines become NULs instead of being mixed up with converted ’m’s.

        .block
; IN:   hl      the string to convert to priorities
; OUT:          all alphabetic characters at (hl) converted
@PRIORITISE:
        push    af
        push    hl
        dec     hl

LOOP:   inc     hl
        ld      a, (hl)

        ; finish on NUL
        or      a
        jr      z, DONE

        ; test for A-Z
        cp      'A'
        jr      c, ZERO
        cp      'Z'+1
        jr      nc, LOWER
        sub     a, 'A'-27
        ld      (hl), a
        jr      LOOP

        ; test for a-z
LOWER:  cp      'a'
        jr      c, ZERO
        cp      'z'+1
        jr      nc, ZERO
        sub     a, 'a'-1
        ld      (hl), a
        jr      LOOP

ZERO:   xor     a
        ld      (hl), a
        jr      LOOP

DONE:   pop     hl
        pop     af
        ret

        .endblock

Let’s see that in action:

        ld      sp, $8000
        ld      hl, SAMPLE
        call    PRIORITISE
        halt

The next challenge is to find the midpoint of each string, which involves scanning along the string until a NUL is encountered.

        .block
; IN:   hl      the data to find the midpoint of
; OUT:  a       the number of bytes in each half
; This function assumes an even number of bytes. If the terminating NUL
; falls on an odd byte location the function will overshoot the NUL.
@MIDPOINT:
        push    hl
        push    bc

        ld      c, 0
LOOP:   ld      a, (hl)
        or      a
        jr      z, DONE
        inc     c
        inc     hl
        inc     hl
        jr      LOOP

DONE:   ld      a, c
        pop     bc
        pop     hl
        ret
        .endblock

And let’s see that in action on the sample input (converted):

        ld      sp, $8000
        ld      hl, SAMPLE
        call    PRIORITISE
        call    MIDPOINT
        halt

If all went well A will contain the midpoint of vJrwpWtwJgWrhcsFMMfFFhFp, which is 12 or $0c.

The next step is to find the letter in common. Rather than naively doing a quadratic comparison approach I will implement a basic bit-based set for numbers 0 to 52, taking up 7 bytes of space . I will need a function to OR a sequence of numbers into a set, for loading the first set of numbers, and a function to AND another sequence in, for mixing the second set of numbers in.

I will require that sets are aligned such that adding up to 6 to the set address doesn’t overflow 8 bits. Finding a bit address can be done cheaply by dividing by 8 then shifting 1000_0000b right by the remainder. I shift right rather than left such that value nn sets bit nn in the set: if I shifted left everything would still work correctly but the bits within each byte would be reversed.

Here’s some space carved out for a set, aligned to not overflow in 8 bits:

        .align 256, 0, 7
SEEN:   defs    7

There will be two pointers in action: one pointing to the 7 bytes of the set, and one pointing to the source number sequence. I can either use DE or IX/IY for the second pointer. DE is generally faster to work with, as swapping DE and HL with EX DE, HL takes 4 cycles, loading from (HL) takes 7 cycles, and loading from (IX+0) takes 19 cycles. A sequence of exchange, load from (HL), and exchange is the same number of bytes as a single load from (IX+0) but one clock cycle faster, while an INC DE is one byte and six clocks compared to INC IX at two bytes and ten cycles.

Doing OR is straightforward - each value in the source data sets a bit. Loop over the source data, find the address of the bit to set, and set it.

        .block
; IN:   hl      pointer to the source data to load
;       de      pointer to the 7-byte set to update
;       c       the number of values to load
; OUT:          7-byte set ORed with source data
@SET_OR:
        push    af
        push    bc
        push    de
        push    hl

        ; for DJNZ...
        ld      b, c

LOOP:   push    de      ; hang onto this
        ld      a, (hl)
        ld      c, a    ; keep it for later

        ; divide by 8
        srl     a
        srl     a
        srl     a

        ; add to set address
        add     a, e
        ld      e, a

        ; get the low 8 bits
        ld      a, 7
        and     a, c

        ; compute $80 << a
        inc     a
        ld      c, a
        ld      a, $80
SHIFT:  dec     c
        jr      z, DONE
        rrca
        jr      SHIFT

        ; OR with (de)
DONE:   ex      de, hl
        or      a, (hl)
        ld      (hl), a
        ex      de, hl

        inc     hl
        pop     de
        djnz    LOOP

        pop     hl
        pop     de
        pop     bc
        pop     af
        ret
        .endblock

Let’s see what happens when this is run on the first half of the first input string:

        ld      sp, $8000
        ld      hl, SAMPLE
        ld      de, SEEN
        call    PRIORITISE
        call    MIDPOINT
        ld      c, a
        call    SET_OR
        halt

The string vJrwpWtwJgWr translates to priorities 22, 36, 18, 23, 16, 49, 20, 23, 36, 7, 49, 18, or uniquely 7, 16, 18, 20, 22, 23, 36, 49. Writing 7 into the set will turn on bit 7 of byte 0. In byte 2, 16 will set bit 0, 18 will set bit 2, 20, 22, and 23 will set bits 4, 6, and 7. 36 sets bit 4 of byte 4. 49 sets bit 1 of byte 6. All up, these numbers turn into the bit set 01 00 ab 00 08 00 40, which is what I get above.

-abc defg  hijk lmno  pqrs tuvw  xyzA BCDE  FGHI JKLM  NOPQ RSTU  VWXY Z---
0000 0001  0000 0000  1010 1011  0000 0000  0000 1000  0000 0000  0100 0000
       01         00         ab         00         08         00         40

AND is a little less straightforward. Naively one might simply replace the ‘OR’ above with an ‘AND’, but the operation desired is the intersection of two sets. Doing one AND per input value will clear other bits in any updated byte and ignore all bits in any byte that isn’t updated. Preferable is to compute the set for the new input data then perform seven AND operations - this requires an extra seven bytes of space. In day 1 I used the stack to carve out temporary space. I will do that again here, but as I do so I’ll ensure that the resulting space is aligned. I’ll do this a little bit wastefully by reserving 15 bytes instead of 7, and rounding the set address up to the nearest 8 byte boundary.

Doing arithmetic on SP in the frame setup code requires the use of HL. I don’t much want to clobber HL, but in my subroutine preamble I’ll preserve registers I modify during the subroutine anyway. The original value will be available at (IX+2). Accessing word data via IX is extremely slow at 38 cycles, so in I will avoid it as much as possible, but the value is there. However, the first thing I’m going to do is OR the incoming source data into the new buffer, so I will want the original HL straight away, and to put the address of the new buffer into DE, so I’ll swap DE and HL early on.

        .block
; IN:   hl      pointer to the source data to load
;       de      pointer to the 7-byte set to update
;       b       the number of values to load
; OUT:          7-byte set ANDed with source data
@SET_AND:
        ; save modified registers
        push    af      ; ix+8,9
        push    bc      ; ix+6,7
        push    de      ; ix+4,5
        push    hl      ; ix+2,3

        ; set up stack frame, 8-16 bytes of local storage
        push    ix      ; ix+0,1
        ld      ix, 0
        add     ix, sp
        ex      de, hl  ; I want the value in HL
        ld      hl, -16
        add     hl, sp
        ld      sp, hl

        ; align HL to an 8-byte boundary
        ; HL = (HL + 8) & ~7
        ld      bc, 8
        add     hl, bc
        ld      a, 1111_1000b
        and     l
        ld      l, a

        ; initialise the new buffer
        push    de
        push    hl
        ld      d, h
        ld      e, l
        inc     de
        xor     a
        ld      (hl), a
        ld      bc, 6
        ldir

        ; these POPs swap DE and HL
        pop     de      ; = stack buffer
        pop     hl      ; = source data

        ; load the source data into the new buffer
        ld      c, (ix+6)
        call    SET_OR

        ; AND data between the buffers
        ld      l, (ix+4)
        ld      h, (ix+5)
        ld      bc, 07ffh
LOOP:   ld      a, (hl)
        ex      de, hl
        and     (hl)
        ex      de, hl
        ld      (hl), a
        inc     hl
        inc     de
        djnz    LOOP

        ; restore stack frame
        ld      hl, 16
        add     hl, sp
        ld      sp, hl
        pop     ix

        ; restore modified registers
        pop     hl
        pop     de
        pop     bc
        pop     af
        ret
        .endblock

Adding in the AND call should result in a single bit being set:

        ld      sp, $8000
        ld      hl, SAMPLE
        ld      de, SEEN
        call    PRIORITISE
        call    MIDPOINT
        ld      c, a
        call    SET_OR
        add     l
        ld      l, a
        call    SET_AND
        halt
X = vJrwpWtwJgWr && hcsFMMfFFhFp
    -abc defg  hijk lmno  pqrs tuvw  xyzA BCDE  FGHI JKLM  NOPQ RSTU  VWXY Z---
    0000 0001  0000 0000  1010 1011  0000 0000  0000 1000  0000 0000  0100 0000
  & 0001 0010  1000 0000  1001 0000  0000 0000  1000 0001  0000 0000  0000 0000
  = 0000 0000  0000 0000  1000 0000  0000 0000  0000 0000  0000 0000  0000 0000
  =    00         00         80         00         00         00         00

The SET_AND implementation takes around 3,500 clock cycles to execute, around 2ms on a TRS-80 Model 1.

The last function I’ll need will find the first one bit in a set, so I can find the common bit. After finding a non-zero byte I will rlca until the carry flag is set. This instruction rotates the bits in A left, with bit 7 going into the carry flag and bit 0. If the carry flag is set on the first rotate it means bit 7 was set. Because A is non-zero I know at least one bit is set and I don’t need any other loop guard: just keep rotating until carry is set.

As a small help to myself this function will also unset the discovered bit; if all is going well this will be the only set bit and clearing it will leave the buffer ready to use again for the next pair of input strings.

        .block
; IN:   de      the set to read
; OUT:  a       the first set bit, or 0ffh if none are set
@FIRST: push    bc
        push    de

        ; work out of (hl)
        ex      de, hl

        ; for each byte ...
        ld      bc, 700h
BYTE:   ld      a, (hl)

        ; skip zeros
        or      a
        jr      nz, FOUND

        ; move along 8 bits
        ld      a, 8
        add     c
        ld      c, a

        inc     hl
        djnz    BYTE

        ; didn't find anything...
        ld      c, 0ffh

DONE:   ld      a, c

        ; switch these back now
        ex      de, hl

        pop     de
        pop     bc
        ret

        ; found a non-zero byte, clear it and find the first bit
FOUND:  ld      b, 0
        ld      (hl), b
        inc     c
        rlca
        jr      nc, FOUND

        ; overshot it by one...
        dec     c
        jr      DONE

        .endblock

Let’s see if this can successfully find ‘p’ in the sample data:

        .block
        ld      sp, $8000
        ld      de, ONEBIT
        call    FIRST
        halt

        .align 256, 0, 7
ONEBIT: db      00h, 00h, 80h, 00h, 00h, 00h, 00h
        .endblock

This test is succcessful if A is the priority of ‘p’, which is 16 or $10.

It would also be possible to merge SET_AND and FIRST into a single routine that finds an arbitrary bit in common with an input string and a bit set. This would likely shave many cycles off the star 1 test, but at the cost of readability. The Z80 comes from an era where machines had extremely limited resources, and good programmers would always be looking for ways to save a few bytes or cycles in their programs. As this series progresses I will be more aggressively pursuing optimisations, not because there’s any real need for the performance gains here, but to build up more understanding of the ways in which Z80 assembly can be wrangled. Here in day 3, I’ll stick with the more readable version.

Once again, the final output will be a 16-bit number that needs to be printed. Rather than duplicating the same functions over and over again I will refer to external definitions and include them.

        .include        "print16.z80"

I now have all the pieces in place to solve star 1: loop over the input, splitting each string into two, putting the two halves into sets, finding the common entry, summing them up, and printing the answer.

        .block
@STAR1: ld      sp, $8000
        ld      hl, SAMPLE
        ld      de, SEEN

        ; convert to priority values
        call    PRIORITISE

        ; loop over each input
LOOP:   ld      a, (hl)
        or      a
        jr      z, DONE

        ; find the midpoint
        call    MIDPOINT
        ld      b, 0
        ld      c, a

        ; load it in to SEEN
        call    SET_OR

        ; increment HL past it
        add     hl, bc

        ; combine it in SEEN
        call    SET_AND

        ; find the common value and reset SEEN
        call    FIRST

        ; running sum
        push    hl
        push    bc
        ld      c, a    ; b = 0
        ld      hl, (SUM)
        add     hl, bc
        ld      (SUM), hl
        pop     bc
        pop     hl

        ; move to next input
        add     hl, bc
        inc     hl ; NUL terminator
        jr      LOOP

DONE:   ld      de, (SUM)
        call    PRINT16
        halt

        ; running total of priorities in common
SUM:    dw      0

        .endblock

Star 2

Now, the puzzle changes to finding a common letter across a group of three inputs. I can re-use my bit set subroutines, but I need to find the length of each full string instead of just the mid point. Since that’s the mid point times two I can also just re-use that subroutine as well. I will assume that the input is well-formed, again: there will always be a multiple of three lines.

The general flow is the same as for Star 1 - convert to priorities, loop while the input doesn’t start with NUL, load one value in with SET_OR, load other value(s) in with SET_AND, find the FIRST bit set, and sum it up.

        .block
@STAR2: ld      sp, $8000
        ld      hl, SAMPLE
        ld      de, SEEN

        ; convert to priority values
        call    PRIORITISE

        ; loop over each input
LOOP:   ld      a, (hl)
        or      a
        jr      z, DONE

        ; find the midpoint
        call    MIDPOINT
        rla             ; double it
        ld      b, 0
        ld      c, a

        ; load in the first input
        call    SET_OR

        ; increment HL past it
        add     hl, bc
        inc     hl

        ; second input - find midpoint
        call    MIDPOINT
        rla             ; double it
        ld      b, 0
        ld      c, a

        ; mix it with the first input
        call    SET_AND

        ; increment HL past it
        add     hl, bc
        inc     hl

        ; third input - find midpoint
        call    MIDPOINT
        rla             ; double it
        ld      b, 0
        ld      c, a

        ; mix it with the first two inputs
        call    SET_AND

        ; increment HL past it
        add     hl, bc
        inc     hl

        ; find the common value and reset SEEN
        call    FIRST

        ; running sum
        push    hl
        push    bc
        ld      c, a    ; b = 0
        ld      hl, (SUM)
        add     hl, bc
        ld      (SUM), hl
        pop     bc
        pop     hl

        ; go back for more input
        jr      LOOP

DONE:   ld      de, (SUM)
        call    PRINT16
        halt

        ; running total of priorities in common
SUM:    dw      0

        .endblock

Today’s puzzle was again quite a simple one, and the translation to Z80 assembly was correspondingly simple. The second star took around 60,000 cycles to compute on the small sample input alone, clocking in at around 33ms on my imaginary TRS-80 Model 1, or 1.2ms on my real eZ80 board.

DAY03:  call    STAR1
        call    STAR2
        halt

The series

Day 1 | Day 2 | Day 3 | Day 4 | Day 5 | Day 6 | Day 7 | Day 8 | Day 9 | Day 10 | Day 11 | Day 12 | Day 13 | Day 14 | Day 15 | Day 16 | Day 17 | Day 18 | Day 19 | Day 20 | Day 21 | Day 22 | Day 23 | Day 24 | Day 25 |