
Advent of Z80 - Day 3, 2022

It’s day three of the Advent of Z80. As with day 1 and day 2 the post will focus more on the Z80 assembly than on the puzzle, and is a “literate assembly” document that will execute in your browser.
JP DAY03
Today’s puzzle introduces some additional solution elements to play with. There’s a bit of string processing to do: input strings need to be split into two equal halves and checked for characters in common.
Here’s the sample input, newline separated, and zero terminated.
SAMPLE:
'vJrwpWtwJgWrhcsFMMfFFhFp\n'
db 'jqHRNqRjqzjGDLGLrsFMfFZSrLrFZsSL\n'
db 'PmmdzqPrVvPwwTWBwg\n'
db 'wMqvLMZHhHMvwLHjbvcjnnSBnvTQFn\n'
db 'ttgJtRGJQctTZtZT\n'
db 'CrZsJsPPZsGzwwsLwLmpwMDw\n'
db 0 db
Star 1’s challenge is to find the character in common between the first and second halves of each string, convert them to a number, and sum the numbers.
I will start by pre-processing the input from characters to priority numbers, 1-52 for a-z and A-Z. This code will convert all other characters to zero, such that those newlines become NULs instead of being mixed up with converted ’m’s.
.block
; IN: hl the string to convert to priorities
; OUT: all alphabetic characters at (hl) converted
@PRIORITISE:
push af
push hl
dec hl
LOOP: inc hl
ld a, (hl)
; finish on NUL
or a
jr z, DONE
; test for A-Z
cp 'A'
jr c, ZERO
cp 'Z'+1
jr nc, LOWER
sub a, 'A'-27
ld (hl), a
jr LOOP
; test for a-z
LOWER: cp 'a'
jr c, ZERO
cp 'z'+1
jr nc, ZERO
sub a, 'a'-1
ld (hl), a
jr LOOP
ZERO: xor a
ld (hl), a
jr LOOP
DONE: pop hl
pop af
ret
.endblock
Let’s see that in action:
ld sp, $8000
ld hl, SAMPLE
call PRIORITISE
halt
The next challenge is to find the midpoint of each string, which involves scanning along the string until a NUL is encountered.
.block
; IN: hl the data to find the midpoint of
; OUT: a the number of bytes in each half
; This function assumes an even number of bytes. If the terminating NUL
; falls on an odd byte location the function will overshoot the NUL.
@MIDPOINT:
push hl
push bc
ld c, 0
LOOP: ld a, (hl)
or a
jr z, DONE
inc c
inc hl
inc hl
jr LOOP
DONE: ld a, c
pop bc
pop hl
ret
.endblock
And let’s see that in action on the sample input (converted):
ld sp, $8000
ld hl, SAMPLE
call PRIORITISE
call MIDPOINT
halt
If all went well A
will contain the midpoint of vJrwpWtwJgWrhcsFMMfFFhFp
, which is 12 or $0c
.
The next step is to find the letter in common. Rather than naively doing a quadratic comparison approach I will implement a basic bit-based set for numbers 0 to 52, taking up 7 bytes of space . I will need a function to OR a sequence of numbers into a set, for loading the first set of numbers, and a function to AND another sequence in, for mixing the second set of numbers in.
I will require that sets are aligned such that adding up to 6 to the set address doesn’t overflow 8 bits. Finding a bit address can be done cheaply by dividing by 8 then shifting 1000_0000b
right by the remainder. I shift right rather than left such that value sets bit in the set: if I shifted left everything would still work correctly but the bits within each byte would be reversed.
Here’s some space carved out for a set, aligned to not overflow in 8 bits:
.align 256, 0, 7
SEEN: defs 7
There will be two pointers in action: one pointing to the 7 bytes of the set, and one pointing to the source number sequence. I can either use DE
or IX/IY
for the second pointer. DE
is generally faster to work with, as swapping DE
and HL
with EX DE, HL
takes 4 cycles, loading from (HL)
takes 7 cycles, and loading from (IX+0)
takes 19 cycles. A sequence of exchange, load from (HL)
, and exchange is the same number of bytes as a single load from (IX+0)
but one clock cycle faster, while an INC DE
is one byte and six clocks compared to INC IX
at two bytes and ten cycles.
Doing OR is straightforward - each value in the source data sets a bit. Loop over the source data, find the address of the bit to set, and set it.
.block
; IN: hl pointer to the source data to load
; de pointer to the 7-byte set to update
; c the number of values to load
; OUT: 7-byte set ORed with source data
@SET_OR:
push af
push bc
push de
push hl
; for DJNZ...
ld b, c
LOOP: push de ; hang onto this
ld a, (hl)
ld c, a ; keep it for later
; divide by 8
srl a
srl a
srl a
; add to set address
add a, e
ld e, a
; get the low 8 bits
ld a, 7
and a, c
; compute $80 << a
inc a
ld c, a
ld a, $80
SHIFT: dec c
jr z, DONE
rrca
jr SHIFT
; OR with (de)
DONE: ex de, hl
or a, (hl)
ld (hl), a
ex de, hl
inc hl
pop de
djnz LOOP
pop hl
pop de
pop bc
pop af
ret
.endblock
Let’s see what happens when this is run on the first half of the first input string:
ld sp, $8000
ld hl, SAMPLE
ld de, SEEN
call PRIORITISE
call MIDPOINT
ld c, a
call SET_OR
halt
The string vJrwpWtwJgWr
translates to priorities 22, 36, 18, 23, 16, 49, 20, 23, 36, 7, 49, 18
, or uniquely 7, 16, 18, 20, 22, 23, 36, 49
. Writing 7 into the set will turn on bit 7 of byte 0. In byte 2, 16 will set bit 0, 18 will set bit 2, 20, 22, and 23 will set bits 4, 6, and 7. 36 sets bit 4 of byte 4. 49 sets bit 1 of byte 6. All up, these numbers turn into the bit set 01 00 ab 00 08 00 40
, which is what I get above.
-abc defg hijk lmno pqrs tuvw xyzA BCDE FGHI JKLM NOPQ RSTU VWXY Z--- 0000 0001 0000 0000 1010 1011 0000 0000 0000 1000 0000 0000 0100 0000 01 00 ab 00 08 00 40
AND is a little less straightforward. Naively one might simply replace the ‘OR’ above with an ‘AND’, but the operation desired is the intersection of two sets. Doing one AND per input value will clear other bits in any updated byte and ignore all bits in any byte that isn’t updated. Preferable is to compute the set for the new input data then perform seven AND operations - this requires an extra seven bytes of space. In day 1 I used the stack to carve out temporary space. I will do that again here, but as I do so I’ll ensure that the resulting space is aligned. I’ll do this a little bit wastefully by reserving 15 bytes instead of 7, and rounding the set address up to the nearest 8 byte boundary.
Doing arithmetic on SP
in the frame setup code requires the use of HL
. I don’t much want to clobber HL, but in my subroutine preamble I’ll preserve registers I modify during the subroutine anyway. The original value will be available at (IX+2)
. Accessing word data via IX
is extremely slow at 38 cycles, so in I will avoid it as much as possible, but the value is there. However, the first thing I’m going to do is OR the incoming source data into the new buffer, so I will want the original HL
straight away, and to put the address of the new buffer into DE
, so I’ll swap DE
and HL
early on.
.block
; IN: hl pointer to the source data to load
; de pointer to the 7-byte set to update
; b the number of values to load
; OUT: 7-byte set ANDed with source data
@SET_AND:
; save modified registers
push af ; ix+8,9
push bc ; ix+6,7
push de ; ix+4,5
push hl ; ix+2,3
; set up stack frame, 8-16 bytes of local storage
push ix ; ix+0,1
ld ix, 0
add ix, sp
ex de, hl ; I want the value in HL
ld hl, -16
add hl, sp
ld sp, hl
; align HL to an 8-byte boundary
; HL = (HL + 8) & ~7
ld bc, 8
add hl, bc
ld a, 1111_1000b
and l
ld l, a
; initialise the new buffer
push de
push hl
ld d, h
ld e, l
inc de
xor a
ld (hl), a
ld bc, 6
ldir
; these POPs swap DE and HL
pop de ; = stack buffer
pop hl ; = source data
; load the source data into the new buffer
ld c, (ix+6)
call SET_OR
; AND data between the buffers
ld l, (ix+4)
ld h, (ix+5)
ld bc, 07ffh
LOOP: ld a, (hl)
ex de, hl
and (hl)
ex de, hl
ld (hl), a
inc hl
inc de
djnz LOOP
; restore stack frame
ld hl, 16
add hl, sp
ld sp, hl
pop ix
; restore modified registers
pop hl
pop de
pop bc
pop af
ret
.endblock
Adding in the AND
call should result in a single bit being set:
ld sp, $8000
ld hl, SAMPLE
ld de, SEEN
call PRIORITISE
call MIDPOINT
ld c, a
call SET_OR
add l
ld l, a
call SET_AND
halt
X = vJrwpWtwJgWr && hcsFMMfFFhFp -abc defg hijk lmno pqrs tuvw xyzA BCDE FGHI JKLM NOPQ RSTU VWXY Z--- 0000 0001 0000 0000 1010 1011 0000 0000 0000 1000 0000 0000 0100 0000 & 0001 0010 1000 0000 1001 0000 0000 0000 1000 0001 0000 0000 0000 0000 = 0000 0000 0000 0000 1000 0000 0000 0000 0000 0000 0000 0000 0000 0000 = 00 00 80 00 00 00 00
The SET_AND
implementation takes around 3,500 clock cycles to execute, around 2ms on a TRS-80 Model 1.
The last function I’ll need will find the first one bit in a set, so I can find the common bit. After finding a non-zero byte I will rlca
until the carry flag is set. This instruction rotates the bits in A
left, with bit 7 going into the carry flag and bit 0. If the carry flag is set on the first rotate it means bit 7 was set. Because A
is non-zero I know at least one bit is set and I don’t need any other loop guard: just keep rotating until carry is set.
As a small help to myself this function will also unset the discovered bit; if all is going well this will be the only set bit and clearing it will leave the buffer ready to use again for the next pair of input strings.
.block
; IN: de the set to read
; OUT: a the first set bit, or 0ffh if none are set
@FIRST: push bc
push de
; work out of (hl)
ex de, hl
; for each byte ...
ld bc, 700h
BYTE: ld a, (hl)
; skip zeros
or a
jr nz, FOUND
; move along 8 bits
ld a, 8
add c
ld c, a
inc hl
djnz BYTE
; didn't find anything...
ld c, 0ffh
DONE: ld a, c
; switch these back now
ex de, hl
pop de
pop bc
ret
; found a non-zero byte, clear it and find the first bit
FOUND: ld b, 0
ld (hl), b
inc c
rlca
jr nc, FOUND
; overshot it by one...
dec c
jr DONE
.endblock
Let’s see if this can successfully find ‘p’ in the sample data:
.block
ld sp, $8000
ld de, ONEBIT
call FIRST
halt
.align 256, 0, 7
ONEBIT: db 00h, 00h, 80h, 00h, 00h, 00h, 00h
.endblock
This test is succcessful if A
is the priority of ‘p’, which is 16 or $10
.
It would also be possible to merge SET_AND
and FIRST
into a single routine that finds an arbitrary bit in common with an input string and a bit set. This would likely shave many cycles off the star 1 test, but at the cost of readability. The Z80 comes from an era where machines had extremely limited resources, and good programmers would always be looking for ways to save a few bytes or cycles in their programs. As this series progresses I will be more aggressively pursuing optimisations, not because there’s any real need for the performance gains here, but to build up more understanding of the ways in which Z80 assembly can be wrangled. Here in day 3, I’ll stick with the more readable version.
Once again, the final output will be a 16-bit number that needs to be printed. Rather than duplicating the same functions over and over again I will refer to external definitions and include them.
include "print16.z80" .
I now have all the pieces in place to solve star 1: loop over the input, splitting each string into two, putting the two halves into sets, finding the common entry, summing them up, and printing the answer.
.block
@STAR1: ld sp, $8000
ld hl, SAMPLE
ld de, SEEN
; convert to priority values
call PRIORITISE
; loop over each input
LOOP: ld a, (hl)
or a
jr z, DONE
; find the midpoint
call MIDPOINT
ld b, 0
ld c, a
; load it in to SEEN
call SET_OR
; increment HL past it
add hl, bc
; combine it in SEEN
call SET_AND
; find the common value and reset SEEN
call FIRST
; running sum
push hl
push bc
ld c, a ; b = 0
ld hl, (SUM)
add hl, bc
ld (SUM), hl
pop bc
pop hl
; move to next input
add hl, bc
inc hl ; NUL terminator
jr LOOP
DONE: ld de, (SUM)
call PRINT16
halt
; running total of priorities in common
SUM: dw 0
.endblock
Star 2
Now, the puzzle changes to finding a common letter across a group of three inputs. I can re-use my bit set subroutines, but I need to find the length of each full string instead of just the mid point. Since that’s the mid point times two I can also just re-use that subroutine as well. I will assume that the input is well-formed, again: there will always be a multiple of three lines.
The general flow is the same as for Star 1 - convert to priorities, loop while the input doesn’t start with NUL, load one value in with SET_OR
, load other value(s) in with SET_AND
, find the FIRST
bit set, and sum it up.
.block
@STAR2: ld sp, $8000
ld hl, SAMPLE
ld de, SEEN
; convert to priority values
call PRIORITISE
; loop over each input
LOOP: ld a, (hl)
or a
jr z, DONE
; find the midpoint
call MIDPOINT
rla ; double it
ld b, 0
ld c, a
; load in the first input
call SET_OR
; increment HL past it
add hl, bc
inc hl
; second input - find midpoint
call MIDPOINT
rla ; double it
ld b, 0
ld c, a
; mix it with the first input
call SET_AND
; increment HL past it
add hl, bc
inc hl
; third input - find midpoint
call MIDPOINT
rla ; double it
ld b, 0
ld c, a
; mix it with the first two inputs
call SET_AND
; increment HL past it
add hl, bc
inc hl
; find the common value and reset SEEN
call FIRST
; running sum
push hl
push bc
ld c, a ; b = 0
ld hl, (SUM)
add hl, bc
ld (SUM), hl
pop bc
pop hl
; go back for more input
jr LOOP
DONE: ld de, (SUM)
call PRINT16
halt
; running total of priorities in common
SUM: dw 0
.endblock
Today’s puzzle was again quite a simple one, and the translation to Z80 assembly was correspondingly simple. The second star took around 60,000 cycles to compute on the small sample input alone, clocking in at around 33ms on my imaginary TRS-80 Model 1, or 1.2ms on my real eZ80 board.
DAY03: call STAR1
call STAR2
halt
The series
Day 1 | Day 2 | Day 3 | Day 4 | Day 5 | Day 6 | Day 7 | Day 8 | Day 9 | Day 10 | Day 11 | Day 12 | Day 13 | Day 14 | Day 15 | Day 16 | Day 17 | Day 18 | Day 19 | Day 20 | Day 21 | Day 22 | Day 23 | Day 24 | Day 25 |