Victory. Solved the memory corruption bug.

The problem was in the interrupt handling code, which I borrowed from Howard Price’s Flappy Bird clone, along with most of the rest of the basic setup code, to get things going.

The interrupt handlers are stored as a linked list in which each entry holds three values: the line number for a line interrupt, the address of the handler, and the address of the next entry in the list. As I’m not using the line interrupts (yet), this list is a single entry that points back to itself.
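To make the layout concrete, here is a rough Python model of that single entry sitting in a flat 64K memory image. It is only a sketch: the addresses are made up, and the field sizes (a one-byte line number followed by two little-endian 16-bit addresses) are my assumption from the way the fix further down adds 3 to reach the last field.

```python
# Hypothetical addresses, chosen only for illustration.
LIST_START = 0x8000   # where the one-entry list lives
HANDLER = 0x9000      # where the frame interrupt handler lives

def write_word(mem, addr, value):
    """Store a 16-bit value little-endian, as the Z80 does."""
    mem[addr] = value & 0xFF
    mem[addr + 1] = (value >> 8) & 0xFF

def read_word(mem, addr):
    """Read a little-endian 16-bit value."""
    return mem[addr] | (mem[addr + 1] << 8)

def build_single_entry_list(mem, start, handler, line=0xFF):
    """Write one entry: line number, handler address, next-entry address.

    The line value 0xFF is a placeholder, not taken from the real code.
    """
    mem[start] = line                    # offset 0: line number (assumed 1 byte)
    write_word(mem, start + 1, handler)  # offset 1: address of the handler
    write_word(mem, start + 3, start)    # offset 3: next entry, here itself

mem = bytearray(0x10000)
build_single_entry_list(mem, LIST_START, HANDLER)
```

Reading the word at LIST_START + 3 gives LIST_START again, which is the "points back to itself" part.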

At initialisation, int.reset is set to the start of this linked list. After every interrupt, the code moves to the next entry in the list and sets two memory locations: int.jump gets the address that contains the address of the handler, and int.next gets the address that contains the address of the next entry. Because there’s only one entry in this list, these values never change. I wonder if the extra level of indirection here is going to be important later.
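The double indirection is easier to see in a model than in prose. This Python sketch is my reading of the step described above, not a translation of the real routine: INT_JUMP and INT_NEXT are hypothetical locations for the two variables, and the entry layout (line number at offset 0, handler address at offset 1, next-entry address at offset 3) is assumed.

```python
# Hypothetical addresses, chosen only for illustration.
LIST_START = 0x8000
HANDLER = 0x9000
INT_JUMP = 0x7000   # stands in for int.jump
INT_NEXT = 0x7002   # stands in for int.next

def write_word(mem, addr, value):
    """Store a 16-bit value little-endian."""
    mem[addr] = value & 0xFF
    mem[addr + 1] = (value >> 8) & 0xFF

def read_word(mem, addr):
    """Read a little-endian 16-bit value."""
    return mem[addr] | (mem[addr + 1] << 8)

def advance(mem):
    """Follow int.next to the next entry, then repoint int.jump and int.next.

    int.next holds the address of a next-pointer field, so two reads are
    needed to reach the entry itself.
    """
    entry = read_word(mem, read_word(mem, INT_NEXT))
    write_word(mem, INT_JUMP, entry + 1)   # address of the handler field
    write_word(mem, INT_NEXT, entry + 3)   # address of the next-pointer field
    return entry

mem = bytearray(0x10000)
mem[LIST_START] = 0xFF                           # placeholder line number
write_word(mem, LIST_START + 1, HANDLER)
write_word(mem, LIST_START + 3, LIST_START)      # single entry loops to itself
write_word(mem, INT_NEXT, LIST_START + 3)        # initial state

first = advance(mem)
second = advance(mem)
handler = read_word(mem, read_word(mem, INT_JUMP))
```

With a single self-referencing entry, every call to advance lands on the same entry and leaves both variables unchanged.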

[Image: Archer saying “Thanks, Freddy Foreshadowing”]

The problematic code is this routine:

	@check_int:											; Make sure this is the frame interrupt
	int.status: ld a,0
				bit FrameIntBit,a
				jp z,@+save_regs
		@reset_int_manager:								; If not, reset to start properly next time
				ld hl,(int.reset)
				ld (int.next),hl
				ret

This routine loads the status value into a (the 0 has been replaced at runtime by the actual value) and checks whether the status indicates a frame interrupt. If it does not, the routine resets int.next to point at the start of the interrupt handler linked list, so that the next interrupt begins from the first entry. But that’s not what int.next is supposed to contain: int.next is supposed to contain the address of the memory containing the entry’s address, not the address of the entry itself.

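So the correct thing is to set int.next to the value held in int.reset plus 3, which skips past the line number and the handler address to the field that holds the next entry’s address: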

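		@reset_int_manager:								; Point int.next at the first entry's next-address field
				ld hl,(int.reset)
				FOR 3, inc hl							; Skip the line number and the handler address
				ld (int.next),hl
				ret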
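To show what goes wrong without the + 3, here is a small Python model of both versions of the reset. It is a sketch under assumptions: the addresses are invented, the line number is taken to be one byte (the 0xFF is a placeholder), and addresses are little-endian 16-bit values.

```python
# Hypothetical addresses, chosen only for illustration.
LIST_START = 0x8000
HANDLER = 0x9000
INT_RESET = 0x7000   # stands in for int.reset
INT_NEXT = 0x7002    # stands in for int.next

def write_word(mem, addr, value):
    """Store a 16-bit value little-endian."""
    mem[addr] = value & 0xFF
    mem[addr + 1] = (value >> 8) & 0xFF

def read_word(mem, addr):
    """Read a little-endian 16-bit value."""
    return mem[addr] | (mem[addr + 1] << 8)

def make_memory():
    """Build a one-entry list that points back at itself."""
    mem = bytearray(0x10000)
    mem[LIST_START] = 0xFF                        # placeholder line number
    write_word(mem, LIST_START + 1, HANDLER)
    write_word(mem, LIST_START + 3, LIST_START)
    write_word(mem, INT_RESET, LIST_START)
    return mem

def reset_buggy(mem):
    """Original behaviour: int.next = (int.reset)."""
    write_word(mem, INT_NEXT, read_word(mem, INT_RESET))

def reset_fixed(mem):
    """Corrected behaviour: int.next = (int.reset) + 3."""
    write_word(mem, INT_NEXT, read_word(mem, INT_RESET) + 3)

def next_entry(mem):
    """What the next interrupt would treat as the next list entry."""
    return read_word(mem, read_word(mem, INT_NEXT))

buggy = make_memory()
reset_buggy(buggy)
fixed = make_memory()
reset_fixed(fixed)

buggy_entry = next_entry(buggy)   # line number byte + low byte of handler address
fixed_entry = next_entry(fixed)   # the real start of the list
```

In this model the buggy reset makes the next interrupt read the line number and the low byte of the handler address as if they were a pointer, so the "next entry" ends up at 0x00FF instead of 0x8000, and everything derived from it points into unrelated memory.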

The only question I have is why bit FrameIntBit,a is failing and putting us into the @reset_int_manager code in the first place. There are no non-frame interrupts set, and the status value is FF, which my vague reading of the Sam Coupé development manual suggests is invalid. So, I dunno. It works now and that’s all that matters.
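One thing that can be said without knowing where the FF comes from: with every bit set, the test cannot pass whatever FrameIntBit is. The Z80 bit instruction sets the zero flag only when the tested bit is 0, so jp z is never taken for FF and the code falls through to the reset. A minimal Python check of that, where the function name is mine and not from the real code:

```python
def bit_sets_zero_flag(value, bit):
    """Model of Z80 `bit n,a`: the zero flag is set when bit n of a is 0."""
    return ((value >> bit) & 1) == 0

status = 0xFF

# Would `jp z` be taken for each possible bit number?
jump_taken = [bit_sets_zero_flag(status, bit) for bit in range(8)]
```

This only explains the fall-through, not why the status reads FF in the first place.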

The hardest part was finding a simple way to reproduce the bug: at one point the reproduction steps were to add a breakpoint and skip it 157 times, and even then there were still thousands of instructions to step through before the bug triggered. Once I worked out that the bug was happening inside the interrupt handler, I added a frame interrupt breakpoint, stepped through the code from there, and saw funny things happening after the @reset_int_manager code ran. Then came lots more staring, trying to work out how the indirection worked.

I could delete this code and hardcode it all, but I like having the flexibility of different interrupt handlers being available if I need them. Now that I have scrolling working again, I can work on something more interesting.