Wednesday 24 February 2010

Interrupt progress

Had another poke at interrupts late last night, and finally got a simple interrupt handler working. As usual, a few simple mistakes initially thwarted my efforts.
  1. To start with I was trying to use the FRAMEDONE interrupt from the display controller, but it seems I needed to use VSYNC for HDMI out, and EVSYNC_ODD/EVSYNC_EVEN for S-Video.
  2. I was using the rfe instruction, but neglected the ! (writeback) on the register, so it wasn't restoring the stack pointer on exit. Somehow the application code managed to run OK for a few seconds with a constantly changing stack!
  3. I forgot to fix lr (subtract 4) before saving it on the stack with the srs instruction. So rfe was skipping an application instruction on every interrupt, and again the code somehow didn't crash immediately.
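Mistakes 2 and 3 are easy to see in a toy model of the stack. The sketch below is not code from this project: it simulates `srsdb sp!` and `rfeia sp{!}` in plain C on a made-up stack address (`STACK_TOP`) and made-up instruction addresses, using the ARM-state rule that `lr_irq` holds the address of the next instruction plus 4.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the svc stack: a full descending stack of 32-bit words.
   Addresses are plain integers mapped onto a small array. */
#define STACK_WORDS 16u
#define STACK_TOP   0x2000u                       /* hypothetical initial sp */
#define STACK_LOW   (STACK_TOP - 4u * STACK_WORDS)

typedef struct {
    uint32_t sp;
    uint32_t mem[STACK_WORDS];
} toy_stack;

static uint32_t *word_at(toy_stack *s, uint32_t addr) {
    return &s->mem[(addr - STACK_LOW) / 4u];
}

/* srsdb sp!: store lr at sp-8 and spsr at sp-4, then write back sp -= 8. */
static void srsdb_wb(toy_stack *s, uint32_t lr, uint32_t spsr) {
    *word_at(s, s->sp - 8u) = lr;
    *word_at(s, s->sp - 4u) = spsr;
    s->sp -= 8u;
}

/* rfeia sp{!}: pc = [sp], cpsr = [sp+4]; sp += 8 only with the '!'. */
static uint32_t rfeia(toy_stack *s, bool writeback, uint32_t *cpsr) {
    uint32_t pc = *word_at(s, s->sp);
    *cpsr = *word_at(s, s->sp + 4u);
    if (writeback)
        s->sp += 8u;
    return pc;
}

/* One IRQ taken just before the instruction at next_insn executes.
   In ARM state lr_irq = next_insn + 4, hence the "subtract 4" fix.
   Returns the address execution resumes at. */
static uint32_t take_irq(toy_stack *s, uint32_t next_insn,
                         bool fix_lr, bool writeback) {
    uint32_t lr_irq = next_insn + 4u;
    uint32_t cpsr;
    if (fix_lr)
        lr_irq -= 4u;                 /* sub lr, lr, #4 */
    srsdb_wb(s, lr_irq, 0x10u);       /* 0x10 = user-mode spsr, illustrative only */
    return rfeia(s, writeback, &cpsr);
}
```

With both fixes the model resumes at the interrupted instruction with `sp` unchanged; without the lr fix it resumes 4 bytes late (one instruction skipped), and without the `!` every interrupt leaks 8 bytes of stack.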
Other than the ARMv6 instructions above, the code is basically straight out of the OMAP TRM § 10.5.3 MPU INTC Preemptive Processing Sequence; the step numbers in the comments below refer to that section. I haven't implemented the priority stuff yet (I was hoping it wasn't necessary for such a simple bit of code since I don't need priorities, but it seems it is needed for other reasons), so it doesn't actually implement re-entrant interrupts, but I might try to get that working before committing it.

        .set    MODE_SUPERVISOR, 0x13

ex_irq:
        // 1. save critical registers
        sub     lr, lr, #4              // fix the return address first
        srsdb   #MODE_SUPERVISOR!       // push lr_irq, spsr_irq onto the svc stack
        cps     #MODE_SUPERVISOR
        push    { r0-r3, r12, lr }

        ldr     r3, =INTCPS_BASE

        // 2,3. save and set priority threshold (not done)

        // 4. find interrupt source (INTCPS_SIR_IRQ)
        ldr     r0, [r3, #0x40]

        // 5. allow new interrupts
        mov     r1, #1
        str     r1, [r3, #INTCPS_CONTROL]

        // 6. data sync barrier for reg writes before enable irq
        dsb                             // no option means SY; not sure if a weaker one would do

        // 7. enable irq (not done, so no nesting yet)

        // 8. jump to handler, with r0 = irq number as its argument
        ldr     r2, =irq_vectors
        and     r0, r0, #0x7f
        ldr     lr, =ex_irq_done
        ldr     pc, [r2, r0, lsl #2]

ex_irq_done:
        // 1. disable irq (not needed while step 7 is not done)

        // 2. restore threshold level (not done)

        // 3. restore critical registers
        pop     { r0-r3, r12, lr }
        rfeia   sp!

        .data
        .balign 4
        .global irq_vectors
irq_vectors:
        .word   exception_irq, exception_irq, ...
        .word   ...                     // total of 96 vectors

The srs and cps instructions are used to run everything on the supervisor stack, in supervisor mode. On entry the code is executing in irq mode, so it first saves lr_irq and spsr_irq onto the supervisor stack (after fixing the return address in lr!), and then switches to supervisor mode. Without srs (pre-ARMv6) things are pretty messy, since you either have to muck about with the irq stack first (and last), or switch between modes a few times to get everything sorted (see the links at the end of this post).
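Put together, srsdb and the push leave a fixed frame on the supervisor stack. The struct below is only a description of that layout for illustration (the name `irq_frame` and the helper are hypothetical); it relies on push storing the lowest-numbered register at the lowest address, directly below the two words srsdb wrote.

```c
#include <stddef.h>
#include <stdint.h>

/* Supervisor-stack frame built by ex_irq, lowest address first. */
struct irq_frame {
    uint32_t r0, r1, r2, r3;
    uint32_t r12;
    uint32_t lr_svc;     /* saved because "ldr lr, =ex_irq_done" overwrites it */
    uint32_t ret_pc;     /* lr_irq - 4, written by srsdb, loaded into pc by rfeia */
    uint32_t spsr_irq;   /* interrupted cpsr, written by srsdb, restored by rfeia */
};

/* 8 words = 32 bytes, a multiple of 8, so the frame keeps 8-byte alignment. */
_Static_assert(sizeof(struct irq_frame) == 32, "frame should be 8 words");
_Static_assert(offsetof(struct irq_frame, ret_pc) == 24,
               "srs words sit directly above the pushed registers");

/* Hypothetical helper: where the interrupted code will resume. */
static uint32_t frame_resume_pc(const struct irq_frame *f) {
    return f->ret_pc;
}
```

After the pop, sp points at `ret_pc`, which is exactly what `rfeia sp!` expects.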

I also implement a simple vectored interrupt table to make the C side easier, although I think I could just use mov lr,pc before jumping to the vector rather than a literal load.
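The same dispatch can be sketched in C, which makes one detail visible: the `and r0, r0, #0x7f` field can encode up to 127, but the table only has 96 entries, and the assembly indexes it without a check (presumably relying on the controller never reporting more than 95). Everything here (`irq_table`, `irq_register`, `dss_stub`) is a hypothetical stand-in, not the post's actual code.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_IRQS 96

typedef void (*irq_handler_t)(int id);

/* Hypothetical C-side table; the post keeps its table (irq_vectors) in assembly. */
static irq_handler_t irq_table[NUM_IRQS];
static unsigned unexpected_irqs;

static void irq_unexpected(int id) {
    (void)id;
    unexpected_irqs++;            /* count anything with no registered handler */
}

static void irq_init(void) {
    for (size_t i = 0; i < NUM_IRQS; i++)
        irq_table[i] = irq_unexpected;
}

static void irq_register(int id, irq_handler_t h) {
    if (id >= 0 && id < NUM_IRQS)
        irq_table[id] = h;
}

/* sir_irq is the raw INTCPS_SIR_IRQ value; only bits 6:0 hold the IRQ number.
   The 7-bit field can encode up to 127, so bounds-check before indexing. */
static void irq_dispatch(uint32_t sir_irq) {
    int id = (int)(sir_irq & 0x7fu);
    if (id < NUM_IRQS)
        irq_table[id](id);
    else
        irq_unexpected(id);
}

/* Stand-in for dispc_handler so the sketch runs off-target. */
static int last_dss_id = -1;
static void dss_stub(int id) { last_dss_id = id; }
```

Usage: `irq_init(); irq_register(25, dss_stub); irq_dispatch(sir);` routes DSS_IRQ (25) to the stub, and anything unregistered is counted rather than jumping through garbage.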

The ARM ARM actually recommends using system mode for re-entrant interrupts (you can't stay in irq mode itself, because a nested interrupt would clobber lr_irq), so why am I using the supervisor stack? Partly for historical reasons: at first I couldn't work out how to save the state without clobbering some system mode registers (they're shared with user mode). But I also have other plans where this scheme might work better, and if nothing else it stops broken user-mode code from crashing interrupts by corrupting the stack pointer.

And finally, one more thing I noticed whilst reading bits and pieces: the AAPCS (EABI) specifies that the stack pointer must be double-word (8-byte) aligned at public entry points. I probably read it before but didn't take notice. Normally this just means you need to push an even number of registers onto the stack before calling other functions, and fortunately that falls out naturally with this code. But an interrupt handler can be invoked at any time, so we don't know the alignment of the stack, and according to the ARM Info Centre a specific check is needed too (damn, I definitely know I've read that before; just looking it up now, it has some other important bits too!).
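The usual fix-up is roughly: compute `sp & 4`, subtract it from sp before calling C, save that padding value (alongside another register, to keep the push even), and add it back afterwards. The C below is only my arithmetic model of that idea, not code copied from the Info Centre; in assembly it would be something like `and r1, sp, #4` / `sub sp, sp, r1` and later `add sp, sp, r1`. It assumes sp is always at least 4-byte aligned.

```c
#include <stdint.h>

/* Bytes to drop sp by so it becomes 8-byte aligned: 0 or 4,
   assuming sp is always at least word (4-byte) aligned. */
static uint32_t align_pad(uint32_t sp) {
    return sp & 4u;                         /* and r1, sp, #4 */
}

typedef struct {
    uint32_t sp;    /* aligned stack pointer to use for the call */
    uint32_t pad;   /* must be saved somewhere to undo the adjustment */
} aligned_sp;

static aligned_sp align_enter(uint32_t sp) {
    aligned_sp a = { sp - align_pad(sp), align_pad(sp) };   /* sub sp, sp, r1 */
    return a;
}

static uint32_t align_exit(aligned_sp a) {
    return a.sp + a.pad;                    /* add sp, sp, r1 */
}

/* A push of n 4-byte registers preserves 8-byte alignment only if n is even. */
static int push_keeps_alignment(unsigned nregs) {
    return (nregs * 4u) % 8u == 0u;
}
```

For example, an interrupt arriving with sp = 0x0FFC gets pad 4 and runs the C handler on 0x0FF8, then restores 0x0FFC on the way out.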

Hmm, now I'm thinking about it ... I'm not sure I even need re-entrant interrupts at all. I'm thinking of working towards something along the lines of a microkernel architecture similar to AmigaOS or Minix 3, where device drivers are just high-priority unprivileged tasks; the Cortex-A8 should be more than fast enough for this to work. All the interrupt handlers will need to do is post events to these tasks, and the software will handle the priorities and whatnot. I suspect re-entrant interrupts are much more important in an embedded system where you leave most of the work to the interrupt handlers, where DMA isn't available for everything, or where CPU speed is a limiting factor.
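One minimal way for an interrupt handler to "post an event" is a per-task word of signal bits, loosely in the spirit of AmigaOS signals. This is a hypothetical sketch using C11 atomics (the names `task_signals`, `isr_post`, `task_take` and the `SIG_*` bits are made up); the handler only ORs in a bit and returns, and the driver task atomically takes everything pending. Whether the atomics compile down cleanly on a given bare-metal toolchain is an assumption to check.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-task signal word. */
typedef struct {
    _Atomic uint32_t pending;
} task_signals;

#define SIG_VSYNC  (1u << 0)
#define SIG_SERIAL (1u << 1)

/* Called from an interrupt handler: record the event and get out quickly. */
static void isr_post(task_signals *t, uint32_t sigs) {
    atomic_fetch_or_explicit(&t->pending, sigs, memory_order_release);
}

/* Called from the driver task: take (and clear) all pending signals at once. */
static uint32_t task_take(task_signals *t) {
    return atomic_exchange_explicit(&t->pending, 0u, memory_order_acquire);
}
```

A design consequence worth noting: bits coalesce, so two vsyncs posted before the task runs show up as one. That is usually fine for "something happened, go look at the hardware", but a counter or queue is needed if every occurrence matters.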

Specific Handler

The next step after the interrupt entry code is the specific handler the vector points to. This is just a plain function call, since the entry point has handled all the nitty gritty. But it still has to deal with the hardware: it must identify which event caused it to be invoked, and clear it. Even with 96 interrupts in the interrupt controller, most of them map to multiple physical events.

In the case of the video subsystem, there is a single interrupt, DSS_IRQ (25), which can be triggered by 29 different events in either the DISPC module or the DSS module (actually I just noticed there are many more from the DSI module). § 15.3.2.2 Interrupt Requests has a pretty good overview. Fortunately there are a couple of bits in DSS_IRQSTATUS which let the code determine which module's interrupts are asserted, which simplifies processing. After that test, each status bit needs to be checked in turn and processed accordingly. Finally, the interrupt bits must be cleared by writing a 1 to each set bit in the DISPC_IRQSTATUS or DSI_IRQSTATUS register; otherwise the interrupt is re-invoked as soon as the handler exits, in an infinite loop.

void dispc_handler(int id) {
    uint32_t dssirq = reg32r(DSS_BASE, DSS_IRQSTATUS);

    // see if we have any dispc interrupts
    if (dssirq & DSS_DISPC_IRQ) {
        uint32_t irqstatus = reg32r(DISPC_BASE, DISPC_IRQSTATUS);

        if (irqstatus & DISPC_VSYNC) {
            // ... do vsync code ...
        }

        // clear all the interrupt status bits that were set
        reg32w(DISPC_BASE, DISPC_IRQSTATUS, irqstatus);
    }

    // check for dsi ints (to clear them)
    if (dssirq & DSS_DSI_IRQ) {
        // not expecting this, just clear everything
        reg32w(DSI_BASE, DSI_IRQSTATUS, ~0);
    }
}

This is basically the same process that all interrupt handlers need to go through. Identify the source, handle it, clear the assertion.
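The "clear the assertion" step has a subtle point that a toy write-1-to-clear (W1C) register shows well: writing back exactly the value you read only clears the events you saw, so an event that arrives while the handler runs stays pending and re-raises the interrupt, whereas writing ~0 (as done above for DSI, where nothing is expected) would discard it. The model and the `EVT_*` bits are hypothetical, not real DISPC bit positions.

```c
#include <stdint.h>

/* Toy W1C status register: hardware sets bits as events occur,
   writing 1 to a bit clears it, writing 0 leaves it alone. */
typedef struct { uint32_t bits; } w1c_reg;

#define EVT_A (1u << 0)   /* hypothetical event bits */
#define EVT_B (1u << 1)

static void     hw_event(w1c_reg *r, uint32_t e)  { r->bits |= e; }
static uint32_t w1c_read(const w1c_reg *r)        { return r->bits; }
static void     w1c_write(w1c_reg *r, uint32_t v) { r->bits &= ~v; }

/* Handler pattern: read once, handle what was seen, clear exactly that.
   'late' models an event arriving while the handler is running.
   Returns what is still pending, which would re-raise the interrupt. */
static uint32_t handle_and_clear(w1c_reg *r, uint32_t late, uint32_t *handled) {
    uint32_t seen = w1c_read(r);
    *handled = seen;          /* ... identify and handle each bit in 'seen' ... */
    hw_event(r, late);
    w1c_write(r, seen);       /* clear only the bits actually handled */
    return w1c_read(r);
}
```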

There are lots of 'gotchas' with writing interrupt handlers at first, but the main thing is not to call any functions which share state with non-interrupt code: anything non-reentrant, or anything touching the same hardware registers. Oh, and they should always run as fast as possible: all the 'real work' your CPU could be doing is halted the entire time the interrupt is executing, and you could be processing thousands per second in a busy system.

The last piece of the puzzle is the interrupt enable masks. You don't get every possible interrupt in the system all the time; you mask (or enable) the ones you want to receive. This is all set up after the hardware in question is configured, but before interrupts are enabled. Here I clear all the status bits as well, just to make sure I don't get an unexpected surprise when I enable CPU interrupts later.
        // disable all but vsync, then clear any stale status bits
        reg32w(DISPC_BASE, DISPC_IRQENABLE, DISPC_VSYNC);
        reg32w(DISPC_BASE, DISPC_IRQSTATUS, ~0);
        // the dss interrupt can also receive DSI events, so disable those too
        reg32w(DSI_BASE, DSI_IRQENABLE, 0);
        reg32w(DSI_BASE, DSI_IRQSTATUS, ~0);
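Why clear status as well as set the enable mask? A toy model makes it concrete, assuming (my reading, worth checking against the TRM) that status bits latch events even while masked and that a module drives its line whenever `status & enable` is non-zero. Since DSS_IRQ is shared, either module can raise it, which is one plausible source of "interrupts with nothing I'm testing asserted". All names and bit positions here are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define EV_OTHER (1u << 0)   /* hypothetical bit positions for the sketch */
#define EV_VSYNC (1u << 1)

/* Toy model of one module's interrupt registers. */
typedef struct {
    uint32_t status;   /* IRQSTATUS: latched events */
    uint32_t enable;   /* IRQENABLE: which events drive the line */
} irq_module;

static bool module_asserted(const irq_module *m) {
    return (m->status & m->enable) != 0u;
}

/* DSS_IRQ (25) is shared, so either module can raise it. */
static bool dss_irq_asserted(const irq_module *dispc, const irq_module *dsi) {
    return module_asserted(dispc) || module_asserted(dsi);
}
```

In this model, enabling VSYNC while a stale VSYNC status bit is still latched asserts the line immediately, which is exactly the surprise the status clear avoids.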

I think I have some sort of set-up bug, because I seem to be getting interrupts sometimes when none of the events I'm testing is asserted. I will have to check the extra DSI interrupts I just noticed whilst writing this; they should all be masked off (that should be the reset condition anyway, but ...).

My little demo code right now just does a vsync'd smooth-scroll by changing the video DMA base registers. The TRM states that the register is a 'Shadow register, updated on VFP start period or EVSYNC.' There is another little trick though: it looks like all the DISPC registers are themselves shadowed, so you always have to set the GOLCD bit in DISPC_CONTROL after making changes for them to make their way to the hardware. I guess I realised that anyway, but initially forgot.

        // update the graphics layer base addresses (video out) to scroll it
        reg32w(DISPC_BASE, DISPC_GFX_BA0, addr);
        reg32w(DISPC_BASE, DISPC_GFX_BA1, addr);
        // set GOLCD so the shadowed registers are loaded by the hardware
        reg32s(DISPC_BASE, DISPC_CONTROL, DISPC_GOLCD, ~0);
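The double-shadowing can be modelled as a tiny state machine. This is a sketch under my own assumptions about the TRM's description: writes land in a shadow copy, the hardware copies shadows to the active registers at the frame boundary only while GOLCD is set, and then clears GOLCD itself. The scroll helper (base plus line times pitch, with a made-up 2560-byte pitch in the usage) is also just illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one shadowed DISPC register plus the GOLCD bit. */
typedef struct {
    uint32_t shadow_ba;   /* last value written by software */
    uint32_t active_ba;   /* value the display DMA is really using */
    bool     golcd;
} dispc_model;

static void sw_write_ba(dispc_model *d, uint32_t addr) { d->shadow_ba = addr; }
static void sw_set_go(dispc_model *d)                  { d->golcd = true; }

/* At the frame boundary (e.g. VFP start): load shadows only if GO is set. */
static void hw_frame_boundary(dispc_model *d) {
    if (d->golcd) {
        d->active_ba = d->shadow_ba;
        d->golcd = false;             /* hardware clears GO when done */
    }
}

/* Hypothetical scroll helper: start scan-out 'line' rows into the buffer. */
static uint32_t scroll_addr(uint32_t fb_base, uint32_t pitch_bytes, uint32_t line) {
    return fb_base + line * pitch_bytes;
}
```

Usage: `sw_write_ba(&d, scroll_addr(fb, 2560u, y)); sw_set_go(&d);` from the vsync handler. Forgetting `sw_set_go` leaves the old address active indefinitely, matching the "initially forgot" symptom.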

I might come up with a more impressive demo before committing though. Now that I have interrupts working, a lot of possibilities open up, such as a real sound driver, a serial driver, and proper timing events (in a very odd twist, my delay loops sometimes seem to run twice as fast as at other times ...).

Links

I came across a couple of links on the internet about bare-metal ARM coding; some of it doesn't apply to, or won't work on, OMAP3, but the general ideas are the same.



Oh, I finally got out in the yard yesterday, if only for a couple of hours. More or less finished the trench for the main retaining wall foundation. Now I just need to get off my lazy bum and order some road-base and sand. Can't say I felt the fittest: easily out of breath, although I'm sure that has something to do with the sleep apnoea, my particularly poor sleep the night before (I let the cat stay in and he was wandering around all night), as well as my bum sitting. Glorious day today, and no meeting organised yet about work, so I should probably get out on a bike. Maybe I can scan a few pawn shops in the extremely unlikely event any have C64s lying around.
