Syscalls & interrupts, Part II
In the first part of this article, we analyzed the execution of system calls and hardware interrupts from their inception through the entry into _irqit, the kernel funnel routine that all software and hardware interrupts pass through. In this section, we'll trace the same path for a typical interrupt, the hardware timer.
Here's the full code at the start of _irqit:
_irqit:
//
// Make room
//
push %ds
push %si
push %di
//
// Recover data segment
//
// seg cs
mov %cs:ds_kernel,%ds
//
// Determine which stack to use
//
cmpw $1,_gint_count
jc utask // We were in user mode
jz itask // Using a process's kernel stack
ktask: // Already using interrupt stack
//
// Already using interrupt stack, keep using it
//
mov %sp,%si
sub $8,%si // 14 offsets less 6 already on stack
jmp save_regs
//
// Using a process's kernel stack, switch to interrupt stack
//
itask:
mov $_intstack-14,%si // 14 offsets 0-13 of SI below
jmp save_regs
Execution is identical to a system call, until the general interrupt counter (_gint_count) is tested. Since this is a hardware interrupt this time, the system could be executing application code, or perhaps be in the middle of a kernel routine.
If _gint_count is 0, then the path for the timer interrupt would be the same as a system call - the save_regs routine would be called and the registers saved within the current task structure. However, since we know that each call through _irqit increments _gint_count, we'll look at each case specifically this time.
If _gint_count is 1, an application program is already in the kernel, and the timer interrupt is interrupting normal kernel code. This case is handled by the "jz itask" (jump if zero) instruction. You'll see here that in this case, rather than SI being set to point into the kernel current task structure, it is set to point to the global _intstack-14. This is 14 bytes (7 words) below the top of the global kernel array _intstack, which is used as a special interrupt stack, rather than the normal kernel stack used for application system calls or for hardware interrupts arriving from user mode. SI is set 14 bytes below the top because that is the space required to save the registers before this same routine switches to the interrupt stack. The registers are saved in the same order as they are in the current task struct, but onto the interrupt stack instead:
/* ordering of saved registers on kernel stack after syscall/interrupt entry*/
struct _registers {
/* SI offset 0 2 4 6 8 10 12*/
__u16 ax, bx, cx, dx, di, si, orig_ax, es, ds, sp, ss;
};
Notice that the offsets are 0-12 (14 total) to save the DI through SS registers, which was discussed in Part I.
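These offsets can be checked mechanically. The sketch below (our own enum names, not the kernel's) reproduces the _registers struct and pins down the fact that SI points at the di member, so that the offsets 0-12 used by save_regs (pop (%si) for DI, pop 2(%si) for SI, pop 8(%si) for DS, and so on) land on the right fields:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t __u16;

/* ordering of saved registers on kernel stack, as in the kernel header */
struct _registers {
    __u16 ax, bx, cx, dx, di, si, orig_ax, es, ds, sp, ss;
};

/* SI points at the di member; these offsets are relative to it
 * (names are ours, for illustration only) */
enum {
    R_DI = 0, R_SI = 2, R_ORIG_AX = 4, R_ES = 6,
    R_DS = 8, R_SP = 10, R_SS = 12,
    R_SAVE_BYTES = 14    /* di through ss: 7 words */
};
```

This is why _intstack-14 is exactly enough room: the saved area runs from di (offset 0) through ss (offset 12), 14 bytes in all.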
If _gint_count is 2 or more, this means that the current hardware interrupt is interrupting another interrupt. Although that sounds complicated, it's not really a big deal, since interrupts are prioritized by the 8259 interrupt controller and only allowed when interrupts are re-enabled after saving all the registers in this same routine.
In this case, the conditional jumps to utask and itask are not taken, and code execution falls through to ktask, where the comment says "Already using interrupt stack, keep using it". Since the first three instructions of this routine already pushed the DS, SI and DI registers onto the interrupt stack, we only need to copy SP to SI and subtract 8: the three pushes account for 6 of the 14 bytes needed (14 - 6 = 8).
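That "sub $8" arithmetic is easy to get wrong, so here is a sketch of it in C (the helper name is ours, purely for illustration): SI must end up 14 bytes below where SP stood before the three pushes, and SP has already moved down 6.

```c
#include <assert.h>
#include <stdint.h>

#define SAVE_AREA_BYTES 14   /* di..ss, 7 words (offsets 0-12) */
#define PUSHED_BYTES     6   /* ds, si, di already pushed by _irqit */

/* hypothetical helper mirroring "mov %sp,%si; sub $8,%si" in ktask */
static uint16_t ktask_frame_base(uint16_t sp_after_pushes)
{
    return sp_after_pushes - (SAVE_AREA_BYTES - PUSHED_BYTES);  /* sub $8 */
}
```

The frame base plus 14 lands exactly at the pre-push SP, so the save area sits flush under the interrupted context.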
This is somewhat tricky, but the important point is that in the case of interrupting kernel code, or interrupting an interrupt, the system doesn't switch to a normal kernel process stack saved in the task structure, but instead saves all the registers on a 512-byte fixed interrupt stack with the following declaration:
.skip 512,0 // 512 byte interrupt stack
_intstack:
The reason a normal kernel process stack can't be used is that that stack is already in use by the first interrupt, the one that occurred when _gint_count == 0. The saved registers would not be in the places the kernel expects, and ultimately that task couldn't be scheduled in or out. But we'll return to that in another discussion.
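The three-way dispatch at the top of _irqit ("cmpw $1,_gint_count; jc utask; jz itask; ktask:") can be summarized as a C sketch. The enum and function names are ours, not the kernel's; the point is the unsigned comparison against 1:

```c
#include <assert.h>

/* illustrative names for the three _irqit entry paths */
enum irq_stack {
    STACK_TASK_KERNEL,   /* utask: came from user mode, use task's kernel stack */
    STACK_SWITCH_INT,    /* itask: kernel was running, switch to _intstack */
    STACK_KEEP_INT       /* ktask: already on _intstack, keep using it */
};

/* models "cmpw $1,_gint_count; jc utask; jz itask; fall through to ktask" */
static enum irq_stack pick_stack(unsigned gint_count)
{
    if (gint_count < 1)  return STACK_TASK_KERNEL;  /* jc: below 1, carry set */
    if (gint_count == 1) return STACK_SWITCH_INT;   /* jz: exactly 1 */
    return STACK_KEEP_INT;                          /* 2 or more: nested */
}
```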
save_regs:
incw _gint_count
pop (%si) // DI
pop 2(%si) // SI
pop 8(%si) // DS
...
Here we are, in the exact same routine as handling a system call, incrementing _gint_count as always, but this time SI points to a special interrupt stack to save the registers.
...
movb %cs:(%di),%al
cmpb $0x80,%al
jne updct
Now we get to the part where the interrupt number is checked. The timer is interrupt 0, so the test for interrupt 80h will fail and we will "jne updct" (jump if not equal). This takes us away from the syscall processing and instead into an update-count routine.
/*
!
! ----------PROCESS INTERRUPT----------
!
! Update intr_count
!
*/
updct:
incw intr_count // only needed for schedule during interrupt warning
//
// Call the C code
//
sti // Reenable interrupts
mov %sp,%bx // Get pointer to pt_regs
cbw
push %ax // IRQ for later
push %bx // Register base
push %ax // IRQ number
call do_IRQ // Do the work
pop %ax // Clean parameters
pop %bx
pop %ax // Saved IRQ
We increment a new global variable, _intr_count, whose purpose is to let the kernel warn if a reschedule (sleep/wait) is attempted during interrupt handling, and then re-enable interrupts.
The current SP, which points to a _registers struct on the interrupt stack, is copied to BX, and the interrupt number, which happens to be in AL, is sign-extended and saved. The _registers struct address and IRQ number are then pushed and do_IRQ is called, which uses the interrupt number to find the previously-registered driver routine to call. The kernel routine request_irq is used by device drivers for this purpose.
Notice that the kernel-registered interrupt routine is called with interrupts enabled.
After the interrupt routine returns, the stack parameters are removed, and the saved IRQ is popped into AX.
So we've seen that any code, including the kernel, can be interrupted at any time, except for portions of routines like this that disable interrupts. But in general, any application or kernel code must be aware that a registered interrupt handler could be called.
HOWEVER, as we will see shortly, all interrupts processed on the interrupt stack (not a kernel stack) are guaranteed to return to the previously interrupted code. This means that an interrupt driver routine, which is always called through do_IRQ using this method, cannot call any kernel sleep routine. In general, it shouldn't do much, since it is interrupting application or kernel code, although it is definitely allowed to "wake up" tasks that are not running (i.e., sleeping).
//
// Send EOI to interrupt controller
//
cli // Disable interrupts to avoid reentering ISR
cmp $16,%ax
jge was_trap // Traps need no reset
or %ax,%ax // Is int #0?
jnz a4
//
// IRQ 0 (timer) has to go on to the bios for some systems
//
decw _bios_call_cnt_l // Will call bios int?
jne a4
movw $5,_bios_call_cnt_l
pushf
lcall *_stashed_irq0_l
jmp was_trap // EOI already sent by bios int
a4:
cmp $8,%ax
mov $0x20,%al // EOI
jb a6 // IRQ on low chip
/*
!
! Reset secondary 8259 if we have taken an AT rather
! than XT irq. We also have to prod the primary
! controller EOI..
!
*/
out %al,$0xA0
jmp a5
a5: jmp a6
a6: out %al,$0x20 // Ack on primary controller
After the interrupt service routine returns, interrupts are disabled again and the interrupt number is tested.
If it is 16 or greater, this section of code is skipped by jumping to was_trap.
If the interrupt was number 0, then on every fifth call of the timer interrupt the BIOS timer interrupt entry point is called, to handle BIOSes that need a timer interrupt (since all interrupt vectors now point to ELKS).
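The "every fifth tick" logic is just the decw/jne/movw sequence on _bios_call_cnt_l. Here it is as a C sketch (the function name is ours, for illustration):

```c
#include <assert.h>

#define BIOS_EVERY 5
static int bios_call_cnt = BIOS_EVERY;

/* mirrors "decw _bios_call_cnt_l; jne a4; movw $5,_bios_call_cnt_l":
 * returns 1 on every fifth timer tick, when the old BIOS timer
 * vector would be chained to */
static int should_call_bios_timer(void)
{
    if (--bios_call_cnt != 0)
        return 0;                 /* jne a4: not yet */
    bios_call_cnt = BIOS_EVERY;   /* reload the counter */
    return 1;                     /* fall through to the lcall */
}
```

Since the PC BIOS expects its timer at 18.2 Hz and ELKS owns the vector, chaining one tick in five keeps BIOS-side timekeeping roughly alive without handing the timer back.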
Then, at label a4, the interrupt number is compared to 8 to determine whether to send an EOI (end of interrupt) to the slave as well as the master 8259 interrupt controller. Note that even though interrupts were re-enabled before calling do_IRQ, the 8259 interrupt controller won't allow any equal-or-lower priority interrupts to occur until it has been sent an EOI. Since interrupts are now disabled, resetting the 8259 won't cause an immediate interrupt (yet).
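The EOI decision at a4-a6 reduces to this C sketch. The outb here is a test stand-in for the "out %al,$port" instructions, logging the port instead of touching hardware:

```c
#include <assert.h>

#define PIC1_CMD 0x20        /* primary 8259 command port */
#define PIC2_CMD 0xA0        /* secondary 8259 command port */
#define EOI      0x20        /* non-specific EOI command */

static int port_log[8];      /* test hook: record ports written, in order */
static int port_count;

/* stand-in for the "out %al,$port" instruction */
static void outb(int value, int port)
{
    (void)value;
    port_log[port_count++] = port;
}

/* mirrors labels a4-a6: an AT (slave) IRQ, 8-15, needs an EOI on the
 * secondary controller before the primary one is acknowledged */
static void send_eoi(int irq)
{
    if (irq >= 8)
        outb(EOI, PIC2_CMD);
    outb(EOI, PIC1_CMD);
}
```

For the timer (IRQ 0) only the primary controller is acknowledged; a slave IRQ such as the RTC (IRQ 8) acknowledges both, in slave-then-master order.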
was_trap:
//
// Restore intr_count
//
decw intr_count
//
// Now look at rescheduling
//
cmpw $1,_gint_count
jne restore_regs // No
//
// This path will return directly to user space
//
sti // Enable interrupts to help fast devices
call schedule // Task switch
call do_signal // Check signals
cli
//
// Restore registers and return
//
restore_regs:
We now get to the interesting part of handling hardware interrupts. Now that the interrupt handling routine has finished, and the interrupt controller reset with an EOI, the global _intr_count is decremented, possibly allowing task-switching in the scheduler.
The global _gint_count is compared to 1. If it is not 1, the registers are restored as on every exit from _irqit, and the interrupted routine, whatever it was, is resumed. Remember that _gint_count can only be 1 at this point when the routine is handling an interrupt taken from user code. So if _gint_count is 1 now that this timer interrupt has completed, the timer interrupted user code (NOT kernel code).
Thus, in this special case only, where a hardware interrupt interrupted user code, interrupts are re-enabled and schedule is called.
If there is another task ready to run (possibly woken up by the interrupt handler just executed), the scheduler will pick the first one and switch stacks.
We won't go into exactly how this works now, but since _gint_count is 1 at this time (just before decrementing it and restoring registers), the current kernel stack is sitting with all registers saved in the _registers struct in the task struct. All that is needed is to switch the kernel SP to a different task struct entry, and then the return from schedule will be to this very same routine, but with a different kernel stack. In effect, it could be the current task struct, or that of any other ready-to-run task.
After the return from schedule, the current (newly-woken, or round-robin scheduled) task continues executing. It then calls do_signal for signal processing and falls into restore_regs, which decrements _gint_count, restores the registers, and returns from the interrupt:
//
// Restore registers and return
//
restore_regs:
decw _gint_count
pop %ax
pop %bx
pop %cx
pop %dx
...
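The exit path from was_trap onward condenses to a few lines of C. This is a sketch with stand-in functions (schedule and do_signal here just count calls); the real routines switch stacks and deliver signals:

```c
#include <assert.h>

static int gint_count = 1;   /* one level deep: we interrupted user code */
static int scheduled, signals_checked;

static void schedule(void)  { scheduled++; }        /* stand-in */
static void do_signal(void) { signals_checked++; }  /* stand-in */

/* mirrors the tail of _irqit: only the outermost entry, which is about
 * to return to user mode, considers rescheduling and pending signals */
static void irq_exit(void)
{
    if (gint_count == 1) {   /* cmpw $1,_gint_count; jne restore_regs */
        schedule();
        do_signal();
    }
    gint_count--;            /* restore_regs: decw _gint_count */
}
```

A nested exit (gint_count of 2 or more) skips both calls and simply unwinds, which is exactly why interrupted kernel code and interrupted interrupts always resume where they left off.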
This concludes this writeup on system calls and interrupt processing. The heart of ELKS is contained within the _irqit routine, the scheduler, and the task structure. Hopefully this illuminates the very controlled execution path ELKS takes to keep the system in a known state at all times.