vsta: crash


=== What to do when your VSTa kernel crashes ===

Alas, the VSTa kernel does crash from time to time. If it happens to
you, there are some steps you can take which will help the problem get
fixed.

First, you need to find out if it's the actual kernel, or if it's a
boot server. A boot server is a regular user program, but it's one of
the initial ones which exist when the kernel starts running. In a
non-DEBUG kernel, the process will just quietly exit--it really is
just a normal user process. But in the DEBUG kernel, the assumption is
that having one of your initial processes die is such a bad thing,
that the system should stop and let you find out what went wrong.

Whether the kernel itself has failed or a DEBUG kernel has stopped on
a dead boot server, you end up in the kernel debugger. So the first
step is to figure out which of the two happened.

So how do you tell? Easy. If the message is:

Boot process <pid> dies

then it's a boot process which has failed. Otherwise it's the whole
kernel which has failed.

==== Boot process failure ====

If a user process dies, you can still gather a stack backtrace of
sorts. You don't get symbols, but at least you will have something
which can be mapped to symbols by somebody who has a running VSTa
system with an executable with symbols. You get the stack backtrace by
doing:

tf -- Kernel debugger, ask for trap frame

First note the "eip" value, which is the address of the instruction
at which the user program entered the kernel.

Take the "ebp" value, which should be less than 0x80000000. Bitwise OR
in 0x80000000. So if the address looks like 0x7fffff00, you would make
this 0xffffff00. Do "dv 0xffffff00 2".
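
The address arithmetic above can be checked with a few lines of
Python. This is only a sketch of the bit manipulation; the name
debugger_view is made up for illustration and is not part of VSTa or
its kernel debugger.

```python
# Hypothetical helper: not part of VSTa, just the arithmetic from the text.
HIGH_BIT = 0x80000000

def debugger_view(user_addr):
    """Return the address to hand to "dv" for a user-space frame pointer."""
    if user_addr >= HIGH_BIT:
        raise ValueError("expected a user address below 0x80000000")
    return user_addr | HIGH_BIT

print(hex(debugger_view(0x7fffff00)))  # prints 0xffffff00
```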

What you get back is a pair of hex numbers. The first is the next
stack frame in from the fault; the second is the program counter
value. Note the PC value. Then do the same transformation of the stack
frame address, and do, say, "dv 0xffffffc0" (the count 2 will be
remembered from the previous command). This will give you another pair
of hex addresses; again note the PC value, and iterate until you get
zeroes (top stack frame). You now have a stack backtrace, and those PC
values indicate the nested callers.
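
The iteration just described can be sketched in Python. This is an
illustration of the procedure, not a VSTa tool: read_pair stands in
for typing "dv <addr> 2" by hand and copying down the two numbers, and
the memory contents below are invented for the example.

```python
HIGH_BIT = 0x80000000

def walk_frames(read_pair, ebp, limit=64):
    """Collect caller PC values by following saved frame pointers.

    read_pair(addr) is a stand-in for "dv <addr> 2"; it returns the
    pair (next frame pointer, PC value) found at that address.
    """
    pcs = []
    while ebp != 0 and len(pcs) < limit:  # limit guards against a corrupt chain
        next_ebp, pc = read_pair(ebp | HIGH_BIT)
        if next_ebp == 0 and pc == 0:
            break  # zeroes mark the top stack frame
        pcs.append(pc)
        ebp = next_ebp
    return pcs

# Invented stack contents, keyed by the address given to "dv".
memory = {
    0xffffff00: (0x7fffffc0, 0x00001234),
    0xffffffc0: (0x7fffffe0, 0x00001100),
    0xffffffe0: (0, 0),
}
print([hex(pc) for pc in walk_frames(memory.__getitem__, 0x7fffff00)])
# prints ['0x1234', '0x1100']
```

The list comes out innermost caller first, which is the same order in
which you would write the PC values down at the debugger prompt.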

==== Kernel failure ====

For most assertion failures, you can just do "bt", and the kernel
debugger will give you a stack backtrace. However, in the case of
"kernel fault" errors, you need to do a backtrace of the stack
following from the kernel trap. So again do "tf", and get the trap
frame from the failed instruction.

The kernel backtrace method is much the same as for a user boot
process, except you don't have to OR in 0x80000000 to the values. When
you do "tf", take the program counter ("eip") and do "= <eip>" with
the hex value the kernel debugger printed. That value is the address
of the faulting instruction, and "=" reports its symbolic location.
Now take the "ebp" value, and do "dv <addr> 2" (where <addr> is that
"ebp" value). You'll get back a pair of numbers. Do "= <PC>" (where <PC>
is the second of those hex numbers), and you'll get the symbolic
location for that program counter value. Now take the first of those
addresses, and do "dv <value>", get another pair of numbers, and work
your way backwards, getting PC values and turning them into symbolic
values.
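
Turning a program counter value into a symbolic location amounts to
finding the nearest symbol at or below that address. The Python sketch
below shows the idea for anyone mapping collected PC values by hand
against a symbol listing. It is an assumption about the general
technique, not a description of how the kernel debugger's "=" command
is implemented, and the symbol table is invented.

```python
import bisect

def symbolize(symbols, pc):
    """Return "name+0xoffset" for the nearest symbol at or below pc.

    symbols is a list of (address, name) pairs sorted by address.
    """
    addrs = [addr for addr, _ in symbols]
    i = bisect.bisect_right(addrs, pc) - 1
    if i < 0:
        return hex(pc)  # below every known symbol; leave it numeric
    addr, name = symbols[i]
    return "%s+0x%x" % (name, pc - addr)

# Invented symbol table, for illustration only.
symbols = [(0x1000, "start"), (0x1100, "main"), (0x1200, "helper")]
print(symbolize(symbols, 0x1234))  # prints helper+0x34
```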

At your option, either send the resulting backtrace directly to
vandys@vsta.org, or post it to the appropriate VSTa Wiki discussion
forum. There used to be a mailing list, but spam has made that a thing
of the past.