
[Xen-devel] [Doc] writeup for error handling usage in XEN

Hi, all
Over the past few days, we have spent some effort reviewing the handling of
severe errors (panic, BUG_ON, BUG, ASSERT) in XEN. We had several rounds of
internal discussion as well as several mail threads with Keir. Below is a
writeup of those discussions.

If it is agreed after review, we would like to place it in the XEN document
folder or on the XEN wiki, since we think it might be helpful to developers.

Thanks a lot for your help!

We found that error handling [Panic/BUG_ON/ASSERT/BUG] greatly impacts VM
running/service time, so we investigated its usage in current XEN and
discussed it with Keir. The following writeup records the results. It might
be useful to anyone interested in XEN's error handling.

[Current error handlers in XEN]
We have five error handlers in XEN.
1) domain_crash
2) panic
3) BUG_ON
4) ASSERT
5) BUG
domain_crash only impacts the crashed domain, while the other four handlers
cause the whole system/machine to halt or reboot.
Panic/BUG_ON/ASSERT/BUG have slight differences:
1) ASSERT only takes effect when DEBUG=y, while the other three handlers take
   effect regardless of whether DEBUG=y is set.
2) panic will halt or restart the machine, depending on the boot options.
3) BUG prints more diagnostic information than panic.
4) BUG_ON(cond) is simply BUG guarded by an "if (cond)".
We can see that panic, BUG and BUG_ON actually have similar functions.
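To make the differences above concrete, here is a minimal user-space sketch of
how the five handlers relate. The MODEL_* macros, model_halt() and
model_domain_crash() are hypothetical stand-ins, not XEN's actual definitions:
where XEN would halt or reboot the machine, the sketch only counts the event so
it can run as an ordinary program, and a DEBUG macro plays the role of DEBUG=y.

```c
#include <stdio.h>

/*
 * Hypothetical model of XEN's error handlers (names are illustrative only).
 * In XEN, panic/BUG/BUG_ON/ASSERT stop the whole machine; here model_halt()
 * just records the event so the sketch can run as a normal program.
 */
static int machine_halts;            /* whole-machine stops requested */
static int last_crashed_domid = -1;  /* last domain killed by domain_crash */

static void model_halt(const char *what, const char *file, int line)
{
    if (file)
        fprintf(stderr, "%s at %s:%d\n", what, file, line);
    else
        fprintf(stderr, "panic: %s\n", what);
    machine_halts++;
}

/* panic: print a message and stop the machine (halt or reboot in XEN). */
#define MODEL_PANIC(msg)   model_halt((msg), NULL, 0)

/* BUG: stop the machine and report where it happened. */
#define MODEL_BUG()        model_halt("BUG", __FILE__, __LINE__)

/* BUG_ON: BUG() guarded by a condition; checked in every build. */
#define MODEL_BUG_ON(cond) do { if (cond) MODEL_BUG(); } while (0)

/* ASSERT: checked only in debug builds. Otherwise the predicate is still
 * type-checked but never evaluated. */
#ifdef DEBUG
#define MODEL_ASSERT(p) \
    do { if (!(p)) model_halt("Assertion '" #p "' failed", __FILE__, __LINE__); } while (0)
#else
#define MODEL_ASSERT(p)    do { if (0 && (p)) {} } while (0)
#endif

/* domain_crash: only the offending domain is killed; the host keeps running. */
static void model_domain_crash(int domid)
{
    last_crashed_domid = domid;
}
```

Built without -DDEBUG, MODEL_ASSERT(0) does nothing, MODEL_BUG_ON(1) and
MODEL_PANIC() each count one machine halt, and model_domain_crash() never
touches machine_halts.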

[Error handler usage guideline]
1) domain_crash vs. BUG_ON?
   a) We should keep the bug's severity/scope in mind. If the bug only affects
      one domain, use domain_crash to kill that domain instead of panicking
      the whole machine.
   b) When an error impacts the HV's overall consistency, even if it only
      affects one domain, we prefer to use BUG_ON instead. Using BUG_ON
      helps the different linked software modules stay aware of the HV's
      consistency constraints. Below is an illustrative example we discussed
      with Keir, from i8254.c/hvm.c (xen/arch/x86/hvm):
      BUG_ON(bytes != 1);
      We want to make sure the handler for a single-byte I/O port is never
      invoked by a multi-byte I/O port access. Although such an illegal
      access is not that serious, it still breaks the HV's consistency
      constraints, so we chose BUG_ON.
2) How to choose between ASSERT and panic/BUG_ON/BUG?
   a) BUG_ON is checked in every build, so it collects more error reports and
      saves debugging effort. ASSERT is preferred when a BUG_ON check would
      cause too much overhead in a non-debug build.
   b) For consistency and simplicity, BUG_ON should be used instead of
      panic/BUG, as they all have similar behavior.
3) Be cautious when deciding to use BUG_ON, and add explanatory comments where
   possible. Use it only when a severe error occurs or the HV's consistency
   constraints are broken.
4) Don't use BUG_ON to check for expected BIOS issues/settings such as an
   invalid ACPI table. We can turn off the affected feature in the VMM
   instead. For example, if the VT-d table in the BIOS is incorrect, disable
   VT-d in the VMM instead of using BUG_ON.

[Current Status]
We searched for [Panic/BUG_ON/ASSERT/BUG] occurrences in the XEN code
(cs 18498) and agreed that the current usage is basically reasonable. Keir
also mentioned that when checking in code, he tries to make sure that their
usage is justified. As Keir put it, XEN is an inter-linked set of software
modules, and BUG_ON/ASSERT describe and check some of the more subtle
interface constraints between them. These error handlers save us tremendous
debugging effort.

Attachment: panic.txt
Description: panic.txt

Xen-devel mailing list


