
RE: [Xen-devel] [Doc] writeup for error handling usage in XEN

Thanks for posting this, Criping.  Since you've started this
discussion, I'd like to add a suggestion for future use:

It would be nice if ASSERT could be enabled at runtime rather
than only at compile time.  If there were a global flag
"enable_asserts" that could be set by an option on the Xen
command line in grub, and the ASSERT macro always tested that
flag before testing the assert condition, then additional
debug/checking code could easily be enabled at a very small
runtime cost.  (The global variable would be checked often
enough that it would always be in cache, and since it changes
only once, at boot time, there would be no cache-synchronization
costs.)
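
Something along these lines, as a rough user-space sketch of the idea.
RUNTIME_ASSERT, assert_failed and assert_failures are names made up for
illustration, not existing Xen symbols, and the failure handler only
counts and prints where a hypervisor would stop:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical global flag. In the proposal it would be set once, at boot,
 * from a Xen command line option, and never change afterwards. */
static bool enable_asserts = false;

/* Counts failed assertions so the sketch can be exercised in user space.
 * This is an assumption of the sketch; a hypervisor would halt instead. */
static unsigned int assert_failures = 0;

static void assert_failed(const char *expr, const char *file, int line)
{
    fprintf(stderr, "Assertion '%s' failed at %s:%d\n", expr, file, line);
    assert_failures++;
}

/* The flag is tested first, so while assertions are disabled the cost is
 * one load and one branch, and the condition itself is never evaluated. */
#define RUNTIME_ASSERT(cond)                              \
    do {                                                  \
        if (enable_asserts && !(cond))                    \
            assert_failed(#cond, __FILE__, __LINE__);     \
    } while (0)
```

With enable_asserts left at false, RUNTIME_ASSERT(x) does nothing beyond
the flag test; once the flag is set, a false condition is reported.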

> -----Original Message-----
> From: Ke, Liping [mailto:liping.ke@xxxxxxxxx]
> Sent: Thursday, December 04, 2008 12:32 AM
> To: Keir Fraser
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: [Xen-devel] [Doc] writeup for error handling usage in XEN
> Hi, all
> Recently we spent some effort reviewing severe error
> handling (panic, BUG_ON, BUG, ASSERT) in XEN. We had several
> rounds of internal discussion as well as several mail threads
> with Keir. Below is a writeup of those discussions.
> If it is agreed after review, we would like to place it in the XEN
> document folder or on the XEN wiki, since we think it might be
> helpful to developers.
> Thanks a lot for your help!
> Regards,
> Criping
> [Background]
> We found that error handling [Panic/BUG_ON/ASSERT/BUG] greatly affects
> VM running/service time, so we investigated its usage in current XEN
> and discussed it with Keir. The following writeup records the results.
> It might be useful to those who are interested in XEN's error handling.
> [Current error handlers in XEN]
> We have five error handlers in XEN.
> 1) domain_crash
> 2) panic
> 3) BUG_ON
> 4) ASSERT
> 5) BUG
> domain_crash only affects the crashed domain, while the other four
> handlers halt or reboot the whole system/machine.
> Panic/BUG_ON/ASSERT/BUG have slight differences:
> 1) ASSERT only takes effect when DEBUG=y, while the other three
>    handlers take effect even if DEBUG=y is not used.
> 2) panic halts or restarts the machine, depending on the boot option.
> 3) BUG behaves like panic but prints more information.
> 4) BUG_ON is BUG with an "if" condition added.
> So panic, BUG and BUG_ON actually have similar functions.
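
The relationship between these macros can be sketched in user space as
follows. This is a mock under stated assumptions, not Xen's actual
definitions: mock_bug and bug_count are invented for illustration, and
the mock only prints and counts where the real handlers stop the machine.

```c
#include <stdio.h>

/* Number of times the mock BUG() fired. This counter is an assumption of
 * the sketch so that it can run in user space without halting. */
static int bug_count = 0;

static void mock_bug(const char *file, int line)
{
    fprintf(stderr, "BUG at %s:%d\n", file, line);
    bug_count++;
}

#define BUG() mock_bug(__FILE__, __LINE__)

/* BUG_ON is BUG with the "if" folded in: it fires when cond is TRUE. */
#define BUG_ON(cond) do { if (cond) BUG(); } while (0)

/* ASSERT has the opposite sense (it fires when cond is FALSE) and is
 * only active when the build defines DEBUG. */
#ifdef DEBUG
#define ASSERT(cond) do { if (!(cond)) BUG(); } while (0)
#else
#define ASSERT(cond) do { } while (0)
#endif
```

Note the inverted sense: BUG_ON(bytes != 1) and ASSERT(bytes == 1)
express the same constraint, but only the former is checked in a
non-debug build.
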
> [Error handler usage guidelines]
> 1) domain_crash vs. BUG_ON?
>    a) Keep the bug's severity/scope in mind. If the bug only affects
>       one domain, use domain_crash to kill that domain instead of
>       panicking the whole machine.
>    b) When an error affects the HV's overall consistency, even if it
>       only impacts one domain, we prefer BUG_ON instead. Using BUG_ON
>       helps the different linked software modules stay aware of the
>       HV's consistency constraints. Below is an illustrative example
>       that we discussed with Keir: I8254.c/hvm.c
>       (c:\upstream\xen\xen\arch\x86\hvm):
>       BUG_ON(bytes != 1);
>       We want to make sure that the handler for a single I/O port is
>       never invoked for a multi-byte I/O port access. Although the
>       illegal access is not that fatal, it still violates the HV's
>       consistency constraints, so we choose BUG_ON.
> 2) How to choose between ASSERT and Panic/BUG_ON/BUG?
>    a) To collect more error reports and save debugging effort, ASSERT
>       is preferred when BUG_ON would cause too much overhead in a
>       non-debug build.
>    b) For consistency and simplicity, BUG_ON should be used instead of
>       panic/BUG, as they all behave similarly.
> 3) Be cautious when deciding to use BUG_ON, and add the necessary
>    comments if possible. Use it only for a severe error or when the
>    HV's consistency constraints are broken.
> 4) Don't use BUG_ON to check for expected BIOS issues/settings such as
>    an invalid ACPI table. We can turn off the affected features in the
>    VMM instead. For example, if the VT-d table in the BIOS is
>    incorrect, disable VT-d in the VMM instead of using BUG_ON.
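
Guidelines 1 and 4 above might look like this in code. It is only a
user-space mock: struct mock_domain, mock_domain_crash, MOCK_BUG_ON,
handle_guest_request, mock_port_read and mock_vtd_init are all
hypothetical names, and none of this is taken from the Xen tree.

```c
#include <stdbool.h>
#include <stdio.h>

/* All names below are hypothetical mocks, not Xen's real definitions. */
struct mock_domain {
    int  id;
    bool crashed;
};

static int  hv_bug_count = 0;    /* a real BUG_ON would stop the machine */
static bool vtd_enabled  = true; /* feature switch owned by the VMM */

#define MOCK_BUG_ON(cond) do { if (cond) hv_bug_count++; } while (0)

static void mock_domain_crash(struct mock_domain *d)
{
    fprintf(stderr, "crashing domain %d\n", d->id);
    d->crashed = true;
}

/* Guideline 1a: an error confined to one domain kills only that domain. */
static int handle_guest_request(struct mock_domain *d, int request)
{
    if (request < 0) {
        mock_domain_crash(d);
        return -1;
    }
    return 0;
}

/* Guideline 1b: a single-byte port handler must never see a multi-byte
 * access. A violation means an internal constraint is broken, so it is
 * flagged with BUG_ON rather than blamed on the domain. */
static unsigned int mock_port_read(unsigned int bytes)
{
    MOCK_BUG_ON(bytes != 1);
    return 0xffu;
}

/* Guideline 4: a bad firmware table is an expected condition, so the
 * feature is switched off and boot continues. */
static void mock_vtd_init(bool table_valid)
{
    if (!table_valid) {
        fprintf(stderr, "invalid firmware table, disabling feature\n");
        vtd_enabled = false;
        return;
    }
    /* normal setup would go here */
}
```

The point of the split is who is at fault: bad guest input ends the
guest, a broken internal invariant stops the hypervisor, and bad
firmware data only costs the feature that depends on it.
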
> [Current Status]
> We searched for [Panic/BUG_ON/ASSERT/BUG] occurrences in the XEN code
> (cs 18498) and agreed that current usage is basically reasonable. Keir
> also mentioned that when checking in changes, he tries to make sure
> that such usage is justified. As Keir put it, XEN is an inter-linked
> set of software modules, and BUG_ON/ASSERT give an explicit
> description and check of some of the more subtle interface constraints
> between them. Those error handlers save us tremendous debugging effort.
