[Background]

We found that error handling [Panic/BUG_ON/ASSERT/BUG] greatly impacts VM running/service time, so we investigated how it is used in current XEN and discussed it with Keir. This write-up records the findings. It may be useful to anyone interested in XEN's error handling.

[Current error handlers in XEN]

We have five error handlers in XEN:
1) domain_crash
2) panic
3) BUG_ON
4) ASSERT
5) BUG

domain_crash only impacts the crashed domain, while the other four handlers halt or reboot the whole system/machine.

Panic/BUG_ON/ASSERT/BUG differ slightly:
1) ASSERT only takes effect when DEBUG=y, while the other three handlers take effect even if DEBUG=y is not used.
2) panic halts or restarts the machine depending on the boot option.
3) BUG prints more information than panic.
4) BUG_ON is the "if" version of BUG.

So panic, BUG and BUG_ON have similar functions.

[Error handler usage guideline]

1) domain_crash vs. BUG_ON?
a) Keep the bug's severity/scope in mind. If the bug only affects one domain, use domain_crash to kill that domain instead of panicking the whole machine.
b) When an error impacts the HV's overall consistency, even if it only impacts one domain, we prefer BUG_ON. Using [Panic/BUG_ON/ASSERT/BUG] helps the different linked software modules be aware of the HV's consistency constraints. Below is an illustrative example we discussed with Keir:
I8254.c/hvm.c (c:\upstream\xen\xen\arch\x86\hvm): BUG_ON(bytes != 1);
We want to make sure the handler for a single I/O port is never reached by a multi-byte I/O port access. Although the illegal access is not that fatal, it still violates the HV's consistency constraints, so we choose BUG_ON.

2) How to choose between ASSERT and Panic/BUG_ON/BUG?
a) To collect more error reports and save debugging effort, ASSERT is preferred when BUG_ON would cause too much overhead in a non-debug build.
b) For consistency and simplicity, BUG_ON should be used instead of panic/BUG, as they all behave similarly.

3) Be cautious when deciding to use BUG_ON, and add the necessary comments where possible. Use it only for severe errors or when the HV's consistency constraints are broken.

4) Don't use BUG_ON to check for expected BIOS issues/settings such as an invalid ACPI table. We can turn off the affected features in the VMM instead. For example, if the VT-d table in the BIOS is incorrect, disable VT-d in the VMM instead of using BUG_ON.

[Current Status]

We searched for [Panic/BUG_ON/ASSERT/BUG] occurrences in the XEN code (cs 18498) and agreed that the current usage is basically reasonable. Keir also mentioned that, when checking in changes, he tries to make sure each use is justified. As Keir put it, XEN is an inter-linked set of software modules, and BUG_ON/ASSERT give an explicit description and check of some of the more subtle interface constraints between them. These error handlers save us tremendous debugging effort.
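
[Illustrative sketch]

The usage guidelines above can be sketched as a small user-space C model. This is only an illustration under stated assumptions, not XEN's actual code: struct domain is reduced to two fields, domain_crash, BUG_ON and ASSERT are simplified stand-ins for the real handlers, and handle_guest_request, single_port_read and vtd_init are hypothetical names invented for this example. ASSERT is modelled with the standard assert(), which is compiled out when NDEBUG is defined, roughly mirroring a build without DEBUG=y.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, much-reduced stand-in for a guest domain. */
struct domain {
    int id;
    bool crashed;
};

/* Stand-in for domain_crash: only the offending domain is stopped. */
static void domain_crash(struct domain *d)
{
    fprintf(stderr, "domain %d crashed\n", d->id);
    d->crashed = true;
}

/* Stand-in for BUG_ON: always compiled in, and stops everything. */
#define BUG_ON(cond)                                               \
    do {                                                           \
        if (cond) {                                                \
            fprintf(stderr, "BUG at %s:%d\n", __FILE__, __LINE__); \
            abort();                                               \
        }                                                          \
    } while (0)

/* Stand-in for ASSERT: active only in debug builds (NDEBUG not defined). */
#define ASSERT(cond) assert(cond)

/* Guideline 1a: an error confined to one domain kills only that domain. */
static int handle_guest_request(struct domain *d, int request)
{
    ASSERT(d != NULL);      /* internal check, debug builds only */
    if (request < 0) {      /* malformed request affects this guest only */
        domain_crash(d);
        return -1;
    }
    return 0;
}

/* Guideline 1b: a broken consistency constraint warrants BUG_ON. */
static int single_port_read(unsigned int bytes)
{
    /* A single I/O port handler must never see a multi-byte access. */
    BUG_ON(bytes != 1);
    return 0;
}

/* Guideline 4: a bad BIOS table disables the feature instead of BUG_ON. */
static bool vtd_init(bool table_valid)
{
    if (!table_valid) {
        fprintf(stderr, "VT-d table invalid, disabling VT-d\n");
        return false;       /* feature off, machine keeps running */
    }
    return true;
}
```

In this model a negative request marks only the calling domain as crashed, single_port_read(2) would abort the whole process, and vtd_init(false) simply reports the feature as disabled.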