Re: [Xen-devel] [PATCH] xen: only clobber multicall elements without error
>>> On 26.11.18 at 15:23, <jgross@xxxxxxxx> wrote:
> On 26/11/2018 15:01, Jan Beulich wrote:
>>>>> On 26.11.18 at 14:52, <jgross@xxxxxxxx> wrote:
>>> I don't think the hypervisor should explicitly try to make it as hard as
>>> possible for the guest to find problems in the code.
>>
>> That's indeed not the hypervisor's goal. Instead it tries to make
>> it as hard as possible for the guest (developer) to make wrong
>> assumptions.
>
> Let's look at the current example why I wrote this patch:
>
> The Linux kernel's use of multicalls should never trigger any single
> call to return an error (return value < 0). A kernel compiled for
> productive use will catch such errors, but has no knowledge which
> single call has failed, as it doesn't keep track of the single entries
> (non-productive kernels have an option available in the respective
> source to copy the entries before doing the multicall in order to have
> some diagnostic data available in case of such an error). Catching an
> error from a multicall right now means a WARN() with a stack backtrace
> (for the multicall itself, not for the entry causing the error).
>
> I have a customer report for a case where such a backtrace was produced
> and a kernel crash some seconds later, obviously due to illegally
> unmapped memory pages resulting from the failed multicall. Unfortunately
> there are multiple possibilities what might have gone wrong and I don't
> know which one was the culprit. The problem can't be a very common one,
> because there is only one such report right now, which might depend on
> a special driver.
>
> Finding this bug without a known reproducer and the current amount of
> diagnostic data is next to impossible. So I'd like to have more data
> available without having to hurt performance for the 99.999999% of the
> cases where nothing bad happens.
>
> In case you have an idea how to solve this problem in another way I'd be
> happy to follow that route. I'd really like to be able to have a better
> clue in case such an error occurs in future.

Since you have a production kernel, I assume you also have a
production hypervisor. This hypervisor doesn't clobber the
arguments if I'm not mistaken. Therefore
- in the debugging scenario you (can) have all data available
  by virtue of the information getting copied in the kernel,
- in the release scenario you have all data available since it's
  left un-clobbered.
Am I missing anything (I don't view mixed debug/release setups
of kernel and hypervisor as overly important here)?

Jan
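
To make the two scenarios discussed above concrete, here is a minimal, self-contained sketch in C. It is not the Linux or Xen code: struct mc_entry only loosely mirrors a multicall entry (op, result, arguments), and fake_multicall(), issue_and_check() and MAX_BATCH are hypothetical names invented for this illustration. The "debug" path snapshots the entries before the call, so the arguments survive clobbering. The "release" path reads the arguments in place, which only works if they were left un-clobbered.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_BATCH 32  /* hypothetical batch limit for this sketch */

/* Loosely modeled on a multicall entry: operation, result, arguments. */
struct mc_entry {
    unsigned long op;
    long result;
    unsigned long args[6];
};

/*
 * Stand-in for the hypercall (NOT the real interface). Every entry whose
 * op is 0 fails with -22; all others succeed. When 'clobber' is set, the
 * arguments are overwritten, as a debug hypervisor might do.
 */
static void fake_multicall(struct mc_entry *e, unsigned int n, int clobber)
{
    for (unsigned int i = 0; i < n; i++) {
        e[i].result = (e[i].op == 0) ? -22 : 0;
        if (clobber)
            memset(e[i].args, 0xAA, sizeof(e[i].args));
    }
}

/*
 * Issue a batch and return the index of the first failing entry, or -1.
 * If 'saved' is non-NULL the entries are copied there first (debug-kernel
 * scenario); otherwise the in-place arguments are reported (release
 * scenario, relying on the arguments not having been clobbered).
 */
static int issue_and_check(struct mc_entry *e, unsigned int n,
                           struct mc_entry *saved, int clobber)
{
    assert(n <= MAX_BATCH);

    if (saved)
        memcpy(saved, e, n * sizeof(*e));

    fake_multicall(e, n, clobber);

    for (unsigned int i = 0; i < n; i++) {
        if (e[i].result < 0) {
            const struct mc_entry *src = saved ? &saved[i] : &e[i];

            fprintf(stderr,
                    "multicall entry %u failed: op=%lu result=%ld arg0=%#lx\n",
                    i, src->op, e[i].result, src->args[0]);
            return (int)i;
        }
    }
    return -1;
}
```

A caller in the debug scenario would pass a scratch array as 'saved'. In the release scenario it would pass NULL and pay no copying cost on the common path where nothing fails.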