[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH V4 15/24] xen/arm: Stick around in leave_hypervisor_to_guest until I/O has completed



Hi Oleksandr,

On 17/01/2021 20:23, Oleksandr wrote:

On 15.01.21 22:55, Julien Grall wrote:
So we need
to check if the I/O has completed and wait again if it hasn't (we will
block the vCPU again until an event is received). This loop makes sure
that all the vCPU works are done before we return to the guest.

The call chain below:
check_for_vcpu_work -> vcpu_ioreq_handle_completion -> wait_for_io ->
wait_on_xen_event_channel

The worse that can happen here if the vCPU will never run again
(the I/O will never complete). But, in Xen case, if the I/O never
completes then it most likely means that something went horribly
wrong with the Device Emulator. And it is most likely not safe
to continue. So letting the vCPU to spin forever if the I/O never
completes is a safer action than letting it continue and leaving
the guest in unclear state and is the best what we can do for now.

Please note, using this loop we will not spin forever on a pCPU,
preventing any other vCPUs from being scheduled. At every loop
we will call check_for_pcpu_work() that will process pending
softirqs. In case of failure, the guest will crash and the vCPU
will be unscheduled. In normal case, if the rescheduling is necessary
(might be set by a timer or by a caller in check_for_vcpu_work(),
where wait_for_io() is a preemption point) the vCPU will be rescheduled
to give place to someone else.

What you describe here is a bug that was introduced by this series. If you think the code requires a separate patch, then please split off patch #14 so the code callling vcpu_ioreq_handle_completion() happen here.
I am afraid, I don't understand which bug you are talking about, I just tried to explain why using a loop is not bad (there wouldn't be any impact to other vCPUs, etc) and the worse case which could happen. Also I don't see a reason why the code requires a separate patch (probably, if I understood a bug I would see a reason ...) Could you please clarify?

Your commit message begins with:

"This patch adds proper handling of return value of
vcpu_ioreq_handle_completion() which involves using a loop in
leave_hypervisor_to_guest()."

I read this as "there was a bug in the code base and we are now fixing it". AFAICT, this patch would not be necessary if we don't apply patch #14 in Xen (assuming the rest of IOREQ is complete).

Therefore you are fixing a bug that you introduced in the same series.

It is considered as a bad practice because it means
  1) we have to review code that is known "buggy" (patch #14).
  2) adds more churn in the series than necessary

Instead, it would be better to split your changes in
check_for_vcpu_work() from patch #14 and add them here.

[...]

So I would rework the loop to write it as:

while ( check_for_pcpu_work() )
   check_for_pcpu_work();
check_for_pcpu_work();

makes sense, I assume you meant while ( check_for_vcpu_work() ) ...

Yes. Sorry for the typo.

Cheers,

--
Julien Grall



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.