> I took your advice and tried to stop the second invocation of wait_event()
> by using the method suggested by Andres which is to merge the recursive
> faults.
>
> <quote>
> 1. read last unconsumed
> 2. compare
> 3. different? write into ring
> 4. same? atomic_read consumer index to ensure still unconsumed
> 5. consumed? write into ring <still a possible race below, but not a tragedy>
> 6. not consumed? drop
> <end quote>
Whether that's safe/correct/... I have to leave to Andres; it doesn't
feel right to me to do things like that.
I'm neither thrilled nor disgusted. If you want to keep track of the first time something happens, this is perfectly fine.
If you want to single-step ... well, that goes to the point Jan raises below. You can't single-step the hypervisor. I guess the right question to ask is "What would GDB do?". Imagine you are tracing a user-space process that uses shared memory,
and a separate process starts changing that shared memory.
I think that spells the answer.
We have the luxury of defining this interface, and the responsibility to keep it as stable as possible. I think that for extending to PV we should either (i) live with this event merge and not pretend we can even remotely single-step hypervisor
accesses to shared memory -- but keep in mind that the potential for ring exhaustion exists; or (ii) discard accesses by Xen to shared memory from the set of trackable operations altogether.
Aravindh, is there a reason not to choose (ii)?
> +bool_t mem_event_check_duplicate(struct mem_event_domain *med,
> + mem_event_request_t *req)
> +{
> + mem_event_front_ring_t *front_ring;
> + mem_event_request_t *ureq;
> + RING_IDX req_event;
> + bool_t rc = 0;
> +
> + mem_event_ring_lock(med);
> +
> + /* Due to the reservations, this step must succeed. */
> + front_ring = &med->front_ring;
> +
> + /* Index of last unconsumed request */
> + req_event = front_ring->sring->req_event - 1;
> + ureq = RING_GET_REQUEST(front_ring, req_event);
> +
> + if ( req->gfn != ureq->gfn )
> + goto out;
> + if ( current->vcpu_id != ureq->vcpu_id )
> + goto out;
> + if ( req->access_r != ureq->access_r )
> + goto out;
> + if ( req->access_w != ureq->access_w )
> + goto out;
> + if ( req->access_x != ureq->access_x )
> + goto out;
> +
> + /* Check consumer index has not moved */
> + if ( req_event == read_atomic(&front_ring->sring->req_event) - 1 )
> + rc = 1;
This should all be one big if() imo.
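(For illustration only: one way the chained checks might collapse into a single condition, as an untested sketch reusing the variables from the quoted patch -- not code taken from the actual series:)

  if ( req->gfn == ureq->gfn &&
       current->vcpu_id == ureq->vcpu_id &&
       req->access_r == ureq->access_r &&
       req->access_w == ureq->access_w &&
       req->access_x == ureq->access_x &&
       /* consumer index has not moved */
       req_event == read_atomic(&front_ring->sring->req_event) - 1 )
      rc = 1;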
> With this I am able to prevent recursive identical faults in the runstate
> area causing the assert to fire. However there still exists a corner case. I
> still do not know what to do about the situation when the ring is full, the
> last unconsumed event was not for the runstate area, the runstate area is
> marked not writable and the guest or xen writes to it. I _think_ this would
> again cause the assert to fire. Can you give me some guidance as to what to
> do in this situation? Should I corner case the runstate area?
No, that wouldn't help, as that's not the only thing Xen writes to in
guest memory.
And no, I have no good idea about how to deal with the situation. I
continue to think that this is the wrong overall approach though, due
to the resultant inconsistency with HVM (where Xen writes are being
ignored), i.e. I believe you ought instead to think about making these
not fault, or about dealing with the faults gracefully without needing
to emulate the instructions. One possibility I can see would be to clear
CR0.WP in the fault handler (and set a flag that this was done),
restoring it on the path back out of the guest memory access function(s).
But that of course implies that the situation can only ever arise for
writes. And it would need to be done extremely carefully to make
sure you don't inadvertently allow writes to regions that are truly
write protected.
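(Purely to illustrate that idea, a rough sketch of the two halves -- the helper name, the per-vCPU flag and where it lives are made up here, not taken from Xen, and none of this is audited code:)

  /* In the PV #PF path: if this is a Xen-initiated write into guest
   * memory that only faulted because of the mem_event protection
   * (hypothetical check), drop CR0.WP and remember that we did so. */
  if ( fault_is_xen_write_to_guest_memory(regs) )
  {
      curr->arch.pv_wp_cleared = 1;             /* flag for the restore path */
      write_cr0(read_cr0() & ~X86_CR0_WP);      /* let the write through */
  }

  /* ... and on the way back out of the guest memory access function(s): */
  if ( curr->arch.pv_wp_cleared )
  {
      write_cr0(read_cr0() | X86_CR0_WP);       /* re-enable supervisor WP */
      curr->arch.pv_wp_cleared = 0;
  }

As Jan notes, this only ever covers writes, and the clear/restore would have to be done very carefully so that regions which are genuinely write protected stay protected.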
As per above, we could forsake this and choose (ii).
> PS: I did try running with a hacked up version where the max ring buffer
> entries (nr_ents) is 1 and I could not make this situation happen but I guess
> it is still theoretically possible. Or am I mistaken?
No, you aren't - if the ring doesn't get consumed from, it will
unavoidably become full at some point.
Yes, the fundamental problem (with PV) remains if we choose (i).
Andres
I am fine with going with option (ii) and keeping things consistent with HVM.
Jan, I will resubmit this patch with the other changes you and Tim asked for.
Thanks,
Aravindh