[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [xen-unstable test] 106504: regressions - FAIL



>>> On 05.04.17 at 01:57, <chao.gao@xxxxxxxxx> wrote:
> On Wed, Mar 22, 2017 at 06:47:33AM -0600, Jan Beulich wrote:
>>>>> On 22.03.17 at 05:53, <chao.gao@xxxxxxxxx> wrote:
>>> I have written a xtf test case (many codes are from hvmloader) to
>>> trigger this assertion. The test case is in attachments.
>>
>>Thanks for doing this.
>>
>>> Bottom is the output
>>> of this test. This test initializes PIT channel0 to generate periodic timer
>>> interrupt at 1000hz per second. The timer interrupt is delivered to vCPU0. 
>>> And
>>> vCPU1 is used to change IOAPIC RTE 2 frequently.
>>
>>Well, this is certainly helpful (due to some of the conclusions you
>>draw below), but it is very likely not what has caused the assertion
>>to trigger in osstest. So by removing the assertion (as you suggest
>>below) we then will have a silent, non-understood misbehavior.
>>
>>> The assertion can be triggered by guest. To fix assertion failure,
>>> I propose to remove this assertion for the reason below:
>>
>>Of course I agree that a guest triggerable assertion is bad, and
>>hence needs a correction somewhere.
>>
>>> 1. Operations in this test case are very intrusive and abnormal. It updates 
>>> RTE frequently without disabling interrupt source. In this case, I think 
>>> software can't assume hardware works correctly.
>>
>>I guess hardware behavior simply is unspecified in such a case, so
>>it's hard to judge whether it works "correctly".
>>
>>> 2. If we remove this assertion(means we admit pt_vector may be different
>>> from (or bigger than) the vector we set in vIRR in a rare case), the side
>>> effect is that we won't decrease the counter pt->ending_intr_nr in
>>> pt_intr_post() and one more timer interrupt in number is injected to guest. 
>>
>>Which is clearly wrong, afaict, as that may drive the guest clock
>>off (depending on how the guest OS does its accounting).
>>
>>> 3. We read RTE 3 times. 1st happens when we set vIRR. 2nd happens when
>>> pt_update_irq() returns. 3rd happens in pt_intr_post(). If guest changes
>>> the vector in RTE during the window, it will also incur losing or getting
>>> more periodic timer interrupt.
>>
>>Which raises the question whether latching the value read the first
>>time would address the issue you demonstrate with the test case.
>>Or alternatively deferring writes to take effect only once readers
>>are done with their perhaps multiple accesses?
> 
> Hi, Jan.
> 
> I plan to do the following changes:
> 1. get the vector set in vIRR to avoid getting a wrong interrupt vector
>  I think there are two appoaches. One is to extend hvm_isa_irq_assert()
>  to return the vector set in vIRR. Several functions in call trees are
>  also involved. The other is to make vIOAPIC support disabling
>  write operations to RTE. In this case, a rwlock_t is introduced to 
>  protect RTE. pt_update_irq() will disable write operations
>  at first, then get the vector and assert the vector, at last enable
>  write operations. Which one do you think is better?

That's hard to tell without seeing the changes each actually involves.
On the surface I'd probably prefer the 2nd, provided the locking can
be got into a shape where there's no meaningful risk of missing an
unlock on some path.

> 2. let pt_update_irq() pass the periodic timer
>  whose interrupt is to be injected to vmx_intr_assit() which
>  in turn can pass it to pt_intr_post(). After this, pt_intr_post()
>  needn't search the periodic timer that matches the interrupt has
>  been injected. Through this, we can avoid reading the RTE there.

If the RTE can't be changed behind your back, why would you
need this?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.