[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v8 4/7] VT-d: Use one function to update both remapped and posted IRTE



>>> On 22.02.17 at 02:53, <chao.gao@xxxxxxxxx> wrote:
> On Tue, Nov 22, 2016 at 05:58:56PM +0800, Jan Beulich wrote:
>>>>> On 18.11.16 at 02:57, <feng.wu@xxxxxxxxx> wrote:
>>>      else
>>> -        new_ire.remap.dst = ((msg->address_lo >> MSI_ADDR_DEST_ID_SHIFT)
>>> -                             & 0xff) << 8;
>>> +    {
>>> +        new_ire.post.fpd = 0;
>>> +        new_ire.post.res_1 = 0;
>>> +        new_ire.post.res_2 = 0;
>>> +        new_ire.post.urg = 0;
>>> +        new_ire.post.im = 1;
>>> +        new_ire.post.vector = gvec;
>>> +        new_ire.post.res_3 = 0;
>>> +        new_ire.post.res_4 = 0;
>>> +        new_ire.post.res_5 = 0;
>>> +        new_ire.post.pda_l = virt_to_maddr(pi_desc) >> (32 - PDA_LOW_BIT);
>>> +        new_ire.post.pda_h = virt_to_maddr(pi_desc) >> 32;
>>> +        new_ire.post.p = 1;    /* finally, set present bit */
>>> +    }
>>
>>Along the lines of the comment above, you don't fill avail here. Why?
>>
>>Taking both together, I don't see why - after adding said initialization -
>>new_ire needs to start out as a copy of the live IRTE. Instead you
>>could memset() the whole structure to zero, or simply give it an empty
>>initializer (saving you from initializing all the reserved fields, plus some
>>more).
>>
> 
> Yes, it is really confused. Maybe because this field is available to software 
> in posting
> format IRTE. We can reserve the avail field through introducing a flag to
> distinguish initialization from update. We also can clear the avail field 
> every
> time if we don't use it right now, leaving the problem to others who want use
> the avail field.  Do you think which is better?

As its unused, clear it every time. We simply don't know how a
potential use would want it set during update.

>>> @@ -637,7 +657,23 @@ static int msi_msg_to_remap_entry(
>>>      remap_rte->address_hi = 0;
>>>      remap_rte->data = index - i;
>>>
>>> -    memcpy(iremap_entry, &new_ire, sizeof(struct iremap_entry));
>>> +    if ( !pi_desc )
>>> +        memcpy(iremap_entry, &new_ire, sizeof(struct iremap_entry));
>>> +    else
>>> +    {
>>> +        __uint128_t ret;
>>> +
>>> +        old_ire = *iremap_entry;
>>> +        ret = cmpxchg16b(iremap_entry, &old_ire, &new_ire);
>>> +
>>> +        /*
>>> +         * In the above, we use cmpxchg16 to atomically update the 128-bit 
>>> IRTE,
>>> +         * and the hardware cannot update the IRTE behind us, so the 
>>> return value
>>> +         * of cmpxchg16 should be the same as old_ire. This ASSERT 
>>> validate it.
>>> +         */
>>> +        ASSERT(ret == old_ire.val);
>>> +    }
>>
>>Could you remind me again please why posted format updates need
>>to use cmpxchg16b, but remapped format ones don't? (As a aside,
>>with the code structure as you have it you should move the old_irte
>>declaration here, or omit it altogether, as you could as well pass
>>*iremap_entry directly afaict.)
> 
> Before feng left, I have asked him about this question. He told me that
> the PDA field of posted format IRTE comprises of two parts:
> Posted Descritor Address High[127:96] and Low [63:38]. If we want to update
> PDA field, do it atomically or disable-update-enable. He also said, it had
> been confirmed that cmpxchg16b was supported on all intel platform with VT-d 
> PI.
> If we assume that updates to remapped format IRTE only is to update either
> 64 bit or high 64 bit (except initialition), two 64bit memory write operations
> is enough. 

Two 64-bit memory write operations? Where do you see them? I
only see memcpy(), which for the purposes of the code here is
supposed to be a black box.

>>> @@ -668,7 +704,8 @@ int msi_msg_write_remap_rte(
>>>
>>>      drhd = pdev ? acpi_find_matched_drhd_unit(pdev)
>>>                  : hpet_to_drhd(msi_desc->hpet_id);
>>> -    return drhd ? msi_msg_to_remap_entry(drhd->iommu, pdev, msi_desc, msg)
>>> +    return drhd ? msi_msg_to_remap_entry(drhd->iommu, pdev,
>>> +                                         msi_desc, msg, NULL, 0)
>>
>>Is this unconditional passing of NULL here really correct?
> 
> Since two parameters are added to this function, we should think about what
> the function does again. the last 2 parameters are optional.
> 
> If they are not present, just means a physical device driver changes its msi
> message. So it notifies iommu to do some changes to IRTE accordingly (the 
> driver doesn't
> know the format of the live IRTE). This is the case above.
> 
> If they are present, it means the msi should be delivered to the vcpu with the
> vector num. To achieve that, the function replaces the old IRTE with a new
> posted format IRTE.

I don't see how this answers my question. In fact it feels like you,
just like Feng, are making assumptions on the conditions under
which the function here is being called _at present_. You should,
however, make the function work correctly for all possible uses,
or add ASSERT()s to clearly expose issues with potential new,
future callers.

>>> @@ -992,35 +986,11 @@ int pi_update_irte(const struct vcpu *v, const struct 
>>> pirq *pirq,
>>>          return -ENODEV;
>>>
>>>      iommu = drhd->iommu;
>>> -    ir_ctrl = iommu_ir_ctrl(iommu);
>>> -    if ( !ir_ctrl )
>>> +    if ( !iommu_ir_ctrl(iommu) )
>>>          return -ENODEV;
>>>
>>> -    spin_lock_irq(&ir_ctrl->iremap_lock);
>>> -
>>> -    GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, remap_index, iremap_entries, 
>>> p);
>>> -
>>> -    old_ire = *p;
>>> -
>>> -    /* Setup/Update interrupt remapping table entry. */
>>> -    setup_posted_irte(&new_ire, &old_ire, pi_desc, gvec);
>>> -    ret = cmpxchg16b(p, &old_ire, &new_ire);
>>> -
>>> -    /*
>>> -     * In the above, we use cmpxchg16 to atomically update the 128-bit 
>>> IRTE,
>>> -     * and the hardware cannot update the IRTE behind us, so the return 
>>> value
>>> -     * of cmpxchg16 should be the same as old_ire. This ASSERT validate it.
>>> -     */
>>> -    ASSERT(ret == old_ire.val);
>>> -
>>> -    iommu_flush_cache_entry(p, sizeof(*p));
>>> -    iommu_flush_iec_index(iommu, 0, remap_index);
>>> -
>>> -    unmap_vtd_domain_page(iremap_entries);
>>> -
>>> -    spin_unlock_irq(&ir_ctrl->iremap_lock);
>>> -
>>> -    return 0;
>>> +    return msi_msg_to_remap_entry(iommu, pci_dev, msi_desc, &msi_desc->msg,
>>> +                                  pi_desc, gvec);
>>
>>There are few changes here which appear to have the potential of
>>affecting behavior: Previously you didn't alter msi_desc or the MSI
>>message contained therein (as documented by the pointer having
>>been const). Is this possible updating of message and remap index
>>really benign? In any event any such changes should be reasoned
>>about in the commit message.
> 
> I agree that we can't update message and remap index in this pi_update_irte.
> but msi_msg_to_remap_entry will change msi_desc when msi_desc->remap_index < 
> 0.
> How about splitting part of msi_msg_to_remap_entry to a new function which 
> consumes a const
> msi_desc parameter and pi_update_irte will call the new function?

Well, I can't easily say yes or no here without seeing what the
result would be. Give it a try, and we'll look at the result in v9.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.