[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v8 4/7] VT-d: Use one function to update both remapped and posted IRTE



On Tue, Nov 22, 2016 at 05:58:56PM +0800, Jan Beulich wrote:
>>>> On 18.11.16 at 02:57, <feng.wu@xxxxxxxxx> wrote:
>> @@ -597,31 +598,50 @@ static int msi_msg_to_remap_entry(
>
>Considering you basically re-do most of the function, I think there's
>some more adjustment necessary (or at least very desirable) here.
>
>>
>>      memcpy(&new_ire, iremap_entry, sizeof(struct iremap_entry));
>>
>> -    /* Set interrupt remapping table entry */
>> -    new_ire.remap.fpd = 0;
>> -    new_ire.remap.dm = (msg->address_lo >> MSI_ADDR_DESTMODE_SHIFT) & 0x1;
>> -    new_ire.remap.tm = (msg->data >> MSI_DATA_TRIGGER_SHIFT) & 0x1;
>> -    new_ire.remap.dlm = (msg->data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x1;
>> -    /* Hardware require RH = 1 for LPR delivery mode */
>> -    new_ire.remap.rh = (new_ire.remap.dlm == dest_LowestPrio);
>> -    new_ire.remap.avail = 0;
>> -    new_ire.remap.res_1 = 0;
>> -    new_ire.remap.vector = (msg->data >> MSI_DATA_VECTOR_SHIFT) &
>> -                            MSI_DATA_VECTOR_MASK;
>> -    new_ire.remap.res_2 = 0;
>> -    if ( x2apic_enabled )
>> -        new_ire.remap.dst = msg->dest32;
>> +    if ( !pi_desc )
>> +    {
>> +        /* Set interrupt remapping table entry */
>> +        new_ire.remap.fpd = 0;
>> +        new_ire.remap.dm = (msg->address_lo >> MSI_ADDR_DESTMODE_SHIFT) & 
>> 0x1;
>> +        new_ire.remap.tm = (msg->data >> MSI_DATA_TRIGGER_SHIFT) & 0x1;
>
>These two "& 0x1" seem unnecessary, as the fields are 1 bit only
>anyway.
>

Ok, will remove them.

>> +        new_ire.remap.dlm = (msg->data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 
>> 0x1;
>
>This masking, however, has always been puzzling me: dlm is a 3 bit
>field, and the MSI message field is a 3-bit one too. Hence the
>masking should also be dropped here - if the message field is wrong
>(i.e. holding an unsupported value), there's no good reason to try
>to compensate for it here. If at all the function should refuse to do
>the requested translation.
>
>> +        /* Hardware require RH = 1 for LPR delivery mode */
>> +        new_ire.remap.rh = (new_ire.remap.dlm == dest_LowestPrio);
>> +        new_ire.remap.avail = 0;
>> +        new_ire.remap.res_1 = 0;
>> +        new_ire.remap.vector = (msg->data >> MSI_DATA_VECTOR_SHIFT) &
>> +                                MSI_DATA_VECTOR_MASK;
>> +        new_ire.remap.res_2 = 0;
>> +        if ( x2apic_enabled )
>> +            new_ire.remap.dst = msg->dest32;
>> +        else
>> +            new_ire.remap.dst = ((msg->address_lo >> MSI_ADDR_DEST_ID_SHIFT)
>> +                                 & 0xff) << 8;
>> +
>> +        new_ire.remap.res_3 = 0;
>> +        new_ire.remap.res_4 = 0;
>> +        new_ire.remap.p = 1;    /* finally, set present bit */
>
>I don't understand this comment. The order of operations does not
>matter at all, and in fact you now set p before being done with all
>other fields. Please drop such misleading comments, or make them
>useful/correct.

Yes, The order does not matter. I will drop this comment.

>
>Also, the only field you don't explicitly set here is .im. Why?
>
>> +    }
>>      else
>> -        new_ire.remap.dst = ((msg->address_lo >> MSI_ADDR_DEST_ID_SHIFT)
>> -                             & 0xff) << 8;
>> +    {
>> +        new_ire.post.fpd = 0;
>> +        new_ire.post.res_1 = 0;
>> +        new_ire.post.res_2 = 0;
>> +        new_ire.post.urg = 0;
>> +        new_ire.post.im = 1;
>> +        new_ire.post.vector = gvec;
>> +        new_ire.post.res_3 = 0;
>> +        new_ire.post.res_4 = 0;
>> +        new_ire.post.res_5 = 0;
>> +        new_ire.post.pda_l = virt_to_maddr(pi_desc) >> (32 - PDA_LOW_BIT);
>> +        new_ire.post.pda_h = virt_to_maddr(pi_desc) >> 32;
>> +        new_ire.post.p = 1;    /* finally, set present bit */
>> +    }
>
>Along the lines of the comment above, you don't fill avail here. Why?
>
>Taking both together, I don't see why - after adding said initialization -
>new_ire needs to start out as a copy of the live IRTE. Instead you
>could memset() the whole structure to zero, or simply give it an empty
>initializer (saving you from initializing all the reserved fields, plus some
>more).
>

Yes, it is really confused. Maybe because this field is available to software 
in posting
format IRTE. We can reserve the avail field through introducing a flag to
distinguish initialization from update. We also can clear the avail field every
time if we don't use it right now, leaving the problem to others who want use
the avail field.  Do you think which is better?

>And of course, along the lines of ...
>
>>      if ( pdev )
>>          set_msi_source_id(pdev, &new_ire);
>>      else
>>          set_hpet_source_id(msi_desc->hpet_id, &new_ire);
>
>... this, I see little reason for common fields to be initialized separately
>on each path. According to the code above that's only p (leaving
>aside fields which get zeroed), but anyway. Perhaps there should
>even be a common sub-structure of the union ...

I can add a patch in this series to do this.

>
>> @@ -637,7 +657,23 @@ static int msi_msg_to_remap_entry(
>>      remap_rte->address_hi = 0;
>>      remap_rte->data = index - i;
>>
>> -    memcpy(iremap_entry, &new_ire, sizeof(struct iremap_entry));
>> +    if ( !pi_desc )
>> +        memcpy(iremap_entry, &new_ire, sizeof(struct iremap_entry));
>> +    else
>> +    {
>> +        __uint128_t ret;
>> +
>> +        old_ire = *iremap_entry;
>> +        ret = cmpxchg16b(iremap_entry, &old_ire, &new_ire);
>> +
>> +        /*
>> +         * In the above, we use cmpxchg16 to atomically update the 128-bit 
>> IRTE,
>> +         * and the hardware cannot update the IRTE behind us, so the return 
>> value
>> +         * of cmpxchg16 should be the same as old_ire. This ASSERT validate 
>> it.
>> +         */
>> +        ASSERT(ret == old_ire.val);
>> +    }
>
>Could you remind me again please why posted format updates need
>to use cmpxchg16b, but remapped format ones don't? (As a aside,
>with the code structure as you have it you should move the old_irte
>declaration here, or omit it altogether, as you could as well pass
>*iremap_entry directly afaict.)

Before feng left, I have asked him about this question. He told me that
the PDA field of posted format IRTE comprises of two parts:
Posted Descritor Address High[127:96] and Low [63:38]. If we want to update
PDA field, do it atomically or disable-update-enable. He also said, it had
been confirmed that cmpxchg16b was supported on all intel platform with VT-d PI.
If we assume that updates to remapped format IRTE only is to update either
64 bit or high 64 bit (except initialition), two 64bit memory write operations
is enough. 

>
>> @@ -668,7 +704,8 @@ int msi_msg_write_remap_rte(
>>
>>      drhd = pdev ? acpi_find_matched_drhd_unit(pdev)
>>                  : hpet_to_drhd(msi_desc->hpet_id);
>> -    return drhd ? msi_msg_to_remap_entry(drhd->iommu, pdev, msi_desc, msg)
>> +    return drhd ? msi_msg_to_remap_entry(drhd->iommu, pdev,
>> +                                         msi_desc, msg, NULL, 0)
>
>Is this unconditional passing of NULL here really correct?

Since two parameters are added to this function, we should think about what
the function does again. the last 2 parameters are optional.

If they are not present, just means a physical device driver changes its msi
message. So it notifies iommu to do some changes to IRTE accordingly (the 
driver doesn't
know the format of the live IRTE). This is the case above.

If they are present, it means the msi should be delivered to the vcpu with the
vector num. To achieve that, the function replaces the old IRTE with a new
posted format IRTE.

>
>> @@ -946,17 +947,12 @@ int pi_update_irte(const struct vcpu *v, const struct 
>> pirq *pirq,
>>      const uint8_t gvec)
>>  {
>>      struct irq_desc *desc;
>> -    const struct msi_desc *msi_desc;
>> -    int remap_index;
>> +    struct msi_desc *msi_desc;
>>      int rc = 0;
>>      const struct pci_dev *pci_dev;
>>      const struct acpi_drhd_unit *drhd;
>>      struct iommu *iommu;
>> -    struct ir_ctrl *ir_ctrl;
>> -    struct iremap_entry *iremap_entries = NULL, *p = NULL;
>> -    struct iremap_entry new_ire, old_ire;
>>      const struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
>> -    __uint128_t ret;
>>
>>      desc = pirq_spin_lock_irq_desc(pirq, NULL);
>>      if ( !desc )
>> @@ -976,8 +972,6 @@ int pi_update_irte(const struct vcpu *v, const struct 
>> pirq *pirq,
>>          goto unlock_out;
>>      }
>>
>> -    remap_index = msi_desc->remap_index;
>> -
>>      spin_unlock_irq(&desc->lock);
>>
>>      ASSERT(pcidevs_locked());
>> @@ -992,35 +986,11 @@ int pi_update_irte(const struct vcpu *v, const struct 
>> pirq *pirq,
>>          return -ENODEV;
>>
>>      iommu = drhd->iommu;
>> -    ir_ctrl = iommu_ir_ctrl(iommu);
>> -    if ( !ir_ctrl )
>> +    if ( !iommu_ir_ctrl(iommu) )
>>          return -ENODEV;
>>
>> -    spin_lock_irq(&ir_ctrl->iremap_lock);
>> -
>> -    GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, remap_index, iremap_entries, p);
>> -
>> -    old_ire = *p;
>> -
>> -    /* Setup/Update interrupt remapping table entry. */
>> -    setup_posted_irte(&new_ire, &old_ire, pi_desc, gvec);
>> -    ret = cmpxchg16b(p, &old_ire, &new_ire);
>> -
>> -    /*
>> -     * In the above, we use cmpxchg16 to atomically update the 128-bit IRTE,
>> -     * and the hardware cannot update the IRTE behind us, so the return 
>> value
>> -     * of cmpxchg16 should be the same as old_ire. This ASSERT validate it.
>> -     */
>> -    ASSERT(ret == old_ire.val);
>> -
>> -    iommu_flush_cache_entry(p, sizeof(*p));
>> -    iommu_flush_iec_index(iommu, 0, remap_index);
>> -
>> -    unmap_vtd_domain_page(iremap_entries);
>> -
>> -    spin_unlock_irq(&ir_ctrl->iremap_lock);
>> -
>> -    return 0;
>> +    return msi_msg_to_remap_entry(iommu, pci_dev, msi_desc, &msi_desc->msg,
>> +                                  pi_desc, gvec);
>
>There are few changes here which appear to have the potential of
>affecting behavior: Previously you didn't alter msi_desc or the MSI
>message contained therein (as documented by the pointer having
>been const). Is this possible updating of message and remap index
>really benign? In any event any such changes should be reasoned
>about in the commit message.

I agree that we can't update message and remap index in this pi_update_irte.
but msi_msg_to_remap_entry will change msi_desc when msi_desc->remap_index < 0.
How about splitting part of msi_msg_to_remap_entry to a new function which 
consumes a const
msi_desc parameter and pi_update_irte will call the new function?

>
>Jan
>
>_______________________________________________
>Xen-devel mailing list
>Xen-devel@xxxxxxxxxxxxx
>https://lists.xen.org/xen-devel

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.