
Re: [Xen-devel] [PATCH 2/2] IOMMU/MMU: Adjust low level functions for VT-d Device-TLB flush error.



>>> On 25.03.16 at 10:27, <quan.xu@xxxxxxxxx> wrote:
> On March 18, 2016 6:20pm, <JBeulich@xxxxxxxx> wrote:
>> >>> On 17.03.16 at 07:54, <quan.xu@xxxxxxxxx> wrote:
>> > --- a/xen/drivers/passthrough/iommu.c
>> > +++ b/xen/drivers/passthrough/iommu.c
>> > @@ -182,7 +182,11 @@ void __hwdom_init iommu_hwdom_init(struct domain *d)
>> >                   ((page->u.inuse.type_info & PGT_type_mask)
>> >                    == PGT_writable_page) )
>> >                  mapping |= IOMMUF_writable;
>> > -            hd->platform_ops->map_page(d, gfn, mfn, mapping);
>> > +            if ( hd->platform_ops->map_page(d, gfn, mfn, mapping) )
>> > +                printk(XENLOG_G_ERR
>> > +                       "IOMMU: Map page gfn: 0x%lx(mfn: 0x%lx) failed.\n",
>> > +                       gfn, mfn);
>> > +
>> 
>> Printing one message here is certainly necessary, but what if the failure
>> repeats for very many pages?
> 
> Yes, to me, it is ok, but I am open to your suggestion.
> 
>> Also %#lx instead of 0x%lx please, and a blank before the
>> opening parenthesis.
>> 
> OK, just check it:
> 
> ..
> "IOMMU: Map page gfn: %#lx (mfn: %#lx) failed.\n"
> ..
> 
> Right?

Almost: Generally no full stop in log messages please. And I think
the word "page" is redundant here. As a rule of thumb: Log
messages should give as much useful information as possible (which
includes the requirement for distinct ones to be distinguishable in
resulting logs) with as little verbosity as possible.
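
A minimal sketch of the message shape asked for above: %#lx instead of
0x%lx, a blank before the opening parenthesis, no full stop, and no
redundant "page". The helper names format_map_failure() and
should_log_map_failure() and the struct map_fail_log are hypothetical and
not part of the patch, and the exact wording of the message is an
assumption. The second helper shows one possible way to keep a repeating
failure from flooding the log; the thread does not settle on an approach.

```c
#include <stdio.h>

/*
 * Hypothetical helper, not part of the patch: builds the failure message
 * in the requested style. Note that %#lx prints a plain "0" for zero.
 */
static int format_map_failure(char *buf, size_t len,
                              unsigned long gfn, unsigned long mfn)
{
    return snprintf(buf, len, "IOMMU: Map gfn: %#lx (mfn: %#lx) failed\n",
                    gfn, mfn);
}

/* Hypothetical bookkeeping for repeated failures (an assumption). */
struct map_fail_log {
    unsigned long count; /* failures seen so far */
};

/* Returns 1 only for the first failure; later ones are merely counted. */
static int should_log_map_failure(struct map_fail_log *s)
{
    return s->count++ == 0;
}
```

A caller could print the formatted message when should_log_map_failure()
returns 1 and report the final count once after the loop.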

>> > @@ -554,11 +555,24 @@ static void iommu_flush_all(void)
>> >          iommu = drhd->iommu;
>> >          iommu_flush_context_global(iommu, 0);
>> >          flush_dev_iotlb = find_ats_dev_drhd(iommu) ? 1 : 0;
>> > -        iommu_flush_iotlb_global(iommu, 0, flush_dev_iotlb);
>> > +        rc = iommu_flush_iotlb_global(iommu, 0, flush_dev_iotlb);
>> > +
>> > +        if ( rc > 0 )
>> > +        {
>> > +            iommu_flush_write_buffer(iommu);
>> 
>> Why is this needed all of a sudden?
> 
> As there may be multiple IOMMUs, e.g., there are 2 IOMMUs in my machine, and 
> I can find the following log message:
> """
> (XEN) Intel VT-d iommu 0 supported page sizes: 4kB, 2MB, 1GB.
> (XEN) Intel VT-d iommu 1 supported page sizes: 4kB, 2MB, 1GB.
> """
> IIUC, iommu_flush_write_buffer() is per IOMMU, so it should be called to 
> flush every IOMMU.

For one, what you say suggests that right now this is being done
for some (one?) IOMMU(s), which I don't see being the case. And
then what you say _still_ doesn't say _why_ this is now needed all
of a sudden. If, in the course of doing your re-work here, you
find pre-existing issues with the code, please split the necessary
fixes out of your re-work and submit them separately with proper
explanations in their commit messages.

>> > +            rc = 0;
>> > +        }
>> > +        else if ( rc < 0 )
>> > +        {
>> > +            printk(XENLOG_G_ERR "IOMMU: IOMMU flush all failed.\n");
>> > +            break;
>> > +        }
>> 
>> Is a log message really advisable here?
>> 
> 
> To me, it looks tricky too. I was struggling to make a decision. For scheme B, 
> I would try to do as below:
> 
> if ( iommu_flush_all() )
>     printk("... nnn ...");
> 
> but there are 4 function calls, so to me that looks redundant.
> 
> Or could I drop the printout for a failed iommu_flush_all()?

Directing the question back is not helpful: You should know better
than me why you had added a log message there in the first place.
And it is this "why" which determines whether it should stay
there, move elsewhere, or get dropped altogether.

>> > @@ -622,7 +640,7 @@ static void dma_pte_clear_one(struct domain *domain, u64 addr)
>> >      if ( pg_maddr == 0 )
>> >      {
>> >          spin_unlock(&hd->arch.mapping_lock);
>> > -        return;
>> > +        return -ENOMEM;
>> >      }
>> 
>> addr_to_dma_page_maddr() gets called with "alloc" being false, so there can't
>> be any memory allocation failure here. There simply is nothing to do in this
>> case.
>> 
> 
> I copied it from iommu_map_page().
> 
> Good, then the only error from iommu_unmap_page() looks to come from the
> flush (the crash is at least obvious), so error handling can be lighter
> weight: we may return an error, but don't roll back the failed operation.
> Right?

I don't think so, and I can only re-iterate: There can't be any error
here, so there's no error code to forward up the call tree. IOW the
"pg_maddr == 0" case simply means "nothing to do" here.

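The "nothing to do" point can be sketched as follows. This is a simplified
stand-in and not the VT-d code: lookup_table_maddr(), flush_one(),
clear_one(), MAPPED_LIMIT and flush_result are all hypothetical names, and
locking is left out. The lookup is made without allocating, so a zero result
only means that nothing is mapped there; the function then reports success,
and the only error it can forward is one coming from the flush.

```c
#include <stdint.h>

/* Hypothetical: addresses below this limit have a page table. */
#define MAPPED_LIMIT 0x100000ULL

/* Hypothetical knob: the value the stand-in flush returns. */
static int flush_result;

/* Stand-in lookup; with alloc == 0 it never allocates, so 0 is no error. */
static uint64_t lookup_table_maddr(uint64_t addr, int alloc)
{
    (void)alloc;
    return addr < MAPPED_LIMIT ? 0x1000 : 0;
}

/* Stand-in flush; the only step in this sketch that can fail. */
static int flush_one(uint64_t addr)
{
    (void)addr;
    return flush_result;
}

static int clear_one(uint64_t addr)
{
    uint64_t pg_maddr = lookup_table_maddr(addr, 0);

    if ( pg_maddr == 0 )
        return 0; /* nothing mapped, hence nothing to do */

    /* Real code would clear the PTE here before flushing. */
    return flush_one(addr);
}
```
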
>> > -void me_wifi_quirk(struct domain *domain, u8 bus, u8 devfn, int map)
>> > +int me_wifi_quirk(struct domain *domain, u8 bus, u8 devfn, int map)
>> >  {
>> >      u32 id;
>> > +    int rc = 0;
>> >
>> >      id = pci_conf_read32(0, 0, 0, 0, 0);
>> >      if ( IS_CTG(id) )
>> >      {
>> >          /* quit if ME does not exist */
>> >          if ( pci_conf_read32(0, 0, 3, 0, 0) == 0xffffffff )
>> > -            return;
>> > +            return -ENOENT;
>> 
>> Is this really an error? IOW, do all systems which satisfy IS_CTG() have
>> such a device?
>> 
> To be honest, I don't know much about me_wifi_quirk().

In such a case - how about asking the maintainers, who are your
colleagues? And that of course after having looked at the history
in an attempt to gain some understanding.

> Now, IMO I don't need to deal with me_wifi_quirk().

Well, you clearly can't ignore it.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

