[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v3 5/9] xen/vpci: add handlers to map the BARs



On Fri, Jun 23, 2017 at 02:58:28AM -0600, Jan Beulich wrote:
> >>> On 22.06.17 at 19:13, <roger.pau@xxxxxxxxxx> wrote:
> > On Fri, May 19, 2017 at 09:21:56AM -0600, Jan Beulich wrote:
> >> >>> On 27.04.17 at 16:35, <roger.pau@xxxxxxxxxx> wrote:
> >> > +static int vpci_modify_bars(struct pci_dev *pdev, const bool map)
> >> > +{
> >> > +    struct vpci_header *header = &pdev->vpci->header;
> >> > +    unsigned int i;
> >> > +    int rc = 0;
> >> > +
> >> > +    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
> >> > +    {
> >> > +        paddr_t gaddr = map ? header->bars[i].gaddr
> >> > +                            : header->bars[i].mapped_addr;
> >> > +        paddr_t paddr = header->bars[i].paddr;
> >> > +
> >> > +        if ( header->bars[i].type != VPCI_BAR_MEM &&
> >> > +             header->bars[i].type != VPCI_BAR_MEM64_LO )
> >> > +            continue;
> >> > +
> >> > +        rc = modify_mmio(pdev->domain, _gfn(PFN_DOWN(gaddr)),
> >> > +                         _mfn(PFN_DOWN(paddr)), 
> >> > PFN_UP(header->bars[i].size),
> >> 
> >> The PFN_UP() indicates a problem: For sub-page BARs you can't
> >> blindly map/unmap them without taking into consideration other
> >> devices sharing the same page.
> > 
> > I'm not sure I follow, the start address of BARs is always aligned to
> > a 4KB boundary, so there's no chance of the same page being used by
> > two different BARs at the same time.
> 
> I'm not sure where you're taking this from. Modern BIOSes may
> aim at doing so, but for one I'm sure I've seen smaller alignment
> quite often on older machines, and then my most modern AMD
> one has these three devices, for example:

Right, I guess I will have to somehow check for overlapping regions,
how inconvenient.

> 00:11.0 SATA controller: Advanced Micro Devices [AMD] nee ATI 
> SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode] (prog-if 01 [AHCI 1.0])
>       Subsystem: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 SATA 
> Controller [AHCI mode]
>       Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
> Stepping- SERR- FastB2B- DisINTx-
>       Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- 
> <TAbort- <MAbort- >SERR- <PERR- INTx-
>       Latency: 64
>       Interrupt: pin A routed to IRQ 22
>       Region 0: I/O ports at 2430 [size=8]
>       Region 1: I/O ports at 2424 [size=4]
>       Region 2: I/O ports at 2428 [size=8]
>       Region 3: I/O ports at 2420 [size=4]
>       Region 4: I/O ports at 2400 [size=16]
>       Region 5: Memory at c8014000 (32-bit, non-prefetchable) [size=1K]
>       Capabilities: [60] Power Management version 2
>               Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA 
> PME(D0-,D1-,D2-,D3hot-,D3cold-)
>               Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
>       Capabilities: [70] SATA HBA v1.0 InCfgSpace
>       Kernel driver in use: ahci
>       Kernel modules: ahci
> 
> 00:12.2 USB controller: Advanced Micro Devices [AMD] nee ATI 
> SB7x0/SB8x0/SB9x0 USB EHCI Controller (prog-if 20 [EHCI])
>       Subsystem: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB 
> EHCI Controller
>       Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- 
> Stepping- SERR- FastB2B- DisINTx-
>       Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- 
> <TAbort- <MAbort- >SERR- <PERR- INTx-
>       Latency: 64, Cache Line Size: 32 bytes
>       Interrupt: pin B routed to IRQ 17
>       Region 0: Memory at c8014400 (32-bit, non-prefetchable) [size=256]
>       Capabilities: [c0] Power Management version 2
>               Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA 
> PME(D0+,D1+,D2+,D3hot+,D3cold-)
>               Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
>               Bridge: PM- B3+
>       Capabilities: [e4] Debug port: BAR=1 offset=00e0
>       Kernel driver in use: ehci_hcd
>       Kernel modules: ehci-hcd
> 
> 00:13.2 USB controller: Advanced Micro Devices [AMD] nee ATI 
> SB7x0/SB8x0/SB9x0 USB EHCI Controller (prog-if 20 [EHCI])
>       Subsystem: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB 
> EHCI Controller
>       Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- 
> Stepping- SERR- FastB2B- DisINTx-
>       Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- 
> <TAbort- <MAbort- >SERR- <PERR- INTx-
>       Latency: 64, Cache Line Size: 32 bytes
>       Interrupt: pin B routed to IRQ 19
>       Region 0: Memory at c8014800 (32-bit, non-prefetchable) [size=256]
>       Capabilities: [c0] Power Management version 2
>               Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA 
> PME(D0+,D1+,D2+,D3hot+,D3cold-)
>               Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
>               Bridge: PM- B3+
>       Capabilities: [e4] Debug port: BAR=1 offset=00e0
>       Kernel driver in use: ehci_hcd
>       Kernel modules: ehci-hcd
> 
> > The size is indeed not aligned to 4KB, but I don't see how this can
> > cause collisions with other BARs unless the domain is actively trying
> > to make the BARs overlap, in which case there's not much Xen can do.
> 
> The above is not what Dom0 did, but how the system boots up.
> And this "there's not much Xen can do" is what I've been trying
> to get at with my comment: A solution is needed here for your
> approach to vPCI handling to be viable.

Checking for overlap seem to be the only sensible option here. Xen is
in no position to relocate BARs.

> >> > +static int vpci_bar_write(struct pci_dev *pdev, unsigned int reg,
> >> > +                          union vpci_val val, void *data)
> >> > +{
> >> > +    struct vpci_bar *bar = data;
> >> > +    uint32_t wdata = val.double_word;
> >> > +    bool hi = false, unset = false;
> >> > +
> >> > +    ASSERT(bar->type == VPCI_BAR_MEM || bar->type == VPCI_BAR_MEM64_LO 
> >> > ||
> >> > +           bar->type == VPCI_BAR_MEM64_HI);
> >> > +
> >> > +    if ( wdata == GENMASK(31, 0) )
> >> 
> >> I'm afraid this again doesn't match real hardware behavior: As the
> >> low bits are r/o, writes with them having any value, but all other
> >> bits being 1 should have the same effect. I notice that while I had
> >> fixed this for the ROM BAR in Linux'es pciback, I should have also
> >> fixed this for ordinary ones.
> > 
> > I've changed this to:
> > 
> >     switch ( bar->type )
> >     {
> >     case VPCI_BAR_MEM:
> >         size_mask = GENMASK(31, 12);
> 
> Relating to the comment further up - where's this 12 coming from?

Hm, this is from the "PCI Express Technology" Mindshare book, which
states that for 32bit memory BARs bits [11,4] are hardcoded to 0 and
for 64bit BARs bits [25,4] are also hardcoded to 0. I've been
searching for such statement in the PCI Local Bus specification
version 3.0, but I don't seem to be able to find any references to
this. I guess I will do:

switch ( bar->type )
{
case VPCI_BAR_MEM32:
case VPCI_BAR_MEM64_LO:
    size_mask = PCI_BASE_ADDRESS_MEM_MASK;
    break;
case VPCI_BAR_MEM64_HI:
    size_mask = ~0u;
default:
    ASSERT_UNREACHABLE();
    return;
}

> >         break;
> >     case VPCI_BAR_MEM64_LO:
> >         size_mask = GENMASK(31, 26);
> 
> And this 26?
> 
> >         break;
> >     case VPCI_BAR_MEM64_HI:
> >         size_mask = GENMASK(31, 0);
> >         break;
> >     default:
> >         ASSERT_UNREACHABLE();
> >         break;
> 
> You want to return here.
> 
> >> > +    }
> >> > +
> >> > +    ASSERT(IS_ALIGNED(bar->gaddr, PAGE_SIZE));
> >> 
> >> Urgh.
> > 
> > Removed.
> 
> With your comment further up, you should have refused to do so
> (i.e. I'm getting the impression you're not really sure about that
> supposed 4k alignment).

No, it's not 4K aligned.

> >> > +        if ( (bars[i].type == VPCI_BAR_MEM && addr == GENMASK(31, 12)) 
> >> > ||
> >> > +             addr == GENMASK(63, 26) )
> >> 
> >> Where is this 26 coming from?
> >> Perhaps
> >> 
> >>     if ( addr == GENMASK(bars[i].type == VPCI_BAR_MEM ? 31 : 63, 12) )
> > 
> > I'm checking the memory decode bit here instead in order to figure out
> > if the BAR is not positioned.
> > 
> >> ? Albeit I'm unconvinced GENMASK() is useful to be used here anyway
> >> (see also below).
> > 
> > Right, regardless of the specific usage above, what would you
> > recommend regarding the usage of GENMASK?
> > 
> > Julien suggested introducing GENMASK_ULL. Should I go that route, or
> > introduce something locally for vPCI?
> 
> Back when GENMASK() was introduced to our code base I've
> already indicated that I'm not really in favor of it. I don't think
> it really helps readability all that much (to me, plain hex
> numbers are easier to grok, albeit I admit ones extending
> beyond 8 or 10 digits are less easy to digest; sadly the once
> proposed [by Intel, I think, in the early ia64 days] language
> extension to permit _ separators in numbers doesn't appear
> to have made it anywhere).

I could also switch GENMASK to use long long instead, but I'm not sure
if that's going to break existing callers. Let me try to see if I can
get away without using it (although I kind of liked it for coding
masks).

> >> > +        } bars[6];
> >> 
> >> What about the ROM and SR-IOV ones?
> > 
> > I've implemented support for the expansion ROM BAR (which I still need
> > to figure out how to test),
> 
> There should be hardly any graphics card without a ROM. For
> remote boot purposes also most NICs come with a ROM, albeit
> many BIOSes allow turning it off. Most SCSI cards I've seem
> have a (configuration) ROM too.

OK, but testing NICs ROMs is going to be impossible from a PVH Dom0. I
guess graphics cards, although most of my boxes are headless.

> > but I would like to defer SR-IOV for later
> > because it involves a non-trivial amount of work, and with this series
> > one can already boot a PVH Dom0 (minus SR-IOV of course).
> 
> That's likely okay as long as there's a suitable, much beloved
> "fixme" comment somewhere.

OK, the vpci.h header seems like the best place to add such a fixme
comment.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.