[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v3 1/9] xen/vpci: introduce basic handlers to trap accesses to the PCI config space



>>> On 29.05.17 at 14:57, <roger.pau@xxxxxxxxxx> wrote:
> On Fri, May 19, 2017 at 05:35:47AM -0600, Jan Beulich wrote:
>> >>> On 27.04.17 at 16:35, <roger.pau@xxxxxxxxxx> wrote:
>> > +    /* Read/write of unset register. */
>> > +    VPCI_READ_CHECK(8, 4, 0xffffffff);
>> > +    VPCI_READ_CHECK(8, 2, 0xffff);
>> > +    VPCI_READ_CHECK(8, 1, 0xff);
>> > +    VPCI_WRITE(10, 2, 0xbeef);
>> > +    VPCI_READ_CHECK(10, 2, 0xffff);
>> > +
>> > +    /* Read of multiple registers */
>> > +    VPCI_CHECK_REG(7, 1, 0xbd);
>> > +    VPCI_READ_CHECK(4, 4, 0xbdbabaff);
>> 
>> I think a variant accessing mixed size registers would also be
>> desirable here. Perhaps it would be best to exhaustively test
>> all possible variations (there aren't that many after all). Same
>> for writes and partial accesses (below) then.
> 
> So you mean to scan the whole space (from 0 to 128 on this test) using
> all possible register sizes for both read and write? That would indeed
> be feasible.

Not sure what the "from 0 to 128" is meant to apply to. What I mean
is to test all combinations of (mix of) register sizes and access sizes.
I.e. all combinations making up a single 4-byte field ((1,1,1,1),
(1,1,2), (2,1,1), (2,2), (4)) and all four 1-byte accesses, both 2-byte
ones, and the sole possible 4-byte one.

>> > @@ -256,6 +257,152 @@ void register_g2m_portio_handler(struct domain *d)
>> >      handler->ops = &g2m_portio_ops;
>> >  }
>> >  
>> > +/* Do some sanity checks. */
>> > +static int vpci_access_check(unsigned int reg, unsigned int len)
>> > +{
>> > +    /* Check access size. */
>> > +    if ( len != 1 && len != 2 && len != 4 )
>> > +    {
>> > +        gdprintk(XENLOG_WARNING, "invalid length (reg: %#x, len: %u)\n",
>> > +                 reg, len);
>> 
>> I think many of such gdprintk()s want to go away before this series
>> gets committed.
> 
> OK, I've found them useful while developing, but I guess they are not
> really useful outside from that context. I guess there's no way to
> leave them in place, maybe a Kconfig option?

That seems overkill. I wouldn't so much mind the messages (they
get compiled out for non-debug builds anyway), but the clutter
they introduce to code: In some cases half of your functions are
the invocation of gdprintk() ...

>> > +/* vPCI config space IO ports handlers (0xcf8/0xcfc). */
>> > +static bool_t vpci_portio_accept(const struct hvm_io_handler *handler,
>> 
>> Plain bool please.
> 
> Sadly struct hvm_io_ops (which is where this function is used) expects
> a bool_t as return.

I don't follow - bool_t is simply a typedef of bool nowadays, and
typedefs are all equivalent as far as the C type system goes.

>> Again the question - what's the bare hardware equivalent of
>> returning X86EMUL_UNHANDLEABLE here?
> 
> All 1's I assume (or other random garbage). Would you be OK with me
> adding a "fail" label that sets data to all 1's and returns X86EMUL_OKAY?

That would probably be okay (despite my dislike of labels and goto-s).

>> > +#include <xen/sched.h>
>> > +#include <xen/vpci.h>
>> > +
>> > +extern const vpci_register_init_t __start_vpci_array[], 
>> > __end_vpci_array[];
>> > +#define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
>> > +#define vpci_init __start_vpci_array
>> 
>> What is this last one good for?
> 
> It's used by xen_vpci_add_handlers in order to call the init
> functions, I can drop it and call __start_vpci_array[i](...) if that's
> better.

Well, if there were several users, I could see the benefit of an
abbreviating #define. But for a single user the #define adds
more code / clutter than is being saved on the use site.

>> > +    struct rb_node node;
>> > +};
>> > +
>> > +int xen_vpci_add_handlers(struct pci_dev *pdev)
>> 
>> __hwdom_init (I notice setup_one_hwdom_device() wrongly isn't
>> annotated so).
> 
> OK, so you really want the init handlers to be inside of the
> .init.rodata section then.

Only if that's correct, and it is correct as long as all possible call trees
root in a __hwdom_init function. (To avoid misunderstanding: This
clearly can't be .init.rodata uniformly, as __hwdom_init isn't always
an alias of __init).

>> > +    struct vpci_register r = {
>> > +        .offset = reg,
>> > +        .size = size,
>> > +    };
>> > +
>> > +    ASSERT(vpci_locked(pdev->domain));
>> > +
>> > +    node = pdev->vpci->handlers.rb_node;
>> > +    while ( node )
>> > +    {
>> > +        struct vpci_register *t =
>> 
>> const
> 
> Making both of them const means the return must also be const, and
> that's not suitable by some of the consumers (ie:
> xen_vpci_remove_register for example).

In that case there's no choice then, okay.

>> > +int xen_vpci_add_register(struct pci_dev *pdev, vpci_read_t read_handler,

Btw., I only now notice this further strange xen_ prefix here.

>> > +                          vpci_write_t write_handler, unsigned int offset,
>> > +                          unsigned int size, void *data)
>> > +{
>> > +    struct rb_node **new, *parent;
>> > +    struct vpci_register *r;
>> > +
>> > +    /* Some sanity checks. */
>> > +    if ( (size != 1 && size != 2 && size != 4) || offset >= 0xFFF ||
>> 
>> Off by one again in the offset check.
> 
> Fixed to be > 0xfff. Should this maybe be added to pci_regs.h as
> PCI_MAX_REGISTER? 

Could be done, but then we need one for base and one for
extended config space. May want to check whether Linux has
invented some good names for these by now.

>> > +int xen_vpci_remove_register(struct pci_dev *pdev, unsigned int offset)
>> > +{
>> > +    struct vpci_register *r;
>> > +
>> > +    vpci_lock(pdev->domain);
>> > +    r = vpci_find_register(pdev, offset, 1 /* size doesn't matter here. 
>> > */);
>> 
>> I'm not sure about this - is there anything wrong with the caller,
>> knowing the size, also passing it? You could then even refuse
>> requests to remove a register where (offset,size) doesn't match
>> the recorded values (as vpci_find_register() will return any
>> overlapping one).
> 
> Well, I think the important bit is to check that what
> vpci_find_register returns matches what the called expects to
> remove.

Correct. Debuggability would call for checking both offset and size.

>> > +/* Helper macros for the read/write handlers. */
>> > +#define GENMASK_BYTES(e, s) GENMASK((e) * 8, (s) * 8)
>> 
>> What do e and s stand for here?
> 
> e = end, s = start (in bytes).

And where do you start counting. Having seen the rest of the
series I'm actually rather unconvinced use these macros results
in better code - to me, plain hex numbers are quite a bit easier
to read as long as the number of digits doesn't go meaningfully
beyond 10 or so.

>> > +#define SHIFT_RIGHT_BYTES(d, o) d >>= (o) * 8
>> 
>> And at least o here?
> 
> o = offset (in bytes)

I think simply writhing the shift expression is once again more
clear to the reader than using a macro which is longer to read
and type and which has semantics which aren't immediately
clear from its name.

> I can rename ADD_RESULT to APPEND_RESULT or something more descriptive
> if you think it's going to make it easier to understand.

I'd prefer if the name "merge" appeared in that name - I don't see
this being usable strictly only to append to either side of a value.

>> > +int xen_vpci_read(unsigned int seg, unsigned int bus, unsigned int devfn,
>> 
>> The function being other than void, same question as earlier:
>> What's the bare hardware equivalent of this returning other
>> than zero?
> 
> I though it would be useful to have some more fine-grained error
> reporting if that's ever needed, although as you say, from a hardware
> point of view any errors will be simply reported as the value obtained
> being all 1's.
> 
> The question is, should this already return all 1's, or should the
> called translate failures into all 1's?

If you leave this to the callers, overall code would likely become
less readable, so I'd prefer this to be done in a central place.

>> > +    /* Find the vPCI register handler. */
>> > +    r = vpci_find_register(pdev, reg, size);
>> 
>> With the overlap handling in vpci_find_register() I can't see how
>> this would reliably return the correct (lowest) register when the
>> request spans multiple ones.
> 
> It doesn't need to, if there's a lower register it will be found by
> the recursive call to xen_vpci_read done below (before calling into
> the handler pointed by r).

Since that's quite non-obvious, to me this is another argument
against using recursion here.

>> > +static int vpci_write_helper(struct pci_dev *pdev,
>> > +                             const struct vpci_register *r, unsigned int 
>> > size,
>> > +                             unsigned int offset, uint32_t data)
>> > +{
>> > +    union vpci_val val = { .double_word = data };
>> > +    int rc;
>> > +
>> > +    ASSERT(size <= r->size);
>> > +    if ( size != r->size )
>> > +    {
>> > +        rc = r->read(pdev, r->offset, &val, r->priv_data);
>> > +        if ( rc )
>> > +            return rc;
>> > +        val.double_word &= ~GENMASK_BYTES(size + offset, offset);
>> > +        data &= GENMASK_BYTES(size, 0);
>> > +        val.double_word |= data << (offset * 8);
>> > +    }
>> > +
>> > +    return r->write(pdev, r->offset, val, r->priv_data);
>> > +}
>> 
>> I'm not sure that writing back the value read is correct in all cases
>> (think of write-only or rw1c registers or even offsets where reads
>> and writes access different registers altogether). I think the write
>> handlers will need to be made capable of dealing with partial writes.
> 
> That seems to be what pciback does fro writes: read, merge value,
> write back (drivers/xen/xen-pciback/conf_space.c
> xen_pcibk_config_write):
> 
> err = conf_space_read(dev, cfg_entry, field_start,
>                     &tmp_val);
> if (err)
>       break;
> 
> tmp_val = merge_value(tmp_val, value, get_mask(size),
>                     offset - field_start);
> 
> err = conf_space_write(dev, cfg_entry, field_start,
>                      tmp_val);
> 
> I would prefer to do it this way in order to avoid adding more
> complexity to the handlers themselves. So far I haven't found any such
> registers (rw1c) in the PCI config space, do you have references to
> any of them?

The status register.

>> > +/* Helpers for locking/unlocking. */
>> > +#define vpci_lock(d) spin_lock(&(d)->arch.hvm_domain.vpci_lock)
>> > +#define vpci_unlock(d) spin_unlock(&(d)->arch.hvm_domain.vpci_lock)
>> > +#define vpci_locked(d) spin_is_locked(&(d)->arch.hvm_domain.vpci_lock)
>> 
>> While for the code layering you don't need recursive locks, did you
>> consider using them nevertheless so that spin_is_locked() return
>> values are actually meaningful for your purposes?
> 
> I'm not sure I follow, spin_is_locked already return meaningful values
> for my purpose AFAICT.

For non-recursive locks this tells you whether _any_ CPU holds
the lock, yet normally you want to know whether the CPU you
run on does.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.