
Re: [Xen-devel] [Qemu-devel] [PATCH V9 3/8] Introduce HostPCIDevice to access a pci device on the host.



On Wed, Mar 21, 2012 at 20:30, Michael S. Tsirkin <mst@xxxxxxxxxx> wrote:
> On Wed, Mar 21, 2012 at 06:29:00PM +0000, Anthony PERARD wrote:
>> Signed-off-by: Anthony PERARD <anthony.perard@xxxxxxxxxx>
>> Acked-by: Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>
>
> So this interface is really LinuxSysfsPCIDevice.
> For example the assumption that you can just open
> device by pci address is broken with vfio.
> Domain number is also not something anyone
> besides linux knows about.
>
> If I were you I would just call it xen- ....
> and if it comes in handy it can be later renamed.

OK, I will rename it to XenHostPCIDevice.

>> ---
>>  Makefile.target      |    3 +
>>  hw/host-pci-device.c |  278 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>  hw/host-pci-device.h |   75 ++++++++++++++
>>  3 files changed, 356 insertions(+), 0 deletions(-)
>>  create mode 100644 hw/host-pci-device.c
>>  create mode 100644 hw/host-pci-device.h
>>
>> diff --git a/Makefile.target b/Makefile.target
>> index 63cf769..0ccfd5b 100644
>> --- a/Makefile.target
>> +++ b/Makefile.target
>> @@ -232,6 +232,9 @@ obj-$(CONFIG_NO_XEN) += xen-stub.o
>>
>>  obj-i386-$(CONFIG_XEN) += xen_platform.o
>>
>> +# Xen PCI Passthrough
>> +obj-i386-$(CONFIG_XEN_PCI_PASSTHROUGH) += host-pci-device.o
>> +
>>  # Inter-VM PCI shared memory
>>  CONFIG_IVSHMEM =
>>  ifeq ($(CONFIG_KVM), y)
>> diff --git a/hw/host-pci-device.c b/hw/host-pci-device.c
>> new file mode 100644
>> index 0000000..3dacb30
>> --- /dev/null
>> +++ b/hw/host-pci-device.c
>> @@ -0,0 +1,278 @@
>> +/*
>> + * Copyright (C) 2011       Citrix Ltd.
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2.  See
>> + * the COPYING file in the top-level directory.
>> + *
>> + */
>> +
>> +#include "qemu-common.h"
>> +#include "host-pci-device.h"
>> +
>> +#define PCI_MAX_EXT_CAP \
>> +    ((PCIE_CONFIG_SPACE_SIZE - PCI_CONFIG_SPACE_SIZE) / (PCI_CAP_SIZEOF + 4))
>
> namespace pollution.
> name all things HOST_PCI_....
>
> in this case, open-coding will make things clearer.
>
>
>> +
>> +enum error_code {
>
> seems unused. So why name the type?
>
>> +    ERROR_SYNTAX = 1,
>
> We return -1 on error, just do that and you won't need ERROR_SYNTAX.

Ok, I'll remove this.

>> +};
>> +
>> +static int path_to(const HostPCIDevice *d,
>> +                   const char *name, char *buf, ssize_t size)
>> +{
>> +    return snprintf(buf, size, "/sys/bus/pci/devices/%04x:%02x:%02x.%x/%s",
>> +                    d->domain, d->bus, d->dev, d->func, name);
>> +}
>
> users ignore return value. Also, want to check no overflow
> and assert?

I will check the return value in this function and then return 0 or -1.
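
For illustration, a minimal sketch of such a checked helper. The name
sysfs_device_path() and the flat argument list are assumptions made only
for this example (the real function would take the device structure).
Truncation is detected by comparing the return value of snprintf with the
buffer size.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: returns 0 on success, -1 on error or truncation. */
static int sysfs_device_path(uint16_t domain, uint8_t bus, uint8_t dev,
                             uint8_t func, const char *name,
                             char *buf, size_t size)
{
    /*
     * snprintf() writes at most size bytes, including the trailing '\0',
     * and returns the length the complete string would have had.
     */
    int rc = snprintf(buf, size, "/sys/bus/pci/devices/%04x:%02x:%02x.%x/%s",
                      (unsigned)domain, (unsigned)bus, (unsigned)dev,
                      (unsigned)func, name);

    if (rc < 0 || (size_t)rc >= size) {
        return -1;
    }
    return 0;
}
```

With this shape the callers can simply bail out when the helper returns
non-zero instead of opening a truncated path.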

>> +
>> +static int get_resource(HostPCIDevice *d)
>> +{
>> +    int i, rc = 0;
>> +    FILE *f;
>> +    char path[PATH_MAX];
>> +    unsigned long long start, end, flags, size;
>> +
>> +    path_to(d, "resource", path, sizeof (path));
>
> I think this might not fit, snprintf needs an extra byte for \0.

I just checked: snprintf writes at most size bytes, including the
terminating \0, so we should just pass the size of the buffer.

>> +    f = fopen(path, "r");
>> +    if (!f) {
>> +        fprintf(stderr, "Error: Can't open %s: %s\n", path, strerror(errno));
>> +        return -errno;
>> +    }
>> +
>> +    for (i = 0; i < PCI_NUM_REGIONS; i++) {
>> +        if (fscanf(f, "%llx %llx %llx", &start, &end, &flags) != 3) {
>
> People mentioned that scanf is not a good way to parse input.
> Applies here.

OK, I'll parse it manually :(.
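
As a rough sketch of what manual parsing could look like, assuming each
line of the sysfs "resource" file holds three hexadecimal fields (start,
end, flags) separated by whitespace. The helper name parse_resource_line()
is hypothetical, and reading the line itself (fgets or similar) is left
out.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse "start end flags" (hex) from one line.
 * Returns 0 on success, -1 on any syntax or range error. */
static int parse_resource_line(const char *line,
                               unsigned long long *start,
                               unsigned long long *end,
                               unsigned long long *flags)
{
    unsigned long long v[3];
    const char *p = line;
    char *endp;
    int i;

    for (i = 0; i < 3; i++) {
        while (isspace((unsigned char)*p)) {
            p++;
        }
        /* Reject a leading sign, which strtoull() would otherwise accept. */
        if (!isxdigit((unsigned char)*p)) {
            return -1;
        }
        errno = 0;
        v[i] = strtoull(p, &endp, 16);   /* base 16 also accepts "0x" */
        if (endp == p || errno == ERANGE) {
            return -1;
        }
        p = endp;
    }

    /* Only trailing whitespace may follow the third field. */
    while (isspace((unsigned char)*p)) {
        p++;
    }
    if (*p != '\0') {
        return -1;
    }

    *start = v[0];
    *end = v[1];
    *flags = v[2];
    return 0;
}
```

Unlike fscanf, this reports overflow and trailing garbage explicitly.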

>> +            fprintf(stderr, "Error: Syntax error in %s\n", path);
>> +            rc = ERROR_SYNTAX;
>> +            break;
>> +        }
>> +        if (start) {
>> +            size = end - start + 1;
>> +        } else {
>> +            size = 0;
>> +        }
>> +
>> +        if (i < PCI_ROM_SLOT) {
>> +            d->io_regions[i].base_addr = start;
>> +            d->io_regions[i].size = size;
>> +            d->io_regions[i].flags = flags;
>> +        } else {
>> +            d->rom.base_addr = start;
>> +            d->rom.size = size;
>> +            d->rom.flags = flags;
>> +        }
>> +    }
>> +
>> +    fclose(f);
>> +    return rc;
>> +}
>> +
>> +static int get_hex_value(HostPCIDevice *d, const char *name,
>> +                         unsigned long *pvalue)
>
> why long?

To be a bit generic, I suppose, but I only use this function for
vendor_id and device_id, so I probably just need an int.

>> +{
>> +    char path[PATH_MAX];
>> +    FILE *f;
>> +    unsigned long value;
>> +
>> +    path_to(d, name, path, sizeof (path));
>> +    f = fopen(path, "r");
>> +    if (!f) {
>> +        fprintf(stderr, "Error: Can't open %s: %s\n", path, strerror(errno));
>> +        return -errno;
>> +    }
>> +    if (fscanf(f, "%lx\n", &value) != 1) {
>> +        fprintf(stderr, "Error: Syntax error in %s\n", path);
>> +        fclose(f);
>> +        return ERROR_SYNTAX;
>> +    }
>> +    fclose(f);
>> +    *pvalue = value;
>> +    return 0;
>> +}
>> +
>> +static bool pci_dev_is_virtfn(HostPCIDevice *d)
>> +{
>> +    char path[PATH_MAX];
>> +    struct stat buf;
>> +
>> +    path_to(d, "physfn", path, sizeof (path));
>> +    return !stat(path, &buf);
>> +}
>> +
>
> Don't start names with pci_.
> It would also be better to avoid things like path_to IMO.

Do you mean avoiding the name or the purpose of the function path_to?
For the name, I can probably rename it to sysfs_device_path().

>> +static int host_pci_config_fd(HostPCIDevice *d)
>
> So this opens if needed, and returns.
> Why not explicitly open on get?
> then you won't need these hacks.

Ok, I'll change that.

>> +{
>> +    char path[PATH_MAX];
>> +
>> +    if (d->config_fd < 0) {
>> +        path_to(d, "config", path, sizeof (path));
>
> sizeof path
>
>> +        d->config_fd = open(path, O_RDWR);
>> +        if (d->config_fd < 0) {
>> +            fprintf(stderr, "HostPCIDevice: Can not open '%s': %s\n",
>> +                    path, strerror(errno));
>
> strerror is not thread safe
>
>> +        }
>> +    }
>> +    return d->config_fd;
>> +}
>> +static int host_pci_config_read(HostPCIDevice *d, int pos, void *buf, int len)
>> +{
>> +    int fd = host_pci_config_fd(d);
>
> You open file on each access?
>
>> +    int res = 0;
>
> why initialize here?
>
>> +
>> +again:
>> +    res = pread(fd, buf, len, pos);
>> +    if (res != len) {
>> +        if (res < 0 && (errno == EINTR || errno == EAGAIN)) {
>> +            goto again;
>
> code loops with while or for.

ok.
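
For reference, a sketch of the same retry logic written as a do/while
loop. The function name config_read() and the choice of -EIO for a short
read are assumptions of this example, not what the patch does.

```c
#define _XOPEN_SOURCE 700   /* for pread() under strict -std= modes */
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read exactly len bytes at offset pos.
 * Returns 0 on success or a negative errno value. */
static int config_read(int fd, off_t pos, void *buf, size_t len)
{
    ssize_t res;

    /* Retry only when the call was interrupted or would block. */
    do {
        res = pread(fd, buf, len, pos);
    } while (res < 0 && (errno == EINTR || errno == EAGAIN));

    if (res < 0) {
        return -errno;
    }
    if ((size_t)res != len) {
        return -EIO;    /* short read: errno is not set by pread() here */
    }
    return 0;
}
```

Handling the short read separately also avoids returning a stale errno,
since pread does not set errno when it returns fewer bytes than requested.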

>> +        }
>> +        fprintf(stderr, "%s: read failed: %s (fd: %i)\n",
>> +                __func__, strerror(errno), fd);
>> +        return -errno;
>> +    }
>> +    return 0;
>> +}
>> +static int host_pci_config_write(HostPCIDevice *d,
>> +                                 int pos, const void *buf, int len)
>> +{
>> +    int fd = host_pci_config_fd(d);
>> +    int res = 0;
>> +
>> +again:
>> +    res = pwrite(fd, buf, len, pos);
>> +    if (res != len) {
>> +        if (res < 0 && (errno == EINTR || errno == EAGAIN)) {
>> +            goto again;
>> +        }
>> +        fprintf(stderr, "%s: write failed: %s\n",
>> +                __func__, strerror(errno));
>> +        return -errno;
>> +    }
>> +    return 0;
>> +}
>> +
>
> same comments as above. also,
> Don't report errors with fprintf.
>
>> +int host_pci_get_byte(HostPCIDevice *d, int pos, uint8_t *p)
>> +{
>> +    uint8_t buf;
>> +    int rc = host_pci_config_read(d, pos, &buf, 1);
>> +    if (rc == 0) {
>
> !rc.
>
>> +        *p = buf;
>
> why not pass in p directly?
>
>> +    }
>> +    return rc;
>> +}
>> +int host_pci_get_word(HostPCIDevice *d, int pos, uint16_t *p)
>> +{
>> +    uint16_t buf;
>> +    int rc = host_pci_config_read(d, pos, &buf, 2);
>> +    if (rc == 0) {
>
> !rc.
>
>> +        *p = le16_to_cpu(buf);
>> +    }
>> +    return rc;
>> +}
>
> This looks wrong wrt endian-ness.

It seems that PCI config space registers are little-endian, so a
word/dword read from the PCI config space should be converted
from little-endian to the CPU's endianness.
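
To make the point concrete, here is a small self-contained sketch that
decodes little-endian config space bytes one byte at a time, which gives
the same result on any host. It is a stand-in for le16_to_cpu/le32_to_cpu,
not the code in the patch, and the helper names are made up for this
example.

```c
#include <stdint.h>

/* Hypothetical helpers: decode little-endian values from raw bytes. */
static uint16_t load_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t load_le32(const uint8_t *p)
{
    /* Cast before shifting so the top byte never overflows an int. */
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}
```

For instance, a vendor ID stored as the bytes 0x86 0x80 at offset 0
decodes to 0x8086 regardless of the host byte order.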

>> +int host_pci_get_long(HostPCIDevice *d, int pos, uint32_t *p)
>> +{
>> +    uint32_t buf;
>> +    int rc = host_pci_config_read(d, pos, &buf, 4);
>> +    if (rc == 0) {
>> +        *p = le32_to_cpu(buf);
>> +    }
>> +    return rc;
>> +}
>
> Add empty lines between {}

It looks nicer when I fold each function down to a single line :), but
I will add these empty lines.

>> +int host_pci_get_block(HostPCIDevice *d, int pos, uint8_t *buf, int len)
>> +{
>> +    return host_pci_config_read(d, pos, buf, len);
>> +}
>
> when would this be useful?

It's used to initialize the "emulated" config space (of pci.h) and
every time a pci config read or write is issued by the guest.

>> +
>> +int host_pci_set_byte(HostPCIDevice *d, int pos, uint8_t data)
>> +{
>> +    return host_pci_config_write(d, pos, &data, 1);
>> +}
>> +int host_pci_set_word(HostPCIDevice *d, int pos, uint16_t data)
>> +{
>> +    data = cpu_to_le16(data);
>> +    return host_pci_config_write(d, pos, &data, 2);
>> +}
>> +int host_pci_set_long(HostPCIDevice *d, int pos, uint32_t data)
>> +{
>> +    data = cpu_to_le32(data);
>> +    return host_pci_config_write(d, pos, &data, 4);
>> +}
>> +int host_pci_set_block(HostPCIDevice *d, int pos, uint8_t *buf, int len)
>> +{
>> +    return host_pci_config_write(d, pos, buf, len);
>> +}
>> +
>> +uint32_t host_pci_find_ext_cap_offset(HostPCIDevice *d, uint32_t cap)
>
> Why 32? Ext config offsets are < 12 bit.

No apparent reason, the user of this function was just expecting a uint32.
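
As a sketch of the narrower interface, the walk below searches an
in-memory copy of the 4096-byte extended config space and returns a
uint16_t offset. The buffer-based signature and all names are assumptions
for illustration. The bound is open-coded as (4096 - 256) / 8 = 480
entries, and the header layout (ID in bits 15:0, next pointer in bits
31:20) follows the PCI_EXT_CAP_ID/PCI_EXT_CAP_NEXT macros.

```c
#include <stddef.h>
#include <stdint.h>

#define CFG_SPACE_SIZE      256u    /* conventional config space */
#define CFG_SPACE_EXT_SIZE  4096u   /* PCIe extended config space */

/* Decode a little-endian dword from the config space copy. */
static uint32_t cfg_load_le32(const uint8_t *cfg, size_t pos)
{
    return (uint32_t)cfg[pos] |
           ((uint32_t)cfg[pos + 1] << 8) |
           ((uint32_t)cfg[pos + 2] << 16) |
           ((uint32_t)cfg[pos + 3] << 24);
}

/* Hypothetical helper: cfg must hold CFG_SPACE_EXT_SIZE bytes.
 * Returns the offset of the capability, or 0 if it is not present. */
static uint16_t find_ext_cap_offset(const uint8_t *cfg, uint16_t cap)
{
    /* Each capability takes at least 8 bytes, which bounds the walk. */
    int ttl = (CFG_SPACE_EXT_SIZE - CFG_SPACE_SIZE) / 8;
    uint16_t pos = CFG_SPACE_SIZE;

    while (ttl-- > 0) {
        uint32_t header = cfg_load_le32(cfg, pos);

        if (header == 0) {
            break;              /* no (more) extended capabilities */
        }
        if ((header & 0xffff) == cap) {
            return pos;
        }
        pos = (header >> 20) & 0xffc;
        if (pos < CFG_SPACE_SIZE) {
            break;              /* end of list or invalid pointer */
        }
    }
    return 0;
}
```

The ttl counter keeps the loop finite even if a broken device reports a
capability list that points back to itself.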

>> +{
>> +    uint32_t header = 0;
>> +    int max_cap = PCI_MAX_EXT_CAP;
>> +    int pos = PCI_CONFIG_SPACE_SIZE;
>> +
>> +    do {
>> +        if (host_pci_get_long(d, pos, &header)) {
>> +            break;
>> +        }
>> +        /*
>> +         * If we have no capabilities, this is indicated by cap ID,
>> +         * cap version and next pointer all being 0.
>> +         */
>> +        if (header == 0) {
>> +            break;
>> +        }
>> +
>> +        if (PCI_EXT_CAP_ID(header) == cap) {
>> +            return pos;
>> +        }
>> +
>> +        pos = PCI_EXT_CAP_NEXT(header);
>> +        if (pos < PCI_CONFIG_SPACE_SIZE) {
>> +            break;
>> +        }
>> +
>> +        max_cap--;
>> +    } while (max_cap > 0);
>> +
>> +    return 0;
>> +}
>> +
>> +HostPCIDevice *host_pci_device_get(uint8_t bus, uint8_t dev, uint8_t func)
>
> Why skip domain in the interface?
> Also, HostPCIDevice structure is public so there is little value
> in allocating, just get it by pointer and init/cleanup.

You mean like pci_bus_new_inplace? OK, I'll do that.
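
Roughly, an init/cleanup pair on caller-owned storage could look like the
sketch below. The struct is cut down to a few fields, the function names
are hypothetical, and opening the sysfs files is left out so that the
example stays self-contained.

```c
#include <stdint.h>
#include <unistd.h>

/* Reduced, hypothetical version of the device structure. */
typedef struct XenHostPCIDevice {
    uint16_t domain;
    uint8_t bus;
    uint8_t dev;
    uint8_t func;
    int config_fd;
} XenHostPCIDevice;

/* Initialise a structure owned by the caller; nothing is allocated. */
static void xen_host_pci_device_init(XenHostPCIDevice *d, uint16_t domain,
                                     uint8_t bus, uint8_t dev, uint8_t func)
{
    d->domain = domain;
    d->bus = bus;
    d->dev = dev;
    d->func = func;
    d->config_fd = -1;      /* not opened yet */
}

/* Release what init (or a later open) acquired; safe to call twice. */
static void xen_host_pci_device_cleanup(XenHostPCIDevice *d)
{
    if (d->config_fd >= 0) {
        close(d->config_fd);
        d->config_fd = -1;
    }
}
```

The caller can then embed the structure in its own state and no longer
needs g_new0/g_free. The domain is passed explicitly here as well.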

>> +{
>> +    HostPCIDevice *d = NULL;
>> +    unsigned long v = 0;
>> +
>> +    d = g_new0(HostPCIDevice, 1);
>> +
>> +    d->config_fd = -1;
>> +    d->domain = 0;
>> +    d->bus = bus;
>> +    d->dev = dev;
>> +    d->func = func;
>> +
>> +    if (host_pci_config_fd(d) == -1) {
>> +        goto error;
>> +    }
>> +    if (get_resource(d) != 0) {
>
> just get_resource(d).
>
>> +        goto error;
>> +    }
>> +
>> +    if (get_hex_value(d, "vendor", &v)) {
>> +        goto error;
>> +    }
>> +    d->vendor_id = v;
>> +    if (get_hex_value(d, "device", &v)) {
>> +        goto error;
>> +    }
>> +    d->device_id = v;
>> +    d->is_virtfn = pci_dev_is_virtfn(d);
>> +
>> +    return d;
>> +error:
>> +    if (d->config_fd >= 0) {
>> +        close(d->config_fd);
>> +    }
>> +    g_free(d);
>> +    return NULL;
>> +}
>> +
>> +void host_pci_device_put(HostPCIDevice *d)
>> +{
>> +    if (d->config_fd >= 0) {
>> +        close(d->config_fd);
>> +    }
>> +    g_free(d);
>> +}
>> diff --git a/hw/host-pci-device.h b/hw/host-pci-device.h
>> new file mode 100644
>> index 0000000..c8880eb
>> --- /dev/null
>> +++ b/hw/host-pci-device.h
>> @@ -0,0 +1,75 @@
>> +#ifndef HW_HOST_PCI_DEVICE
>> +#  define HW_HOST_PCI_DEVICE
>
> Don't put space after #.
>
> Also HOST_PCI_DEVICE_H would be less likely to confuse.
>
>> +
>> +#include "pci.h"
>> +
>> +/*
>> + * from linux/ioport.h
>> + * IO resources have these defined flags.
>> + */
>> +#define IORESOURCE_BITS         0x000000ff      /* Bus-specific bits */
>> +
>> +#define IORESOURCE_TYPE_BITS    0x00000f00      /* Resource type */
>> +#define IORESOURCE_IO           0x00000100
>> +#define IORESOURCE_MEM          0x00000200
>> +#define IORESOURCE_IRQ          0x00000400
>> +#define IORESOURCE_DMA          0x00000800
>> +
>> +#define IORESOURCE_PREFETCH     0x00001000      /* No side effects */
>> +#define IORESOURCE_READONLY     0x00002000
>> +#define IORESOURCE_CACHEABLE    0x00004000
>> +#define IORESOURCE_RANGELENGTH  0x00008000
>> +#define IORESOURCE_SHADOWABLE   0x00010000
>> +
>> +#define IORESOURCE_SIZEALIGN    0x00020000      /* size indicates alignment */
>> +#define IORESOURCE_STARTALIGN   0x00040000      /* start field is alignment */
>> +
>> +#define IORESOURCE_MEM_64       0x00100000
>> +
>> +    /* Userland may not map this resource */
>> +#define IORESOURCE_EXCLUSIVE    0x08000000
>> +#define IORESOURCE_DISABLED     0x10000000
>> +#define IORESOURCE_UNSET        0x20000000
>> +#define IORESOURCE_AUTO         0x40000000
>> +    /* Driver has marked this resource busy */
>> +#define IORESOURCE_BUSY         0x80000000
>> +
>
> Why do above make sense in an API?
> Abstract it in some reasonable way, don't just expose
> flags from sysfs as is.

Ok.

>> +
>
> kill extra empty lines
>
>> +typedef struct HostPCIIORegion {
>> +    unsigned long flags;
>> +    pcibus_t base_addr;
>> +    pcibus_t size;
>> +} HostPCIIORegion;
>> +
>> +typedef struct HostPCIDevice {
>> +    uint16_t domain;
>> +    uint8_t bus;
>> +    uint8_t dev;
>> +    uint8_t func;
>> +
>> +    uint16_t vendor_id;
>> +    uint16_t device_id;
>> +
>> +    HostPCIIORegion io_regions[PCI_NUM_REGIONS - 1];
>> +    HostPCIIORegion rom;
>> +
>> +    bool is_virtfn;
>> +
>> +    int config_fd;
>> +} HostPCIDevice;
>> +
>> +HostPCIDevice *host_pci_device_get(uint8_t bus, uint8_t dev, uint8_t func);
>> +void host_pci_device_put(HostPCIDevice *pci_dev);
>> +
>> +int host_pci_get_byte(HostPCIDevice *d, int pos, uint8_t *p);
>> +int host_pci_get_word(HostPCIDevice *d, int pos, uint16_t *p);
>> +int host_pci_get_long(HostPCIDevice *d, int pos, uint32_t *p);
>> +int host_pci_get_block(HostPCIDevice *d, int pos, uint8_t *buf, int len);
>> +int host_pci_set_byte(HostPCIDevice *d, int pos, uint8_t data);
>> +int host_pci_set_word(HostPCIDevice *d, int pos, uint16_t data);
>> +int host_pci_set_long(HostPCIDevice *d, int pos, uint32_t data);
>> +int host_pci_set_block(HostPCIDevice *d, int pos, uint8_t *buf, int len);
>> +
>> +uint32_t host_pci_find_ext_cap_offset(HostPCIDevice *s, uint32_t cap);
>> +
>> +#endif /* !HW_HOST_PCI_DEVICE */
>> --
>> Anthony PERARD
>



-- 
Anthony PERARD
